Abstract
Recent years have brought great advances in 2D human pose estimation. However, bottom-up approaches, which do not rely on external detectors to generate person crops, tend to have large model sizes and intense computational requirements, making them ill-suited for applications where large computation costs are prohibitive. Lightweight approaches are exceedingly rare and often come at the price of massive accuracy loss. This thesis presents EfficientHRNet, a family of lightweight 2D human pose estimators that unifies the high-resolution structure of the state-of-the-art HigherHRNet, a multi-scale high-resolution network, with the highly efficient model scaling principles of EfficientNet to create high-accuracy models with significantly reduced computation costs. In addition, it provides a formulation for jointly scaling the EfficientNet backbone below the baseline B0 and the rest of EfficientHRNet along with it. The result is a family of highly accurate and efficient 2D human pose estimators that is flexible enough to provide a lightweight solution for a variety of application and device requirements. The baseline H0 model achieves 64.8% accuracy on the COCO dataset, and overall EfficientHRNet proves to be more computationally efficient than other bottom-up 2D human pose estimation approaches while achieving highly competitive accuracy. Moreover, building on the family of EfficientHRNet-based models for pose estimation, this work provides a similar formulation for creating models for another popular computer vision application, image segmentation. Pose estimation and image segmentation models created using these methods are then used as the front end of an edge video analytics pipeline to evaluate the performance of an end-to-end real-time system. This thesis also simulates the integration of the pose estimation and segmentation models into the real-time vision pipeline.