Abstract
There has been ever-growing interest in leveraging state-of-the-art deep learning techniques for tracking objects in video frames. Such works primarily focus on appearance-based models, which prove ineffective at modelling the behaviour of objects across frame sequences. Moreover, little work has been done to explore and exploit the sequence-learning properties of Long Short-Term Memory (LSTM) neural networks for tracking objects in video sequences. In this thesis, we propose a novel LSTM-based tracker, Key-Track, which effectively learns the spatial and temporal behaviour of pedestrians by analyzing the movement patterns of human key-point features provided by OpenPose\cite{cao2018openpose}. Key-Track is trained on a single-object dataset containing a variety of human behaviours; these sequences have been wrangled and curated from the Duke Multi-Target Multi-Camera (DukeMTMC)\cite{DukeMTMC} dataset. We further scale the model at inference time to track multiple people through effective batching. The results reported on the DukeMTMC dataset show that the tracker maintains a high degree of accuracy independent of the number of objects to be tracked in a given scene. In addition, we critically analyze scene complexity and classify each scenario according to the best-performing configuration of the model. Batching, in turn, enables effective allocation of GPU resources, yielding high FPS scores for offline tracking. The total observed size of Key-Track is under 1 megabyte, which paves the way for its deployment on mobile devices for real-time tracking.