Abstract
Single-image depth estimation has long been a key interest of the computer vision community. Although depth estimation from a single monocular image remains an ill-posed problem, stereo images come to the rescue. Deep-learning-based approaches to depth estimation are rapidly advancing, offering better performance than traditional computer vision approaches across many domains. However, for many critical applications, cutting-edge deep-learning approaches require too much computational overhead to be operationally feasible. This is especially true for depth-estimation methods that leverage adversarial learning, such as Generative Adversarial Networks (GANs). I propose a computationally efficient GAN for unsupervised monocular depth estimation using factorized convolutions and an attention mechanism. Specifically, I leverage the Extremely Efficient Spatial Pyramid of Depth-wise Dilated Separable Convolutions (EESP) module of ESPNetv2 inside the network, leading to reductions of 25.6%, 33.82%, and 31% in the number of model parameters, FLOPs, and inference time, respectively, compared to the previous unsupervised GAN approach. Finally, I propose a context-aware attention architecture to generate detail-oriented depth images. I demonstrate the performance of the proposed model on two benchmark datasets, KITTI and Cityscapes.
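The efficiency gain from factorized convolutions can be illustrated with a back-of-the-envelope FLOP count. The sketch below (not from the thesis; the feature-map size 128x416 is an illustrative choice) compares a standard k x k convolution with its depthwise-separable factorization, the core idea behind the EESP module's savings:

```python
# Illustrative sketch: multiply-add counts for a standard convolution
# versus a depthwise-separable factorization (depthwise k x k filter
# per channel, followed by a 1 x 1 pointwise channel-mixing layer).

def standard_conv_flops(c_in, c_out, k, h, w):
    # Each of the h*w*c_out output elements needs c_in * k * k multiply-adds.
    return c_in * c_out * k * k * h * w

def depthwise_separable_flops(c_in, c_out, k, h, w):
    # Depthwise pass: one k x k filter applied independently per input channel.
    depthwise = c_in * k * k * h * w
    # Pointwise pass: 1 x 1 convolution mixing c_in channels into c_out.
    pointwise = c_in * c_out * h * w
    return depthwise + pointwise

# Hypothetical layer: 64 -> 64 channels, 3x3 kernel, 128x416 feature map.
std = standard_conv_flops(64, 64, 3, 128, 416)
sep = depthwise_separable_flops(64, 64, 3, 128, 416)
print(f"FLOP reduction from factorization: {1 - sep / std:.1%}")
```

Per-layer savings of this magnitude are what make the overall 33.82% FLOP reduction achievable once the factorized blocks are placed throughout the network.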