This thesis proposes a novel method to recover the depth of scene objects using an acoustic source and a calibrated neuromorphic (event) camera. The proposed system is a non-contact, monocular depth estimation method that observes subtle mechanical vibrations induced in scene objects by sound waves and uses geometric image formation models to recover depth. Neuromorphic cameras are high-speed sensors capable of capturing the subtle motion and vibrations that sound waves produce on the surfaces of scene objects. A neuromorphic camera responds to a change in intensity at any pixel by asynchronously emitting an event characterized by its pixel coordinates $(x, y)$, polarity $p$ (i.e., a positive or negative change in intensity), and timestamp $t_s$. Using this event data in conjunction with the geometric setup of the optical system, we recover depth by estimating the time of flight of the emitted sound wave from the source to the object's surface. The method proposed in this thesis, which estimates depth using an acoustic excitation signal and a neuromorphic camera, is the first of its kind. Experiments were conducted by subjecting a sheet of paper to an impulse-like sound wave and to a sinusoidal sound wave. The results show that the proposed method estimates depth with an error of $\pm 1$ cm. Further, we demonstrate how the emitted sinusoidal excitation signal can be reconstructed by analyzing the vibrations of the scene object and estimating the signal's frequency; results indicate that the proposed method estimates this frequency with an error of 2.2 Hz. The proposed acoustic-optical sensing mechanism shows potential use cases in estimating the structural properties of scene objects, vibration analysis, robotics, and related applications.
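To make the time-of-flight idea concrete, the following is a minimal sketch (not the thesis implementation): it converts the delay between emitting a sound wave and the first event burst observed by the camera into a depth estimate. The function name, the assumed speed of sound, and the example timestamps are all illustrative assumptions, not values from the thesis.

```python
# Minimal sketch: depth from acoustic time of flight.
# Assumption: the one-way travel time of the sound wave dominates the delay
# between emission and the first vibration-induced event burst.

SPEED_OF_SOUND = 343.0  # m/s in air at ~20 C (assumed)

def depth_from_time_of_flight(t_emit, t_first_event):
    """Depth = speed of sound * one-way travel time of the sound wave.

    t_emit        -- timestamp (s) at which the sound wave was emitted
    t_first_event -- timestamp (s) of the first event burst observed by the
                     neuromorphic camera as the wave strikes the surface
    """
    return SPEED_OF_SOUND * (t_first_event - t_emit)

# Example: a wave emitted at t = 0 s that excites the surface 2.9 ms later
# implies a depth of roughly 1 m.
print(round(depth_from_time_of_flight(0.0, 0.0029), 3))
```

The microsecond-scale timestamps of a neuromorphic camera are what make this delay measurable at all; a conventional frame camera would quantize it to the frame period.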