Abstract
Automatic identification and pose estimation of a target object through robotic perception is necessary for many robotic tasks, such as object manipulation, scene reconstruction, and view planning. However, general object recognition and pose estimation in cluttered 3D environments remains an unsolved and challenging problem, due to occlusions, complex backgrounds, and great variation in object appearance caused by changes in illumination or viewpoint.

In this dissertation, an appearance-based approach to general 3D object detection and pose estimation is introduced based on segmented 3D surfaces and their features, taking full advantage of RGB-D information. Our approach can effectively detect and estimate the poses of occluded objects, including multiple occluded instances of the same object, in cluttered environments. The detected objects and their poses with respect to the current RGB-D camera frame can be used directly for robotic manipulation, and the reconstructed 3D scene can likewise be used directly for robot motion planning. Leveraging the scene reconstruction results of our surface-based approach, a learning-based approach is further developed for evaluating, from a single view, the scene recognizability of a cluttered scene in which objects occlude one another, and for ranking views by their scene recognizability. Our approach to evaluating scene recognizability provides a more accurate assessment of how good a view is for autonomous robotic tasks in cluttered environments than conventional evaluation based on object visibility alone.

Last but not least, this dissertation also explores interleaving RGB-D perception and robotic manipulation for automatic modeling and handling of unknown objects. Using a fixed RGB-D camera and starting from the first view of the object, our approach gradually builds and extends a partial model (based on what has been visible) into a complete object model.
In the process, the partial model is also used to guide a robot manipulator to change the pose of the object so that more surfaces become visible for continued model building. This alternation of perception-based model building and pose changing continues until a complete object model is built, with all object surfaces covered. Our approach provides the flexibility needed to observe all object surfaces and build a complete object model, and can be further developed to facilitate manipulation of unknown objects in cluttered environments.
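The alternation described above can be sketched as a simple loop; this is a hypothetical illustration, not the dissertation's actual implementation, and the surface names, the `observe`/`reorient` callables, and the toy cube scene are all assumptions introduced for demonstration:

```python
# Minimal sketch of alternating perception-based model building and
# manipulation-driven pose changing, assuming an `observe` callable that
# returns the currently visible surfaces and a `reorient` callable that
# changes the object's pose guided by the partial model. Both names are
# hypothetical placeholders.

def build_complete_model(all_surfaces, observe, reorient):
    """Alternate observation and pose changes until every object
    surface has been covered by the partial model."""
    model = set(observe())        # partial model from the first view
    while model != all_surfaces:
        reorient(model)           # pose change guided by the partial model
        model |= set(observe())   # extend model with newly visible surfaces
    return model


# Toy demo: a "cube" with six faces; each observation reveals the three
# faces currently turned toward the fixed camera.
CUBE = {"top", "bottom", "left", "right", "front", "back"}

class ToyScene:
    def __init__(self):
        self.facing = {"top", "front", "right"}   # faces visible at the start

    def observe(self):
        return self.facing

    def reorient(self, partial_model):
        # Flip the object so the as-yet-unseen faces come into view.
        self.facing = (CUBE - partial_model) or self.facing


scene = ToyScene()
model = build_complete_model(CUBE, scene.observe, scene.reorient)
# model now covers all six faces of the toy cube
```

In a real system the `set` of surfaces would be replaced by registered 3D surface patches from the RGB-D data, and termination would be judged by surface coverage rather than exact set equality.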