Unsupervised Representation Learning for Autonomous Driving


At a glance

Recent deep learning methods have leveraged large datasets of millions of labeled examples to learn rich, high-performance visual representations. Yet efforts to scale these methods to truly Internet-scale datasets (i.e. hundreds of billions of images) are hampered by the sheer expense of the human annotation required. A natural way to address this difficulty would be to employ unsupervised learning, which aims to use data without any annotation.

Unfortunately, despite several decades of sustained effort, unsupervised methods have not yet been shown to extract useful information from large collections of full-sized, real images. After all, without labels, it is not even clear what should be represented.

In this project, we propose to employ "self-supervision": using the data as its own supervisory signal. The team will also explore the use of temporal and spatial context as a source of free and plentiful supervision for training a rich visual representation. This will be achieved in two ways: predicting the relative arrangement of pairs of patches, and predicting the actual content of patches from their context. The team will build upon and extend preliminary work to not only consider arrangement prediction within a single image, but more broadly predict the spatial and temporal arrangement of patches within entire scenes. Outdoor street data used in many driving applications would be a natural source of imagery for such a training approach. This will make the arrangement prediction harder, leading to a representation that is both better and more specific to the driving task.
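The arrangement-prediction pretext task can be sketched as follows. This is a minimal, hypothetical illustration (function and constant names are our own): sample an anchor patch and a neighboring patch from one of the eight surrounding positions, and use the index of that position as a free classification label. A real system would feed the patch pair to a two-stream network that predicts the label; a small gap between patches discourages trivial boundary-continuity cues.

```python
import random

# The 8 possible positions of the second patch relative to the anchor;
# the self-supervised class label is the index into this list.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
           (0, -1),           (0, 1),
           (1, -1),  (1, 0),  (1, 1)]

def sample_patch_pair(image, patch=2, gap=1, rng=random):
    """Sample an anchor patch and a neighbor; return (anchor, neighbor, label).

    `image` is a 2D list of pixel values; `label` in 0..7 says which of the
    8 neighboring positions the second patch was taken from. No human
    annotation is needed: the label comes from the sampling itself.
    """
    h, w = len(image), len(image[0])
    step = patch + gap  # gap between patches to avoid trivial cues
    label = rng.randrange(8)
    dy, dx = OFFSETS[label]
    # Choose an anchor location so that both patches fit inside the image.
    y = rng.randrange(max(0, -dy) * step, h - patch - max(0, dy) * step + 1)
    x = rng.randrange(max(0, -dx) * step, w - patch - max(0, dx) * step + 1)
    crop = lambda r, c: [row[c:c + patch] for row in image[r:r + patch]]
    return crop(y, x), crop(y + dy * step, x + dx * step), label
```

Extending this idea from a single image to an entire scene, as proposed above, would enlarge the label space to spatial and temporal offsets across frames.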

In a second part of the project, the team will work to predict the content of parts of the scene directly from their surroundings. This task is much more challenging than predicting spatial arrangement, and potentially provides a much stronger supervisory signal. To succeed at it, a model must both understand the content of the image and produce a plausible hypothesis for the missing parts.
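The objective for this second task can be sketched numerically. In this minimal, hypothetical example (names are our own), a rectangular region is masked out, a model predicts its content from the visible context, and the prediction is scored by mean squared error on the hole only. The `mean_fill` "model" here is a trivial placeholder; a real system would be a deep encoder-decoder trained to minimize this loss.

```python
def context_prediction_loss(image, top, left, h, w, predict):
    """Mask a rectangular region, predict it from the visible context,
    and score the prediction with mean squared error on the hole only."""
    # Visible context: the image with the hole zeroed out, so the model
    # never sees the ground-truth pixels it must reconstruct.
    masked = [[0 if top <= r < top + h and left <= c < left + w else v
               for c, v in enumerate(row)] for r, row in enumerate(image)]
    pred = predict(masked, top, left, h, w)  # h x w hypothesis for the hole
    truth = [row[left:left + w] for row in image[top:top + h]]
    err = [(pred[i][j] - truth[i][j]) ** 2 for i in range(h) for j in range(w)]
    return sum(err) / len(err)

def mean_fill(masked, top, left, h, w):
    """Placeholder 'model': fill the hole with the mean of visible pixels."""
    visible = [v for r, row in enumerate(masked) for c, v in enumerate(row)
               if not (top <= r < top + h and left <= c < left + w)]
    mu = sum(visible) / len(visible)
    return [[mu] * w for _ in range(h)]
```

The loss is computed only over the masked region, so the supervisory signal comes entirely from the image itself, with no annotation required.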

Alexei (Alyosha) Efros
Carl Doersch
Abhinav Gupta
Phillip Isola
Philipp Krahenbuhl
Autonomous Vehicles
Deep Learning


BAIR/CPAR/BDD Internal Weekly Seminar

Event Location: 250 Sutardja Dai Hall

The Berkeley Artificial Intelligence Research Lab co-hosts a weekly internal seminar series with the CITRIS People and Robots Initiative and the Berkeley Deep Drive Consortium. The seminars are every Friday afternoon in room 250 Sutardja Dai Hall from 3:10-4:10 PM, and are open to BAIR/BDD faculty, students, and sponsors. Seminars will be webcast live and recorded talks will be available online following the seminar.