Learning from Maps: Visual Common Sense for Autonomous Driving


Abstract

Today's autonomous vehicles rely extensively on high-definition 3D maps to navigate the environment. While this approach works well when these maps are completely up-to-date, safe autonomous vehicles must be able to corroborate the map's information via a real-time sensor-based system. Our goal in this work is to develop a model for road layout inference given imagery from on-board cameras, without any reliance on high-definition maps. However, no sufficient dataset for training such a model exists. Here, we leverage the availability of standard navigation maps and corresponding street-level images to construct an automatically labeled, large-scale dataset for this complex scene understanding problem. By matching road vectors and metadata from navigation maps with Google Street View images, we can assign ground truth road layout attributes (e.g., distance to an intersection, one-way vs. two-way street) to the images. We then train deep convolutional networks to predict these road layout attributes given a single monocular RGB image. Experimental evaluation demonstrates that our model learns to correctly infer the road layout using only panoramas captured by car-mounted cameras as input. Additionally, our results indicate that this method may be suitable for the novel application of recommending safety improvements to infrastructure (e.g., suggesting an alternative speed limit for a street).

Paper

Ari Seff and Jianxiong Xiao
Learning from Maps: Visual Common Sense for Autonomous Driving
arXiv:1611.08583

Press

MIT Tech Review
Singularity Hub

Code & Data

All code for downloading Google Street View images, establishing correspondence with OpenStreetMap, and training models for road attribute estimation can be found here.

Initial San Francisco GSV dataset used in paper: sf_gsv.zip (22 GB)
OSM extract and computed road attributes: osm_corr.zip (42 MB)

Full San Francisco GSV dataset: sf_gsv.zip (150 GB)
OSM extract and computed road attributes: osm_corr.zip (142 MB)

Marvin train/test tensors

Trained models: models.zip (900 MB)
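
As a rough illustration of the correspondence step described above (pairing a street-level panorama with map data to derive a label such as intersection distance), the following is a minimal sketch and not the released code. It assumes panoramas and intersection nodes are available as plain (latitude, longitude) pairs, and it uses straight-line great-circle distance as a stand-in; the released pipeline may measure distance differently, for example along the road. The names `haversine_m` and `intersection_distance_label` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def intersection_distance_label(pano, intersections):
    """Distance in meters from a panorama to its nearest intersection node.

    pano: (lat, lon) of the panorama (assumed input format).
    intersections: sequence of (lat, lon) intersection nodes (assumed format).
    """
    if not intersections:
        raise ValueError("at least one intersection node is required")
    return min(haversine_m(pano[0], pano[1], lat, lon)
               for lat, lon in intersections)
```

A label computed this way could then be attached to each image as a regression target. In practice the candidate nodes would first be restricted to the road the panorama lies on, which this sketch does not attempt.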

Example Results

Intersection detection

Driveable headings

Heading angle

Intersection distance

results visualizer

Bike lanes

results visualizer

Speed limit

results visualizer

One-way vs. two-way

results visualizer

Wrong way

results visualizer