Microsoft has announced the availability of approximately 125 million building footprint polygon geometries covering all 50 US states in the open GeoJSON format. Using a two-step process centered on artificial intelligence (AI), deep learning, and computer vision, the Microsoft Maps team extracted 124,885,597 footprints in the United States. By comparison, OpenStreetMap currently holds 30,567,953 building footprints in the US (at last count), drawn from both editor contributions and various city- or county-wide imports. Bing is making this data available for download free of charge. MapShaper is a good tool for importing the data.
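Because the footprints ship as GeoJSON, they can also be inspected with nothing more than a JSON parser. The snippet below is a minimal sketch, not part of the release: the `SAMPLE` feature and the `count_polygons` helper are hypothetical, and it assumes each file is a standard GeoJSON FeatureCollection of Polygon features.

```python
import json

# Hypothetical one-feature sample, made up for illustration only.
SAMPLE = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {},
   "geometry": {"type": "Polygon", "coordinates": [[
     [-77.0365, 38.8977], [-77.0365, 38.8979],
     [-77.0362, 38.8979], [-77.0362, 38.8977],
     [-77.0365, 38.8977]]]}}
]}
"""


def count_polygons(geojson_text):
    """Count the Polygon features in a GeoJSON FeatureCollection string."""
    collection = json.loads(geojson_text)
    return sum(
        1
        for feature in collection.get("features", [])
        # A feature may carry a null geometry, so fall back to an empty dict.
        if (feature.get("geometry") or {}).get("type") == "Polygon"
    )


print(count_polygons(SAMPLE))  # prints 1
```

For the full state files, which are large, a streaming reader or a dedicated tool such as MapShaper is likely a better fit than loading everything into memory at once.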
The Maps team relied on the open source CNTK Unified Toolkit, which was developed by Microsoft. Using CNTK, Microsoft applied a deep neural network, ResNet34 with RefineNet up-sampling layers, to detect building footprints in Bing imagery.
The building extraction was done in two stages:
- Semantic Segmentation – Recognizing building pixels on the aerial image using DNNs
- Polygonization – Converting building pixel blobs into polygons
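The hand-off between the two stages can be pictured as grouping building pixels into connected blobs, each of which is then turned into a polygon. The sketch below is an illustration only: it assumes the segmentation output has already been thresholded into a binary mask and uses 4-connectivity, neither of which is stated in the announcement, and `label_blobs` is a hypothetical helper name.

```python
from collections import deque


def label_blobs(mask):
    """Group 4-connected building pixels (truthy cells) into blobs.

    Returns a list of blobs, each a list of (row, col) coordinates.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited building pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
blobs = label_blobs(mask)
print(len(blobs), [len(b) for b in blobs])  # prints: 2 [4, 2]
```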
1. Semantic Segmentation
The training set consisted of 5 million labeled images. A majority of the satellite images covered diverse residential areas in the US. To keep the set representative, the dataset was enriched with samples from various other areas, including mountains, glaciers, forests, deserts, beaches, and coasts. Images in the set are 256×256 pixels in size with 1 ft/pixel resolution. The training was done with the CNTK toolkit using 32 GPUs.
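To get a feel for those numbers, the ground coverage of a single training tile follows directly from the tile size and resolution. The arithmetic below is a rough sketch; the total is an upper bound that assumes the 5 million tiles do not overlap, which the announcement does not state.

```python
FEET_TO_METERS = 0.3048  # exact by definition of the international foot


def tile_side_m(pixels=256, feet_per_pixel=1.0):
    """Ground length in meters covered by one side of a square tile."""
    return pixels * feet_per_pixel * FEET_TO_METERS


side_m = tile_side_m()
area_m2 = side_m ** 2
# Upper bound only: assumes no two training tiles overlap.
total_km2 = 5_000_000 * area_m2 / 1e6

print(f"{side_m:.1f} m per side, about {area_m2:.0f} m^2 per tile")
print(f"at most about {total_km2:,.0f} km^2 across the training set")
```

Each tile therefore spans roughly 78 m on a side, enough to hold a handful of typical residential buildings.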
2. Polygonization
Microsoft developed a method that approximates the predicted pixels with polygons, making decisions based on the whole prediction feature space. This is very different from standard approaches, e.g. the Douglas-Peucker algorithm, which are greedy in nature. The method tries to impose some a priori building properties, which are, at the moment, manually defined and automatically tuned. Some of these a priori properties are:
- A building edge must have at least some minimum length, both relative and absolute, e.g. 3 meters
- Consecutive edge angles are likely to be 90 degrees
- Consecutive edge angles cannot be very sharp, i.e. smaller than some auto-tuned threshold, e.g. 30 degrees
- A building likely has very few dominant angles, meaning every building edge forms an angle of (dominant angle ± nπ/2)
Microsoft plans to deduce this information automatically in the near future using existing building information.
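The properties listed above can be expressed as simple geometric checks on a candidate polygon. The following is a hedged sketch, not Microsoft's method: it only tests a finished polygon against the priors rather than fitting one. It assumes coordinates in meters on a flat plane (real GeoJSON uses longitude/latitude degrees), takes the 3 meter and 30 degree thresholds from the examples in the text, and uses a made-up 5 degree tolerance for the dominant-angle test. All function names are hypothetical.

```python
import math


def edge_vectors(ring):
    """Return (dx, dy) for each edge of a closed ring (first point == last)."""
    return [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(ring, ring[1:])]


def corner_angles_deg(ring):
    """Unsigned angle in degrees between the two edges meeting at each vertex."""
    edges = edge_vectors(ring)
    angles = []
    for i, (ax, ay) in enumerate(edges):
        bx, by = edges[(i + 1) % len(edges)]
        # Compare the reversed incoming edge with the outgoing edge.
        dot = -ax * bx - ay * by
        cross = -ax * by + ay * bx
        angles.append(abs(math.degrees(math.atan2(cross, dot))))
    return angles


def check_priors(ring, min_edge_m=3.0, min_angle_deg=30.0, tol_deg=5.0):
    """Test a polygon against the three hard priors; thresholds are assumptions."""
    edges = edge_vectors(ring)
    lengths = [math.hypot(dx, dy) for dx, dy in edges]
    directions = [math.degrees(math.atan2(dy, dx)) for dx, dy in edges]
    ref = directions[0]  # treat the first edge as the dominant direction
    # Deviation from ref modulo 90 degrees, i.e. from (dominant angle ± nπ/2).
    deviations = [min((d - ref) % 90.0, 90.0 - (d - ref) % 90.0)
                  for d in directions]
    return {
        "edges_long_enough": all(length >= min_edge_m for length in lengths),
        "no_sharp_corners": all(a >= min_angle_deg
                                for a in corner_angles_deg(ring)),
        "single_dominant_angle": all(dev <= tol_deg for dev in deviations),
    }


rectangle = [(0, 0), (10, 0), (10, 6), (0, 6), (0, 0)]
sliver = [(0, 0), (10, 0), (10, 1), (0, 0)]
print(check_priors(rectangle))
print(check_priors(sliver))
```

A 10 m by 6 m rectangle passes all three checks, while the thin triangle fails on edge length and on corner sharpness. A method that scores candidates over the whole prediction, as the text describes, would treat such priors as terms in an objective rather than as pass/fail filters.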
The vintage of the footprints depends on the vintage of the underlying imagery. Because Bing imagery is a composite of multiple sources, it is difficult to know the exact dates for individual pieces of data.
How good is the data?
Metrics show that in the vast majority of cases the quality is at least as good as that of hand-digitized buildings in OpenStreetMap. It is not perfect, particularly in dense urban areas, but it is still awesome.