This blog post covers GeoPandas, a Python package that makes working with geospatial data easier.
What is GeoPandas?
GeoPandas is an open source project for working with geospatial data in Python. It is designed to work with existing tools, such as desktop GIS, geospatial databases, web maps and Python data tools. Kelsey Jordahl, the scientific software developer who started the GeoPandas project, set out to combine geospatial data handling with pandas, the Python library widely used for data analysis and manipulation. Pandas itself brings together functionality found in several R language packages. Because pandas cannot handle spatial data on its own, Jordahl created subclasses of the pandas data objects and combined them with functionality from existing geographic Python packages. The result is GeoPandas.
Installation and Dependencies
GeoPandas can be installed with pip or Anaconda, or directly from GitHub. The most common approach is to run one of the following commands in a terminal window:
pip install geopandas
conda install -c conda-forge geopandas
GeoPandas depends on the following Python libraries, which are installed or updated when you first install GeoPandas:
Pandas: a Python package for data manipulation and analysis and part of the SciPy stack.
Shapely: a Python package for manipulation and analysis of planar geometric objects.
Fiona: the OGR API for Python programmers that can read and write real-world data using multi-layered GIS formats and zipped virtual file systems.
Pyproj: a Python package that performs cartographic transformations and geodetic computations.
NumPy: a Python package that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
Six: a Python 2 and 3 compatibility library, intended to support codebases that work on both Python 2 and 3 without modification.
Plotting functionality is provided through dependencies on Matplotlib, Descartes and PySAL, while three more packages provide additional functionality: geopy for geocoding, psycopg2 for PostGIS connections and rtree for spatial indexing, which speeds up spatial operations. These optional packages have to be installed separately by the user if they are not already present.
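As a quick way to see which of these optional packages are already present, the following minimal sketch checks for them with the standard library only. The dictionary of package names and purposes is an illustrative assumption based on the list above (Descartes and PySAL could be added in the same way); it is not something GeoPandas itself provides.

```python
from importlib.util import find_spec

# Optional packages mentioned above, mapped to what they are used for.
# This mapping is an illustrative assumption, not part of GeoPandas.
optional = {
    "matplotlib": "plotting",
    "geopy": "geocoding",
    "psycopg2": "PostGIS connections",
    "rtree": "spatial indexing",
}

# find_spec returns None when a top-level package cannot be imported.
missing = [name for name in optional if find_spec(name) is None]

for name in missing:
    print(f"{name} is not installed (needed for {optional[name]})")
```

Any package reported as missing can then be installed with pip or conda in the same way as GeoPandas itself.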
GeoPandas adds subclasses to existing pandas data objects: the counterpart of a 1D pandas Series is called a GeoSeries, while the counterpart of a 2D DataFrame is a GeoDataFrame, which contains a geometry column holding the geometry of each row (or feature). GeoPandas extends these pandas data types to allow spatial operations on geometric types, so that operations that would otherwise require a spatial database such as PostGIS can be done in Python.
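To make the two data structures concrete, here is a minimal sketch that builds a small GeoDataFrame by hand and runs one spatial test on it. The city names, coordinates and bounding box are made-up sample data, not part of any real dataset.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical sample data: three cities as longitude/latitude points.
cities = gpd.GeoDataFrame(
    {"name": ["Amsterdam", "Berlin", "Paris"]},
    geometry=[Point(4.90, 52.37), Point(13.40, 52.52), Point(2.35, 48.86)],
    crs="EPSG:4326",
)

# The table is a GeoDataFrame; its geometry column is a GeoSeries.
print(type(cities).__name__)           # GeoDataFrame
print(type(cities.geometry).__name__)  # GeoSeries

# A spatial operation applied row by row: which points fall inside a box?
bbox = box(0, 45, 10, 55)  # illustrative bounding box in degrees
inside = cities[cities.within(bbox)]
print(inside["name"].tolist())  # ['Amsterdam', 'Paris']
```

The non-geometry columns behave like ordinary pandas columns, so the boolean mask returned by within can be used for filtering just like any other pandas mask.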
GeoPandas inherits the standard pandas methods for indexing and selecting data, such as label-based indexing with .loc and integer-position-based indexing with .iloc. GeoPandas can also merge and join data as with normal pandas Series or DataFrame objects, and it can perform spatial joins based on the spatial relationship between the geometries of two GeoSeries or GeoDataFrames. GeoPandas can read almost any vector-based spatial data format, including Esri shapefiles and GeoJSON files. GeoDataFrames can be exported to many different standard formats, again including Esri shapefiles and GeoJSON.
Whereas pandas itself offers no mapping tools, GeoPandas provides a high-level interface to the matplotlib library for making maps. This is done by calling the plot method on GeoPandas data objects, an approach similar to working with maps in R. Jupyter Notebooks are recommended when using the plot method, because maps are then displayed inline; in practice this means using Python 3. Supported map types are layered and choropleth maps. GeoPandas can handle different map projections and re-project data, and it offers a set of geometric manipulations, as well as overlay operations on multiple spatial datasets, attribute joins, spatial joins, merges and geocoding functionality.
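As a rough illustration of re-projection, overlay and plotting, the following sketch uses two made-up square zones and a hypothetical study area, all defined in metres in the Web Mercator projection (EPSG:3857). The column names and population values are invented for the example. The Agg backend is selected only so that the script also runs outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; not needed inside a notebook

import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Hypothetical data: two adjacent 2 km x 2 km zones and a study area.
zones = gpd.GeoDataFrame(
    {"zone": ["a", "b"], "population": [100, 250]},
    geometry=[box(0, 0, 2000, 2000), box(2000, 0, 4000, 2000)],
    crs="EPSG:3857",
)
study_area = gpd.GeoDataFrame(
    {"label": ["study"]},
    geometry=[box(1000, 0, 3000, 2000)],
    crs="EPSG:3857",
)

# Re-project from Web Mercator metres to longitude/latitude degrees.
zones_wgs84 = zones.to_crs(epsg=4326)

# Overlay: keep only the parts of each zone inside the study area.
clipped = gpd.overlay(zones, study_area, how="intersection")
print(clipped.area.tolist())  # area of each clipped zone in square metres

# Layered choropleth: zones coloured by population, study outline on top.
fig, ax = plt.subplots()
zones.plot(ax=ax, column="population", legend=True)
study_area.boundary.plot(ax=ax, color="black")
```

In a Jupyter Notebook the figure appears inline after the last cell line; in a script it can be written to disk with fig.savefig.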