The new 1.5 release of the ArcGIS API for Python finally allows users to read spatial data as pandas DataFrame objects. As a result, manipulating geometric and attribute information of spatial data has become a lot easier.
Although previous versions of Esri’s ArcGIS API for Python used pandas DataFrames to display attribute information from Feature Layers or Imagery Layers, you could not use them to manipulate geometric and attribute data. The latest release of the API includes a new data object that lets you read spatial data directly as pandas DataFrames, which makes spatial data manipulation with Python’s pandas library much easier.
For those unfamiliar with it, pandas is a Python library for data manipulation that is part of the SciPy stack. It is well suited to operations on tabular data, such as subsetting data through slicing. Think of selecting certain columns and sorting the rows by their values (for example, from high to low). The problem with pandas was that it couldn’t be used for spatial data. A possible workaround was the open source GeoPandas library, which allows you to perform such operations on a variety of spatial data types. GeoPandas can also plot spatial data, for which it relies on the matplotlib library.
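To make the column selection and sorting described above concrete, here is a minimal sketch in plain pandas. The table, its column names and its values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical attribute table; names and numbers are invented for this example.
cities = pd.DataFrame({
    "name": ["Alpha", "Beta", "Gamma", "Delta"],
    "population": [120_000, 45_000, 310_000, 87_000],
    "area_km2": [60.5, 22.1, 140.0, 35.7],
})

# Keep only two columns, then order the rows from high to low population.
subset = cities[["name", "population"]]
ranked = subset.sort_values("population", ascending=False).reset_index(drop=True)

# Slice the result: the two most populous cities.
top_two = ranked.iloc[:2]
print(top_two)
```

The same selecting, sorting and slicing calls are what you would expect to use on the attribute columns of a spatial DataFrame.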
However, the downside of GeoPandas was its performance, especially when loading large or multiple files. The good news is that the latest version of Esri’s ArcGIS API for Python introduces a new data object that allows you to read various spatial data types directly as pandas DataFrame objects. This lets you reference and operate directly on the features stored in the attribute tables of these files. For those familiar with GeoPandas, these new data objects (called Spatially Enabled DataFrames, or SEDFs for short) will look very similar, not least because they also add a geometry column to your DataFrame. What’s new is that, unlike with GeoPandas, there are no performance issues.
According to the API documentation, SEDFs can be created from shapefiles, pandas DataFrames, feature classes, GeoJSON, and Feature Layers. This is because SEDFs also integrate with the pyshp, shapely and fiona packages, all of which are open source Python packages that manage, read from, or write to shapefiles. SEDFs can be created with the ArcGIS API for Python by using either the .sdf or the .spatial notation (for example: pd.DataFrame.spatial.from_layer()). The cool thing is that SEDFs are not exclusively available with the API: if you have ArcGIS Pro installed as well, you can use them on all the file types mentioned here through the arcpy site package. A local arcpy install allows you to export to these data types too, so you’re not limited to shapefile output.
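As a rough sketch of the .spatial notation mentioned above, the two helpers below wrap the from_layer() and from_featureclass() constructors. The helper names sedf_from_layer and sedf_from_file are hypothetical, and the code assumes the arcgis package is installed and that the layer is publicly accessible.

```python
import pandas as pd


def sedf_from_layer(layer_url):
    """Return a Spatially Enabled DataFrame for the feature layer at layer_url."""
    # Imported lazily so this sketch can be loaded without the arcgis package;
    # importing GeoAccessor is what registers the .spatial namespace on pandas.
    from arcgis.features import FeatureLayer, GeoAccessor  # noqa: F401

    layer = FeatureLayer(layer_url)
    return pd.DataFrame.spatial.from_layer(layer)


def sedf_from_file(path):
    """Return a Spatially Enabled DataFrame for a shapefile or feature class."""
    from arcgis.features import GeoAccessor  # noqa: F401

    return pd.DataFrame.spatial.from_featureclass(path)
```

You would call sedf_from_layer() with the URL of your own feature layer, or sedf_from_file() with a path to a local shapefile. Both should return a DataFrame whose geometry is stored in an extra column next to the attributes.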
After storing your spatial data in a SEDF, you can display the attribute data as a spreadsheet-like table of rows and columns. You can run attribute queries using familiar pandas methods, while spatial indexing enables you to perform geometric operations from the shapely library (buffer, intersect, union and more), as well as spatial joins and merges. Visualizing data works a little differently than in GeoPandas: whereas that library relies completely on matplotlib, the ArcGIS API for Python offers a visually more attractive approach using its own basemaps, and adds customization of symbology and color maps through renderers that use Arcade expressions. The plotting calls themselves, however, still follow matplotlib and pandas conventions.