Tutorial – Using Pandas DataFrames with the ArcGIS API for Python

by | Jan 29, 2018

This short tutorial covers how to use the ArcGIS API for Python and pandas DataFrame objects to display tabular data inside your Jupyter Notebook application. After you search for and reference spatial data, the pandas library enables you to subset, describe and plot attribute data.

pandas is a Python package for data manipulation and analysis. It works particularly well with Jupyter Notebooks, where you can also use bash commands, magic commands and plotting capabilities, and take advantage of a clean overall presentation of code, visuals and comments. The ArcGIS API for Python uses the pandas library to display and edit attribute information. Specifically, it uses pandas DataFrame objects that present data in tabular form, comparable to Excel spreadsheets.

To follow the instructions, open a new Jupyter Notebook. Make sure you have the latest available version of the API installed, which is version 1.3 at the time of writing.

  1. Import the module and GIS class

First, we'll connect to ArcGIS Online from the Jupyter Notebook app. Calling GIS() without arguments creates an anonymous connection:

In:    import arcgis

from arcgis.gis import GIS

gis = GIS()

  2. Search for feature layer content

A feature layer item is a collection of layers that contain geographic features stored as vectors. We'll search ArcGIS Online for a feature layer called “Bruce Trail”:

In:    search_result = gis.content.search(query="bruce trail", item_type="Feature Layer", max_items=5)

search_result

The item we're interested in is returned as the first search result:

Out:   [<Item title:"Bruce Trail" type:Feature Layer Collection owner:DufferinGIS>, …

  3. Reference the item and create the DataFrame object

We can reference this item as follows to see how many layers it contains. Python returns a list with only one element, so there's only one layer (the output is not displayed here to save space):

In:    bruce_trail_item = search_result[0]

bruce_trail_item.layers

We'll now create a variable that holds the DataFrame object from the layer we're interested in. Using the head method in the second line, we'll print only the first five rows.

In:    btl_df = bruce_trail_item.layers[0].query().df

btl_df.head()

It is not necessary to import the pandas library, as it is one of the dependencies of the arcgis package, imported in the first line of code. The following pandas DataFrame will be shown inside your Jupyter Notebook:

  4. Describe the item's attribute data

There are many ways to describe the data inside pandas DataFrame objects. For example, the shape attribute returns the number of rows and columns of the entire DataFrame as a tuple:

In:    btl_df.shape

Out:   (367, 5)

We can print the different column names as follows:

In:    btl_df.columns

Out:   Index(['Name', 'OBJECTID', 'PopupInfo', 'Shape__Length', 'SHAPE'], dtype='object')
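To experiment with these attributes without an ArcGIS Online connection, the sketch below builds a small stand-in DataFrame by hand. The section names and lengths are made up for illustration and only mimic some of the columns of the Bruce Trail layer; sample_df is a hypothetical name.

```python
import pandas as pd

# Hypothetical stand-in data; not taken from the real Bruce Trail layer.
sample_df = pd.DataFrame({
    "Name": ["Section A", "Section B", "Section C"],
    "OBJECTID": [1, 2, 3],
    "Shape__Length": [1250.5, 310.0, 4875.25],
})

n_rows, n_cols = sample_df.shape         # shape is an attribute, not a method
column_names = list(sample_df.columns)   # column labels as a plain list
first_two = sample_df.head(2)            # head(n) returns the first n rows

print(n_rows, n_cols)
print(column_names)
print(first_two)
```

Note that shape and columns are read without parentheses, while head is a method and needs them.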

The .loc property can be used to select entire rows by index label, which by default is the row number starting from zero. Here, we print the column names and values of the first item:

In:    btl_df.loc[0]

You can also access a single cell value. For example, the PopupInfo value of the first item can be accessed as follows:

In:    btl_df.loc[0]['PopupInfo']

The output looks like XML data inside an HTML file and is hard for humans to read. We can use the HTML class from IPython.display to render the same output in a more readable way:

In:    from IPython.display import HTML

HTML(btl_df.loc[0]['PopupInfo'])
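The row and cell lookups in this step can be tried on a small stand-in DataFrame as well. This is a minimal sketch with invented PopupInfo strings; it also shows that .loc accepts the row label and column name in a single call, which gives the same value as the chained form used above.

```python
import pandas as pd

# Hypothetical stand-in data; the PopupInfo strings are invented.
sample_df = pd.DataFrame({
    "Name": ["Section A", "Section B"],
    "PopupInfo": ["<b>Section A</b>", "<b>Section B</b>"],
})

first_row = sample_df.loc[0]                   # a Series holding one row
cell_chained = sample_df.loc[0]["PopupInfo"]   # select the row, then the column
cell_direct = sample_df.loc[0, "PopupInfo"]    # row and column in one call

print(first_row)
print(cell_direct)
```

The single-call form is generally the safer habit, since chained indexing can behave unexpectedly when you assign values rather than read them.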

  5. Create a histogram

We can also create a histogram using the Shape__Length field as input. The first line of the code below is a magic command that enables plotting inside the Jupyter Notebook application:

In:    %matplotlib inline

import matplotlib.pyplot as plt

btl_df['Shape__Length'].hist()

The following histogram is displayed next:

[Figure: histogram of the Shape__Length values]
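To see the numbers behind such a plot, the bin counts can be computed directly. The sketch below assumes the default of 10 equal-width bins that the pandas hist method uses, and works on made-up length values rather than the real layer data.

```python
import numpy as np
import pandas as pd

# Hypothetical length values; not taken from the real Bruce Trail layer.
lengths = pd.Series([120.0, 450.5, 300.0, 980.0, 75.25, 610.0, 330.0, 150.0])

# Split the value range into 10 equal-width bins and count values per bin.
counts, edges = np.histogram(lengths, bins=10)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:8.2f} to {right:8.2f}: {count}")
```

Each printed row corresponds to one bar of the histogram: the two edges give the bar's position on the x-axis and the count gives its height.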


Eric van Rees
Eric van Rees is a freelance writer and editor. His specialty is GIS technology. He has more than eight years of proven expertise in editing, writing and interviewing as editor and editor-in-chief for the international geospatial publication GeoInformatics, as well as GIS Magazine and CAD Magazine, both published in Dutch. Currently, he writes about geospatial technology for various clients, publications and blogs.
