This short tutorial covers how to use the ArcGIS API for Python and pandas DataFrame objects to display tabular data inside your Jupyter Notebook application. After you have searched for and referenced spatial data, the pandas library enables you to subset, describe and plot attribute data.
pandas is a Python package for data manipulation and analysis. It works particularly well with Jupyter Notebooks, where you can also use bash commands, magic commands and plotting capabilities, and take advantage of a clean overall presentation of code, visuals and comments. The ArcGIS API for Python uses the pandas library to display and edit attribute information. Specifically, it uses pandas DataFrame objects that present data in tabular form, comparable to Excel spreadsheets.
To follow the instructions, you can open a new Jupyter Notebook. Make sure you have the latest available version of the API installed, which at the time of writing is version 1.3.
- Import the module and GIS class
First, we'll connect to ArcGIS Online from the Jupyter Notebook app. Because no credentials are passed to GIS(), the connection is anonymous:
In: import arcgis
from arcgis.gis import GIS
gis = GIS()
- Search for feature layer content
Feature layer items are collections of layers that contain geographic features stored as vector data. We'll search ArcGIS Online for a feature layer called “Bruce Trail”:
In: search_result = gis.content.search(query="bruce trail", item_type="Feature Layer", max_items=5)
search_result
The item we're interested in is returned as the first search result:
Out: [<Item title:"Bruce Trail" type:Feature Layer Collection owner:DufferinGIS>, …
- Reference the item and create the DataFrame object
We can reference this item as follows in order to see how many layers it contains. The returned list holds only one entry, so there is only one layer (the output is not displayed here to save space):
In: bruce_trail_item = search_result[0]
bruce_trail_item.layers
We'll now create a variable that holds the DataFrame object from the layer we're interested in. Using the head method in the second line, we'll display only the first five rows.
In: btl_df = bruce_trail_item.layers[0].query().df
btl_df.head()
It is not necessary to import the pandas library, as it is one of the dependencies of the arcgis package, imported in the first line of code. The first five rows of the DataFrame are then shown inside your Jupyter Notebook.
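The behavior of head can be tried without an ArcGIS connection. The sketch below is only an illustration: sample_df is a hypothetical stand-in table with made-up values, not the Bruce Trail layer.

```python
import pandas as pd

# Hypothetical stand-in table with seven rows (made-up values, not the real layer).
sample_df = pd.DataFrame(
    {
        "OBJECTID": list(range(1, 8)),
        "Name": [f"Section {i}" for i in range(1, 8)],
    }
)

first_five = sample_df.head()    # head() returns the first 5 rows by default
first_two = sample_df.head(2)    # pass a number to ask for fewer (or more) rows

print(first_five)
```

In a notebook, leaving sample_df.head() as the last line of a cell renders the same rows as a formatted table instead of plain text.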
- Describe the item's attribute data
pandas offers many ways to describe the data inside DataFrame objects. For example, the shape attribute returns the number of rows and columns of the entire DataFrame as a tuple:
In: btl_df.shape
Out: (367, 5)
We can print the different column names as follows:
In: btl_df.columns
Out: Index(['Name', 'OBJECTID', 'PopupInfo', 'Shape__Length', 'SHAPE'], dtype='object')
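As a minimal sketch of the same two calls, the example below builds a small hypothetical table (sample_df, with made-up values and only three of the columns listed above) and reads its shape and column names.

```python
import pandas as pd

# Hypothetical stand-in for the attribute table (made-up values).
sample_df = pd.DataFrame(
    {
        "Name": ["Section A", "Section B", "Section C"],
        "OBJECTID": [1, 2, 3],
        "Shape__Length": [1200.5, 830.0, 2410.25],
    }
)

# shape is an attribute, not a method, so it takes no parentheses.
n_rows, n_cols = sample_df.shape

# columns is an Index object; list() turns it into a plain Python list.
column_names = list(sample_df.columns)

print(n_rows, n_cols)
print(column_names)
```

Unpacking the tuple into n_rows and n_cols is convenient when only one of the two numbers is needed later on.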
The .loc property can be used to select entire rows by index label, which by default is the row number, starting from zero. Here, we print the column names and values of the first item:
In: btl_df.loc[0]
You can also access a single cell value. For example, the PopupInfo value of the first item can be accessed as follows:
In: btl_df.loc[0]['PopupInfo']
The result is a string of raw HTML markup, which is not very readable for humans. We can use the HTML class from IPython.display to render the same output in a more readable way:
In: from IPython.display import HTML
HTML(btl_df.loc[0]['PopupInfo'])
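The row and cell lookups with .loc can also be tried on a small table. This is a sketch under stated assumptions: sample_df and its PopupInfo strings are hypothetical, made-up values.

```python
import pandas as pd

# Hypothetical stand-in table (made-up values, not the real layer).
sample_df = pd.DataFrame(
    {
        "Name": ["Section A", "Section B"],
        "PopupInfo": ["<b>Section A</b>", "<b>Section B</b>"],
    }
)

# Selecting a row by label returns a Series indexed by the column names.
first_row = sample_df.loc[0]

# A single cell can be read in one step with a (row label, column name) pair...
first_popup = sample_df.loc[0, "PopupInfo"]

# ...or in two steps, as in the tutorial: take the row, then the column.
same_popup = sample_df.loc[0]["PopupInfo"]

print(first_row)
print(first_popup)
```

Both forms return the same value when reading. The one-step form avoids creating the intermediate row Series.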
- Create a histogram
We can also create a histogram using the shape length field as input. The first line of the code below is a magic command that enables plotting inside the Jupyter Notebook application:
In: %matplotlib inline
import matplotlib.pyplot as plt
btl_df['Shape__Length'].hist()
The histogram is then displayed below the cell.
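To look at the numbers behind such a plot, the bin counts can be computed directly. The sketch below uses numpy.histogram on a hypothetical series of made-up lengths, with 10 bins to match the default of the pandas hist method.

```python
import numpy as np
import pandas as pd

# Hypothetical shape lengths (made-up values, not the real layer).
lengths = pd.Series(
    [100.0, 150.0, 220.0, 400.0, 410.0, 950.0, 1000.0],
    name="Shape__Length",
)

# Split the value range into 10 equal-width bins and count the values per bin.
counts, bin_edges = np.histogram(lengths, bins=10)

# Print each bin as "lower - upper: count".
for lower, upper, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{lower:7.1f} - {upper:7.1f}: {count}")
```

Each printed line corresponds to one bar of the histogram, so empty bins show up as zero counts rather than as gaps in a plot.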