This is a short beginner's tutorial for ArcGIS users who want to learn basic NumPy functionality. It describes how to create a NumPy array from a shapefile's attribute table and run simple queries on that data. The goal is to get familiar with NumPy arrays and the related Python syntax.
This tutorial is built around an online dataset that can be downloaded here. Although the dataset includes multiple files, the only file used in this tutorial is an Esri shapefile called LondonBoroughs.shp. It's recommended to download a copy of this dataset to a separate folder on your hard drive and inspect the shapefile in ArcMap/ArcCatalog, especially the contents of its attribute table. As you can see, the geometry consists of a number of polygons that together form the boroughs of London. The attribute table consists of the borough names and several columns with numerical data. Next, we'll introduce some Python syntax to work with this data in NumPy.
- Importing the modules
First, open an IDE, create a new script and name it something like "NumPy tutorial.py". Start your script by importing the required Python modules:
import arcpy
import numpy
- Set the environment and create a NumPy array
Now we are going to create a NumPy array from the shapefile's attribute table. The array preserves the values of the selected columns in a form you can work with in Python. To do this, we refer to the local file on disk and specify the fields of interest: the name column and a few columns with numerical values. Next, a function of ArcPy's Data Access module (arcpy.da) transforms the attribute table data into a NumPy array. We'll look at this array next.
# Path to the shapefile; adjust it to your own download folder.
shapefile = "C:/data/Creating-maps-in-R-master/data/londonBoroughs.shp"
arr = arcpy.da.TableToNumPyArray(shapefile, ('name', 'Pop_2001', 'PopDensity', 'AREA', 'PERIMETER'))
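If you want to follow the rest of the tutorial on a machine without ArcPy, a small stand-in array can be built by hand. The sketch below is only an approximation of what TableToNumPyArray returns: the borough names, all numbers and the field types are invented placeholders, and the real types depend on the shapefile's schema.

```python
import numpy

# Stand-in for the result of arcpy.da.TableToNumPyArray.
# Names, numbers and field types are invented placeholders.
arr = numpy.array(
    [
        ("Borough A", 150000, 3000.0, 50000000.0, 40000.0),
        ("Borough B", 200000, 9000.0, 22000000.0, 25000.0),
        ("Borough C", 250000, 8500.0, 30000000.0, 30000.0),
    ],
    dtype=[
        ("name", "U20"),        # text field, up to 20 characters
        ("Pop_2001", "i4"),     # 32-bit integer
        ("PopDensity", "f8"),   # 64-bit float
        ("AREA", "f8"),
        ("PERIMETER", "f8"),
    ],
)

print(arr.shape)        # one entry per table row
print(arr.dtype.names)  # the field names, in order
print(arr[0])           # the first row as a single record
```

The array is one-dimensional: each element is a complete row, and the columns are reached through the field names.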
The NumPy package primarily focuses on working with n-dimensional array objects. Python's built-in lists are one-dimensional sequences; a table can be imitated with nested lists, but these offer no column-wise operations. NumPy lets you store multidimensional and tabular data in a proper structure and manipulate that structure as well. The array we just created is a structured array: it is one-dimensional, with one record per table row and one named field per column. For example, it can list all values of a column when you specify the column's name. The result is a one-dimensional NumPy array containing all values of the Pop_2001 column.
print(arr["Pop_2001"])
(output not listed)
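To make the column access a bit more concrete, here is a minimal sketch on a hand-made array (the rows are invented placeholders, not the real borough figures). It shows that a single field comes back as a NumPy array, that it can be turned into a plain Python list when needed, and that several fields can be selected at once by passing a list of names.

```python
import numpy

# Placeholder rows; names and numbers are invented for illustration.
arr = numpy.array(
    [("Borough A", 150000, 3000.0),
     ("Borough B", 200000, 9000.0),
     ("Borough C", 250000, 8500.0)],
    dtype=[("name", "U20"), ("Pop_2001", "i4"), ("PopDensity", "f8")],
)

# A single field is returned as a one-dimensional NumPy array.
pop = arr["Pop_2001"]
print(type(pop).__name__)  # ndarray

# Convert to a plain Python list only when you really need one.
print(pop.tolist())        # [150000, 200000, 250000]

# A list of field names selects several columns at once.
subset = arr[["name", "Pop_2001"]]
print(subset.dtype.names)
```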
- Doing basic calculations and queries with NumPy array data
You can also calculate the sum and mean values for a specific column, and more:
# Total population for 2001, total area size, mean area size,
# mean population density and total population of the Westminster area:
a = arr["Pop_2001"].sum()
b = arr["AREA"].sum()
mas = arr["AREA"].mean()
mpopden = arr['PopDensity'].mean()
c = arr[arr['name'] == "Westminster"]['Pop_2001'].sum()
print('The total population of 2001 is {}.\n'.format(a))
print('The total area size is {}.\n'.format(b))
print('The mean area size is {}.\n'.format(mas))
print('The mean (rounded) population density is {}.\n'.format(int(mpopden)))
print('The total population of the Westminster area is {}.\n'.format(c))
>>The total population of 2001 is 7172057.
>>The total area size is 1595319551.24.
>>The mean area size is 48343016.7042.
>>The mean (rounded) population density is 5866.
>>The total population of the Westminster area is 181284.
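The Westminster query above packs three steps into one line. The sketch below unpacks them on a small hand-made array; the rows are invented placeholders, so the numbers are not the real borough figures.

```python
import numpy

# Placeholder rows; names and numbers are invented for illustration.
arr = numpy.array(
    [("Borough A", 150000), ("Borough B", 200000), ("Borough C", 250000)],
    dtype=[("name", "U20"), ("Pop_2001", "i4")],
)

# Step 1: comparing a field with a value gives one True/False per row.
mask = arr["name"] == "Borough B"
print(mask.tolist())  # [False, True, False]

# Step 2: indexing with that mask keeps only the rows marked True.
selected = arr[mask]

# Step 3: aggregate a field of the selected rows.
total = selected["Pop_2001"].sum()
print(total)  # 200000
```

The same pattern works with any comparison that yields one boolean per row, which is what the select-by-attribute queries later in this tutorial rely on.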
- Listing column names separately
If you're interested in how the attribute table is stored as a Python object, print the array's representation with "print(repr(arr))" and notice that all rows from the attribute table are preserved as tuples, separated by a comma and a new line. The column names are printed separately at the end of the array, along with their NumPy data types. You can also retrieve the column names on their own, which Python returns as a tuple:
# Print the column names of the NumPy array:
colnames = arr.dtype.names
print('The Numpy array has the following column names: \n' + str(colnames))
>>The Numpy array has the following column names:
>>('name', 'Pop_2001', 'PopDensity', 'AREA', 'PERIMETER')
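Besides the names, the dtype object also knows the data type of each field. As a small sketch (on an invented placeholder array whose field types are assumptions, not the types ArcPy would pick), you could list both together like this:

```python
import numpy

# Placeholder row; the field types are assumptions for illustration.
arr = numpy.array(
    [("Borough A", 150000, 50000000.0)],
    dtype=[("name", "U20"), ("Pop_2001", "i4"), ("AREA", "f8")],
)

# Indexing the dtype with a field name returns that field's own dtype.
for field in arr.dtype.names:
    print("{}: {}".format(field, arr.dtype[field]))

# The one-letter 'kind' code distinguishes text, integers and floats.
numeric_fields = [f for f in arr.dtype.names if arr.dtype[f].kind in "if"]
print(numeric_fields)  # ['Pop_2001', 'AREA']
```

Filtering on the kind code is a convenient way to find out which columns support calculations such as sum() and mean().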
- Accessing a particular cell value in a NumPy array
The following code can be used to access a particular cell in our NumPy array:
# Row index 1 (the second row), field index 4 (PERIMETER):
d = arr[1][4]
print('The perimeter value for Richmond upon Thames is {}\n'.format(d))
>>The perimeter value for Richmond upon Thames is 47941.015783
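Numeric positions are easy to get wrong when the field list changes, so it can be safer to address a cell by field name. The sketch below uses an invented two-row placeholder array to show that the positional and the name-based forms return the same value.

```python
import numpy

# Placeholder rows; names and numbers are invented for illustration.
arr = numpy.array(
    [("Borough A", 40000.0), ("Borough B", 25000.0)],
    dtype=[("name", "U20"), ("PERIMETER", "f8")],
)

by_position = arr[1][1]                  # row 1, field 1
by_row_then_field = arr[1]["PERIMETER"]  # row 1, then the named field
by_field_then_row = arr["PERIMETER"][1]  # the whole column, then row 1

print(by_position == by_row_then_field == by_field_then_row)  # True
```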
You can also use a for loop to list all values in a specific column. This returns the same values as the print(arr["Pop_2001"]) statement listed earlier, although here the results are printed as separate items, one per line:
for i in arr:
    print(i[1])
(output not listed)
The same code written as a list comprehension and returned as a list:
n = [i[1] for i in arr]
print(n)
(output not listed)
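As a quick check that the loop, the list comprehension and the column access really describe the same data, the following sketch compares them on an invented placeholder array:

```python
import numpy

# Placeholder rows; names and numbers are invented for illustration.
arr = numpy.array(
    [("Borough A", 150000), ("Borough B", 200000), ("Borough C", 250000)],
    dtype=[("name", "U20"), ("Pop_2001", "i4")],
)

# Row by row, picking field index 1 from every record.
from_loop = [row[1] for row in arr]

# Column access by name, converted to a plain Python list.
from_column = arr["Pop_2001"].tolist()

print(from_loop == from_column)  # True
```

The column access is usually the better choice for large tables, because NumPy does the work in one step instead of visiting every row in Python.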
- Select by attribute
The NumPy array makes it possible to perform data queries, for example selecting the names of all boroughs where the population density exceeds 8000 (the query returns nine results):
for t in arr:
    if t[2] > 8000:
        print(t[0])
(output not listed)
This is the same query, but now written as a list comprehension:
m = [t[0] for t in arr if t[2] > 8000]
print(m)
(output not listed)
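The same kind of selection can also be written without an explicit loop by using a boolean mask. This is a sketch on invented placeholder rows, so it returns two names rather than the nine mentioned above.

```python
import numpy

# Placeholder rows; names and numbers are invented for illustration.
arr = numpy.array(
    [("Borough A", 3000.0), ("Borough B", 9000.0), ("Borough C", 8500.0)],
    dtype=[("name", "U20"), ("PopDensity", "f8")],
)

# One True/False per row, then use it to filter the name column.
is_dense = arr["PopDensity"] > 8000
dense_names = arr["name"][is_dense]
print(dense_names.tolist())  # ['Borough B', 'Borough C']

# Counting the True values gives the number of matching rows.
count = int(is_dense.sum())
print(count)  # 2
```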
- Array management: creating subarrays and selecting their elements
NumPy arrays can easily be sliced to create new arrays. For example:
# The following code returns a subarray of rows 1 to 4 (the end index 5 is excluded):
subarray = arr[1:5]
print(subarray)
(output not listed)
You can select elements from this subarray, for example returning the names of boroughs where the population density exceeds 2000:
for i in subarray:
    if i[2] > 2000:
        print(i[0])
(output not listed)
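One detail worth knowing about slices is that they are views on the original array rather than copies. The sketch below, on invented placeholder rows, filters a slice with a mask and then shows how copy() gives you an independent subarray that can be changed without touching the source.

```python
import numpy

# Placeholder rows; names and numbers are invented for illustration.
arr = numpy.array(
    [("Borough A", 3000.0), ("Borough B", 9000.0),
     ("Borough C", 1500.0), ("Borough D", 8500.0)],
    dtype=[("name", "U20"), ("PopDensity", "f8")],
)

# Rows with index 1 and 2 (the end index 3 is excluded).
subarray = arr[1:3]

# Mask-based selection works on a slice just as on the full array.
names = subarray["name"][subarray["PopDensity"] > 2000]
print(names.tolist())  # ['Borough B']

# A slice shares its data with arr; copy() makes it independent.
independent = arr[1:3].copy()
independent["PopDensity"][0] = 0.0
print(arr["PopDensity"][1])  # 9000.0, the original is unchanged
```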
- Convert a NumPy array to a NumPy matrix
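A NumPy array can easily be converted to a NumPy matrix. The main advantage of NumPy matrices is that they provide a convenient notation for matrix multiplication. Because our array is a structured array, the result is a matrix with a single row of records; to do arithmetic with it, you first have to extract the numerical fields. The conversion is performed as follows: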
matrix = numpy.asmatrix(arr)
print(matrix)
(output not listed)
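Matrix arithmetic needs plain numbers rather than records, so in practice you would first pull the numerical fields out of the structured array. The following sketch shows one possible way to do that with numpy.column_stack; the rows are invented placeholders and the choice of fields is only an example.

```python
import numpy

# Placeholder rows; names and numbers are invented for illustration.
arr = numpy.array(
    [("Borough A", 5.0, 4.0), ("Borough B", 2.0, 3.0), ("Borough C", 3.0, 1.0)],
    dtype=[("name", "U20"), ("AREA", "f8"), ("PERIMETER", "f8")],
)

# Stack two numerical fields side by side: one row per borough,
# one column per field.
numeric = numpy.column_stack([arr["AREA"], arr["PERIMETER"]])
print(numeric.shape)  # (3, 2)

# A plain 2-D array supports matrix multiplication directly.
product = numeric.T.dot(numeric)
print(product.shape)  # (2, 2)
```

The result is an ordinary two-dimensional array, which can be passed to numpy.asmatrix if the matrix notation is preferred.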