Tutorial: Catalog and Find Geospatial Data with Python

by | Nov 3, 2022

In this tutorial, you’ll learn how to catalog data using Python. Specifically, we’ll use the arcpy and os modules to move through a directory tree to catalog and find data.

The os module is used to interact with your operating system, move through a directory and find data. This comes in handy if you want to see what data you have on disk without using a file explorer, or manipulate it directly after referencing it using Python. In an earlier tutorial, we used the os module to find shapefiles and add them to the map window in Pro.

Here, we’ll show two ways of moving through a directory tree and listing the files inside it. The first approach uses os.walk(), a built-in Python function that generates the file names in a directory tree by walking it either top-down or bottom-up. For each directory in the tree, it yields a 3-tuple containing dirpath, dirnames, and filenames. Next, we’ll look at a similar function in arcpy with nearly the same name, arcpy.da.Walk, which is based on os.walk but comes with something extra: unlike os.walk, arcpy.da.Walk recognizes database content such as geodatabase feature classes, tables, and rasters. This means you can use it as a filter to list only the content you want, such as point, line, or polygon feature classes. We’ll now demonstrate how to use both.
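To make the walking order concrete, here is a minimal sketch that builds a small throwaway directory tree and records the order in which os.walk visits it. The helper name walk_order and the folder names a and b are made up for illustration.

```python
import os
import tempfile


def walk_order(root, topdown=True):
    """Return the directories os.walk visits, relative to root, in visiting order."""
    return [
        os.path.relpath(dirpath, root)
        for dirpath, dirnames, filenames in os.walk(root, topdown=topdown)
    ]


# Build a temporary tree shaped like root/a/b, then walk it both ways.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    print(walk_order(root, topdown=True))   # parent folders come first
    print(walk_order(root, topdown=False))  # deepest folders come first
```

With topdown=True the root folder is reported before its subfolders; with topdown=False the order is reversed, which is useful when you need to process the contents of a folder before the folder itself.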

STEP 1: Download the data and run the code

In this tutorial, we’ll use the Natural Earth quick start kit. Download the dataset, unzip the files to your hard drive, and open a new, empty project in Pro. Open a Python Notebook and run the following code, which uses os.walk to list all directories and files inside the downloaded dataset:
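A minimal sketch of such a listing is shown here. It assumes the kit was unzipped to C:\data\Natural_Earth_quick_start, a hypothetical path that you should replace with your own. The helper name list_tree is likewise only illustrative.

```python
import os

# Hypothetical location: replace with the folder where you unzipped the kit.
DATA_DIR = r"C:\data\Natural_Earth_quick_start"


def list_tree(root):
    """Return one line per directory, followed by one indented line per file in it."""
    lines = []
    # Outer loop: one iteration per directory in the tree.
    for dirpath, dirnames, filenames in os.walk(root):
        lines.append(dirpath)
        # Inner loop: the files stored directly in that directory.
        for filename in sorted(filenames):
            lines.append("    " + filename)
    return lines


print("\n".join(list_tree(DATA_DIR)))
```

If the path does not exist, os.walk silently yields nothing, so an empty result usually means DATA_DIR needs correcting.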

This code snippet uses two for loops to print the results in an uncluttered way, rather than printing the os.walk generator object itself. Here’s a fragment of the output:

As you can see, the output contains all files found in the file folders. We’ll now show how to catalog only spatial data using arcpy.da.Walk.

STEP 2: Run the code

Just like os.walk, arcpy.da.Walk returns data names in directory and database structures by moving through the tree from the top down or the bottom up. Each directory or workspace yields a 3-tuple: directory path, directory names, and file names. Use the following code to create a Walk object that lists only the point feature classes in the Natural Earth dataset. The output is also listed below:
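A sketch of what such a call can look like follows. It assumes an ArcGIS Pro Python environment in which arcpy can be imported, and it uses the datatype and type arguments of arcpy.da.Walk to keep only point feature classes. The function name list_point_feature_classes, its walk parameter, and the path in the usage comment are illustrative assumptions. The walk parameter exists only so the logic can be tried without arcpy.

```python
import os


def list_point_feature_classes(workspace, walk=None):
    """Return the full paths of point feature classes found under workspace.

    walk defaults to arcpy.da.Walk. It is a parameter only so that the
    function can be exercised on a machine without ArcGIS Pro.
    """
    if walk is None:
        import arcpy  # only available in an ArcGIS Pro Python environment
        walk = arcpy.da.Walk

    paths = []
    # datatype and type restrict the results to point feature classes.
    for dirpath, dirnames, filenames in walk(
        workspace, datatype="FeatureClass", type="Point"
    ):
        for filename in filenames:
            paths.append(os.path.join(dirpath, filename))
    return paths


# Usage inside a Pro notebook (hypothetical path, adjust to your own):
# for path in list_point_feature_classes(r"C:\data\Natural_Earth_quick_start"):
#     print(path)
```

Changing type to "Polyline" or "Polygon" would list line or polygon feature classes instead.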

As you can see, the second method, which uses arcpy.da.Walk, is more specific in that it recognizes spatial data types, whereas os.walk is strictly file-based, meaning it returns everything it encounters in a file folder. The downside of arcpy.da.Walk is performance: for file-based formats, os.walk is faster.
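Because os.walk only sees file names, narrowing its results to spatial data means matching on those names yourself. The sketch below assumes shapefiles identified by their .shp extension. The helper name find_by_extension is made up for illustration. Unlike the arcpy approach, it cannot tell point data from line or polygon data.

```python
import os


def find_by_extension(root, extension=".shp"):
    """Return sorted full paths of files under root whose name ends with extension."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            # Compare in lower case so that .SHP and .shp are both found.
            if filename.lower().endswith(extension):
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)
```

Calling find_by_extension on the unzipped dataset folder would return every shapefile path regardless of geometry type.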

Learn more about programming ArcGIS Pro with Python:

Introduction to Programming ArcGIS Pro with Python

Intermediate ArcGIS Pro Programming with Python

Eric van Rees
Eric van Rees is a freelance writer and editor. His specialty is GIS technology. He has more than eight years of proven expertise in editing, writing and interviewing as editor and editor-in-chief for the international geospatial publication GeoInformatics, as well as GIS Magazine and CAD Magazine, both published in Dutch. Currently, he writes about geospatial technology for various clients, publications and blogs.
