Tutorial: Catalog and Find Geospatial Data with Python

by | Nov 3, 2022

In this tutorial, you’ll learn how to catalog data using Python. Specifically, we’ll use the arcpy and os modules to move through a directory tree to catalog and find data.

The os module lets you interact with your operating system, for example to move through directories and find data. This comes in handy if you want to see what data you have on disk without opening a file explorer, or if you want to manipulate that data directly in Python once you’ve located it. In an earlier tutorial, we used the os module to find shapefiles and add them to the map window in Pro.
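The earlier tutorial's code isn't reproduced here. As a rough idea of the shapefile-finding step, here is a minimal sketch. It assumes the data sits in a local folder (the path in the usage comment is a placeholder) and identifies shapefiles only by their `.shp` extension, because os knows nothing about GIS formats. The function name `find_shapefiles` is hypothetical.

```python
import os


def find_shapefiles(top):
    """Return full paths of all shapefiles (.shp) found under `top`.

    os.walk only sees files, so this relies purely on the file extension;
    sidecar files such as .shx, .dbf and .prj are skipped.
    """
    shapefiles = []
    for dirpath, dirnames, filenames in os.walk(top):
        for filename in filenames:
            # Compare case-insensitively so "ROADS.SHP" is found too
            if filename.lower().endswith(".shp"):
                shapefiles.append(os.path.join(dirpath, filename))
    return shapefiles


# Usage (hypothetical folder path):
# for path in find_shapefiles(r"C:\Data\Natural_Earth_quick_start"):
#     print(path)
```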

Here, we’ll show two ways of moving through a directory tree and listing the files inside it. The first approach uses os.walk(), a function from Python’s standard library that generates the file names in a directory tree by walking it either top-down or bottom-up. For each directory in the tree, it yields a 3-tuple containing dirpath, dirnames, and filenames. Next, we’ll look at a similar arcpy function with nearly the same name, arcpy.da.Walk. It is based on os.walk but comes with something extra: unlike os.walk, arcpy.da.Walk recognizes database content such as geodatabase feature classes, tables, and rasters. This means you can use it as a filter to list only the content you want, such as point, line, or polygon feature classes. We’ll now demonstrate both methods.
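To make the 3-tuple and the two walking directions concrete, the sketch below records the order in which os.walk visits directories. With `topdown=True` (the default), a directory is yielded before its subdirectories. With `topdown=False`, it is yielded after them. The function name `visit_order` is illustrative only.

```python
import os


def visit_order(top, topdown=True):
    """Return the directory paths os.walk yields, in visiting order."""
    order = []
    for dirpath, dirnames, filenames in os.walk(top, topdown=topdown):
        # dirpath:   path of the directory currently being visited
        # dirnames:  names of its immediate subdirectories
        # filenames: names of the non-directory files it contains
        order.append(dirpath)
    return order
```

For a folder `top` containing `a/b`, the top-down order is `top`, `top/a`, `top/a/b`, and the bottom-up order is the reverse.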

STEP 1: Download the data and run the code

In this tutorial, we’ll be using the Natural Earth quick start kit. Download the dataset, unzip the files to your hard drive, and open a new, empty project in Pro. Open a Python notebook and run the following code, which uses os.walk to list all directories and files inside the downloaded dataset:
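The original snippet isn't reproduced here, so the following is only a minimal sketch of the same idea. It assumes the kit was unzipped to a local folder (the path in the usage comment is a placeholder) and nests two for loops. The outer loop iterates over the tuples os.walk yields, and the inner loop lists each file indented under its folder. The function name `catalog` is hypothetical.

```python
import os


def catalog(top):
    """Print every folder under `top`, followed by the files it contains.

    Also returns the printed lines so they can be saved or searched later.
    """
    lines = []
    # Outer loop: one (dirpath, dirnames, filenames) tuple per folder
    for dirpath, dirnames, filenames in os.walk(top):
        lines.append(f"Folder: {dirpath}")
        # Inner loop: each file in this folder, indented under its folder
        for filename in filenames:
            lines.append(f"    {filename}")
    print("\n".join(lines))
    return lines


# Hypothetical path: replace with wherever you unzipped the quick start kit
# catalog(r"C:\Data\Natural_Earth_quick_start")
```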

This code snippet uses two for loops to print the output in an uncluttered way, rather than printing the raw tuples that os.walk generates. Here’s a fragment of the output:

As you can see, the output contains every file found in the folders, whether it holds spatial data or not. We’ll now show how to catalog only spatial data using arcpy.da.Walk.

STEP 2: Run the code

Just like os.walk, arcpy.da.Walk returns data names in directory and database structures by moving through the tree from the top down or from the bottom up. For each directory or workspace it visits, it yields a 3-tuple: the directory path, the directory names, and the file names. Use the following code to create a Walk object that lists only the point feature classes in the Natural Earth dataset. The output is also listed below:
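The original code isn't reproduced here. In the minimal sketch below, `datatype="FeatureClass"` and `type="Point"` are passed to arcpy.da.Walk so that only point feature classes are returned. It assumes the code runs inside ArcGIS Pro (for example in a Pro notebook, where arcpy is available), and the dataset path is a placeholder. The `import arcpy` sits inside the function only so the snippet can be loaded outside Pro. In a Pro notebook you can import arcpy at the top as usual. The function name `list_point_feature_classes` is hypothetical.

```python
import os


def list_point_feature_classes(top):
    """Return full paths of the point feature classes found under `top`.

    Requires arcpy (ArcGIS Pro). Unlike os.walk, arcpy.da.Walk also looks
    inside geodatabases and yields dataset names rather than every file on disk.
    """
    import arcpy  # available in ArcGIS Pro's Python environment

    points = []
    # Limit results to feature classes whose geometry type is Point
    walk = arcpy.da.Walk(top, datatype="FeatureClass", type="Point")
    for dirpath, dirnames, filenames in walk:
        for filename in filenames:
            points.append(os.path.join(dirpath, filename))
    return points


# Hypothetical path: replace with your unzipped Natural Earth folder
# for fc in list_point_feature_classes(r"C:\Data\Natural_Earth_quick_start"):
#     print(fc)
```

Swapping `type="Point"` for `"Polyline"` or `"Polygon"` lists line or polygon feature classes instead.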

As you can see, the second method, arcpy.da.Walk, is more specific because it recognizes spatial data types, whereas os.walk is strictly file-based, meaning it returns everything it encounters in a folder. The downside of arcpy.da.Walk is performance: for file-based formats, os.walk is faster.
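To see what "strictly file-based" means in practice, the sketch below groups what os.walk reports by base name. A shapefile shows up as several component files (.shp, .shx, .dbf, and usually .prj). A file geodatabase shows up only as a folder ending in .gdb that is full of internal storage files, which is why os.walk alone can't tell you which feature classes it contains. The helper name `shapefile_components` is hypothetical, and skipping .gdb folders is a choice made for this illustration.

```python
import os
from collections import defaultdict


def shapefile_components(top):
    """Map each shapefile path (without extension) to its component extensions."""
    components = defaultdict(set)
    for dirpath, dirnames, filenames in os.walk(top):
        # Prune file geodatabases in place so os.walk won't descend into them:
        # their contents are internal storage files, not readable datasets
        dirnames[:] = [d for d in dirnames if not d.lower().endswith(".gdb")]
        for filename in filenames:
            stem, ext = os.path.splitext(filename)
            components[os.path.join(dirpath, stem)].add(ext.lower())
    # Keep only the groups that actually contain a .shp file
    return {stem: exts for stem, exts in components.items() if ".shp" in exts}
```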

Learn more about programming ArcGIS Pro with Python:

Introduction to Programming ArcGIS Pro with Python

Intermediate ArcGIS Pro Programming with Python

Eric Pimpler
Eric is the founder and owner of GeoSpatial Training Services (geospatialtraining.com) and has over 25 years of experience implementing and teaching GIS solutions using Esri, Google Earth/Maps, and open source technology. Currently Eric focuses on ArcGIS scripting with Python, and the development of custom ArcGIS Server web and mobile applications using JavaScript. Eric is the author of Programming ArcGIS with Python Cookbook - 1st and 2nd Edition, Building Web and Mobile ArcGIS Server Applications with JavaScript, Spatial Analytics with ArcGIS, and ArcGIS Blueprints. Eric has a Bachelor’s degree in Geography from Texas A&M University and a Master’s of Applied Geography degree with a concentration in GIS from Texas State University.