In this tutorial, you’ll learn how to catalog data using Python. Specifically, we’ll use the arcpy and os modules to move through a directory tree to catalog and find data.
The os module is used to interact with your operating system, move through a directory and find data. This comes in handy if you want to see what data you have on disk without using a file explorer, or manipulate it directly after referencing it using Python. In an earlier tutorial, we used the os module to find shapefiles and add them to the map window in Pro.
Here, we’ll show two ways of moving through a directory tree and listing the files inside it. The first approach uses os.walk(), a function in Python’s standard library that generates the file names in a directory tree by walking it either top-down or bottom-up. For each directory in the tree, it yields a 3-tuple containing dirpath, dirnames, and filenames. Next, we’ll look at a similar function in arcpy, arcpy.da.Walk, which is modeled on os.walk but comes with something extra: unlike os.walk, arcpy.da.Walk recognizes database content such as geodatabase feature classes, tables, or rasters. This means you can filter the results to list only the content you want, such as point, line, or polygon feature classes. We’ll now demonstrate how to use both methods.
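To make the 3-tuple and the walking direction concrete, here is a minimal sketch that builds a tiny throwaway folder tree and walks it both ways. The folder and file names (vectors, cities.shp, readme.txt) are made up for illustration and are not part of the tutorial dataset.

```python
import os
import tempfile

# Build a small temporary tree: <root>/readme.txt and <root>/vectors/cities.shp
# (hypothetical names, used only to show what os.walk yields).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "vectors"))
for relative in ("readme.txt", os.path.join("vectors", "cities.shp")):
    open(os.path.join(root, relative), "w").close()

# Top-down (the default): a directory is yielded before its subdirectories.
top_down = list(os.walk(root))

# Bottom-up: subdirectories are yielded before their parent.
bottom_up = list(os.walk(root, topdown=False))

# Each item is a 3-tuple of (dirpath, dirnames, filenames).
for dirpath, dirnames, filenames in top_down:
    print(dirpath, dirnames, filenames)
```

In the top-down walk the root folder comes first; in the bottom-up walk it comes last.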
STEP 1: Download the data and run the code
In this tutorial, we’ll be using the Natural Earth quick start kit. Download the dataset, unzip the files to your hard drive, and open a new, empty project in Pro. Open a Python Notebook and run the following code, which uses os.walk to list all directories and files inside the downloaded dataset:
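A minimal sketch of such a listing could look like this. The folder path is an assumption; replace it with the location where you unzipped the dataset. The helper name catalog_files is also just an illustrative choice.

```python
import os


def catalog_files(top):
    """Print and return the full path of every file found below `top`."""
    found = []
    # Outer loop: one (dirpath, dirnames, filenames) tuple per directory.
    for dirpath, dirnames, filenames in os.walk(top):
        # Inner loop: one entry per file in that directory.
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            print(full_path)
            found.append(full_path)
    return found


# Assumed location of the unzipped dataset; adjust to your own machine.
workspace = r"C:\data\Natural_Earth_quick_start"
catalog_files(workspace)
```

If the folder does not exist, os.walk simply yields nothing, so an empty result usually means the path needs correcting.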
This code snippet uses two for loops to print the output in an uncluttered way, which is easier to read than printing the os.walk object directly. Here’s a fragment of the output:
As you can see, the output contains all files found in the file folders. We’ll now show how to catalog only spatial data using arcpy.da.walk.
STEP 2: Run the code
Just like os.walk, arcpy.da.Walk returns data names in directory and database structures by moving through the tree from the top down or the bottom up. Each directory or workspace yields a 3-tuple: directory path, directory names, and file names. Use the following code to create a Walk object that lists only the point feature classes in the Natural Earth dataset. The output is also listed below:
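One way to write this is sketched below, using the datatype and type parameters of arcpy.da.Walk to keep only point feature classes. The helper name list_point_feature_classes and the folder path in the usage note are assumptions for illustration; arcpy itself is only available in an ArcGIS Pro Python environment.

```python
import os


def list_point_feature_classes(top):
    """Return the full paths of all point feature classes found below `top`."""
    # arcpy ships with ArcGIS Pro, so it is imported here rather than at
    # module level; the function can then be defined in any environment.
    import arcpy

    found = []
    # datatype restricts results to feature classes; type narrows to points.
    walk = arcpy.da.Walk(top, datatype="FeatureClass", type="Point")
    for dirpath, dirnames, filenames in walk:
        for filename in filenames:
            found.append(os.path.join(dirpath, filename))
    return found
```

In a Pro notebook, you could call it with the folder that holds the unzipped dataset, for example list_point_feature_classes(r"C:\data\Natural_Earth_quick_start"), and print each returned path.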
As you can see, the second method, arcpy.da.Walk, is more specific in that it recognizes spatial data types, whereas os.walk is strictly file-based, meaning it returns everything it encounters in a folder. The trade-off is performance: for file-based formats, os.walk is faster than arcpy.da.Walk.
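If you want to stay with os.walk for file-based formats, you have to do the filtering yourself. The sketch below keeps only files with a given extension; the function name find_by_extension and the default of ".shp" are illustrative choices, not something from the original tutorial.

```python
import os


def find_by_extension(top, extension=".shp"):
    """Return full paths of files below `top` whose name ends with `extension`."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        for filename in filenames:
            # Compare in lowercase so that .SHP and .shp are both matched.
            if filename.lower().endswith(extension.lower()):
                matches.append(os.path.join(dirpath, filename))
    return matches
```

This kind of filter only looks at file names. It cannot tell a point shapefile from a polygon shapefile, and it does not see feature classes stored inside a geodatabase, which is where arcpy.da.Walk is the better fit.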