Tutorial: Using Python to Find Identical Features in a GIS Dataset

by | Jun 21, 2022

In this tutorial you’ll learn how to identify duplicate entries in a GIS dataset using ArcGIS Pro Python Notebooks.

If you want to use Python to find duplicate entries in a feature dataset, there are several ways to do so. This tutorial features two possible solutions. Both methods use the pandas and arcgis library. The second one uses a Counter object from the collections module.

Step 1: Download the data

We will use the Natural Earth quick start kit. In the tutorial covering bar charts using the matplotlib Python library, we found a feature that appeared twice in the attribute table, causing an outlier in the accompanying bar chart. First, download the data, unzip the file on your hard drive, open up Pro and create a new, empty project. Create a folder connection to the unzipped Natural Earth dataset and add the ne_110m_lakes.shp file (found in the “110m_physical” subfolder) to the map window so that it is listed in the maps contents pane on the right of the screen.

Step 2: Create a new Python notebook and import the libraries

Create a new Python notebook by selecting the Insert tab on the ribbon interface in Pro and select “New notebook”. A new notebook is opened. Use the following code to import the necessary libraries and create a spatially enabled dataframe of the feature set’s attribute table:

Step 3: Find duplicated rows based on a selected column

The pandas library Dataframe class provides a member function to find duplicate rows based on all or given column names only. We already know from earlier tutorials that there is one duplicate entry for this specific dataset, which is FID#19: a repeated entry named “Lake Onega”. We used Pro’s Find Identical tool to find this entry for us in an earlier tutorial, but we will now use Python’s pandas library to do this. Use the following code to find and list the duplicate row, which is indeed the feature with ID 19, as the output shows:

The output shows only the first and last column of the selected feature, but most importantly, it does more or less the same as Pro’s Find Identical tool. It does not list the first entry, only the second, repeated entry. However, this is enough to inspect the data and verify if there is a duplicate entry for this dataset.

Step 4: Identify duplicated field values using list objects

We’ll now show how you can identify repeated entries for a single column using Python list objects. It does exactly the same as the Dataframe.duplicated member function, but only using the column values which are converted to a list object and then counted individually. Note that this solution does not return feature IDs or entire rows: it only tells you if there are duplicate entries for a specified column. It is also a more involved way of finding duplicate values using more code, but it is nonetheless an approach that uses Python data types (lists), which are very useful in many cases. Use the following code to check for duplicated field values in the “name” column:

This code converts a column from a pandas dataframe to a list, including duplicates. This is possible as a spatially enabled dataframe is based on a pandas dataframe. In the output, you can see there two entries of Lake Onega. To count individual entries and check for duplicate entries, use a Counter object, as specified below. Make sure you import the collections module and Counter object, as specified above:

Learn more about using Python in ArcGIS Pro in our Introduction to Programming ArcGIS Pro with Python and Intermediate ArcGIS Pro Programming with Python classes.

Categories

Recent Posts

Eric van Rees
Eric van Rees is a freelance writer and editor. His specialty is GIS technology. He has more than eight years of proven expertise in editing, writing and interviewing as editor and editor-in-chief for the international geospatial publication GeoInformatics, as well as GIS Magazine and CAD Magazine, both published in Dutch. Currently, he writes about geospatial technology for various clients, publications and blogs.

Sign up for our weekly newsletter
to receive content like this in your email box.