Unlocking the Power of the Data Engineering Tool in ArcGIS Pro

Aug 19, 2024

When working with geospatial data, ensuring its accuracy and readiness for analysis is crucial. This is where ArcGIS Pro’s Data Engineering tool comes into play. Designed to help GIS professionals explore, clean, and prepare their data efficiently, the Data Engineering tool streamlines the initial stages of any geospatial project, setting a solid foundation for analysis and visualization. Here’s how the Data Engineering tool can enhance your workflow.

Data Exploration and Profiling

The Data Engineering tool in ArcGIS Pro allows you to quickly explore and understand your datasets. By providing detailed summary statistics on individual fields—such as mean, median, standard deviation, the count of null values, unique entries, and frequency distributions—it helps you identify potential data quality issues right from the start. This early insight is invaluable for making informed decisions about how to clean and prepare your data.
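
To make these statistics concrete, here is a minimal plain-Python sketch that computes the same kinds of summary values for a single field. It is only an illustration of what the numbers mean, not how ArcGIS Pro calculates them internally. The `profile_field` function and the sample population values are hypothetical, and `None` stands in for a null value.

```python
from collections import Counter
from statistics import mean, median, stdev


def profile_field(values):
    """Summarize one numeric field; None stands in for a null value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "unique": len(set(present)),
        "mean": mean(present) if present else None,
        "median": median(present) if present else None,
        # Sample standard deviation needs at least two values
        "std_dev": stdev(present) if len(present) >= 2 else None,
        # The three most frequent values with their counts
        "most_common": Counter(present).most_common(3),
    }


# Hypothetical population values for a handful of features
population = [1200, 950, None, 1200, 4300, 780, None, 1010]
summary = profile_field(population)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Even in this small sample, the gap between the mean and the median hints at a skewed distribution, which is the kind of signal the statistics table is meant to surface.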

The Data Engineering tool can be used with the current selection set or with all records in a table. To open it, right-click a layer in the Contents pane and select Data Engineering.

To add a field to the Data Engineering canvas and calculate its summary statistics, right-click the field and select Add To Statistics And Calculate.

In our example, we added several fields and calculated the summary statistics for each.

Across the top of the Data Engineering display are tools for opening the associated attribute table, displaying the fields view for the table, and toggling the display of field types (Numeric, Text, Date), along with a Calculate button that calculates the statistics in the display. You can click Calculate at any time to recalculate the statistics, which is useful when the selection set has changed.

Right-clicking any of the columns displays a context menu with options for sorting, hiding, and freezing or unfreezing columns.

With the Data Engineering tab active, you can perform many other operations, including updating the symbology of the layer based on an attribute, creating charts, opening the associated attribute table, exporting statistics as a table, and more!

Field Calculations and Transformations

Transforming and calculating field values is straightforward with the Data Engineering tool. Whether you need to create new fields, perform arithmetic operations, or apply conditional logic, this tool gives you the flexibility to manipulate your data directly within ArcGIS Pro. It supports both Python and Arcade scripting, allowing for advanced field calculations tailored to your specific needs. These tools can be accessed from the Data Engineering context tab under the Construct tools.
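
As an illustration of conditional logic in a Python field calculation, the sketch below defines a small classification function and runs it on a few sample rows. In the Calculate Field tool, a function like this would go in the Code Block and be called from the expression with field names wrapped in exclamation points, for example `density_class(!POP!, !AREA_SQKM!)`. The function name, the field names, and the density thresholds are all hypothetical.

```python
def density_class(population, area_sq_km):
    """Classify population density; return None when inputs are unusable."""
    # Guard against null population and null or zero area
    if population is None or not area_sq_km:
        return None
    density = population / area_sq_km
    if density >= 1000:
        return "High"
    if density >= 100:
        return "Medium"
    return "Low"


# Hypothetical rows: (population, area in square kilometers)
rows = [(50000, 25.0), (1200, 8.0), (300, 40.0), (None, 12.0)]
classes = [density_class(pop, area) for pop, area in rows]
print(classes)
```

Handling nulls explicitly inside the function matters because a single null value would otherwise raise an error partway through the calculation.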

Data Cleaning and Preprocessing

The Data Engineering tool simplifies the process of cleaning your data. You can fill, remove, or replace missing and null values, and detect outliers that might skew your analysis. The tool also supports data normalization and standardization, making it easier to prepare datasets from different sources for combined analysis. These tools are generally found on the Data Engineering tab under the Clean button.
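
The ideas behind these cleaning steps can be sketched in plain Python. The example below fills nulls with the median, flags outliers using the common 1.5 × IQR rule, and rescales values with min-max normalization and z-score standardization. These are generic textbook techniques chosen for illustration, and the ArcGIS Pro tools may use different methods or defaults. All function names and sample values are hypothetical.

```python
from statistics import mean, median, pstdev, quantiles


def fill_nulls(values, fill_value):
    """Replace None (standing in for null) with a chosen fill value."""
    return [fill_value if v is None else v for v in values]


def iqr_outliers(values):
    """Return values more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


def min_max(values):
    """Rescale values to the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_scores(values):
    """Center values on 0 and scale by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical field with two nulls and one suspicious value
raw = [12.0, 15.0, None, 14.0, 13.0, 98.0, 16.0, None]
present = [v for v in raw if v is not None]
filled = fill_nulls(raw, median(present))
outliers = iqr_outliers(filled)
scaled = min_max(filled)
standardized = z_scores(filled)
print("filled:", filled)
print("outliers:", outliers)
```

The median is used as the fill value here because, unlike the mean, it is not pulled upward by the single extreme value in the sample.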

Data Integration

A subset of tools is provided for integrating data from other sources. These include geoprocessing tools such as Append, Spatial Join, Near, Summarize Within, Summarize Nearby, Enrich, and others. These tools are found on the Data Engineering tab under Integrate.

Data Formatting

Also included on the Data Engineering tab is a set of formatting tools, including Convert Temporal Field, Convert Time Zone, Pivot Table, Transpose Field, Reclassify Field, and Encode Field.
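
To give a sense of what encoding and reclassifying a field involve, here is a minimal plain-Python sketch of one-hot encoding a categorical field and reclassifying a numeric field with manual breaks. It illustrates the general techniques only and makes no claim about how the Encode Field and Reclassify Field tools are implemented. The function names, categories, and break values are hypothetical.

```python
from bisect import bisect_left


def one_hot(values):
    """Turn each category into a dict of 0/1 indicator columns."""
    categories = sorted(set(values))
    return [{c: int(v == c) for c in categories} for v in values]


def reclassify(values, breaks):
    """Return 1-based class numbers.

    `breaks` are ascending, inclusive upper bounds; values above the
    last break fall into one final class.
    """
    return [bisect_left(breaks, v) + 1 for v in values]


# Hypothetical categorical and numeric fields
land_use = ["Residential", "Commercial", "Residential", "Park"]
encoded = one_hot(land_use)
print(encoded[0])

elevation = [5, 10, 15, 20, 25]
classes = reclassify(elevation, breaks=[10, 20])
print(classes)
```

One-hot encoding produces one indicator column per category, so it suits fields with a small number of distinct values.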

Visualizing Data

One of the standout features of the Data Engineering tool is its ability to visualize data quality. You can generate histograms, scatter plots, and other visualizations to understand the distribution of your data and spot patterns or anomalies. This visual approach to data profiling helps ensure that your data is ready for reliable analysis.
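
The way a histogram exposes anomalies can be shown with a small plain-Python sketch that groups values into equal-width bins and prints a text bar for each bin. This is only an illustration of the binning idea, not the charting engine in ArcGIS Pro. The `histogram` function and the sample values are hypothetical.

```python
def histogram(values, bins=4):
    """Count values in equal-width bins; returns (start, end, count) tuples."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1 when every value is identical
    width = (hi - lo) / bins or 1
    counts = [0] * bins
    for v in values:
        # The maximum value belongs in the last bin, not one past it
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(bins)]


# Hypothetical field values with one extreme entry
values = [2, 3, 3, 4, 5, 5, 5, 6, 7, 30]
bins = histogram(values)
for start, end, count in bins:
    print(f"{start:5.1f} to {end:5.1f} | {'#' * count}")
```

Nearly all values land in the first bin while a single value sits alone in the last one, the pattern you would look for in a chart when checking for outliers.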

Efficient Workflow Integration

ArcGIS Pro’s Data Engineering tool is seamlessly integrated into the broader ArcGIS ecosystem, allowing you to incorporate your data preparation steps into larger geospatial workflows. Whether you’re working on a single dataset or managing multiple sources of data, this tool provides a cohesive environment for ensuring data quality and consistency.

Conclusion

The Data Engineering tool in ArcGIS Pro is an essential resource for anyone looking to optimize their geospatial data. By offering robust capabilities for data exploration, cleaning, and transformation, it enables you to prepare high-quality data that is ready for in-depth analysis and visualization. Incorporating the Data Engineering tool into your workflow ensures that your projects start with a solid foundation, leading to more accurate and meaningful results.

Eric Pimpler
Eric is the founder and owner of GeoSpatial Training Services (geospatialtraining.com) and has over 25 years of experience implementing and teaching GIS solutions using Esri, Google Earth/Maps, and open source technology. Currently, Eric focuses on ArcGIS scripting with Python and the development of custom ArcGIS Server web and mobile applications using JavaScript. Eric is the author of Programming ArcGIS with Python Cookbook (1st and 2nd editions), Building Web and Mobile ArcGIS Server Applications with JavaScript, Spatial Analytics with ArcGIS, and ArcGIS Blueprints. Eric has a Bachelor’s degree in Geography from Texas A&M University and a Master of Applied Geography degree with a concentration in GIS from Texas State University.