How to Add Data Science to your GIS Skills

by Eric van Rees | Mar 27, 2018

GIS professionals interested in data science can start their journey into this field by exploring the spatial data analysis tools offered by ArcGIS and by learning how to code.

Current GIS software offers many tools that fall into the data science category, which makes them a great introduction to the field. In addition, coding skills are required for more advanced data analysis.

Why GIS professionals need to learn how to code

Data science is all about analyzing, manipulating and visualizing data on a computer. To be able to do this, you need to be able to code, preferably in R, Python and/or SQL. In this context, “coding” refers to scripting rather than writing a full software application. With solid knowledge of how to write loops and functions and how to work with various data types, it’s possible to start using popular Python data science tools such as NumPy and pandas.
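As a rough illustration of the scripting level meant here, the sketch below combines a function, a loop and a few basic data types (strings, floats, a list of dictionaries) using only the Python standard library. The city coordinates are approximate and purely illustrative, and the function names are made up for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def route_length_km(points):
    """Sum the distances between consecutive points in a list of dictionaries."""
    total = 0.0
    # zip pairs each point with the next one: (p0, p1), (p1, p2), ...
    for start, end in zip(points, points[1:]):
        total += haversine_km(start["lon"], start["lat"], end["lon"], end["lat"])
    return total


# Illustrative data: a list of dictionaries with approximate coordinates.
cities = [
    {"name": "Amsterdam", "lon": 4.90, "lat": 52.37},
    {"name": "Utrecht", "lon": 5.12, "lat": 52.09},
    {"name": "Rotterdam", "lon": 4.48, "lat": 51.92},
]

print(round(route_length_km(cities), 1), "km")
```

Once this kind of script feels comfortable, the same logic can be expressed far more compactly with NumPy arrays or a pandas DataFrame.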

Users of ArcGIS can start by reading ArcGIS Help entries such as “A quick tour of Python” and “What is Python?”, which link to online Python tutorials. Next, there’s the Python window, where ArcGIS users can experiment with short scripts that manipulate data using arcpy, the Python site package that offers all geoprocessing functionality and more. Be sure to look for NumPy entries in the ArcGIS Desktop help and experiment with it in ArcGIS: NumPy is a Python library for scientific computing that is used in many data science projects and has been part of ArcGIS for a long time.

Learn how to use the following spatial analysis tools

ArcGIS Desktop offers a lot of data science tools that don’t require coding but will still improve your data science skills. Be sure to read the documentation (ArcGIS Help) before using them.

  1. ArcGIS Desktop Spatial Statistics toolbox

In addition to the Statistics toolset found under the Analysis toolbox, the Spatial Statistics toolbox contains statistical tools for analyzing spatial distributions, patterns, processes, and relationships. For an overview of all available tools, open the ArcGIS Desktop Help, go to “contents” and choose “tools” -> “spatial statistics toolbox”. The help covers not only the individual tools inside the toolbox, but also the concepts behind them, such as statistical tests, hypotheses, p-values and z-scores. This is essential information for any data scientist.
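To make the relationship between z-scores and p-values concrete, here is a minimal sketch using Python’s standard library (`statistics.NormalDist`, Python 3.8+). It assumes the test statistic follows a standard normal distribution under the null hypothesis and that a two-sided test is wanted; the function names and the default significance level of 0.05 are choices made for this example, not ArcGIS settings.

```python
from statistics import NormalDist


def two_sided_p_value(z_score):
    """Probability of a value at least this extreme under a standard normal null."""
    return 2 * (1 - NormalDist().cdf(abs(z_score)))


def is_significant(z_score, alpha=0.05):
    """True when the null hypothesis would be rejected at significance level alpha."""
    return two_sided_p_value(z_score) < alpha


# Compare a few z-scores: the larger |z| is, the smaller the p-value.
for z in (0.5, 1.96, 2.58):
    print(z, round(two_sided_p_value(z), 4), is_significant(z))
```

A small p-value means a result this extreme would be unlikely if the null hypothesis were true, which is why a large positive or negative z-score is read as evidence against it.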

  2. ArcGIS Extensions for performing spatial analysis

ArcGIS Desktop offers a number of extensions for advanced spatial analysis that could be labeled as data science, such as Geostatistical Analyst, Spatial Analyst and Network Analyst. These are documented in the ArcGIS Desktop Help under the “Extensions” tab found under “Contents”. Other helpful entries are “the geostatistical workflow”, “what is geostatistics?” and “Introduction to the ArcGIS Geostatistical Analyst Tutorial”.
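To get a feel for what interpolation tools in these extensions do, the sketch below implements inverse distance weighting (IDW), one of the simpler deterministic interpolation methods, in plain Python. It illustrates the idea only and is not the ArcGIS implementation; the function name, the sample values and the default power of 2 are assumptions made for this example.

```python
import math


def idw_estimate(samples, x, y, power=2.0):
    """Estimate a value at (x, y) from (x, y, value) samples by inverse distance weighting."""
    if not samples:
        raise ValueError("at least one sample point is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            # The location coincides with a sample point: return its value directly.
            return value
        weight = 1.0 / distance ** power  # nearer samples get larger weights
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Illustrative measurements at three locations (coordinates in arbitrary units).
samples = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)]
print(idw_estimate(samples, 5.0, 0.0))
```

Geostatistical methods such as kriging go a step further by modeling the spatial correlation in the data, which is the kind of background the help entries above explain.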

  3. Other spatial data science tools

ArcGIS users can extend their work using R and/or Python. The R-ArcGIS Bridge is an add-in for ArcGIS that enables you to transfer GIS data to an R programming environment, whereas the Python window lets you do simple Python scripting inside ArcGIS (although you might prefer an IDE for saving and running larger scripts). R itself offers many libraries for working with spatial data, and the RStudio software offers plotting functionality.

However, the Jupyter Notebook is the preferred application for data science workflows in Python. As this application requires Python 3, it’s not compatible with ArcGIS Desktop’s arcpy, which runs on Python 2. As a solution, you can use ArcGIS Pro, which uses Python 3 for Python scripting. GIS and data science workflows are becoming more and more web-based, for example by tapping into big data tools that run in the cloud. Python is currently the best programming language for tapping into related disciplines such as big data, machine learning and IoT. There are currently many tools for data science and spatial data, such as cartoframes and the ArcGIS API for Python: both enable users to tap into cloud-based datasets for extended data science workflows, combined with mapping widgets that display analysis results in real time.
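Because the Python version decides which of these tools a script can run in, it can help to check the interpreter first. The sketch below runs under both Python 2 and Python 3; the helper `suggested_arcgis_product` is a hypothetical name that simply encodes the rule of thumb from the paragraph above.

```python
import sys


def python_major_version():
    """Return the major version (2 or 3) of the interpreter running this script."""
    return sys.version_info[0]


def suggested_arcgis_product():
    """Hypothetical helper: map the major Python version to the matching ArcGIS product."""
    if python_major_version() >= 3:
        return "ArcGIS Pro"
    return "ArcGIS Desktop"


print(python_major_version(), suggested_arcgis_product())
```

Running this in the ArcGIS Desktop Python window and in an ArcGIS Pro or Jupyter session shows which environment a script will need.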

Our classes on these subjects include:

Introduction to Spatial Statistics using ArcGIS and R

Introduction to Python

Python for Data Science I: Programming and Efficient Data Management (Pandas)

R for Data Science: Programming and Efficient Data Management

Introduction to Programming ArcGIS Pro with Python

Programming ArcGIS with Python Workshop – Introduction


Eric van Rees
Eric van Rees is a freelance writer and editor. His specialty is GIS technology. He has more than eight years of proven expertise in editing, writing and interviewing as editor and editor-in-chief for the international geospatial publication GeoInformatics, as well as GIS Magazine and CAD Magazine, both published in Dutch. Currently, he writes about geospatial technology for various clients, publications and blogs.
