GIS professionals interested in data science can start their journey into this field by exploring the various spatial data analysis tools offered by ArcGIS and by learning how to code.
Current GIS software offers many tools that fall into the data science category. As such, they're a great introduction to the data science field. In addition, coding skills are required for doing more advanced data analysis.
Why GIS professionals need to learn how to code
Data science is all about analyzing, manipulating and visualizing data on a computer. To be able to do this, you need to be able to code, preferably in R, Python and/or SQL. In this context, “coding” refers to scripting rather than writing a software application. With some solid knowledge of how to write loops and functions and how to work with various data types, it's possible to start using popular Python data science tools such as NumPy and pandas.
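As a rough illustration of the scripting level meant here, the following minimal sketch combines a dictionary, tuples, a function and a loop using only plain Python. The city coordinates and the `is_north_of` helper are made up for the example and are not part of any ArcGIS API.

```python
# Hypothetical sample data: city names mapped to (longitude, latitude) tuples.
cities = {
    "Amsterdam": (4.90, 52.37),
    "Berlin": (13.40, 52.52),
    "Madrid": (-3.70, 40.42),
}


def is_north_of(latitude, threshold=50.0):
    """Return True when a latitude lies north of the threshold (degrees)."""
    return latitude > threshold


# Loop over the dictionary and collect the names that pass the test.
northern = []
for name, (lon, lat) in cities.items():
    if is_north_of(lat):
        northern.append(name)

# A generator expression is a compact way to aggregate values.
mean_lat = sum(lat for _, lat in cities.values()) / len(cities)

print(northern)
print(round(mean_lat, 2))
```

Once these building blocks feel familiar, the same ideas carry over to NumPy arrays and pandas data frames, which replace explicit loops with operations on whole columns.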
Users of ArcGIS can start by reading ArcGIS Help entries such as “A quick tour of Python” and “What is Python?”, which link to online Python tutorials. Next, there's the Python window, where ArcGIS users can experiment with short scripts that manipulate data using arcpy, the Python site package that offers all geoprocessing functionality and more. Be sure to look for NumPy entries in the ArcGIS Desktop Help and experiment with it in ArcGIS: NumPy is a Python library for scientific computing that is used in many data science projects and has been part of ArcGIS for a long time.
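To give a feel for NumPy outside ArcGIS, here is a small sketch that works on a structured array, a NumPy data type that resembles an attribute table. The parcel records and field names are invented for the example. Inside ArcGIS a comparable array could be obtained from a feature class with `arcpy.da.FeatureClassToNumPyArray`, but that call is deliberately left out so the snippet runs with NumPy alone.

```python
import numpy as np

# Hypothetical attribute table: object ID, land use class and area per parcel.
parcels = np.array(
    [
        (1, "residential", 420.0),
        (2, "commercial", 1250.5),
        (3, "residential", 380.0),
    ],
    dtype=[("OID", "i4"), ("landuse", "U20"), ("area_m2", "f8")],
)

# Boolean indexing selects rows without writing an explicit loop.
residential = parcels[parcels["landuse"] == "residential"]

# Column-wise aggregation on a single field.
total_area = residential["area_m2"].sum()
mean_area = parcels["area_m2"].mean()

print(len(residential), total_area, mean_area)
```

Selecting rows with a boolean mask and aggregating a field in one call is the pattern that most NumPy and pandas workflows build on.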
Learn how to use the following spatial analysis tools
ArcGIS Desktop offers many data science tools that don't require coding but will still improve your data science skills. Be sure to read the documentation (ArcGIS Help) before using them.
- ArcGIS Desktop Spatial Statistics toolbox
In addition to the Statistics toolset found under the Analysis toolbox, the Spatial Statistics toolbox contains statistical tools for analyzing spatial distributions, patterns, processes, and relationships. For an overview of all available tools, go to the ArcGIS Desktop Help, open “contents” and choose “tools” -> “spatial statistics toolbox”. The Help covers not only the different tools inside the toolbox, but also the concepts behind them, such as statistical tests, hypotheses, p-values and z-scores. This is essential information for any data scientist.
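The relationship between a z-score and a p-value can be sketched in a few lines of Python. This is only an illustration of the concepts, assuming a normally distributed test statistic under the null hypothesis; it is not the implementation used by any ArcGIS tool, and the function name and input numbers are hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(observed, expected, std_dev):
    """Return (z, p) for an observed statistic, assuming a normal null distribution."""
    # The z-score expresses the deviation from the expected value in standard deviations.
    z = (observed - expected) / std_dev
    # The two-sided p-value is the probability of a deviation at least this large.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical numbers: an observed index of 0.30 where -0.01 is expected
# under the null hypothesis, with a standard deviation of 0.10.
z, p = two_sided_p_value(observed=0.30, expected=-0.01, std_dev=0.10)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A large absolute z-score paired with a small p-value suggests that the observed pattern is unlikely under the null hypothesis; which threshold counts as significant is a choice the analyst has to make.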
- ArcGIS Extensions for performing spatial analysis
ArcGIS Desktop offers a number of extensions for advanced spatial analysis that could be labeled as data science, such as Geostatistical Analyst, Spatial Analyst and Network Analyst. These are documented in the ArcGIS Desktop Help under the “Extensions” tab found under “Contents”. Other helpful entries are “the geostatistical workflow”, “what is geostatistics?” and “Introduction to the ArcGIS Geostatistical Analyst Tutorial”.
- Other spatial data science tools
ArcGIS users can extend their work using R and/or Python. The ArcGIS-R Bridge is an add-in for ArcGIS that lets you transfer GIS data to an R programming environment, whereas the Python window lets you do simple Python scripting inside ArcGIS (although you may prefer to use an IDE to save and run larger scripts). R itself offers many libraries for working with spatial data, and the RStudio software offers plotting functionality.
However, Jupyter Notebook is the preferred application for data science workflows in Python. Because this application requires Python 3, it's not compatible with ArcGIS Desktop's arcpy, which runs on Python 2. As a solution, you can use ArcGIS Pro, which uses Python 3 for Python scripting. GIS and data science workflows are becoming more and more web-based, for example by using big data tools that run in the cloud. Python is currently the best programming language for connecting to related disciplines such as big data, machine learning and IoT. There are now many tools for combining data science and spatial data, such as cartoframes and the ArcGIS API for Python: both let users access cloud-based datasets for extended data science workflows, combined with mapping widgets that display analysis results in real time.
Our classes on these subjects include: