Now that machine learning algorithms are available to everyone, they can be used to solve spatial problems. ArcGIS Pro offers several Spatial Machine Learning tools for the classification, clustering and prediction of spatial data.
Traditional Machine Learning and Spatial Machine Learning
Machine learning (ML) is a general term for data-driven algorithms and techniques that automate the prediction, classification and clustering of data. Although traditional ML algorithms have been around for decades, improvements in the processing power of personal computers now allow millions of people to use them. Broadly speaking, traditional ML can be used to solve a wide range of spatial problems, with geography often acting as the “key” that links disparate data. In the context of ArcGIS, one might speak of Spatial ML when geography is incorporated into the computation itself, for example through distances, neighborhoods or spatial contiguity. Let’s now look at some of the clustering, classification and prediction tools in ArcGIS, and in ArcGIS Pro in particular.
Clustering tools in ArcGIS Pro
Clustering is the grouping of observations based on similarities of values or locations. ArcGIS offers many clustering tools, such as Spatially Constrained Multivariate Clustering, Multivariate Clustering, Density-based Clustering, Image Segmentation, Hot Spot Analysis and Cluster and Outlier Analysis.
The Density-based Clustering tool, found in ArcGIS Pro’s Spatial Statistics toolbox, extracts clusters from a set of Input Point Features and identifies any surrounding noise. The tool offers three clustering methods: Defined distance (DBSCAN), Self-adjusting (HDBSCAN) and Multi-scale (OPTICS). It returns an output feature class in which each cluster is assigned its own color, and this output contains a new field named cluster_id that records the cluster each feature falls into.
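To make the Defined distance idea concrete, here is a minimal DBSCAN sketch in plain Python. It is not ArcGIS’s implementation: the function name dbscan and its search_distance and min_features arguments are hypothetical stand-ins that roughly play the role of the tool’s search distance and minimum features per cluster settings, distances are plain Euclidean, and noise is labelled -1 for illustration.

```python
import math


def dbscan(points, search_distance, min_features):
    """Label each (x, y) point with a cluster id; -1 marks noise.

    A point is a core point if at least `min_features` points (itself
    included) lie within `search_distance` of it. Clusters grow outward
    from core points; non-core points reached from a core point join
    that cluster as border points but do not extend it further.
    """
    n = len(points)
    labels = [None] * n  # None = not visited yet

    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if math.hypot(xi - xj, yi - yj) <= search_distance]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_features:
            labels[i] = -1  # provisional noise; may be absorbed as a border point later
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # former noise reached from a core point: border point
            if labels[j] is not None:
                continue  # already assigned (or just made a border point)
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_features:
                queue.extend(j_neighbors)  # j is a core point, so keep expanding
        cluster_id += 1
    return labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, search_distance=1.5, min_features=3))  # → [0, 0, 0, 1, 1, 1, -1]
```

The two tight groups become clusters 0 and 1, while the isolated point at (50, 50) has too few neighbors within the search distance and is reported as noise.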
The Spatially Constrained Multivariate Clustering tool in ArcGIS Pro is a more complex clustering tool. It tries to find a solution in which the features within each cluster are as similar as possible and the clusters themselves are as different as possible, while the spatial constraint requires each cluster to be spatially contiguous. Feature similarity is based on the set of attributes the user specifies for the Analysis Fields parameter. The tool can be regarded as an exploratory tool for learning more about the underlying structures in spatial data.
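The trade-off described above (similar features within each cluster, different clusters from one another) can be scored with the Calinski-Harabasz pseudo F-statistic, which compares between-cluster variation to within-cluster variation. The sketch below uses a hypothetical pseudo_f helper in plain Python. It only scores a clustering you already have; it does not search for clusters, and it does not enforce the spatial contiguity that the ArcGIS tool applies.

```python
def pseudo_f(rows, labels):
    """Calinski-Harabasz pseudo F-statistic for an existing clustering.

    rows   : list of attribute vectors (one per feature, all the same length)
    labels : cluster label for each row
    F = (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k)).
    Higher values mean tighter clusters that sit farther apart.
    """
    n, dims = len(rows), len(rows[0])
    clusters = sorted(set(labels))
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("need between 2 and n - 1 clusters")

    def mean(vectors):
        return [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    grand = mean(rows)
    total_ss = sum(sq_dist(r, grand) for r in rows)  # spread around the overall mean
    within_ss = 0.0                                  # spread around each cluster's own mean
    for c in clusters:
        members = [r for r, lab in zip(rows, labels) if lab == c]
        centre = mean(members)
        within_ss += sum(sq_dist(r, centre) for r in members)
    if within_ss == 0:
        return float("inf")  # every cluster is perfectly homogeneous
    between_ss = total_ss - within_ss
    return (between_ss / (k - 1)) / (within_ss / (n - k))


values = [[0], [1], [10], [11]]
print(pseudo_f(values, [0, 0, 1, 1]))  # → 200.0
```

Grouping the values as {0, 1} and {10, 11} scores 200.0, whereas the poor grouping {0, 10} and {1, 11} scores only 0.02, which is the behavior the tool's objective rewards and penalizes.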
Classification tools in ArcGIS Pro
Classification is the process of deciding, based on a training set, to which category an object should be assigned. For example, classifying the latest high-resolution imagery of an area can help prepare for storm and flood events. ArcGIS tools for classification include Maximum Likelihood Classification, Random Trees, Support Vector Machine and Forest-based Classification and Regression.
ArcGIS Pro’s Forest-based Classification and Regression tool is an adaptation of the random forest algorithm, which is widely used in traditional ML. The tool trains a model on known values provided as part of a training dataset. This prediction model is then used to predict unknown values in a prediction dataset that has the same associated explanatory variables. For example, the tool can be used to predict the probability that a rare plant species will grow in a study area, using point data that shows where the species has been found. The tool creates many decision trees (collectively called a “forest”) that are used for prediction. Each tree generates its own prediction, and these predictions are combined in a voting scheme so that the final prediction is based on the entire forest rather than on any single tree.
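The voting scheme can be illustrated with a deliberately tiny forest. This plain-Python sketch is a simplification under stated assumptions, not the ArcGIS tool: each “tree” is a single split (a decision stump) fitted to a bootstrap sample of the training data using a random subset of the explanatory variables, and the forest predicts by majority vote. The names fit_stump, fit_forest and predict are hypothetical, and the presence/absence data is invented purely for illustration.

```python
import random
from collections import Counter


def fit_stump(xs, ys, features):
    """Pick the single split (feature, threshold) with the fewest training errors."""
    best = None
    for f in features:
        for t in sorted({x[f] for x in xs}):
            left = [y for x, y in zip(xs, ys) if x[f] <= t]
            right = [y for x, y in zip(xs, ys) if x[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]  # (feature, threshold, label if <= threshold, label if > threshold)


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, n_features = len(xs), len(xs[0])
    m = max(1, int(n_features ** 0.5))  # variables tried per tree (a common default)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        features = rng.sample(range(n_features), m)
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx], features))
    return forest


def predict(forest, x):
    """Each stump casts one vote; the most common label wins."""
    votes = Counter(left if x[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Toy data: two explanatory variables per location, presence/absence of the plant.
xs = [(1, 2), (2, 1), (1, 1), (2, 2), (8, 9), (9, 8), (8, 8), (9, 9)]
ys = ["absent"] * 4 + ["present"] * 4
forest = fit_forest(xs, ys)
print(predict(forest, (9, 9)))  # → present
```

Because each stump sees a different bootstrap sample and variable, individual stumps can disagree; the majority vote smooths out those individual mistakes, which is the same reason the full tool relies on the whole forest rather than a single tree.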
Prediction tools in ArcGIS Pro
Prediction tools use known values to estimate unknown ones, in the form of a continuous variable. Prediction tools in ArcGIS include Empirical Bayesian Kriging, Areal Interpolation, EBK Regression Prediction, Ordinary Least Squares Regression and Exploratory Regression, Geographically Weighted Regression and Generalized Linear Regression.
The Generalized Linear Regression (GLR) tool in ArcGIS Pro generates predictions, or models a dependent variable in terms of its relationship to a set of explanatory variables. In general, regression is used to evaluate relationships between two or more feature attributes; the GLR tool does this by creating a model of the variable or process that can be used to examine and quantify those relationships. For example, the tool can be used to explain which demographic characteristics contribute to high rates of public transportation usage.
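For a continuous dependent variable, the model GLR fits is the familiar ordinary least squares (OLS) regression. The minimal plain-Python sketch below solves the OLS normal equations for an intercept and one coefficient per explanatory variable. The helpers ols and predict_ols are hypothetical, the toy data is constructed to follow y = 2 + 3·x1 - x2 exactly, and GLR's other model types are not covered here.

```python
def ols(X, y):
    """Fit y ≈ b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X is a list of rows of explanatory variables; returns [b0, b1, ..., bk].
    Solves the normal equations (A^T A) b = A^T y by Gauss-Jordan elimination.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend an intercept column
    p = len(A[0])
    # Augmented system [A^T A | A^T y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * yi for a, yi in zip(A, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))  # partial pivoting
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    return [row[-1] for row in M]


def predict_ols(coefs, row):
    """Apply fitted coefficients to one row of explanatory variables."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], row))


X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]
y = [2, 5, 1, 4, 5]
coefs = ols(X, y)
print([round(b, 6) for b in coefs])          # → [2.0, 3.0, -1.0]
print(round(predict_ols(coefs, (3, 2)), 6))  # → 9.0
```

Each coefficient quantifies how much the dependent variable changes per unit of one explanatory variable with the others held fixed, which is the kind of relationship the demographic example above would examine.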
The Geographically Weighted Regression (GWR) tool is a local form of linear regression used to model spatially varying relationships. It provides a local model of the variable or process by fitting a regression equation to every feature in the dataset. Each of these separate equations is constructed from the dependent and explanatory variables of the features within the neighborhood of the target feature. For the best results, the tool should be applied to datasets with several hundred features. For example, both the GLR and GWR tools can be used to predict housing sales in a region from large datasets covering that same region with many different variables, ranging from current housing prices to the number of rooms per house.
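The local weighting at the heart of GWR can be sketched in a few lines. In this plain-Python illustration, each feature gets its own regression in which every feature is weighted by a Gaussian kernel of its distance from the target, so nearby features dominate the fit. The names gwr and gwr_local_fit are hypothetical, a single explanatory variable is used for brevity, and the bandwidth is fixed by hand rather than chosen the way the ArcGIS tool would choose it.

```python
import math


def gwr_local_fit(coords, x, y, target, bandwidth):
    """Weighted fit of y = b0 + b1*x centred on `target` (Gaussian distance kernel)."""
    w = [math.exp(-0.5 * (math.dist(c, target) / bandwidth) ** 2) for c in coords]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted means
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def gwr(coords, x, y, bandwidth):
    """One local regression per feature, centred on that feature's own location."""
    return [gwr_local_fit(coords, x, y, c, bandwidth) for c in coords]


# Two areas 100 units apart where the same variable has a different effect.
coords = [(i, 0) for i in range(5)] + [(100 + i, 0) for i in range(5)]
x = list(range(5)) * 2
y = [1 + 2 * v for v in range(5)] + [1 + 5 * v for v in range(5)]
fits = gwr(coords, x, y, bandwidth=5.0)
print([round(b1, 3) for _, b1 in fits])  # → [2.0, 2.0, 2.0, 2.0, 2.0, 5.0, 5.0, 5.0, 5.0, 5.0]
```

A single global regression through all ten points would return one slope of 3.5, hiding the fact that the relationship differs between the two areas; the local fits recover a slope of 2 in the western area and 5 in the eastern one.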