Big data was a major topic during this year's Esri UC. This blog post discusses a number of new and upcoming Esri products that deal with big data.
By Eric van Rees
What is big data?
The term big data refers to datasets that are so large or complex that traditional data processing applications are inadequate. Often, the term refers simply to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set. Early on, Esri recognized the potential of big data analytics and developed a big data strategy for both hardware and software products. A number of these products are discussed below.
Esri’s big data strategy
Esri now considers big data computing important in the development and vision of the ArcGIS platform. This is reflected in its big data strategy, which is twofold: the company builds big data tools into the ArcGIS platform, and it also offers solutions that pair with familiar big data platforms from recognized technology vendors such as IBM, Microsoft, and SAP.
An example of the latter approach is GIS Tools for Hadoop, an open-source toolkit for big spatial data analytics that lets users leverage the Hadoop framework to perform spatial analysis on spatial data. The big data tools designed for use within the ArcGIS platform, in turn, help users find answers in raw big datasets: they can uncover geographic patterns and spatial relationships, perform predictive modeling, or get geographical insights from social media data.
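To make the idea of spatial analysis in a Hadoop-style framework more concrete, here is a minimal sketch of a point-in-polygon count written as a map step and a reduce step. It is plain Python and does not use the GIS Tools for Hadoop API. The zone names, coordinates, and function names are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def point_in_polygon(x, y, ring):
    """Ray-casting test. `ring` is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def map_point(point, zones):
    """Map step: emit (zone_name, 1) for each zone containing the point."""
    x, y = point
    for name, ring in zones.items():
        if point_in_polygon(x, y, ring):
            yield name, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per zone."""
    totals = Counter()
    for name, count in pairs:
        totals[name] += count
    return dict(totals)


# Hypothetical data: two rectangular zones and four point observations.
ZONES = {
    "west": [(0, 0), (5, 0), (5, 10), (0, 10)],
    "east": [(5, 0), (10, 0), (10, 10), (5, 10)],
}
POINTS = [(1, 1), (2, 8), (7, 3), (12, 5)]

pairs = [pair for p in POINTS for pair in map_point(p, ZONES)]
counts = reduce_counts(pairs)
print(counts)
```

In a real cluster the map step would run on many machines at once, each handling its own share of the points, and the framework would route the emitted pairs to the reducers.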
Current big data tools in the ArcGIS Platform
A number of recent ArcGIS product releases deal with big data, and more will follow soon. These are released as stand-alone apps or as additions to ArcGIS Server. Insights for ArcGIS, for example, is a new web app that enables iterative and exploratory analysis of user data in a geographic context. The application integrates a variety of big data sources, such as big data files, Hive/Hadoop, and the Spatiotemporal Big Data store.
This Spatiotemporal Big Data store is an enhancement of the ArcGIS Data Store, available with ArcGIS for Server, that makes it possible to work with observational data such as moving objects, changing attributes of stationary sensors, or both. The Spatiotemporal Big Data store enables archival of high-volume observation data, sustains high-velocity write throughput, and can run across multiple machines (nodes). It is available with ArcGIS 10.4.
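As an illustration of what observational data for moving objects can look like, the sketch below keeps a time-ordered archive of positions per tracked object and answers two typical questions: where the object was last seen, and where it was during a time window. This is an in-memory toy, not the schema or API of the Spatiotemporal Big Data store. The class, field, and track names are assumptions made for the example.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict
from typing import NamedTuple


class Observation(NamedTuple):
    timestamp: float  # seconds since an arbitrary epoch (assumed unit)
    x: float
    y: float


class TrackArchive:
    """Hypothetical archive holding a time-sorted list of observations per track."""

    def __init__(self):
        self._tracks = defaultdict(list)

    def append(self, track_id, obs):
        # Tuples compare field by field, so this keeps each track sorted by timestamp.
        insort(self._tracks[track_id], obs)

    def latest(self, track_id):
        track = self._tracks.get(track_id)
        return track[-1] if track else None

    def window(self, track_id, start, end):
        """Return the observations with start <= timestamp <= end."""
        track = self._tracks.get(track_id, [])
        lo = bisect_left(track, (start,))
        hi = bisect_right(track, (end, float("inf"), float("inf")))
        return track[lo:hi]


archive = TrackArchive()
# Observations may arrive out of order; the archive keeps them sorted.
archive.append("bus-7", Observation(30.0, 2.0, 2.0))
archive.append("bus-7", Observation(10.0, 0.0, 0.0))
archive.append("bus-7", Observation(20.0, 1.0, 1.0))

print(archive.latest("bus-7"))
print(archive.window("bus-7", 10.0, 20.0))
```

A production store adds what this sketch leaves out: persistence, spatial indexing, and distribution of the tracks across multiple nodes.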
Finally, the ArcGIS GeoEvent Extension for Server allows users to incorporate real-time information streams with their existing GIS data and IT infrastructure.
ArcGIS 10.5 and big data
ArcGIS 10.5, scheduled for release in late 2016, will bring three new products that support big data. First, the GeoAnalytics Server will focus on vector-based analytics using a library of spatial and temporal analytics tools. Second, the Image Analytics Server will provide a new capability for scalable distributed processing of massive image and raster collections. Third, the real-time ArcGIS GeoEvent Extension for Server will be enhanced to ingest significantly more real-time data.
These three releases are meant to complement each other. The combination of the ArcGIS GeoEvent Extension for Server and GeoAnalytics functionality, for example, will support high-velocity, real-time data ingestion. The combination of imagery and GeoAnalytics functionality will support data dissemination, on-the-fly analysis, and batch analysis for large collections of imagery gathered by drone, aerial, and satellite sensors.
These new tools can be called and run from ArcGIS Pro and ArcGIS for Server. Tasks can be programmed with ArcPy for batch analytics. In the case of the GeoEvent Extension, tasks can be visually designed and distributed for execution. For the user, everything is installed and run the way standard tools are, but behind the scenes the changes are quite significant: less expensive hardware is used to store large amounts of geospatial data, and storage is scaled horizontally (across more machines) to handle more volume. On top of that, the tools take advantage of advanced in-browser capabilities (3D and local GPU processing), and geoprocessing tasks are converted to run in parallel across multiple nodes.
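The general pattern behind converting a task to parallel processing can be sketched in a few lines: split the data into partitions, let each worker process its own partition, and merge the partial results. The example below counts points per grid cell this way. It is a simplified illustration and not ArcPy or GeoAnalytics code. A thread pool stands in for separate nodes, and the function names, cell size, and sample points are assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Split the records round-robin into n_parts lists."""
    return [records[i::n_parts] for i in range(n_parts)]


def count_by_cell(records, cell_size=1.0):
    """Work done by one worker: count the points falling in each grid cell."""
    counts = Counter()
    for x, y in records:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts


def parallel_count(records, n_workers=4, cell_size=1.0):
    """Process each partition on its own worker, then merge the partial counts."""
    parts = partition(records, n_workers)
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(lambda p: count_by_cell(p, cell_size), parts):
            total += partial
    return total


# Hypothetical sample points.
points = [(0.2, 0.7), (0.9, 0.1), (1.5, 0.5), (1.1, 1.9), (0.4, 0.4)]
result = parallel_count(points, n_workers=2)
print(result)
```

Because the per-cell counts can be merged in any order, the result does not depend on how the points are partitioned. That property is what allows this kind of task to be spread over more nodes as the data volume grows.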