This article provides a rapid introduction to some important BI concepts. It then highlights the need for geospatial BI software and deals with the integration of the spatial component in a BI software stack in order to consistently enable geo-analytical tools.
By Thierry Badard and Etienne Dubé
GeoSOA Research Group, Laval University, Canada
“About eighty percent of all data stored in corporate databases has a spatial component.”
- Carl Franklin
Recently,
interest in the huge potential of geo-spatial BI has increased. Geo-spatial BI
brings together geographic information system (GIS) and
business intelligence (BI) technologies: it combines spatial
analysis and map visualization with proven BI tools in order to better support
the corporate data analysis process and to help companies make more informed
decisions.
BI is a
business management term which refers to applications and technologies that are
used to gather, provide access to, and analyze data and information about
company operations. BI applications are usually used to better understand
historical, current and future aspects of business operations. BI applications
typically offer ways to mine database- and spreadsheet-centric data to produce
graphical, table-based and other types of analytics regarding business
operations. BI systems give companies a more comprehensive knowledge of the
factors affecting their business, such as metrics on sales, production, and
internal operations, in order to make better business decisions.
This article
provides a rapid introduction to some important BI concepts. It then highlights
the need for geo-spatial BI software and deals with the integration of the
spatial component in a BI software stack in order to consistently enable
geo-analytical tools. We then present different works performed and tools
designed by the GeoSOA research group.
A Rapid Introduction to BI
BI
applications rely on a complex architecture of software that is usually
composed of:
- An extract/transform/load
(ETL) tool to extract data
from different heterogeneous sources, provide integration and data
cleansing according to a target schema or data structure, and load the
data in a data warehouse.
- A data warehouse which
stores the organization’s historical data for analysis purposes.
- An online analytical
processing (OLAP) server
which enables the rapid and flexible exploration and analysis of the large
amount of data stored in the data warehouse.
- On the client side, some
reporting tools, dashboards and/or different OLAP clients to display
information in a graphical and summarized form to decision makers and
managers. These tools offer capabilities to explore data interactively and
support the analysis process.
- Optionally, some data
mining tools to automatically retrieve trends, patterns and phenomena in
the data.
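The extract/transform/load chain described in the list above can be sketched in a few lines. This is only an illustrative sketch: the two sources, their field names and the `sales` table are hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Two hypothetical, heterogeneous sources: a CSV export and a list of records.
CSV_SOURCE = """store,date,amount
Quebec,2009-01-15,120.50
 montreal ,2009-01-15,80
Quebec,2009-02-01,n/a
"""
API_SOURCE = [
    {"shop": "MONTREAL", "day": "2009-02-03", "total": "45.25"},
]


def extract():
    """Extract: yield raw records from both sources in a common shape."""
    for row in csv.DictReader(io.StringIO(CSV_SOURCE)):
        yield row["store"], row["date"], row["amount"]
    for rec in API_SOURCE:
        yield rec["shop"], rec["day"], rec["total"]


def transform(records):
    """Transform: normalise store names, parse amounts, drop invalid rows."""
    for store, date, amount in records:
        try:
            value = float(amount)
        except ValueError:
            continue  # data cleansing: skip rows whose amount cannot be parsed
        yield store.strip().title(), date, value


def load(rows, conn):
    """Load: write the cleaned rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (store TEXT, date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run the three stages against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract()), conn)
    return conn


def sales_by_store(conn):
    """A simple analytical query over the loaded data."""
    query = "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
    return dict(conn.execute(query).fetchall())
```

Calling `sales_by_store(run_etl())` returns the total amount per store after cleansing. A production ETL tool adds scheduling, logging and many more source connectors, which this sketch leaves out.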
Figure 1
illustrates the typical infrastructure on which BI applications rely.
Figure 1:
Classical Architecture for Deploying BI Applications
The data
warehouse plays a central and crucial role in this architecture. It is the
repository of an organization’s historical data. It is separate from
operational data sources but is often stored in relational database management
systems. Data warehouses are optimized for handling large volumes of data,
providing fast response during the analysis process, and handling complex
analytical queries. They rely on de-normalized data schemas which introduce
some redundancy to provide very fast replies to time consuming queries involved
in analytical requests.
A data
warehouse focuses more on the analysis and the correlation of large amounts of
data than on retrieving or updating a precise set of data. This is
fundamentally different to the functions of the transactional database systems
used in the day to day activities of a company.
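To make the idea of a de-normalized, analysis-oriented schema more concrete, here is a minimal sketch of a star schema with one fact table and two dimension tables. The table names, columns and figures are invented for illustration, and SQLite is used only because it ships with Python.

```python
import sqlite3


def build_warehouse():
    """Create a tiny star schema: one fact table and two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_time  (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
        -- The region is repeated for every city: deliberate redundancy that
        -- avoids an extra join to a separate region table at query time.
        CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT, region TEXT);
        CREATE TABLE fact_sales (time_id INTEGER, store_id INTEGER, amount REAL);

        INSERT INTO dim_time VALUES (1, 2008, 'Q4'), (2, 2009, 'Q1');
        INSERT INTO dim_store VALUES
            (1, 'Quebec City', 'Quebec'),
            (2, 'Montreal', 'Quebec'),
            (3, 'Toronto', 'Ontario');
        INSERT INTO fact_sales VALUES (1, 1, 100), (1, 2, 200), (2, 1, 150), (2, 3, 300);
        """
    )
    return conn


def sales_by_region_and_year(conn):
    """Analytical query: correlate many fact rows rather than fetch one record."""
    query = """
        SELECT s.region, t.year, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_store AS s ON s.store_id = f.store_id
        JOIN dim_time  AS t ON t.time_id  = f.time_id
        GROUP BY s.region, t.year
        ORDER BY s.region, t.year
    """
    return conn.execute(query).fetchall()
```

A transactional system would typically look up or update a single sale. The query above instead scans and aggregates the whole fact table, which is the access pattern a data warehouse is optimized for.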
Contents of
the data warehouse are often presented in a summarized form primarily for
analysts and decision makers. Figure 2 illustrates different tools from Pentaho used to present, explore and analyse
data.
Figure 2:
Dashboards, Reporting and Data Mining Tools
To query the
data warehouse, these tools generally use the MultiDimensional eXpressions (MDX) query
language implemented by the OLAP server. MDX is a de facto standard from
Microsoft which is also implemented by other OLAP servers and clients. MDX is
for OLAP data cubes what the structured query language (SQL) is for relational databases.
Queries are similar to SQL but rely on a model closer to the one used in
spreadsheets.
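As an illustration of what such a query looks like, the following sketch assembles a basic MDX `SELECT` statement as a string. The cube name `Sales`, the measure `Amount` and the `Time` dimension are hypothetical, and the helper `build_mdx` is not part of any OLAP library. It only shows the axis-based shape of a query (columns and rows, as in a spreadsheet).

```python
def build_mdx(cube, measure, row_set):
    """Return a basic MDX SELECT statement as a string.

    Measures go on the COLUMNS axis and a set of dimension members on the
    ROWS axis, which mirrors the layout of a spreadsheet or pivot table.
    """
    return (
        f"SELECT {{[Measures].[{measure}]}} ON COLUMNS, "
        f"{{{row_set}}} ON ROWS "
        f"FROM [{cube}]"
    )


# Hypothetical cube: the amount sold for each child (e.g. quarter) of 2009.
query = build_mdx("Sales", "Amount", "[Time].[2009].Children")
print(query)
```

The resulting string would then be sent to the OLAP server by a client library. How that is done depends on the server and is not shown here.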
OLAP client
software offers alternative representation modes, such as pie charts and
diagrams, and different tools to refine queries and to explore data. These
tools are based on operators provided by the MDX query language and on a
complex logic implemented in the client. The spatial component of data can be
used to enhance the BI user experience with map displays and spatial analysis
tools to better support the analysis and decision processes.
Merging BI and GIS Software
It is
difficult for a decision maker to answer complex questions like: where are the
urban spots that are more sensitive to heat waves, intense rain, flooding or
droughts in a specific geographic area? How many people with cardiovascular,
respiratory, neurological and psychological diseases will there be in 2025 and
2050 in a specific geographic area? How many people with low income live alone
in a building requiring major repairs in a specific geographic area?
To answer
such questions, you can use:
1. GIS software:
this implies writing very complex SQL queries and dedicating human resources to the task.
Moreover, the work has to be redone every time the data change or new analyses
are required.
2. Classical
BI tools: these are often unable to handle the spatial dimension of data, or
provide only very basic support for it. Some phenomena can only be adequately observed
and interpreted by representing them on a map. This is especially true when you
want to observe the spatial distribution of a phenomenon or its spatiotemporal
evolution.
Geo-spatial
BI has recently stirred marked interest for the huge potential of combining
spatial analysis and map visualization with proven BI tools.
Tools
recently made available on the market rely on a loose coupling between existing
GIS software and some proven BI components. They provide first solutions to
display maps with summarized and aggregated information stemming from the BI
infrastructure while GIS data have to be stored and managed in a separate and
transactional database system or GIS data file. These solutions manage
geo-spatial and corporate data in different systems which require additional
efforts, resources and costs to consistently feed and maintain them. They also
do not fully take advantage of the powerful analytical capabilities of a
classical BI infrastructure and usually are not able to handle very large data
volumes. This loose coupling often requires the development of dedicated
applications each time a new analytical need emerges in the company.
The geometry
data type on which geo-spatial data relies is not handled like any other data type
in the BI infrastructure and connections with the GIS have to be carefully
initiated and maintained. Drill down and roll-up capabilities in the analytical
data to observe data at different levels of detail, time or scale are often not
supported by the map display because they are not intrinsic operators available
in GIS. This is mainly due to the transactional structure of geo-spatial data in
the underlying GIS software. Dimensional data structures on which BI tools rely
answer complex analytical queries more quickly, since the same queries would
involve numerous time-consuming joins in a transactional system.
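The roll-up and drill-down operators mentioned above can be illustrated on a small spatial hierarchy. The sketch below is plain Python, not the implementation of any particular OLAP server, and the hierarchy (city, region, country) and the figures are invented.

```python
from collections import defaultdict

# Hypothetical spatial hierarchy: each city belongs to a region and a country.
HIERARCHY = {
    "Quebec City": ("Quebec", "Canada"),
    "Montreal": ("Quebec", "Canada"),
    "Toronto": ("Ontario", "Canada"),
}

# Fact rows: (city, year, amount).
FACTS = [
    ("Quebec City", 2008, 100.0),
    ("Montreal", 2008, 200.0),
    ("Quebec City", 2009, 150.0),
    ("Toronto", 2009, 300.0),
]

LEVELS = ("city", "region", "country")  # ordered from finest to coarsest


def member_at(city, level):
    """Return the ancestor of a city at the requested level of detail."""
    region, country = HIERARCHY[city]
    return {"city": city, "region": region, "country": country}[level]


def aggregate(facts, level):
    """Sum the measure for every member of the given level."""
    totals = defaultdict(float)
    for city, _year, amount in facts:
        totals[member_at(city, level)] += amount
    return dict(totals)


def roll_up(level):
    """Move to the next coarser level (stays at the top if already there)."""
    return LEVELS[min(LEVELS.index(level) + 1, len(LEVELS) - 1)]


def drill_down(level):
    """Move to the next finer level (stays at the bottom if already there)."""
    return LEVELS[max(LEVELS.index(level) - 1, 0)]
```

For example, `aggregate(FACTS, roll_up("city"))` gives totals per region. A map display that supports these operators would redraw the geometries of the members at the new level each time the level changes.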
Consistently
integrating the geo-spatial component in all parts of the BI architecture is
required. Figure 3 illustrates that all components of the BI infrastructure
have to be spatially-enabled.
Figure 3:
Integrating the Spatial Component into a Classical BI Infrastructure
Some spatial
capabilities such as support for reading and writing GIS file formats,
coordinate transformations, and spatial reference systems need to be injected
into ETL tools. OLAP servers should be extended to become actual Spatial
On-Line Analytical Processing (SOLAP) servers. SOLAP should bring the
consistent handling of geo-spatial features, map displays and spatial analysis
capabilities. SOLAP servers and clients should
“allow a rapid and easy navigation within spatial data warehouses and offer
many levels of information granularity, many themes, many epochs and many
display modes of information that are synchronized or not: maps, tables and
diagrams”.
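As an example of the kind of coordinate transformation a spatially-enabled ETL tool has to perform, the sketch below converts WGS 84 longitude/latitude to spherical Web Mercator coordinates and back. The choice of these two reference systems is an assumption made for illustration; a real tool would rely on a full projection library covering many spatial reference systems.

```python
import math

# Sphere radius used by Web Mercator: the WGS 84 semi-major axis, in metres.
EARTH_RADIUS_M = 6378137.0


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


def web_mercator_to_wgs84(x, y):
    """Inverse projection: Web Mercator metres back to degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat
```

In an ETL flow, a function like this would be applied to every geometry vertex between the extract and load stages, so that all data in the warehouse share one spatial reference system.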
In this
perspective and in order to not reinvent the wheel, the GeoSOA Research Group at Laval
University, Quebec, Canada started to consistently and completely integrate the
geo-spatial functionalities into an existing, mature, efficient and reputed open
source BI software stack.
A complete
open source BI software stack is offered by Pentaho. It includes:
- an ETL tool to integrate
data from heterogeneous sources to a data warehouse
- an OLAP server which
provides multidimensional query facilities on top of the data warehouse
- reporting and dashboard
tools, used to present data to analysts
The
integration of the Pentaho software suite with open source GIS components has
been investigated to create a complete spatially-enabled BI solution. This work
has led to the implementation of GeoKettle,
GeoMondrian and SOLAPLayers.
GeoKettle
GeoKettle is
a spatially-enabled version of Pentaho Data Integration (PDI), formerly known as Kettle. It is a
powerful, metadata-driven spatial ETL tool dedicated to the integration of
different spatial data sources for building and updating geo-spatial data
warehouses. GeoKettle enables the transparent handling of the geometry data
type, like any other classical data type, in all transformations available in
Kettle. It is possible to access geometry objects in JavaScript and to define
custom transformation steps. Topological predicates have all been implemented.
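To give an idea of what a topological predicate does inside a transformation step, here is a minimal sketch of a "contains" test between a polygon and a point, used to filter rows. It is written in plain Python with a ray-casting algorithm and is not GeoKettle's implementation; the row structure and function names are hypothetical.

```python
def polygon_contains_point(polygon, point):
    """Ray-casting test: is the point strictly inside the polygon?

    `polygon` is a list of (x, y) vertices without a repeated closing vertex.
    Points lying exactly on the boundary are not handled consistently by
    this simple version.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_rows(rows, polygon):
    """Keep only the rows whose 'geom' point falls inside the polygon."""
    return [row for row in rows if polygon_contains_point(polygon, row["geom"])]


# Hypothetical study area and input rows.
STUDY_AREA = [(0, 0), (4, 0), (4, 4), (0, 4)]
ROWS = [
    {"id": 1, "geom": (2, 2)},
    {"id": 2, "geom": (5, 2)},
    {"id": 3, "geom": (1, 3)},
]
```

Here `filter_rows(ROWS, STUDY_AREA)` keeps rows 1 and 3. Treating the geometry as just another field of the row is what allows such a predicate to be combined with ordinary, non-spatial transformation steps.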
GeoKettle
has been released under the LGPL. Figure 4
illustrates the GeoKettle user interface showing a basic geo-spatial data
transformation.
Figure 4:
GeoKettle Interface
At present,
Oracle Spatial, PostGIS, and
MySQL with ESRI shapefiles are natively supported in read and write modes.
Microsoft SQL Server 2008, Ingres,
and IBM DB2 can be used with some modification. It is possible to build and
feed complex and very large geo-spatial data warehouses with GeoKettle. Spatial
reference systems management and coordinate transformations have been fully
implemented. Native support for further geo-spatial databases and for raster and
vector based data formats will be implemented in the near future, as an active
and growing community has formed around the project.
GeoKettle
releases are aligned with PDI, allowing GeoKettle to benefit from all the new
features provided by PDI. For instance, Kettle is natively designed to be
deployed in cluster and web service environments. This makes GeoKettle suitable
for deployment as a service in cloud computing environments. It enables the
scalable, distributed and on demand processing of large and complex volumes of
geo-spatial data in minutes for critical applications, without requiring a
company to invest in an expensive infrastructure of servers, networks and software.
Upcoming
features to be implemented in GeoKettle include:
- cartographic preview
- implementation of data
matching steps to allow geometric data cleansing and comparison of
geo-spatial datasets
- read/write support for
other databases, GIS file formats and geo-spatial web services
- native support for MS SQL
Server 2008 and Ingres
- implementation of a
spatial analysis step through a graphical interface
GeoMondrian
GeoMondrian
is a spatially-enabled version of Pentaho Analysis Services (Mondrian). It has been released under
the EPL.
As far as we
know, GeoMondrian is the first implementation of a true SOLAP server. It
provides a consistent integration of spatial objects into the OLAP data cube
structure, instead of fetching them from a separate spatial database, web
service or GIS file. To make a simple analogy, GeoMondrian brings to the
Mondrian OLAP server what PostGIS brings to the PostgreSQL database management
system. It implements a native geometry data type and provides spatial
extensions to the MDX query language, so that spatial analysis
capabilities can be embedded in analytical queries.
These
geo-spatial extensions to the MDX query language provide many more
possibilities, such as:
- in-line geometry
constructors
- member filters based on
topological predicates
- spatial calculated
members and measures
- calculations based on
scalar attributes derived from spatial features
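The logic behind two of these possibilities, a member filter based on a spatial predicate and a measure calculated from a scalar attribute of the geometry, can be sketched as follows. The sketch is deliberately written in plain Python rather than MDX, so it makes no claim about GeoMondrian's actual function names or syntax; the members, populations and geometries are invented.

```python
def polygon_area(ring):
    """Area of a simple polygon (shoelace formula), in squared map units."""
    n = len(ring)
    twice_area = sum(
        ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2


def within_window(ring, window):
    """Simplified 'within' predicate against a rectangular window."""
    min_x, min_y, max_x, max_y = window
    return all(min_x <= x <= max_x and min_y <= y <= max_y for x, y in ring)


# Hypothetical members of a geo-spatial dimension, each carrying a geometry.
MEMBERS = {
    "North": {"population": 1000, "geom": [(0, 0), (10, 0), (10, 10), (0, 10)]},
    "South": {"population": 600, "geom": [(0, -20), (10, -20), (10, -10), (0, -10)]},
    "East": {"population": 300, "geom": [(20, 0), (25, 0), (25, 6), (20, 6)]},
}


def density_within(members, window):
    """Filter members spatially, then derive a calculated measure (density)."""
    return {
        name: member["population"] / polygon_area(member["geom"])
        for name, member in members.items()
        if within_window(member["geom"], window)
    }
```

With the window `(0, -25, 15, 15)`, `density_within` keeps "North" and "South" and computes a population density for each. In a SOLAP server the same filter and calculation would be expressed inside the analytical query itself.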
At present,
GeoMondrian only supports PostGIS based data warehouses but other databases
should be supported soon.
SOLAPLayers
Formerly
known as Spatialytics, SOLAPLayers is a lightweight web cartographic component
which enables navigation in SOLAP data cubes. It aims to be integrated into
existing dashboard frameworks in order to produce interactive geo-analytical
dashboards. The first version of SOLAPLayers stems from a Google Summer of Code
(GSoC) 2008 project performed under the umbrella of OSGeo. The client is released under the BSD license and
the server under the EPL.
SOLAPLayers
is based on the OpenLayers web mapping
client and uses olap4j for connection to
OLAP data sources. For now, it requires GeoMondrian to display members of a
geo-spatial dimension on a map. SOLAPLayers supports:
- connection with a spatial
OLAP server such as GeoMondrian
- navigation in geo-spatial
data cubes
- cartographic
representation of some measures and members of a geo-spatial dimension as
static or dynamic choropleth
maps and proportional symbols
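A choropleth map assigns each member of the geo-spatial dimension a colour according to the class its measure falls into. The sketch below shows one simple way to do this, equal-interval classification. It is not taken from SOLAPLayers, and the palette and measure values are invented for illustration.

```python
from bisect import bisect_right

# Hypothetical four-class colour ramp, from light to dark.
PALETTE = ["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]


def equal_interval_breaks(values, n_classes):
    """Return the n_classes - 1 break values that split the range evenly."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_classes
    return [lo + step * i for i in range(1, n_classes)]


def choropleth_colours(measures, palette):
    """Map each member name to the colour of the class its measure falls in."""
    breaks = equal_interval_breaks(list(measures.values()), len(palette))
    colours = {}
    for member, value in measures.items():
        # bisect_right counts the breaks that are <= value, i.e. the class index.
        class_index = min(bisect_right(breaks, value), len(palette) - 1)
        colours[member] = palette[class_index]
    return colours


# Hypothetical measure values for four members of a geo-spatial dimension.
MEASURES = {"A": 0, "B": 25, "C": 60, "D": 100}
```

A web mapping client would then use the returned colours as fill styles for the corresponding geometries. Other classification schemes, such as quantiles, only change how the breaks are computed.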
A demo
application is available online.
It demonstrates the interaction with GeoMondrian and how the cartographic
navigation in the geo-spatial data cube is performed.
Upcoming
features under development for SOLAPLayers include:
- more map-driven OLAP
navigation operators
- dimension member
selection and navigation controls
- legend display
- new choropleth and
graphics mapping styles
- styles for other geometry
types
- multi maps
Conclusion
This article
has highlighted the need for geo-spatial BI software and has emphasized that
spatially-enabling a BI software stack requires the consistent integration of
the spatial component and its functionalities into each component of the BI
infrastructure. Work performed by the GeoSOA research group has led to the
release of three open source building blocks of a consistent and powerful
geo-BI software stack.
Based on
these key software components, future work will deal with the design of a
geo-analytical dashboard framework. In order to easily design and deliver
dashboards which embed some geo-spatial components and representations, a highly
customisable and flexible geo-analytical dashboard framework is required. A
first integration of SOLAPLayers with JasperServer and iReport has recently been
performed in the GeoSOA research group. This integration allows
information to be displayed in different ways and keeps the
different representations synchronised when the user drills down or rolls up on the map or
the charts.
More
recently, some experiments dealing with the integration of SOLAPLayers into the
Pentaho Community Dashboard Framework (CDF) have been performed
in the context of a GSoC 2009 project, under the umbrella of OSGeo.
The integration
work performed by the student during this period allows the display of the
SOLAPLayers cartographic component together with a pivot table component in a
CDF dashboard. Synchronisation between the map and the pivot table has been
implemented. Further work is required in order to more properly and
consistently integrate the SOLAPLayers component into CDF, but it represents a
good and promising first step towards the design of a highly customisable and
flexible geo-analytical dashboard framework. A live demo of the integration
work performed by the student will be available
shortly. The source code will also be available in the GSoC 2009 repository.
The reader
is invited to consult the presentation
about the research challenges dealing with the integration of the spatial
component in BI tools and the design of intelligent mobile applications for
better decision support. These research challenges are currently part of the
research agenda of the GeoSOA research group.
This article
is an abridged version of the original paper;
the full version can be freely downloaded here.