BIG DATA | ICT & Smart Cities : Design & Implementation of a Generic Cloud-based Analytics Service

By Zaheer Khan, Ashiq Anjum, Kamran Soomro and Muhammad Atif Tahir

Approximately 50% of the world’s population lives in urban areas, a share that is expected to increase to nearly 60% by 2030. Urbanization is even more pronounced in Europe, where over 70% of people live in urban areas today, with projections that this will increase to nearly 80% by 2030. A continuous increase in urban population strains a city's limited resources, weakens its resilience to growing demands on those resources, and poses ever-increasing challenges for urban governance.

Furthermore, sustainable urban development, economic growth and the management of natural resources such as energy and water require better planning and collaborative decision-making at the local level. In this regard, innovation in ICT can provide integrated information intelligence for better urban management and governance, sustainable socioeconomic growth and policy development using participatory processes.

Smart cities use a variety of ICT solutions to deal with real-life urban challenges, including environmental sustainability, socioeconomic innovation, participatory governance, better public services, planning and collaborative decision-making. In addition to creating a sustainable, future-ready smart infrastructure, overcoming these challenges can empower citizens by giving them a personal stake in the wellbeing and betterment of their civic life. By applying these ICT solutions, city administrations can extract new information and knowledge hidden in large-scale data and so provide better urban governance and management. Such solutions enable efficient transport planning, better water management, improved waste management, new energy efficiency strategies, new construction and structural methods for the health of buildings, and effective environment and risk management policies for citizens. Other important aspects of urban life, such as public security, air quality and pollution, public health, urban sprawl and biodiversity loss, can also benefit. As a prime enabler for smart cities, ICT transforms application-specific data into useful information and knowledge that can help in city planning and decision-making.

From the ICT perspective, smart cities are being made possible by smarter hardware and software, such as Internet of Things (IoT) devices (for example Radio Frequency Identification (RFID) tags, smart phones, sensor networks and smart household appliances), together with the capacity to manage and process large-scale data using cloud computing without compromising data security and citizens' privacy. Over time, the volume of data generated by these IoT devices is bound to increase exponentially, and such data is classified as big data. In addition, cities already possess land use, transport, census and environmental monitoring data. This data is collected from various local sources that are often not interconnected and is used by application-specific systems, but it is rarely used as a collective source of information (i.e. a system of systems) for urban governance and planning decisions. Many local governments are making such data available for public use as "open data". Managing such large amounts of data and analyzing it for various applications (e.g. future city models, visualization, simulations, provision of quality public services and information to citizens, and decision-making) becomes challenging without developing and applying appropriate tools and techniques.

In this context, the recent emergence of cloud computing promises solutions to such challenges by facilitating big data storage and delivering the capacity to process, visualize and analyze city data for information and knowledge generation. Such a solution can also help decision makers meet quality of service (QoS) requirements by providing an integrated information processing and analytic infrastructure for a variety of smart city applications that support decision-making for urban governance.

Figure 1: Cross-thematic data management and analysis for a variety of smart city applications in a cloud environment

Figure 1 depicts our view of the main thematic pillars of smart cities: smart people, smart economy, smart environment, smart governance and smart mobility, which contribute towards the sustainability of resources and resilience against increasing urban demands. The main motive for developing such a view is to take a holistic approach to smart cities by providing data acquisition, integration, processing and analysis mechanisms that synthesize the information needed to enhance the resilience and sustainability of a city. Managing data for these thematic domains in a cloud environment provides the opportunity to integrate data acquired from various sources and to process and analyze it in acceptable time frames. However, adopting cloud computing for smart city applications is not straightforward, due to a number of challenges and requirements [20]. Our aim here is to discuss how these challenges can be addressed in part by using ICT tools and software services to intelligently manage and analyze the complex big data of smart cities, incorporating a suitable cloud architecture.

An Abstract Architectural Design of the Cloud-based Big Data Analysis

This section of the article discusses the development of a cloud service for smart city related big data analysis. Here, we describe the design and implementation of a generic cloud-based analytics service.

The system architecture, as shown in Figure 2, is divided into three tiers to enable the development of a unified knowledge base. Each tier represents functionality that we need to meet the overall research objectives. The lowest tier consists of distributed and heterogeneous repositories and various sensors that are subscribed to the system. The objective of this tier is data acquisition, cleansing and classification using standard approaches such as APIs or OGC (Open Geospatial Consortium) compliant web services. Existing tools such as The DataTank and CKAN can be used for data access, transformation and publishing (e.g. XML, CSV, JSON, or binary structures such as SHP files or relational databases) in a RESTful way. For data storage, Cassandra (unstructured and semi-structured data, NoSQL), PostgreSQL (relational structured data) and the Virtuoso RDF store are selected. However, the detailed design and prototype of the bottom two tiers are not within the scope of this paper; they are partly covered elsewhere and the rest is work in progress.
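
As a rough illustration of the acquisition, cleansing and classification step in the lowest tier, the following Python sketch normalises records that arrive as CSV and JSON into one common structure. The field names (`sensor_id`, `value`, `theme`) and the sample payloads are hypothetical and chosen only for illustration; they are not taken from the system described here, and a real deployment would read from REST APIs or OGC-compliant services rather than inline strings.

```python
import csv
import io
import json

# Hypothetical sample payloads standing in for two heterogeneous sources.
CSV_FEED = "sensor_id,value,theme\nair-01,41.5,environment\nbus-07,12,mobility\n"
JSON_FEED = '[{"sensor_id": "meter-03", "value": "3.2", "theme": "energy"}]'


def normalise(record):
    """Cleanse one raw record: trim strings and coerce the reading to float."""
    return {
        "sensor_id": record["sensor_id"].strip(),
        "value": float(record["value"]),
        "theme": record["theme"].strip().lower(),
    }


def load_csv(text):
    """Parse CSV text into cleansed records."""
    return [normalise(row) for row in csv.DictReader(io.StringIO(text))]


def load_json(text):
    """Parse a JSON array into cleansed records."""
    return [normalise(row) for row in json.loads(text)]


def classify_by_theme(records):
    """Group cleansed records by thematic domain."""
    grouped = {}
    for rec in records:
        grouped.setdefault(rec["theme"], []).append(rec)
    return grouped


records = load_csv(CSV_FEED) + load_json(JSON_FEED)
by_theme = classify_by_theme(records)
```

Once records share one shape, the choice of store (Cassandra, PostgreSQL or an RDF store) can be made per data set without changing the upstream parsers.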

Figure 2: Proposed Architectural Design - Cloud-based Analytics Services

The resource data mapping and linking layer (middle layer) uncovers new scenarios and supports workflows for establishing relations that were not possible across isolated data repositories. Because the data sources are heterogeneous, the collected data is likely to arrive in a number of different formats and with different semantics, and hence can benefit from data linking. With linked data or open data, for example, databases can be browsed to serve queries and find events of interest that could not be found without links between the data sets. Furthermore, a semantic data model can be developed as a layer on top of the linked data to give it a consistent meaning. Once the metadata of heterogeneous data sources has been populated into metadata stores, mappings are established between the resources, links are generated and the data is made semantically relevant and browsable. This data browsing can help end users select different cross-thematic indicators and variables on which to perform analytics. Existing metadata formats (such as the European Data Model, Talis Aspire, the Open Library and DBLP as Linked Data) are preferable choices to describe and store metadata extracted from different sources. The data is then mapped using standardised resource description semantics, e.g. via an RDF store (such as Virtuoso DB) which holds all the necessary links between artefacts and resources. In the case of linked services, higher-level services and mashups can be composed to browse and make use of this data for interesting scenarios. SPARQL, an RDF query language, can then be used to retrieve and manipulate data stored in Resource Description Framework (RDF) format.
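
To make the idea of linking concrete, the sketch below stores a handful of subject-predicate-object triples in memory and answers a cross-repository question by chaining pattern matches. It is a simplified stand-in written in plain Python, not SPARQL and not the Virtuoso API; in an RDF store the same join would normally be expressed as a SPARQL query. All identifiers (`station:A1`, `locatedIn`, `hasPopulation` and so on) are invented for the example.

```python
# Hypothetical triples (subject, predicate, object) linking records that
# originate from separate repositories (air quality and census data).
TRIPLES = [
    ("station:A1", "locatedIn", "ward:Central"),
    ("station:A1", "measures", "NO2"),
    ("station:B4", "locatedIn", "ward:Harbour"),
    ("station:B4", "measures", "PM10"),
    ("ward:Central", "hasPopulation", "18200"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def population_near(triples, pollutant):
    """Join three patterns: station measuring pollutant -> ward -> population."""
    results = []
    for station, _, _ in match(triples, predicate="measures", obj=pollutant):
        for _, _, ward in match(triples, subject=station, predicate="locatedIn"):
            for _, _, pop in match(triples, subject=ward, predicate="hasPopulation"):
                results.append((station, ward, int(pop)))
    return results


no2_exposure = population_near(TRIPLES, "NO2")
```

The question "how many people live near an NO2 monitoring station" cannot be answered from either source alone; it becomes answerable only once the `locatedIn` link connects them.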

An analytic engine in the top layer processes the data for application-specific purposes. The engine uses the data available in the linked data layer and lets users submit queries, application-specific algorithms and workflows to find information in the data repositories. In this respect, big data mining is a recent trend for analysing data sets that are large in terms of complexity, cardinality and continuality. Big data mining techniques are becoming increasingly important and effective in data-driven applications such as network traffic risk analysis and business data analysis. These techniques should be very useful for uncovering non-obvious relations and associations in the huge volume of data available from the public services of smart future cities.

Since the main focus of this paper is smart city data analytics, we concentrate on the analytic engine and describe it in detail. Various statistical modeling, machine learning and data mining techniques can be applied in the analytic engine. Existing tools such as RapidMiner and R, in combination with Hadoop MapReduce, can also be used to mine city data at scale. In the literature, big data mining is considered much more complex than the traditional data mining currently in practice. This holds for smart city data analytics, because the multi-disciplinary nature of city data makes it possible to formulate a wide variety of city application scenarios. In this regard, some of the possible components for cloud-based big data mining or analysis, as depicted in Figure 3, are:
  • Data processing / integration
  • Classification
  • Clustering
  • Data reduction
  • Visualization
  • Finding association rules
Figure 3: Big Data analytic components for the analytical engine

It is not necessary to use all of these components. Depending on the application, only a subset may be needed for data analysis. For example, the open data use case needs algorithms from only two components: data processing and finding association rules. All of these are well-known components in data mining. Furthermore, they can benefit from state-of-the-art tools such as Apache Mahout and R for cluster-based, scalable machine learning.
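
As a minimal sketch of the "finding association rules" component, the Python code below computes support and confidence for rules between pairs of items. The observations, item names and thresholds are hypothetical, and the brute-force pair enumeration is only suitable for toy data; it is not the algorithm used in the paper, and at city scale a distributed implementation (for example on Hadoop MapReduce) would be the more likely choice.

```python
from itertools import combinations

# Hypothetical observations: each set lists conditions recorded together
# for one district on one day.
OBSERVATIONS = [
    {"high_traffic", "poor_air"},
    {"high_traffic", "poor_air", "rain"},
    {"high_traffic"},
    {"rain"},
    {"high_traffic", "poor_air"},
]


def support(itemset, observations):
    """Fraction of observations that contain every item in itemset."""
    hits = sum(1 for obs in observations if itemset <= obs)
    return hits / len(observations)


def pair_rules(observations, min_support=0.4, min_confidence=0.7):
    """Find rules lhs -> rhs over single items that pass both thresholds."""
    items = sorted(set().union(*observations))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, observations)
        if pair_support < min_support:
            continue  # the pair is too rare to yield a useful rule
        for lhs, rhs in ((a, b), (b, a)):
            # confidence = P(rhs | lhs) = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support({lhs}, observations)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


rules = pair_rules(OBSERVATIONS)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

With this toy data, both directions of the traffic and air quality pair pass the thresholds, while the pairs involving rain are filtered out by the minimum support.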


Smart cities provide an opportunity to connect people and places using innovative technologies that help in better city planning and management. At the core of smart cities are the collection, management, analysis and visualization of the huge amount of data generated every minute in an urban environment by socioeconomic, anthropogenic or natural environmental events and other activities. Smart city data can be collected directly from a variety of sensors, smart phones and citizens, and integrated (or linked) with city data repositories to perform analytical reasoning and generate the required information (e.g. for end users) or new knowledge for decision-making and better urban governance. Innovations in information and communication technologies provide the opportunity to manage and process smart city data and to deliver timely, necessary information to relevant stakeholders for decision-making.

About The Authors:

Zaheer Khan and Kamran Soomro - Faculty of Environment and Technology, Department of Computer Science and Creative Technologies, University of the West of England, Bristol, UK.

Ashiq Anjum - Faculty of Business, Computing and Law, School of Computing and Mathematics, University of Derby, Derby, UK.

Muhammad Atif Tahir - School of Computer Science and Digital Technologies, University of Northumbria, NE1 8ST, Newcastle upon Tyne, United Kingdom.

Publication Details:

This article is an extract from the technical paper "Towards cloud based big data analytics for smart future cities" by Zaheer Khan, Ashiq Anjum, Kamran Soomro and Muhammad Atif Tahir, originally published in the Journal of Cloud Computing: Advances, Systems and Applications (2015) 4:2, DOI 10.1186/s13677-015-0026-8.

© 2015 Khan et al.; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License 
