National Security. The volume of data has expanded exponentially with the internet, especially in areas affecting national security such as counter-terrorism, network security, and counter-proliferation. The rapid influx of unstructured data relating to national security issues such as cyber defense requires both real-time and offline analysis.
By Rear Admiral Dr. S. Kulshrestha (retd.), Indian Navy
In 1997, two NASA scientists, Michael Cox and David Ellsworth, used the term Big Data in their paper "Application-controlled demand paging for out-of-core visualization". Describing the problem they faced in computer graphics, they stated that it "...provides an interesting challenge for computer systems: data sets are generally quite large, taxing the capacities of main memory, local disk, and even remote disk. We call this the problem of big data."
Big Data has traditionally been associated with issues of data analysis and data storage. A Gartner report proposed a definition encompassing three Vs: Volume, Velocity, and Variety, commenting upon the increasing size of data, the increasing rate at which it is produced, and the increasing range of formats and representations employed. This definition has since been expanded to include a fourth V: Veracity.
Oracle, in its definition, states that big data is the derivation of value from traditional relational-database-driven business decision making, augmented with new sources of unstructured data. By new sources, Oracle means social media, blogs, image data, sensor networks, and other data forms that vary in size, structure, and format. Thus, big data is the inclusion of additional data sources to augment existing operations. However, like Gartner, Oracle does not quantify the term big data.
Intel has connected big data to organizations "generating a median of 300 terabytes (TB) of data weekly", describing big data by quantifying the experiences of its business partners, and contends that such analytics commonly deal with business transactions. As per Microsoft: "Big data is the term increasingly used to describe the process of applying serious computing power - the latest in machine learning and artificial intelligence - to seriously massive and often highly complex sets of information".
From the above definitions, it can be inferred that big data is intrinsically related to data analytics and to a number of associated technologies such as NoSQL and Apache Hadoop. It is also clear that a number of organizations, specifically industrial organizations, are linked with big data. A contrary definition, provided by the Method for an Integrated Knowledge Environment (MIKE2.0) project, suggests: "Big Data can be very small and not all large datasets are big". By this view, it is the high degree of permutations and interactions within a dataset that defines big data. Ward and Barker have provided a comprehensive definition: big data is a term describing the storage and analysis of large and/or complex data sets using a series of techniques including, but not limited to, NoSQL, MapReduce, and machine learning.
Big Data Environment
To give a feel for the data generated every day, it suffices to state that in 2014, each day, approximately 294 billion emails were sent, 6 billion Google searches were carried out, 3.5 billion Facebook messages were posted, and 40 million tweets were shared. With the likely advent of the internet of things, data generation is set to grow exponentially as appliances, vehicles, and wearable technologies start to communicate with each other. The variety and complexity of the sources and formats of data continue to expand. These sources include the internet, social media, mobiles and mobile applications; national, state, and local databases; GPS and commercial databases that collect individual data from commercial and personal transactions and public records; surveys; geospatial data; and scanned documents.
One thing is clear: big data pertains to the large-scale collection of data, driven by the falling costs of data collection and storage as well as the surge in sources of data such as cameras, sensors, and geospatial location technologies. The 'internet of things' (from web-enabled appliances, wearable technology, and advanced sensors that monitor everything from vital signs to energy use to a jogger's running speed) will create an exponential surge in data and demand high-performance processing at rates unseen so far. Both types of data collection, viz. analog (images from cameras, mobiles, video recorders, and wearable devices, which can be converted to digital) and digital (emails, internet browsing and capture, location devices), are currently available in various formats and require data fusion. Collection and processing now demand near real-time speeds, pushing data analysis to its current limits; mapping services, medical care, and car operations, for example, require near immediate responses to be safe and effective. Technologies for handling big data, as well as for its management, are witnessing unprecedented demand. Commercial and customer interests have enabled a merger of the information and industrial economies.
A complete perspective on big data requires pinning down a few characteristics peculiar to it. Since big data originates from a tangible source, a large part of it may be of no value. Further, data in its raw form may not be amenable to analysis: an information retrieval mechanism is required to extract relevant information and convert it into a structured form before it can be analyzed. This is a demanding stage, as thoroughness and accuracy must be ensured here. Data analysis is not simply the identification, location, or citing of data; it is an interdisciplinary endeavor involving pure mathematics, statistics, computer science, and other fields. For data to be of value to a decision maker, it has to be made understandable and interactive in near real time, and the software needs to be designed to be user friendly, since the user may not have an in-depth understanding of big data systems and algorithms. Predictive analytics is an important area of big data mining that deals with extracting information from data and using it to predict trends and behavior patterns. Fundamentally, predictive analytics depends upon capturing relationships between explanatory variables and predicted variables from past occurrences, and exploiting them to predict the unknown, as the sketch below illustrates. Predictive analytics tools have become sophisticated enough to dissect data problems and present findings using simple charts, graphs, and numbers that indicate the likelihood of possible outcomes. The accuracy and usability of the results, however, depend on the level of data analysis and the quality of the assumptions.
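To make this concrete, the following is a minimal Python sketch of the idea, assuming the scikit-learn library and wholly synthetic data (neither is prescribed by this article): a model captures the relationship between explanatory variables and past outcomes, then scores the likelihood of the outcome for a new, unseen case.

```python
# A minimal predictive-analytics sketch. scikit-learn is an assumption;
# the data below is synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical past occurrences: each row holds two explanatory variables;
# y records whether the outcome of interest occurred.
X_past = np.array([[0.2, 1.0], [0.8, 3.0], [0.5, 2.0], [0.9, 4.0],
                   [0.1, 0.5], [0.7, 3.5], [0.3, 1.5], [0.95, 4.5]])
y_past = np.array([0, 1, 0, 1, 0, 1, 0, 1])

# Capture the relationship between explanatory and predicted variables.
model = LogisticRegression().fit(X_past, y_past)

# Exploit it to predict the unknown: score a new, unseen case. The output
# is the kind of likelihood figure a dashboard would show a decision maker.
x_new = np.array([[0.6, 2.8]])
print(f"likelihood of outcome: {model.predict_proba(x_new)[0, 1]:.2f}")
```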
Major technological advances include cluster computer systems, which comprise large numbers of computers connected by high-speed LANs for data storage, data organization, analysis, and query response. The software design of clusters enables the use of COTS hardware for enhanced efficiency, reliability, and scalability. The software algorithms build statistical models from huge data sets using an interdisciplinary approach, which makes it possible to draw inferences for further decision making. Noteworthy software includes Apache S4, which processes continuous data streams; S4 applications are designed by combining streams and processing elements in real time, and Twitter's Storm uses a similar approach. Hadoop MapReduce is a programming model and software framework for writing applications that enables quick processing of big data on large clusters of computer nodes in parallel. The input dataset is divided into independent subsets that are processed in parallel by map tasks, and the final result is obtained by reduce tasks, which use the output of the map tasks, as sketched below.
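As an illustration of the model (a sketch in plain Python, not Hadoop's actual Java API), the following self-contained word count follows the MapReduce flow just described: the input is split into independent subsets, map tasks process them in parallel, intermediate results are grouped by key, and reduce tasks combine them.

```python
# A minimal MapReduce-style word count, illustrating the
# split -> map -> shuffle -> reduce flow described in the text.
from collections import defaultdict
from multiprocessing import Pool

def map_task(chunk):
    """Map: emit (word, 1) pairs for one independent subset of the input."""
    return [(word.lower(), 1) for word in chunk.split()]

def reduce_task(word, counts):
    """Reduce: combine all intermediate values for one key."""
    return word, sum(counts)

def mapreduce(documents, workers=4):
    # Map phase: each subset is processed in parallel.
    with Pool(workers) as pool:
        mapped = pool.map(map_task, documents)
    # Shuffle phase: group intermediate pairs by key.
    groups = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            groups[word].append(count)
    # Reduce phase: one reduce task per key yields the final result.
    return dict(reduce_task(w, c) for w, c in groups.items())

if __name__ == "__main__":
    docs = ["big data needs big clusters", "data feeds data analytics"]
    print(mapreduce(docs))  # e.g. {'big': 2, 'data': 3, ...}
```

On a real cluster, the map and reduce tasks would run on separate nodes against distributed storage; the structure of the computation, however, is the same.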
Military Information and Intelligence
In the national security domain, the volume of data has expanded exponentially with the internet, especially in areas such as counter-terrorism, network security, and counter-proliferation. The rapid influx of unstructured data relating to national security issues such as cyber defense requires both real-time and offline analysis. The handling of this data needs quick advances, from system architecture to innovations in statistics. Large-scale and in-depth querying of big data, visualization of big data in various forms such as maps, graphs, and timelines, and near real-time analysis of streaming data are some of the essential requirements for national security work.
The importance of information and intelligence for the military is evident from the fact that it is fundamental to the planning of any military operation in peace or war. Reconnaissance and surveillance tasks have been carried out by militaries since time immemorial prior to mounting any assault on the enemy. The success of the mission depends upon correct analysis of the available information during the planning stage. The collection of information, its analysis, and its further dissemination are of prime importance for any military. Today the methods of collecting information have changed owing to the quick availability of combat information of high quality and reliability from a varied array of sensors and sources. A definition of Intelligence, Surveillance, and Reconnaissance (ISR) components taken from US Army manuals lists intelligence as, firstly, the product resulting from the collection, processing, integration, analysis, evaluation, and interpretation of available information concerning foreign countries or areas; and secondly, as information and knowledge about an adversary obtained through observation, investigation, analysis, or understanding. Surveillance is defined as the systematic observation of aerospace, surface, or subsurface areas, places, persons, or things by visual, aural, electronic, photographic, or other means. Reconnaissance is a mission undertaken to obtain, by visual observation or other detection methods, information about the activities and resources of an enemy or potential enemy, or to secure data concerning the meteorological, hydrographic, or geographic characteristics of a particular area. An optimal ISR plan is formulated by integrating ISR missions based upon the capabilities of the various assets available. The information is thereafter collated and analyzed for feeding into the planning of operations.
The basic information required by any commander in today's networked warfare is the accurate position of his own units, the location of the enemy and his reserves, the location of supporting units, and the placement of other assets. With this knowledge, a commander can effectively carry out his mission by optimally utilizing the available firepower and resources. Thus crucial to any mission is 'situational awareness', which comprises tasking, collection, processing, exploitation, and dissemination. Embedded in ISR is communication, without which no mission can be accomplished. Digital communication systems, the internet, and mobile devices have revolutionized the amount of data generated. In this setting, the term Big Data refers to the whole gamut of information available from sensors, video imagery, mobile phones, signals intelligence, and electronic warfare interceptions to satellite images. Data is collected at unprecedented levels.
Rapid technological advances in sensor-based, smart, and networked combat systems are pushing the military to adopt commercially available emerging technologies and adapt them for its use. This has led to a synergistic relationship with the digital industry, wherein the military may no longer develop its own hardware and software de novo, but instead harness and modify "Commercial Off-The-Shelf" (COTS) items. The advent of big data is driving the armed forces to shift their integrated decision-making support systems to big data architecture and analytics. The financial crunch faced by the militaries of leading countries implies even greater dependence upon technology by reduced manpower. This in turn has led other nations to adopt a wait-and-watch strategy, by which they would go in for the best available solution adopted by leading armies. To understand and react to real-time tactical situations, commanders have to manage and control a big data environment comprising historical or point-in-time data, transactional data, data optimized for inquiry, unpredictable patterns of data, and ad-hoc use of the system. The military has been collecting data at humongous levels since the induction of unmanned vehicles with sensors. These data heaps cannot be analyzed in the traditional manner; they require dedicated data scientists and the development of different software tools to exploit the extracted information for mission planning.
It is understood that in Afghanistan, the Defense Advanced Research Projects Agency (DARPA) sent in data scientists and visualizers under a program called Nexus 7. They operated directly with military units and assisted commanders in solving specific operational challenges. In some cases, surveillance and satellite data were fused to visualize traffic flow through road networks in order to detect and destroy improvised explosive devices.
Major issues faced by the military today involve the availability of ever increasing volumes of sensor data from integral sources such as Unmanned Aerial Vehicles (UAVs) and other national assets. A single full-day UAV mission can provide upwards of 10 terabytes of data, of which only about 5% is analyzed and the rest stored. Analysts are restricted by data download speeds, which depend upon their locations. Untagged data leads analysts to download similar data from other sources to firm up their conclusions. Many times the communication lines are shared or may not be continuously available, further increasing delays in analysis. Providing comprehensive situational awareness depends upon the accuracy and integration of data received from multiple types of sensors as well as intelligence sources. The screens and software tools do not, as of now, have interoperability. Due to security considerations, ISR data from different sources is stored in different locations with varying access levels, which leads to incomplete analysis. A single network domain providing access to data at multiple levels of security classification is not yet available. Analysts currently spend only 20 percent of their time looking at the correct data, whereas 80 percent of their time is spent looking for the correct data.
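To illustrate the kind of metadata tagging that would reduce this search burden, the following Python sketch (purely illustrative; the file paths, sources, and access levels are hypothetical and not drawn from any fielded system) attaches tags to sensor products so an analyst can filter by region and clearance instead of hunting for the correct data.

```python
# An illustrative sketch of metadata tagging for sensor products.
# All paths, sources, and classification levels below are hypothetical.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class SensorProduct:
    path: str              # storage location of the raw data
    source: str            # e.g. "UAV", "SIGINT", "satellite"
    collected: datetime    # collection time
    region: str            # coarse geospatial tag
    classification: str    # access level

catalog = [
    SensorProduct("/archive/uav/0001.ts", "UAV",
                  datetime(2015, 6, 1, 9), "sector-7", "SECRET"),
    SensorProduct("/archive/sat/0042.img", "satellite",
                  datetime(2015, 6, 1, 10), "sector-7", "TOP SECRET"),
    SensorProduct("/archive/uav/0002.ts", "UAV",
                  datetime(2015, 6, 2, 9), "sector-3", "SECRET"),
]

def find(catalog, region, max_level):
    """Return products for a region that the analyst is cleared to access."""
    levels = {"SECRET": 1, "TOP SECRET": 2}
    return [p for p in catalog
            if p.region == region
            and levels[p.classification] <= levels[max_level]]

for product in find(catalog, "sector-7", "SECRET"):
    print(product.path, product.source, product.collected.isoformat())
```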
Some of the companies working in this field with the US military, providing a common operating picture (COP), are covered in the following examples.
Modus Operandi takes big data, infuses it with expert knowledge, and creates a common framework for easy identification of patterns. The data is embedded into an underlying graph structure and is amenable to complex queries. The system can detect patterns and output different types of visualizations, such as maps and timelines.
Image Attribute: Modus Operandi’s Wave-EF semantically
enriches multi-source intelligence data through direct feeds or from
net-centric services for fusion, mediation, metadata tagging, alerting and
dissemination within applications and services.
Source: Semantic Enrichment and Fusion Of Multi-Intelligence
Data by Dr. Richard D. Hull, Don Jenkins, Alan McCutchen, Modus Operandi, Inc.
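As an illustration of this graph-based approach, the following sketch uses the networkx library (an assumption; Modus Operandi's actual engine and schema are not described here, and all entity names are hypothetical). Multi-source reports become edges in an entity graph, and a "pattern" is simply a query over that structure.

```python
# An illustrative entity graph built from multi-source intelligence reports.
# networkx is an assumption; all names below are hypothetical.
import networkx as nx

g = nx.MultiDiGraph()
# Each edge records a relationship plus the source that reported it.
g.add_edge("person_A", "location_X", relation="seen_at", source="UAV video")
g.add_edge("person_B", "location_X", relation="seen_at", source="HUMINT")
g.add_edge("person_A", "event_1", relation="linked_to", source="SIGINT")

def persons_at(graph, location):
    """Pattern query: who has been reported at a given location, and by whom?"""
    return [(u, data["source"])
            for u, v, data in graph.edges(data=True)
            if v == location and data["relation"] == "seen_at"]

print(persons_at(g, "location_X"))
# [('person_A', 'UAV video'), ('person_B', 'HUMINT')]
```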
Palantir Technologies is known for its software Palantir Gotham, which is used by counter-terrorism analysts at offices in the United States Intelligence Community and the United States Department of Defense, by fraud investigators at the Recovery Accountability and Transparency Board, and by cyber analysts at the Information Warfare Monitor (responsible for the GhostNet and Shadow Network investigations).
Video Attribute: Railgun: Leveraging Palantir Gotham as a Command and Control Platform / Source: Palantir Technologies Official YouTube Channel
SAP's HANA platform provides a real-time analytics and applications platform for big data that offers varying layers of security. It offers predictive, forecasting, and calculation solutions, and stitches together maintenance failure codes and document notes.
Image Attribute: The combination of LuciadLightspeed and SAP HANA provides a visual analysis of violent events in Africa over the last 20 years.
To tackle the problem and analyze data in real time, Oracle has created a newly engineered system to handle big data operations. The company brought together its hardware with Cloudera's Hadoop, enabling patching of multiple layers of the big data architecture.
Image Attribute:
Cloudera’s Hadoop integration with Oracle Advanced Analytics
The Unified Data Architecture of Teradata is a comprehensive big data solution that aims to bring the data needed for analytics across the entire organization into one place, creating a single version of enterprise data; an example is capturing minute-by-minute maintenance data in the field, including potential new sources of big data.
Image Attribute: TERADATA Unified Data Architecture Concept
DigitalEdge by
Leidos is a scalable, pre-integrated, flexible, and pluggable data management
platform that allows rapid creation and management of near real-time big data
applications. Leidos’s Scale2Insight (S2i) is a solution that supports large
complex data environments with multiple disparate sensors collecting
information on different parts of the data ecosystem.
Image Attribute: The modular architecture of DigitalEdge preserves investments made in analytic, visualization, and workflow solutions. Its design provides the necessary functions to build a powerful, real-time big data analytic environment. / Source: DigitalEdge white paper by Leidos
SYNTASA delivers analytical applications focused on the behavior of visitors to internal government websites. The analytics, built on an open-source big data platform, determine the behavioral trends of visitors in order to improve the use of information by government analysts.
Image Attribute: SYNTASA Analytics Services for Government / Source: website screenshot
Indian Context
“Knowledge of
the mechanics and details of big data analytics are not only important to
exploit it but also to collapse the distances between policy, planning, and
operations.”
- Lt Gen Anil
Bhalla, DGDIA, and DCIDS (Int), 2015
The applicability of big data analytics in the context of the Indian defense forces is very much in line with that of the developed forces of the world. This was borne out during the deliberations of a national seminar organized by CLAWS in February 2015, titled Big Data–Applicability in the Defense Forces. The seminar highlighted the requirement for big data analytics in the fields of intelligence, operations, logistics, mobilization, medical services, human resources, cyber security, and counter-insurgency/counter-terrorism for the Indian armed forces. It was noted that in the arena of intelligence, examples of inputs likely to be received and analyzed during peacetime relate to terrorism, proxy war, left-wing extremism, the conduct of training by opponents, deployment of forces, and the sponsoring of non-state actors. The seminar brought out the need to develop algorithms to analyze the millions of open-source documents generated and compare them with gathered human intelligence. There is a requirement to acquire predictive capability to anticipate specific incidents and suggest measures by analyzing historical events.
However, the fact remains that owing to the nascent nature of big data analytics, awareness of it is limited to a small number of involved agencies. The benefits of big data in operational decision making, while safeguarding accuracy and reliability, have not yet been internalized. Big data projects, even at pilot scale, may not currently exist. In the present situation, decision makers are not clear about the capability of big data, its costs and benefits, its applicability, or the perils, if any, of not adopting it.
Conclusion:
It is apparent that the era of Big Data is already here and that its impact is being felt on all aspects of modern life. The challenges at all stages of analysis include scaling, heterogeneity, lack of structure, privacy, error handling, timeliness, provenance, and visualization. Big data holds enormous potential to make the operations of the armed forces more efficient across the entire spectrum of their activity. The research and development necessary for the analysis of big data is not restricted to a single discipline and requires an interdisciplinary approach: computer scientists need to tackle issues pertaining to inference, while statisticians have to deal with algorithms, scalability, and near real-time decision making. The involvement of mathematicians, visualizers, social scientists, psychologists, domain experts, and, most important of all, the final users is paramount for optimal utilization of big data analytics. The involvement and active participation of national agencies, the private sector, the public sector, and the armed forces would ensure full exploitation of the potential of big data for the country.
The need today is to start feasibility studies and research programs in select fields in order of desired priority, followed by pilot studies, and thereafter to adapt COTS hardware and available big data analytic software suites.
“Without big
data, you are blind and deaf and in the middle of a freeway.”
- Geoffrey Moore,
author and consultant.
About The Author:
The author, RADM Dr. S. Kulshrestha (Retd.), Indian Navy, holds expertise in the quality assurance of naval armament and ammunition. He is an alumnus of the National Defence College and holds a PhD from Jawaharlal Nehru University. He superannuated from the post of Director General Naval Armament Inspection in 2011. He is unaffiliated and writes in defence journals on issues related to armament technology and indigenisation.
Cite This Article:
Kulshrestha, Sanatan. "FEATURED | Big Data in
Military Information & Intelligence" IndraStra Global 02, no. 01 (2016):
0067.
http://www.indrastra.com/2016/01/FEATURED-Big-Data-in-Military-Info-and-Intel-002-01-2016-0067.html | ISSN 2381-3652 | https://dx.doi.org/10.6084/m9.figshare.2066640