FEATURED | Big Data in Military Information & Intelligence


By  Rear Admiral Dr. S. Kulshrestha (retd.), Indian Navy

"Data really powers everything that we do." - Jeff Weiner, Chief Executive at LinkedIn


In 1997, two NASA scientists, Michael Cox and David Ellsworth, used the term Big Data in their paper "Application-controlled demand paging for out-of-core visualization". Describing the problem they faced in computer graphics, they stated that it "…provides an interesting challenge for computer systems: data sets are generally quite large, taxing the capacities of main memory, local disk, and even remote disk. We call this the problem of big data."

Big Data has traditionally been associated with issues of data analysis and data storage. A Gartner report proposed a definition encompassing three V's: Volume, Velocity, and Variety, commenting upon the increasing size of data, the increasing rate at which it is produced, and the increasing range of formats and representations employed. This definition has since been expanded to include a fourth V: Veracity.

Oracle, in its definition, states that big data is the derivation of value from traditional relational-database-driven business decision making, augmented with new sources of unstructured data. By new sources, Oracle means social media, blogs, image data, sensor networks, and other data forms that vary in size, structure, and format. Thus, big data is the inclusion of additional data sources to augment existing operations. However, like Gartner, Oracle does not quantify the term big data.

Intel has connected big data to organizations "generating a median of 300 terabytes (TB) of data weekly", describing big data by quantifying the experiences of its business partners, and contends that analytics commonly deal with business transactions. As per Microsoft: "Big data is the term increasingly used to describe the process of applying serious computing power - the latest in machine learning and artificial intelligence - to seriously massive and often highly complex sets of information".

From the above definitions, it can be inferred that big data is intrinsically related to data analytics and to a number of associated technologies such as NoSQL and Apache Hadoop. It is also clear that a number of organizations, specifically industrial organizations, are linked with big data. A contrary definition, provided by the Method for an Integrated Knowledge Environment (MIKE2.0) project, suggests: "Big Data can be very small and not all large datasets are big". By this view, it is the high degree of permutations and interactions within a dataset that defines big data. Ward and Barker have provided a comprehensive definition: big data is a term describing the storage and analysis of large and/or complex data sets using a series of techniques including, but not limited to, NoSQL, MapReduce, and machine learning.

Big Data Environment

To give a feel for the data generated every day, it suffices to state that in 2014, each day approximately 294 billion emails were sent, 6 billion Google searches were carried out, 3.5 billion Facebook messages were posted, and 40 million tweets were shared. With the likely advent of the Internet of Things, data generation is set to grow exponentially as appliances, vehicles, and wearable technologies start to communicate with each other. The variety and complexity of sources and formats of data continue to expand. Some of the sources are the internet, social media, mobiles, and mobile applications; national, state, and local databases; GPS and commercial databases that collect individual data from commercial and personal transactions and public records; surveys; geo-spatial data; and scanned documents.

One thing is clear: big data pertains to the large-scale collection of data, driven by the falling costs of data collection and storage as well as the surge in sources of data such as cameras, sensors, and geo-spatial location technologies. The Internet of Things (from web-enabled appliances, wearable technology, and advanced sensors that monitor everything from vital signs to energy use to a jogger's running speed) will create an exponential surge in data and demand high-performance processing at rates unseen so far. Data is currently collected both in analog form (images from cameras, mobiles, video recorders, and wearable devices, which can be converted to digital) and in digital form (emails, internet browsing and capture, location devices), in various formats that require data fusion. Collection and processing now demand near-real-time speeds, pushing data analysis to its current limits: mapping services, medical care, and car operations, for example, require near-immediate responses to be safe and effective. Technologies for handling and managing big data are witnessing unprecedented demand, and commercial and customer interests have enabled a merger of the information and industrial economies.

A complete perspective on big data requires pinning down a few characteristics peculiar to it. Since big data originates from tangible sources, a large part of it may be of no value. Further, data in its raw form may not be amenable to analysis; an information-retrieval mechanism is required to extract relevant information and convert it into a structured form before it can be analyzed. This is a demanding stage, as thoroughness and accuracy must be ensured. Data analysis is not simply the identification, location, or citing of data; it is an interdisciplinary endeavor involving pure mathematics, statistics, computer science, and more. For data to be of value to a decision maker, it has to be made understandable and interactive in near real time, and the software needs to be designed to be user friendly, since the user may not have an in-depth understanding of big data systems and algorithms.

Predictive analytics is an important area of big data mining that deals with extracting information from data and using it to predict trends and behavior patterns. Fundamentally, predictive analytics depends upon capturing relationships between explanatory variables and predicted variables from past occurrences, and exploiting them to predict the unknown. Predictive analytics tools have become sophisticated enough to dissect data problems and present findings using simple charts, graphs, and numbers that indicate the likelihood of possible outcomes. The accuracy and usability of the results, however, depend on the depth of data analysis and the quality of the assumptions.
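The core idea of capturing relationships between explanatory and predicted variables from past occurrences can be illustrated with a toy sketch. This is a minimal, purely hypothetical example (invented monthly incident counts, a simple linear trend fitted with ordinary least squares), not a description of any operational system:

```python
# Toy predictive analytics: fit a linear trend to past (hypothetical)
# monthly incident counts and extrapolate the next month. Real models
# are far richer, but the principle is the same: learn the relationship
# between the explanatory variable (here, time) and the predicted
# variable (incident count) from past data, then predict the unknown.

def fit_linear_trend(ys):
    """Ordinary least-squares fit of y = a + b*x for x = 0..n-1."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    a = mean_y - b * mean_x
    return a, b

def predict_next(ys):
    """Predict the value for the period following the observed series."""
    a, b = fit_linear_trend(ys)
    return a + b * len(ys)

# Hypothetical monthly incident counts for a notional region.
history = [12, 14, 13, 17, 18, 21]
print(round(predict_next(history), 1))  # prints 21.9
```

The quality of such a prediction rests entirely on the assumption that past trends continue, which is exactly the "quality of assumptions" caveat noted above.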

Major technological advances include cluster computer systems, which comprise large numbers of computers connected by high-speed LANs for data storage, organization, analysis, and query response. The software design of clusters enables the use of COTS hardware for enhanced efficiency, reliability, and scalability, and the software algorithms build statistical models from huge data sets using an interdisciplinary approach, making it possible to draw inferences for further decision making. Noteworthy software includes Apache S4, which processes continuous data streams; S4 applications are designed by combining streams and processing elements in real time, and Twitter's Storm uses a similar approach. Hadoop MapReduce is a programming model and software framework for writing applications that enables quick processing of big data on large clusters of computer nodes in parallel: the input dataset is divided into independent subsets that are processed in parallel by map tasks, and the final result is obtained by reduce tasks that use the output of the map tasks.
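The map/shuffle/reduce flow described above can be sketched in miniature. This is a single-process Python illustration of the programming model (a word count over two input splits), not Hadoop itself, which distributes the same phases across a cluster:

```python
from collections import defaultdict
from itertools import chain

def map_task(split):
    """Emit (word, 1) for every word in one input split."""
    return [(word, 1) for line in split for word in line.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_task(key, values):
    """Combine all counts for one word into the final tally."""
    return key, sum(values)

# Two independent input splits (in Hadoop, processed on separate nodes).
splits = [["big data big"], ["data analytics big"]]
intermediate = chain.from_iterable(map_task(s) for s in splits)
result = dict(reduce_task(k, v) for k, v in shuffle(intermediate).items())
print(result)  # prints {'big': 3, 'data': 2, 'analytics': 1}
```

Because each map task touches only its own split, the map phase parallelizes trivially; the shuffle is the only step that requires moving data between nodes, which is why it dominates the cost of real MapReduce jobs.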

Military Information and Intelligence

In the national security domain, the volume of data has expanded exponentially with the internet, especially in areas such as counter-terrorism, network security, and counter-proliferation. The rapid influx of unstructured data relating to national security issues such as cyber defense requires both real-time and offline analysis. Handling this data demands quick advances in everything from system architecture to statistics. Large-scale and in-depth querying of big data, visualization of big data in forms such as maps, graphs, and timelines, and near-real-time analysis of streaming data are some of the essential requirements for national security work.

The importance of information and intelligence for the military is brought out by the fact that it is fundamental to the planning of any military operation in peace or war. Militaries have carried out reconnaissance and surveillance tasks since time immemorial prior to mounting any assault on the enemy, and the success of a mission depends upon correct analysis of the available information during the planning stage. The collection of information, its analysis, and its further dissemination are of prime importance for any military. Today the methods of collecting information have changed owing to the quick availability of combat information of high quality and reliability from a varied array of sensors and sources. A definition of the Intelligence, Surveillance, and Reconnaissance (ISR) components taken from US Army manuals lists intelligence as, first, the product resulting from the collection, processing, integration, analysis, evaluation, and interpretation of available information concerning foreign countries or areas; and second, as information and knowledge about an adversary obtained through observation, investigation, analysis, or understanding. Surveillance is defined as the systematic observation of aerospace, surface, or subsurface areas, places, persons, or things by visual, aural, electronic, photographic, or other means. Reconnaissance is a mission undertaken to obtain, by visual observation or other detection methods, information about the activities and resources of an enemy or potential enemy, or to secure data concerning the meteorological, hydrographic, or geographic characteristics of a particular area. An optimal ISR plan is formulated by integrating ISR missions based upon the capabilities of the various assets available. The information is thereafter collated and analyzed for feeding into the planning of operations.

The basic information required by any commander in today's networked warfare is the accurate position of his own units, the location of the enemy and his reserves, the location of supporting units, and the placement of other miscellaneous assets. With this knowledge, a commander can effectively carry out his mission by optimally utilizing the available firepower and resources. Crucial to any mission, therefore, is 'situational awareness', which comprises tasking, collection, processing, exploitation, and dissemination. Embedded in ISR is communication, without which no mission can be accomplished. Digital communication systems, the internet, and mobile devices have revolutionized the amount of data generated. In this context, Big Data refers to the whole gamut of information available from sensors, video imagery, mobile phones, signals intelligence, and electronic warfare interceptions to satellite images. Data is collected at unprecedented levels.

Rapid technological advances in sensor-based, smart, and networked combat systems are pushing militaries to adopt commercially available emerging technologies and adapt them for their use. This has led to a synergistic relationship with the digital industry, wherein the military may no longer develop its own hardware and software de novo, but instead harness and modify "Commercial Off-the-Shelf" (COTS) items. The advent of big data is driving the armed forces to shift their integrated decision-support systems to big data architectures and analytics. The financial crunch faced by militaries in leading countries implies even greater dependence upon technology by reduced manpower, which in turn has led other nations to adopt a wait-and-watch strategy under which they would go in for the best available solution adopted by leading armies. To understand and react to real-time tactical situations, commanders have to manage and control a big data environment comprising historical or point-in-time data, transactional data, data optimized for inquiry, unpredictable patterns of data, and ad-hoc use of the system. The military has been collecting data at humongous levels since the induction of unmanned vehicles with sensors. These data heaps cannot be analyzed in the traditional manner; they require dedicated data scientists and different software tools to exploit the extracted information for mission planning. It is understood that in Afghanistan, the Defense Advanced Research Projects Agency (DARPA) sent in data scientists and visualizers under a program called Nexus 7. They operated directly with military units and assisted commanders in solving specific operational challenges; in some cases, surveillance and satellite data were fused to visualize traffic flow through road networks in order to detect and destroy improvised explosive devices.

Major issues faced by the military today involve the availability of ever-increasing volumes of sensor data from integral sources such as Unmanned Aerial Vehicles (UAVs) and other national assets. A single full-day UAV mission can provide upwards of 10 terabytes of data, of which only about 5% is analyzed and the rest stored. Analysts are restricted by data download speeds, which depend upon their locations. Untagged data leads analysts to download similar data from other sources to firm up their conclusions. Often the communication lines are shared or not continuously available, further increasing delays in analysis. Providing comprehensive situational awareness depends upon the accuracy and integration of data received from multiple types of sensors as well as intelligence sources, yet the screens and software tools are not yet interoperable. Due to security considerations, ISR data from different sources is stored in different locations with varying access levels, which leads to incomplete analysis; a single network domain providing access to data at multiple levels of security classification is not yet available. Analysts currently spend only 20 percent of their time looking at correct data, whereas 80 percent is spent looking for the correct data.
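The untagged-data problem above is, at bottom, a metadata problem: once each collection record carries structured tags, an analyst can filter for relevant holdings instead of downloading and inspecting everything. The sketch below is a deliberately simplified, hypothetical illustration of that idea; the field names and figures are invented, not drawn from any real ISR system:

```python
from datetime import datetime

# Hypothetical catalog of tagged collection records. In practice the
# tags (sensor type, region, time, size) would be applied at ingest.
records = [
    {"sensor": "uav_eo", "region": "sector_7",
     "time": datetime(2016, 1, 4, 9, 30), "size_gb": 120},
    {"sensor": "sigint", "region": "sector_7",
     "time": datetime(2016, 1, 4, 11, 0), "size_gb": 3},
    {"sensor": "uav_eo", "region": "sector_2",
     "time": datetime(2016, 1, 4, 9, 45), "size_gb": 95},
]

def query(records, **tags):
    """Return only the records whose metadata matches every given tag."""
    return [r for r in records if all(r.get(k) == v for k, v in tags.items())]

# An analyst interested in one region pulls 123 GB instead of all 218 GB.
hits = query(records, region="sector_7")
print(len(hits), sum(r["size_gb"] for r in hits))  # prints 2 123
```

Tagging shifts the cost from the analyst's search time (the 80 percent figure cited above) to a one-time indexing effort at collection, which is the trade most metadata architectures make.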

Some of the companies working in this field with the US military, providing a common operating picture (COP), are described in the following examples.

Modus Operandi takes big data, infuses it with expert knowledge, and creates a common framework for easy identification of patterns. The data is embedded into an underlying graph structure and is amenable to complex queries; the system can detect patterns and output different types of visualizations, such as maps and timelines.

Image Attribute: Modus Operandi’s Wave-EF semantically enriches multi-source intelligence data through direct feeds or from net-centric services for fusion, mediation, metadata tagging, alerting and dissemination within applications and services. 

Source: Semantic Enrichment and Fusion Of Multi-Intelligence Data by Dr. Richard D. Hull, Don Jenkins, Alan McCutchen, Modus Operandi, Inc. (Download)

Palantir Technologies is known for its software Palantir Gotham, which is used by counter-terrorism analysts at offices in the United States Intelligence Community and the United States Department of Defense, fraud investigators at the Recovery Accountability and Transparency Board, and cyber analysts at the Information Warfare Monitor (responsible for the GhostNet and Shadow Network investigations).

Video Attribute: Railgun: Leveraging Palantir Gotham as a Command and Control Platform / Source: Palantir Technologies Official Youtube Channel

SAP’s HANA platform provides a real-time analytics and applications platform for big data that offers varying layers of security. It offers predictive, forecasting, and calculation solutions and stitches together maintenance failure codes and document notes.

Image Attribute: The combination of LuciadLightspeed and SAP HANA provides a visual analysis of violent events in Africa over the last 20 years.

To tackle the problem and analyze data in real time, Oracle has created a newly engineered system to handle big data operations, bringing together its hardware with Cloudera’s Hadoop and enabling patching of multiple layers of the big data architecture.

Image Attribute: Cloudera’s Hadoop integration with Oracle Advanced Analytics

Teradata’s Unified Data Architecture is a comprehensive big data solution that aims to bring the data needed for analytics across the entire organization into one place, creating a single version of enterprise data, capturing, for example, minute-by-minute maintenance data in the field, including potential new sources of big data.

Image Attribute: TERADATA Unified Data Architecture Concept

DigitalEdge by Leidos is a scalable, pre-integrated, flexible, and pluggable data management platform that allows rapid creation and management of near real-time big data applications. Leidos’s Scale2Insight (S2i) is a solution that supports large complex data environments with multiple disparate sensors collecting information on different parts of the data ecosystem.

Image Attribute: The modular architecture of DigitalEdge preserves investments made in analytic, visualization and work flow solutions. Its design provides the necessary functions to build a powerful, real-time big data analytic environment. / Source: DigitalEdge White-paper by Leidos (Download)

SYNTASA delivers analytical applications focused on the behavior of visitors to internal government web sites. The analytics, which are built on an open source big data platform, determine the behavioral trends of visitors in order to improve the use of information by government analysts.

Image Attribute: SYNTASA Analytics Services for Government / Source: website screen shot (link)

Indian Context

“Knowledge of the mechanics and details of big data analytics are not only important to exploit it but also to collapse the distances between policy, planning, and operations.”

- Lt Gen Anil Bhalla, DGDIA, and DCIDS (Int), 2015

The applicability of big data analytics in the context of the Indian defense forces is very much in line with that of the developed forces of the world. This was borne out during the deliberations of a national seminar organized by CLAWS in February 2015, titled Big Data – Applicability in the Defense Forces. The seminar highlighted the requirement for big data analytics in the fields of intelligence, operations, logistics, mobilization, medical services, human resources, cyber security, and counter-insurgency/counter-terrorism for the Indian armed forces. In the arena of intelligence, examples of inputs likely to be received and analyzed during peacetime relate to terrorism, proxy war, left-wing extremism, the conduct of training by opponents, deployment of forces, and the sponsoring of non-state actors. The seminar also brought out the need to develop algorithms to analyze the millions of open-source documents generated and compare them with gathered human intelligence, and to acquire a predictive capability to anticipate specific incidents and suggest measures by analyzing historical events.

However, the fact remains that, owing to the nascent nature of big data analytics, awareness of it is limited to a small number of involved agencies. The benefits of big data in operational decision making, while safeguarding accuracy and reliability, have not yet been internalized, and big data projects, even at pilot scale, may not currently exist. In the present situation, decision makers are not clear about the capability of big data, its costs, benefits, and applicability, or the perils, if any, of not adopting it.


It is apparent that the era of Big Data is already here and that its impact is being felt in all aspects of modern life. The challenges at all stages of analysis include scaling, heterogeneity, lack of structure, privacy, error handling, timeliness, provenance, and visualization. Big data holds enormous potential to make the operations of the armed forces more efficient across the entire spectrum of their activity. The research and development necessary for the analysis of big data is not restricted to a single discipline and requires an interdisciplinary approach: computer scientists need to tackle issues pertaining to inference, while statisticians have to deal with algorithms, scalability, and near-real-time decision making. The involvement of mathematicians, visualizers, social scientists, psychologists, domain experts and, most important of all, the final users is paramount for the optimal utilization of big data analytics. The active participation of national agencies, the private sector, the public sector, and the armed forces would ensure full exploitation of the potential of big data for the country.

The need today is to start feasibility studies and research programs in select fields, in order of desired priority, followed by pilot studies and thereafter the adaptation of COTS hardware and available big data analytics software suites.

“Without big data, you are blind and deaf and in the middle of a freeway.” 
- Geoffrey Moore, author and consultant. 

About The Author:

The author RADM Dr. S. Kulshrestha (Retd.), INDIAN NAVY, holds expertise in quality assurance of naval armament and ammunition. He is an alumnus of the National Defence College and a PhD from Jawaharlal Nehru University. He superannuated from the post of Director General Naval Armament Inspection in 2011. He is unaffiliated and writes in defence journals on issues related to Armament technology and indigenisation.

Cite This Article:

Kulshrestha, Sanatan. "FEATURED | Big Data in Military Information & Intelligence" IndraStra Global 02, no. 01 (2016): 0067. 
http://www.indrastra.com/2016/01/FEATURED-Big-Data-in-Military-Info-and-Intel-002-01-2016-0067.html  | ISSN 2381-3652 | https://dx.doi.org/10.6084/m9.figshare.2066640