IT | Big Data & Decision Making : The Concept of Statistical Service Engine


By Rich C. Lee

“Big Data” is shorthand for the advancing technology trends that open the door to a new approach: uncovering the meaning behind business activities and then making high-quality decisions based on the findings. Enterprise performance relies on the responsiveness and quality of decision-making.


The Need for a Statistical Model-Based Decision Support System:

Decision-making rests on the knowledge an enterprise possesses. Enterprise knowledge is accumulated and compiled from information, and information is generated from the raw data produced by business activities. Decision makers have to develop a feasible approach for acquiring sufficient knowledge and for recognizing when they can make a sufficiently confident decision to support business needs. Today’s good decisions are driven by reliable data. Business managers and professionals are increasingly required to justify decisions based on data, so they need statistical model-based decision support systems. Enterprises strive for process improvement and variation reduction, which requires a disciplined, data-driven approach to reaching business goals. Statistics helps an enterprise describe quantitatively how well its processes are performing and transform raw data into valuable information that later becomes useful knowledge.

Analyzing data, deriving information, and accumulating knowledge are not convenient in most enterprises’ current information environments. First, analysts must know where the desired data is and ask IT professionals to retrieve it, save it into files, and store the files in a shared network folder; only then can they develop statistical models. If the data is large and frequently updated, the statistical process becomes even more inconvenient. If other analysts are interested in just a portion of the data, or if the extracted data does not fully cover their needs, IT professionals must repeat these resource-intensive tasks for each analyst. Another drawback is poor re-usability when the statistical procedures are not shared. In fact, enterprise knowledge covers not just the information but also the processes by which the information was derived. These processes ensure the reproducibility of knowledge, which is part of the enterprise’s intellectual property.

To reach this objective, enterprises commonly use Business Intelligence (BI) software to analyze their data. Designing and managing BI analytical models requires expertise, and the software is usually rather expensive and complicated, requiring intensive assistance from IT professionals and the software vendor. Unlike commercial BI software, GNU-R is a flexible and extensible language and environment for statistical computing and graphics, and it has been widely used in many applications for years. Analysts who plan to use GNU-R can find abundant resources on the Internet to ease their learning curve. Thus, as a statistical engine for automating analytical processes, GNU-R is more cost-effective than commercial BI software. It is worthwhile to develop a statistical service engine solution using R to improve the generation and re-usability of enterprise knowledge.

However, to develop such a statistical service engine solution that meets its time-to-market objective within budget, it is important to know why most software projects fail:

1) Unrealistic or unarticulated project goals;
2) Inaccurate estimates of needed resources;
3) Badly defined system requirements;
4) Poor reporting of the project’s status;
5) Unmanaged risks;
6) Poor communication among customers, developers, and users;
7) Use of immature technology;
8) Inability to handle the project’s complexity;
9) Sloppy development practices;
10) Poor project management;
11) Stakeholder politics;
12) Commercial pressures.

The third factor attracts the most concern, mainly because non-functional requirements are often not considered during software development. These non-functional requirements ensure that the software’s capabilities meet the business goals at a lower Total Cost of Ownership (TCO).

The Drivers of Statistical Service Engine:

The Statistical Service Engine solution helps analysts offer their perspectives on business performance to the enterprise and enhances daily decision making by reusing these statistical inferences more effectively. Although GNU-R can access databases directly from analysts’ desktops, such an approach might cause a security breach if database schemas are disclosed, and inappropriate access could jeopardize the performance of the databases that serve the designated business activities. On the other hand, analysts expect the solution to execute the submitted GNU-R scripts on their behalf at the most appropriate time. Therefore, the solution must rest on a unified, flexible, reliable, and robust mechanism that integrates the back-end processes, namely running GNU-R scripts on distributed servers.
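As a rough illustration of executing a submitted GNU-R script on an analyst's behalf, the following Python sketch wraps the `Rscript` command-line front end that ships with R. It is a minimal sketch under stated assumptions, not the engine described in the paper: the `ScriptJob` class, the helper names, the script name, and the timeout are all hypothetical. The idea is that database credentials stay on the engine side, so analysts submit a script and never connect to the databases themselves.

```python
import shutil
import subprocess
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScriptJob:
    """Hypothetical description of one submitted GNU-R script."""
    script_path: str
    args: List[str] = field(default_factory=list)


def build_command(job: ScriptJob) -> List[str]:
    # --vanilla starts R without saved workspaces or user profiles,
    # so every run begins from a clean, reproducible state.
    return ["Rscript", "--vanilla", job.script_path, *job.args]


def run_job(job: ScriptJob, timeout_s: int = 600) -> subprocess.CompletedProcess:
    """Run the script on the engine host; the timeout value is an assumption."""
    if shutil.which("Rscript") is None:
        raise RuntimeError("Rscript was not found on PATH")
    return subprocess.run(
        build_command(job),
        capture_output=True,  # keep stdout/stderr so results can be stored
        text=True,
        timeout=timeout_s,
        check=False,          # the caller inspects returncode itself
    )


# Building the command does not require R to be installed.
job = ScriptJob("sales_trend.R", ["2012-08"])
command = build_command(job)
```

Calling `run_job(job)` would then execute the script on whichever server hosts the engine; scheduling the call for the "most appropriate time" is left out of this sketch.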

To provide such a unified, flexible, reliable, and robust mechanism, the Enterprise Service Bus (ESB) is a proven approach; it is an infrastructure that underpins a fully integrated and flexible end-to-end Service-Oriented Architecture (SOA). The ESB enables SOA by providing the connectivity layer between services. It combines event-driven and service-oriented approaches to simplify the integration of business units, bridging heterogeneous platforms and environments. The ESB acts as an intermediary layer that enables communication between different application processes. A service deployed onto an ESB can be triggered by a consumer or by an event. The ESB supports both synchronous and asynchronous messaging, facilitating interactions between one or many applications (one-to-one or many-to-many communication); this is vital to the statistical service engine solution.
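The synchronous and asynchronous interaction styles described above can be illustrated with a toy, in-process message bus in Python. This is only a sketch of the routing idea, not a real ESB product or its API; the `MiniBus` class, the `stat.job` topic, and the two consumers are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from queue import Queue
from typing import Any, Callable, Dict, List

Handler = Callable[[dict], Any]


class MiniBus:
    """Toy in-process stand-in for an ESB: routes messages by topic."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)
        self._pending: Queue = Queue()

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def request(self, topic: str, message: dict) -> List[Any]:
        """Synchronous: call every subscriber now and return their replies."""
        return [handler(message) for handler in self._handlers[topic]]

    def publish(self, topic: str, message: dict) -> None:
        """Asynchronous: queue the message and return immediately."""
        self._pending.put((topic, message))

    def drain(self) -> int:
        """Deliver queued messages; returns how many messages were delivered."""
        delivered = 0
        while not self._pending.empty():
            topic, message = self._pending.get()
            for handler in self._handlers[topic]:
                handler(message)
            delivered += 1
        return delivered


# Usage: two hypothetical consumers share one topic (one-to-many).
bus = MiniBus()
received: List[str] = []


def r_engine(message: dict) -> str:
    received.append("engine:" + message["script"])
    return "queued " + message["script"]


def audit_log(message: dict) -> str:
    received.append("audit:" + message["script"])
    return "logged"


bus.subscribe("stat.job", r_engine)
bus.subscribe("stat.job", audit_log)

replies = bus.request("stat.job", {"script": "sales_trend.R"})  # synchronous
bus.publish("stat.job", {"script": "churn_model.R"})            # asynchronous
delivered = bus.drain()
```

A production ESB would add persistence, transformation, and cross-host transport; the sketch only shows why decoupling producers from consumers lets a portal submit jobs without knowing which engine will run them.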

To maximize the re-usability of statistical inferences and their influence on the enterprise, the user-friendliness of the solution is crucial to the acquisition and dissemination of business knowledge. User-friendliness increases users’ perceived ease of use of a system. Accumulating knowledge is a complex and dynamic process; it requires continuously reshaping how the knowledge is presented by enhancing the solution’s usability. Usability, a synonym of user-friendliness, is a core term in human-computer interaction. Platform portability is another important consideration when selecting technology for the solution. Java is well known for its portability, and its portal technology, a proven approach to better usability, enables users to dynamically construct and reconstruct Web applications that converge information at run time to address urgent and unplanned business requirements.


Increasing business competitiveness requires continuous innovation and operational excellence. Re-examining the data from business activities and finding patterns and trends from various statistical perspectives helps an enterprise make rational decisions swiftly. Knowledge management offers a collaborative platform to acquire, compile, disseminate, and reuse knowledge in order to elicit creativity and improve operational efficiency. More intense use of a knowledge management platform has both direct and indirect (innovation-mediated) positive effects on enterprise performance.

For a knowledge management platform to succeed, users’ perceived usefulness and satisfaction are key. When useful statistical results disclose their implications for improving business competitiveness, they stimulate and inspire employees to make further findings by reusing the statistical procedures and the data. This reinforcing process leads employees to use the knowledge management platform more intensively and helps them make business decisions more rationally and swiftly. To realize this goal, a more convenient solution is needed to help analysts retrieve data, reuse statistical procedures, and disseminate findings more easily.

An Example: Statistical Job Portal

Image Attribute: An example of "Statistical Service Engine Solution" with respect to a "Job Portal"


The Statistical Job Portal server provides users with a number of predefined statistical procedures as well as ad-hoc analysis. The Enterprise Service Bus server receives messages (the statistical job requests) from the Statistical Job Portal server and dispatches them accordingly. The GNU-R Engine is a set of blade servers that execute the GNU-R scripts designated by the messages. A script may retrieve data from the Database servers or the File Repository server. The application servers are user process engines that populate source data on the Database servers or the File Repository server. The File Repository server also stores the statistical results for users to retrieve later. The statistical job request is in XML format and covers the fields shown in the table below.

Table Attribute: Statistical Job Request
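The field table itself is not reproduced in this extract, so the following Python sketch uses placeholder element names (`JobId`, `Script`, `DataSource`, `OutputPath`) that are assumptions for illustration only and should not be read as the author's actual request schema. It shows how a portal might serialize a job request to XML and how a receiving engine might read it back, using only the standard library.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def build_job_request(job_id: str, script: str,
                      data_source: str, output_path: str) -> str:
    """Serialize a job request; every element name here is hypothetical."""
    root = ET.Element("StatisticalJobRequest")
    ET.SubElement(root, "JobId").text = job_id
    ET.SubElement(root, "Script").text = script
    ET.SubElement(root, "DataSource").text = data_source
    ET.SubElement(root, "OutputPath").text = output_path
    return ET.tostring(root, encoding="unicode")


def parse_job_request(xml_text: str) -> Dict[str, Optional[str]]:
    """Read the request back into a flat mapping of element name to text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


# Round trip with made-up values.
xml_text = build_job_request("job-001", "sales_trend.R",
                             "sales_db", "results/job-001")
parsed = parse_job_request(xml_text)
```

Keeping the request as a small, self-describing document is what allows the bus to route it without understanding the statistics involved.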



Keywords: Software Engineering; SOA; Business Analytics; Big Data; GNU-R

About The Author:

Rich C. Lee, System Technology Group, IBM, Taipei, Chinese Taipei; Department of Information Management, National Sun Yat-Sen University, Taipei, Chinese Taipei. 

Publication Details: 

This article is an extract from a technical paper titled "A Service Oriented Analytics Framework for Multi-Level Marketing Business", Journal of Software Engineering and Applications, Vol. 5 No. 8 (2012), Article ID: 21938, 9 pages. DOI:10.4236/jsea.2012.58061.