IT | Benefits and Challenges of Data Mining in E-Commerce
IndraStra Open Journal Systems
IndraStra Global

By Mustapha Ismail, Mohammed Mansur Ibrahim, Zayyan Mahmoud Sanusi and Muesser Nat
Management Information Systems Department 
Cyprus International University, Haspolat, Lefkoşa via Mersin, Turkey

Data mining is a vital way for an e-commerce company to reposition itself, because it supplies the enterprise with the information it needs about the business. Most companies have now adopted e-commerce and hold large volumes of data in their repositories. The only way to get the most out of this data is to mine it, either to improve decision making or to enable business intelligence. In e-commerce data mining, data must pass through three important processes before it turns into knowledge or an application.

The first and easier process is data pre-processing. It is actually a step that comes before the data mining itself, in which the data is cleaned by removing unwanted data that has no relation to the required analysis. This boosts the performance of the entire data mining process, raises the accuracy of the data and reasonably reduces the time needed for the actual mining. This is usually the case when the company already has a target data warehouse; if it does not, the selection, cleaning and transformation of data, together termed pre-processing, will consume at least 80% of the overall effort.
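As an illustration of this cleaning step, the following is a minimal sketch that drops records with no relation to the analysis. The record layout, the field names and the three filtering rules are assumptions made for the example, not something prescribed by the article.

```python
# Hypothetical clickstream records; the field names are assumptions.
raw_records = [
    {"session": "s1", "page": "/product/42", "status": 200},
    {"session": "s1", "page": "/static/logo.png", "status": 200},
    {"session": None, "page": "/product/7", "status": 200},
    {"session": "s2", "page": "/checkout", "status": 500},
    {"session": "s2", "page": "/cart", "status": 200},
]

# Assumed rule: requests for static assets say nothing about shopping behaviour.
IRRELEVANT_SUFFIXES = (".png", ".css", ".js")


def preprocess(records):
    """Keep only complete, successful page views that matter to the analysis."""
    cleaned = []
    for rec in records:
        if rec["session"] is None:          # incomplete record
            continue
        if rec["status"] != 200:            # failed request
            continue
        if rec["page"].endswith(IRRELEVANT_SUFFIXES):  # static asset
            continue
        cleaned.append(rec)
    return cleaned


cleaned_records = preprocess(raw_records)
```

Only the two product-related page views with a valid session survive, so the later mining step works on a smaller and more relevant data set.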

Pattern mining is the second step. It refers to the techniques or approaches used to develop recommendation rules or to build a model from a large data set, and it can also be referred to as the techniques or algorithms of data mining. The most common patterns used in e-commerce are prediction, clustering and association rules.

The purpose of the third step, pattern analysis, is to verify and shed more light on the discovered model in order to give a clear path for applying the data mining result. The analysis lays much emphasis on the statistics and rules of the pattern used, observing them after multiple users have accessed them.

However, all of this depends on how iterative the overall process is and on how the visual information obtained at each sub-step is interpreted. In general, therefore, the data mining process iterates through the following five basic steps:

• Data selection: This step is about identifying the kind of data to be mined, the goals for it and the tools needed to enable the process. At the end of it, the right input attributes and output information to represent the task are chosen.

• Data transformation: This step is about organizing the data according to the requirements by removing noise, converting one type of data to another, normalizing the data where needed, and defining a strategy for handling missing data.

• Data mining step per se: The transformed data is mined using any of the techniques to extract patterns of interest. The miner can also aid the data mining method by performing the preceding steps correctly.

• Result interpretation and validation: To better understand the data and the knowledge synthesized from it, together with its span of validity, robustness is checked by testing the data mining application. The information retrieved can also be evaluated by comparing it with earlier expertise in the application domain.

• Incorporation of the discovered knowledge: This involves presenting the discovered knowledge to the decision maker, so that it can be compared with earlier extracted knowledge, checked for conflicts that need resolving, and applied wherever a newly discovered pattern is useful.
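The data transformation step in the list above can be sketched in a few lines. This is only an illustration: mean imputation is one possible strategy for missing data, min-max scaling is one possible normalization, and the order values are made up.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the known entries."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def min_max(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # avoid division by zero on constant data
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical order values with one missing entry.
order_values = [20.0, None, 40.0, 60.0]
filled = impute_mean(order_values)    # missing value becomes 40.0
scaled = min_max(filled)              # [0.0, 0.5, 0.5, 1.0]
```

Whichever strategies are chosen, they should be decided once and applied consistently, since the mining step sees only the transformed values.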


The application of data mining in e-commerce refers to the areas of e-commerce where data mining can be used to enhance the business. When users visit an online store to shop, they normally leave behind certain facts that companies can store in their databases. These facts represent structured or unstructured data that can be mined to give the company a competitive advantage. Data mining can be applied in the following areas of e-commerce for the benefit of companies:

1) Customer Profiling - This is also known as a customer-oriented strategy in e-commerce. It allows companies to use business intelligence, through the mining of customers’ data, to plan their business activities and operations and to develop new research on products or services for a prosperous e-commerce business. Identifying customers with high purchasing potential from visit data can help companies reduce sales costs.

Companies can use users’ browsing data to identify whether they are shopping purposefully or just browsing, and whether they are buying something familiar or something new. This helps companies plan and improve their infrastructure.

2) Personalization of Service - Personalization is the act of providing content and services geared to individuals on the basis of information about their needs and behavior. Data mining research related to personalization has focused mostly on recommender systems and related subjects such as collaborative filtering, and recommender systems have been explored intensively in the data mining community. These systems can be divided into three groups: content-based, social data mining and collaborative filtering. They learn from the explicit or implicit feedback of users, and what they learn is usually represented as a user profile. Social data mining draws on data created by groups of individuals as part of their daily activities, which can be an important source of information for companies. Personalization can also be achieved with collaborative filtering, where users with similar interests are matched and the preferences of these users are used to make recommendations.
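To make the idea of matching users with similar interests concrete, here is a minimal sketch of user-based collaborative filtering. The ratings, the user and item names, the use of cosine similarity and the neighbourhood size `k` are all assumptions chosen for illustration.

```python
from math import sqrt

# Hypothetical explicit feedback: user -> {item: rating from 1 to 5}.
ratings = {
    "ann": {"laptop": 5, "mouse": 4, "desk": 1},
    "bob": {"laptop": 4, "mouse": 5, "monitor": 5},
    "cat": {"desk": 5, "lamp": 4, "laptop": 1},
}


def cosine(a, b):
    """Cosine similarity between two rating profiles (0.0 if nothing shared)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, k=1):
    """Suggest items the k most similar users rated but `user` has not."""
    neighbours = [
        (cosine(ratings[user], profile), name)
        for name, profile in ratings.items()
        if name != user
    ]
    neighbours.sort(reverse=True)           # most similar first
    scores = {}
    for similarity, name in neighbours[:k]:
        for item, rating in ratings[name].items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)


suggestions = recommend("ann")              # bob is ann's nearest neighbour
```

With this toy data, "ann" is closest to "bob", so the item she is offered is the one only he has rated.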

3) Basket Analysis - Every shopper’s basket has a story to tell, and market basket analysis (MBA) is a common retail analytics and business intelligence tool that helps retailers know their customers better. There are different ways to get the best out of market basket analysis, and these include:

• Identification of product affinities; tracking not-so-apparent product affinities and leveraging them is the real challenge in retail. Walmart customers purchasing Barbie dolls show an affinity towards one of three candy bars; obscure connections such as this can be discovered with advanced market basket analytics and used to plan more effective marketing efforts.

• Cross-sell and up-sell campaigns; these show which products are purchased together, so customers who purchase a printer can be persuaded to pick up high-quality paper or premium cartridges.

• Planograms and product combos; these are used for better inventory control based on product affinities, for developing combo offers and for designing effective, user-friendly planograms that focus on products that sell together.

• Shopper profiles; analyzing market baskets over time with the aid of data mining gives a glimpse of who your shoppers really are, with insight into their ages, income ranges, buying habits, likes and dislikes, and purchase preferences, which can be leveraged to improve the customer experience.
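The product affinities described in the list above are usually expressed as association rules with a support and a confidence. The following is a minimal sketch restricted to pairs of items; the baskets and the `min_support` threshold are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, one set of items per transaction.
baskets = [
    {"printer", "paper", "cartridge"},
    {"printer", "paper"},
    {"printer", "cartridge"},
    {"paper", "pen"},
]


def pair_rules(baskets, min_support=0.5):
    """Return {(antecedent, consequent): (support, confidence)} for item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n               # share of baskets containing both
        if support < min_support:
            continue
        # Confidence: of the baskets with the antecedent, how many have both.
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules


rules = pair_rules(baskets)
```

In this toy data every basket with a cartridge also contains a printer, so the rule from cartridge to printer has confidence 1.0, while the reverse rule is weaker because printers are also bought without cartridges.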

4) Sales Forecasting - Sales forecasting looks at the time an individual customer takes to buy an item and, in the process, tries to predict whether the customer will buy again. This type of analysis can be used to determine a strategy of planned obsolescence or to figure out complementary products to sell. In sales forecasting, cash flow can be projected under three scenarios: pessimistic, optimistic and realistic. This helps a company plan for adequate capital to endure the worst possible scenario, that is, if sales do not go as planned.
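The three-scenario projection can be sketched as follows. Every figure here (opening cash, monthly sales and costs, and the growth rate attached to each scenario) is a made-up assumption, and the constant monthly growth model is only one simple way to express a scenario.

```python
def project_cash(opening_cash, monthly_sales, monthly_costs, growth, months):
    """Return the cash balance at the end of each month.

    Sales change by the fraction `growth` after every month; costs stay fixed.
    """
    cash = opening_cash
    sales = monthly_sales
    balances = []
    for _ in range(months):
        cash += sales - monthly_costs
        balances.append(round(cash, 2))
        sales *= 1 + growth
    return balances


# Assumed monthly sales growth for each scenario.
scenarios = {"pessimistic": -0.10, "realistic": 0.0, "optimistic": 0.10}

projections = {
    name: project_cash(1000.0, 500.0, 450.0, growth, months=3)
    for name, growth in scenarios.items()
}

# Lowest balance reached if sales do not go as planned.
worst_case_low = min(projections["pessimistic"])
```

Comparing `worst_case_low` with the capital actually available indicates whether the business could endure the pessimistic scenario.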

5) Merchandise Planning - Merchandise planning is useful for both online and offline retail companies. For online businesses, merchandise planning helps determine stocking options and inventory warehousing, while offline companies that are looking to grow by adding stores can assess the amount of merchandise they will need by looking at the exact layout of a current store. Using the right approach to merchandise planning will lead to answers on what to do about:

• Pricing; mining the database helps determine the best-suited price for products or services by revealing customer sensitivity to price.

• Deciding on products; data mining shows e-commerce businesses which products customers actually desire, including intelligence on competitors’ merchandise.

• Balancing of stocks; mining the retail database helps determine the right amount of stock needed, that is, not too much and not too little, throughout the business year and also during the buying seasons.

6) Market Segmentation - Customer segmentation is one of the best uses of data mining. The large volume of data gathered can be broken down into different, meaningful segments such as income, age, gender and occupation of customers, and these can be used when companies run email marketing campaigns or SEO strategies. Market segmentation can also help a company identify its competitors. This information alone can help a retail company recognise that its usual rivals are not the only ones targeting the same customers’ money.

Segmenting a retail company’s database will improve conversion rates, as the company can focus its promotions on a closely matched and highly desirable market. It also helps the retail company understand the competitors involved in each segment, which permits the customization of products that will actually satisfy the target audience.
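A rule-based version of such segmentation can be sketched as below. The customer records, the two attributes used and the cut-off values for the bands are illustrative assumptions; in practice the segments might instead come from a clustering algorithm.

```python
from collections import defaultdict

# Hypothetical customer records.
customers = [
    {"id": 1, "age": 23, "income": 28000},
    {"id": 2, "age": 41, "income": 75000},
    {"id": 3, "age": 27, "income": 52000},
    {"id": 4, "age": 58, "income": 31000},
    {"id": 5, "age": 25, "income": 30000},
]


def segment_key(customer):
    """Assign a customer to an (age band, income band) segment."""
    age_band = "under_30" if customer["age"] < 30 else "30_plus"
    income_band = "lower_income" if customer["income"] < 40000 else "higher_income"
    return (age_band, income_band)


segments = defaultdict(list)
for customer in customers:
    segments[segment_key(customer)].append(customer["id"])
```

Each key in `segments` is then a candidate audience for a separate email campaign or promotion.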


Besides its benefits, data mining also poses challenges for e-commerce companies, which are as follows:

1) Spider Identification - The main aim of data mining is to convert data into useful knowledge, and the main source of data for e-commerce companies is web pages. It is therefore critical for e-commerce companies to understand how search engines work, so that they can follow how quickly things happen, how they happen and when changes will show up in the search engines. Spiders, also called bots or crawlers, are software programs that a search engine sends out to find new information by requesting pages and downloading them. It comes as a surprise to some people, but what a search engine does is follow a link on an existing website to find a new website, request a copy of that page and download it to its server. The search engine runs its ranking algorithm against this copy, and the outcome is what shows up on the search engine result page. The challenge, therefore, is that search engines need to download a correct copy of the website: an e-commerce website needs to be readable and visible, because the algorithm is applied to the search engine’s database. Tools also need mechanisms to automatically remove unwanted data before it is transformed into information, so that data mining algorithms can provide reliable and sensible output.
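One kind of unwanted data is page requests made by spiders rather than by shoppers. A minimal sketch of removing them from a clickstream log is shown below, assuming the log records each request’s user-agent string. The token list and the log entries are illustrative assumptions, and spiders that disguise their user agent would need other signals.

```python
# Substrings that often appear in crawler user-agent strings.
# This list is an illustrative assumption, not an exhaustive registry.
BOT_TOKENS = ("bot", "crawler", "spider", "slurp")


def is_spider(user_agent):
    """Guess whether a request came from a spider, based on its user agent."""
    agent = user_agent.lower()
    return any(token in agent for token in BOT_TOKENS)


# Hypothetical log entries.
log = [
    {"ip": "10.0.0.1", "page": "/product/42",
     "agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
    {"ip": "10.0.0.2", "page": "/product/42",
     "agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"},
    {"ip": "10.0.0.3", "page": "/cart",
     "agent": "ExampleCrawler/1.0"},
]

# Keep only the requests that appear to come from human visitors.
human_traffic = [entry for entry in log if not is_spider(entry["agent"])]
```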

2) Data Transformations - Data transformation poses a challenge for data mining tools. Two sets of transformations are needed: first, data must be brought in from an active operational system so that the data warehouse can be built; second, the data must go through activities such as assigning new columns, binning data and aggregating it. The first set needs to be modified only infrequently, that is, only when the site changes, whereas the second set poses a significant challenge in the data mining process.
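The three activities named above (assigning new columns, binning and aggregating) can be illustrated with a short sketch. The order records, the column names and the band boundaries are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical order lines taken from the warehouse.
orders = [
    {"customer": "c1", "quantity": 2, "unit_price": 15.0},
    {"customer": "c1", "quantity": 1, "unit_price": 120.0},
    {"customer": "c2", "quantity": 3, "unit_price": 10.0},
]


def bin_total(total):
    """Bin an order total into a coarse size band (assumed boundaries)."""
    if total < 50:
        return "small"
    if total < 100:
        return "medium"
    return "large"


for order in orders:
    order["total"] = order["quantity"] * order["unit_price"]  # new column
    order["size_band"] = bin_total(order["total"])            # binning

# Aggregation: total spend per customer.
spend_per_customer = defaultdict(float)
for order in orders:
    spend_per_customer[order["customer"]] += order["total"]
```

Each business question tends to need its own combination of derived columns, bins and aggregates, which is why this second set of transformations is the harder one to support.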

3) Scalability of Data Mining Algorithms - With a site such as Yahoo, which has over 1.2 billion page views a day, the large amount of data raises significant scalability issues:

• Given the large amount of data gathered from the website, the data mining algorithm must be able to process it in a reasonable time, which is difficult because the algorithms can scale nonlinearly.

• The models that are generated tend to be too complicated for individuals to understand and interpret.

4) Make Data Mining Models Comprehensible to Business Users - The results of data mining should be clearly understood by business users, from the merchandisers who make decisions, to the creative designers who design the sites, to the marketers who spend advertising money. The challenge is to design and define extra model types and a strategic way to present them to business users. What regression models can we come up with, and how can we present them? (Even linear regression is usually hard for business users to understand.) How can we present nearest-neighbour models, for example? How can we present the results of association rule algorithms without overwhelming users with tens of thousands of rules?

5) Support Slowly Changing Dimensions - The demographic attributes of visitors change: they may get married, their salaries or income may increase, their children grow up quickly, and the needs on which the model is based change. Product attributes also change: new choices may become available, the design and packaging of the product or service may change, and quality may improve or degrade. Attributes that change over time in this way are often known as “Slowly Changing Dimensions”. The main challenge here is to keep track of those changes and, in the same vein, to support the identified changes in the analysis.
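One common warehousing approach to this problem, often called a Type 2 slowly changing dimension, keeps every past version of a record together with the dates on which it was valid. The article does not prescribe this technique; the sketch below is an illustration, and the attribute names and dates are assumptions.

```python
from datetime import date


def apply_change(history, customer_id, new_attrs, effective):
    """Close the customer's current row and append a new version of it."""
    for row in history:
        if row["customer_id"] == customer_id and row["valid_to"] is None:
            if all(row.get(key) == value for key, value in new_attrs.items()):
                return history                    # nothing actually changed
            row["valid_to"] = effective           # close the old version
            history.append({**row, **new_attrs,
                            "valid_from": effective, "valid_to": None})
            return history
    # First time this customer is seen.
    history.append({"customer_id": customer_id, **new_attrs,
                    "valid_from": effective, "valid_to": None})
    return history


def as_of(history, customer_id, day):
    """Return the version of the customer that was valid on `day`, if any."""
    for row in history:
        if (row["customer_id"] == customer_id
                and row["valid_from"] <= day
                and (row["valid_to"] is None or day < row["valid_to"])):
            return row
    return None


history = []
apply_change(history, "c1",
             {"marital_status": "single", "income_band": "lower"},
             date(2014, 1, 1))
apply_change(history, "c1", {"marital_status": "married"}, date(2015, 6, 1))
```

Because old versions are kept, an analysis of purchases made in 2014 can use the attributes the customer had at that time rather than the current ones.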

6) Make Data Transformation and Model Building Accessible to Business Users - Providing definite answers to the questions of individual business users requires data transformations, along with a technical understanding of the tools used in the analysis. Many commercial report designers and online analytical processing (OLAP) tools are hard for business users to understand. In this case, two preferred solutions are:
  • Provision of templates (e.g. online analytical processing cubes and recommended transformations for mining) for the expected questions, and
  • Provision of experts via consultation or even a service organization.
The challenge is basically to find a way to empower business users so that they can analyze the information themselves without any hiccups.
Copyright © 2015 by authors and Scientific Research Publishing Inc.
This work is licensed under the Creative Commons Attribution International License (CC BY).