Data Mining

What is Data Mining?

Meaning of Data Mining

Data mining is the activity of analysing data from different perspectives and summarising it into meaningful information that can be used to increase sales and revenue, reduce costs, or both. It is also known as 'knowledge discovery' or 'data discovery'. The analytical tools used for this analysis are called data mining software. With such software, a user can analyse data from many angles, group it into different categories and identify the relationships it contains. In technical terms, data mining is the process of finding trends and correlations among the many fields of a large relational database.
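As a minimal sketch of what "finding a correlation between fields" can mean, the Python snippet below computes the Pearson correlation coefficient between two columns of a small table. The records and the field names (`ad_spend`, `sales`) are invented for illustration and do not come from any real database.

```python
from math import sqrt

# Hypothetical records; field names and values are illustrative only.
records = [
    {"ad_spend": 10, "sales": 25},
    {"ad_spend": 20, "sales": 45},
    {"ad_spend": 30, "sales": 65},
    {"ad_spend": 40, "sales": 85},
]

def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

r = pearson([row["ad_spend"] for row in records],
            [row["sales"] for row in records])
print(round(r, 3))
```

A coefficient close to 1 or -1 suggests a strong linear relationship between the two fields, while a value near 0 suggests none. Real data mining tools search many field combinations rather than one hand-picked pair.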

Definition of Data Mining

According to William J Frawley, Gregory Piatetsky-Shapiro and Christopher J Matheus :
"Data Mining, or Knowledge Discovery in Databases (KDD), is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. This encompasses a number of different technical approaches, such as clustering, data summarization, learning classification rules, finding dependency networks, analyzing changes, and detecting anomalies".

According to Marcel Holsheimer & Arno Siebes (1994) :
"Data mining is the search for relationships and global patterns that exist in large databases but are 'hidden' among the vast amount of data, such as a relationship between patient data and their medical diagnosis. These relationships represent valuable knowledge about the database and the objects in the database and, if the database is a faithful mirror, of the real world registered by the database".

Data mining is chiefly concerned with the analysis of data and with the use of software to find trends and patterns in the available data. The computer is responsible for finding these trends and patterns by identifying the underlying rules and characteristics of the data.

Major Components of Data Mining

The components of a typical data mining system are shown in the figure and explained below :

Diagram of data mining

1) Database, Data Warehouse, or Other Information Repository : 
This component consists of one or more databases, data warehouses, spreadsheets or other kinds of information repositories. Data cleaning and data integration techniques may be applied to the data.
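The following is a minimal sketch of what data cleaning and data integration might look like on a tiny scale. The two sources (`crm_rows`, `sales_rows`), their fields and the cleaning rules are assumptions made for this example, not a prescribed procedure.

```python
# Hypothetical inputs: two sources describing the same customers.
crm_rows = [
    {"customer_id": 1, "name": " Alice ", "city": "pune"},
    {"customer_id": 2, "name": "Bob", "city": None},
    {"customer_id": 2, "name": "Bob", "city": None},  # duplicate record
]
sales_rows = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 80.0},
    {"customer_id": 1, "amount": 30.0},
]

def clean(rows):
    """Trim names, normalise city casing, fill missing cities, drop duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        if row["customer_id"] in seen:
            continue  # keep only the first record per customer
        seen.add(row["customer_id"])
        cleaned.append({
            "customer_id": row["customer_id"],
            "name": row["name"].strip(),
            "city": (row["city"] or "unknown").title(),
        })
    return cleaned

def integrate(customers, sales):
    """Attach each customer's total sales amount (a simple join on customer_id)."""
    totals = {}
    for sale in sales:
        cid = sale["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + sale["amount"]
    return [dict(c, total_sales=totals.get(c["customer_id"], 0.0)) for c in customers]

repository = integrate(clean(crm_rows), sales_rows)
for row in repository:
    print(row)
```

Cleaning runs before integration here so that the join operates on one consistent record per customer.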

2) Database or Data Warehouse Server : 
The database or data warehouse server is responsible for fetching the relevant data, based on the user's data mining request.

3) Knowledge Base : 
The knowledge base holds the domain knowledge that is used to guide the search or to evaluate how interesting the resulting trends are. Such knowledge can include concept hierarchies, which organise attributes or attribute values into different levels of abstraction.
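A concept hierarchy can be sketched as a set of mappings from a lower abstraction level to a higher one. The example below assumes a hypothetical "location" attribute with the levels city, state and country, and rolls sales figures up to a chosen level. The mappings and the sales values are illustrative only.

```python
# Hypothetical concept hierarchy for a "location" attribute: city -> state -> country.
CITY_TO_STATE = {"Mumbai": "Maharashtra", "Pune": "Maharashtra", "Jaipur": "Rajasthan"}
STATE_TO_COUNTRY = {"Maharashtra": "India", "Rajasthan": "India"}

def generalize(city, level):
    """Map a city to the requested abstraction level of the hierarchy."""
    if level == "city":
        return city
    state = CITY_TO_STATE[city]
    if level == "state":
        return state
    if level == "country":
        return STATE_TO_COUNTRY[state]
    raise ValueError(f"unknown level: {level}")

def roll_up(sales, level):
    """Sum sales amounts after generalizing each city to the given level."""
    totals = {}
    for city, amount in sales:
        key = generalize(city, level)
        totals[key] = totals.get(key, 0) + amount
    return totals

sales = [("Mumbai", 100), ("Pune", 50), ("Jaipur", 70)]
print(roll_up(sales, "state"))
print(roll_up(sales, "country"))
```

The same sales data yields different summaries depending on the abstraction level chosen, which is the role a concept hierarchy plays in guiding the search.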

4) Data Mining Engine : 
The data mining engine is essential to the data mining system. It consists of a set of functional modules for tasks such as characterization, association, classification, cluster analysis, and deviation and evolution analysis.

5) Pattern Evaluation Module : 
The pattern evaluation module typically applies interestingness measures and interacts with the data mining modules so that the search is focused on interesting trends and patterns. Interestingness thresholds can be used to filter the trends that are discovered. To confine the search to interesting trends only, it is important to push the evaluation of interestingness as deep as possible into the data mining process.
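Threshold-based filtering of patterns can be sketched as follows. The example assumes that support and confidence serve as the measures of interest, which is a common choice for association rules but is not specified in the text above. The candidate rules and their values are invented for illustration.

```python
# Hypothetical candidate patterns produced by some mining step:
# (antecedent, consequent, support, confidence). Values are illustrative.
candidate_rules = [
    ("bread", "butter", 0.30, 0.75),
    ("bread", "jam", 0.05, 0.40),
    ("milk", "cereal", 0.20, 0.55),
    ("tea", "sugar", 0.02, 0.90),
]

def filter_interesting(rules, min_support, min_confidence):
    """Keep only rules that meet both thresholds."""
    return [
        rule for rule in rules
        if rule[2] >= min_support and rule[3] >= min_confidence
    ]

interesting = filter_interesting(candidate_rules, min_support=0.10, min_confidence=0.50)
for antecedent, consequent, support, confidence in interesting:
    print(f"{antecedent} -> {consequent} (support={support}, confidence={confidence})")
```

This sketch filters after the patterns have been generated. Applying the same thresholds during mining, so that weak candidates are never generated, is what pushing the evaluation deeper into the process refers to.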

6) Graphical User Interface : 
The Graphical User Interface (GUI) handles communication between the user and the data mining system. Through the GUI, users can specify a data mining query or task, provide information that helps narrow the search, and carry out exploratory data mining based on intermediate results. The GUI also lets users evaluate mined trends, visualise them in different forms, and browse database and data warehouse schemas or data structures.

Need/Role of Data Mining in Business

Data mining is essential to many organisations for the following reasons :

1) Operational : 
Data mining helps a business organisation run its operations without hindrance. It does so by monitoring overall operational activity and correcting the mistakes that are identified. The information derived from this process helps ensure a high level of expertise and productivity.

2) Decisional : 
Data mining helps managers make critical decisions based on both current and historical data. Input data about customers, such as geographical or sales data, can support both long-term objectives and short-term adjustments.

3) Informational : 
The information that different individuals need can be delivered in customised formats at exactly the time it is required.
Examples of such information include office locations, company profiles, training materials, organisational structure, service information and organisational policies.

4) Specific Applications : 
Data mining can also be applied to specific business problems. For example, it can be implemented as "a model for forecasting the consumer behaviour (For example, the probability of satisfaction of customers) depending upon the past data related to the communication with certain organisation". Such a model helps estimate how likely a customer is to interact with the business organisation, so that suitable changes can be made.

Advantages of Data Mining 

The main advantages of data mining are described below :

1) Automated Forecasting of Trends and Behaviors :
Data mining automates the process of finding predictive information in a huge database. Questions that would otherwise require extensive hands-on analysis can be answered quickly from the data itself.

2) Automated Determination of Earlier Unknown Trends : 
Data mining tools sweep through the entire database and identify previously unknown trends in a single step.

3) Extensive Depth and Breadth of Database : 
A database can have a very large number of rows and columns. In hands-on analysis, time is limited, so the analyst must restrict the number of variables examined. As a result, variables that are dropped because they seem insignificant may carry information about hidden patterns.
High-performance data mining methods let users explore the full depth of a database without first choosing a subset of variables. They can also work with large samples (more rows). Larger samples yield lower estimation errors and variance, and allow users to draw conclusions about small but important population segments.
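The link between sample size and variance can be illustrated with a small simulation. The sketch below draws repeated samples from an assumed normal population (mean 50, standard deviation 10, both arbitrary choices) and measures how much the sample mean varies for a small and a large sample size.

```python
import random
import statistics

def spread_of_sample_means(sample_size, trials=200, seed=0):
    """Estimate how much the sample mean varies across repeated samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(trials):
        sample = [rng.gauss(50, 10) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return statistics.stdev(means)

small = spread_of_sample_means(25)
large = spread_of_sample_means(2500)
print(f"n=25:   spread of the sample mean ~ {small:.2f}")
print(f"n=2500: spread of the sample mean ~ {large:.2f}")
```

Under these assumptions, theory gives a spread of roughly the population standard deviation divided by the square root of the sample size, so the estimate from 2500 rows should vary about ten times less than the estimate from 25 rows.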

Disadvantages of Data Mining

The main limitations and disadvantages of data mining are described below :

1) Privacy : 
Privacy has been widely discussed in recent times, and the rapid growth and reach of the internet have made the issue critical. Privacy is a central concern in online shopping. Customers worry that their personal information may be accessed without authorisation and used to harm them. Customers can also be harmed when their personal information is sold, because they do not know how other organisations will use it.

2) Security : 
Although a large amount of personal information about customers is available online, the security protecting it has many flaws. For example, hackers breached a database of the Experian credit reporting agency and obtained the addresses, social security numbers, account numbers and payment histories of almost 13,000 customers of Ford Motor Credit Company. The example shows that business organisations readily share and disclose personal information while neglecting its security. With so much information available, identity theft can become the biggest problem.

3) Misuse of Information/Inaccurate Information : 
Trends identified through data mining for marketing or other ethical purposes can also be misused. Unethical organisations may try to exploit the information about individuals that data mining produces. In addition, data mining is not 100 percent accurate, so incorrect information obtained from it can have harmful consequences.

Application of Data Mining

Data mining techniques are applied in a variety of fields. Some of these areas are described below :

1) Retail/Marketing :
  • Identifying customers' buying behaviour.
  • Determining relationships among the demographic characteristics of customers.
  • Estimating the success of various marketing campaigns. 
  • Market basket analysis.
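Market basket analysis looks for items that tend to be bought together. As a minimal sketch, the snippet below counts item pairs across a handful of invented transactions and computes support and confidence for one rule. The baskets and the helper `rule_stats` are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; items are illustrative only.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
    {"milk", "bread"},
]

# How often each item, and each unordered pair of items, appears in a basket.
item_counts = Counter(item for basket in transactions for item in basket)
pair_counts = Counter(
    pair for basket in transactions for pair in combinations(sorted(basket), 2)
)

def rule_stats(antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    pair = tuple(sorted((antecedent, consequent)))
    both = pair_counts[pair]
    support = both / len(transactions)          # share of baskets with both items
    confidence = both / item_counts[antecedent]  # share of antecedent baskets with consequent
    return support, confidence

support, confidence = rule_stats("bread", "butter")
print(f"bread -> butter: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data, bread and butter appear together in 3 of 5 baskets, and butter appears in 3 of the 4 baskets that contain bread. A retailer could use rules like this to decide which products to place or promote together.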

2) Banking :
  • Detecting fraudulent use of credit cards.
  • Identifying loyal customers.
  • Predicting which customers are likely to change their credit card affiliation.
  • Predicting customers' credit card spending.
  • Finding hidden relationships between financial factors.
  • Deriving stock trading rules from past market data.
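Detecting fraudulent card use is often approached as a search for unusual transactions. The sketch below flags amounts that lie far from a customer's typical spending using a simple z-score rule. The amounts, the threshold of 2.5 and the function `flag_outliers` are assumptions for illustration, and real fraud detection systems use far richer signals.

```python
import statistics

# Hypothetical card transaction amounts for one customer; values are illustrative.
amounts = [20, 25, 22, 30, 18, 24, 27, 21, 26, 23, 500]

def flag_outliers(values, threshold=2.5):
    """Return values whose distance from the mean exceeds threshold standard deviations."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

flagged = flag_outliers(amounts)
print(flagged)
```

A flagged transaction is only a candidate for review. As the section on disadvantages notes, data mining results are not fully accurate, so a legitimate large purchase could be flagged as well.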

3) Insurance and Health Care :
  • Identifying medical procedures that are claimed together (claim analysis).
  • Predicting which customers will purchase new policies.
  • Identifying patterns of risky customer behaviour.
  • Detecting fraudulent practices.

4) Transportation :
  • Determining distribution schedules among outlets.
  • Analysing loading patterns.

5) Medicine :
  • Characterising patient behaviour to predict office visits.
  • Identifying effective medical therapies for various diseases.