Changing environmental conditions, a globalized world in which borders are disappearing, and new marketing and R&D (research and development) methods make the importance of “information”, as opposed to “data”, more apparent every day. The growth of the internet and the ease of using it paradoxically make it harder for R&D teams to reach “information”, because so much raw data has to be sifted first. Searches run through internet search engines often return results that differ from what was wanted. In medical research, information is obtained by analyzing and interpreting the data that the research produces. A large retailer that can identify customer trends from billing information and shape its marketing tactics accordingly will be able to outperform its competitors. What these examples emphasize is the process of turning “data” into “information”. The process of analyzing data with suitable methods, interpreting the results through the eyes of an expert, and making predictions about the future from past data can be referred to as data mining.
What Is Data Mining?
To give a simple definition, data mining is the work of extracting information from large-scale data; in other words, mining for information. In another sense, it is the use of a computer program to search large volumes of data for correlations that allow us to make predictions about the future. Because the term “data mining” can be misleading, equivalent terms have also entered the literature: knowledge mining from databases, knowledge extraction, data/pattern analysis, data archaeology, etc.
The most popular of these terms is Knowledge Discovery in Databases (KDD). Alternatively, data mining is regarded as just one step of the knowledge discovery process. The steps of that process are:
1-) Data cleaning (removing noisy and inconsistent data)
2-) Data integration (combining multiple data sources)
3-) Data selection (determining the data relevant to the analysis to be performed)
4-) Data transformation (converting the data into a form that data mining techniques can use)
5-) Data mining (applying intelligent methods to capture data patterns)
6-) Pattern evaluation (identifying the interesting patterns that represent knowledge, according to some measures)
7-) Knowledge presentation (presenting the mined knowledge to the user)
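The seven steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions, not a reference implementation: the billing records, the function names, and the 0.5 support threshold are all hypothetical, and counting item pairs that are bought together stands in for the "intelligent methods" of the data mining step.

```python
from collections import Counter
from itertools import combinations

# Hypothetical billing records from two stores; None marks a noisy record.
store_a = [
    {"invoice": 1, "items": ["bread", "milk"]},
    {"invoice": 2, "items": ["bread", "butter", "milk"]},
    {"invoice": 3, "items": None},
]
store_b = [
    {"invoice": 4, "items": ["milk", "bread"]},
    {"invoice": 5, "items": ["butter"]},
]

def clean(records):
    """Step 1: drop records whose item list is missing or empty."""
    return [r for r in records if r["items"]]

def integrate(*sources):
    """Step 2: combine several data sources into one list."""
    return [r for source in sources for r in source]

def select(records):
    """Step 3: keep only the field relevant to the analysis."""
    return [r["items"] for r in records]

def transform(baskets):
    """Step 4: turn each basket into a sorted tuple of unique items."""
    return [tuple(sorted(set(b))) for b in baskets]

def mine(baskets):
    """Step 5: count how often each pair of items appears together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(basket, 2))
    return counts

def evaluate(pair_counts, n_baskets, min_support=0.5):
    """Step 6: keep pairs whose support (share of baskets) meets the threshold."""
    return {
        pair: count / n_baskets
        for pair, count in pair_counts.items()
        if count / n_baskets >= min_support
    }

def present(patterns):
    """Step 7: format the surviving patterns for the user."""
    return [f"{a} + {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]

baskets = transform(select(integrate(clean(store_a), clean(store_b))))
patterns = evaluate(mine(baskets), len(baskets))
report = present(patterns)
print("\n".join(report))
```

With this toy data only the bread and milk pair survives the evaluation step, because it occurs in three of the four valid baskets.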
The data mining step interacts with the user and with the knowledge base. Interesting patterns are presented to the user and, if desired, can also be stored in the knowledge base. The data mining process continues in this way until the hidden patterns are found. A data mining system has the following basic components:
* Database, data warehouse, or other data repository
* Database or data warehouse server
* Knowledge Base
* Data Mining Engine
* Pattern Evaluation
* User Interface
Data mining is the extraction of implicit, previously unknown, but potentially useful information from the data at hand. It covers a number of technical approaches, such as clustering, data summarization, change analysis, and deviation detection.
In other words, data mining is the semi-automatic exploration of patterns, relationships, changes, irregularities, rules, and statistically significant structures in data.
Basically, data mining is about data analysis and the use of software techniques to find patterns and regularities in data sets. The computer is responsible for determining the relationships, rules, and properties in the data. The goal is to detect patterns that have not been detected before.
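As an illustration of one of the approaches named above, deviation detection, here is a minimal sketch. It assumes a simple z-score rule (flag values more than two standard deviations from the mean); the function name `find_deviations`, the threshold, and the daily invoice totals are hypothetical, and real data mining systems may use quite different methods.

```python
import statistics

def find_deviations(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values are identical, so nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical daily invoice totals; one day is clearly unusual.
daily_totals = [102, 98, 101, 97, 103, 99, 100, 250]
outliers = find_deviations(daily_totals)
print(outliers)
```

Here the computer finds the unusual value, but deciding whether it is an error or a meaningful event is still left to an expert.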
Data mining can be seen as a collection of statistical methods, but it differs from traditional statistics in several respects. In data mining, the goal is to extract qualitative models that can easily be translated into logical rules or visual presentations. In this sense data mining is human-centered, and it is sometimes combined with a human-computer interface. The field of data mining also draws on foundations such as statistics, machine learning, databases, and high-performance computing.
In data mining, the word “large” in “large data” refers to data sets that are too large to fit into the memory of a single workstation. High-volume data means more data than fits on the disks of a single workstation or of a group of workstations. Distributed data, on the other hand, describes data located in different geographical locations.
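When a data set does not fit into memory, one common way to work with it is to process it in chunks. The sketch below assumes that approach and is not a prescribed method: the helper names `read_in_chunks` and `streaming_mean` and the chunk size are hypothetical, and a generator stands in for a data source that is too large to load at once.

```python
from itertools import islice

def read_in_chunks(records, chunk_size):
    """Yield successive lists of at most chunk_size records from any iterable."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk

def streaming_mean(records, chunk_size=1000):
    """Compute the mean while holding only one chunk in memory at a time."""
    total = 0.0
    count = 0
    for chunk in read_in_chunks(records, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    return total / count if count else None

# The generator produces values on demand instead of storing them all.
values = (x for x in range(1, 1_000_001))
result = streaming_mean(values, chunk_size=10_000)
print(result)
```

The same pattern applies when the chunks come from a file or a database cursor; only a running total and a count are kept between chunks.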