Management for All: DATA MINING PROJECT

Wednesday, July 23, 2014

DATA MINING PROJECT

Step 1: Define Business Objectives- This step is similar to any information system project. First of all, determine whether a data mining solution is really needed. State the objectives. Are we looking to improve our direct marketing campaigns? Do we want to detect fraud in credit card usage? Are we looking for associations between products that sell together? In this step, define expectations. Express how the final results will be presented and used.

Step 2: Prepare Data- This step consists of data selection, data preprocessing, and data transformation. Select the data to be extracted from the data warehouse, using the business objectives to determine what data is needed. Include appropriate metadata about the selected data. Select the appropriate data mining technique(s) and algorithm(s); the choice of mining algorithm also influences which data must be selected.

If the data is extracted from the data warehouse, it can be assumed to be already cleansed; otherwise, preprocessing may be required to cleanse it. Preprocessing could also involve enriching the selected data with external data. In the preprocessing sub-step, remove noisy data, that is, data blatantly out of range, and ensure that there are no missing values.
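
The preprocessing sub-step can be sketched roughly as follows. This is a minimal illustration, not a prescribed method: the field names (`age`, `annual_spend`), the `VALID_RANGES` bounds, and the `preprocess` helper are all hypothetical, and records are assumed to be plain Python dictionaries. Here a record is treated as noisy if any checked field falls outside its assumed range, and as incomplete if any checked field is missing.

```python
# Hypothetical valid ranges per field; real bounds come from the business domain.
VALID_RANGES = {
    "age": (18, 100),
    "annual_spend": (0.0, 1_000_000.0),
}


def preprocess(records):
    """Keep only records with no missing values and no out-of-range values."""

    def is_clean(record):
        for field, (low, high) in VALID_RANGES.items():
            value = record.get(field)
            if value is None:  # missing value
                return False
            if not low <= value <= high:  # noisy: blatantly out of range
                return False
        return True

    return [record for record in records if is_clean(record)]


# Illustrative records only.
raw = [
    {"age": 34, "annual_spend": 1200.0},
    {"age": 250, "annual_spend": 800.0},  # out of range
    {"age": 41, "annual_spend": None},    # missing
    {"age": 29, "annual_spend": 560.0},
]
clean = preprocess(raw)
print(clean)  # [{'age': 34, 'annual_spend': 1200.0}, {'age': 29, 'annual_spend': 560.0}]
```

Dropping incomplete records is the bluntest way to guarantee that no missing values remain; filling them in (imputation) is a common alternative when discarding rows would lose too much data.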


Step 3: Perform Data Mining- This is the crucial step. The knowledge discovery engine applies the selected algorithm to the prepared data, and the output is a set of relationships or patterns. This step and the next step of evaluation may be performed iteratively: after an initial evaluation, the data may need to be adjusted and this step redone. The duration and intensity of this step depend on the type of data mining application. If the database is being segmented, few iterations are needed. If a predictive model is being created, the models are repeatedly set up and tested with sample data before being tested against the real database.
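
As an illustration of the segmentation case, the sketch below applies k-means, one common segmentation algorithm (chosen here for illustration; the text does not name a specific algorithm), to a single numeric attribute. The function name `kmeans_1d` and the spend figures are hypothetical. The loop stops as soon as cluster assignments stop changing, which for well-separated data happens after only a couple of passes.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means on one numeric attribute; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic start: centroids spread across the sorted values.
    centroids = [ordered[i * len(ordered) // k] for i in range(k)]
    labels = [-1] * len(values)

    def nearest(value):
        # Index of the centroid closest to this value.
        return min(range(k), key=lambda c: abs(value - centroids[c]))

    for _ in range(max_iter):
        new_labels = [nearest(v) for v in values]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels


# Hypothetical annual spend per customer.
spend = [120.0, 150.0, 130.0, 900.0, 950.0, 1000.0]
centroids, labels = kmeans_1d(spend, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

A predictive model would instead be fitted and scored repeatedly on sample data, which is why that case typically needs many more iterations.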

Step 4: Evaluate Results- The aim is to discover interesting patterns or relationships that help in the understanding of customers, products, profits, and markets. In the selected data, there are potentially many patterns or relationships. In this step, all the resulting patterns are examined, and a filtering mechanism is applied so as to select only the promising patterns for presentation and use.
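
For association patterns such as products that sell together, one widely used filtering mechanism is to keep only rules that meet minimum support and confidence thresholds. The sketch below assumes that mechanism; the helper `promising_rules`, the threshold values, and the basket data are all hypothetical. Support is the fraction of baskets containing both items, and confidence is the fraction of baskets containing the antecedent that also contain the consequent.

```python
from collections import Counter
from itertools import combinations


def promising_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) rules passing both thresholds."""
    n = len(transactions)
    item_counts = Counter(item for basket in transactions for item in basket)
    pair_counts = Counter(
        pair for basket in transactions for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue  # filter out rare pairs
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    # Most reliable rules first.
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


# Hypothetical market baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]
for rule in promising_rules(baskets, min_support=0.4, min_confidence=0.7):
    print(rule)
# ('butter', 'bread', 0.6, 1.0)
# ('bread', 'butter', 0.6, 0.75)
```

Of the seven item pairs in the data, only bread and butter survive the filter; the thresholds are the knobs that decide how many patterns reach presentation.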

Step 5: Present Discoveries- Discovered patterns and associations may be presented through visual navigation, charts, graphs, or free-form text. Presentation also includes storing interesting discoveries in the knowledge base for repeated use.
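
Storing discoveries for repeated use can be as simple as appending them to a persistent store. The sketch below uses a JSON file as a stand-in for the knowledge base; the file name, the `store_discoveries` helper, and the `id`/`pattern`/`confidence` fields are hypothetical assumptions, not a standard format. Entries whose id is already stored are skipped, so re-running the mining step does not duplicate known patterns.

```python
import json
import tempfile
from pathlib import Path


def store_discoveries(kb_path, discoveries):
    """Append new discoveries to a JSON file, skipping ids already stored."""
    stored = json.loads(kb_path.read_text()) if kb_path.exists() else []
    known_ids = {entry["id"] for entry in stored}
    for discovery in discoveries:
        if discovery["id"] not in known_ids:
            stored.append(discovery)
            known_ids.add(discovery["id"])
    kb_path.write_text(json.dumps(stored, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    kb = Path(tmp) / "knowledge_base.json"  # hypothetical file name
    store_discoveries(kb, [{"id": "R1", "pattern": "butter -> bread", "confidence": 1.0}])
    # A later run re-discovers R1 and finds R2; only R2 is added.
    store_discoveries(kb, [
        {"id": "R1", "pattern": "butter -> bread", "confidence": 1.0},
        {"id": "R2", "pattern": "bread -> butter", "confidence": 0.75},
    ])
    stored_ids = [entry["id"] for entry in json.loads(kb.read_text())]

print(stored_ids)  # ['R1', 'R2']
```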

Step 6: Ensure Usage of Discoveries- The goal of any data mining operation is to understand the business, discern new patterns and possibilities, and also turn this understanding into actions. This step is for using the results to create actionable items in the business. The results of the discovery are disseminated so that action can be taken to improve the business.

Selecting Data Mining Software Tools

Before we get into a detailed list of criteria for selecting data mining tools, let us make a few general but important observations about tool selection.

• The tool must integrate well with the data warehouse environment: it must accept data from the warehouse and be compatible with the overall metadata framework.

• The patterns and relationships discovered must be as accurate as possible. Discovering erroneous patterns is more dangerous than not discovering any patterns at all.

• In most cases, an explanation of how the model works and how the results were produced is required. The tool must be able to explain the rules and how the patterns were discovered.

Let us now analyse a list of criteria for evaluating data mining tools. The list is by no means exhaustive, but it covers the essential points.

Data Access: The data mining tool must be able to access data sources including the data warehouse and quickly bring over the required datasets to its environment. On many occasions data from other sources may be needed to augment the data extracted from the data warehouse. The tool must be capable of reading other data sources and input formats.

Data Selection: While selecting and extracting data for mining, the tool must be able to perform its operations according to a variety of criteria. Selection abilities must include filtering out unwanted data and deriving new data items from existing ones.
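
The two selection abilities named above, filtering out unwanted data and deriving new items, can be sketched as follows. This is an illustration under assumed data, not any particular tool's interface: the `orders` records, their field names, and the `select_for_mining` helper are hypothetical. Rows outside the chosen region or date window are dropped, and each kept row gains two derived items.

```python
from datetime import date

# Hypothetical order records extracted from the warehouse.
orders = [
    {"customer": "C1", "region": "EU", "order_date": date(2014, 3, 2), "quantity": 3, "unit_price": 20.0},
    {"customer": "C2", "region": "US", "order_date": date(2014, 5, 9), "quantity": 1, "unit_price": 99.0},
    {"customer": "C3", "region": "EU", "order_date": date(2013, 11, 20), "quantity": 2, "unit_price": 15.0},
]


def select_for_mining(rows, region, since):
    """Filter rows by region and date, then add derived data items."""
    selected = []
    for row in rows:
        if row["region"] != region or row["order_date"] < since:
            continue  # filter out unwanted data
        derived = dict(row)  # copy so the source rows stay untouched
        derived["order_value"] = row["quantity"] * row["unit_price"]  # derived item
        derived["order_month"] = row["order_date"].month  # derived item
        selected.append(derived)
    return selected


subset = select_for_mining(orders, region="EU", since=date(2014, 1, 1))
print(subset)
```

Only customer C1 passes both criteria here, and its derived `order_value` is 3 × 20.0 = 60.0.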

Sensitivity to Data Quality: Because of its importance, data quality is worth mentioning again. The tool must be able to recognize missing or incomplete data and compensate for the problem. The tool must also be able to produce error reports.
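
One simple way a tool might recognize missing values, compensate for them, and report what it did is mean imputation with a running error log. Mean imputation is an assumption chosen for brevity (many tools offer other strategies), and the `impute_with_report` helper, field names, and records are hypothetical.

```python
from statistics import mean


def impute_with_report(rows, fields):
    """Fill missing numeric fields with the column mean and log each repair."""
    report = []
    # Column means computed from the values that are present.
    means = {}
    for field in fields:
        present = [row[field] for row in rows if row.get(field) is not None]
        means[field] = mean(present) if present else None

    repaired = []
    for i, row in enumerate(rows):
        fixed = dict(row)
        for field in fields:
            if fixed.get(field) is None:
                if means[field] is None:
                    report.append(f"row {i}: '{field}' missing, no values to impute from")
                else:
                    fixed[field] = means[field]
                    report.append(f"row {i}: '{field}' missing, imputed mean {means[field]:.2f}")
        repaired.append(fixed)
    return repaired, report


# Illustrative records with gaps.
rows = [
    {"age": 30, "income": 40000.0},
    {"age": None, "income": 52000.0},
    {"age": 50, "income": None},
]
repaired, report = impute_with_report(rows, ["age", "income"])
for line in report:
    print(line)
# row 1: 'age' missing, imputed mean 40.00
# row 2: 'income' missing, imputed mean 46000.00
```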

Data Visualization: Data mining techniques process substantial data volumes and produce a wide range of results. Inability to display results graphically and diagrammatically diminishes the value of the tool severely.

Extensibility: The tool architecture must be able to integrate with the data warehouse administration and other functions such as data extraction and metadata management.

Performance: The tool must provide consistent performance irrespective of the amount of data to be mined, the specific algorithm applied, the number of variables specified, and the level of accuracy demanded.

Scalability: Data mining needs to work with large volumes of data to discover meaningful and useful patterns and relationships. Therefore, ensure that the tool scales up to handle huge data volumes.

Openness: This is a desirable feature. Openness refers to the ability to integrate with the environment and with other types of tools. It is desirable for the tool to connect to external applications so that users can access data mining algorithms from within those applications. The tool must be able to share its output with desktop tools such as graphical displays, spreadsheets, and database utilities. Openness must also include availability of the tool on leading server platforms.
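
Sharing output with spreadsheets usually comes down to exporting results in a plain interchange format such as CSV. The sketch below is an assumption-laden illustration: the rule tuples and the `rules_to_csv` helper are hypothetical, and it relies only on Python's standard `csv` module, writing to an in-memory buffer so the text could be saved or handed to another application.

```python
import csv
import io

# Hypothetical mined rules: (antecedent, consequent, support, confidence).
rules = [
    ("butter", "bread", 0.6, 1.0),
    ("bread", "butter", 0.6, 0.75),
]


def rules_to_csv(rules):
    """Serialize rules as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["antecedent", "consequent", "support", "confidence"])
    writer.writerows(rules)
    return buffer.getvalue()


csv_text = rules_to_csv(rules)
print(csv_text)
```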

Suite of Algorithms: A tool that provides several different algorithms is more advantageous than one that supports only a few.

Multi-technique Support: A data mining tool supporting more than one technique is worth consideration. The organization may not presently need a composite tool with many techniques, but a multi-technique tool opens up more possibilities. Moreover, many data mining analysts desire to cross-validate discovered patterns using several techniques.
