Categories
NSEC Blog Post

Data mining = Large data set + Algorithm?

Blog Written by Sumita Dasgupta

Email : dasguptasumita2005@gmail.com

(In remembrance of Algorithms + Data Structures = Programs, one of the most influential computer science books, written by Niklaus Wirth)

Data mining is today’s buzzword. We are accustomed to hearing, and believing, that in a business scenario a certain rule is considered true because it emerged from a data mining application as newly found knowledge. Data mining is a powerful technology with great potential to discover information in large data sets that queries and reports alone cannot reveal. Can data mining technology be described, at least conceptually, as the application of efficient algorithms to a large data set?

“Water, water, everywhere,

Nor any drop to drink.”

The Rime of the Ancient Mariner : Samuel Taylor Coleridge

The drop in the price of data storage has given us a tremendous resource: the amount of raw data stored in databases worldwide is exploding. From trillions of point-of-sale transactions and credit card purchases to pixel-by-pixel images of galaxies, databases are now measured in terabytes and petabytes. (One terabyte is one trillion bytes; one petabyte is one quadrillion bytes!) These huge sets of data are stored in data warehouses, which consolidate data located in distributed databases. A warehouse stores large quantities of data by specific categories so that users can more easily retrieve, interpret, and sort it. But this is only the easier part of the story: merely storing data in a data warehouse does little good by itself. The stored data is so vast that finding useful knowledge or information in it is beyond the comprehension of the human mind alone.

“Here comes the sun

And I say, it’s all right”

Here Comes the Sun: The Beatles

Data Mining, or Knowledge Discovery from Data (KDD), is the algorithm-dominated process of digging through and analyzing enormous sets of data and then extracting the meaning of the data. The results produced by these algorithms (the data mining tools) predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. Powerful algorithms are used in all the basic steps of data mining: data cleaning (removing noise and inconsistent data), data integration (combining multiple data sources), data selection (retrieving the data relevant to the analysis task from the database), data transformation (transforming and consolidating data into forms appropriate for mining by performing summary or aggregation operations), data mining (the essential process, where intelligent methods are applied to extract data patterns), pattern evaluation (identifying the truly interesting patterns representing knowledge, based on interestingness measures), and knowledge presentation (using visualization and knowledge representation to present the mined knowledge to users). All the technologies that have lent a hand to data mining, such as Statistical Analysis, Artificial Intelligence, Machine Learning, Visualization, Pattern Recognition and High Speed Computing, are algorithmic in nature.
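To make the seven steps a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy shopping baskets, the function names, and the choice of "pairs of items bought together" as the pattern being mined, with support (the share of baskets containing the pair) standing in for an interestingness measure. Real data mining tools are far more elaborate; this only shows how each step can be an algorithm applied to data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw data from two sources; None marks a missing record.
store_a = [["bread", "milk"], ["bread", "butter", "milk"], None]
store_b = [["milk", "bread", "bread"], ["butter"], ["bread", "milk", "eggs"]]


def clean(transactions):
    """Data cleaning: drop missing records and repeated items in a basket."""
    return [sorted(set(t)) for t in transactions if t]


def integrate(*sources):
    """Data integration: combine several sources into one list of baskets."""
    return [t for source in sources for t in source]


def select(transactions, min_items=2):
    """Data selection: keep only baskets large enough to contain a pair."""
    return [t for t in transactions if len(t) >= min_items]


def transform(transactions):
    """Data transformation: turn each basket into its list of item pairs."""
    return [list(combinations(t, 2)) for t in transactions]


def mine(pair_lists):
    """Data mining: count how many baskets contain each pair."""
    counts = Counter()
    for pairs in pair_lists:
        counts.update(pairs)
    return counts


def evaluate(counts, n_baskets, min_support=0.5):
    """Pattern evaluation: keep pairs whose support reaches the threshold."""
    return {
        pair: count / n_baskets
        for pair, count in counts.items()
        if count / n_baskets >= min_support
    }


def present(patterns):
    """Knowledge presentation: print the surviving patterns, best first."""
    for pair, support in sorted(patterns.items(), key=lambda kv: -kv[1]):
        print(f"{pair[0]} + {pair[1]}: bought together in {support:.0%} of baskets")


def run_pipeline(*sources):
    """Chain the steps in the order the paragraph above lists them."""
    baskets = select(integrate(*(clean(s) for s in sources)))
    return evaluate(mine(transform(baskets)), len(baskets))


if __name__ == "__main__":
    present(run_pipeline(store_a, store_b))
```

With this toy data, only the bread and milk pair survives the evaluation step, since it is the only pair appearing in at least half of the selected baskets.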

“There are more things in heaven and earth, Horatio,

Than are dreamt of in your philosophy.”

Hamlet (1.5.167-8) : William Shakespeare

The user plays an important role in the data mining process. In interactive mining, a user first samples a set of data, explores its general characteristics and estimates the potential mining results. Interactive mining lets users change the focus of a search dynamically, exploring the organized data and knowledge space while mining. These are the areas where human domain knowledge and expertise are still needed. Similarly, background knowledge, constraints, rules, and other information about the domain under study must be incorporated into the knowledge discovery process. Such knowledge can be used for pattern evaluation as well as to guide the search toward interesting patterns. Because the human role is still actively needed at these points in data mining, tomorrow’s advances in Artificial Intelligence may give birth to new technologies that become an integral part of data mining in the future.
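As a rough illustration of how a user's constraints can steer pattern evaluation, the sketch below filters a set of already-mined patterns by a user-chosen item of interest and a user-chosen support threshold. The pattern values and the `filter_patterns` helper are hypothetical, invented only for this example; the point is that the same mined patterns give different answers as the user changes focus.

```python
# Hypothetical mined patterns: item pair -> support (share of baskets).
mined = {
    ("bread", "milk"): 0.8,
    ("bread", "butter"): 0.3,
    ("eggs", "milk"): 0.4,
}


def filter_patterns(patterns, must_include=None, min_support=0.0):
    """Keep patterns that satisfy the user's constraints.

    must_include: an item the user cares about (domain knowledge), or None.
    min_support:  the user's current threshold for "interesting".
    """
    return {
        items: support
        for items, support in patterns.items()
        if support >= min_support
        and (must_include is None or must_include in items)
    }


# First focus: the user is interested in anything involving milk.
milk_view = filter_patterns(mined, must_include="milk", min_support=0.35)

# Changed focus: the user relaxes the threshold and looks at butter instead.
butter_view = filter_patterns(mined, must_include="butter", min_support=0.2)
```

Here the algorithm does the filtering, but deciding that milk or butter is worth looking at, and how much support counts as interesting, is still the human's contribution.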

