
Association approaches address a class of problems typified by market-basket analysis. Classic market-basket analysis treats the purchase of a number of items (for example, the contents of a shopping basket) as a single transaction. The goal is to find trends across large numbers of transactions that can be used to understand and exploit natural buying patterns. This information can be used to adjust inventories, modify floor or shelf layouts, or introduce targeted promotional activities to increase overall sales or move specific products. While these approaches had their origins in the retail industry, they can be applied equally well to services that develop targeted marketing campaigns or determine common (or uncommon) practices. In the financial sector, association approaches can be used to analyze customers' account portfolios and identify sets of financial services that people often purchase together. These sets may be used, for example, to create a service "bundle" as part of a promotional sales campaign.



Association approaches often express the resultant item affinities in terms of confidence-rated rules, such as, "80 percent of all transactions in which beer was purchased also included potato chips." Confidence thresholds can typically be set to eliminate all but the most common trends. The results of the association analysis (for example, the attributes involved in the rules themselves) may trigger additional analysis.

Associations are written as A => B, where A is called the antecedent or left-hand side (LHS), and B is called the consequent or right-hand side (RHS). For example, in the association rule “If people buy a hammer then they buy nails,” the antecedent is “buy a hammer” and the consequent is “buy nails.” It’s easy to determine the proportion of transactions that contain a particular item or item set: simply count them. The frequency with which a particular association (e.g., the item set “hammers and nails”) appears in the database is called its support or prevalence. If, say, 15 transactions out of 1,000 contain both a hammer and nails, the support for this association would be 1.5%. A low level of support (say, one transaction out of a million) may indicate that the particular association isn’t very important, or it may indicate the presence of bad data (e.g., “male and pregnant”).
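The support calculation above can be sketched in a few lines of Python. This is only an illustration: the `support` function and the synthetic list of 1,000 transactions are assumptions made up to mirror the hammer-and-nails example, not part of any particular tool.

```python
def support(itemset, transactions):
    """Return the fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


# Hypothetical data: 1,000 baskets, 15 of which contain a hammer and nails.
transactions = (
    [{"hammer", "nails"}] * 15
    + [{"hammer"}] * 35
    + [{"bread", "milk"}] * 950
)

hammer_and_nails = support({"hammer", "nails"}, transactions)
print(f"support = {hammer_and_nails:.1%}")
```

Each transaction is modeled as a set, so testing whether a basket contains an item set is a simple subset check.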

To discover meaningful rules, however, we must also look at the relative frequency of occurrence of the items and their combinations. Given the occurrence of item A (the antecedent), how often does item B (the consequent) occur? That is, what is the conditional predictability of B, given A? Using the above example, this would mean asking “When people buy a hammer, how often do they also buy nails?” Another term for this conditional predictability is confidence. Confidence is calculated as a ratio: (frequency of A and B)/(frequency of A).
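As a rough sketch of the ratio just described, the following Python computes confidence by counting. The `confidence` function and the toy baskets are hypothetical and chosen only to show that the confidence of A => B generally differs from that of B => A.

```python
def confidence(antecedent, consequent, transactions):
    """Return (frequency of A and B) / (frequency of A), or None if A never occurs."""
    a = set(antecedent)
    a_and_b = a | set(consequent)
    count_a = sum(1 for t in transactions if a <= t)
    count_a_and_b = sum(1 for t in transactions if a_and_b <= t)
    if count_a == 0:
        return None  # confidence is undefined when the antecedent never appears
    return count_a_and_b / count_a


# Hypothetical toy baskets: hammer appears 3 times, nails 5 times, both together twice.
transactions = [
    {"hammer", "nails"},
    {"hammer", "nails"},
    {"hammer"},
    {"nails"},
    {"nails"},
    {"nails", "tape"},
]

hammer_to_nails = confidence({"hammer"}, {"nails"}, transactions)  # 2 of 3 hammer baskets
nails_to_hammer = confidence({"nails"}, {"hammer"}, transactions)  # 2 of 5 nails baskets
print(hammer_to_nails, nails_to_hammer)
```

Because the denominator counts only the antecedent, swapping the two sides of a rule changes the result even though the support of the item set stays the same.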

Association algorithms find these rules by doing the equivalent of sorting the data while counting occurrences so that they can calculate confidence and support. The efficiency with which they can do this is one of the differentiators among algorithms. This is especially important because of the combinatorial explosion that results in enormous numbers of rules, even for market baskets in the express lane. Some algorithms will create a database of rules, confidence factors, and support that can be queried (for example, “Show me all associations in which ice cream is the consequent, that have a confidence factor of over 80% and a support of 2% or more”).
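To make the counting and querying idea concrete, here is a minimal brute-force sketch in Python. It is not one of the efficient algorithms the paragraph alludes to: it only considers rules with a single item on each side, and the names `pair_rules` and `query`, along with the sample baskets, are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions):
    """Count items and item pairs in one pass, then derive one-item => one-item rules."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(t)  # sorting gives each pair a single canonical order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        for lhs, rhs in ((a, b), (b, a)):
            rules.append({
                "lhs": lhs,
                "rhs": rhs,
                "support": count / n,
                "confidence": count / item_counts[lhs],
            })
    return rules


def query(rules, consequent, min_confidence, min_support):
    """Select rules with the given consequent, confidence over the threshold, and enough support."""
    return [
        r for r in rules
        if r["rhs"] == consequent
        and r["confidence"] > min_confidence
        and r["support"] >= min_support
    ]


# Hypothetical baskets for demonstration only.
transactions = [
    {"cones", "ice cream"},
    {"cones", "ice cream", "sprinkles"},
    {"cones", "ice cream"},
    {"ice cream", "milk"},
    {"milk", "bread"},
]

rules = pair_rules(transactions)
for r in query(rules, "ice cream", min_confidence=0.80, min_support=0.02):
    print(r["lhs"], "=>", r["rhs"], r["support"], r["confidence"])
```

Even this restricted version stores two rules for every pair of items that co-occur, which hints at why the number of candidate rules grows so quickly once larger item sets are allowed.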

 