87 important data warehouse and data mining VIVA Questions.

Following are the top data warehouse and data mining viva questions and answers:
  1. What is a data warehouse?

    • A data warehouse is an electronic store of an organization's historical data, kept for the purpose of reporting, analysis, and data mining or knowledge discovery.

  2. What are the benefits of a data warehouse?

    • A data warehouse integrates data and stores it historically, so that we can analyze different aspects of the business (performance, trends, predictions, etc.) over a given time frame and use the results to improve the efficiency of business processes.

  3. What is the difference between OLTP and OLAP?

    • OLTP is the transaction system that collects business data, whereas OLAP is the reporting and analysis system built on that data.
      OLTP systems are optimized for INSERT and UPDATE operations and are therefore highly normalized. OLAP systems, on the other hand, are deliberately denormalized for fast data retrieval through SELECT operations.

  4. What is a data mart?

    • Data marts are generally designed for a single subject area. An organization may have data pertaining to different departments such as Finance, HR, Marketing, etc. stored in the data warehouse, and each department may have a separate data mart. These data marts can be built on top of the data warehouse.

  5. What is a dimension?

    • A dimension is something that qualifies a quantity (measure).
      For example, "20kg" on its own does not mean anything. But "20kg of Rice (product) is sold to Ramesh (customer) on 5th April (date)" is meaningful. Product, customer and date are the dimensions that qualify the measure, 20kg.
      Dimensions are mutually independent. Technically speaking, a dimension is a data element that categorizes each item in a data set into non-overlapping regions.

  6. What is a fact?

    • A fact is something that is quantifiable (or measurable). Facts are typically (but not always) numerical values that can be aggregated.
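
The relationship between facts and dimensions can be sketched in a few lines of Python. This is only an illustration: the rows, the year in the dates and the helper name total_by are hypothetical, built around the "20kg of Rice sold to Ramesh" example above.

```python
from collections import defaultdict

# Hypothetical sales facts: each row has three dimensions and one measure.
sales = [
    {"product": "Rice", "customer": "Ramesh", "date": "2024-04-05", "quantity_kg": 20},
    {"product": "Rice", "customer": "Suresh", "date": "2024-04-05", "quantity_kg": 10},
    {"product": "Wheat", "customer": "Ramesh", "date": "2024-04-06", "quantity_kg": 5},
]

def total_by(rows, dimension, measure="quantity_kg"):
    """Aggregate the measure (the fact) over one dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)

print(total_by(sales, "product"))   # {'Rice': 30, 'Wheat': 5}
print(total_by(sales, "customer"))  # {'Ramesh': 25, 'Suresh': 10}
```

The same measure gives a different summary depending on which dimension qualifies it.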

  7. Briefly state the difference between a data warehouse and a data mart.

    • A data warehouse is made up of many data marts. A data warehouse contains many subject areas, whereas a data mart generally focuses on one subject area. E.g. in a bank's data warehouse there can be one data mart for accounts, one for loans, etc. These are high-level definitions. Metadata is data about data. E.g. if a data mart receives a file, the metadata will contain information such as the number of columns, whether the file is fixed-width or delimited, the ordering of fields, the data types of the fields, etc.

  8. What is the difference between a dependent data warehouse and an independent data warehouse?

    • These terms are usually applied to data marts. A dependent data mart takes its data from a central data warehouse, whereas an independent data mart is built directly from operational systems or external files without a central data warehouse. There is also a third type called hybrid: a hybrid data mart takes source data from operational systems or external files as well as from the central data warehouse.

  9. What are the storage models of OLAP?

    • ROLAP (relational OLAP), MOLAP (multidimensional OLAP) and HOLAP (hybrid OLAP)

  10. What are cubes?

    • A data cube stores data in a summarized form, which allows faster analysis and makes reporting easy.
    • E.g. using a data cube, a user may want to analyze the weekly and monthly performance of an employee. Here, month and week could be considered dimensions of the cube.

  11. What is a model in the data mining world?

    • Models in Data mining help the different algorithms in decision making or pattern matching. The second stage of data mining involves considering various models and choosing the best one based on their predictive performance.

  12. Explain how to mine an OLAP cube.

    • Data Mining Extensions (DMX) can be used to slice the data in the source cube in the order discovered by data mining. When a cube is mined, the case table is a dimension.

  13. Explain how to use DMX-the data mining query language.

    • Data Mining Extensions (DMX) is based on the syntax of SQL. It builds on relational concepts and is mainly used to create and manage data mining models. DMX comprises two types of statements: data definition and data manipulation. Data definition statements are used to define or create new models and structures; data manipulation statements are used to work with existing models, for example to train, browse and query them.

  14. Define Rollup and cube.

    • Custom rollup operators provide a simple way of controlling how a member is rolled up into its parent's value. The rollup uses the contents of a column as the custom rollup operator for each member, and this operator is used to evaluate the value of the member's parent.
      If a cube has multiple custom rollup formulas and custom rollup members, the formulas are resolved in the order in which the dimensions were added to the cube.

  15. Differentiate between Data Mining and Data warehousing.

    • Data warehousing is merely extracting data from different sources, cleaning the data and storing it in the warehouse, whereas data mining aims to examine or explore the data using queries. These queries can be run against the data warehouse. Exploring the data in data mining helps in reporting, planning strategies, finding meaningful patterns, etc.
      E.g. a company's data warehouse stores all the relevant information about projects and employees. Using data mining, one can use this data to generate different reports, such as profits generated.

  16. What is Discrete and Continuous data in Data mining world?

    • Discrete data can be considered as defined or finite data, e.g. mobile numbers, gender. Continuous data can be considered as data that changes continuously and in an ordered fashion, e.g. age.

  17. What is a Decision Tree Algorithm?

    • A decision tree is a tree in which every node is either a leaf node or a decision node. The tree takes an object as input and outputs a decision. Every path from the root node to a leaf node is reached by using AND, OR, or both. The tree is constructed using the regularities of the data. The decision tree is not affected by Automatic Data Preparation.

  18. What is Naïve Bayes Algorithm?

    • The Naïve Bayes algorithm is used to generate mining models. These models help to identify relationships between the input columns and the predictable columns. The algorithm can be used in the initial stage of exploration. It calculates the probability of every state of each input column, given each possible state of the predictable column. After the model is built, the results can be used for exploration and for making predictions.

  19. Explain clustering algorithm.

    • A clustering algorithm is used to group sets of data with similar characteristics; these groups are called clusters. Clusters help in making faster decisions and in exploring data. The algorithm first identifies relationships in a dataset and then generates a series of clusters based on those relationships. The process of creating clusters is iterative: the algorithm redefines the groupings to create clusters that better represent the data.
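
The iterative regrouping described above can be illustrated with a minimal k-means sketch on one-dimensional data. k-means is used here only as one familiar example of a clustering algorithm; the function name, the data and the initial centers are assumptions made for illustration.

```python
def k_means_1d(points, centers, iterations=10):
    """Very small k-means on 1-D data; `centers` are the initial guesses."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # groupings no longer change
            break
        centers = new_centers
    return centers, clusters

centers, clusters = k_means_1d([1, 2, 3, 10, 11, 12], centers=[1, 2])
print(centers)   # [2.0, 11.0]
print(clusters)  # [[1, 2, 3], [10, 11, 12]]
```

Even with poor initial centers, the groupings are redefined on each pass until they stop changing.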

  20. Explain Association algorithm in Data mining?

    • The association algorithm is used for recommendation engines based on market basket analysis. Such an engine suggests products to customers based on what they bought earlier. The model is built on a dataset containing identifiers, both for individual cases and for the items that the cases contain. A group of items in a case is called an item set. The algorithm traverses the data set to find items that appear together in a case. The MINIMUM_SUPPORT parameter sets the minimum number of cases in which a group of items must appear for it to be included as an item set.

  21. What are the goals of data mining?

    • Prediction, identification, classification and optimization

  22. Is data mining an independent subject?

    • No, it is an interdisciplinary subject. It includes database technology, visualization, machine learning, pattern recognition, algorithms, etc.

  23. What are different types of database?

    • Relational database, data warehouse and transactional database.

  24. What are the data mining functionalities?

    • Mining frequent patterns, association rules, classification and prediction, clustering, evolution analysis and outlier analysis.

  25. What are issues in data mining?

    • Mining methodology issues, performance issues, user interaction issues, issues relating to the diversity of data sources and data types, etc.

  26. List some applications of data mining.

    • Agriculture, biological data analysis, call record analysis, DSS, Business intelligence system etc

  27. What do you mean by interesting pattern?

    • A pattern is said to be interesting if it is (1) easily understood by humans, (2) valid, (3) potentially useful and (4) novel.

  28. Why do we pre-process the data?

    • To ensure data quality (accuracy, completeness, consistency, timeliness, believability, interpretability).

  29. What are the steps involved in data pre-processing?

    • Data cleaning, data integration, data reduction, data transformation.

  30. What is a distributed data warehouse?

    • A distributed data warehouse shares data across multiple data repositories for the purpose of OLAP operations.

  31. Define virtual data warehouse.

    • A virtual data warehouse provides a compact view of the data inventory. It contains metadata and uses middleware to establish connections between different data sources.

  32. What are the different data warehouse models?

    • Enterprise data warehouse
    • Data marts
    • Virtual data warehouse

  33. List few roles of data warehouse manager.

    • Creation of data marts, handling users, concurrency control, updates, etc.

  34. What are different types of cuboids?

    • The 0-D cuboid is called the apex cuboid
    • The n-D cuboid is called the base cuboid
    • The cuboids in between are the middle cuboids

  35. What are the forms of multidimensional model?

    • Star schema
    • Snowflake schema
    • Fact constellation Schema

  36. What are frequent patterns?

    • A set of items that appear frequently together in a transaction data set.
    • E.g. milk, bread, sugar

  37. What are the issues regarding classification and prediction?

    • Preparing data for classification and prediction
    • Comparing classification and prediction

  38. Define model over fitting.

    • A model that fits the training data well can still have a high generalization error on unseen data. This situation is called model overfitting.

  39. What are the methods to remove model over fitting?

    • Pruning (pre-pruning and post-pruning)
    • Constraint in the size of decision tree
    • Making stopping criteria more flexible

  40. What is regression?

    • Regression can be used to model the relationship between one or more independent variables and a dependent variable.
    • The two main types are linear regression and non-linear regression.

  41. Compare the k-means and k-medoids algorithms.

    • K-medoids is more robust than k-means in the presence of noise and outliers. K-medoids can be computationally costly.
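
A small sketch shows why a medoid is less sensitive to outliers than a mean. It compares only the two kinds of cluster representative on hypothetical one-dimensional data, not the full algorithms; the function names are made up for illustration.

```python
def mean(points):
    """Representative used by k-means: the average of the points."""
    return sum(points) / len(points)

def medoid(points):
    """Representative used by k-medoids: the actual data point with the
    smallest total distance to all other points."""
    return min(points, key=lambda c: sum(abs(c - p) for p in points))

values = [1, 2, 3, 4, 100]  # 100 is an outlier
print(mean(values))    # 22.0
print(medoid(values))  # 3
```

The outlier drags the mean far from the bulk of the data, while the medoid stays at 3. Finding the medoid compares every point with every other point, which hints at the extra computational cost.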

  42. What is the k-nearest neighbor algorithm?

    • It is one of the lazy learner algorithms used in classification. It finds the k nearest neighbors of the point of interest and assigns the point to the class that is most common among them.
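
A minimal sketch of k-nearest neighbor classification, assuming two-dimensional points, Euclidean distance and a simple majority vote. The training data and the name knn_classify are hypothetical.

```python
from collections import Counter
import math

def knn_classify(training, query, k=3):
    """training: list of ((x, y), label) pairs. Returns the majority label
    among the k training points closest to `query`."""
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

training = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]
print(knn_classify(training, (2.0, 2.0)))  # A
print(knn_classify(training, (6.0, 7.0)))  # B
```

No model is built in advance: all the work happens at query time, which is why the method is called a lazy learner.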

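  43. What is Bayes' theorem?

    • P(H|X) = P(X|H) * P(H) / P(X), where H is a hypothesis and X is the observed data. P(H|X) is the posterior probability of H given X, and P(H) is the prior probability of H.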
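
A worked numeric example of the formula, using made-up probabilities. The scenario and all figures below are hypothetical and serve only to show how the three quantities combine.

```python
def posterior(prior_h, likelihood_x_given_h, evidence_x):
    """Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)."""
    return likelihood_x_given_h * prior_h / evidence_x

# Hypothetical figures: 20% of customers buy a computer (H); 60% of those
# buyers are under 30 (X given H); 30% of all customers are under 30 (X).
p = posterior(prior_h=0.2, likelihood_x_given_h=0.6, evidence_x=0.3)
print(round(p, 2))  # 0.4
```

Under these assumptions, knowing that a customer is under 30 raises the probability of a computer purchase from 20% to 40%.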

  44. What is a concept hierarchy?

    • It defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts.

  45. What are the causes of model over fitting?

    • Due to presence of noise
    • Due to lack of representative samples
    • Due to multiple comparison procedure

  46. What is a decision tree classifier?

    • A decision tree is a hierarchical classifier which compares data with a range of properly selected features.

  47. If there are n dimensions, how many cuboids are there?

    • There would be 2^n cuboids.
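
The count can be checked by enumerating every subset of the dimensions, since each subset corresponds to one cuboid. This sketch assumes plain dimensions without concept hierarchies; the dimension names are hypothetical.

```python
from itertools import combinations

def cuboids(dimensions):
    """List every subset of the dimensions; each subset is one cuboid."""
    result = []
    for size in range(len(dimensions) + 1):
        result.extend(combinations(dimensions, size))
    return result

all_cuboids = cuboids(["product", "customer", "date"])
print(len(all_cuboids))  # 8, i.e. 2**3
print(all_cuboids[0])    # () is the 0-D apex cuboid
print(all_cuboids[-1])   # ('product', 'customer', 'date') is the 3-D base cuboid
```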

  48. What is spatial data mining?

    • Spatial data mining is the process of discovering interesting, useful, non-trivial patterns from large spatial datasets.

      Spatial Data Mining = Mining Spatial Data Sets (i.e. Data Mining + Geographic Information Systems)


  49. What is multimedia data mining?

    • Multimedia data mining is a subfield of data mining that deals with the extraction of implicit knowledge, multimedia data relationships, or other patterns not explicitly stored in multimedia databases.

  50. What are different types of multimedia data?

    • image, video, audio

  51. What is text mining?

    • Text mining is the procedure of synthesizing information by analyzing relations, patterns, and rules among textual data. These procedures include text summarization, text categorization, and text clustering.

  52. List some applications of text mining.

    • Customer profile analysis
    • Patent analysis
    • Information dissemination
    • Company resource planning

  53. What do you mean by web content mining?

    • Web content mining refers to the discovery of useful information from Web contents, including text, images, audio, video, etc.                                 

  54. Define web structure mining and web usage mining.

    • Web structure mining studies the model underlying the link structures of the Web. It has been used for search engine result ranking and other Web applications. 

      Web usage mining focuses on using data mining techniques to analyze search logs to find interesting patterns. One of the main applications of Web usage mining is its use to learn user profiles.

  59. What are frequent patterns?

    • These are the patterns that appear frequently in a data set.
    • Examples include itemsets, subsequences, etc.

  61. What is data characterization?

    • Data characterization is a summarization of the general features of a target class of data. For example, summarizing the characteristics of software products whose sales increased by 10%.

  62. What is data discrimination?

    • Data discrimination is the comparison of the general features of the target class objects against one or more contrasting objects.

  63. What can business analysts gain from having a data warehouse?

    • First, having a data warehouse may provide a competitive advantage by presenting relevant information from which to measure performance and make critical adjustments in order to help win over competitors.
      Second, a data warehouse can enhance business productivity because it is able to quickly and efficiently gather information that accurately describes the organization.
      Third, a data warehouse facilitates customer relationship management because it provides a consistent view of customers and items across all lines of business, all departments and all markets.
      Finally, a data warehouse may bring about cost reduction by tracking trends, patterns, and exceptions over long periods in a consistent and reliable manner.

  64. Why is association rule necessary?

    • In data mining, association rule learning is a popular and well-researched method for discovering interesting relations between variables in large databases.
    • It is intended to identify strong rules discovered in databases using different measures of interestingness.

  65. What are two types of data mining tasks?

    • Descriptive task
    • Predictive task

  66. Define classification.

    • Classification is the process of finding a model (or function) that describes and distinguishes data classes or concepts.

  67. What are outliers?

    • A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are called outliers.

  68. What do you mean by evolution analysis?

    • Data evolution analysis describes and models regularities or trends for objects whose behavior changes over time.
      Although this may include characterization, discrimination, association and correlation analysis, classification, prediction, or clustering of time-related data, distinct features of such an analysis include time-series data analysis, sequence or periodicity pattern matching, and similarity-based data analysis.

  69. Define KDD.

    • KDD (Knowledge Discovery in Databases) is the process of finding useful information and patterns in data.

  70. What are the components of data mining?

    • Database, data warehouse, World Wide Web, or other information repository
    • Database or data warehouse server
    • Knowledge base
    • Data mining engine
    • Pattern evaluation module
    • User interface

  71. Define metadata.

    • A database that describes various aspects of data in the warehouse is called metadata.

  72. What are the uses of metadata?

    • Map source system data to data warehouse tables
    • Generate data extract, transform, and load procedures for import jobs
    • Help users discover what data are in the data warehouse
    • Help users structure queries to access data they need

  73. List the demerits of distributed data warehouse.

    • There is no metadata, no summary data, and no individual DSS (Decision Support System) integration or history. All queries must be repeated, causing additional burden on the system.
    • Since queries compete with production data transactions, performance can be degraded.
    • There is no refreshing process, causing the queries to be very complex.


  74. Define HOLAP.

    • The hybrid OLAP approach combines ROLAP and MOLAP technology.

  75. What are data mining techniques?

    • Association rules
    • Classification and prediction
    • Clustering
    • Deviation detection
    • Similarity search
    • Sequence Mining

  76. List different data mining tools.

    • Traditional data mining tools
    • Dashboards
    • Text mining tools

  77. Define subsequence.

    • A subsequence is an ordered series of events, such as buying first a PC, then a digital camera, and then a memory card. If it occurs frequently in a shopping history database, it is a (frequent) sequential pattern.

  79. What is the main goal of data mining?

    • Prediction

  80. List the typical OLAP operations.

    • Roll up
    • Drill down
    • Rotate (pivot)
    • Slice and dice
    • Drill through and drill across

  81. If there are 3 dimensions, how many cuboids are there in cube?

    • 2^3 = 8 cuboids

  82. Differentiate between star schema and snowflake schema.

    • A star schema is a multidimensional model in which each disjoint dimension is represented in a single table.
      A snowflake schema is a normalized multidimensional schema in which each disjoint dimension is represented in multiple tables.
      A star schema can become a snowflake schema when its dimension tables are normalized.
      Both star and snowflake schemas are dimensional models; the difference is in their physical implementations.
      Snowflake schemas support ease of dimension maintenance because they are more normalized.
      Star schemas are easier for direct user access and often support simpler and more efficient queries.
      It may be better to create a star version of the snowflaked dimension for presentation to the users.

  83. List the advantages of star schema.

    • A star schema is very easy to understand, even for non-technical business managers.
      A star schema provides better performance and shorter query times.
      A star schema is easily extensible and handles future changes easily.

  84. What are the characteristics of data warehouse?

    • Integrated
    • Non-volatile
    • Subject oriented
    • Time variant

  85. Define support and confidence.

    • The support of a rule X->Y is the fraction of all transactions in the data set that contain both X and Y.
      The confidence of a rule X->Y is the fraction of the transactions containing X that also contain Y.
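
Both measures can be computed directly from a list of transactions. The sketch below uses the usual textbook definitions (support as a fraction of all transactions, confidence as a conditional fraction); the transactions are hypothetical.

```python
transactions = [
    {"milk", "bread", "sugar"},
    {"milk", "bread"},
    {"milk", "sugar"},
    {"bread", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(x, y, transactions):
    """Of the transactions containing X, the fraction that also contain Y."""
    return support(x | y, transactions) / support(x, transactions)

print(support({"milk", "bread"}, transactions))       # 0.5
print(confidence({"milk"}, {"bread"}, transactions))  # about 0.667
```

Here milk and bread occur together in 2 of 4 transactions (support 0.5), and 2 of the 3 transactions containing milk also contain bread (confidence 2/3).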

  86. What are the criteria on the basis of which classification and prediction can be compared?

    • Speed, accuracy, robustness, scalability, goodness of rules, interpretability

  87. What is Data purging?

    • The process of cleaning out junk data is called data purging. Purging data typically means getting rid of unnecessary NULL values in columns. It is usually done when the database has grown too large.

What are the roles of data warehouse manager?


ROLES OF DATA WAREHOUSE MANAGER

-Overall responsibility for the data warehouse.

-Defines and plans the project.

-Creates a schedule.

-Recruits the best possible team.

-Coordinates the team activities.

-Estimates the cost and benefits.

-Acts as a quality reviewer of the deliverables.

-Produces or ensures the production of project deliverables.

-Measures the costs and benefits.

-Makes improvement recommendations.

-Communicates with sponsor, IT management.

-Collects data inputs from a variety of sources, including legacy operational systems, third-party data suppliers, and informal sources.

-Assures the quality of these data inputs by correcting spelling, removing mistakes, eliminating null data, and combining multiple sources.

-Applies broad data stewardship over the nature of the published data and assures the use of conformed dimensions and facts across the disparate data marts (which can be thought of as separate publications).

-Releases the data from the data staging area to the individual data marts on a regular schedule.

-Relies on and respects the trust of the end users.

-Is named prominently on the organizational chart to serve as a clear communication as to where the buck stops.

-Is driven by the continuously changing business requirements of the organization and the increasingly available sources of data.

-Is driven by rapidly changing media technologies, especially the current Internet revolution.

-Is very aware of the business significance of the data warehouse and consciously "captures" and takes credit for the business decisions made as a result of using the data warehouse.

What is data mining?

Definition

Data mining is the extraction of interesting (non-trivial, implicit, previously unknown, potentially useful) information or patterns from the data in large databases.

Example

It is a well-known secret that competition in the telecommunication industry is fierce. Acquiring new customers is difficult and often very expensive. Consequently, customer retention has become more and more important. Data mining can determine characteristic customer clusters on the basis of historic data points collected from customers, such as the frequency and distribution over time of their usage of services (calls, text messages, MMS, navigation, mail exchange, ...). For each of these customer patterns the company can then offer tailored customer-life-cycle messages and offers.

The example shows how data mining can help a telecommunication service provider to customise its offers. This leads to higher customer satisfaction as well as to an increase in turnover and profit through increased sales over the whole customer life cycle (lifetime value).

Goals of data mining

Following are the goals of data mining:
  • prediction
  • identification
  • classification
  • optimization

Data mining as interdisciplinary subject

figure: data mining as interdisciplinary subject

Data mining tasks

There are mainly two types of data mining tasks. They are:
  1. Predictive tasks: These perform inference on the current data in the database in order to make predictions.
  2. Descriptive tasks: These describe the general features of the data in the database.

Applications of data mining

Following are the applications of data mining:
  • Retail business
  • Financial data analysis
  • Telecommunication
  • Fraud detection and unusual pattern detection
  • Corporate analysis and risk management
  • Biomedical data engineering
  • Intelligent query systems
  • Research, etc.

Explain about SQL extensions for OLAP.

SQL Extensions for OLAP

OLAP almost invariably requires data aggregation, and SQL supports such aggregation through its Group-by instruction. However, during OLAP, each data analysis session usually requires repeated aggregations of a similar type over the same data. Formulating many similar but distinct queries is not only tedious for the analyst but also quite expensive in execution time, as all these similar queries pass over the same data over and over again.

It thus seems worthwhile to try to find a way of requesting several levels of aggregation in a single query, thereby offering the implementation the opportunity to compute all of those aggregations more efficiently (i.e., in a single pass). Such considerations are the motivation behind the currently available options of the Group-by clause, namely,
  • Grouping Sets
  • Rollup
  • Cube

These options are supported in several commercial products and they are also included in the current versions of SQL. We illustrate their use below, assuming the table SP(S#, P#, QTY), where attributes stand for “Supplier Number”, “Product Number”, and “Quantity”, respectively.

The Grouping sets option allows the user to specify exactly which particular groupings are to be performed:

Select         S#, P#, Sum (QTY) As TOTQTY
From          SP
Group By   Grouping Sets ((S#), (P#))

This instruction is equivalent to two Group-by instructions, one in which the grouping is by S# and one in which the grouping is by P#. Changing the order in which the groupings are written does not affect the result. The remaining two options, Rollup and Cube, are actually shorthand for certain Grouping Sets combinations.

Consider first the following example of Rollup:

Select         S#, P#, Sum (QTY) As TOTQTY
From          SP
Group By   Rollup (S#, P#)

This instruction is equivalent to (or shorthand for) the following instruction:

Select         S#, P#, Sum (QTY) As TOTQTY
From          SP
Group By   Grouping Sets ((S#, P#), (S#), ( ))

Note that, in the case of Rollup, changing the order in which the attributes are written affects the result.

Finally, consider the following example of Cube:

Select         S#, P#, Sum (QTY) As TOTQTY
From          SP
Group By   Cube (S#, P#)

This instruction is equivalent to the following one:

Select         S#, P#, Sum (QTY) As TOTQTY
From          SP
Group By   Grouping Sets ((S#, P#), (S#), (P#), ( ))

In other words, the Cube option forms all possible groupings of the attributes listed in the Group-by clause. Therefore, in the case of Cube, changing the order in which the attributes are written does not affect the result.

We note that, although the result of each of the above Group-by options usually consists of two or more distinct answer-tables, SQL bundles them (unfortunately) into a single table, using nulls.
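
To make the three options concrete, the following Python sketch emulates them on a tiny in-memory version of SP. It is an illustration only and not how a database engine evaluates these clauses; the rows and the helper names grouping_sets, rollup and cube are assumptions. Columns that are not part of a grouping are filled with None, mirroring the nulls SQL uses when it bundles the answer-tables into one.

```python
from itertools import combinations

# Hypothetical rows of SP(S#, P#, QTY).
SP = [
    {"S#": "S1", "P#": "P1", "QTY": 300},
    {"S#": "S1", "P#": "P2", "QTY": 200},
    {"S#": "S2", "P#": "P1", "QTY": 100},
]
COLUMNS = ("S#", "P#")

def grouping_sets(rows, sets):
    """Sum QTY once per grouping set; columns not grouped on become None."""
    result = []
    for group_cols in sets:
        totals = {}
        for row in rows:
            key = tuple(row[c] if c in group_cols else None for c in COLUMNS)
            totals[key] = totals.get(key, 0) + row["QTY"]
        result.extend((s, p, qty) for (s, p), qty in totals.items())
    return result

def rollup(cols):
    """Rollup(a, b) expands to the grouping sets (a, b), (a), ()."""
    return [tuple(cols[:n]) for n in range(len(cols), -1, -1)]

def cube(cols):
    """Cube(a, b) expands to every subset of the columns."""
    return [c for n in range(len(cols), -1, -1) for c in combinations(cols, n)]

print(rollup(COLUMNS))  # [('S#', 'P#'), ('S#',), ()]
print(cube(COLUMNS))    # [('S#', 'P#'), ('S#',), ('P#',), ()]
for row in grouping_sets(SP, rollup(COLUMNS)):
    print(row)
```

With Rollup the result contains the per-pair totals, the per-supplier totals such as ('S1', None, 500), and the grand total (None, None, 600). Cube additionally yields the per-product totals.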

We also note that OLAP products often display query results not as SQL tables but as cross tabulations of SQL tables. The cross tabulation of a SQL table is a multi-dimensional table indexed by the values of the key attributes in the SQL table and in which the entries are the values of the dependent attributes. Cross tabulation - together with various visualization techniques - is especially useful for producing reports out of query results, and several report generating tools are available today in the market.

Finally, we note that the decision making process in an enterprise usually requires a number of reports produced periodically from the answers to a specific, fixed set of queries (such as monthly average sales per store, or per region, etc.). Such queries are usually called “continuous queries” or “temporal queries”. It is important to note here that a temporal query does not change over time; what changes over time is the answer to the query and, as a result, the report produced from the answer changes as well.

Write short note on Market Basket Analysis.

Ans
1.     Frequent itemset mining leads to the discovery of associations and correlations among items in large transactional or relational data sets. The discovery of interesting correlation relationships among huge amounts of business transaction records can help in many business decision-making processes, such as catalog design, cross-marketing, and customer shopping behavior analysis.
2.     A typical example of frequent itemset mining is market basket analysis. This process analyzes customer buying habits by finding associations between the different items that customers place in their “shopping baskets” as shown in the figure.
3.     The discovery of such associations can help retailers develop marketing strategies by gaining insight into which items are frequently purchased together by customers.
4.     For instance, if customers are buying milk, how likely are they to also buy bread (and what kind of bread) on the same trip to the supermarket? Such information can lead to increased sales by helping retailers do selective marketing and plan their shelf space.
5.     Example: Suppose, as manager of a store, one would like to learn more about the buying habits of the customers, such as which groups or sets of items customers are likely to purchase on a given trip to the store.
6.     Market basket analysis may be performed on the retail data of customer transactions at the store. The results can be used to plan marketing or advertising strategies, or in the design of a new catalog.
7.     For instance, market basket analysis may help design different store layouts.
8.     In one strategy, items that are frequently purchased together can be placed in proximity in order to further encourage the sale of such items together.
9.     In an alternative strategy, placing hardware and software at opposite ends of the store may entice customers who purchase such items to pick up other items along the way.
10.  Market basket analysis can also help retailers plan which items to put on sale at reduced prices.
11.  Each item in the store has a Boolean variable representing the presence or absence of that item. Each basket can then be represented by a Boolean vector of values assigned to these variables.
12.  The Boolean vectors can be analyzed for buying patterns that reflect items that are frequently associated or purchased together. These patterns can be represented in the form of association rules.
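
The Boolean vector representation described in the last two points can be sketched as follows. The catalogue, the baskets and the function name to_boolean_vector are hypothetical, and counting co-occurring pairs is only the simplest possible analysis of such vectors.

```python
from itertools import combinations
from collections import Counter

ITEMS = ["milk", "bread", "butter", "sugar"]  # hypothetical store catalogue

baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "sugar"},
]

def to_boolean_vector(basket):
    """One Boolean per catalogue item: True if the item is in the basket."""
    return [item in basket for item in ITEMS]

vectors = [to_boolean_vector(b) for b in baskets]
print(vectors[0])  # [True, True, False, False]

# Count how often each pair of items is present in the same vector.
pair_counts = Counter()
for vec in vectors:
    present = [item for item, flag in zip(ITEMS, vec) if flag]
    pair_counts.update(combinations(present, 2))

print(pair_counts.most_common(1))  # [(('milk', 'bread'), 2)]
```

Pairs with high counts, such as milk and bread here, are candidates for association rules.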

What is Frequent Pattern Mining?

Ans.
1.     Frequent pattern mining searches for recurring relationships in a given data set.
2.     Frequent itemset mining leads to the discovery of associations and correlations among items in large transactional or relational data sets.
3.     The discovery of interesting correlation relationships among huge amounts of business transaction records can help in many business decision-making processes, such as catalog design, cross-marketing, and customer shopping behavior analysis.
4.     Market basket analysis is just one form of frequent pattern mining.
5.     Classification of frequent pattern mining:
  • Based on the completeness of patterns to be mined:
      - We can mine the complete set of frequent itemsets, the closed frequent itemsets, and the maximal frequent itemsets, given a minimum support threshold. We can also mine constrained frequent itemsets, approximate frequent itemsets, near-match frequent itemsets and top-k frequent itemsets.
      - Different applications may have different requirements regarding the completeness of the patterns to be mined, which in turn can lead to different evaluation and optimization methods.
  • Based on the levels of abstraction involved in the rule set:
      - Some methods for association rule mining can find rules at differing levels of abstraction.
      - For example,
          buys(X, “computer”) ⇒ buys(X, “HP printer”)
          buys(X, “laptop computer”) ⇒ buys(X, “HP printer”)
      - In these rules, the items bought are referenced at different levels of abstraction.
      - The mined rule set then consists of multilevel association rules. If, instead, the rules within a given set do not reference items or attributes at different levels of abstraction, then the set contains single-level association rules.
  • Based on the number of data dimensions involved in the rule:
      - If the items or attributes in an association rule reference only one dimension, then it is a single-dimensional association rule.
      - For example,
          buys(X, “computer”) ⇒ buys(X, “antivirus software”)
      - If a rule references two or more dimensions, then it is a multidimensional association rule. For example,
          age(X, “30…39”) ∧ income(X, “42K…48K”) ⇒ buys(X, “high resolution TV”)
  • Based on the types of values handled in the rule:
      - If a rule involves associations between the presence or absence of items, it is a Boolean association rule.
      - For example, the rule
          buys(X, “computer”) ⇒ buys(X, “HP printer”)
        is a Boolean association rule.
      - If a rule describes associations between quantitative items or attributes, then it is a quantitative association rule. For example,
          age(X, “30…39”) ∧ income(X, “42K…48K”) ⇒ buys(X, “high resolution TV”)
        Here the quantitative attributes, age and income, have been discretized.
  • Based on the kinds of rules to be mined:
      - Frequent pattern analysis can generate various kinds of rules and other interesting relationships. Many of the rules are redundant or do not indicate a correlation relationship among itemsets. Thus, the discovered associations can be further analyzed to uncover statistical correlations, leading to correlation rules.
      - We can also mine strong gradient relationships among items. Example: “The average sales from Sony Digital Camera increase over 16% when sold together with Sony Laptop Computer”. Both Sony Digital Camera and Sony Laptop Computer are siblings, where the parent itemset is Sony.
  • Based on the kinds of patterns to be mined:
      - Many kinds of frequent patterns can be mined from different kinds of data sets, for example through frequent itemset mining, sequential pattern mining and structured pattern mining.
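
The difference between the complete, closed and maximal sets of frequent itemsets can be seen on a toy data set. The sketch below uses brute-force counting, which is only practical for tiny examples; the transactions, the threshold of 2 and the function names are assumptions made for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support_count):
    """Brute force: count every candidate itemset and keep the frequent ones."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_support_count:
                frequent[frozenset(candidate)] = count
    return frequent

def closed_itemsets(frequent):
    """Frequent itemsets with no proper superset of the same support count."""
    return {s for s, n in frequent.items()
            if not any(s < t and n == m for t, m in frequent.items())}

def maximal_itemsets(frequent):
    """Frequent itemsets with no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
freq = frequent_itemsets(transactions, min_support_count=2)

print(sorted(sorted(s) for s in freq))
# [['a'], ['a', 'b'], ['a', 'c'], ['b'], ['c']]
print(sorted(sorted(s) for s in closed_itemsets(freq)))
# [['a'], ['a', 'b'], ['a', 'c']]
print(sorted(sorted(s) for s in maximal_itemsets(freq)))
# [['a', 'b'], ['a', 'c']]
```

Every maximal itemset is also closed, and every closed itemset is frequent, so each set is a more compact summary than the one before it.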