Data Mining by KP

Source: https://padlet.com/kpherng/pouehjp444iq (feed timestamps: 2018-04-12 12:34:13 UTC; 2023-02-28 10:23:20 UTC)

What data mining function does a department store need to assist with its target marketing mail campaign?
- An association function can be used to determine which product customers are likely to purchase next, given that they have already bought another product. Using this information, we can email the customers who are more likely to buy those products.
- A clustering function can also be used to group customers by personal details such as gender, age group, or marital status, so that each group is sent suitable email. For example, female customers can be sent email about products aimed at women.
- Description method
- A constraint-based method could be a good approach. A constraint refers to the user's expectations or the properties of the desired clustering results.

Can they be performed alternatively by data query processing or simple statistical analysis?
- Statistics is mainly about quantifying data. It uses tools to find relevant properties of the data and is a lot like math. It provides the tools necessary for data mining.

- Data query processing only retrieves the records that match conditions stated in advance. Data mining, on the other hand, builds models to detect patterns and relationships in data, particularly in large databases.

- Statistics is very useful and effective for small data sets; for large amounts of data, however, data mining is more effective. It provides a more comprehensive analysis of the data and speeds up processing.

Chapter 1 - ALVIN
Data Mining overview

Front part is all logic
no need to read

DM (data mining) - extracts information from a data set and transforms it into an understandable structure for further use
- turns raw data into useful information

Moore's law
- computer speed doubles about every 18 months

Storage law
- total storage doubles about every 9 months
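
A rough worked example of the two doubling laws: if a quantity doubles every T months, it grows by a factor of 2^(t/T) after t months. The function name `growth_factor` is made up for this illustration; the doubling periods are the ones quoted in the notes.

```python
def growth_factor(months: float, doubling_period: float) -> float:
    """Factor by which a quantity grows if it doubles every `doubling_period` months."""
    return 2 ** (months / doubling_period)


# Over 36 months, using the doubling periods from the notes:
speed = growth_factor(36, 18)    # computer speed: 2 doublings
storage = growth_factor(36, 9)   # total storage: 4 doublings
print(speed, storage)            # 4.0 16.0
```

So over the same three years storage grows four times as much as speed, which is one way to see why stored data outpaces the ability to process it.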

6 steps
- Business understanding
- Data understanding/preparation/collection
- Pre-processing
- Modelling
- Mining
- Evaluation

Chapter 3 - KP (posted by kpherng)
Discrete attribute
- finite or countably infinite set of values
- e.g. zip codes, counts
- often represented as integers

Continuous attribute
- real numbers
- floating point

Nominal
- ID, zip codes, profession, eye color

Ordinal
- in order, rankings, grades

Binary
- true false, yes no

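Interval
- dates
- temperature in C or F (the zero point is arbitrary, so ratios are not meaningful)

Ratio
- temperature in Kelvin
- distance in cm or inches (there is a true zero, so ratios are meaningful)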

Data & Data Pre-processing
Types of data sets
- Record (relational records), e.g. a set of items: of the people who buy a CPU, 90% also buy a motherboard, 80% buy RAM, and 50% buy a case
- Graph (WWW, social networks)
- Ordered (maps / time series)
Why data needs pre-processing
- incomplete
- inconsistent (e.g. born in 1996 but age recorded as 50)
- noisy (errors)
All of these are low-quality data, which leads to low-quality mining results

Binning to handle noisy data
- smooths the data by partitioning the sorted values into bins and smoothing within each bin
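
A minimal sketch of one common variant, smoothing by bin means with equal-size bins. The function name `smooth_by_bin_means` and the sample prices are made up for illustration; other variants (bin medians, bin boundaries) work similarly.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of `bin_size`, and replace
    every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


prices = [21, 4, 28, 15, 8, 34, 24, 21, 25]  # made-up sample data
print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```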

Tasks in data pre-processing
- Data cleaning: clean dirty data (inconsistent, incomplete, noisy)
- Data integration
- Data transformation
- Data reduction or data compression

Chapter 2 data mining overview (posted by eddion717)
Prediction Method
- uses known variables and their values to predict unknown or future values
- Classification
- Anomaly detection
- Regression

Description Method
- human interpretable patterns
- Clustering
- Sequential Pattern discovery
- Association rule discovery

Classification
- supervised learning
- assigns items in a collection to target categories or classes
- goal: accurately predict the target class for each case in the data

Regression
- predicts numeric (continuous) values
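
As a small illustration of predicting numbers, here is a sketch of simple linear regression (a least-squares fit of a straight line). The notes do not say which regression method the course uses, so this is just one assumed example; `fit_line` and the data points are made up.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # made-up data
print(slope, intercept)        # 2.0 1.0
print(slope * 5 + intercept)   # predicted value for x = 5: 11.0
```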

Clustering
- finds groups of closely related observations
- closeness is often measured by Euclidean distance
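
A short sketch of Euclidean distance and how it can be used to put a point into the closest group. The helper names `euclidean_distance` and `nearest` and the points are made up; picking the nearest centre is only one step of a clustering method, not a full algorithm.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points with the same number of dimensions."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point, centers):
    """Return the centre that is closest to `point`."""
    return min(centers, key=lambda c: euclidean_distance(point, c))


print(euclidean_distance((0, 0), (3, 4)))   # 5.0
print(nearest((1, 1), [(0, 0), (5, 5)]))    # (0, 0)
```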

Sequential pattern
- finds rules that predict strong sequential dependencies among different events
- (A B) (C) -> (D E)

Chapter 4 (posted by eddion717)
Apriori algorithm
- if an itemset is frequent, all of its subsets must also be frequent
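
A minimal sketch of how this property is used for pruning: a larger itemset is only counted if all of its smaller subsets were already found frequent. The function name `frequent_itemsets`, the baskets, and the minimum support of 2 are assumptions for illustration, and the candidate generation is simplified compared with a textbook Apriori implementation.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search: a k-itemset becomes a candidate only if all of its
    (k-1)-subsets are frequent. Returns {itemset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Build (k+1)-candidates from items that are still frequent, and prune
        # every candidate that has an infrequent k-subset (Apriori property).
        k += 1
        level_items = sorted({item for c in level for item in c})
        candidates = [
            frozenset(combo)
            for combo in combinations(level_items, k)
            if all(frozenset(sub) in level for sub in combinations(combo, k - 1))
        ]
    return frequent


baskets = [  # made-up transactions
    {"cpu", "mobo", "ram"},
    {"cpu", "mobo"},
    {"cpu", "ram", "case"},
    {"mobo", "ram"},
]
result = frequent_itemsets(baskets, min_support=2)
for itemset, count in result.items():
    print(sorted(itemset), count)
```

Here "case" appears in only one basket, so it is dropped at the first level and no pair or triple containing it is ever counted.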

Multiple-level association rules

Chapter 5 (posted by eddion717)
Classification
2-step process
- Model construction (step 1): the model is represented as classification rules, decision trees, or mathematical formulas
- Model usage (step 2)

decision tree
-......

Bayesian classification
- based on probability (Bayes' theorem)
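
A small worked example of the probability behind Bayesian classification, using Bayes' theorem for two classes: P(class | evidence) = P(evidence | class) * P(class) / P(evidence). The function name `posterior` and all the numbers are made up for illustration.

```python
def posterior(prior, likelihood, likelihood_other):
    """P(class | evidence) for a two-class problem.

    prior            -- P(class)
    likelihood       -- P(evidence | class)
    likelihood_other -- P(evidence | other class)
    """
    evidence = likelihood * prior + likelihood_other * (1 - prior)
    return likelihood * prior / evidence


# Made-up numbers: 20% of customers are buyers; 50% of buyers opened the
# email, but only 5% of non-buyers did.
p = posterior(prior=0.2, likelihood=0.5, likelihood_other=0.05)
print(round(p, 3))  # 0.714
```

Opening the email raises the probability that the customer is a buyer from 0.2 to about 0.71, so the classifier would pick "buyer" for that customer.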

Chapter 6 (posted by eddion717)
Classification
Other names for KNN (k-nearest neighbours)
- memory-based reasoning
- example-based reasoning
- instance-based learning
- case-based reasoning
- lazy learning
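
A minimal KNN sketch that shows why it is called lazy or memory-based: no model is built in advance, the stored training examples are only consulted when a query arrives. The function name `knn_predict`, the points, and the labels are made up; `math.dist` needs Python 3.8 or newer.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """training is a list of (point, label) pairs. Returns the majority
    label among the k training points closest to `query`."""
    by_distance = sorted(training, key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


training = [  # made-up labelled examples
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((8, 8), "B"), ((9, 8), "B"),
]
print(knn_predict(training, (2, 2)))  # A
print(knn_predict(training, (8, 9)))  # B
```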

Rule-based classifier
- e.g. (can give birth) -> mammal
- e.g. (can fly) -> bird
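
A sketch of the two example rules as an ordered rule list. The attribute names `gives_birth` and `can_fly`, the function `classify`, and the "unknown" default class are assumptions made for this illustration.

```python
def classify(animal):
    """Apply the rules in order; the first rule whose condition holds decides the class."""
    rules = [
        (lambda a: a.get("gives_birth", False), "mammal"),
        (lambda a: a.get("can_fly", False), "bird"),
    ]
    for condition, label in rules:
        if condition(animal):
            return label
    return "unknown"  # default class when no rule fires


print(classify({"gives_birth": True, "can_fly": False}))   # mammal
print(classify({"gives_birth": False, "can_fly": True}))   # bird
print(classify({"gives_birth": True, "can_fly": True}))    # mammal (e.g. a bat: first rule wins)
print(classify({"gives_birth": False, "can_fly": False}))  # unknown
```

The last two calls show that these two rules overlap (both fire for a bat) and do not cover every case, which is why the sketch needs a rule order and a default class.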

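direct method vs indirect method
- direct: extract rules directly from the data
- indirect: extract rules from another classification model, e.g. a decision tree

mutually exclusive rules vs exhaustive rules
- mutually exclusive: every record is covered by at most one rule
- exhaustive: every record is covered by at least one rule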

Strategy for growing a single rule
- top down (general to specific)
- bottom up (specific to general)