These are notes for the Data Mining course at XJTU.

## Confusion Matrix

| Actual class \ Predicted class | C1 | not C1 |
| --- | --- | --- |
| C1 | TP (True Positive) | FN (False Negative) |
| not C1 | FP (False Positive) | TN (True Negative) |

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall (Sensitivity)} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN+FP}$$
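
The four metrics above can be computed directly from the confusion-matrix counts. The following is a minimal sketch: the function name `classification_metrics` and the example counts are made up for illustration, and it assumes every denominator is nonzero.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute the four metrics from confusion-matrix counts.

    Assumes all denominators are nonzero (no guard against empty classes).
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),        # also called sensitivity
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts for an imbalanced two-class problem.
metrics = classification_metrics(tp=90, fn=210, fp=140, tn=9560)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With imbalanced counts like these, accuracy can be high while recall stays low, which is why the metrics are usually reported together.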

## 10 Algorithms

### Decision Tree

#### C4.5

C4.5 is a decision tree learning algorithm developed by Ross Quinlan as an extension of his earlier ID3 algorithm. Because the trees it generates are used for classification, C4.5 is often referred to as a statistical classifier.

##### Entropy

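$$Info(D) = -\sum_{i=1}^{m}p_i\log_2(p_i), \quad 0\log_2 0=0$$

Here $p_i$ is the proportion of tuples in $D$ that belong to class $i$, and $m$ is the number of classes.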
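
As a quick check of the entropy formula, here is a small sketch that estimates each class probability from label frequencies. The function name `entropy` and the example labels are illustrative choices, not part of the notes.

```python
import math
from collections import Counter


def entropy(labels):
    """Info(D): entropy in bits of a sequence of class labels."""
    n = len(labels)
    # Counter only stores classes that actually occur, so the
    # 0 * log(0) = 0 convention never has to be handled explicitly.
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


# Hypothetical data set: 9 positive and 5 negative tuples.
labels = ["yes"] * 9 + ["no"] * 5
print(round(entropy(labels), 3))
```

A pure set has entropy 0, and a two-class set split evenly has entropy 1 bit.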

##### Gain Ratio

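$$Gain(A) = Info(D) - \sum_{j=1}^{v}\frac{|D_j|}{|D|}Info(D_j)$$
$$SplitInfo(A) = -\sum_{j=1}^{v}\frac{|D_j|}{|D|}\log_2\left(\frac{|D_j|}{|D|}\right)$$
$$GainRatio(A) = \frac{Gain(A)}{SplitInfo(A)}$$

Here $D_1, \dots, D_v$ are the partitions of $D$ induced by the $v$ distinct values of attribute $A$.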
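
The gain ratio of a categorical attribute can be sketched as follows. The names `gain_ratio`, `values`, and `labels` are illustrative assumptions. Returning 0 when the split information is 0 (the attribute has a single value) is a convention chosen here, not something the notes specify, and the sketch does not cover C4.5's handling of continuous attributes or missing values.

```python
import math
from collections import Counter, defaultdict


def entropy(items):
    """Entropy in bits of the empirical distribution of `items`."""
    n = len(items)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of a categorical attribute.

    values: attribute value of each tuple; labels: class label of each tuple.
    """
    n = len(labels)
    # Partition the class labels by attribute value (D_1, ..., D_v).
    partitions = defaultdict(list)
    for value, label in zip(values, labels):
        partitions[value].append(label)
    # Expected information after splitting on the attribute.
    info_after = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy(labels) - info_after
    # SplitInfo is the entropy of the attribute's own value distribution.
    split_info = entropy(values)
    # Assumption: treat a single-valued attribute as having gain ratio 0.
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical attribute that separates the two classes perfectly.
print(gain_ratio(["a", "a", "b", "b"], ["yes", "yes", "no", "no"]))
```

Dividing by the split information penalizes attributes with many distinct values, which plain information gain tends to favor.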

#### CART

$$Gini(D) = 1 - \sum_i p_i^2$$

$$G_{split} = \sum_j \frac{|D_j|}{|D|} \cdot Gini(D_j)$$
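
The two Gini formulas can be illustrated with a short sketch. The function names `gini` and `gini_split` and the example partitions are assumptions made for illustration. It evaluates a given split only and does not search for the best binary split the way a full CART implementation would.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_split(partitions):
    """Weighted Gini impurity of a split, given the labels in each partition."""
    total = sum(len(p) for p in partitions)
    return sum(len(p) / total * gini(p) for p in partitions)


# Hypothetical splits of four tuples into two partitions.
pure_split = [["yes", "yes"], ["no", "no"]]
mixed_split = [["yes", "no"], ["yes", "no"]]
print(gini_split(pure_split), gini_split(mixed_split))
```

A lower weighted Gini value indicates a better split, so the pure split above would be preferred over the mixed one.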