Receiver Operating Characteristics: ROC

#Machine Learning #Performance #Basics

ROC space is the two-dimensional space spanned by True Positive Rate and False Positive Rate.

ROC Space. The colored boxes indicate the confusion matrices: green is the fraction of true positives and orange is the fraction of false positives. Refer to Confusion Matrix for more details.

AUC: Area under Curve

TPR = True Positive Rate (TP Rate)
FPR = False Positive Rate (FP Rate)
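
As a minimal sketch, the two rates can be computed from the counts of a binary confusion matrix, using the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The function name `roc_point` and the counts below are hypothetical and only serve as an illustration.

```python
def roc_point(tp, fp, tn, fn):
    """Return the (FPR, TPR) coordinates of one confusion matrix in ROC space."""
    tpr = tp / (tp + fn)  # fraction of actual positives predicted as positive
    fpr = fp / (fp + tn)  # fraction of actual negatives predicted as positive
    return fpr, tpr


# Hypothetical counts: 50 actual positives and 100 actual negatives.
point = roc_point(tp=40, fp=10, tn=90, fn=10)
print(point)  # (0.1, 0.8)
```

Each confusion matrix therefore maps to a single point in ROC space.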

The ROC curve is the relation between TPR and FPR, written here as a function $TPR(FPR)$. The area under the ROC curve is

$$ \int_0^1 TPR(FPR) \, d(FPR) \approx \sum_i TPR_i \, \Delta FPR_i. $$

If AUC = 1, then TPR = 1 for every FPR: the model can capture all positives without producing any false positives. This is the best performance a model can have.
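
The discrete sum above can be sketched in a few lines. This example uses the trapezoidal rule, which is one common way to approximate the integral and is an assumption here rather than something the formula prescribes. The function name `auc_trapezoid` and the input points are hypothetical.

```python
def auc_trapezoid(fpr, tpr):
    """Approximate the area under TPR(FPR) with the trapezoidal rule.

    The points are assumed to be ordered by increasing FPR.
    """
    area = 0.0
    for i in range(1, len(fpr)):
        delta_fpr = fpr[i] - fpr[i - 1]         # width of the i-th strip
        mean_tpr = 0.5 * (tpr[i] + tpr[i - 1])  # average height of the strip
        area += mean_tpr * delta_fpr
    return area


# A curve that jumps straight to TPR = 1 at FPR = 0.
perfect = auc_trapezoid([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])
# The diagonal TPR = FPR.
diagonal = auc_trapezoid([0.0, 1.0], [0.0, 1.0])
print(perfect, diagonal)  # 1.0 0.5
```

The first curve reproduces the AUC = 1 case, while the diagonal gives an area of 0.5.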

How to Calculate ROC Curve

Not every model has a ROC curve, and hence an AUC. To trace the ROC curve, we need a tunable parameter, typically a decision threshold, that yields different TP Rates and FP Rates.

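In logistic regression, for example, the model outputs a probability, and a threshold $T$ decides which label is assigned: the positive label if the predicted probability reaches $T$, and the negative label otherwise.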

By tuning the threshold $T$, we get different TP Rates and FP Rates, i.e., $TPR(T)$ and $FPR(T)$. The parametric relation between $TPR(T)$ and $FPR(T)$ forms the ROC curve.
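
The threshold sweep can be sketched as follows. This is a minimal illustration, assuming binary labels (1 for positive, 0 for negative), at least one example of each class, and the convention that a score of at least $T$ is predicted positive. The function name `roc_points`, the labels, and the scores are all hypothetical.

```python
def roc_points(y_true, scores, thresholds):
    """Return one (FPR, TPR) point per threshold T."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    points = []
    for t in thresholds:
        # Predict positive whenever the score reaches the threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical labels and predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
# Lowering T moves the point from (0, 0) towards (1, 1).
thresholds = [1.0, 0.8, 0.4, 0.35, 0.1]

for t, (fpr, tpr) in zip(thresholds, roc_points(y_true, scores, thresholds)):
    print(f"T={t}: FPR={fpr}, TPR={tpr}")
```

With these inputs the points run through (0, 0), (0, 0.5), (0.5, 0.5), (0.5, 1) and (1, 1), so a very high threshold predicts nothing as positive and a very low one predicts everything as positive.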

Planted: 2020-05-13 by L Ma;

L Ma (2020). 'Receiver Operating Characteristics: ROC', Datumorphism, 05 April. Available at: https://datumorphism.leima.is/wiki/machine-learning/performance/roc/.