How can we quantify the quality of the predictions of Machine Learning models?
Many scores have been defined for different machine learning problems. Here we group the scoring methods by problem category (classification, clustering, and regression) and list the most frequently used measures for each.
Classification
accuracy score: the fraction of correct predictions out of all predictions
precision-recall curve: shows the precision ($\frac{TP}{TP + FP}$) against the recall ($\frac{TP}{TP + FN}$) by varying a decision parameter, where $TP$, $TN$, $FP$, and $FN$ refer to the true positive count, the true negative count, the false positive count, and the false negative count, respectively.
f1-score: computed as $F_1 = \frac{2PR}{P + R}$, where $P$ and $R$ refer to precision and recall respectively.
roc curve: shows the true positive rate (TPR = $\frac{TP}{TP + FN}$) against the false positive rate (FPR = $\frac{FP}{FP + TN}$) by varying a decision parameter.
confusion matrix: entry (i, j) of this matrix is the number of observations that belong to group i and are predicted to be in group j.
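The classification scores above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function names (`binary_counts`, `classification_scores`, `curve_points`, `confusion_matrix`), the toy labels and scores, and the convention of returning 0.0 when a denominator is zero are all assumptions made for this example. Labels are assumed to be 0/1 with 1 as the positive class.

```python
def binary_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for binary labels."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def safe_div(num, den):
    """Return num / den, or 0.0 for a zero denominator (a convention assumed here)."""
    return num / den if den else 0.0


def classification_scores(y_true, y_pred, positive=1):
    """Accuracy, precision, recall (= TPR), F1 and FPR from the four counts."""
    tp, tn, fp, fn = binary_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "fpr": safe_div(fp, fp + tn),
    }


def curve_points(y_true, scores, thresholds):
    """Evaluate the scores at each threshold (the 'decision parameter').

    (recall, precision) pairs trace the precision-recall curve;
    (fpr, recall) pairs trace the ROC curve.
    """
    points = []
    for threshold in thresholds:
        y_pred = [1 if s >= threshold else 0 for s in scores]
        points.append((threshold, classification_scores(y_true, y_pred)))
    return points


def confusion_matrix(y_true, y_pred, labels):
    """Entry [i][j] counts observations in group labels[i] predicted as labels[j]."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


# Toy data, made up for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

result = classification_scores(y_true, y_pred)
matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])
points = curve_points(y_true, scores, thresholds=[0.25, 0.5, 0.75])

print(result)
print(matrix)
```

Lowering the threshold in `curve_points` can only keep or raise recall, typically at the cost of more false positives, which is the trade-off that both curves visualize.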
Clustering
adjusted rand index: the corrected-for-chance version of the Rand index, which measures the similarity between two data clusterings as the fraction of element pairs on which the two clusterings agree.
adjusted mutual information: the corrected-for-chance version of the mutual information score, which is bounded above by the entropy of each of the two clusterings.
contingency matrix: entry (i, j) of this matrix is the number of true members of cluster i that are predicted to be in cluster j.
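As a rough sketch of the clustering measures above, the following plain-Python code builds the contingency table and derives the Rand index and its adjusted version from pair counts. It also computes the (unadjusted) mutual information and the entropies that bound it; the chance correction for mutual information is omitted because it needs an expected-value term that is too long for a short example. All function names and the toy labels are assumptions made for this illustration, and at least two elements are assumed.

```python
from collections import Counter
from math import comb, log


def contingency_matrix(labels_true, labels_pred):
    """Entry [i][j] counts members of true cluster i assigned to predicted cluster j."""
    rows = sorted(set(labels_true))
    cols = sorted(set(labels_pred))
    counts = Counter(zip(labels_true, labels_pred))
    return [[counts[(r, c)] for c in cols] for r in rows]


def rand_indices(labels_true, labels_pred):
    """Return (Rand index, adjusted Rand index) computed from pair counts."""
    table = contingency_matrix(labels_true, labels_pred)
    total_pairs = comb(len(labels_true), 2)
    # Pairs placed together in both clusterings, in the true one, in the predicted one.
    same_both = sum(comb(n_ij, 2) for row in table for n_ij in row)
    same_true = sum(comb(sum(row), 2) for row in table)
    same_pred = sum(comb(sum(col), 2) for col in zip(*table))

    rand = (total_pairs + 2 * same_both - same_true - same_pred) / total_pairs

    expected = same_true * same_pred / total_pairs  # agreement expected by chance
    maximum = (same_true + same_pred) / 2
    if maximum == expected:
        adjusted = 1.0  # degenerate case; treating it as perfect agreement is a choice made here
    else:
        adjusted = (same_both - expected) / (maximum - expected)
    return rand, adjusted


def entropy(labels):
    """Shannon entropy (natural log) of a clustering."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(labels_true, labels_pred):
    """Mutual information (natural log) between two clusterings, not chance-adjusted."""
    table = contingency_matrix(labels_true, labels_pred)
    n = len(labels_true)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            if n_ij:
                mi += (n_ij / n) * log(n * n_ij / (row_sums[i] * col_sums[j]))
    return mi


# Toy labels, made up for illustration.
labels_true = [0, 0, 1, 1]
labels_pred = [0, 0, 1, 2]

table = contingency_matrix(labels_true, labels_pred)
rand, adjusted = rand_indices(labels_true, labels_pred)
mi = mutual_information(labels_true, labels_pred)

print(table, rand, adjusted, mi)
```

Because these measures depend only on which elements are grouped together, renaming the cluster labels (for example swapping 0 and 1 in one clustering) leaves the scores unchanged.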
Regression
mean squared error: defined as the average of the squared differences between the actual and predicted outputs
mean absolute error: defined as the average of the absolute differences between the actual and predicted outputs
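explained variance score: defined as 1 minus variance of error divided by variance of actual output, that is: $1 - \frac{\mathrm{Var}(y - \hat{y})}{\mathrm{Var}(y)}$, where $y$ is the actual output and $\hat{y}$ is the predicted output.
R² score: defined as 1 minus sum of squared error divided by sum of squared difference from output mean, that is: $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$, where $\bar{y}$ is the mean of the actual outputs.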
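The four regression scores translate almost directly into code. The sketch below uses plain Python; the helper names (`mean`, `variance`, `regression_scores`) and the toy values are assumptions for this example, the variance is the population variance (dividing by n), and the actual outputs are assumed not to be constant so that the denominators are nonzero.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def variance(values):
    """Population variance (divides by n, not n - 1)."""
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])


def regression_scores(y_true, y_pred):
    """MSE, MAE, explained variance and R^2 for paired actual/predicted outputs."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    y_mean = mean(y_true)
    ss_res = sum(e ** 2 for e in errors)              # sum of squared errors
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)   # squared deviations from the mean
    return {
        "mse": mean([e ** 2 for e in errors]),
        "mae": mean([abs(e) for e in errors]),
        "explained_variance": 1 - variance(errors) / variance(y_true),
        "r2": 1 - ss_res / ss_tot,
    }


# Toy values, made up for illustration.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

reg = regression_scores(y_true, y_pred)
print(reg)
```

The explained variance score and R² coincide when the mean error is zero. In this toy example the predictions are biased (the mean error is -0.25), so the explained variance is slightly higher than R², since it ignores the systematic offset.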