Hacker. Researching in Quantum Machine Learning in academia and in industry. Privacy enthusiast, expertise in cybersecurity. Musician.
June 10, 2018
Practitioners in quantum machine learning should not only build their
skills in quantum algorithms: some basic notions of statistics and data
science won’t hurt either. In the following we see some ways to evaluate
a classifier. What does it mean in practice? Imagine you have a medical
test that is able to tell whether a patient is sick or not. You might
want to consider the behavior of your classifier with respect to two
parameters: the cost of identifying a sick patient as healthy, and the
cost of identifying a healthy patient as sick. For example, if the
patient is a zombie who would contaminate the rest of humanity, you want
to minimize the occurrences of the first case, while if the cure for
“zombiness” is lethal for a human patient, you want to minimize the
occurrences of the second case. With P and N we count the number of
patients tested Positively or Negatively. This is formalized in the
following definitions, which consist of statistics to be calculated on
the test set of a data analysis.
TP True positives (their rate is the statistical power): those labeled
as sick that are actually sick.
FP False positives (type I error): those labeled as sick that are
actually healthy.
FN False negatives (type II error): those labeled as healthy that are
actually sick.
TN True negatives: those labeled as healthy that are actually healthy.
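As a minimal sketch of the four definitions above, the counts can be tallied by comparing the true labels with the labels produced by the test. The function name `confusion_counts`, the encoding (1 = sick, 0 = healthy) and the toy data are assumptions made for illustration only.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (assumed: 1 = sick, 0 = healthy)."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1  # labeled sick, actually sick
        elif pred == 1 and truth == 0:
            fp += 1  # labeled sick, actually healthy (type I error)
        elif pred == 0 and truth == 1:
            fn += 1  # labeled healthy, actually sick (type II error)
        else:
            tn += 1  # labeled healthy, actually healthy
    return tp, fp, fn, tn


# Hypothetical test set: ground truth and the labels given by the classifier.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)
```

The four counts always sum to the size of the test set, which is a cheap sanity check on any implementation.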
Given this simple intuition, we can take a binary classifier and imagine
running an experiment over a data set. Then we can measure:
True Positive Rate (TPR) = Recall = Sensitivity: is the ratio of
correctly identified sick elements among all the elements that are
actually sick. It answers the question: “how good are we at detecting
sick people?”
$$ TPR = \frac{TP}{TP + FN} $$
True Negative Rate (TNR) = Specificity: is a measure that tells you how
many of the actually healthy elements are correctly labeled as healthy.
$$ TNR = \frac{TN}{TN + FP} $$
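False Positive Rate (FPR) = Fallout: is the ratio of healthy elements
that are wrongly labeled as sick.
$$ FPR = \frac{FP}{FP + TN} = 1 - TNR $$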
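False Negative Rate (FNR) = Miss Rate: is the ratio of sick elements
that are wrongly labeled as healthy.
$$ FNR = \frac{FN}{FN + TP} = 1 - TPR $$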
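Precision = Positive Predictive Value (PPV): is the ratio of actually
sick elements among all the elements labeled as sick.
$$ PPV = \frac{TP}{TP + FP} $$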
$F_1$ score: is a more compressed index of the performance of a binary
classifier. It is simply the harmonic mean of Precision and Sensitivity:
$$ F_1 = 2 \cdot \frac{PPV \cdot TPR}{PPV + TPR} = \frac{2\,TP}{2\,TP + FP + FN} $$
Receiver Operating Characteristic (ROC): evaluates the TPR and FPR
at all the scores returned by a classifier by changing a parameter
(the decision threshold). It is a plot of the true positive rate
against the false positive rate for the different possible values
(cutpoints) of a test.
The confusion matrix generalizes these 4 combinations (TP, TN, FP, FN)
to multiple classes: it is an $l \times l$ matrix where at row $i$ and
column $j$ you have the number of elements from class $i$ that have
been classified as elements of class $j$.
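The measures listed above can be sketched in a few lines of plain Python, using the standard textbook definitions of each ratio. The function names, the toy counts, scores and labels are all hypothetical, and the sketch assumes that no denominator is zero (that is, the test set contains both sick and healthy elements and at least one element is labeled as sick).

```python
def binary_metrics(tp, fp, fn, tn):
    """Rates derived from the four counts (assumes non-zero denominators)."""
    tpr = tp / (tp + fn)              # recall / sensitivity
    tnr = tn / (tn + fp)              # specificity
    fpr = fp / (fp + tn)              # fallout, equal to 1 - TNR
    fnr = fn / (fn + tp)              # miss rate, equal to 1 - TPR
    ppv = tp / (tp + fp)              # precision
    f1 = 2 * ppv * tpr / (ppv + tpr)  # harmonic mean of precision and recall
    return {"TPR": tpr, "TNR": tnr, "FPR": fpr, "FNR": fnr, "PPV": ppv, "F1": f1}


def roc_points(y_true, scores):
    """(FPR, TPR) pairs, one per distinct score used as the cutpoint."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # cutpoint above every score: nothing labeled sick
    for threshold in sorted(set(scores), reverse=True):
        # An element is labeled sick when its score reaches the cutpoint.
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


def confusion_matrix(y_true, y_pred, n_classes):
    """Entry [i][j] counts elements of class i classified as class j."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for truth, pred in zip(y_true, y_pred):
        matrix[truth][pred] += 1
    return matrix


# Hypothetical counts: 3 TP, 1 FP, 1 FN, 3 TN.
metrics = binary_metrics(tp=3, fp=1, fn=1, tn=3)

# Hypothetical scores returned by a classifier (higher means "more likely sick").
roc = roc_points([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.4, 0.2])

# Hypothetical 3-class problem with classes 0, 1, 2.
cm = confusion_matrix([0, 1, 2, 2, 1, 0], [0, 2, 2, 2, 1, 1], n_classes=3)
```

Lowering the cutpoint can only move a ROC point up or to the right, so the curve always runs from (0, 0) to (1, 1). In the binary case the confusion matrix reduces to the four counts TN, FP, FN, TP.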
In short: I wrote this post because I always forget these terms, and I
wasn’t able to find them described concisely and with a consistent
formalism without spending more time googling than I spent writing this
post. Other
© Alessandro ``Scinawa'' Luongo, 2017 — built with Jekyll using Lagom theme