Published: 24 Jan 2019 | Author: James Beresford
Recently, I had the opportunity to build a regression model for one of FTS Data & AI’s customers in the medical domain. Medical data poses an interesting challenge for machine learning experiments. In most binary classification problems of this kind, the expected results in the training set contain a large percentage of negatives. For example, the goal of an experiment might be to predict, based on a set of known clinical test results, whether a patient has a certain medical condition. If the dataset is a generic one covering a vast number of medical conditions, the percentage of positive results for any single condition will most likely be very low. As a result, a machine learning model that is initially tested using a small set of chosen features will most likely produce a high number of false negatives.
False negatives are a serious problem in experiments involving clinical data: incorrectly categorising a patient as not having a certain medical condition could have disastrous consequences. Once a confusion matrix is built, the model’s effectiveness is measured using indicators such as area under curve, accuracy, precision, recall and F1 score. In medical datasets, recall plays a big role because it falls as the number of false negatives rises. It can therefore hold significant weight in determining the most appropriate model for a given experiment.
The definition of recall is:
Recall = (True Positives) / (True Positives + False Negatives)
In the confusion matrix, the denominator in this equation makes up the total number of actual positives. Recall is therefore the number of correct positive predictions divided by the actual number of positives in the dataset. If there were no false negatives, recall would be at the ideal score of 1. If, however, a large number of actual positives were predicted as negatives (i.e. false negatives), recall would be much lower.
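To make the definition concrete, here is a minimal Python sketch that counts true positives and false negatives from two lists of labels and applies the formula. The function name `recall_from_labels` and the sample labels are made up for illustration (1 means the condition is present, 0 means it is absent); they are not taken from the customer project.

```python
def recall_from_labels(y_true, y_pred):
    """Recall = TP / (TP + FN), with 1 = positive and 0 = negative."""
    # True positives: actual positives that the model also predicted as positive.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    # False negatives: actual positives that the model predicted as negative.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("recall is undefined when there are no actual positives")
    return tp / (tp + fn)


# Hypothetical labels: 4 of 10 patients actually have the condition.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# A model with no false negatives reaches the ideal score.
no_misses = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(recall_from_labels(y_true, no_misses))  # 1.0

# A model that misses 3 of the 4 actual positives scores much lower.
many_misses = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(recall_from_labels(y_true, many_misses))  # 0.25
```

Note that the six true negatives do not appear in the calculation at all, which is why recall is unaffected by how large the negative class is.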
As the model evolves and more relevant features are chosen for prediction, recall should start improving. In domains such as medicine, where false negative predictions can have dire consequences, the recall score should play a vital role in choosing the most suitable model.
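The sketch below illustrates why recall can change which model looks best. It computes accuracy, precision, recall and F1 from confusion matrix counts for two candidate models. The counts, the model names and the helper `classification_metrics` are all hypothetical, chosen only to mimic a dataset in which 50 of 1,000 patients are actual positives.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common indicators from confusion matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical confusion matrix counts for two candidate models,
# each evaluated on the same 1,000 patients (50 actual positives).
candidates = {
    "model_a": {"tp": 10, "fp": 5, "fn": 40, "tn": 945},
    "model_b": {"tp": 40, "fp": 60, "fn": 10, "tn": 890},
}

results = {name: classification_metrics(**counts)
           for name, counts in candidates.items()}

for name, m in results.items():
    print(name, {k: round(v, 3) for k, v in m.items()})

# Ranking by accuracy and ranking by recall pick different models here.
best_by_accuracy = max(results, key=lambda n: results[n]["accuracy"])
best_by_recall = max(results, key=lambda n: results[n]["recall"])
print(best_by_accuracy, best_by_recall)
```

With these made-up counts, `model_a` has the higher accuracy but misses 40 of the 50 actual positives, while `model_b` gives up some accuracy and precision to miss only 10. In practice recall would be weighed alongside the other indicators rather than used alone.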