Published: 24 Jan 2019 | Author: James Beresford

Recently, I had the opportunity to build a regression model for one of Talos’s customers in the medical domain. Medical data poses an interesting challenge for machine learning experiments. In most binary classification experiments, the expected result in the training set is negative for a large percentage of records. For example, the goal of an experiment might be to predict, based on a set of known clinical test results, whether a patient has a certain medical condition. If the dataset is a generic one covering a vast number of medical conditions, the percentage of positive results for any one condition will most likely be very low. As a result, a machine learning model that is initially tested using a small set of chosen features will most likely produce a high number of false negatives.

False negatives, however, are a big problem in experiments involving clinical data: incorrectly categorising a patient as not having a certain medical condition could have disastrous consequences. Once a confusion matrix is built, the model’s effectiveness is measured using indicators such as area under curve, accuracy, precision, recall and F1 score. In medical datasets, recall plays a big role because it falls as the number of false negatives rises. It can therefore hold significant weight in determining the most appropriate model for a given experiment.
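As a minimal sketch of how most of the indicators named above follow from the four cells of a confusion matrix, the Python function below computes accuracy, precision, recall and F1 score from raw counts. The function name, the example counts and the choice to return 0.0 when a denominator is zero are illustrative assumptions, not part of the original experiment. Area under curve is left out because it needs the model's scores across thresholds rather than a single confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common indicators from confusion-matrix counts.

    tp, fp, tn, fn are the counts of true positives, false positives,
    true negatives and false negatives. Returning 0.0 when a denominator
    is zero is an assumed convention for this sketch.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for 100 patients, 13 of whom have the condition.
metrics = classification_metrics(tp=8, fp=2, tn=85, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy is 0.93 while recall is only about 0.615, because 5 of the 13 patients who have the condition were missed.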

The definition of recall is:

Recall = (True Positives) / (True Positives + False Negatives)

In the confusion matrix, the denominator of this equation is the total number of actual positives. Recall is therefore the number of correct positive predictions divided by the actual number of positives in the dataset. If there were no false negatives, recall would be at the ideal score of 1; however, if a large number of actual positives were predicted as negatives (i.e. false negatives), recall would be much lower.
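To make the effect of false negatives concrete, here is a small Python sketch that counts true positives and false negatives directly from lists of labels and applies the recall formula. The data is hypothetical (100 patients, 5 of whom have the condition), and the helper names are illustrative rather than taken from any library.

```python
def recall_from_labels(actual, predicted):
    """Recall = TP / (TP + FN), where 1 means positive and 0 means negative."""
    pairs = list(zip(actual, predicted))
    true_positives = sum(1 for a, p in pairs if a == 1 and p == 1)
    false_negatives = sum(1 for a, p in pairs if a == 1 and p == 0)
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        raise ValueError("recall is undefined without any actual positives")
    return true_positives / actual_positives


def accuracy_from_labels(actual, predicted):
    """Fraction of predictions that match the actual label."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical data: 5 patients with the condition, 95 without.
actual = [1] * 5 + [0] * 95

# A model that predicts "negative" for every patient.
always_negative = [0] * 100

# A model that finds 4 of the 5 positives and raises no false alarms.
catches_four = [1, 1, 1, 1, 0] + [0] * 95

print(accuracy_from_labels(actual, always_negative))  # 0.95
print(recall_from_labels(actual, always_negative))    # 0.0
print(accuracy_from_labels(actual, catches_four))     # 0.99
print(recall_from_labels(actual, catches_four))       # 0.8
```

The always-negative model looks strong on accuracy yet misses every patient with the condition, which is exactly the failure that recall exposes.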

As the model evolves and more relevant features are chosen for prediction, recall should start improving. In domains such as medicine, where false negative predictions can have dire consequences, the recall score should play a vital role in choosing the most suitable model.
