Machine Learning for Doctors

Mar 10, 2018

Part 1: Introduction

Machine learning: “Field of study that gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959)

Medical journals are publishing articles that use artificial intelligence (AI) algorithms to diagnose and treat diseases. To understand, review, and critique these articles, physicians need to understand the basics of AI. This series aims to explain artificial intelligence algorithms to physicians. In Part 1, we will look at machine learning, which is a subset of AI.

Humans learn from experience. Traditionally, computers work by following programs, or sets of instructions, written by humans. In machine learning, the computer learns from its experience with data. In a supervised machine learning task, such as predicting whether a patient will develop diabetes, we give the computer a set of input data (features) that maps to an output (label). In this scenario the features could be body mass index, family history of diabetes, fasting blood glucose, and so on, and the output, or target, is the presence or absence of diabetes. Once this data is fed into the computer, it builds a model (a mathematical structure) to predict the outputs (labels) from the available input data.

Once the computer has modeled the relationship between the input and output data, we can use this model to predict the output for data the computer has not seen before. In other words, if you give a machine learning algorithm a set of data, it tries to identify a mathematical pattern in it. If we train the algorithm on good-quality data, we get better predictions. Predictions work only if the training data is representative of the problem you are trying to solve. Essentially, a machine learning model is a mathematical function optimized for the given problem.
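To make the idea of training on known data and then predicting on unseen data concrete, here is a minimal sketch in Python. It assumes a very simple model (a nearest-centroid classifier, chosen only for illustration) and a handful of invented patient records. Neither the numbers nor the function names come from a real study or library.

```python
from statistics import mean

# Invented toy records: (BMI, fasting glucose in mg/dL) -> 1 if diabetic, 0 if not.
train = [
    ((22.0, 85.0), 0),
    ((24.5, 92.0), 0),
    ((27.0, 99.0), 0),
    ((31.0, 128.0), 1),
    ((34.5, 140.0), 1),
    ((29.0, 133.0), 1),
]

# Records held back from training, standing in for patients the model has never seen.
unseen = [
    ((23.0, 90.0), 0),
    ((33.0, 136.0), 1),
]


def fit_centroids(records):
    """Learn one average feature vector (centroid) per label."""
    centroids = {}
    for label in {lab for _, lab in records}:
        rows = [feats for feats, lab in records if lab == label]
        centroids[label] = tuple(mean(col) for col in zip(*rows))
    return centroids


def predict(centroids, feats):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lab: squared_distance(centroids[lab], feats))


model = fit_centroids(train)
predictions = [predict(model, feats) for feats, _ in unseen]
correct = sum(p == lab for p, (_, lab) in zip(predictions, unseen))
accuracy = correct / len(unseen)
print(predictions, accuracy)
```

The model never sees the held-back records during training, so the accuracy on them is a rough stand-in for how it might behave on new patients. If the training records did not resemble those patients, the learned centroids would be misleading, which is the point about representative data made above.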

To see a visual explanation of machine learning, click here.

To watch MIT OpenCourseWare on machine learning, click here. If you just want the explanation of machine learning, skip to 8:58 and watch until the 11-minute mark.

Machine learning algorithms

Machine learning algorithms can be broadly classified into supervised learning, unsupervised learning and reinforcement learning.

Supervised learning

If we have labeled data, we can use supervised learning. First, we use the data that has correct labels (the value we are trying to predict) to train a model. Then we can use this trained model to predict labels for completely new data. For example, if we have a large data set containing patients’ age, gender, BMI, fasting glucose level, and lipid panel (features), along with whether or not they are diabetic (label), we can use it to create a model. Later, we can use this model to predict the probability that a new patient will develop diabetes.
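One common way to get a probability rather than a bare yes/no label is logistic regression. The sketch below is an illustration under stated assumptions, not a clinical tool. The eight records are invented, only two features are used, and the training loop is plain gradient descent written out by hand so that nothing is hidden inside a library.

```python
import math

# Invented toy data: (BMI, fasting glucose in mg/dL); label 1 = diabetic, 0 = not.
features = [(22.0, 85.0), (24.5, 92.0), (27.0, 99.0), (26.0, 104.0),
            (31.0, 128.0), (34.5, 140.0), (29.0, 133.0), (30.0, 121.0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]


def column_stats(rows):
    """Mean and standard deviation of each feature column."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return means, stds


def scale(row, means, stds):
    """Put every feature on a comparable scale (zero mean, unit spread)."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def sigmoid(z):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, targets, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, y in zip(rows, targets):
            error = sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) - y
            grad_w = [g + error * x for g, x in zip(grad_w, row)]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


means, stds = column_stats(features)
weights, bias = train_logistic([scale(r, means, stds) for r in features], labels)


def predict_probability(row):
    """Estimated probability of the label 'diabetic' for one new patient."""
    x = scale(row, means, stds)
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


p_low = predict_probability((23.0, 88.0))
p_high = predict_probability((33.0, 138.0))
print(round(p_low, 3), round(p_high, 3))
```

With so few, cleanly separated records the probabilities come out overconfident. A real model would need far more data and a check that its probabilities are well calibrated.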

Supervised learning can be used to predict continuous values, such as the amount of insulin needed based on blood glucose. It can also be used to assign labels, such as benign or malignant. Predicting continuous values is called regression, and predicting labels is called classification.

Let us look at an example. Regression can be used to predict a physician’s salary. The following graph explores the relationship between the average number of patients seen in a day and the average annual salary for physicians. As you can see, there is a linear relationship between the number of patients seen and salary.

Mathematically the regression / trend line in the graph can be represented as

y = mx + c

Where m is the slope of the line and c is the intercept (where the line crosses the y axis). Here x is the average number of patients and y is the predicted salary. To learn more about the equation of a line, click here.

In very simple terms, if we have many data points pairing the average number of patients with physician salary, a linear regression model can estimate the values of m and c. If you would like to learn more about linear regression models, please click here.
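As a rough illustration of how m and c can be estimated, the sketch below uses ordinary least squares, the standard method for fitting a straight line. The patient counts and salaries are invented and deliberately fall exactly on a line, so the result is easy to check by hand. Real data would scatter around the line.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimates of slope m and intercept c for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, divided by how much x varies on its own.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    m = s_xy / s_xx
    c = mean_y - m * mean_x
    return m, c


# Invented numbers: average patients per day and annual salary in thousands of dollars.
patients_per_day = [10, 15, 20, 25, 30]
salary_thousands = [150, 175, 200, 225, 250]  # constructed as 100 + 5 * patients

m, c = fit_line(patients_per_day, salary_thousands)
predicted = m * 22 + c  # prediction for a physician seeing 22 patients a day
print(m, c, predicted)  # 5.0 100.0 210.0
```

Here the fitted slope says that each extra patient per day goes with 5 (thousand) more in salary, and the intercept is the predicted salary at zero patients.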

Another clinical example of regression is predicting the insulin requirement for a patient with type 1 diabetes. Traditionally, pre-meal insulin for such a patient is calculated from their carb ratio (the amount of carbohydrate covered by 1 unit of insulin) and insulin sensitivity (the drop in glucose produced by 1 unit of short-acting insulin). This can be written using the following formula.

Amount of insulin = [(total amount of carbs for the meal) / carb ratio] + [(current blood glucose - target glucose) / insulin sensitivity]
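Written as code, the formula above is a single hardcoded function. The function and parameter names below are illustrative, and the example values (a 60 g meal, a carb ratio of 10 g per unit, a sensitivity of 30 mg/dL per unit) are invented for the arithmetic only.

```python
def premeal_insulin_units(carbs_g, carb_ratio, current_glucose,
                          target_glucose, insulin_sensitivity):
    """Pre-meal dose = meal dose + correction dose, exactly as in the formula."""
    meal_dose = carbs_g / carb_ratio
    correction_dose = (current_glucose - target_glucose) / insulin_sensitivity
    return meal_dose + correction_dose


# 60 / 10 + (180 - 120) / 30 = 6 + 2 = 8 units
dose = premeal_insulin_units(
    carbs_g=60,
    carb_ratio=10,
    current_glucose=180,
    target_glucose=120,
    insulin_sensitivity=30,
)
print(dose)  # 8.0
```

Every number that shapes the answer (the carb ratio and the sensitivity) has to be supplied by a person. Nothing here is learned from data.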

If we want a computer program to output the insulin dose, we can hardcode this formula into it. But if we have a lot of data on carbs consumed, pre-meal and post-meal glucose, and the amount of insulin taken, we can create a machine learning model to capture the relationship between these variables more accurately than a fixed formula. The next time the patient wants to eat, they can plug the data into the model and get a recommendation on how much insulin to take. We can improve this model by adding other variables, such as activity level from a fitness tracker.
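A minimal sketch of the learning approach follows. It rests on a strong simplifying assumption that is ours, not a clinical fact: the change in glucose over a meal is a straight-line function of carbs eaten and insulin taken. The log entries are synthetic, generated from that assumed relationship, so the fit should recover the values used to create them.

```python
# Synthetic log entries (not patient data), generated from the assumed relationship
#   post = pre + 3 * carbs - 30 * insulin
# Each entry: (carbs in grams, insulin units, pre-meal glucose, post-meal glucose)
log = [
    (60, 6, 120, 120),
    (45, 5, 150, 135),
    (30, 2, 110, 140),
    (80, 7, 140, 170),
    (50, 6, 180, 150),
]


def fit_glucose_response(entries):
    """Least-squares fit of: glucose change = rise_per_gram * carbs - drop_per_unit * insulin."""
    s_cc = sum(c * c for c, i, pre, post in entries)
    s_ci = sum(c * i for c, i, pre, post in entries)
    s_ii = sum(i * i for c, i, pre, post in entries)
    s_cd = sum(c * (post - pre) for c, i, pre, post in entries)
    s_id = sum(i * (post - pre) for c, i, pre, post in entries)
    # Solve the two normal equations with Cramer's rule.
    det = s_cc * s_ii - s_ci * s_ci
    carb_coef = (s_cd * s_ii - s_ci * s_id) / det
    insulin_coef = (s_cc * s_id - s_ci * s_cd) / det
    return carb_coef, -insulin_coef


rise_per_gram, drop_per_unit = fit_glucose_response(log)

# Translate the learned coefficients into the two quantities used by the formula.
insulin_sensitivity = drop_per_unit          # glucose drop per unit of insulin
carb_ratio = drop_per_unit / rise_per_gram   # grams of carbohydrate covered by 1 unit

# Dose suggested by the learned values for a 60 g meal at glucose 180, target 120.
learned_dose = 60 / carb_ratio + (180 - 120) / insulin_sensitivity
print(insulin_sensitivity, carb_ratio, learned_dose)  # 30.0 10.0 8.0
```

The difference from the hardcoded version is where the numbers come from: the sensitivity and carb ratio are estimated from the log instead of being typed in. Extra variables such as activity level would enter as additional columns in the same kind of fit.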

Classification algorithms such as decision trees can be used to assign labels based on a set of features. Let’s say that you have a database of cytology features of thyroid nodules with atypia of undetermined significance (AUS), along with the actual surgical pathology. This database can be used to create a machine learning model that predicts malignancy from the cytology features of nodules with AUS. In the next article, we will create a working model to predict malignancy based on cytology features from breast biopsies.
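To give a feel for how a decision tree learns, the sketch below fits a decision stump, which is a tree with a single split. The two feature names and all scores are hypothetical and invented for illustration. They are not taken from any thyroid or breast cytology data set. The stump tries every feature and threshold and keeps the split that leaves the two groups as pure as possible, measured by Gini impurity.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means the group is pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Most common label in a group (ties go to 1)."""
    return 1 if 2 * sum(labels) >= len(labels) else 0


def fit_stump(rows, labels):
    """Find the single feature/threshold split with the lowest weighted Gini impurity."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must put at least one case on each side
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or impurity < best[0]:
                best = (impurity, feature, threshold, majority(left), majority(right))
    return best[1:]


def predict(stump, row):
    """Apply the learned rule to one new case."""
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


# Hypothetical scores on a 1-10 scale: (nuclear_atypia_score, cellularity_score)
rows = [(2, 5), (3, 7), (1, 4), (4, 6), (7, 5), (8, 6), (6, 4), (9, 7)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = malignant on surgical pathology, 0 = benign

stump = fit_stump(rows, labels)
print(stump)  # (0, 4, 0, 1): split on feature 0 at a score of 4
print(predict(stump, (5, 5)), predict(stump, (3, 6)))
```

A full decision tree repeats this search inside each resulting group until the groups are pure enough or too small to split further.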

To be continued in Part 2.

Written by Johnson Thomas