ROC curves are a very common tool to evaluate the quality of a prediction model in terms of sensitivity and specificity. Some time ago my colleague Prof. Dr. Dr. David Meintrupp wrote a sweet script to analyze and evaluate ROC curves in JMP. You can even use the script to compare multiple ROC curves, and you get confidence bands for them as well.

This will be a series of three blog posts:

  1. The first post gives a basic introduction to using the script to create a ROC curve with confidence bands for a single test.
  2. The second part covers how to find the right cutoff value for your test.
  3. The last post shows how to compare multiple tests via ROC curves and partial Area Under the Curve (pAUC) approaches.

This will not be a general introduction to ROC curves. If you are not familiar with the basic concepts of receiver operating characteristics, sensitivity and specificity, you might want to start with something like this Introduction to ROC analysis.

You can find the script in the form of a JMP add-in here.

Data

As an example I will use the aSAH data set from the R package pROC. It contains data on 113 patients who suffered from aneurysmal subarachnoid hemorrhage. The idea is to find a clinical test that predicts the outcome for a patient, as patients with a poor outcome require special medical care [1].

Three different tests are available: wfns, s100b and ndka. The question to answer is: which of these tests is best at identifying patients with a poor outcome?

Now what does 'best' mean? Well, a good test:

  • Has a high probability of identifying a patient with a poor outcome. We call this sensitivity or recall.
  • Has a low probability of classifying a patient who actually has a good outcome as a poor outcome. This is sometimes called the false-positive rate, as it's about predicting the event when it actually does not happen.

A ROC curve is the most common way to evaluate both the sensitivity and the false-positive rate (which equals 1 - specificity) of a test in one graph. We want the sensitivity to be high while the false-positive rate stays low.
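If you want to see these two quantities spelled out in code, here is a minimal Python sketch with scikit-learn. The arrays are only hypothetical stand-ins for exported outcome and wfns columns; the JMP add-in itself does not use any Python.

    import numpy as np
    from sklearn.metrics import roc_curve

    # Hypothetical stand-ins for the exported aSAH columns: 1 = 'Poor', 0 = 'Good'
    outcome = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 1])
    wfns = np.array([4, 1, 2, 5, 3, 1, 4, 2, 1, 5])

    # Sensitivity and false-positive rate for one fixed cutoff (here: wfns >= 4)
    pred = wfns >= 4
    sensitivity = pred[outcome == 1].mean()   # TP / (TP + FN)
    fpr_single = pred[outcome == 0].mean()    # FP / (FP + TN)

    # The ROC curve simply repeats this calculation for every possible cutoff
    fpr, tpr, thresholds = roc_curve(outcome, wfns)
    print(sensitivity, fpr_single)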

ROC for wfns

Let's start with the ROC curve for the first of the clinical tests: wfns. Just start the script and the following dialog will appear:

The evaluation column is the outcome for this example. Generally this is what you want to test for. The ROC columns contain at least one test; here I have chosen wfns. If you are working with predictive models (like Random Forests, Neural Networks, Logistic Regression, etc.) you would instead use a column that contains the predicted probabilities for an event.

When you hit the OK button a second dialog appears, asking for the 'positive' level of your outcome.

Especially for this example the message is very misleading. JMP is asking for the level that you want to predict, basically the event. For us this is 'Poor'! Finally we get the following report as a result:

Let's focus on the first section on the left hand side for a moment.

This is the general ROC curve for wfns. The script already highlights a good cutpoint for the test with the black rectangle in the graph. That cutpoint gives a high number of true positives (a good sensitivity) and a low number of false positives; basically, it lies as close to the top-left corner as possible.
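One common way to formalize 'as close to the top-left corner as possible' is to pick the threshold whose (false-positive rate, sensitivity) point has the smallest distance to (0, 1). The following Python sketch illustrates that idea; it is not necessarily the exact criterion the add-in uses.

    import numpy as np
    from sklearn.metrics import roc_curve

    def closest_to_top_left(y_true, score):
        """Threshold whose (FPR, TPR) point lies nearest to the ideal corner (0, 1)."""
        fpr, tpr, thr = roc_curve(y_true, score)
        dist = np.sqrt(fpr ** 2 + (1 - tpr) ** 2)   # Euclidean distance to (0, 1)
        best = np.argmin(dist)
        return thr[best], fpr[best], tpr[best]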

The numeric output gives you the area under the curve (AUC), which is a common indicator for the quality of a predictor. A perfect prediction would achieve 100% sensitivity while not predicting any events for patients that will have a 'Good' outcome. This would correspond to a ROC curve that passes through the (0;1) coordinate, or an AUC of exactly 1.

Here we get an AUC of 82%, which might or might not be good; it depends on the application. The AUC is most useful when you are comparing multiple tests: the test with the highest AUC might be considered the best test.
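For reference, the AUC itself is easy to compute outside of JMP as well. Another hedged sketch with scikit-learn, again using hypothetical stand-in arrays for the exported columns rather than the real aSAH values:

    import numpy as np
    from sklearn.metrics import roc_auc_score

    # Hypothetical stand-ins: 1 = 'Poor' outcome, 0 = 'Good'
    outcome = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 1])
    wfns = np.array([4, 1, 2, 5, 3, 1, 4, 2, 1, 5])
    s100b = np.array([0.5, 0.1, 0.2, 0.4, 0.3, 0.1, 0.6, 0.2, 0.1, 0.5])

    # A higher AUC means better separation between 'Poor' and 'Good' patients
    print("AUC wfns :", roc_auc_score(outcome, wfns))
    print("AUC s100b:", roc_auc_score(outcome, s100b))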

As a final step for today press the Bootstrap button! Green confidence bands appear for our ROC curve.

The green bands are another way to evaluate the quality of the test. A good test has to differ from the diagonal, which represents a classifier that randomly assigns patients to 'Poor' or 'Good'. In our case the diagonal lies outside of the green bands, which is a good sign for the test.

If you are wondering why the bands are not symmetric around the ROC curve (actually the ROC curve is even slightly outside of the bands near the top-right corner): this happens because we are using bootstrap confidence bands. Two kinds of classical (symmetric) confidence bands are available as well, binomial and Kolmogorov bands.
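Conceptually, pointwise bootstrap bands come from resampling the patients with replacement, recomputing the ROC curve on a common false-positive-rate grid and taking percentiles. The sketch below illustrates that idea in Python; the add-in's exact algorithm may differ.

    import numpy as np
    from sklearn.metrics import roc_curve

    def bootstrap_roc_band(y, score, n_boot=2000, seed=1):
        """Pointwise 95% percentile band for a ROC curve via patient resampling."""
        y, score = np.asarray(y), np.asarray(score)
        grid = np.linspace(0, 1, 101)                # common FPR grid
        rng = np.random.default_rng(seed)
        curves = []
        for _ in range(n_boot):
            idx = rng.integers(0, len(y), len(y))    # resample patients with replacement
            if len(np.unique(y[idx])) < 2:           # skip samples missing one outcome class
                continue
            fpr, tpr, _ = roc_curve(y[idx], score[idx])
            curves.append(np.interp(grid, fpr, tpr))
        curves = np.array(curves)
        lower = np.percentile(curves, 2.5, axis=0)
        upper = np.percentile(curves, 97.5, axis=0)
        return grid, lower, upper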

Next time we will discuss the middle part of the report, which helps to identify the right cutpoint for your test. Finally, we will talk about partial AUC (pAUC) and the comparison of multiple tests using the script.

Literature

  1. Xavier Robin, Natacha Turck, Alexandre Hainard, Natalia Tiberti, Frédérique Lisacek, Jean-Charles Sanchez and Markus Müller: pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12, 77 (2011)