Clinically applicable deep learning for diagnosis and referral in retinal disease - De Fauw et al., Nature Medicine, Aug 2018

Original article:

One-sentence summary

In this paper, a research group from DeepMind created a two-stage AI algorithm that can analyse images of the retina to provide a diagnosis and referral decision with accuracy on par with experts.

What did they do?

They used a two-step process: first segmenting the image, then classifying it into different referral decisions. For the first step, five instances of the segmentation network were run in parallel to identify areas of ambiguity. For the classification step, the metrics were optimised to avoid missing sight-threatening disease.

What does this mean?

This could be used as a screening and triage tool in developed countries, as well as to increase accessibility in developing countries. Additionally, the automatic segmentation could enable automatic measurements of different features, which could play a role in research.

Mid-length summary

Medical imaging analysis faces two main problems: technical variation (devices, noise, etc.) and patient variation (including different ethnicities).

They overcome each problem in a two-step approach:

  1. Deep segmentation network to create device-independent tissue-segmentation map
    a. U-Net
  2. Deep classification network analyses the segmentation map to provide (i) diagnoses and (ii) referral suggestions
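The two-stage design can be sketched as follows. This is a minimal illustrative mock-up, not the paper's networks: `segment` and `classify` are hypothetical stand-ins, and the thresholds are made up.

```python
import numpy as np

def segment(oct_scan):
    """Stage 1 stand-in: map a raw OCT scan to a per-voxel tissue-class map.
    Here we just bucket intensities into 3 fake tissue classes."""
    return np.digitize(oct_scan, bins=[0.33, 0.66])

def classify(tissue_map):
    """Stage 2 stand-in: map the device-independent tissue map to a referral
    decision. Here: refer urgently if 'fluid' (class 2) exceeds 25% of voxels."""
    fluid_fraction = np.mean(tissue_map == 2)
    return "urgent" if fluid_fraction > 0.25 else "routine"

scan = np.random.default_rng(0).random((64, 64))
decision = classify(segment(scan))
```

The key design point is that stage 2 never sees raw pixels, only the tissue map, which is what makes it device-independent.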

    In retinal scans there are often ambiguous areas. To address this, they trained multiple instances of the segmentation network, ran them, and then saw where the different segmentation maps agreed and disagreed. In the areas of disagreement, different networks will propose different plausible interpretations.
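One way to picture this: run the instances on the same scan and flag pixels where their maps diverge. The sketch below simulates five instances that agree on clear tissue but guess differently on an ambiguous patch (the setup and numbers are illustrative, not from the paper).

```python
import numpy as np

rng = np.random.default_rng(42)

# Five simulated segmentation-network instances labelling the same 8x8 region.
maps = np.zeros((5, 8, 8), dtype=int)
for m in maps:
    # Rows 0-2 are "ambiguous": each instance proposes its own interpretation.
    m[0:3, :] = rng.integers(0, 3, size=(3, 8))

# A pixel is flagged as ambiguous wherever the instances disagree.
disagreement = np.array([len(np.unique(maps[:, i, j])) > 1
                         for i in range(8) for j in range(8)]).reshape(8, 8)
```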

    In total they use five segmentation and five classification model instances, and ensemble them.
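A simple way to ensemble the 5 × 5 = 25 segmentation/classification pairings is to average their predicted probability vectors over the referral decisions. A hedged sketch with random placeholder probabilities:

```python
import numpy as np

rng = np.random.default_rng(1)
N_SEG, N_CLS, N_CLASSES = 5, 5, 4  # 5 seg x 5 cls instances, 4 referral decisions

# Hypothetical: each (segmentation, classification) pairing yields a probability
# vector over referral decisions; here drawn at random for illustration.
probs = rng.dirichlet(np.ones(N_CLASSES), size=(N_SEG, N_CLS))

# Ensemble by averaging over all 25 combinations, then take the argmax.
ensemble = probs.mean(axis=(0, 1))
decision = int(np.argmax(ensemble))
```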

    To test how well the network identified referral suggestions, they compared the performance with a panel of 8 experts (4 retinal specialists and 4 optometrists) when evaluating (i) the OCT alone and (ii) the OCT with the fundus image and clinical notes.

    The model broadly outperformed the experts when they had the OCT alone, and performed at a similar or better level when the experts also had the fundus image and clinical notes.

    Their framework initially scored a lower average penalty than the experts. They then optimised it to be more sensitive to sight-threatening diagnoses, after which its overall performance on this metric dropped below the experts'.
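The "average penalty" idea can be made concrete with an asymmetric cost matrix over referral decisions, where under-referring an urgent case costs much more than over-referring a mild one. The matrix values below are invented for illustration, not the paper's actual weights.

```python
import numpy as np

decisions = ["observation", "routine", "semi-urgent", "urgent"]

# Illustrative penalty matrix: rows = true decision, cols = predicted.
# Missing an urgent case (bottom-left) is penalised far more heavily
# than over-referring a benign one (top-right).
penalty = np.array([
    [0, 1, 2, 3],   # true: observation
    [1, 0, 1, 2],   # true: routine
    [4, 2, 0, 1],   # true: semi-urgent
    [8, 6, 3, 0],   # true: urgent
])

def avg_penalty(y_true, y_pred):
    """Mean penalty over a set of (true, predicted) referral decisions."""
    return float(np.mean([penalty[t, p] for t, p in zip(y_true, y_pred)]))
```

Optimising for sensitivity to sight-threatening disease amounts to tolerating more cost in the top-right of the matrix to reduce cost in the bottom-left.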

    The framework recommends the most urgent diagnosis on each scan.
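Selecting the referral from the most urgent diagnosis present is just a max over an urgency ranking. The ranking below is an illustrative assumption, not the paper's exact mapping.

```python
# Hypothetical urgency ranking over diagnoses (higher = more urgent).
URGENCY = {"normal": 0, "drusen": 1, "CSR": 2, "CNV": 3}

def referral_for(diagnoses):
    """Drive the referral decision by the most urgent diagnosis on the scan."""
    return max(diagnoses, key=URGENCY.get)
```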

    Overall performance for most pathologies exceeds 99% AUC, and the overall error rate on referral decisions is 5.5%.

    They tested the framework with a dataset from a different scanning device and it initially performed poorly (46.6% error rate). They then re-trained the segmentation network with a dataset of manually segmented slices from this device, and found the error rate dropped to 3.4%.
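The point of this device transfer is that only stage 1 needs re-training; the classifier is reused unchanged. A toy numpy sketch of the idea, with a linear "segmentation" stand-in fitted by least squares (a real network would use SGD; all values here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed stage-2 classifier: never re-trained across devices.
def classify(tissue_value):
    return "refer" if tissue_value > 0.5 else "ok"

# Toy stage-1 "segmentation": a linear intensity -> tissue mapping per device.
w_old = 1.0                     # weight learnt on device A (illustrative)
x_new = rng.random(100) * 0.2   # device B outputs much dimmer intensities
t_true = x_new * 5.0            # manual segmentations on device-B slices

# Re-fit only stage 1 on the manually segmented device-B data
# (closed-form least squares for this 1-parameter toy model).
w_new = float(np.dot(x_new, t_true) / np.dot(x_new, x_new))
```

With the old weight the dim device-B scans are under-segmented and cases slip past the fixed classifier; after re-fitting stage 1 the same classifier works again, mirroring the error-rate drop from 46.6% to 3.4%.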

    Segmentations could also be used to quantify retinal morphology and enable measurements (e.g. of the location and volume of macular edema), some of which could be made automatically. This could help with research.
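Such measurements fall out of the tissue map almost for free: count the voxels of a tissue class, multiply by the voxel volume, and take the centre of mass for location. A sketch on a toy 3-D map (class labels and voxel size are illustrative):

```python
import numpy as np

# Toy tissue map: class 2 = fluid (e.g. macular edema), per-voxel labels.
tissue_map = np.zeros((10, 10, 10), dtype=int)
tissue_map[2:5, 3:6, 4:8] = 2        # a 3x3x4 fluid pocket

VOXEL_VOLUME_MM3 = 0.001             # illustrative voxel size, not the paper's

fluid_voxels = int(np.sum(tissue_map == 2))
fluid_volume = fluid_voxels * VOXEL_VOLUME_MM3   # volume in mm^3

# Centre of mass of the fluid voxels gives the lesion's location.
coords = np.argwhere(tissue_map == 2)
centre = coords.mean(axis=0)
```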