This was originally shared as part of my weekly email newsletter.

I’ve written in the past more formally about how AI may impact different medical specialties, from psychiatry through to intensive care.

One area I haven’t written about, but where I think AI has huge potential, is in drug discovery.

I’ve spent this week reading all I can about AI in drug discovery, and I find writing down my thoughts a useful way to consolidate what I’ve learnt. This article will be that consolidation and exploration, and I’ll share links to the top resources I came across at the end. (If you’re interested in an expert opinion, consider skipping directly to those links.)

I want to avoid taking a “these are the top X uses of ML in drug discovery” approach. I’ve seen so many articles doing exactly that that I don’t feel I’d be adding much. Instead, I’m going to outline the need for a new approach to drug discovery, explain how machine learning can fill that need, and assess the current state of play.

The cost of drug discovery

The drug development process is notoriously difficult. It takes on average 10-15 years to take a drug from identification of the molecule to clinical use, and only approximately 3% of the initial drugs make it all the way through. The cost of bringing a drug all the way through the development pipeline, only for it to fail at the final hurdle, typically exceeds $1 billion.

For those not familiar with the drug development pathway, here’s a useful overview:

As the diagram shows, there are numerous stages where drugs can fail. It’s far better to fail early than fail late, so any ability to predict which drugs will fail is advantageous (as we will see). Doing so can be pretty difficult, though.
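Here is a toy calculation in Python showing why failing early matters. The stage costs and pass rates are hypothetical numbers I made up for illustration; they are not sourced figures. What matters is the shape of the arithmetic: every candidate that enters a stage pays for it. Two pipelines that approve the same overall fraction of candidates can therefore have very different costs per approved drug, depending on where the failures happen.

```python
# Toy comparison of a "fail late" and a "fail early" pipeline.
# All numbers are HYPOTHETICAL, chosen only to illustrate the arithmetic.

# Cost per candidate of running each stage (arbitrary units, e.g. $ millions).
STAGE_COSTS = [5.0, 15.0, 40.0, 150.0]  # preclinical, phase 1, phase 2, phase 3


def cost_per_approval(pass_rates, costs=STAGE_COSTS, n_candidates=1000.0):
    """Total spend divided by the number of candidates that clear every stage."""
    total_cost = 0.0
    remaining = n_candidates
    for cost, pass_rate in zip(costs, pass_rates):
        total_cost += remaining * cost  # every candidate entering a stage pays for it
        remaining *= pass_rate          # only a fraction survive to the next stage
    return total_cost / remaining


# Both pipelines approve the same overall fraction of candidates (10.8%).
# The first weeds out failures at the expensive later stages, the second at the cheap early ones.
fail_late = [0.9, 0.8, 0.3, 0.5]
fail_early = [0.3, 0.8, 0.9, 0.5]

for label, rates in [("fail late", fail_late), ("fail early", fail_early)]:
    print(f"{label}: {cost_per_approval(rates):.0f} units per approved drug")
```

With these made-up numbers, both pipelines approve 10.8% of candidates. The late-failing pipeline costs roughly 738 units per approved drug, against roughly 477 for the early-failing one.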

What’s worse is that the cost of developing drugs is increasing while the success rate of development is decreasing (see Eroom’s Law). Pharmaceutical companies also have a bad rep with the public, and various regulations exist to limit shady practices (such as artificially inflating prices). All in all, this creates a pretty tricky situation for drug companies going forwards…

Machine learning: the light at the end of the tunnel?

So how can machine learning (ML) offer a route out? As I see it, it fundamentally comes down to ML's excellent pattern recognition ability, which can be applied in multiple ways. Let’s look at concrete examples through this lens, in the order they arise in the development pipeline.

1. Candidate identification and selection

Given the huge time and cost investment of testing a molecule, it makes sense to put a lot of effort into identifying and selecting molecules. Traditionally, fairly empirical approaches have been favoured, such as a series of tests for solubility, strength of protein binding, cytotoxicity, molecule stability and so on. Attempts have also been made to take a more ‘first principles’ approach, identifying potential molecules based on our current understanding of disease. While attractive, this has proved difficult, in part due to the complexity of biological systems. Rofecoxib is an example of this.

Through pattern recognition, ML offers new ways to identify candidates. The logic is that the more candidates there are, the greater the probability that good candidates are among them. One novel method is the use of generative adversarial networks (GANs), which train a network to generate potential molecules that are similar to existing molecules. A recent success story was published in Nature last year.
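The ‘more candidates’ logic can be written down explicitly. Suppose, as a simplification, that each generated candidate independently has the same small probability p of being a good one. Then the chance that a batch of n candidates contains at least one good one is 1 - (1 - p)^n. The per-candidate hit rate below is a hypothetical value chosen only for illustration.

```python
def prob_at_least_one_good(p, n):
    """Chance that n independent candidates, each good with probability p,
    contain at least one good candidate: 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p) ** n


# HYPOTHETICAL per-candidate hit rate of 1 in 1,000.
p = 0.001
for n in (10, 100, 1_000):
    print(f"{n:>5} candidates -> {prob_at_least_one_good(p, n):.1%}")
```

With a 1-in-1,000 hit rate, the chance of at least one good candidate rises from about 1% with ten candidates to about 63% with a thousand. Real candidates are not independent, and generating more of them only helps if they can be screened cheaply. This is an intuition, not a model.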

Another approach is the use of ‘knowledge graphs’. These represent targets, inhibitors and diseases as interconnected nodes, with the relationships between them as links. The rationale is that by studying known relationships, we can identify new ones. For example, suppose molecule 1 is known to inhibit signalling proteins x, y and z and has been effective against disease A. If molecule 2 also inhibits x, y and z (but until now has only been used to treat disease B), then perhaps molecule 2 could also be effective against disease A. This is an overly simplistic example, but ML can enable this logic to be followed at scale. have described how they use this approach to identify targets.
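The molecule 1 / molecule 2 reasoning can be sketched as a tiny graph query. This is a minimal illustration, assuming a hand-built graph stored as Python dictionaries. The molecule, protein and disease names are the placeholders from the example above, plus one hypothetical extra molecule. Scoring by shared-target overlap (Jaccard similarity) is my own simple stand-in, not how any particular company's system works. Real knowledge graphs hold vastly more relationships and typically use learned models rather than a single overlap score.

```python
# Edges of a toy knowledge graph, using the placeholder names from the text.
INHIBITS = {
    "molecule 1": {"x", "y", "z"},
    "molecule 2": {"x", "y", "z"},
    "molecule 3": {"w"},  # HYPOTHETICAL extra molecule with an unrelated target
}
TREATS = {
    "molecule 1": {"disease A"},
    "molecule 2": {"disease B"},
    "molecule 3": {"disease C"},
}


def jaccard(a, b):
    """Overlap of two target sets: 1.0 means identical, 0.0 means disjoint."""
    return len(a & b) / len(a | b) if a | b else 0.0


def repurposing_candidates(disease, min_similarity=0.5):
    """Suggest molecules not yet linked to `disease` whose targets resemble
    those of molecules already known to treat it."""
    known = [m for m, diseases in TREATS.items() if disease in diseases]
    suggestions = {}
    for molecule, targets in INHIBITS.items():
        if disease in TREATS.get(molecule, set()):
            continue  # already linked to this disease, nothing new to suggest
        score = max((jaccard(targets, INHIBITS[k]) for k in known), default=0.0)
        if score >= min_similarity:
            suggestions[molecule] = score
    return suggestions


print(repurposing_candidates("disease A"))  # {'molecule 2': 1.0}
```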

As per the mantra ‘fail fast’, it pays to identify molecules that will fail as early as possible. ML may enable prediction of undesirable interactions by comparing new candidates with molecules whose interactions are already known.
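One simple way to picture ‘comparing with known interactions’ is a nearest-neighbour check. Each molecule is described by a set of features; in practice this is often a chemical fingerprint, but here the feature labels are made up. A new candidate is flagged if it closely resembles a molecule already known to cause a problem. The compound names, features and the 0.6 threshold are all hypothetical. Real systems usually learn from large interaction databases rather than relying on one fixed cutoff.

```python
# Known compounds, each described by a set of HYPOTHETICAL structural features,
# and whether an undesirable interaction has been reported for them.
KNOWN = {
    "compound P": ({"f1", "f2", "f3", "f4"}, True),   # known problem
    "compound Q": ({"f5", "f6", "f7"}, False),        # no known problem
}


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_candidate(features, threshold=0.6):
    """Return the known problem compounds that the candidate closely resembles."""
    return [
        (name, round(tanimoto(features, feats), 2))
        for name, (feats, problematic) in KNOWN.items()
        if problematic and tanimoto(features, feats) >= threshold
    ]


print(flag_candidate({"f1", "f2", "f3", "f8"}))  # [('compound P', 0.6)]
print(flag_candidate({"f5", "f6", "f9"}))        # []
```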

2. Optimising clinical trials

ML can also increase the success rate of clinical trials in several ways. One way is by identifying (through pattern recognition) a subpopulation that is more likely to respond to a drug. Trials in that subpopulation are more likely to succeed, which increases the probability of the drug reaching market. Although the resulting licence may be more restricted, it can be expanded by subsequent trials.
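Here is a deliberately simple illustration of finding a responsive subpopulation. The sketch groups hypothetical trial participants by a single made-up biomarker and compares response rates. An ML approach would search over many patient variables at once, and would need to guard against subgroups that look promising purely by chance. The underlying question is the same, though: is there a definable group in which the drug works markedly better?

```python
from collections import defaultdict

# HYPOTHETICAL trial records: (biomarker status, responded to drug?)
PATIENTS = [
    ("biomarker+", True), ("biomarker+", True), ("biomarker+", True),
    ("biomarker+", False),
    ("biomarker-", True), ("biomarker-", False), ("biomarker-", False),
    ("biomarker-", False), ("biomarker-", False), ("biomarker-", False),
]


def response_rates(patients):
    """Fraction of responders overall and within each subgroup."""
    counts = defaultdict(lambda: [0, 0])  # group -> [responders, total]
    for group, responded in patients:
        counts[group][0] += int(responded)
        counts[group][1] += 1
    overall = sum(r for r, _ in counts.values()) / len(patients)
    return overall, {g: r / n for g, (r, n) in counts.items()}


overall, by_group = response_rates(PATIENTS)
print(f"overall: {overall:.0%}")
for group, rate in by_group.items():
    print(f"{group}: {rate:.0%}")
```

With these made-up records, the overall response rate is 40%. Within that, the response rate is 75% in the biomarker-positive group and about 17% in the rest. That is the kind of signal that might justify a trial restricted to the biomarker-positive group.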

With all of these approaches, it’s worth noting that the incorporation of machine learning doesn’t guarantee the successful development of a new molecule. Drugs and biology are a very messy business, and we are a long way away from fully simulating these systems. However, what these techniques do offer is increased probabilities and shortened timescales which, when scaled up across numerous trials, can have a very big impact.

A new era for drug discovery?

Given the new potential to take such approaches, existing pharmaceutical companies are incorporating ML into their pipelines and many new companies centred around ML-based discovery have popped up. It seems like a crowded space, as these lists of 43 pharmaceutical companies and 221 start-ups suggest. However, I think this reflects the scope of the problem, and the financial reward to be gained from the solution.

So how much success have we seen so far? I’m aware of four new drugs, developed explicitly using ML techniques, that have progressed through R&D and, in some cases, preclinical stages. These are:
- an antibiotic found through drug repurposing with deep learning;
- an OCD drug and an antifibrotic agent, both identified using GANs;
- a target for Wilson’s disease from the Toronto start-up Deep Genomics.

To my knowledge, none have yet made it fully through the pipeline, which likely reflects the recency of the technological developments.

It will be interesting to see how things play out. I believe the incorporation of ML into drug discovery marks an inflection point. The big players will need to adapt (and I understand that they are), and there is plenty of scope for collaboration with smaller, more agile start-ups. We will see how much market share newer companies take from the old giants.

In the past, drug discovery has often been about finding the next ‘blockbuster drug’, which provides an outsized return on investment and so makes up for all the failures. I’m hopeful that ML will enable a shift away from this, towards a more balanced approach to drug development.

Given the fundamental importance of health, and the huge number of untreated diseases, there should be plenty of pie to go around. Time will tell how the 221 (and counting) start-ups in the area do. I suspect some failures among them are inevitable. Which companies survive may ultimately come down to investor support or early wins, the latter being determined by a combination of luck and skill.

Even with the promise of acceleration, drug discovery is still a relatively long-term game, with multi-year development and life cycles for drugs. So we may have to wait a while to see exactly what happens.

Regardless of which companies win and lose, I believe patients are likely to benefit from a greater availability of drugs, with lower associated prices. That’s something that makes me optimistic :)

What do you think? If you have any thoughts on the subject, I’d love to hear them.

UPDATE: Since writing this article, I’ve had a discussion with a friend who works for a large AI drug discovery company. He explained that the business model for many of the start-ups is to provide drug candidates to more established pharmaceutical companies, who then take things further. In this way, things are balanced: smaller start-ups don’t shoulder the financial responsibility of running expensive trials, and larger companies receive more drug candidates that have been intelligently vetted using AI.

The most useful resources I came across while preparing this article were:


I’ll be honest, I ran out of steam on the sections on predicting drugs that will fail and on optimising clinical trials. I may update these in the future. (9/3/20)