I know that I just spoke about new antibiotic discovery here the other day, but there’s a new paper worth highlighting that just came out today. A team from MIT, the Broad Institute, Harvard, and McMaster reports what is one of the more interesting machine-learning efforts I’ve seen so far, in any therapeutic area.
This is another run at having an ML system work through a data set of both active and inactive compounds and letting it build a model of structures that seem to be associated with that activity. So let’s do a bit of background: that, of course, is what medicinal chemists have been doing for decades (in our rather human way), and it’s also what we’ve been trying to use software to help us do for decades, too. The potential use of automated systems to help us sift through the huge piles of available compounds (and the even more towering piles of possible ones) has been a goal for a long time now, and every generation of chemists and computational scientists has had a go at it with the latest hardware and the latest code. In its final form, this would be “virtual screening”, the supplementation (or outright replacement) of physical assays with computational ones that could evaluate structures that haven’t even been prepared yet but only exist as representations of things that could be made if they look interesting.
We’ve been working on virtual screening for decades, with a level of success that can be characterized as quite variable but (to be honest) often underwhelming. There have been many levels of this sort of thing – a popular one has been to run through a collection of known drugs or bioactive molecules (which puts you into the low thousands of possibilities). And if you’re working at a drug company, you can screen the compounds that you already have in your collection or some set thereof (which can take you up into the hundreds of thousands or low millions). These days, there are also collections of structures such as the ZINC databases, which will give you over a billion molecules to screen, if you’re up to it – and it’s safe to say that nobody is, because doing any sort of meaningful computational screen on those sorts of numbers really is at the edge of what we can talk about, even in 2020. Also at that edge is the idea of “generative” screening, where you don’t just dump pre-generated structures into the hopper, but rather have the software build out what it believes would be interesting new structures based on its own modeling. That also is just beginning to be possible, depending on whose press releases you read.
What do you get when you run these screens? Well, it’s safe to say that you always get hits. The uncertainties in modeling (and the concomitant desire not to miss things) ensure that you will pretty much always get virtual screening hits. Unfortunately, you can also count on the great majority of those being false positives should you actually screen them out here in the real world. To be sure, every physical screen generates those, too, but virtual screens are particularly efficient at false-positive generation, and the scoring and ranking functions are generally fuzzy enough that you really can’t make a start on clearing them out other than by running the actual assay. It’s true, though, that in many cases (although not always!) running those compounds does show that the virtual screen enriched the list with actual hits, as compared to the hit rate of the starting collection. That’s good, and it’s a success for the algorithms, although there are also times when that enrichment is nothing that a human chemist couldn’t have pointed to as well (based on similarities with known compounds).
To finish up this background digression, that last point is exactly what we would ask of virtual screens: to find us active compounds whose structures we wouldn’t have suspected ourselves. Ideally, the software would spit out a list of only such compounds, not scatter them lightly through a big pile of red-herring false positives and a bunch of real but coulda-toldya-that true positives. We can’t do that yet. But we’re getting closer, and this new paper is an example (to which, at last, I turn!).
An important feature of this work is that it’s a close collaboration between virtual screening ML methods and actual assays, run specifically for this project. For example, the team started out by taking a list of FDA-approved drugs and a somewhat shorter list of bioactive natural products (2,335 compounds total) and running a growth-inhibition screen with them against E. coli bacteria. Machine-learning models are exquisitely sensitive to the quality of the data used to train them, and it’s a very good idea to generate that data yourself under controlled conditions if you can. There are surely antibacterial numbers available in the literature for many of the compounds on that list, but they’re going to be from assays run by different labs under different conditions, against different strains and at different concentrations, making those numbers close to useless for reliable machine learning fodder. So collecting fresh numbers under tighter conditions was an excellent start.
120 of the molecules in that set inhibited bacterial growth at 80% or better at a set concentration, so those were classified as hits. The neural-network model was then trained up on these activities and structures. As the authors note, in years past compounds had to be rendered into numerical forms using human judgment calls (Do you assign different values to, say, different functional groups? To their arrangements in space? To other molecular properties as well? And in what proportion?) But our systems have gotten capable enough to generate these representations themselves along with the activity data, giving you a complex structure-activity model (the particular technique they used is detailed here). Further molecular features were added in as well using RDKit.
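To make that hit-calling step concrete, here is a minimal sketch of how a binary label gets assigned from a growth-inhibition screen at a fixed cutoff. The compound names and inhibition values are made up for illustration; only the 80%-inhibition threshold comes from the paper.

```python
# Toy sketch of the hit-calling step: compounds whose fractional growth
# inhibition meets or beats the cutoff get a binary "hit" label for
# training. Compound names and values below are hypothetical.

HIT_THRESHOLD = 0.80  # 80% growth inhibition at the screening concentration

def label_hits(inhibition_by_compound, threshold=HIT_THRESHOLD):
    """Return {compound: 1 if hit else 0} from fractional growth inhibition."""
    return {cpd: int(frac >= threshold)
            for cpd, frac in inhibition_by_compound.items()}

screen = {"compound_A": 0.95, "compound_B": 0.42, "compound_C": 0.81}
labels = label_hits(screen)
n_hits = sum(labels.values())
# compound_A and compound_C clear the cutoff; compound_B does not.
```

In the actual study this labeling was applied across the 2,335-compound primary screen, yielding the 120 hits that became the positive class for training.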
After this, the group used an ensemble of such generated models to evaluate a collection (from the Drug Repurposing Hub) of over six thousand molecules that have been reported as going into clinical development (across all sorts of indications). Compounds that overlapped with the initial training set were removed. And at this point they compared several different ML models in their ability to handle the data: one trained without the added RDKit properties, for example, along with one trained only on RDKit numbers, several random-forest trained algorithms, etc. (I would be very glad to hear from people with more ML experience than I have about this paper’s degree of disclosure on their main model and these others – one of the biggest problems in the field is the lack of enough disclosure to reproduce such work, and I hope that’s not the case here). Taking the 99 best molecules as predicted by their model and actually testing these against E. coli for growth inhibition showed that 51 of these did have some level of activity – as compared to the set of the 63 lowest-scoring molecules, of which only 2 showed activity. So the virtual screen did indeed seem to enrich for activity (although, as you can see, about half of the best “hits” were still false positives).
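Those top-versus-bottom numbers translate into a simple fold-enrichment figure, which is the standard back-of-the-envelope check for whether a virtual screen did anything. Using the hit counts reported in the paper:

```python
# Fold enrichment from the paper's reported numbers: 51 of the top 99
# predictions were active vs. 2 of the 63 lowest-scoring compounds.

def hit_rate(hits, tested):
    """Fraction of tested compounds that came up active."""
    return hits / tested

top_rate = hit_rate(51, 99)     # hit rate among the best-ranked compounds
bottom_rate = hit_rate(2, 63)   # hit rate among the worst-ranked compounds
enrichment = top_rate / bottom_rate

print(f"top-ranked hit rate:    {top_rate:.1%}")     # ~51.5%
print(f"bottom-ranked hit rate: {bottom_rate:.1%}")  # ~3.2%
print(f"fold enrichment:        {enrichment:.0f}x")  # ~16x
```

A roughly 16-fold difference between the top and bottom of the ranked list is a real signal, even with that ~50% false-positive rate at the top.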
Looking through those 51 screening hits for interesting structures and degree of clinical development, one in particular stood out: SU3327, a c-Jun N-terminal kinase (JNK) inhibitor that turned out to have an MIC of 2 micrograms/mL against E. coli growth (an activity that had never been noticed before). Its structure is vaguely like the nitroimidazole antibiotics (metronidazole, for example), but it displays a different spectrum of activity. In fact, the compound (renamed halicin) showed activity against an impressive variety of bacteria, including S. aureus, A. baumannii, C. difficile, M. tuberculosis, and others (it seems to have much less effect on Pseudomonas, unfortunately). Notably, it continued to perform against drug-resistant strains (against several common antibiotics with different mechanisms of action), and attempts to generate resistance mutants were not successful.
Looking at gene expression profiling, the compound seemed to affect cell motility and iron homeostasis, which led to a hypothesis that it was affecting the pH gradient across the bacterial membrane (disruption of which has been reported to interfere with both of these). Indeed, the compound’s effects were very pH-sensitive, and experiments with fluorescent probes and membrane-disrupting compounds were consistent as well. This is not a commonly recognized mode of action, and it’s worth noting that the nitroimidazoles themselves don’t seem to work this way, but rather disrupt DNA synthesis. A quick search through the literature, though, turned up this paper that suggests that several antibiotics have effects on pH homeostasis that contribute to their bactericidal action (building on an earlier oxidative-stress hypothesis). But in that case, it seems to be the opposite gradient effect, if I’m reading it right: the antibiotics studied there (such as chlorpromazine) became more effective under alkaline conditions, whereas halicin becomes less so.
Halicin itself is shown in the paper to be effective in mouse models of drug-resistant bacterial infection, which is quite interesting. Topical infection with A. baumannii strain 288, which is resistant to all the usual antibiotics, was effectively treated with halicin ointment. Another model was C. difficile infection in the gut, where metronidazole is a first-line treatment, and orally administered halicin outperformed it. It would be quite interesting to know the compound’s profile in preclinical tox testing in its life as a JNK inhibitor, and how close it has come to being taken into human trials.
The group then went on to apply their ML model to wider sets of compounds. The model was not expected to perform well against a 10,000-compound anti-tuberculosis set at the Broad, since that collection occupies a chemical and biological space highly divergent from the original training set. And indeed it didn’t – running the best-scoring and worst-scoring compounds from the ML model through the E. coli growth-inhibition assay showed no real enrichment from the virtual screen. These results were incorporated into the model as the group moved on to the (huge) ZINC15 data set of over 1.5 billion structures. That’s too large to screen in full detail, at least in the year 2020, so the group concentrated on compounds with physical properties most like those of known antibiotics. That knocked it down to a mere hundred million molecules – still a very impressive number. The top 6,800 compounds were, as before, cross-checked against the rankings of several other ML models, and 23 were selected as having very good similarity in antibiotic property space but wide structural divergence, in an effort to find new chemical matter that might well show new mechanisms.
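That pre-filtering step – knocking 1.5 billion structures down to a hundred million by physicochemical properties – is conceptually simple. Here is a hypothetical sketch of such a filter; the property windows and the compound records are illustrative guesses on my part, not the paper’s actual criteria:

```python
# Hypothetical physicochemical pre-filter of the kind used to cut a huge
# library down to antibiotic-like chemical space. The ranges below are
# assumed for illustration, not taken from the paper.

ANTIBIOTIC_LIKE = {
    "mol_weight": (150.0, 600.0),  # daltons; assumed window
    "logp": (-2.0, 4.0),           # assumed lipophilicity window
}

def passes_filter(props, criteria=ANTIBIOTIC_LIKE):
    """True if every property falls inside its allowed (lo, hi) window."""
    return all(lo <= props[key] <= hi for key, (lo, hi) in criteria.items())

library = [
    {"id": "Z1", "mol_weight": 320.4, "logp": 1.2},
    {"id": "Z2", "mol_weight": 812.9, "logp": 5.6},  # too big, too greasy
]
kept = [c["id"] for c in library if passes_filter(c)]
# Only Z1 survives the filter.
```

The appeal of this kind of filter is that it’s embarrassingly cheap per molecule, so it can be streamed over a billion-compound library long before any expensive model inference happens.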
These compounds were assayed against E. coli, S. aureus, K. pneumoniae, A. baumannii, and P. aeruginosa, which is quite a list of heavy hitters, and 8 of the 23 had inhibitory activity against at least one of these. One of the structures is a sort of quinolone-sulfa hybrid that showed rather potent and broad activity, while not being significantly slowed down by quinolone-resistant strains. Now, if you showed this one to an experienced medicinal chemist and asked what it was, they’d say “Probably an antibiotic”, just because of the structural heritage, but it seems as if it might have interesting activity and is probably worth following up on. So the screen was worthwhile, and the paper claims that it took 4 days to run. That estimate doubtless does not include the significant amount of effort it took to get to the point of running said screen, but it’s true that much of that work doesn’t have to be done again if you want to go further.
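The “wide structural divergence” criterion used to pick those 23 candidates is conventionally quantified with Tanimoto (Jaccard) similarity between molecular fingerprints: low similarity to known antibiotics suggests genuinely new chemical matter. Here is a minimal sketch of that comparison; the fingerprints are toy bit sets, not real Morgan fingerprints from a cheminformatics package:

```python
# Sketch of the Tanimoto (Jaccard) similarity used to judge structural
# divergence. Fingerprints here are toy sets of "on" bit positions, standing
# in for real substructure fingerprints.

def tanimoto(fp_a, fp_b):
    """Jaccard similarity between two fingerprint bit sets, in [0, 1]."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

known_antibiotic = {1, 4, 9, 23, 57}
candidate_close  = {1, 4, 9, 23, 60}   # shares most bits: old chemical matter
candidate_far    = {2, 8, 33, 71}      # little overlap: structurally divergent

print(tanimoto(known_antibiotic, candidate_close))  # 4/6 ≈ 0.67
print(tanimoto(known_antibiotic, candidate_far))    # 0.0
```

Selecting candidates below some maximum Tanimoto score against every known antibiotic is one straightforward way to enforce the divergence criterion while keeping the property-space similarity the model was trained to recognize.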
So overall, this is an impressive paper. The combination of what appears to be pretty rigorous ML work with actual assay data generated just for this project seems to have worked out well, and represents, I would say, the current state of the art. It is not the “Here’s your drug!” virtual screening of fond hopes and press releases, but it’s a real improvement on what’s come before and seems to have generated things that are well worth following up on. I would be very interested indeed in seeing such technology applied to other drug targets and other data sets – but then, that’s what people all around academia and industry are trying to do right now. Let’s hope that they’re doing it with the scope and the attention to detail presented in this work.