Chemical News

Calculating A Few Too Many New Compounds

The phrase “automatic chemical design” will generally get my attention, especially when it’s applied to drug-like molecules. And that’s one of the key parts of this paper, from researchers at Harvard, Toronto, and Cambridge. From what I can see, they’re trying to come up with a new technique for generating potential new chemical structures – for example, to do virtual screening on. Most of the paper discusses their methods for encoding (and decoding) numerical representations of chemical structures in a way that lets new ones be generated quickly. This process is also, in theory, taking some molecular properties into account:

To enable molecular design, the chemical structures encoded in the continuous representation of the autoencoder need to be correlated to the target properties that need to be optimized. Therefore, based on the autoencoder results, we train a third model to predict molecular properties based on the latent representation of a molecule. To propose promising new candidate molecules, latent vectors of encoded molecules are moved in the direction most likely to improve the desired attribute and these new candidate vectors are decoded.

This is not a crazy thing to do – in fact, many people have described ways to do it, and this paper presents itself as an improvement on those. At the same time, I’m not sure if coming up with possible new structures is a rate-limiting step for anyone, although I’d gladly be corrected on that one. This work reminds me a bit of the efforts by the Reymond group to determine all the possible molecular arrangements below a certain number of heavy atoms (such as the GDB-13 set). This paper is not a from-the-ground-up effort like that work, but is rather an attempt to say “Given this particular molecule (or this set of molecules), how can we use these structures as seeds in order to computationally explore chemical space?”

Update: several readers have pointed out that I’m missing a key point of the paper – the handling of what are essentially discrete variables of chemical structure as continuous ones, allowing, computationally, the ability to slide along various axes towards desired properties. So I wanted to mention that here, to make sure it doesn’t get lost.
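
For readers who want a picture of what “sliding along an axis toward a desired property” means, here is a toy sketch. It is not the paper’s code: the two-dimensional “latent space”, the quadratic `predicted_property` function, and the helper names are all invented for illustration, and the real encoder, decoder, and property predictor are trained neural networks.

```python
# Toy sketch only: a made-up 2-D "latent space" and a made-up property predictor.
# Nothing here comes from the paper; all names and numbers are hypothetical.

def predicted_property(z):
    """Hypothetical smooth property predictor; its maximum is at z = (1.0, -2.0)."""
    x, y = z
    return -((x - 1.0) ** 2) - ((y + 2.0) ** 2)


def numerical_gradient(f, z, eps=1e-6):
    """Central-difference estimate of the gradient of f at the point z."""
    grad = []
    for i in range(len(z)):
        up = list(z)
        up[i] += eps
        down = list(z)
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def slide_toward_property(f, z_start, step=0.1, n_steps=200):
    """Plain gradient ascent: repeatedly step in the direction that raises f."""
    z = list(z_start)
    for _ in range(n_steps):
        g = numerical_gradient(f, z)
        z = [zi + step * gi for zi, gi in zip(z, g)]
    return z


z_seed = [0.0, 0.0]  # stands in for the latent vector of an encoded seed molecule
z_better = slide_toward_property(predicted_property, z_seed)
```

In the real method, a vector like `z_better` would then be handed to the decoder to turn it back into a structure. That decoding step is not modeled here, and it is the step where chemical sanity has to come from somewhere.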

There’s some unfortunate coverage of this paper at Technology Review, summed up by the headline (“Software Dreams Up New Molecules in Quest For Wonder Drugs”). The article suffers from its handling of the point I just raised, since it claims that “Pharmaceutical research tends to rely on software that exhaustively crawls through giant pools of candidate molecules using rules written by chemists, and simulations that try to identify or predict useful structures”, which isn’t quite the case. Equally unfortunate is what happens when you start looking at the output of this process, since it becomes clear that none of the authors are from the departments of chemistry at any of the mentioned institutions. Figure 4 in the paper shows about 65 compound variations starting from aspirin, but by my own count, about 14 of them are either not at all drug-like (acid chlorides, anhydrides, cyclopentadienes, aziridines) or chemically implausible (a fluorocyclobutadiene, a diaminocycloheptatriene). Running structures such as these through any virtual screening effort is a waste of time, and what worries me is that aspirin is about as innocuous a starting point as you could imagine, and this method still produced about 20% craziness.

Looking through some of the other outputs in the appendix of the paper does not inspire more confidence. To begin with, there are a lot of three- and four-membered rings in there, many of them very unlikely indeed, and that reminds me of the graph-theory work mentioned above. The GDB data sets, it’s often forgotten, had to first be purged of well over 99% of their generated chemical frameworks because they were weird concatenations of small rings, and I think that the current program is exhibiting tendencies toward small-ring-forming of its own. To be sure, it also seems to like cycloheptatrienes and cyclooctatetraenes a lot more than most people do, so the ring-generating problems may be deeper.


Other problems are immediately apparent if these are supposed to be (even vaguely) drug-like molecules. As with the aspirin-derived structures, there are a lot of reactive and/or unstable molecules in the outputs. The program seems to have no problem with enamines, hemiaminals, enol ethers, and several other labile groups, but there are even bigger problems. I have reproduced at right some (but by no means all) of the problematic structures that appear. It is not going too far, I think, to characterize software that proposes such compounds as defective. No organic chemist could have looked at these without raising the alarm – this stuff is not, by many standards, publishable at all. When the authors do show this work to someone in the field, it will not go well. In fact, this blog post is an example of just such an encounter, and no, it isn’t going well. We’re getting into deepfryer cow cow territory here.

As mentioned, I don’t find the idea behind this paper to be intrinsically wrong or anything of the sort. But what the authors are trying to do is harder than it looks. The molecules that are generated by this method have too many examples like the ones shown to be taken seriously, and on the other end of the spectrum, there are also too many that might be described as “Have you tried adding an isopentyl ether to it?”. No one’s in great need of that set, either. I do still encourage the authors to proceed with this work, if that’s what they’re doing, but I also strongly urge them to consult some actual chemists along the way. And don’t talk to anyone at Technology Review for a while, either.

55 comments on “Calculating A Few Too Many New Compounds”

  1. ScientistSailor says:

    It would be a great April Fool’s day joke to populate the registration SD file from a Chinese CRO with these structures. Would also be a good test to see who in your med. chem group is paying attention.

  2. c says:

    How silly of us! All this time and we forgot to look into cyclobutadienes!

    Such sweet good fortune that we have these wise computer scientists among our ranks!

    On an unrelated note, have these guys tried writing their code in Latin? Might speed up their calculations a bit. Just a thought! Can’t hurt to try!

    1. AndrewD says:

      See this program language
      Lingua::Romana::Perligata — Perl for the XXI-imum Century

    2. Gordonjcp says:

      There was, some fifteen years ago, a Klingon programming language in development. Yes, really.

  3. niqaeli says:

    It becomes clear that not only are these researchers not from the chemistry departments, they have also not taken *anything* more than general chemistry, because I have taken a grand old two semesters of organic chemistry, and those structures made my face do all kinds of gymnastics. Two semesters! I scraped through the bare minimum needed for my biology degree. But I remember enough for a structure containing a four-membered ring consisting of *three nitrogens* to be rather a red-flag.

    But I’m more awake now for all the facial gymnastics, so perhaps they can market their method as a generator of structures that can replace caffeine when given to someone who knows anything about chemistry.

  4. Mr. Eldritch says:

    I’m coming at this from an ML / “deep learning” perspective, and know ~zero organic chemistry, but I’m not terribly surprised this didn’t work. An autoencoder + search seems like maybe the second most basic, naive approach someone would try if they were going to try and use neural networks on this problem, and it would be a little disappointing if it worked! I’m not sure their choice of input encoding or network architecture was particularly well-suited either.

    I think the idea of using neural networks for this problem has meat on it, though; deep learning tends to be very good at extracting complex, difficult-to-explicitly-specify patterns from data, and it would be very surprising if it weren’t possible to at the very least match human (still very bad) performance in guessing which molecules might be good drug candidates. (Which, as you note, isn’t really a bottleneck, but it’d be nice.) Very possibly it could exceed it, given enough data and a clever architecture.

    However, I’m not sure there’s enough data available. Deep learning, with current methods, is extremely data-hungry; you might need millions of example structures with good labels, depending on what you were trying to train. Given how tedious, expensive, and slow it is to check whether a single molecule is (for example) toxic in rats, and how low-quality much of the literature apparently is, it may not be possible to label enough structures to train a network to guess tox!
    (Although, if you could get enough data and a good input structure, I’m pretty sure a network could actually learn that.)

  5. Chris Ing says:

    This isn’t meant to be a method for generating huge numbers of structures, or to replace algorithms that construct a library of compounds combinatorially. It’s about placing molecules in a continuous space, such that you could take a target molecule and move to related molecules in different directions. For example, you could potentially move in the direction of decreased toxicity, increased solubility, or stability. The fact that you have to remove invalid structures is pretty minor; it just shows that this is a work-in-progress.

    1. Derek Lowe says:

      That’s a good point, but the questions, for a chemist who might be using the end results of such a program, are how toxicity, solubility, stability and other such variables are dealt with computationally. Is logP the proxy?

      1. HRC says:

        As far as I understand, there is no proxy.

        Take solubility: you can train a neural network (I think there are already some out there) to go from the continuous space to your target (in this case, a super-soluble molecule).

        That neural network could be trained with anything. You could choose your output molecule to be super-drug-like, super-easy-to-make, super-like-organic-chemists-are-used-to-seeing, and so on. You could focus on one of them, or use them all at the same time. It is up to the user to define what kind of molecules they want to see coming out.

        1. M says:

          I’m too tired to go through the paper, but why is this different (in an interesting way) than encoding a molecule in one of the many existing representations that are bog standard (and convert it to a multi-dimensional representation) and using machine learning methods on that?

          From the abstract the one thing I’d figure is precisely that they can produce molecules that haven’t been previously thought of. (Most existing encodings lose information and can’t be decoded into a single, unique molecule.) But if that’s the case the criticism–that some subset of the novel molecules produced make no sense in many cases–seems on point.

  6. Federico V says:

    Hi Derek,

    not an author of the paper, or affiliated with them in any way, but I think you missed the main novelty of the idea. The space of molecular compounds is discrete: there are only so many arrangements of atoms and molecules. Mathematically, optimizing a discrete valued function is incredibly difficult for one main reason: you cannot meaningfully calculate the gradient, and almost all optimization methods that scale to high dimensions require knowledge of the gradient.

    What the authors try to do is learn a mapping between a continuous space and the space of existing molecules. If they can pull this off – then you can carry out the optimization in the continuous space, which makes the mathematical problem much easier.

    I think your criticisms are spot on – and some of the molecules that are generated are obviously critically flawed, but I think you missed the main novelty of the paper.

    1. Derek Lowe says:

      I’m adding something about that – thanks to you and the others pointing this out!

  7. ChemTSS says:

    I notice that chemists are very defensive when it comes to these computer programs that can generate/predict structures. Every time one is published there is lots of bashing. What I see (also as an organic chemist) is a lot of structures that are not viable for synthesis. On the other hand I also see more diversity in proposals than what I see in med chem talks, i.e. the very imaginative methyl to ethyl to isopropyl.

    When a first-year graduate student proposes a crazy mechanism I do not write them off and make fun of them; I correct them and help them learn. These programs are the same way. If we do not like the output we should collaborate and help make it better, not shamelessly beat them back.

    We organic chemists are stuck in the past! If we could get a computer to do things such as the most basic of predictions, that would allow us as chemists to explore much more interesting problems in med chem.

  8. Anon says:

    Fully automated shit generation.

    And that’s putting it nicely. I feel sorry for the chemists that will be forced to waste their time trying to make this crap, rather than making the molecules *they* would rather make based on their own expertise which is no longer trusted, because computers are so much smarter. [/sarcasm]

    1. A Nonny Mouse says:

      I have actually been through the process of trying to make stuff that was thrown up by “in silico” design during the death throes of a company that had managed to blow £20m of VC money.

      They had got rid of most of the staff and wanted one last attempt with the money that was left. The compounds had already been pre-screened to get rid of the absolute junk and 80% of what was left was just regular junk (enol ethers, enamines and the like). The rest were a diverse array of molecules with no consistent thread and so making a selection of them within time frame and budget required not the most likely to be active but the most likely to be made.

      In the end they were all totally inactive on the intended target anyway.

      1. Anon says:

        Well at least you kept your job for a while longer than most as the company accelerated into a wall.

        1. A Nonny Mouse says:

          It was contracted out to me; I’m still standing

      2. Go master says:

        …and Go software was crap not a year ago and now we will not have a human beat a (top-level) computer ever again.

        1. Anon says:

          Go software works because all the rules and moves are fully defined and predictable and limited in complexity. Biology isn’t, so don’t even pretend that the two are comparable.

          1. Biology master says:

            Biology IS, just the scale is different. Vastly different. But the same kind of discussions were taking place between chess and go players not so long ago, and here we are.

            My point is not to diminish the incredible and daunting task that chemistry/biology/medicine has in front of them; my point is that it is foolish to bet against AI. Even worse, to dismiss it beforehand when it presents itself as a tool to help, and not to replace.

          2. Anon says:

            No, you just don’t understand the fundamental difference. With chess and Go, the rules (and thus all the required information) are fully defined within the algorithm. But with biology, they aren’t; it requires doing actual experiments, and there are no shortcuts for that.

  9. Foxtrot says:

    I would recommend checking the paper “Generating Sentences from a Continuous Space” by Samy Bengio at Google Brain & Others. They use the same techniques to interpolate between sentences written in English. That one has had a huge impact in many fields (clearly now, even Chemistry) and some of the sentences included there (particularly in the SI) make very little sense, but nobody would make the claim that the authors “must not be English native speakers”.

    PS Looking at the affiliations there are many Chemists in the author list.

  10. Curious Wavefunction says:

    Agree that the novelty here is the method and its potential, not the results. In some sense it’s like virtual screening where you predictably get a lot of crap (wrong tautomers, strained molecules, weird bond orders etc.). It’s the job of the chemist to sift through this deck and use intuition to pick out promising structures, and most computational chemists and medicinal chemists worth their salt collaborate in doing this on a regular basis.

    Just like with virtual screening, the main goal with such methods is to generate diversity and point the chemists in directions which they may not have gone in otherwise, not to provide a ready list of molecules that you can make out of the box. As one commenter above says, filtering for weird structures is actually a fairly trivial task that can itself be automated.

    This paper reminds me of something Peter Thiel has said about AI and automation; it was something to the effect that it’s a mistake to think of algorithms as ‘replacements’ for human thinking, instead they should be thought of as complementing or supplementing human thought. The best results would come from algorithms generating results and humans classifying and refining them, perhaps using other algorithms.

    1. Crocodile Chuck says:

      Tyler Cowen writes about just this approach in ‘Average is Over’

  11. exGlaxoid says:

    The problem is that far too often some young computational chemist will get those results and actually propose compounds based on that, often to med chem projects that have to struggle not to laugh out loud. I have a lab mate who has gotten compounds to make that were almost in the pentavalent carbon range, as well as computational chemists who propose 2-methylphenethylamine in a set of starting materials and can’t understand why the organic chemists are sighing. Or compounds for brain-based receptors with a guanidine and carboxylic acid on the molecule. Sometimes it is nice when the starting point for your chemistry is an actual existing molecule core.

  12. Anon says:

    Medicinal chemists, like luddites are so nineteenth century!

    You guys are the taxi drivers before Uber! You’ll come crying to your robot overlords asking for jobs.

  13. Barry says:

    Ultimately, our biological targets see shape, charge and polarizability. The actual small molecules we build to display those properties are constrained by the menu (CHNOPS, occasionally B) and by bond-lengths and bond-angles. Those strained-ring compounds (aziridines, cyclobutadienes..) that we chemists discard do fill (with their fractional hybridizations) some bond-angle space that we can’t access otherwise.
    That doesn’t make them drug-like or even lead-like. But it does remind us that small-molecule space–although vast–is not dense. There are itches we can’t scratch.

  14. Anon says:

    I don’t know which is more stupid – AI, or the folks that feel threatened by it, or maybe just me.

  15. Anon says:

    Quantity has never been the problem in drug discovery. It has always been quality, and this just makes matters worse.

  16. Canman says:

    Had the computer guys do some modeling/docking stuff once. Most compounds they suggested that fit the best pretty much looked like graphene. Less than useless. And this was supposed to be their job, what they were hired for, and were supposedly good at.

  17. John says:

    Maybe I am cynical. With AI, we are moving from making junk to designing junk.

  18. John says:

    The article seems to completely ignore the 40 years of previous work in inverse QSAR, which solves exactly the same problem: Going from a continuous representation of molecules to a discrete one.

    This idea is not novel!

    Check for example:

    1. John says:

      Are there any molecules in that paper?

  19. Emjeff says:

    This reminds me of a regression analysis I saw years ago. The model included a large number of terms (selected by the computer), many of which seemed contradictory based on the pharmacology of the compound. When I asked the analyst why he included so many terms in the model, he said, “That’s what the computer told me to do.”

    1. Anon says:

      Skynet. Skynet I tell you.

  20. Jack Straw from Wichita says:

    looking forward to the process chemistry on the bis-cyclobutadiene compounds

  21. Peter S. Shenkin says:

    Whether or not it’s unreasonable to look at cyclobutadienes, it’s certainly unreasonable to consider them aromatic…

    1. Gerben van Straaten says:

      And putting two of them right next to each other makes me think the computer was told to look for either CROMP monomers (very generous interpretation) or explosives (much more likely)

  22. Alum says:

    How is no one commenting that the senior author *is* faculty at Harvard *chemistry*?? You would think he would have an idea about what is and is not reasonable, or at least walk down the hall and talk to someone!

  23. Project Osprey says:

    Why does it show some structures as delocalised aromatic and some as Kekule aromatic? Is it treating the two as being different?

    1. Derek Lowe says:

      I wondered about that one, too – we may get some clarification shortly, though.

  24. Dominic Ryan says:

    I wonder if this is a problem of effective resolution?
    The paper uses a 250K set of ZINC molecules as a training set for drug-likeness. Despite this there are lots of non drug-like compounds output.
    A neural net, especially a large one with many latent variables, is essentially placing ‘similar’ molecules within the numeric resolution of some mapping function. The decoding is neat for producing new structures. But, if one were to apply more rules I wonder if this would break down? They mention having to limit ring sizes in ‘optimizing’ LogP, no surprise there! They certainly need to re-optimize by eliminating PAINS and adding rules that are probably not part of any PAINS set, such as no anti-aromatic compounds.
    More than that though, I wonder what would happen if they also were to re-optimize by restricting high self-similarity? My speculation is that the gradient would no longer be stable since their method appropriately includes stochastic sampling.
    For me the real test would be to compare the diversity of results from this method with the diversity you get from a pharmacophore screen of a database. I am not convinced that being able to optimize a gradient of highly self-similar compounds solves a problem.

  25. myma says:

    There are/were at least 4 or 5 companies I am aware of that have tried this – Schrodinger, Concurrent (now Vitae), Nimbus, one or two others that went bust. There are ways to filter out most of the shite (MW, number of heteroatoms, and and and and), and also many papers describing fragments useful for same (and weighting factors too) (and what not to connect to each other rules).

  26. Curt F. says:

    I really liked this paper and saw its key advance as a new way to map chemical structures (a discrete set) onto a continuum. Gradient optimization is really nice.

    I feel like the fact that it comes up with crazy structures is a sign of its novelty. Whether a nonzero fraction of those crazy structures will ever be useful is an important question but outside the scope of their work.

    The bigger limitation, which I haven’t seen anyone discuss here, is the size of the required “training sets”. I think you need many valid structures and associated property of interest to properly train one of their “autoencoders”. If you want to use cLogP, sure you can find hundreds of thousands of structures and associated cLogP values, no problem, but for how many other properties are there going to be tens of thousands or even just thousands of valid (structure, property) pairs? In addition to needing the paired structures and property values, I’m pretty sure the property values in the training set would need to “span” the range of potential property values pretty evenly. Say you did have 10,000 structures and a matched affinity to some target. I don’t think this technique would be viable if 100 of the structures had picomolar affinity and the rest had >millimolar (i.e. zero) affinity. Instead you’d need 10,000 structures which approximately evenly spanned a range of affinities, i.e. 1000 compounds with pM affinity, 1000 compounds with nM affinity, etc. And I don’t think those datasets are easy to come by!

    1. Morten G says:

      Pretty sure that cLogP is about as accurate as cLogSolubility these days. Whether that means that cLogP is overrated or cLogSolubility is underrated is not for me to decide.
      But you are right about the ability-to-bind-proteins-tightly score. That’s lacking.

  27. Gareth Wilson says:

    This reminds me of John D Clark’s experience with computer-generated rocket propellants. In “Ignition”, he said he was afraid to even draw the structures, let alone synthesize any of them.

  28. Gordonjcp says:

    “or chemically implausible (a fluorocyclobutadiene, a diaminocycloheptatriene)”

    But the Klapötke lab folks’ ears pricked up at those – “Hey, I bet they’d be good with a few nitrogens crammed in!”

  29. Robert Burns W says:

    This type of calculation is too abstract to be meaningful to the pharma industry. Reminds me of the calculations that the number of molecules below 500 was 10 to the power of 60: too many assumptions made, and others have since come up with much smaller estimates.
    I liked this paper from some years back where they tried to enumerate all possible heterocycles that are drug-like. At least it was from a pharma company and had a dose of reality and they even suggested some heterocycles that were unknown and interesting to make.
    J Med Chem. 2009 May 14;52(9):2952-63. doi: 10.1021/jm801513z.
    Heteroaromatic rings of the future.
    Pitt WR, Parry DM, Perry BG, Groom CR.

  30. Scott says:

    I can actually see two potential problems with the program as it currently exists:
    The first one has already been beaten into a thin red smear, the fact that the program comes up with a lot of non-useful compounds.
    The second one is related, in that the real chemists doing the junk-removal may toss some oddball possibilities because they look like junk or “that will never work”. Not that the PhDs in Chemistry don’t generally know what’s going to work, but sometimes stuff that works is stuff that we never would have guessed would work.

  31. loupgarous says:

    Once again, there are ominous parallels with the experience of John D. Clark (of the Naval Ordnance Test Station). On page 172 of his deathless Ignition!, Dr. Clark says:

    “Just as Wharton was starting his IBA work, there occurred one of the weirdest episodes in the history of rocket chemistry. A. W. Hawkins and R. W. Summers of Du Pont had an idea. This was to get a computer, and to feed into it all known bond energies, as well as a program for calculating specific impulse. The machine would then juggle structural formulae until it had come up with the structure of a monopropellant with a specific impulse of well over 300 seconds.

    It would then print this out and sit back, with its hands folded over its console, to await a Nobel prize. The Air Force has always had more money than sales resistance, and they bought a one-year program (probably for something in the order of a hundred or a hundred and fifty thousand dollars) and in June of 1961 Hawkins and Summers punched the “start” button and the machine started to shuffle IBM cards. And to print out structures that looked like road maps of a disaster area, since if the compounds depicted could even have been synthesized, they would have, infallibly, detonated instantly and violently. The machine’s prize contribution to the cause of science was the structure,

    H—C= C—N N—H
           O O
           F F

    to which it confidently attributed a specific impulse of 363.7 seconds, precisely to the tenth of a second, yet. The Air Force, appalled, cut the program off after a year, belatedly realizing that they could have got the same structure from any experienced propellant man (me, for instance) during half an hour’s conversation, and at a total cost of five dollars or so. (For drinks. I would have been afraid even to draw the structure without at least five Martinis under my belt.)”

    Solomon was right. There is nothing new under the sun.

    1. Bender says:

      Are you really quoting 1961 to evaluate what AI can or can’t do?

      Do you realize that we have self-driving cars today? Full flight auto-pilot? Software that can recognize faces? Create music?

      1. loupgarous says:

        Did you read Derek’s post? He related the exact same thing happening now that happened in 1961 – sterically improbable, unstable, unsuitable to the target use and – I said “unstable”, right? – compounds. But I don’t work in med-chem. Listen to all THOSE guys say “it happened again”. I’m not even the first guy to mention that episode, or that Clark put it in his book.

        It’s not my fault expert system algorithms for generating chemical structures haven’t improved much in over fifty years. Driving a car down the road? Since my oldest kid was swiping the family car late at night at the age of 14, it literally is “child’s play”. And when a computer has to be wheeled up to the stage to accept a Grammy, we can talk about how good they are at scoring music.

  32. David Edwards says:

    As an antidote to papers like the one featured in your blog post, Derek, you might like this one, which involves some actual chemistry …

  33. Some codes use only molecular fragments that have been used before in active molecules (that way the cyclobutenes and so on are avoided). What is your opinion on that approach?

Comments are closed.