Chemical News

More On Automated Medicinal Chemistry

I wanted to say a bit more about the Cyclofluidics automated med-chem system I wrote about the other day. Here’s another PDF from the company about the software used in the device. And here’s how the software is making its decisions:

These design methods utilise Random Forest activity prediction employing one or both of two design strategies – “Chase Objective” and “Best Objective Under-sampled”. Regression models are built using the Random Forest method implemented in R accessed via the Pipeline Pilot ‘Learn R Forest Model’ component4. The default settings of this component are used except for the molecular descriptors as follows:
1. AlogP
2. Molecular Weight
3. Number of H Donors,
4. Number of H Acceptors,
5. Number of Rotatable Bonds,
6. Molecular Surface Area,
7. Molecular Polar Surface Area
These are used together with ECFP_6 fingerprints. The fingerprints are not folded and are used as an array of counts. The Random Forest algorithm was chosen because it can perform non-linear regression, is resistant to over-fitting and has very few tuning parameters. In all cases the dependent variable is pIC50.
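To make the "not folded, array of counts" part concrete, here is a minimal Python sketch of how such a feature table might be assembled before it is handed to a regression method. It is not the Pipeline Pilot component: the function name, the dictionary layout, the integer feature ids and the descriptor values are all made up for illustration, and no real fingerprint is being computed.

```python
from collections import Counter

# The seven descriptors named in the quoted text (column labels are illustrative).
DESCRIPTORS = [
    "AlogP", "MolecularWeight", "HDonors", "HAcceptors",
    "RotatableBonds", "SurfaceArea", "PolarSurfaceArea",
]


def build_feature_matrix(compounds):
    """Return (column_names, rows) for a list of compound dicts.

    Each compound is assumed to carry 'descriptors' (name -> number) and
    'ecfp6' (a list of integer feature ids, repeats allowed).
    """
    counts = [Counter(c["ecfp6"]) for c in compounds]
    # Unfolded: one column per distinct feature id actually seen,
    # rather than hashing ids into a fixed-width bit vector.
    feature_ids = sorted(set().union(*counts))
    columns = DESCRIPTORS + [f"ecfp6_{fid}" for fid in feature_ids]
    rows = []
    for compound, count in zip(compounds, counts):
        row = [float(compound["descriptors"][d]) for d in DESCRIPTORS]
        # Counts, not presence/absence bits.
        row += [count.get(fid, 0) for fid in feature_ids]
        rows.append(row)
    return columns, rows


# Hypothetical values, for illustration only.
example = [
    {"descriptors": dict(zip(DESCRIPTORS, [2.1, 310.4, 1, 4, 3, 290.0, 75.2])),
     "ecfp6": [101, 101, 2048]},
    {"descriptors": dict(zip(DESCRIPTORS, [3.4, 352.5, 2, 5, 5, 330.0, 88.9])),
     "ecfp6": [2048, 777]},
]
columns, rows = build_feature_matrix(example)
```

Each row would then be paired with a measured pIC50 as the dependent variable.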

That’s about what I’d figured. I wrote about what turns out to be an early application of this device last year, in an effort to find Abl kinase inhibitors. At the time, it struck me more as “automated patent busting”, because the paper had quite a leg up based on SAR from competitor compounds. That seems to me to be the best use for this sort of system – where you know something about the core (or cores) that you’re interested in, and you want to vary the side chains in a systematic fashion. The appealing thing is the possibility of very fast cycle time with an integrated assay. I’m not sure if the machine is robust enough to leave it going over the weekend, but it would be nice to come in on Monday and find the most potent derivatives already rank-ordered for you.
Potency, of course, is not everything. But it’s a reasonable place to start, because you’d rather have activity to burn in the pursuit of selectivity, ADMET, etc., because burn it you often must. And it’s also true that not every SAR follows such an algorithmically reducible path, in which case I imagine that the software (and hardware) must end up thrashing around a bit. (Well, so do we med chemists, so why should the machines miss out on the experience?)
And as mentioned before, the chemistry has to be amenable to reasonable-yielding wide-ranging techniques. I would imagine a lot of amides, a lot of metal couplings, some of the easier condensations – to be honest, the run of compounds I’m making in the lab right now would work fine. My grandmother could have prepared these, were she still with us, because it’s dump-and-heat-overnight stuff. The problem is that the assay that these compounds are going into is most definitely not ready to be easily automated yet; it requires the ministrations of a good enzymologist. So this machine lives in the Venn-diagram intersection of “Robust chemistry/robust assay/comprehensible SAR”, but in that space it would seem to be pretty useful. Oh, there’s another circle in that diagram: “Chemistry that doesn’t crash out in the flow channels”. I’m a big fan of flow chemistry, but the wider you cast your net in an array of flow experiments, the better your chances of finding the Monolithic Mega-clog. I, too, have stop-started straight DMSO through the tubing while holding a heat gun.
A reader, “Brian”, has sent along some background on the Cyclofluidics instrument:

I can answer your query of whether an expert system containing “med-chem lore” could exist.
The “Cyclofluidic” Business and Technical Plans of 2008 were developed and presented for UK Government SME funding by SMEs who in 2006 had combined their know-how and technology to provide a successful demonstration of an iterative ‘closed loop’ system using a flexible reagent palette and field-based software able to jump to unforeseeable chemotypes with improved activity. The system had two learning loops and relied on large knowledge databases. One loop used iterative ‘rational design, synthesis, test and re-design’ cycles set to define potential lead chemotypes. The other loop cycled much more frequently, levering the extreme frugality and speed of sub-microgram, nano-flow chemistry to investigate and optimize the reactions needed to reach novel hypothesized targets. As the approach was essentially based on 3-dimensional whole-molecule bioisosterism (rather than 2-D atom-centered charged structures), much of your concern regarding the need to avoid a “core and substituents” approach and have a rapid route-scoping process was addressed. More details of the method and how the next targets are chosen can be found here.
For reasons that they have never fully shared, the UK Government funding body (TSB) were persuaded that all of the SME fund allocated on the basis of the 2006 proof of concept would be better spent on a zero-based start by ex-Pfizer and UCB staff via a jointly fully owned SME, Cyclofluidic Ltd, created in August 2008 for the purpose. None of the original system developers was funded or employed. Although many aspects of the original hardware configuration are clearly carried through to Cyclofluidic, as far as I know their “design algorithm” has never been described. Therefore while I can tell you how the next round of analogues were chosen automatically in the predecessor system, I cannot answer your specific question of how Cyclofluidic does it other than it is not the same.

Interesting. And I also wanted to highlight this comment from the previous post, in case the server problems have made it too hard for people to see it:

. . .Another big pharma company spent well over $200 million building 2 automated chemistry buildings and filling them with automation that was going to automate all of synthesis, purification, handling and screening of pharmaceuticals.
The systems clogged constantly and broke often, the software was too complex for anyone to use, and the chemistry was very limited and made amides and triazines. The company also bought two other companies that automated chemistry and made vast libraries of amides and triazines, both of which eventually also went away. The real issue is not finding leads, it is discovering drugs.
Both buildings are now empty and each one never produced as many compounds as a small team of chemists did at each of two other small groups which were not given the budget to fully automate chemistry, but merely allowed to buy some useful tools for simple automation, some of which worked really well, like automated vial weighers, prep HPLC systems, and Gilson 215 liquid handlers. And several other groups of chemists made simple compounds the old fashioned ways, which led to more clinical drugs than either.
So of course all of the groups of chemists were laid off, and the few groups that made zero marketed drugs were kept. I guess that statistically speaking, they are due.

Food for thought. The situation described in Arthur C. Clarke’s classic story “Superiority” is always waiting for us.

11 comments on “More On Automated Medicinal Chemistry”

  1. A Nonny Mouse says:

    And they have just received another £1.5m from the UK government.
    Other stuff going on with them and one of my contacts about design, but can’t say any more!

  2. Barry says:

    Of course, if:
    1-your hit cmpd came from a combinatorial library,
    2-you synthesized/screened only what you hoped was a representative sample of the possible compounds that your chemistry allows,
    the first thing you do on finding a hit is to synthesize closely related cmpds by that same combinatorial chemistry. To the extent that you’ve worked it all out to make that first screening set, yeah, it’s automated. That doesn’t replace med.chem. It’s just the quickest way to lay down some SAR around your hit.

  3. Anonymous says:

    Trolls to commence bashing of computational chemistry in 3, 2, 1 . . .

  4. John Wayne says:

    I’m surprised that nobody has brought up another limitation of systems like this; how many reagents does it have access to? Even if you limited it to Suzuki couplings, having a reservoir of five hundred boronic acids (and some smaller number of catalysts, additives and solvents) is both a headache and doesn’t cover chemical space that well. We could have the computer order up fragments it wants to try. Until a vendor sells cartridges of chemicals for these machines, loading them with reagents is going to be a human operation. Feed me, Seymour!

  5. I am the person who wrote the design layer of the Cyclofluidic system and have been reading this and previous posts (and comments) with great interest. I was a paid consultant to Cyclofluidic; the comments below are my opinion and not Cyclofluidic’s.
    First some technical points. The QSAR modeling is indeed done by random forest regression. This is a widely accepted method for doing regression but this does not set this system apart. The system also contains an implementation of the Free Wilson method and more methods could be added. There is no reason a 3D method as proposed by ‘Brian’ would not fit in. The system is unique because of the single compound design feedback, a compound is designed only after the outcome of the previous compound is known. If you have ever played the games ’20 questions’ or ‘who is who’ you know how much more knowledge is gained from asking 20 questions in serial instead of 20 questions at once. When few compounds have been made and assayed the model is probably not very good. However it learns quickly since adding a single data point to a small training set has a large impact on the next model. Human designers in most cases have to design the next synthetic targets in batches, often when the activities of the previous targets are not yet known. This is not a level playing field.
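The one-compound-at-a-time feedback described above can be sketched as a short Python loop: refit on everything measured so far, pick a single compound, assay it, repeat. This is only an illustration of the serial idea. The `closed_loop` and `fit_reagent_means` names are invented, the stand-in model is a crude per-reagent average (neither the random forest nor a proper Free Wilson fit), and the "assay" is a toy additive function with made-up numbers.

```python
from itertools import product


def fit_reagent_means(results):
    """Stand-in model: predict a compound as the average of the mean
    measured activities of its reagents (unseen reagents get the overall mean)."""
    totals, counts = {}, {}
    for compound, activity in results.items():
        for reagent in compound:
            totals[reagent] = totals.get(reagent, 0.0) + activity
            counts[reagent] = counts.get(reagent, 0) + 1
    overall = sum(results.values()) / len(results) if results else 0.0
    means = {r: totals[r] / counts[r] for r in totals}

    def predict(compound):
        return sum(means.get(r, overall) for r in compound) / len(compound)

    return predict


def closed_loop(candidates, assay, fit, n_rounds):
    """Design, make and test one compound per round, refitting each time."""
    results = {}
    for _ in range(n_rounds):
        remaining = [c for c in candidates if c not in results]
        if not remaining:
            break
        predict = fit(results)                 # model built on all data so far
        chosen = max(remaining, key=predict)   # design exactly one compound
        results[chosen] = assay(chosen)        # its result feeds the next round
    return results


# Toy "assay": hidden additive reagent contributions (made-up numbers).
hidden = {"A1": 1.0, "A2": 2.0, "B1": 0.5, "B2": 1.5}


def toy_assay(compound):
    return 5.0 + sum(hidden[r] for r in compound)


candidates = list(product(["A1", "A2"], ["B1", "B2"]))
results = closed_loop(candidates, toy_assay, fit_reagent_means, n_rounds=3)
```

In this toy run the third pick is the best compound in the grid, because the first two results have already shifted the per-reagent averages. A batch design of three compounds chosen up front would not have had that information.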
    The system has two search strategies ‘chase potency’ and ‘most active undersampled’. These strategies are more important to how smart the system is than the random forest regression. The first strategy will select each round the compound that is predicted to be the most active. This method may get stuck in a local minimum in the virtual compound space and is therefore more applicable when some searching has been done with the ‘most active undersampled’ method. Here the compounds are ranked by how often the constituting reagents have been used. Typically this ranking results in groups of compounds that are equally undersampled so the predicted most active of the group is selected. This search strategy is a hybrid between Design of Experiments and Active Learning, which are both validated and widely used methods but underused in medicinal chemistry.
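The two strategies described above can be sketched in a few lines of Python. This is a guess at the logic from the description, not the Cyclofluidic code: compounds are represented as tuples of reagent names, "how often the constituting reagents have been used" is taken to mean the sum of usage counts over a compound's reagents (an assumption), and the predicted values are invented.

```python
from collections import Counter


def chase_objective(candidates, predict):
    """Pick the candidate with the highest predicted score."""
    return max(candidates, key=predict)


def best_objective_undersampled(candidates, predict, made):
    """Among the candidates whose reagents have been used least so far,
    pick the one with the highest predicted score."""
    usage = Counter(reagent for compound in made for reagent in compound)

    def sampling(compound):
        # Assumed ranking: total prior usage of the compound's reagents.
        return sum(usage[reagent] for reagent in compound)

    least = min(sampling(c) for c in candidates)
    group = [c for c in candidates if sampling(c) == least]
    return max(group, key=predict)


# Hypothetical state: two compounds already made, four still virtual.
made = [("A1", "B1"), ("A1", "B2")]
predicted = {
    ("A1", "B3"): 7.5,
    ("A2", "B1"): 6.0,
    ("A2", "B3"): 5.0,
    ("A3", "B3"): 6.2,
}
candidates = list(predicted)

greedy_pick = chase_objective(candidates, predicted.get)
explore_pick = best_objective_undersampled(candidates, predicted.get, made)
```

Here the greedy strategy goes for ("A1", "B3"), which reuses the already well-sampled A1. The undersampled strategy first narrows to the two compounds built entirely from unused reagents and then takes the better-predicted of those, ("A3", "B3").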
    Note that the search strategies have been renamed to ‘chase objective’ and ‘best objective undersampled’ since the system does not have to optimise potency alone and actually works with multi-parameter desirability functions. There is no reason only one assay should be coupled to the system, this could easily be two or more and the design driven by selectivity instead of potency. The desirability score can also contain predicted properties like model scores, for instance if a customer has a good in-house model of an anti-target.
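As an illustration of what a multi-parameter desirability function can look like, here is a small Python sketch. The form is an assumption, since the comment does not say how the scores are combined: each parameter is mapped linearly onto 0 to 1 and the results are combined by geometric mean, which is one common convention. The ranges and values are invented.

```python
import math


def linear_desirability(value, low, high):
    """Map a value onto [0, 1]: 0 at or below `low`, 1 at or above `high`."""
    if high == low:
        raise ValueError("low and high must differ")
    d = (value - low) / (high - low)
    return min(1.0, max(0.0, d))


def overall_desirability(scores):
    """Geometric mean of individual desirabilities; any zero gives zero."""
    if not scores:
        raise ValueError("need at least one score")
    if any(s == 0 for s in scores):
        return 0.0
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# Hypothetical compound: pIC50 of 7.0 against the target, and a 1.5 log unit
# window over an anti-target (measured or predicted by an in-house model).
potency = linear_desirability(7.0, low=5.0, high=8.0)      # 2/3
selectivity = linear_desirability(1.5, low=0.0, high=2.0)  # 0.75
score = overall_desirability([potency, selectivity])
```

With a geometric mean, a compound that scores zero on any one parameter scores zero overall, so raw potency cannot paper over a complete lack of selectivity. The combined score then takes the place of pIC50 as the objective that either search strategy optimises.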
    To all readers who have used the word overlord or similar. No, this system does not take the human out of the equation and no, the human is there for more than maintaining the system. Note that the largest part of a design is done by selecting which series to work on and which reagents to add to the system. If a reaction uses primary amines there are thousands to choose from, selecting the few that make it into the system has a large influence on what the hits will look like. This is real medicinal chemistry design work and when you put in the methyl/ethyl/futile reagents don’t blame the machine for the outcome. Look at the original publication and see how much design work went into the template and the reagents bringing in the Rgroups.
    The question of ‘what compound to make next’ has been translated into ‘what chunk of chemical space to explore next’ (maybe the latter should read ‘what chunk of chemical space to have explored for me next’). In my field of software/algorithm design having an idea is what excites me, coding it up is mostly drudgery. I would assume that medicinal chemists experience the same excitement when having an inspired idea for the next design, but that making it (waiting for it to be made in China) is not that much fun?

  6. simpl says:

    The last quote by ex Glaxoid, on the decay of machines without financial and human support, reminded me of the weakness in the film Logan’s Run – How could decaying machines support a future city when the humans in it didn’t realise what kept their cultures alive?

  7. a. nonymaus says:

    Re: 6
    You weren’t expecting the machines to be taken care of any better than the people were you?

  8. Rube Goldberg says:

    I think this is great. True innovation.

  9. Heath Robinson says:

    @ #8 – I agree. Smashing success!!

  10. Anonymous says:

    In case you didn’t realise from my follow-up post….unless i’m very much mistaken the people who helped design those buildings at the ‘big pharma’ company in question in your comment above went off and got Venture Capital…. to start Cyclofluidic.

  11. anon says:

    ugh, until the comments sections work again, I’m gone. very frustrating.
