I wanted to say a bit more about the Cyclofluidics automated med-chem system I wrote about the other day. Here’s another PDF from the company about the software used in the device. And here’s how the software is making its decisions:
These design methods utilise Random Forest activity prediction employing one or both of two design strategies – “Chase Objective” and “Best Objective Under-sampled”. Regression models are built using the Random Forest method implemented in R accessed via the Pipeline Pilot ‘Learn R Forest Model’ component. The default settings of this component are used except for the molecular descriptors as follows:
2. Molecular Weight
3. Number of H Donors
4. Number of H Acceptors
5. Number of Rotatable Bonds
6. Molecular Surface Area
7. Molecular Polar Surface Area
These are used together with ECFP_6 fingerprints. The fingerprints are not folded and are used as an array of counts. The Random Forest algorithm was chosen because it can perform non-linear regression, is resistant to over-fitting and has very few tuning parameters. In all cases the dependent variable is pIC50.
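Out of curiosity, here’s a rough sketch of what that model-building step amounts to outside of Pipeline Pilot, using RDKit and scikit-learn in Python. To be clear, this is not Cyclofluidic’s code: the Labute ASA descriptor standing in for their “Molecular Surface Area” term, and the couple of example structures and pIC50 values, are my own placeholders.

# Minimal sketch only: RDKit + scikit-learn standing in for the
# Pipeline Pilot "Learn R Forest Model" component described above.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors, rdMolDescriptors
from sklearn.ensemble import RandomForestRegressor

def physchem_descriptors(mol):
    # The whole-molecule descriptors listed above (Labute ASA is my
    # assumption for the "Molecular Surface Area" term).
    return [
        Descriptors.MolWt(mol),
        Descriptors.NumHDonors(mol),
        Descriptors.NumHAcceptors(mol),
        Descriptors.NumRotatableBonds(mol),
        rdMolDescriptors.CalcLabuteASA(mol),   # surface-area stand-in
        Descriptors.TPSA(mol),                 # polar surface area
    ]

def ecfp6_counts(mols):
    # Unfolded ECFP_6 (Morgan, radius 3) count fingerprints: one column per
    # feature seen anywhere in the set, values are occurrence counts.
    sparse = [AllChem.GetMorganFingerprint(m, 3).GetNonzeroElements() for m in mols]
    keys = sorted({k for d in sparse for k in d})
    index = {k: i for i, k in enumerate(keys)}
    X = np.zeros((len(mols), len(keys)))
    for row, d in zip(X, sparse):
        for k, count in d.items():
            row[index[k]] = count
    return X

# Hypothetical training data: SMILES strings with measured pIC50 values.
smiles = ["CCOc1ccccc1C(=O)Nc1ccncc1", "Cc1ccc(NC(=O)c2ccco2)cc1"]
pic50 = [6.2, 5.4]

mols = [Chem.MolFromSmiles(s) for s in smiles]
X = np.hstack([np.array([physchem_descriptors(m) for m in mols]), ecfp6_counts(mols)])
y = np.array(pic50)

# Random Forest regression on pIC50, essentially default settings.
model = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)

The unfolded count representation means the feature matrix just grows with whatever ECFP_6 features turn up in the training set, which is part of why a Random Forest (indifferent to large, sparse descriptor blocks and nearly tuning-free) is a sensible choice here.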
That’s about what I’d figured. I wrote about what turns out to be an early application of this device last year, in an effort to find Abl kinase inhibitors. At the time, it struck me more as “automated patent busting”, because the paper had quite a leg up based on SAR from competitor compounds. That seems to me to be the best use for this sort of system – where you know something about the core (or cores) that you’re interested in, and you want to vary the side chains in a systematic fashion. The appealing thing is the possibility of very fast cycle time with an integrated assay. I’m not sure if the machine is robust enough to leave it going over the weekend, but it would be nice to come in on Monday and find the most potent derivatives already rank-ordered for you.
Potency, of course, is not everything. But it’s a reasonable place to start, because you’d rather have activity to burn in the pursuit of selectivity, ADMET, etc., because burn it you often must. And it’s also true that not every SAR follows such an algorithmically reducible path, in which case I imagine that the software (and hardware) must end up thrashing around a bit. (Well, so do we med chemists, so why should the machines miss out on the experience?)
And as mentioned before, the chemistry has to be amenable to reasonable-yielding, wide-ranging techniques. I would imagine a lot of amides, a lot of metal couplings, some of the easier condensations – to be honest, the run of compounds I’m making in the lab right now would work fine. My grandmother could have prepared these, were she still with us, because it’s dump-and-heat-overnight stuff. The problem is that the assay that these compounds are going into is most definitely not ready to be easily automated yet; it requires the ministrations of a good enzymologist. So this machine lives in the Venn-diagram intersection of “Robust chemistry/robust assay/comprehensible SAR”, but in that space it would seem to be pretty useful. Oh, there’s another circle in that diagram: “Chemistry that doesn’t crash out in the flow channels”. I’m a big fan of flow chemistry, but the wider you cast your net in an array of flow experiments, the better your chances of finding the Monolithic Mega-clog. I, too, have stop-started straight DMSO through the tubing while holding a heat gun.
A reader, “Brian”, has sent along some background on the Cyclofluidics instrument:
I can answer your query about whether an expert system containing “med-chem lore” could exist.
The “Cyclofluidic” Business and Technical Plans of 2008 were developed and presented for UK Government SME funding by SMEs who in 2006 had combined their know-how and technology to provide a successful demonstration of an iterative ‘closed loop’ system using a flexible reagent palette and field-based software able to jump to unforeseeable chemotypes with improved activity. The system had two learning loops and relied on large knowledge databases. One loop used iterative ‘rational design, synthesis, test and re-design’ cycles set to define potential lead chemotypes. The other loop cycled much more frequently, levering the extreme frugality and speed of sub-microgram, nano-flow chemistry to investigate and optimize the reactions needed to reach novel hypothesized targets. As the approach was essentially based on 3-dimensional whole-molecule bioisosterism (rather than 2-D atom-centered charged structures), much of your concern regarding the need to avoid a “core and substituents” approach and have a rapid route-scoping process was addressed. More details of the method and how the next targets are chosen can be found here.
For reasons that they have never fully shared, the UK Government funding body (TSB) were persuaded that all of the SME fund allocated on the basis of the 2006 proof of concept would be better spent on a zero-based start by ex-Pfizer and UCB staff via a jointly fully owned SME, Cyclofluidic Ltd, created in August 2008 for the purpose. None of the original system developers was funded or employed. Although many aspects of the original hardware configuration are clearly carried through to Cyclofluidic, as far as I know their “design algorithm” has never been described. Therefore, while I can tell you how the next round of analogues was chosen automatically in the predecessor system, I cannot answer your specific question of how Cyclofluidic does it, other than that it is not the same.
Interesting. And I also wanted to highlight this comment from the previous post, in case the server problems have made it too hard for people to see it:
. . .Another big pharma company spent well over $200 million building 2 automated chemistry buildings and filling them with automation that was going to automate all of synthesis, purification, handling and screening of pharmaceuticals.
The systems clogged constantly and broke often, the software was too complex for anyone to use, and the chemistry was very limited and made amides and triazines. The company also bought two other companies that automated chemistry and made vast libraries of amides and triazines, both of which eventually also went away. The real issue is not finding leads, it is discovering drugs.
Both buildings are now empty, and neither ever produced as many compounds as a small team of chemists did at each of two other small groups, which were not given the budget to fully automate chemistry but were merely allowed to buy some useful tools for simple automation, some of which worked really well, like automated vial weighers, prep HPLC systems, and Gilson 215 liquid handlers. And several other groups of chemists made simple compounds the old-fashioned way, which led to more clinical drugs than either.
So of course all of the groups of chemists were laid off, and the few groups that made zero marketed drugs were kept. I guess that statistically speaking, they are due.
Food for thought. The situation described in Arthur C. Clarke’s classic story “Superiority” is always waiting for us.