

How Deep Is That Literature?

The literature of synthetic chemistry is large, and it goes back well over a century. Those of us who know the field sometimes despair of the state that literature is in – it can be pretty messy – but we really shouldn’t. It’s actually far more orderly than many other fields, and it has a lot of aspects that make it intrinsically more “organizable”, not least the backbone of chemical structures that underlies it. Now, it’s for sure that not all those structures are drawn correctly and that not all those transformations of them actually work when you try them, but at least there’s a structured form to the data, as opposed to (say) the literature on rodent behavioral science or something.

This makes the chemical literature very attractive for a machine learning approach, and of course that’s just what we’ve seen in recent years. The advent of retrosynthesis software in organic chemistry is an applied example of just that, and there have been many other investigations into how to extract rules, trends, and even predictions of new reactions and substances from the existing literature. But to get any of those to work well – to get any machine learning approach to anything to work at all – you have to pay close attention to the state of the data that you’re pouring into the hopper. It needs to be reliable, well-formatted, wide-ranging, and with a good selection of both positive (here’s something that worked) and negative (here’s something that you’d think would work, but didn’t) results. All of those factors need some work, and by “some” it will be understood that I mean “sometimes a whole bunch”.

A useful way to check the reliability of a given transformation would be to see how many times it shows up in the literature. I know that the people building retrosynthesis programs think about this a lot. Einmal ist keinmal, as they say (one time is no time), and you wouldn’t want to fill up your database with a pile of one-off reactions that might not be real (or might only work under far more limited conditions than the titles of the papers might lead one to believe!). Here’s a new paper that looks into just that question of finding repeat syntheses and what that tells us about the chemical literature.
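The repeat-count idea is simple to prototype: normalize each reported transformation to a canonical key (in practice you'd use something like canonical reaction SMILES; the keys and data below are purely hypothetical) and tally the occurrences. A minimal sketch:

```python
from collections import Counter

def repeat_counts(reactions):
    """Tally how many times each (already-normalized) reaction key
    appears across a list of literature reports."""
    return Counter(reactions)

def singletons(reactions):
    """Reaction keys reported exactly once -- the 'einmal ist keinmal'
    candidates you might down-weight when training a model."""
    counts = repeat_counts(reactions)
    return [rxn for rxn, n in counts.items() if n == 1]

# Hypothetical normalized reaction keys from a literature scrape
reports = ["A>>B", "A>>B", "C>>D", "E>>F", "A>>B", "C>>D", "G>>H"]
print(repeat_counts(reports).most_common(2))  # [('A>>B', 3), ('C>>D', 2)]
print(singletons(reports))                    # ['E>>F', 'G>>H']
```

The hard part in practice is the normalization step, not the counting: two papers rarely describe the "same" reaction in exactly the same way.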

The authors, from Georgia Tech, look at the metal-organic framework (MOF) literature, and I’d say that’s a good choice. I did a fair amount of MOF work a few years ago (on the “crystalline sponge” X-ray structure idea, if you’re wondering), and if you’ve never looked at that stuff, let me tell you that the literature in that field is a massive shaggy pile. There are a zillion MOFs out there, produced under a ridiculous number of synthetic conditions, and the barrier to making new ones is extremely low. I mean it – you can step right up to your hood and within a few days make some that have never been reported before. I sure did, and it was a blast.

Those things can form spectacular crystals, and what synthetic chemist doesn’t like that? I would set up a whole line of sealed vials with various combinations of metal salts, multivalent ligands, and additives and heat them up in something like DMF for a few days, and likely as not collect a series of brand-new MOFs. This accounts for the vast “stamp-collecting” literature on these things. They’re generally not that hard to collect X-ray data on (all those metal atoms!), and even a Neanderthal like me could get decent data sets, although you don’t want me to be the guy who processes and refines them. Below are some of the many that I prepared with my own hands, and I can’t tell you when I’ve had a better time in the lab. If you’re guessing cobalt as the first metal and copper as the third one, right you are. Now, getting them to do what I wanted them to do (sequester small molecules in an ordered fashion) was another topic entirely, but making crystals to try that out on? Oh, yeah. If you’re going through one of those periods where it seems that you can’t get anything to work in the lab, go make some MOFs – you’ll feel better quickly.

The Georgia Tech team used the CoRE MOF database, a curated collection of thousands of X-ray structures in the field. They selected 130 MOFs randomly from the pre-2014 literature. The papers describing these had been cited between 8 and 168 times (average 34 citations). What they found was that most of these had never been resynthesized at all, as you might expect, while others had been made multiple times:

Only 1 material was synthesized more than 3 times: a Zn-based MOF first produced by An et al. (16) with structure code SAPBIW (common name Bio-MOF-100) has been synthesized 7 times, including 2 instances by groups distinct from the original authors. Seven of the 130 MOFs have been resynthesized by a group distinct from the original authors, and 15 of the MOFs have been synthesized more than once by anyone.

Now that’s for direct replication – if they broaden the field to modifications of the original synthesis, 65% of the 130 MOFs had had some sort of follow-up work. It seems quite possible, they note, that many of these papers also made the original substance along the way but did not bother to report it in the actual paper, so the replication statistics are likely lower bounds. A broader analysis of the MOF literature, though, picked up a short list of substances that had had hundreds of papers written about them, and the vast majority with no direct replications at all.

This is not the 80/20 distribution beloved of consultants everywhere – it’s more like a very few substances account for nearly all the replications while everything else tails out extremely fast. Their estimate is that 0.03% of the reported MOFs account for 50% of the replications, which does not make for as instantly memorable a PowerPoint slide title. Outside of those “super repeats”, the distribution broadly follows a power law, although proving that a power law is actually at work is a much harder task. How do some of these things end up on the greatest hits list? It’s impossible to say for sure, but the authors note that they involve inexpensive materials and undemanding experimental conditions (for starters), and that there are surely sociological factors at work as well, not least timing of the original publications.
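That "tiny fraction accounts for half the replications" claim is just a statement about the cumulative share of a heavy-tailed count distribution, and it's easy to compute for any list of replication counts (the numbers below are made up for illustration, not taken from the paper):

```python
def fraction_for_half(counts):
    """Smallest fraction of items (sorted by replication count,
    descending) whose counts sum to at least 50% of all replications."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    running = 0
    for k, c in enumerate(ordered, start=1):
        running += c
        if running >= total / 2:
            return k / len(ordered)
    return 1.0

# Illustrative heavy-tailed counts: one "super repeat" and a long tail
counts = [50, 10, 5, 3, 2, 1, 1, 1, 1, 1]
print(fraction_for_half(counts))  # 0.1 -- one item in ten carries half
```

Note that this statistic says nothing about whether a power law is actually the right model; as the authors acknowledge, establishing that rigorously takes much more care than eyeballing a log-log plot.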

How reproducible were all these repeat runs? In the MOF case, we have parameters like the Brunauer-Emmett-Teller (BET) surface area, which is pretty easy to measure. The paper has some interesting plots of how these numbers have shaped up over time for the high-repeat substances. You can see a rough bimodal distribution in some of the moisture-sensitive ones, and tighter numbers in the ones known to be more robust, which makes sense, although the numbers do not seem to be converging over time, either. Having MOFed around myself, I feel sure that what we’re seeing are variations due to different levels of solvent removal (activation) before measuring, samples with varying amounts of impurities and small crystalline defects, etc., and the authors advance just such explanations.
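For readers outside the materials world: a BET surface area comes from fitting the linearized BET equation, 1/[v((p0/p) − 1)] = (C − 1)/(vm·C) · (p/p0) + 1/(vm·C), to a few points of a gas adsorption isotherm and converting the monolayer capacity vm into an area. A sketch with synthetic isotherm data (assuming N2 at 77 K with the usual 0.162 nm² cross-section; real measurements bring all the activation and sample-quality headaches mentioned above):

```python
# Cross-sectional area of adsorbed N2 (m^2), Avogadro's number,
# and molar volume of an ideal gas at STP (cm^3/mol)
N2_AREA = 16.2e-20
N_A = 6.022e23
V_MOLAR = 22414.0

def bet_surface_area(rel_pressures, volumes):
    """Fit the linearized BET equation by least squares and return
    the specific surface area in m^2 per gram of sample.
    rel_pressures: p/p0 values (typically in the 0.05-0.30 window)
    volumes: adsorbed gas volumes at STP, cm^3 per gram."""
    xs = rel_pressures
    ys = [1.0 / (v * (1.0 / x - 1.0)) for x, v in zip(xs, volumes)]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    v_m = 1.0 / (slope + intercept)  # monolayer capacity, cm^3/g STP
    return v_m * N_A * N2_AREA / V_MOLAR

# Synthetic isotherm generated from the BET model itself
# (v_m = 100 cm^3/g, C = 100), so the fit should recover ~435 m^2/g
v_m_true, C = 100.0, 100.0
ps = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
vs = [v_m_true * C * p / ((1 - p) * (1 + (C - 1) * p)) for p in ps]
print(round(bet_surface_area(ps, vs)))  # 435
```

On clean synthetic data the fit is exact; the scatter in the literature values comes from the samples, not the arithmetic.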

So how widely applicable is this analysis? Of special relevance is the repeatability of synthetic methods, and I would love to see statistics on that. You’d have to deal with a lot of variation in conditions and be willing to loosen up your literature-search constraints, but a look at these situations could be very useful in setting cutoffs for machine-learning in the synthesis literature as a whole. The same sociological factors that made some MOFs super-popular have surely been at work in making some reactions and types of reactions popular as well (think, for example, of the wave of olefin metathesis reactions that hit the literature some years back). But reactions that just don’t work well don’t ever get to be popular at all. How many good or interesting ones are there, though, that never got their time in the spotlight? It might be worth mining the old journals for unusual transformations that didn’t get followed up on and devote some high-throughput synthesis effort to seeing how many of those things can be revived. Would anyone fund such an effort?

17 comments on “How Deep Is That Literature?”

  1. Barry says:

Reproducibility may not be as simple as that. Sure, a trained pigeon can oxidize a secondary alcohol to the ketone with Jones reagent; no one contests that that’s reproducible. But then there are cases like the aluminum reduction Woodward/von Eggers Doering used on the way to quinine. The reaction works, but perhaps the literature report under-specified how the reagent must be prepared.

  2. zero says:

    I’d like to see a ML bot trained to identify papers likely to have problems. Perhaps it comes up with a risk score that can be checked through replication studies or expert debate or something.

If a tool like that existed and provided meaningful confidence scores for a body of literature, the results could make it much easier to train other ML bots, since their input would be more reliable (or perhaps, the learning dataset would have items whose reliability is known to a much higher degree of accuracy than simply assuming the whole published field is A-OK).

    The same tool could help identify papers that are likely to be solid as well. Exploring how it makes that distinction might help us pick up on patterns behind reliable work, which could feed back into the field and help people improve their work’s reproducibility.

    It’s not sexy and isn’t going to make the enterprise megabucks in sales, so it’s probably not going to happen unless someone invests in more of a ‘fundamental research’ approach to AI/ML.

    1. Student says:

It could also be used to rewrite your paper just right to maximize all the important statistics. More reliable papers are ranked higher in search engines, are cited more, and advance your career better.
That would be a direct effect after such a tool is published. But what you are asking for is a machine that can reliably tell whether information is correct and rank it. We are (hopefully) far, far away from that. Otherwise there would be no work left for humans.

  3. Glassveins says:

    Just out of curiosity, what’s the metal for the center picture?

    1. Derek Lowe says:

      That one is good old zinc!

  4. Med(iocre) Chemist says:

    “How many good or interesting ones are there, though, that never got their time in the spotlight? It might be worth mining the old journals for unusual transformations that didn’t get followed up on and devote some high-throughput synthesis effort to seeing how many of those things can be revived. Would anyone fund such an effort?”

    Isn’t that basically what Sharpless has been doing for the past 15 years or so? And I mean that in the best way possible. Ol’ Barry K learned German in school and has put that early childhood education to good use extracting nuggets from the old German literature. I’ve always wondered what’s lurking in all those old Japanese and Russian journals. I can read French but sadly that’s less useful here.

    Would be interesting to see if you could get a multi-lingual AI to canvass all of the more esoteric regional journals rather than relying on someone who happens to speak Latvian, for example.

    1. Derek Lowe says:

      I was thinking of Sharpless as I wrote that part, definitely!

      1. wildfyr says:

That’s exactly what Sharpless did to create SuFEx. He mined some old German papers (I see 1927, 1930, and 1944 mentioned in ref. 6 of his first SuFEx paper, from 2014) and noticed something interesting in a minor Synlett paper by Gembus et al. in 2008, and boom, new click reaction.

    2. Anonymous says:

Mein Deutsch ist nicht so gut (my German is not so good), but I’d slog through the lit with dictionaries at hand. I have previously mentioned on In the Pipeline that you can find potential projects in the older lit (German, French, Italian, ENGLISH, etc.). I found quite a few “That can’t possibly be the correct product” papers that I always wanted to repeat. One outcome would be a publishable correction of an incorrect structure. As mentioned here, another outcome could be the publishable verification of an old reaction and its extension in new directions.

On the subject of reproducibility, the Katsuki-Sharpless epoxidation is, sort of, its own example. I think that KBS has reported that the group had tried the same “recipe” before but it didn’t work well. To simplify things before packing up his Stanford lab to return to MIT in 1980, he put everyone on asymmetric epoxidation. They were pulling everything off the shelves, and Katsuki got to do the DET work. I don’t remember what the key thing was to get it to work: sieves to reduce H2O? The point being … even though they had tried it before and it failed, a new researcher went back and “repeated it” but with a tweak that proved to be consequential. … Likewise the older published lit? (If anyone has a better (sourced?) version of the K-S story, please correct my errors.)

      And I have to add a comment about BET: I was working on a materials science project and the grad students and post docs were supposed to be measuring surface area using BET. A dedicated apparatus was shared by a few groups and it worked well for SOME researchers. Many couldn’t measure anything reproducible! Eventually, samples had to sent out to a commercial lab. It might have been incompetence or inexperience. And that argument applies to other preparative and analytical methods, as well. Two hours of training is not always sufficient to use a technique adequately, let alone masterfully.

      1. David Borhani says:

Molecular sieves were Bob Hanson’s (now at St. Olaf College) innovation, ca. 1984. They allowed use of much less (5–10 mol%) of the active catalyst, which was critical for AE of, e.g., allylic dienols.
Not sure what Katsuki did to make AE (very) reproducible, but it already was by the time I joined the lab in fall 1980.

  5. Andre St. Amant says:

I’ve never heard of the crystalline sponge method – I started reading Fujita’s paper on it. It didn’t take me long to realize that this probably won’t work for a protein!

  6. Marya says:

    Maybe this would be a use for all those sophomore organic lab reports on standard synthetic preparations…ferrocene must be acetylated thousands of times in hundreds of labs each year.

    1. Retro Sinner says:

That brings back a memory of Chris Moody’s 2nd year undergrad organic lab at Imperial College, 1986/87. Making ferrocene, subliming it, and acetylating it, as well as making Ni(P(OEt)3)4, complete with inert sealing in a handmade vial – with plenty of green showing across the samples, oops. I was just starting out being fascinated by transition metal chemistry, and the colours, bonding, and reactivity felt like magic.

    2. Anonymous says:

      I had to make ferrocene in 1st semester orgo lab and I followed my TA’s advice and instructions and cooled my solution in ice. I got a good yield of fine orange powder. The head lab instructor (teaching staff professor) took off points because we were supposed to let it cool slowly in our locked lab drawer and filter off large crystals at the next lab class. Several students were getting excellent yields of big chunks of ferrocene.

During 3rd semester orgo lab, the Prof in charge of the course made an announcement: “It has come to our attention that some of you who are doing undergrad research in the research labs are using those labs for your lab course experiments. That is not allowed.” It turns out that a lot of undergrad ferrocene came out of the commercial bottles in research lab stockrooms. Other lab course experiments were repeated over and over, with help from experienced grad students, in labs much better equipped than the UG lab, until the best outcomes were obtained and submitted as undergrad lab reports. Those not working in research labs were limited to the undergrad lab “open hours” only.

  7. Sisyphus says:

    Today’s version of buckyballs.

  8. David Edwards says:

Apart from the issues that plague AI/ML from the software developer’s perspective, there’s another issue that crops up here … namely, what happens to your AI/ML training session when someone alights upon something completely new, with the potential to force a rewrite of textbooks? I’m thinking here of the recent announcement in C&EN that rhodium boride has been found to have a quadruple bond coupling the atoms. That one is probably raising a lot of eyebrows, even among chemists with some bonding exotica of their own in their background, and bringing your AI/ML to the point where it can handle surprises on this scale almost certainly constitutes one of the hard problems whose solution is the stuff of accolades up to and including Nobel level.

  9. Former NMR person says:

    “Now, getting them to do what I wanted them to do (sequester small molecules in an ordered fashion) was another topic entirely”

    I would be interested to read your experiences with this.
