
Just How Many Compounds Are We Talking About?

Just how many chemicals are there? Depending on where you look, you can find estimates anywhere from 10 to the eighteenth (pretty big, all right) all the way up to the gibbering, flee-in-terror order of ten to the two hundredth. A range like that makes it clear that no one knows what they’re talking about, so the question needs to be cut down to size. “How many chemicals are there below a certain molecular weight?” is a good start, and once you’ve set that, you might want to stipulate which elements you’ll include and whether the compounds have to be stable enough to isolate.
A group from the University of Berne has just published a paper in Angewandte Chemie (44, 1504 in the English edition) which claims to answer just such a question, namely: “How many reasonably stable compounds are there with up to eleven atoms of carbon, nitrogen, oxygen, or fluorine?” Should this one come up during your next poker game, you can now answer, in your best Mr. Spock voice, “Approximately 13,892,436.” But hold on. Does that number sound low to you? If not, it should. Read on.
The Berne group came up with their estimate by computationally assembling graphs corresponding to all the saturated hydrocarbon backbones of up to eleven carbons. Then they systematically replaced every possible carbon with N or O, allowed for double and triple bonds, and decorated the carbons with H or F. So far, so good. These variations generated anywhere from 4 to 79,236 compounds per carbon skeleton.
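To make the heteroatom-substitution step concrete, here is a toy sketch in Python. It is not the authors’ code: it assumes a skeleton is a simple undirected graph given as a node count plus an edge list, treats fluorine as a terminal heavy atom (so it counts toward the eleven), ignores double and triple bonds entirely, and removes duplicates caused by the skeleton’s own symmetry by brute force. The names `heteroatom_variants` and `automorphisms` are hypothetical.

```python
from itertools import permutations, product

# Assumed valences for this toy model; hydrogens are implicit and fill the rest.
VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}


def automorphisms(n, edges):
    """All node permutations that map the edge set onto itself (brute force, tiny graphs only)."""
    edge_set = {frozenset(e) for e in edges}
    autos = []
    for perm in permutations(range(n)):
        if {frozenset((perm[a], perm[b])) for a, b in edges} == edge_set:
            autos.append(perm)
    return autos


def heteroatom_variants(n, edges):
    """Unique C/N/O/F labelings of a single-bonded skeleton that respect each atom's valence."""
    degree = [0] * n
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # An element is allowed at a node only if its valence covers the node's bonds.
    choices = [[el for el, v in VALENCE.items() if v >= degree[i]] for i in range(n)]
    autos = automorphisms(n, edges)
    seen = set()
    for labels in product(*choices):
        # Canonical form: the lexicographically smallest relabeling under the graph's symmetries.
        canon = min(tuple(labels[perm.index(j)] for j in range(n)) for perm in autos)
        seen.add(canon)
    return sorted(seen)
```

For the three-atom chain of propane, `len(heteroatom_variants(3, [(0, 1), (1, 2)]))` gives 30: each end can be C, N, O, or F, the middle atom can only be C, N, or O, and the two ends are interchangeable. Real enumerations use proper canonical labeling rather than trying every permutation, which stops being feasible well before eleven atoms.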
But they applied a set of mighty strict standards during these operations. Their algorithm rejected heteroatom-heteroatom bonds, except for those found in some aromatic heterocycles and in nitro groups, oximes and the like, so no peroxides (and no hydrazines, I suppose, although they’re stable). They also rejected bridgehead double bonds and allenes, and (to my surprise) only allowed triple bonds in nitriles (so no acetylenes). They also rejected hydrolytically unstable groups: no enamines, no acyclic imines, no acyl halides, no enols, and not even any orthoesters.
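A few of these rules are easy to express as a filter. The Python sketch below is hypothetical and much cruder than the paper’s criteria: it assumes a molecule is a list of element symbols plus `(atom_a, atom_b, bond_order)` triples with implicit hydrogens, it does not model the aromatic, nitro, or oxime exceptions, and it leaves out bridgehead double bonds and the hydrolytic-stability rules.

```python
# Toy representation: atoms are element symbols, bonds are (a, b, order) triples.
HETERO = {"N", "O", "F"}


def passes_filters(atoms, bonds):
    """Return True if the molecule survives three of the rules described above."""
    cc_double = [0] * len(atoms)  # carbon-carbon double bonds per atom
    for a, b, order in bonds:
        ea, eb = atoms[a], atoms[b]
        # Rule 1: no heteroatom-heteroatom bonds (the paper's exceptions are not modeled).
        if ea in HETERO and eb in HETERO:
            return False
        # Rule 2: triple bonds only as nitriles (C#N), which rules out acetylenes.
        if order == 3 and {ea, eb} != {"C", "N"}:
            return False
        if order == 2 and ea == "C" and eb == "C":
            cc_double[a] += 1
            cc_double[b] += 1
    # Rule 3: no allenes, i.e. no carbon double-bonded to two other carbons.
    return max(cc_double, default=0) < 2
```

With this sketch, acetonitrile (`["C", "C", "N"]` with bonds `[(0, 1, 1), (1, 2, 3)]`) passes, while propyne, allene, and hydrogen peroxide are all rejected.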
What this means is that there are plenty of compounds you can order from a catalog that aren’t even on the list. Heck, there are compounds that are shipped in tank cars that aren’t on the list. Allowing some of these compound classes to gain a foothold would have swelled the ranks a great deal. Going further beyond their criteria, you can imagine how out of control things would get if you started factoring in sulfur, phosphorus, and more than one type of halogen. I don’t know whether this team is contemplating that exercise; they’ll probably have to wait for a fresh crop of grad students before they can even try.
But I’ve left out a key statistic of theirs, a startling one. Back at that first step, when they assembled those carbon frameworks as graphs, it turned out that the huge majority, a full 99.8% of them, contained three- or four-membered rings. So that the list wouldn’t be skewed so heavily toward cyclopropanes and cyclobutanes, they threw all of these out at the very start, leaving them with 1,830 basic skeletons out of 843,335. Throwing out the likes of orthoesters and acetylenes, as it turns out, is nothing compared to the massive effect of shedding the small rings.
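The small-ring cut is a simple graph test, and the quoted counts can be checked against the 99.8% figure. This is a sketch, not the authors’ procedure: it assumes skeletons are simple undirected graphs, and `has_small_ring` is a hypothetical helper. It relies on two facts: a three-membered ring exists exactly when two bonded atoms share a neighbor, and a four-membered ring exists exactly when some pair of atoms shares two distinct neighbors.

```python
from itertools import combinations


def has_small_ring(n, edges):
    """True if the simple graph contains a 3- or 4-membered ring."""
    nbrs = [set() for _ in range(n)]
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    for u, v in combinations(range(n), 2):
        common = nbrs[u] & nbrs[v]
        # Bonded atoms with a shared neighbour close a three-membered ring.
        if v in nbrs[u] and common:
            return True
        # Any two atoms with two shared neighbours close a four-membered ring.
        if len(common) >= 2:
            return True
    return False


# Share of skeletons discarded, using the counts quoted in the post.
kept, total = 1830, 843_335
removed_pct = 100 * (1 - kept / total)
```

Cyclopropane and cyclobutane trip the test while cyclopentane does not, and `f"{removed_pct:.1f}%"` comes out as `99.8%`, matching the figure above (only about 0.22% of the skeletons survive).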
In this light, as the authors point out with an excellent astronomical analogy, their list of nearly fourteen million stable compounds is actually surrounded and permeated by a huge unseen amount of “dark matter”: all those 3- and 4-membered rings. Many of them might be too strained to be stable, but many others would be fine. They just haven’t been explored because they’re too much of a pain to make. This, to me, was the single biggest surprise of the whole effort. I knew that there must be a lot of these compounds, but I never would have thought that their possible forms would hugely outnumber all the other small molecules I’ve ever seen or thought of. What else don’t we know?

4 comments on “Just How Many Compounds Are We Talking About?”

  1. Daniel Newby says:

    I once got to thinking about a slightly different question: in the vast sea of reasonably stable structures, what fraction are immune to conventional synthesis? I.e., stuff where the most fruitful synthesis would be to smash likely-looking precursors in a particle accelerator and sort the debris with single-molecule NMR. For more than a few dozen carbons, I bet the fraction is appallingly large.

    Now that I think about it, carbon nanotubes and fullerenes fall squarely in the “impossible to synthesize” category. Trying to make them an atom at a time would be madness. It is sheer luck that they self-assemble so nicely.

  2. David Govett says:

    Will software ever be sophisticated enough and will hardware ever be fast and capacious enough to infer function from structure? If so, software would be able to model and characterize any number of compounds relatively quickly.

  3. Derek Lowe says:

    The authors did run some “virtual screening” software through their library, and broke it down into how many structures were potential receptor ligands and so on. But my trust in those methods just barely moves the meter off zero.
    I think that de novo function-from-structure falls into the category of “probably not quite impossible.” What, in other words, an engineer would call, with a pained expression, “nontrivial.”

  4. a Chemist says:

    You might also want to have a look at Jonathan Goodman’s article in the current Chem & Ind (when it makes its way across the pond, or online), Issue 6, p 18 (or the article cited in it, J Goodman and K de Silva, J. Chem. Inf. and Modeling 2005, 1, 81).
    It looks at this from a slightly different perspective.
