In Silico

Chemical Space

[Image: chemical space]
I’m listening to Jean-Louis Reymond of Bern talking about the GDB data set, the massive enumerated set of possible molecules. That’s the set of chemically feasible molecules at or below a certain heavy atom count – the first iteration was GDB11 (blogged about here), and it’s since been extended to GDB13, which has nearly one billion compounds with up to 13 C, N, O, S and Cl atoms. (Note, as always, that vast heaps of poly-small-ring compounds, especially concatenations of 3-membered rings, are pre-filtered out of these sets, because otherwise they would overwhelm them completely.) They’re working now on GDB17, which is a truly huge mound of data.
I was particularly taken with the image shown (from this paper), an artificial set of compounds (with heavy atom counts of up to 500) from several main classes of real molecules. It’s a 3-D principal components analysis plot, which tunes things up to emphasize the differences, of course, but there’s what chemical space looks like from this angle. There go the proteins and nucleic acids, off into their own zones, and similarly the linear alkanes and diamond-like lattices, beaming off in separate directions. In the middle are drug-like compounds – and don’t imagine for a minute that any substantial number of those have actually been prepared, either. This is where we live, all of us organic chemists.
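For anyone wondering what the axes in a plot like that mean: each one is a principal component, a weighted mix of whatever descriptors went in, chosen so that the first few axes capture as much of the spread as possible. Here is a minimal sketch of the idea in Python with NumPy. To be clear about what is assumed: the four descriptors (carbon, nitrogen, oxygen and ring counts), the handful of toy molecules, and the helper name `pca_project` are all my own illustration, not the descriptor set or the code behind the figure.

```python
import numpy as np

def pca_project(descriptors, n_components=3):
    """Project descriptor rows onto their top principal components.

    Hypothetical helper for illustration; not taken from the paper.
    """
    X = np.asarray(descriptors, dtype=float)
    # Center each descriptor column so the components describe variation
    # around the mean molecule.
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal axes,
    # ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    scores = X_centered @ components.T
    # Fraction of total variance captured by each kept component.
    explained = (S ** 2 / np.sum(S ** 2))[:n_components]
    return scores, components, explained

# Assumed toy descriptors: [carbons, nitrogens, oxygens, rings]
toy_molecules = {
    "hexane":     [6, 0, 0, 0],
    "decane":     [10, 0, 0, 0],
    "benzene":    [6, 0, 0, 1],
    "adamantane": [10, 0, 0, 3],
    "glycine":    [2, 1, 2, 0],
    "aspirin":    [9, 0, 4, 1],
    "caffeine":   [8, 4, 2, 2],
}

names = list(toy_molecules)
scores, components, explained = pca_project(list(toy_molecules.values()))

for name, xyz in zip(names, scores):
    print(f"{name:>10s}", np.round(xyz, 2))
print("variance explained per axis:", np.round(explained, 3))
```

Each row of `components` shows how much each input descriptor contributes to that axis, so the dimensions of the plot are blends of the raw counts rather than any single one of them. With a real descriptor set and millions of compounds the arithmetic is the same, only bigger.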

4 comments on “Chemical Space”

  1. SJ says:

    The different maps are available online in their browsable version:

  2. SJ says:

    The different maps are available online in their browsable version:

  3. Anonymous says:

    Is “Seaborgium Carbonyl” represented here?
    I suspect it may make “Things I Won’t Work With”, for at least two reasons.

  4. Daniel says:

    What do the dimensions represent? I could imagine a chemical space representing the number of atoms of each type in a molecule, but I don’t think that is what is happening here?
