To what extent can scientific discovery be automated? In which areas can automation make the biggest contribution to human efforts? These questions and a number of others are addressed in a very interesting two-part review article on “Automated Discovery in the Chemical Sciences”. The authors, from MIT, are well-equipped (in all senses of the word) to give their perspectives on this. Part I of the review defines terms, introduces a classification scheme for the sorts of automation and the sorts of discoveries under discussion, and reviews what’s been done to date. Part II is more of a look toward the future, with some open questions to be resolved.
As the authors say, “The prospect of a robotic scientist has long been an object of curiosity, optimism, skepticism, and job-loss fear, depending on who is asked.” I know that when I’ve written about such topics here, the comments and emails I receive cover all those viewpoints and more. Most of us are fine with having automated help for the “grunt work” of research – the autosamplers, image-processing and data-analysis software, the plate handlers and assay readers, etc. But the two things that really seem to set off uneasiness are (1) the idea that the output of such machinery might be usefully fed into software that can then reach its own conclusions about the experimental outcomes, and (2) the enablement of discovery through “rapid random” mechanized experimental setups, which (to judge from the comments I’ve gotten) is regarded by a number of people as a lazy or even dishonorable way to do science.
I think that the classification scheme in the paper is a useful one to start to deal with these objections. The authors divide scientific discoveries impacted by automation into three categories: physical matter (a drug candidate, a new metal alloy, a new crystal form, etc.), processes (such as new chemical reactions), and models (new laws, rules of thumb, correlations, and connections). They argue that all three of these are fundamentally search problems – they just differ in the knowledge space being searched, with discovery proceeding through a cycle of validation and feedback. That holds whether you’re talking about a hypothesis-first (Popperian) mode of discovery or an observation-first (Baconian) one; the difference between the two is (to a large extent) where you enter that cycle of observation and experimentation. The paper makes the key point that in every example of machine-aided discovery so far, the search space has been far larger than what was (or even could be) explored. When you look closer, it’s human input that has narrowed the terms and the search space. Whether that will eventually change is one of those open questions. The authors also note the three factors that are enabling automation in all of these classes – access to large amounts of data, the increasing computing power to process it all, and the advances in hardware to mechanically manipulate the physical tools of experimentation.
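For readers who like things concrete, the search-with-feedback framing can be sketched in a few lines of code. This is purely illustrative and not from the paper – all names, the scoring function, and the stopping rule here are my own inventions:

```python
# Toy sketch of "discovery as search": iterate over a (human-narrowed)
# candidate space, run an "experiment" on each candidate, and use the
# feedback to keep the best result and decide when to stop.

def discover(candidates, evaluate, good_enough, max_trials=100):
    """Search a candidate space using experimental feedback."""
    best, best_score = None, float("-inf")
    for _, candidate in zip(range(max_trials), candidates):
        score = evaluate(candidate)   # the "experiment" returns feedback
        if score > best_score:
            best, best_score = candidate, score
        if good_enough(best_score):   # validation: stop once satisfied
            return best, best_score
    return best, best_score

# Hypothetical example: find the integer closest to a hidden optimum.
target = 42
best, score = discover(
    candidates=iter(range(100)),
    evaluate=lambda x: -abs(x - target),
    good_enough=lambda s: s == 0,
)
```

The point of the sketch is how much of the work sits outside the loop: a human chose the candidate space (`range(100)`), the experiment, and the stopping criterion – which is exactly the authors’ observation about current machine-aided discovery.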
Now one gets to the question of just how automated/autonomous things really are (or really can get):
Here, we propose a set of questions to ask when evaluating the extent to which a discovery process or workflow is autonomous: (i) How broadly is the goal defined? (ii) How constrained is the search/design space? (iii) How are experiments for validation/feedback selected? (iv) How superior to a brute force search is navigation of the design space? (v) How are experiments for validation/feedback performed? (vi) How are results organized and interpreted? (vii) Does the discovery outcome contribute to broader scientific knowledge?
As you’d imagine, existing examples fall all over the place on these scales, and the upper reaches of this scheme are still basically unpopulated. But that is surely not always going to be the case, is it? We humans operate in a pretty unconstrained search space, and that means that you can have situations where the human makes all the decisions and points the machine at the task, where the human uses the machine to narrow down the possibilities and then takes action, and (finally) where the machine narrows down those possibilities and takes action itself.
The how-effective-is-brute-force question applies to the “automated serendipity” style of reaction discovery that has shown up in the literature in recent years. I see that as a sliding scale – at one end you have a machine chugging through every single flippin’ possibility, filling in all the boxes as you hope something interesting hits, and at the other the ideal of the human scientist, eyes closed and fingers to temples as they stand in front of the whiteboard, in the very act of bringing a creative discovery into being. In truth, most discoveries fall somewhere between those two extremes. The machines (as mentioned) have their search space limited by human input, and the humans often have to try and discard a number of possibilities before hitting on the right one.
I haven’t even made it in this post to the second review paper – we’ll save that for another day! The rest of this one features an extremely comprehensive review of past examples of machine-aided discovery in chemistry (literature mining, reaction prediction, new reaction discovery, property prediction, ligand prediction, optimization of existing reaction conditions, and more). It’s a thorough look at what’s been done – but the next paper goes into what might be accomplished from here on, and what we’ll need in order to do it. . .