Any article titled “How to Engineer Biology” is going to get a look from me – and when I’m referenced in the opening paragraphs, especially so. This is a piece by Vijay Pande in Scientific American, and I get called out for my naming of the “Andy Grove Fallacy” (found in this post and the links therein). That’s the idea that the drug industry makes slower progress than Silicon Valley does, and that applying the engineering and management styles of the Valley to the drug industry will therefore speed things up. I’ll spare you the suspense: that’s wrong. See that last link above for why.
Pande arrives with the news that whatever merit my point of view might have had, the Valley is now here to melt those objections away. There are certainly parts of his article that I agree with, such as the idea that an engineering approach to biology is more easily seen (and accomplished) in the engineering of the “tools we use to manage biology”. One of the striking things about molecular biology (indeed, a striking thing that makes it possible as a separate science) is how those evolutionarily developed tools can be adapted to new purposes by humans. The polymerase chain reaction, DNA endonucleases, homologous recombination, DNA ligases – the list goes on for quite a ways past that. We’re able to take these processes and the enzymes that accomplish them and put them to uses both inside and outside the cell. We most certainly can engineer proteins (sometimes), and we can engineer enzyme function (sometimes) – neither of which is exactly a straightforward task. I think that it’s correct to say that learning to borrow and modify such things really is a triumph of human science and human engineering, and that it’s nowhere near being exhausted yet.
But I’m afraid that that’s a minor point in the article, and I have problems with most of the major ones. To my reading, the piece takes a number of very broad leaps, disguised as dance steps. For example, after talking about the modularity of biology (building blocks such as nucleotides and amino acids being assembled into biomolecules, cells being assembled into organs), this statement appears: “Once we identify the Legos in biology and their properties, we can engineer them and even mix and match them to design novel functionality”. And that’s true, but when it’s put like that, there’s some natural confusion about how many of these “Legos” there might be. We’ve identified a few, but there have been some profound surprises along the way. To pick one, how about the number of different kinds of small RNA species running around in the cell? There’s a whole set of Legos, a whole set of sets, and we had no idea until fairly recently that they were in there.
Even if there were a relatively small, defined number, this statement would have problems. There are around 80 stable chemical elements – would it be appropriate to say that “now that we’ve identified the Legos in chemistry, we can engineer them and even mix and match them to design novel functionality”? Well. . .yeah, I suppose so, but that skips over a few million details and makes all of chemistry sound like a fairly straightforward exercise. Biology, I should add, is far worse.
This is one of the points where the straight engineering viewpoint (as in the famous “Can A Biologist Fix a Radio” essay) breaks down. Radios (of the kind discussed in that article) are human-built objects made of discrete parts. You can see all of them clearly – even if you don’t know what they do (yet) you know that they’re there. But that’s not the case in a cell. We had no idea about small hairpin RNAs, double-stranded RNAs, microRNAs, circular RNAs and all the rest of the menagerie. Didn’t know that they were there. Didn’t see the functions that they were performing, didn’t know that there were such functions. And we didn’t see the workings of the cell and say “Ya know, there’s gotta be a bunch of small regulatory RNA molecules in there, that’s the only way this thing makes sense”. Not a bit of it.
So when you casually say “Once we identify the Legos in biology” you’re actually asking for a great deal, and by disguising it in terms of similarly-sized little building blocks, you actually are confusing the issue. Let’s say that the Lego blocks in this case are the five major nucleotides in DNA and RNA. We’ve identified them. Does that mean that we understand their systems well enough to mix and match them? Well, crudely, yes – we can go in and change a genomic sequence. But do we know what happens when we do that, and why? Not so often, and sometimes not at all. Can we add in the novel functionality we want? Sometimes yes, sometimes no, and it usually takes empirical tests to see what the answer is going to be (and even when we get that answer, we often don’t know why we got it, as with protein expression experiments). Now that we’ve identified the five common nucleobases, does that tell us about the weird little uncommon ones, and why they’re used? It does not. Does it suggest all the strange RNAs mentioned above? Not at all. Does it tell us about the promoters, stop codons, histones, transcription factors, RNA polymerase enzymes, ribosome entry sites and all the other wildly complex things that go into the expression of the gene whose sequence we’ve just edited? Not a bit of it. “Identifying the Legos” only takes you so far. It’s necessary – vital – but not sufficient.
That Lego stuff was Pande’s “Principle 1”. The second principle is “Repeatability and Reproducibility”. His point is that biology has been held back by the tricky, delicate, bespoke nature of many of its experiments, and that modern equipment allows things to be run in a much more reproducible way. Like the Lego section, this one is true, but only so far. Here we go:
The identification of biomarkers (chemical substances we can measure and then target) for disease is currently driven by discovery via a bespoke, one-off process—so the discovery of PSA for prostate cancer, for instance, does not suggest a biomarker for ovarian cancer. Introducing machine learning into the process, however, can turn this handcrafting into assembly-line production.
This is Lego-istic thinking again: the belief that these things are discrete graspable units that can be handled as if they were lines of code, hardware components, or indeed Lego bricks. But we’re not to that point yet. I’m not sure how to put this gently, but the discovery of PSA may not even suggest a biomarker for prostate cancer itself, for starters. It does in some patients, but not in others, and whether it’s of benefit in the general population is very much a subject of debate among physicians. You don’t get this so much in engineering, because engineering is so much simpler. Saying “a biomarker for ovarian cancer” makes it sound like there is a biomarker out there, waiting for machine learning to uncover its benefits for a disease called ovarian cancer. But Pande himself is far too well versed in this stuff to really believe that. “Ovarian cancer” is not one disease with one biomarker – like almost all cancers, it varies from patient to patient, and over time in any individual patient, and from cell to cell inside any individual tumor in any one patient. Bridges do not work this way, to use an engineering metaphor that appears in the article, because bolts are bolts. But not in the biology of human disease.
The article would have you believe that there are machine learning companies that are right now churning out biomarkers for all sorts of diseases, and that “they can now mass-produce many tests in a predictable, precise and repeatable manner”. My main response to this is to ask for the names of some of these companies and the tests that they have actually brought to the market.
Pande’s “Principle 3” is “Testing and Process Engineering”. He says (absolutely correctly) that in engineering “the need for testing is obvious, how to test and what metrics to measure success are not. So, the choice and engineering of key performance indicators (KPIs) is critically important here; without this guiding compass, a project could go in the wrong direction.” But then the argument is that biology and biologists have missed out on the application of these KPIs to their own field. Pande goes on to say: “Now, a new wave of bio startups—drawing on engineering and computer science—are identifying KPIs for measuring molecules synthesized to protein expression, numbers of cells screened, and much more”. I think that something must have gotten turned around in the editing there, unless there should be a comma after “synthesized” and the following “to” should be struck out. But again, it would be useful to hear the names of some of these new wavers, and why they think that some of these are key performance indicators. When they “measure molecules synthesized”, what are they measuring, exactly?
Moving on, “Principle 4” is “Borrowing From Other Disciplines”. Pande says “. . .the rise of numerous, novel quantitative measurements of biology—i.e., big data sets in biology—has opened the door to incorporating other engineering approaches”, and while that’s true up to a point, I’m not so sure about his take on this. For example, he says that “By applying the materials-science based engineering technology he learned in solar cell materials design to food, James Rogers used techniques from nanoscience to create nanoscopic barriers that protect fruits and vegetables from spoilage.” As a correspondent noted to me over the weekend, this makes it sound as if these “nanoscopic barriers” were being designed layer by layer under a scanning tunneling microscope or something, when what they actually are is a coating of hydroxy fatty acids and their glycerides that goes on more uniformly. What’s more, it’s based on what we know about plant-based waxes such as cutin and suberin – and is in fact produced from them. This is a perfectly good invention, and looks to be really useful, but does not herald the advent of solar-cell engineering techniques applied to living systems.
Finally, we hit “Principle 5”, which is “Reinventing the Process Itself”. That title gives me flashbacks to various HR initiatives I’ve been roped into over the years, but getting past that, what he’s saying is this:
“The challenge in biology lies in breaking down the problem into steps and often reinventing the process itself. But once the desire to consistently improve performance (what (Andy) Grove was suggesting in the first place) moves biology from bespoke, artisanal approaches to designed, scalable processes, even seemingly modest performance increases can make a difference”
OK now. I know exactly what’s being said here – my response is not due to incomprehension. But this exemplifies what to me is a problem with the entire article. It references a number of engineering practices, asserts that it’s now possible to do these things in biology, and simultaneously gives the impression that no biologists had ever thought of trying any of them before. And there are major problems with both of those. How, for example, does anyone think that the Structural Genomics Consortium has been running through so many automated protein crystallography and X-ray structure experiments over the years? How did PCR get optimized to the workhorse procedure it is now? How did monoclonal antibodies move into industrial production? How, indeed, did something like Sanger sequencing get developed back in the early days? By breaking the problems down into pieces and optimizing them separately.
The objection to that might be that those are examples of what Pande was talking about earlier, the application of engineering to the tools of the trade. But my response would be that (1) this shows that such approaches have been going on for a long time (as applied to tools and processes) and are not some new revolution, and (2) that, on the other hand, such an engineering mindset is still not possible for the basic-research side of the business. Pande’s article really tap-dances around that latter point. Engineering just somehow is going to do these things, and it’s on to the next bullet point.
Perhaps the last sentence of the article is where the problem gets stated most clearly:
The question now isn’t whether this is possible in biology or not, as the Grove fallacy argued, but how to do it, given where we are in engineering biology today.
I think we’re dealing with a fundamental misunderstanding here. When I’ve written about the Andy Grove Fallacy, I have not been suggesting that it’s impossible to do biology in some sort of organized fashion. What I’ve been emphasizing, though, is that the things that make such an approach so productive in Silicon Valley will keep it from having the same effect on drug research, at least for a good long time to come. The challenges in hardware and software design, though significant, yield much more easily to human pressures than those of biology and medicine, and twenty paragraphs of repeated assertion to the contrary don’t change that much.
Consider the nematode. It’s a terrific little animal to study; Sydney Brenner was right. C. elegans has a limited number of somatic cells (959 in one sex, 1031 in the other) and we know the exact lineage of every one of them through systematic study. It has around 20,000 protein-coding genes, and it has of course been completely sequenced in great detail. Our current technologies allow us to step in and mess with those genes individually. That has, in fact, been done (and far more than once, using different technologies). The nematode proteome has been studied in great detail, under many different conditions (stress, age, mutations). And so on.
My point is that if the nematode were a product of Silicon Valley, we would have more than enough information in hand by now to build one, reverse-engineering it like a new device or a pile of source code. But we can do no such thing. The nematode has been subjected to systematic, engineering-driven analysis fit to make Vijay Pande proud, but we cannot assemble a single one of those thousand cells. There are unknown things going on inside each of them that will win people fame, fortune, and Nobel prizes once we figure them out, of that I am absolutely certain. Real functioning nanotechnology is at this very moment rolling along a nematode’s genome, spitting out messenger RNA that in turn is being ratcheted through ribosomes in ways that we’re still in awe of (and still figuring out the details of, for that matter). Nothing of the sort is happening in an iPhone, and I don’t mean protein biology, I mean extremely important things that we don’t understand and may well not even suspect the existence of. That’s because we built the iPhone, in every detail, and we found cells waiting for us with a three-billion-year head start.
So rather than refuting or superseding the Andy Grove Fallacy, from my viewpoint Pande’s article gets a running start and takes a cannonballing high dive directly into it. I will be very glad to hear opinions on that. . .
Here’s the Wavefunction take on the article (“Whatever the complexities of challenging engineering projects like building rockets or bridges, they are still highly predictable compared to the effects of engineering biology”), and here’s Keith Robison’s (“What Pande is far too optimistic about is the difficulty in figuring that out, particularly when trying to deliver therapies”).