
In Silico

The Latest on Protein Folding

The results of the biennial CASP (Critical Assessment of Structure Prediction) exercise have been released. This is a widely watched competition among different groups (and different programs, methods, hardware, etc.) to see how well protein structures can be predicted de novo from the protein sequences alone. In the main category, the organizers provide a set of proteins whose structures have been solved by physical methods but not yet released publicly, and everyone goes at them to see how close they can get to reality.

There are a lot of ways to skin this particular cat, naturally. One of the more accurate ones is to search through known protein structures for ones with broadly similar sequences, which can provide a template for a new protein’s solution. As the organizers note, this is the technique that made the most rapid progress of any approach during the first ten years of the competition (1994-2004), but progress slowed for about eight years after that. It seems to have picked up again, though, through several improvements. (Note that these links are for papers evaluating last year’s competition – there will be a similar set for the new results eventually). At the other end of the scale, dealing with a protein that has little similarity to existing structures is a major challenge indeed. The first time anyone got any of these remotely right was around 1998, and the best hits have continued to be on smaller proteins. Progress is being made, though.
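To make the template idea concrete, here is a deliberately minimal sketch. Everything in it is hypothetical – the toy database, the sequences, the PDB-style identifiers, and the 30% identity threshold are all made up for illustration, and real template searches use gapped profile or HMM alignments (tools like HHsearch or BLAST), not the crude ungapped comparison below:

```python
# Hypothetical toy database mapping known sequences to their solved
# structures (fake PDB-style identifiers, for illustration only).
TEMPLATE_DB = {
    "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ": "1abc",
    "MSHHWGYGKHNGPEHWHKDFPIAKGERQSPVDI": "2xyz",
}

def percent_identity(a, b):
    """Crude ungapped identity over the shorter sequence. Real pipelines
    align with gaps and use sequence profiles; this is just a stand-in."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / min(len(a), len(b))

def best_template(query, db=TEMPLATE_DB, threshold=30.0):
    """Return the known structure whose sequence best matches the query,
    or None if nothing clears the (arbitrary) identity threshold."""
    ident, pdb = max((percent_identity(query, seq), pdb)
                     for seq, pdb in db.items())
    return pdb if ident >= threshold else None
```

The `None` return is the interesting case: when no template clears the threshold, you are in the hard "little similarity to existing structures" regime the paragraph above describes, and template-free methods have to take over.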

One way to measure overall progress is how well the various teams predict 3-D contacts between protein residues, and that’s been pretty impressive, likely due to a combination of larger databases and new machine-learning techniques to dig through them. In recent years, CASP has also included categories that bear on situations that come up often in practice: refining an existing structure model to make it more accurate, or building one from data sets that provide lower-resolution structural information. That last situation is particularly common and particularly tough – you can get information from cross-linking mass spectrometry studies or small-angle X-ray scattering that should help constrain the search for structural models, but as that last paper notes, adding these to the mix didn’t seem to improve things much. Disordered regions are one reason for that, and those by definition will be refractory to structural solution. But many proteins consist of a mixture of disordered regions and more ordered ones, and the hope is that we’ll be able to piece things together more usefully. End-to-end disorder, though, as seen with the solution behavior of proteins like the infamous c-Myc, will remain hard to deal with.
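For readers curious what "predicting contacts" means operationally, here is a small sketch of the usual setup, with made-up parameter choices: two residues count as in contact if their C-alpha atoms are within some cutoff (8 Å is a common convention) and they are well separated in sequence, and a prediction is scored by the precision of its top-ranked L pairs (L being the protein length). This is an illustration of the general idea, not the exact CASP scoring code:

```python
import numpy as np

def contact_map(ca_coords, cutoff=8.0, min_seq_sep=6):
    """True residue-residue contacts: C-alpha pairs closer than `cutoff`
    angstroms and at least `min_seq_sep` positions apart in sequence."""
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    n = len(ca_coords)
    sep = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    return (dist < cutoff) & (sep >= min_seq_sep)

def top_l_precision(pred_scores, true_contacts, frac=1.0):
    """Precision of the top L*frac predicted pairs (L = protein length),
    considering only long-range pairs (sequence separation >= 6)."""
    n = true_contacts.shape[0]
    iu = np.triu_indices(n, k=6)            # long-range pairs only
    order = np.argsort(pred_scores[iu])[::-1]
    k = max(1, int(n * frac))
    return true_contacts[iu][order[:k]].mean()
```

A perfect predictor scores 1.0 on this metric; a predictor that ranks non-contacting pairs near the top gets penalized immediately, which is why the metric is a reasonable single-number summary of how much real structural signal a method has extracted.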

Google’s DeepMind team had an entry this year (AlphaFold), and they came in first in overall performance (they’re the “A7D” team). The press stories on their performance, though, make it sound as if they just completely blew everyone else away, but that’s not what I get out of perusing the results. The AlphaFold site itself is more reasonable-sounding. If you look at the data for contact predictions, for example, they do look quite good, often solidly among the best-performing groups. Sometimes the program just gets its teeth into a particular case and outperforms everyone, and sometimes it wanders off into the boonies, which is what happens to every program that tackles protein folding. But overall, AlphaFold looks pretty good, and the field as a whole has made real progress.

But protein folding is far from a solved problem, fear not. XKCD’s take on this remains accurate! It’s going to be very interesting indeed to watch the progress over the next few years in this area, but that progress is not going to be the discovery of some general solution. It’s going to come from a mixture (as mentioned above) of better understanding of the physical processes involved, larger databases of reliable experimental data covering more structural classes, and faster, more efficient ways of searching through all of these (both the possible structures and the real ones) and generalizing rules to tell us when we’re closing in on something accurate.

12 comments on “The Latest on Protein Folding”

  1. myma says:

    As always, XKCD for the win.

    1. John Wayne says:

      I’ve been laughing about this one for days:

  2. Daniel Barkalow says:

    Of course, there’s some interesting selection bias in this contest: in 2017, you weren’t going to win by predicting any structure that couldn’t be confirmed using 2017 technology. I wouldn’t be too surprised if looking for analogues does better than it would in the wild due to this: if the actual structure of a sequence is very different from the structures of similar sequences, chances are that nobody is going to find the actual structure, and the CASP won’t test you on that sequence.

    1. Kind of related: can someone point me to a paper that estimates the percent of proteins from the human exome that have known – or reliably predicted – structures?

  3. luysii says:

    It is still the time to give CASP the ‘glass eye’ test used to trap unwary medical students. See what the various algorithms do with a segment of a protein known to be unstructured. Even better make up a random sequence of amino acids, and see if the various algorithms converge (or better, if they say they don’t find ‘a’ structure). Remember, picking a protein with a known ‘structure’ is like picking one of the 30 components of today’s Dow Jones Industrials 30 years ago out of the myriad of stocks being traded back then.

    For the glass eye test, and how likely it is that a sequence of amino acids will form ‘a’ structure please see

  4. Anonymous says:

    I worked on a (bad) project loosely based on work from the Fasman lab. Gerald Fasman was an early player in primary to secondary prediction with the Chou-Fasman Algorithm (see wikipedia). These days, Chou-Fasman doesn’t even make the wikipedia page on “List of protein structure prediction software”.

    What about the group that used crowdsourced gamers to try to predict protein structures? Their software protein folding game was called “Foldit” (link in my name).

    1. James says:

      David Baker’s lab @ U. Washington–still leaders in this space. They had several contributions in the latest CASP as well.

    2. Chris Phoenix says:

      You’re not thinking of folding@home, are you? I don’t think that involved a game – just people donating spare CPU cycles. (That was back before real operating systems existed on PCs, so the CPU would just run the “idle” instruction over and over when it didn’t have anything better to do.)

    3. Susan says:

      Foldit (the game, from Baker lab) is still chugging along. One of the proteins included in CASP this year was designed in the game by a Foldit gamer/citizen scientist.

  5. Isidore says:

    I remember Gerry Fasman, I had a collaboration with him way back when. A true gentleman and scholar.

  6. Anon says:

    This blogpost from Mohammed AlQuraishi is a great read on this topic! Fun rants in there too 😀

Comments are closed.