
Analytical Chemistry

Disorder and Order

An interesting feature of many proteins is a disordered region down at the carboxy end. The reason for this feature has been obscure: if there’s part of the protein that just spends its days flailing around uselessly, why go to the trouble of translating it? Many of these tails certainly seem to have no defined structural role, because you can mutate their sequences in all sorts of ways with no apparent problems. How have such things persisted in the absence of any obvious function? The answer, as is so often the case in biology, is that there’s function but it’s not too obvious.

Here’s a paper looking at an enzyme called UDP-α-D-glucose-6-dehydrogenase (UGDH) that has such a disordered 30-amino acid tail. You can chop the whole thing off and the enzyme has what seem to be basically the same kinetics, so what’s it doing there? As it turns out, you have to look closer. The enzyme is allosterically regulated by UDP-α-D-xylose, a negative feedback system. It’s an unusual one, because the same active site serves both purposes – the enzyme forms an inactive hexamer that can break up into three active dimers, and this switch is dependent on competition between the UDP-α-D-glucose substrate and the UDP-α-D-xylose allosteric ligand.

And it’s true that the substrate behavior doesn’t seem to change in the absence of the disordered tail. But the allosteric behavior sure does: without those thirty residues, the affinity for the xylose ligand is tenfold lower. The group tried a whole range of mutations in that region to see if any affected this behavior, but it was pretty impervious: swapping out all the prolines, no effect. Switching all the lysines to serines: no effect. Turning the whole darn thing to nothing but thirty serines in a row: no effect. Varying the length of that chain showed a simple exponential-decay relationship with the UDP-α-D-xylose affinity: the longer the better, out to around thirty or so. But even a four-serine tail has an effect.

What the disordered tail is doing here, regardless of sequence, seems to be tied to entropic effects. The authors note that if you hang an unstructured polymer off a surface, you generate an entropic effect down at the point of attachment. That’s because the surface itself excludes some conformations of the polymer, and thus reduces entropy. Hydrogen-deuterium exchange experiments (a method that shows how exposed and mobile a protein’s regions are) show that the tail region completely exchanges within about two minutes, and that its presence alters the exchange rates of several other parts of the UGDH protein, especially around the binding site. The entropic cost of constraining that disordered tail, in other words, energetically biases the protein towards conformations that are more favorable for binding the allosteric regulator. Doesn’t matter what the tail consists of, so long as it’s disordered and about that length.
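
The "surface excludes some conformations" argument can be made concrete with a toy count. This is only a sketch under loud assumptions: the tail is treated as an ideal (self-crossing) random walk on a cubic lattice, the protein surface as a flat wall at z = 0, and the function names are made up for illustration. It is not the authors' calculation, just the textbook version of the idea.

```python
import math


def tethered_fraction(n_steps: int) -> float:
    """Fraction of n-step cubic-lattice walks, anchored at z = 0,
    that never cross the wall into z < 0 (toy model, ideal chain)."""
    counts = {0: 1}  # counts[z] = number of allowed walks whose end sits at height z
    for _ in range(n_steps):
        nxt = {}
        for z, c in counts.items():
            nxt[z] = nxt.get(z, 0) + 4 * c          # four sideways moves keep z
            nxt[z + 1] = nxt.get(z + 1, 0) + c      # one move away from the wall
            if z > 0:
                nxt[z - 1] = nxt.get(z - 1, 0) + c  # move toward the wall, if allowed
        counts = nxt
    return sum(counts.values()) / 6 ** n_steps      # a free chain has 6**n walks


def entropy_cost_in_kB(n_steps: int) -> float:
    """Entropy lost to the wall, in units of k_B: -ln(allowed / free)."""
    return -math.log(tethered_fraction(n_steps))


if __name__ == "__main__":
    for n in (4, 10, 30):
        print(n, round(entropy_cost_in_kB(n), 3))
```

In this toy model the entropy penalty grows with chain length and does not care what the monomers are, which is at least the same flavor as the sequence-independent, length-dependent behavior described above.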

That’s interesting enough by itself (well, to me anyway – I admit that mileage may vary on a topic like this one!). What’s odd is that the tail region is nonetheless highly conserved. This in the face of compelling evidence that its sequence doesn’t matter for this function! The best explanation is that it may have still other (unknown) biological functions that require a particular structure/sequence; it’s just this one isn’t one of them. Which means that in some ways it’s important that this region be disordered, any old way, and in other ways it’s important that it has a particular sequence and presumably particular interactions. That’s what three billion years of whatever-works-dude tinkering will get you. As an unavoidably snarky aside, if anyone can provide an analogy between this behavior and what’s seen in coding, hardware design, or any sort of human engineering at all, please speak up.

37 comments on “Disorder and Order”

  1. SP says:

    “That’s what three billion years of whatever-works-dude tinkering will get you.”
    Change 3 billion to 30 and you’ve described the Windows OS code base.

    1. Mat says:

      I’m somehow reminded of the leaked version that had three or four empty include statements at the beginning of the first file. If you removed them, the whole build broke for reasons that STILL don’t make sense to me.

      1. Kent G. Budge says:

        More or less the first thing I thought of.

        Programmers like to flatter themselves that they are building lofty architectures. In practice, they’re evolving code against a test problem set that includes some very weird corner cases provided by unhappy customers.

    2. Mat says:

      So in other words, the insanity that gave us several empty #include() statements that, if they aren’t there, break the whole build, but have no obvious real purpose?

    3. Morten says:

      Pretty much. The leaked NT code had people in tears until reasons for some of the strange misfeatures were found. For instance, a network driver was ginormous – thousands of lines of code with some very strange conditionals. Turned out it had to be that way to handle the many, many strange networks out there, past and present. So most of the code was “dead” most of the time, until someone used a strange old network card with some strange old quirks…

  2. Gareth says:

    There’s an important lesson for crystallographers who routinely remove disordered regions in this paper. It also reminds me of this story:

    1. Carl W says:

      The mention of multiple purposes for biological components actually reminded me of Mel the Programmer, who used the same memory as both instructions and data, in this story:

  3. Frank Adrian says:

    I think the best example of the “three billion years of whatever-works-dude tinkering” in action can be found in the earlier works of the architect Christopher Alexander – “The Timeless Way of Building”, “A Pattern Language”, and “The Oregon Experiment”. Of course, he was trying to create architectures that were alive and could grow with the populations and communities within them, so there is that…

  4. sgcox says:

    I am surprised there are no ITC experiments in the paper. They would provide a direct measure of binding enthalpy/entropy. Very sloppy reviewers 🙂

    1. Zachary A. Wood says:

      I can promise you that the reviewers were quite thorough and careful, and it is a better manuscript because of their criticisms. They had an impressive understanding of structure, intrinsic disorder, enzymology and polymer physics. While ITC will give you a direct readout of enthalpy, the entropy is calculated. We have begun doing ITC studies, but as I am sure you are aware, cooperative binding with 10 sites is a bit challenging to model. Rest assured that measuring Ki is a reasonable way to model affinity, as are the transient state binding studies that we included as orthogonal evidence in the manuscript. Best Regards, Z

  5. luysii says:

    The work and post and comments show exactly why chemists have a far more intuitive concept of entropy than anyone else (including physicists). You don’t have to calculate how many conformations there are, because you just know they are there.

    1. Humble Scrivener says:

      While I have forgotten much else of undergrad physical biochemistry, I remember S = R ln omega.

      1. Anonymous says:

        S = k_B ln(omega) where k_B is Boltzmann’s Constant.
        k_B = R/N, where N is Avogadro’s number
        S = (R/N) ln(omega)
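
The two formulas in this sub-thread agree once the units are sorted out: k_B ln(omega) is the entropy per molecule, and R ln(omega) is the same quantity per mole. A quick numerical check, using the exact SI values of the constants (these numbers come from the SI definitions, not from the post or the paper; the function names are just for illustration):

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K (exact SI value)
N_A = 6.02214076e23   # Avogadro constant, 1/mol (exact SI value)
R = K_B * N_A         # molar gas constant, J/(mol*K)


def entropy_per_molecule(omega: float) -> float:
    """S = k_B ln(omega), in J/K for a single molecule."""
    return K_B * math.log(omega)


def entropy_per_mole(omega: float) -> float:
    """S = R ln(omega), in J/(mol*K)."""
    return R * math.log(omega)


if __name__ == "__main__":
    print(R)                       # about 8.314 J/(mol*K)
    print(entropy_per_mole(3.0))   # molar entropy for 3 equally likely states
```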

  6. Sok Puppette says:

    Coding examples… well, let’s see…

    UUIDs, which are used all over the place, are generally constructed completely at random. What’s important is that your UUID not be the same as any other, and if it’s long enough and properly random, then the need is satisfied. You don’t want to construct it according to a predictable pattern, because it will probably end up colliding with something.

    However, once you assign a UUID to something, everything else in the system will use that UUID to locate it, so any change in the UUID will break the whole thing.

    That’s an example of a general pattern in identifiers: you don’t care what the identifier is, as long as it’s unique within some particular name space… but once you’ve assigned it you can’t change it, because other stuff is now recognizing it. So it has to be perfectly conserved, because other parts of the system have “adapted” themselves to the random choice you made and will now break if you change it.
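
A minimal sketch of that identifier pattern, using Python's standard `uuid` module. The `registry` dictionary and the `register` helper are hypothetical names for illustration: the value of the ID is arbitrary when it is minted, but a one-bit change afterwards breaks every lookup that depends on it.

```python
import uuid

registry = {}  # hypothetical system-wide lookup table: UUID -> object


def register(obj):
    """Mint a random (version 4) UUID for obj and record it."""
    ident = uuid.uuid4()   # any value will do, as long as it is unique
    registry[ident] = obj
    return ident


ident = register("UGDH tail")

# Everything downstream now depends on the exact value that was chosen.
found = registry.get(ident)

# "Mutate" the identifier by flipping a single bit: the lookup fails.
mutated = uuid.UUID(int=ident.int ^ 1)
lost = registry.get(mutated)
```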

    And I’m pretty sure that I’ve written code more than once where a library demanded a string of a certain form in a certain place, and I just made something up.

    There’s also the hack where you’re storing data in something, but what it thinks it’s storing isn’t really quite what you want to store, so you encode your real data in something that looks like what it expects. That’s not rare when you’re, say, trying to connect some ancient mainframe cruft with a new Web front end. In fact, end users use that hack sometimes. They’ll set some weird field to some specific oddball value and use that to mean something that the program doesn’t know about or want to represent. Those aren’t the same thing, but they have the same feel.

    Look, you’re RIGHT that computer types vastly underestimate how hard it is to figure out biology. And I think you’re right that one big reason for that is that they’re used to working with sanely designed systems that follow principles and are deliberately built to be understandable.

    But you can find almost ANY little motif in human engineering. The issue is SYSTEM complexity.

    The problem with biology is that there’s no rhyme or reason to which motifs it chooses. It invents new ones all the time. It uses the same ones for different things. It uses different ones for the same thing. It sets up 100 overlapping and contradictory control systems for the same thing… or for *almost* the same thing. It uses the same signal in directly opposite senses. It creates crazy interdependencies that would get you fired as an engineer. It builds layer after layer of weird repurposings, and then does it all over again a billion times.

    … but that’s not about the details. Details get to be comprehensible if they’re small enough. If you start arguing about one molecule, you’ve pulled attention away from the main point and onto ground where you’re much less likely to be right.

  7. For what it’s worth, there is a weak analogy in coding. Suppose the desired result is something like
    1. set X to something
    2. if Z might be needed, set Y to f(X)
    3. set X to something else
    4. if Z is needed, set Z to g(Y)
    Code like this will sometimes be queried or even rejected by the compiler, on the grounds that at Step 4, Y may not have been assigned. One solution in this situation is to make an unconditional assignment to Y before step 1. The value assigned can, of course, be anything at all, but it has to be something.
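
The four steps above can be sketched in Python. One caveat up front: Python itself does not reject this pattern at compile time, though static checkers can flag a variable assigned on only one branch as possibly unbound, so this is an analogy to the compiler behavior described, not a reproduction of it. `f`, `g`, and the constants are hypothetical stand-ins.

```python
def f(x):
    """Hypothetical stand-in for f in the comment above."""
    return x * x


def g(y):
    """Hypothetical stand-in for g in the comment above."""
    return y + 1


def compute_z(might_need_z, need_z, placeholder=0):
    y = placeholder        # unconditional assignment: any value, but it must exist
    x = 3                  # step 1: set X to something
    if might_need_z:       # step 2: if Z might be needed, set Y to f(X)
        y = f(x)
    x = 10                 # step 3: set X to something else
    if need_z:             # step 4: if Z is needed, set Z to g(Y)
        return g(y)
    return None
```

As long as "Z is needed" implies "Z might be needed", the result is the same whatever placeholder goes in: `compute_z(True, True, placeholder=0)` and `compute_z(True, True, placeholder=-999)` both return 10.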

  8. Anonymous says:

    1. Paywall – there should be an arrangement that if an article is blogged about on Pipeline that the journal (Science, Nature, whoever) should open up access immediately.

    2. A skinny, snaky tail (free 30AA tail) flops around more easily than a 30AA tail with some large, heavy, regulatory, mystery molecule bound to it. (What about ions in vivo? AAs + ions + solvent shells could also affect floppiness in vivo.) Although they found lots of changes that gave no effect, was that in vivo or in vitro? Maybe there are things in the cell that modulate the floppiness (entropy signal) in a sequence sensitive manner?

    3. Over the years, some very important crystal structures have come from truncated and modified proteins. As I looked at the “races” to get the first or best structures, I couldn’t help but wonder how much of it was pure luck to truncate just the right bits here and there. Group A truncates an entire tail and gets no xtals; Group B leaves just one or two AAs exposed and gets no xtals; Group C …; and so on until Group X truncates just the right tails, loops, bits and pieces to get a xtal, a structure, and maybe even a Nobel Prize. The structural info is usually VERY important and VERY helpful (GPCRs, ion channels, pores, motor proteins). Further studies allow refinements and improvements in the details.

    (Historical note on protein crystals: many people were trying to get the first protein crystal structure; Perutz and Kendrew spent a lot of time on PURIFICATION which enabled them to get myoglobin xtals good enough to refract and provide a structure. Insufficiently purified myoglobin foiled the chances of others to be first. … And, hence, a lesson on the importance of purification methods in chemistry and biology.)

  9. Isaac Grosof says:

    The closest equivalent to “we need something here and it doesn’t matter what, but it needs to be a certain length” in coding is padding. Padding shows up in data serialization, memory structures, all over the place. Often padding is set to all zeros, not because it’s useful, but because it’s easy to set things to zeros. Sometimes people will fill memory with the value 0xDEADBEEF to make it super clear that the value should not be used. 0xDEADBEEF is preserved by tradition, but it has a ton of longevity – people have been using it to indicate padding space or otherwise unused memory for many decades.
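
Here is a small sketch of that idea with Python's standard `struct` module. The record layout (a one-byte tag, three pad bytes, a four-byte integer) is invented for illustration. The content of the pad bytes is ignored on reading, but their length is not: overwrite them and nothing changes, delete them and the record no longer parses.

```python
import struct

# Hypothetical record: 1-byte tag, 3 pad bytes, 4-byte unsigned int (little-endian).
RECORD = struct.Struct("<B3xI")

packed = RECORD.pack(7, 0xDEADBEEF)   # pad bytes are written as zeros

# "Mutate" the padding: the decoded fields are unchanged.
scrambled = packed[:1] + b"\xaa\xbb\xcc" + packed[4:]
same_fields = RECORD.unpack(scrambled) == RECORD.unpack(packed)

# "Truncate" the padding: the record is now the wrong length and fails to parse.
truncated = packed[:1] + packed[4:]
try:
    RECORD.unpack(truncated)
    truncation_ok = True
except struct.error:
    truncation_ok = False
```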

  10. Daniel Barkalow says:

    My guess for why that region is conserved would be that it has nothing to do with the protein at all. It could easily be a vital marker in the DNA for regulating expression, for example, and the fact that it ends up producing a particular one of a great many perfectly acceptable tails is just convenient.

    1. tlp says:

      Didn’t see this comment but it’s almost exactly what I’ve thought. One way to check that would be substitution of the DNA sequence by synonymous codons and checking protein expression.

  11. tlp says:

    Does it mean that the sequence of disordered piece is actually more important for reproduction/survival than the very enzymatic function of the protein?

  12. Derek Jones says:

    Reuse of the same memory bit-patterns was common back when computers had limited memory, e.g., 64k. Games developers sometimes used the bit-pattern of the program’s code as the map of the galaxy (or whatever).

    The most impressive reuse I have ever seen is the Apollo computer, where 64k of memory would have been a luxury. To implement two sets of functionality, they did not have two separate programs, they had two different cpus. The hardware stayed the same, but the instructions were changed (via microcode). So the same bit pattern in memory could be executed as two different programs, depending on the microcode that was loaded.

    1. Jim says:

      First I’ve heard of that. I’d love to read about it. Got a cite?

      1. Derek Jones says:

        My sources are knowing games developers in the 1980s and knowing somebody (many years ago) who worked on the Apollo software. A quick search through the pdfs and books I have failed to locate a specific mention of this technique. It is probably tucked away in somebody’s reminiscences.

        Emulating old computers has become quite a thing.

        Here for the Apollo guidance computer:

        The Apollo 11 source code is available:

  13. Judith Bush says:

    In crufty code stories, the best analogy I can think of was when a bunch of useless conditional cases preceded the current meaningful case. Useless conditionals were cleared away. Code broke. Conditionals back. Code worked. Heads scratched. Eventually someone thought to replace conditionals with a delay. Code worked. Turns out the delay of the useless conditionals was sufficient to prevent a database deadlock.

  14. luysii says:

    Pop quiz. How would you make an enzyme in a cold dwelling organism (0 Centigrade) as catalytically competent as its brothers living in us at 37 C?

    The answer involves entropy. Figure it out before you peek —

  15. Hrolf says:

    The previous coding examples were spot on, and illustrate that sometimes it’s easy to have a bias about programming when approaching it from the outside, because it appears as though programming design and implementation should be “simple” because there are no physical constraints, when most implementations of large-scale software-intensive systems are actually staggeringly complex because the world is complex and programs have to handle lots of those complexities. So as the illustrations above point out, this kind of complexity is the norm in coding, hardware design, and other kinds of human engineering.

    Additionally, when we bring in the fact that lots of organisms have, not just some unstructured tail, but this specific unstructured tail, the analogy to programming becomes even closer. As you say, “[t]he best explanation is that it may have still other (unknown) biological functions that require a particular structure/sequence; it’s just this one isn’t one of them.” If only I had a dollar for every time I looked at someone else’s code (or mine from six months ago), saw apparent disorder, and slowly, very slowly, came to recognize that it *has* to be that way for very particular reasons. I promise you every programmer has had *that* experience.

    1. Some idiot says:

      As a process chemist, I always catch myself when looking at the experimental section of a good paper, and seeing some really wacko weird workup procedure, and start thinking “that’s just plain stupid…” I then reset my brain and think “ok, no sane person would have done this experiment this way the first time, so this indicates that they have used a fair bit of time finding something that works…” 🙂

  16. eub says:

    Dumb programmer question: why the COOH end specifically? Is there some general reason *that* end would be thinking about some binding you’d want to tune the entropic cost of?

    But on the other hand, if it’s related to the processes of protein synthesis rather than the function afterwards, that seems easier to handwave. Something something at the end of the protein translation something?

    1. eub says:

      If I were designing biology for fun, I’d make this a random identifier code like a UUID: so things could concisely match against the id and target that protein. Or if it stores (in arbitrary dense form) information to allow handling classes of proteins, like the 3×5″ cards they used to notch/hole so you can do database logic by a needle lifting out all the cards with a hole at that location, and leaving the ones with a notch.

      — Whoa, I read some Wikipedia and I just learned that proteins do have “signal peptide” at the N-terminal end that directs what compartment to transport them to, and the C-terminal carries “retention signals” that filter it further. So the N-terminal ‘addresses’ it to the endoplasmic reticulum and the C-terminal directs it to be held in the ER rather than getting packed into vesicles for secretion. Cells are amazing!

  17. Somhairle MacCormick says:

    My Dad did a similar thing when he named me, my name is pronounced Sorley but the spelling has a few redundant letters in it. At least that’s what I say because I don’t understand Gaelic.

  18. “That’s what three billion years of whatever-works-dude tinkering will get you” By far this must be one of my favorite lines I’ve read in a few days!

    I don’t have a specific example for you, though this general theme does remind me of how the homo sapiens eye evolved in such a characteristically unordered way (for a programmer).

  19. The nucleotide sequence of the tail of this particular isoform of UGDH is conserved at >94% identity from humans to (among many others) chimpanzees, pigs, horses, elephants, sperm whales, rodents, bats, and even pangolins and armadillos.

    Although the protein-level allosteric/entropic effects are an interesting biophysical observation, I suspect that some sort of nucleotide-level regulatory process is more likely to be a physiologically relevant explanation for the extreme (nucleotide, and hence protein) conservation of this tail.

    1. tlp says:

      Messing with those random sequences the authors must have altered DNA sequence, right? Either there’s nothing special about DNA/mRNA sequence or the effect is not noticeable in their expression systems.

      The paper has one of my pet peeves though – comparing Ki values in the linear scale (fig. 2c,d)

      1. The proteins were expressed in E. coli. The relevant context for the mammalian 3’-coding *nucleotide* sequence (other than its polypeptide translation) has (presumably) thus been lost.

        1. Michael Nute says:

          Great post. I’m assuming that the coding frame starts with position 1 but given that, looking down the alignment I can’t find a single mutation that is non-synonymous. Looks like selection for that specific AA sequence to me.

  20. Multiple sequence alignment of nucleotide tail. NCBI says link should be good for only 1 to 2 months…

  21. HannahMa says:

    I once read a National Lampoon parody called “Popular Evolution”. The “advice column” went something like this: “Dear Dr Darwin, I’m a mammal with fur, but I lay eggs, have a duck’s bill and webbed feet. What gives?” “Dear Platypus, seems like your ancestors just couldn’t bear to throw away any of those old genes!” Maybe this is the explanation for a lot of oddballs.
