Directed Evolution: Engineering Enzymes

Frances Arnold couldn't design a better enzyme. So she evolved one.

In the early 1990s, Arnold was trying to engineer subtilisin, a protease of the kind used in laundry detergents, to work in a polar organic solvent (dimethylformamide, DMF) instead of water. This would be useful industrially, but it was a hard problem. Enzymes are exquisitely shaped for their natural environments. Changing that requires changing the protein structure, and we didn't understand protein folding well enough to do it rationally.

So Arnold did something different. Instead of trying to design the right mutations, she introduced random ones and let testing sort them out. She mutated the subtilisin gene randomly, tested the variants for activity in organic solvent, kept the winners, and repeated. Mutation. Selection. Repeat.

After several rounds, she had a variant hundreds of times more active in concentrated organic solvent than the enzyme she started with.

She hadn't designed anything. She had evolved it.

In 2018, Arnold won the Nobel Prize in Chemistry for directed evolution. The technique has become one of the most powerful tools in synthetic biology—a way to engineer proteins without fully understanding them.

When you can't design, evolve.


The Design Problem

Let's understand why designing proteins is hard.

A protein is a chain of amino acids that folds into a specific three-dimensional structure. That structure determines the protein's function—an enzyme's active site has a precise shape that fits its substrate, like a lock and key.

If you want to change the protein's function—make it faster, more stable, or active on a different substrate—you need to change the structure. To change the structure, you need to change the amino acid sequence. But which changes?

A small protein might have 200 amino acids. Each position could be any of 20 different amino acids. The number of possible sequences is 20^200, roughly 10^260. That exceeds the number of atoms in the observable universe (about 10^80) by a hilarious margin: some 180 orders of magnitude.

We can't search that space by brute force. We can't even sample it meaningfully.
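The arithmetic behind "we can't search that space" is easy to check. This is a minimal sketch; the 200-residue length comes from the text, while the screening rate of a billion variants per second is a deliberately absurd, hypothetical number chosen to show that even impossible throughput doesn't help.

```python
import math

LENGTH = 200        # residues in the example protein from the text
ALPHABET = 20       # canonical amino acids
ATOMS_LOG10 = 80    # commonly quoted order of magnitude for atoms in the observable universe

# Python can hold 20**200 exactly as an integer; count its decimal digits.
digits = len(str(ALPHABET ** LENGTH))

# Logarithms make the comparison readable: log10(20^200) = 200 * log10(20).
space_log10 = LENGTH * math.log10(ALPHABET)

# Hypothetical, wildly optimistic screen: one billion variants per second, nonstop.
variants_per_year = 1e9 * 365.25 * 24 * 3600
years_log10 = space_log10 - math.log10(variants_per_year)

print(f"sequence space ~ 10^{space_log10:.1f} ({digits} digits)")
print(f"atoms          ~ 10^{ATOMS_LOG10}")
print(f"years to test every sequence ~ 10^{years_log10:.1f}")
```

Even at that rate, exhaustive testing would take on the order of 10^243 years.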

Computational protein design tries to predict which sequences will fold into desired structures. It's made enormous progress—the AlphaFold revolution has transformed structural biology. But predicting function, not just structure, remains hard. And many desired functions aren't well-defined enough to compute.

The sequence space is astronomically vast. The rules for navigating it are incompletely known. Rational design works sometimes, but often hits walls.


Evolution as Search Algorithm

Evolution doesn't understand protein folding. It doesn't compute structures. It doesn't reason about function.

Evolution just tests variants against reality and keeps what works.

This is stupid. It's also effective. Over roughly four billion years, evolution has produced enzymes of staggering sophistication: catalysts that accelerate reactions by factors of billions or more, with exquisite specificity. No human designer has come close.

The insight of directed evolution: we can use the same algorithm, but faster.

Natural evolution is slow because it relies on random mutations that usually don't help. Most mutations are neutral or harmful. Beneficial mutations are rare. It takes many generations for improvements to accumulate.

But we can speed things up. We can create huge libraries of variants—millions or billions—in a single experiment. We can apply selection pressure that's much stronger than natural selection. We can do in weeks what nature does in millennia.
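To see why libraries of "millions or billions" are useful even though the full space is hopeless, count how many variants sit within a few substitutions of a starting sequence. The sketch below counts at the amino-acid level for the same hypothetical 200-residue protein. In practice, error-prone PCR reaches only a subset of these, because a single base change in a codon can reach only some of the alternative amino acids.

```python
from math import comb

def n_mutants(length: int, k: int, alphabet: int = 20) -> int:
    """Number of distinct variants carrying exactly k amino-acid substitutions.

    Choose k of the `length` positions, then one of the (alphabet - 1)
    alternative residues at each chosen position.
    """
    return comb(length, k) * (alphabet - 1) ** k

for k in (1, 2, 3):
    print(f"{k} substitution(s): {n_mutants(200, k):,} variants")
```

Every single mutant (3,800) and every double mutant (about 7.2 million) fits within a realistic library. Triple mutants already number about 9 billion. This is why each round explores only the close neighborhood of the current parents.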

Directed evolution is evolution on a human timescale.


The Basic Protocol

Here's how directed evolution works:

Step 1: Diversify. Start with a gene encoding a protein of interest. Create a library of mutant versions. You can do this with error-prone PCR (which introduces random mutations), DNA shuffling (which recombines segments from related genes), or targeted mutagenesis (which randomizes specific positions).

Step 2: Select or screen. Test the variants for the property you want. This is the critical step—you need a way to distinguish better performers from worse ones.

Sometimes you can use selection: engineer the system so that cells with better enzymes survive and cells with worse ones die. Sometimes you screen: test each variant individually and measure performance.

Step 3: Amplify winners. Take the best performers and use them as the starting point for the next round. Their genes become the parents of the next generation.

Step 4: Repeat. Each round accumulates beneficial mutations. After several rounds—typically 3-10—you may have a protein dramatically improved from where you started.
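The two random diversification methods from Step 1 can be caricatured in a few lines. This is a minimal sketch, not a lab protocol. `error_prone_copy` treats error-prone PCR as independent per-base substitutions, ignoring the biased mutation spectra of real polymerases. `shuffle_parents` treats DNA shuffling as template switching at random crossover points, ignoring the fragmentation and reassembly chemistry and the preference for crossovers where parents share sequence. Both function names and the example genes are hypothetical.

```python
import random

BASES = "ACGT"

def error_prone_copy(gene: str, error_rate: float, rng: random.Random) -> str:
    """Copy a DNA sequence, substituting each base with probability error_rate."""
    out = []
    for base in gene:
        if rng.random() < error_rate:
            # Replace with one of the three other bases, chosen uniformly.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def shuffle_parents(parents: list[str], n_crossovers: int, rng: random.Random) -> str:
    """Recombine equal-length parent genes by switching templates at random points."""
    length = len(parents[0])
    cuts = sorted(rng.sample(range(1, length), n_crossovers))
    pieces, start = [], 0
    for end in cuts + [length]:
        # Each segment is copied from a randomly chosen parent.
        pieces.append(rng.choice(parents)[start:end])
        start = end
    return "".join(pieces)

rng = random.Random(0)
parent = "ATG" + "GCT" * 100      # hypothetical 303-base gene
parent_b = "ATG" + "GCA" * 100    # hypothetical related gene of the same length

mutant = error_prone_copy(parent, error_rate=0.005, rng=rng)
n_changes = sum(a != b for a, b in zip(parent, mutant))
print("substitutions:", n_changes, "expected on average:", 0.005 * len(parent))

child = shuffle_parents([parent, parent_b], n_crossovers=3, rng=rng)
print(child[:30])
```

The error rate sets the average number of mutations per gene (rate times length). Labs tune it so most variants carry only a mutation or two, since many random mutations at once are usually destructive.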

The magic is in the iteration. Each round searches the sequence space near the current best performers. Over multiple rounds, you navigate toward better solutions without needing to understand the landscape.
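The four steps can be strung together into a toy simulation. This is a minimal sketch under a big assumption: the "fitness" is a hypothetical additive score (positions matching a hidden ideal sequence) that a real experiment would never know. All names and parameters here are illustrative choices, not values from any real campaign.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def fitness(seq: str, target: str) -> int:
    # Hypothetical toy landscape: one point per position that matches a hidden
    # "ideal" sequence. Real landscapes are unknown, noisy, and epistatic.
    return sum(a == b for a, b in zip(seq, target))

def mutate(seq: str, rate: float, rng: random.Random) -> str:
    # Step 1 (diversify): each residue becomes a different random amino acid
    # with probability `rate`.
    return "".join(
        rng.choice(AMINO_ACIDS.replace(aa, "")) if rng.random() < rate else aa
        for aa in seq
    )

def directed_evolution(start: str, target: str, rounds: int = 8,
                       library_size: int = 500, keep: int = 5,
                       rate: float = 0.02, seed: int = 0):
    rng = random.Random(seed)

    def score(s: str) -> int:
        return fitness(s, target)

    parents = [start]
    history = [score(start)]
    for _ in range(rounds):  # Step 4 (repeat) is the loop itself.
        # Step 1: build a library of mutants from the current parents.
        library = [mutate(rng.choice(parents), rate, rng) for _ in range(library_size)]
        # Step 2 (screen): rank every variant by its measured fitness.
        # Step 3 (amplify): the top `keep` become the next parents. Previous
        # parents stay in the pool, so a bad round cannot lose ground.
        pool = list(dict.fromkeys(library + parents))  # de-duplicate, keep order
        parents = sorted(pool, key=score, reverse=True)[:keep]
        history.append(score(parents[0]))
    return parents[0], history

rng = random.Random(42)
target = "".join(rng.choice(AMINO_ACIDS) for _ in range(60))  # hypothetical optimum
start = "".join(rng.choice(AMINO_ACIDS) for _ in range(60))   # hypothetical starting enzyme
best, history = directed_evolution(start, target)
print(history)  # best toy fitness after each round
```

Carrying the old parents forward is a modelling choice that keeps the best fitness from ever dropping. Real campaigns often do something similar by keeping the previous round's best variant as a reference.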


What You Can Evolve

Directed evolution can improve almost any protein property, if you can measure it.

Activity. Make enzymes faster. Arnold's original subtilisin work increased activity in organic solvents.

Stability. Make proteins resist heat, acid, or oxidation. Industrial enzymes need to survive harsh conditions.

Specificity. Change what the enzyme acts on. Evolve an enzyme that accepts a new substrate, or rejects ones it used to accept.

Selectivity. For reactions that can produce multiple products, evolve enzymes that make just one.

Expression. Some proteins are hard to produce—they don't fold well when overexpressed. Evolve variants that express better.

Binding. Evolve antibodies that bind tighter to their targets. This is how therapeutic antibodies are often optimized.

If you can measure it, you can evolve it. The design challenge becomes a screening challenge.
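"Measurable" in practice means reduced to a number a screen can rank. For selectivity between two mirror-image products, a standard metric is enantiomeric excess (ee), the difference between the two enantiomers as a percentage of their total. This is a minimal sketch: the variant names and readouts are made-up illustrations, and the sign convention (positive when the desired enantiomer dominates) is a choice so the screen cannot reward variants that favor the wrong product.

```python
def enantiomeric_excess(desired: float, undesired: float) -> float:
    """Signed percent ee from the amounts (e.g. chiral-HPLC peak areas)
    of the desired and undesired enantiomers."""
    total = desired + undesired
    if total <= 0:
        raise ValueError("total amount must be positive")
    return 100.0 * (desired - undesired) / total

# Hypothetical screening readout: (desired enantiomer, undesired enantiomer).
readout = {
    "parent":    (60.0, 40.0),
    "variant_1": (90.0, 10.0),
    "variant_2": (99.0, 1.0),
}

ranked = sorted(readout, key=lambda v: enantiomeric_excess(*readout[v]), reverse=True)
for name in ranked:
    print(name, enantiomeric_excess(*readout[name]))
```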


Screening vs. Selection

The bottleneck in directed evolution is usually screening.

Creating millions of variants is easy. DNA synthesis and cloning are cheap. But testing millions of variants?

High-throughput screening uses robots, microfluidics, and automation to test thousands to millions of variants per day. Each variant gets evaluated individually. This is powerful but expensive and labor-intensive.

Selection is more elegant. If you can couple the desired property to survival or growth, the cells do the sorting for you. A billion cells in a flask, each with a different variant—the ones with better enzymes outcompete the rest. You sequence whoever survives.
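Growth-coupled selection does its sorting through compounding. Here is a minimal sketch, assuming pure exponential growth, N(t) = N0 · exp(r·t), with unlimited nutrients. Real cultures saturate, so long selections are typically kept growing by serial dilution. The growth rates and the one-in-a-million starting frequency are hypothetical.

```python
import math

def frequencies_after_growth(initial: dict[str, float],
                             growth_rates: dict[str, float],
                             hours: float) -> dict[str, float]:
    """Relative variant frequencies after `hours` of exponential growth."""
    grown = {v: n0 * math.exp(growth_rates[v] * hours) for v, n0 in initial.items()}
    total = sum(grown.values())
    return {v: n / total for v, n in grown.items()}

initial = {"parent": 1 - 1e-6, "improved": 1e-6}   # one improved cell per million
rates = {"parent": 0.5, "improved": 0.6}           # hypothetical growth rates, per hour

for t in (0, 50, 100, 150, 200):
    freq = frequencies_after_growth(initial, rates, t)["improved"]
    print(f"{t:>3} h: improved variant at {freq:.2%}")
```

Nobody inspects individual cells. A modest growth advantage lets the rare improved variant pass 50% of the population at roughly 138 hours in this toy setting.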

Selection is much more scalable than screening. But it requires linking protein function to cell fitness, which isn't always possible.

The art of directed evolution often lies in devising clever selections. How do you make the cell's survival depend on the property you're optimizing?


Arnold's Nobel Work

Arnold's Nobel Prize recognized not just the technique but its applications.

Her lab evolved enzymes to catalyze reactions that no natural enzyme performs. This is harder than it sounds—enzymes are optimized for their natural substrates, not arbitrary industrial chemistry.

One striking achievement: enzymes that form carbon-silicon bonds. Organosilicon compounds matter in chemistry and materials science, but no known natural enzyme makes a carbon-silicon bond. Arnold's lab evolved a heme protein, cytochrome c from the bacterium Rhodothermus marinus, into an enzyme that incorporates silicon into organic molecules.

Another: enzymes that form carbon-boron bonds. Again, unnatural. Again, evolved.

These "new-to-nature" reactions demonstrate that directed evolution can access functions outside the existing biological repertoire. Evolution explores; humans direct the exploration toward human goals.

We're not limited to what nature discovered. We can evolve enzymes for chemistry evolution never attempted.


The Fitness Landscape

Let's think about what's happening conceptually.

Imagine a landscape where each point is a protein sequence and the height is the protein's fitness (however you define it). Directed evolution is a walk through this landscape, always stepping toward higher ground.

The landscape is astronomically large. But it's not random—nearby sequences usually have similar properties. A single mutation usually doesn't completely change what the protein does. This local correlation makes the search tractable.

Each round of mutation explores the neighborhood of the current position. Selection keeps the steps that go uphill. Over time, you climb.

The danger is local maxima—peaks that aren't the highest, but where all directions go down. You might get stuck. Standard directed evolution tends toward the nearest peak, not necessarily the global optimum.

Various strategies address this:

Large jumps. Occasional large mutations or recombination can jump to different regions of the landscape, potentially finding better peaks.

Neutral drift. Accept mutations that neither improve nor harm fitness. These let you explore sideways, potentially finding paths to higher peaks that aren't directly uphill from where you started.

Multiple starting points. Start evolution from several different sequences. They'll climb different peaks. Compare the results.

Machine learning guidance. Train models on the variants you've tested to predict which unexplored regions might be promising. Guide the search computationally.

The fitness landscape metaphor has limits—real landscapes aren't static, and multiple mutations can interact non-additively (epistasis). But the metaphor captures the essential challenge: finding good solutions in a vast space.
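Local peaks and epistasis can both be seen in the smallest possible landscape. In this minimal sketch, the landscape is hypothetical: two sites, where each mutation alone lowers fitness but the pair together raises it (sign epistasis). A greedy single-mutation walk gets stuck. A "large jump" (allowing double mutations) or a different starting point escapes. Epistasis is computed on an additive scale. Many analyses use log fitness instead. Neutral drift and machine-learning guidance are not modelled here.

```python
from itertools import product

# Hypothetical two-site landscape. "0" = parental residue, "1" = mutant residue.
LANDSCAPE = {"00": 1.0, "10": 0.8, "01": 0.8, "11": 2.0}

def neighbors(seq: str, max_changes: int) -> list[str]:
    """All sequences differing from `seq` at 1..max_changes sites."""
    result = []
    for letters in product("01", repeat=len(seq)):
        cand = "".join(letters)
        distance = sum(a != b for a, b in zip(seq, cand))
        if 1 <= distance <= max_changes:
            result.append(cand)
    return result

def greedy_walk(start: str, max_changes: int = 1) -> str:
    """Move to the fittest neighbor until none is fitter (a local peak)."""
    current = start
    while True:
        best = max(neighbors(current, max_changes), key=LANDSCAPE.__getitem__)
        if LANDSCAPE[best] <= LANDSCAPE[current]:
            return current
        current = best

def epistasis(f_wt: float, f_a: float, f_b: float, f_ab: float) -> float:
    """Double-mutant deviation from additivity: f_ab - f_a - f_b + f_wt."""
    return f_ab - f_a - f_b + f_wt

print(greedy_walk("00"))                 # "00": single steps are stuck on a local peak
print(greedy_walk("00", max_changes=2))  # "11": a double-mutation jump escapes
print(greedy_walk("10"))                 # "11": a different start climbs the global peak
print(epistasis(LANDSCAPE["00"], LANDSCAPE["10"], LANDSCAPE["01"], LANDSCAPE["11"]))
```

A positive epistasis value means the pair does better than the sum of its parts. That is exactly the situation a one-mutation-at-a-time climb cannot see.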


Industrial Applications

Directed evolution is not just academic. It's industrial.

Detergents. Enzymes that break down stains (proteases, amylases, lipases) have been engineered, often by directed evolution, to work better in cold water, in the presence of bleach, and on specific stain types. Many modern laundry detergents contain such engineered enzymes.

Biofuels. Cellulases that break down plant cellulose into fermentable sugars. Evolved versions work faster and survive harsher pretreatment conditions.

Pharmaceuticals. Enzymes that synthesize drug precursors or perform stereoselective reactions. The pharmaceutical industry increasingly uses biocatalysis, with evolved enzymes.

Food. Enzymes for processing food—cheese making, brewing, baking. Evolved for efficiency and specific flavor profiles.

Bioremediation. Enzymes that break down pollutants. Evolved to degrade recalcitrant compounds more effectively.

Companies such as Codexis and Novozymes have built businesses around directed evolution. It's a mature technology with real revenue.


Sharing the Nobel

Arnold shared the 2018 Nobel Prize with George Smith and Gregory Winter for phage display, a technique for evolving binding proteins, especially antibodies. Smith developed the method; Winter applied it to evolving antibodies.

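In phage display, each bacteriophage (a virus that infects bacteria) displays a different protein variant on its surface and carries the gene encoding that variant inside. That physical link between protein and gene is what makes the method work. You expose the phage library to a target molecule. Phages that bind stick; phages that don't wash away. Amplify the binders by infecting fresh bacteria, then repeat.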
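A few rounds of panning suffice because enrichment compounds. This is a minimal sketch, assuming each round multiplies the ratio of binders to non-binders by a constant factor. That is a simplification, since real enrichment varies from round to round. The starting frequency (one binder per ten million phages) and the 100-fold enrichment per round are hypothetical numbers.

```python
import math

def binder_frequency(f0: float, enrichment: float, rounds: int) -> float:
    """Fraction of binders after `rounds` of panning, with constant per-round enrichment."""
    odds = f0 / (1 - f0) * enrichment ** rounds
    return odds / (1 + odds)

def rounds_needed(f0: float, enrichment: float, target: float) -> int:
    """Smallest number of rounds for binders to reach fraction `target`."""
    odds_start = f0 / (1 - f0)
    odds_goal = target / (1 - target)
    return math.ceil(math.log(odds_goal / odds_start) / math.log(enrichment))

f0, enrichment = 1e-7, 100.0  # hypothetical library and enrichment factor
print("rounds to reach 50% binders:", rounds_needed(f0, enrichment, 0.5))
for r in range(5):
    print(r, f"{binder_frequency(f0, enrichment, r):.2e}")
```

Under these assumptions, four rounds take the binder from one in ten million to about 91% of the library.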

This is selection for binding affinity, and it's extremely powerful. Therapeutic antibodies like adalimumab (Humira) were developed using phage display. The technique has generated drugs worth billions.

Phage display is conceptually similar to directed evolution—create diversity, select winners, repeat. The Nobel recognized the shared principle: using evolution as an engineering tool.


Beyond Enzymes

Directed evolution works on anything made of protein.

Antibodies. The immune system naturally evolves antibodies. Phage display and related methods do it faster and for targets the immune system can't naturally handle.

Biosensors. Proteins that change their properties when they bind a target—fluoresce, change shape, alter enzyme activity. Evolved for sensitivity and specificity.

Transcription factors. Proteins that control gene expression. Evolved for different DNA binding specificities.

Structural proteins. Proteins for materials applications—silk, elastin-like polymers. Evolved for desired mechanical properties.

The principle is general: if it's encoded by a gene, you can evolve it.

And increasingly, we're combining directed evolution with machine learning. Train models on the data from evolution experiments. Use the models to predict promising variants. Test them. Train better models. The search becomes more efficient.
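The simplest version of "train a model, predict promising variants" is an additive model. It estimates each mutation's effect from single-mutant data and predicts combinations as the sum of those effects. This is a minimal sketch, not the regression or neural-network models used in practice. The mutation labels and measurements are hypothetical. Testing the shortlisted combinations would reveal where additivity fails (epistasis) and would supply training data for a better model.

```python
from itertools import combinations

def predict_combinations(wt: float, singles: dict[str, float],
                         k: int = 2) -> dict[tuple[str, ...], float]:
    """Additive model: predicted fitness = wt + sum of each mutation's measured effect."""
    effects = {m: f - wt for m, f in singles.items()}
    return {
        combo: wt + sum(effects[m] for m in combo)
        for combo in combinations(sorted(singles), k)
    }

# Hypothetical round of measurements on single mutants (wild type = 1.0).
wild_type = 1.0
measured_singles = {"A42V": 1.3, "L87F": 1.1, "S120P": 0.7, "T155I": 1.2}

predictions = predict_combinations(wild_type, measured_singles)
shortlist = sorted(predictions, key=predictions.get, reverse=True)[:3]
for combo in shortlist:
    print("+".join(combo), round(predictions[combo], 2))
```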


The Philosophical Angle

Directed evolution raises an interesting question: is it design?

In one sense, yes. A human defines the goal—the selection criterion. A human chooses the starting point, the diversification strategy, the screening method. The process is intentional.

In another sense, no. The solutions emerge from random variation and selection, not from understanding. The engineer doesn't know which mutations will work. The process discovers what the engineer couldn't design.

This is design without a designer—or rather, design where the designer specifies the goal but not the solution. Evolution does the creative work.

It's a different kind of engineering. Less like architecture, where the designer specifies every detail. More like gardening, where you set up conditions and let growth happen.

Frances Arnold didn't design better enzymes. She created conditions where better enzymes could evolve.


Further Reading

- Arnold, F. H. (2018). Nobel Lecture: "Innovation by Evolution: Bringing New Chemistry to Life." NobelPrize.org.
- Packer, M. S., & Liu, D. R. (2015). "Methods for the directed evolution of proteins." Nature Reviews Genetics.
- Romero, P. A., & Arnold, F. H. (2009). "Exploring protein fitness landscapes by directed evolution." Nature Reviews Molecular Cell Biology.
- Chen, K., & Arnold, F. H. (2020). "Engineering new catalytic activities in enzymes." Nature Catalysis.


This is Part 3 of the Synthetic Biology series. Next: "Xenobots: Living Robots Made from Cells."