Niko McCarty

Synthetic Cell Eats, Grows, Divides

Niko McCarty — Wed, 01 Jul 2026 14:04:00 GMT

Humans have long wanted to create life. This seems to be an eternal pursuit, driven by some perpetual brainworm nestled deeply in the human mind. And when I say “create life,” I don’t mean to simply have sex and then, nine months later, watch as a baby emerges into this world, kicking and thrashing. I mean instead to literally build an organism piece by piece, molecule by molecule, until inert ingredients give rise to life. Such a feat would demonstrate total mastery over nature and refute all the haters who say life is “inscrutable,” “irreducible,” or “hopelessly” complex.

Alas, no one has yet created a living cell from purified molecules, and probably won’t for many years to come. But scientists have tried!

Most claims of success, though, come from hype-men who cut corners and don’t actually know the raw ingredients of their organisms. Anybody can take an existing lifeform, cut out some pieces, and then claim to have made something new; or suck the cytoplasm and nucleus out of one cell, inject them into another, and then claim to have made a “chimeric” lifeform.1

(Even credible claims are often misinterpreted by a sensationalist press. When German-American biologist Jacques Loeb used salts to coax a sea urchin egg to divide, the Boston Herald headline declared, “Creation of Life… Lower Animals Produced by Chemical Means.” The subheading was even better: “Immaculate Conception Explained.”)2

But in a preprint released today, scientists claim they’ve made the first cell capable of “feeding, growth, replication, division and selection as a single cycle, coupled to one genome, entirely using components scientists put there.” The cell is called SpudCell, and it is made entirely from known molecules. And although it is definitely a cell (in that it has a membrane with molecules inside) it is definitely not alive, because it cannot grow indefinitely, it cannot survive without human help, it cannot make its own ribosomes, it cannot break down or recycle waste, and it “dies” after a few divisions.

Despite its wimpiness (their term, not mine), SpudCell will be a useful model for future research. Now that researchers have proven they can build a primitive cell capable of division and growth from simple molecules, others can build upon their work.

One reason that biology is so difficult to engineer is that it is infinitely convoluted. Every organism has billions of years of evolutionary baggage, which means every organism behaves a little bit uniquely, which means a discovery made in E. coli doesn’t always work in Bacillus subtilis, and vice versa. (“The only rule in biology is that there are exceptions to every rule.”) Each biologist studies their own organism, with its own little quirks, using slightly bespoke tools. This makes comparisons between laboratories frustratingly difficult, and means that progress is far slower than it ought to be.

Contrast this with physics, where scientists have performed thousands of experiments using hydrogen, a humble atom with one proton and one electron. During the 20th century, it was hydrogen that enabled physics to develop a quantum theory of matter, simply because hundreds of scientists were using precisely the same type of atom for their experiments. Scientists could directly compare, criticize, and argue over their results, because everybody was working in the same medium.

Biology has no hydrogen atom. E. coli may be the most well-studied organism of all time, but hundreds of its genes still have unknown functions. So perhaps SpudCell will become that hydrogen atom; the model that finally enables biologists to fully dissect the mechanisms of life.

II.

Synthetic biologists have been rebuilding parts of cells from the bottom up for decades, or reconstituting cellular functions using purified components. It’s just that nobody had yet managed to put these pieces together into something that behaves — even a little bit — like a living cell.

In 2001, a Japanese group purified dozens of proteins from living cells, recombined them in a tube, and showed that this mixture was sufficient for protein synthesis, meaning the purified molecules could assemble proteins from genetic code. Other groups have reconstituted feeding in test tubes. In 2004, for example, researchers built a small liposome (or bubble enclosed by a lipid membrane), studded its membrane with protein pores, and showed that this liposome could import nutrients. Scientists have even reconstituted cell division! In 2020, a German team stuck bulky proteins onto a lipid membrane and found that those proteins repulsed each other, bent the membrane and, eventually, coaxed the cell to split.3

But these are isolated examples. None of these reconstituted mixtures could eat, grow, and express genes at the same time. The more promising route toward synthetic life, then, has been to work from the top down. Instead of assembling a cell from chemicals, researchers can take an existing organism and ruthlessly cut out its DNA, gene by gene, until arriving at the minimal set required for life.

In 2016, for example, the J. Craig Venter Institute released JCVI-syn3.0, a Mycoplasma mycoides cell it had reduced to 473 genes and 531,000 bases of DNA (far smaller than any self-replicating cell found in nature).4 Of those genes, 149 don’t have a known function, but the cell has a metabolism, disposes of waste, and keeps dividing indefinitely, about once every three hours.

These two approaches — bottom-up and top-down — are complementary. Stripping an organism down to only the genes it requires, and then working out what each one does, is a perfectly valid way to build a simplified lifeform. But the genes that remain, after all that chopping, are what’s necessary for the cell to divide, and necessity is not the same as understanding. Kate Adamala, corresponding author of the SpudCell paper, argues that the only way to truly understand how life works, at a molecular resolution, is to assemble it from scratch.

“If we lose track of what’s in [the cell], then what’s the point?” she says. “The whole point of engineering a synthetic cell, for me, is that I need to know all the molecules.”

III.

SpudCell is a lipid blob filled with proteins and DNA. It’s probably easiest to understand how it works by looking at each “module” independently.

SpudCell with an encapsulated genome. Both the DNA and lipid membrane have been stained with fluorescent dyes. Credit: Orion Venero

Let’s start with the genome, which spans seven plasmids, or loops, built from 90,000 total base pairs of DNA. Each plasmid encodes one or more genes. One plasmid encodes T7 RNA polymerase, an enzyme that transcribes DNA into RNA; another encodes Phi29, an enzyme that copies DNA; and other plasmids encode α-hemolysin (more on this later), green fluorescent protein, and multiple “accessory” proteins that help with protein synthesis. Each plasmid also has a promoter, or short snippet of DNA, that signals RNA polymerase to switch on the gene.

DNA replication is simple; the Phi29 protein diffuses around SpudCell, randomly grabs onto plasmids, and then copies them. This happens continuously, and the energy and nucleotides needed for DNA replication all come from outside, not from the cell itself. Transcription, or converting DNA into RNA, is also simple. The T7 RNA polymerase floats around, binds to promoters, and turns each gene into messenger RNA.

Protein synthesis is more complicated, because dozens of proteins (and RNAs) are involved. There is the ribosome, of course, but also tRNAs, enzymes that attach amino acids to those tRNAs, and more.5 SpudCell does not genetically encode some of these molecules, including the ribosomes. Instead, the researchers pack them into the lipid bubble at the start of the experiment and then, to keep the cell growing, supply additional ribosomes as food.

But SpudCell doesn’t eat food like most living organisms do. Whereas a normal cell pulls in nutrients through protein channels embedded in its membrane, SpudCell instead feeds by merging its body with lipid bubbles.

Remember the α-hemolysin protein I mentioned earlier? Well, that’s what SpudCell uses to eat. Specifically, the researchers engineered the α-hemolysin proteins to carry histidine tags, or short strings of histidine amino acids, on the portion that sticks out from the cell. These tags latch onto feeder liposomes — basically lipid bubbles filled with energy, nutrients, and ribosomes — and coax them to merge into SpudCell and dump that food inside.6

The authors argue that this is a “genetically encoded form of feeding,” because one of the proteins encoded on the plasmids (α-hemolysin) mediates it. Which, sure, that is technically true. But in reality, SpudCell has no metabolism. It can’t break down waste, recycle nutrients, or reuse atoms, which means it can’t survive for long (a problem I’ll return to later).

And what about cell division? That’s probably the hardest thing to reconstitute from the bottom up, because it requires so many steps: SpudCell must copy its plasmids, partition those copies to each half of the cell, and then pinch the membrane in two. The cytoskeleton does this in normal cells, but SpudCell has no such cytoskeleton, so it instead divides in a much cruder way, using the same α-hemolysin protein that feeds it.

The trick relies on a beautiful idea from membrane biophysics. If you stud the outside of a lipid membrane with bulky proteins, even at a low density, then the membrane will start to bend. The bulky proteins crowd and repel one another, thus forcing the membrane to curve away. As more and more proteins pack the membrane, this strain eventually forces it to split.7

SpudCell does much the same, again using those histidine tags on α-hemolysin. But this time, instead of grabbing a feeder liposome, each tag grabs onto a small linker molecule that the researchers supply into the surrounding environment. One end of that linker binds the histidine tag; the other end carries biotin. And biotin, in turn, binds tightly to streptavidin, the bulky protein that actually does the work (and which is supplied by the researchers along with the biotin linker). As more and more streptavidin proteins grab onto the α-hemolysin proteins, they crowd the SpudCell surface more and more tightly. The membrane bends further and further, until finally the cell pinches in two.

Fluorescent microscopy images of SpudCell dividing. Credit: Kate Adamala

But this doesn’t work reliably! SpudCell only divides a couple of times in this way before dying out. “It’s not a very good system,” says Adamala. “It’s wimpy.”

Wimpiness aside, the fact that α-hemolysin — a protein encoded by the SpudCell genome — drives both feeding and cell division means that SpudCell is subject to selection, whether natural or artificial.

To prove this, the researchers mutated the promoter next to the α-hemolysin gene to make it “stronger,” or more likely to attract RNA polymerase. Cells with this mutation made more α-hemolysin mRNA, and thus more protein, and thus were more likely than cells without the mutation to consume food and recruit streptavidin for cell division.

When researchers mixed the mutated SpudCells with their normal counterparts at a 50:50 ratio, and then ran them through five generations of growth and division, they found that the mutated variants outcompeted the others and grew to about 60 percent of the total population.
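One way to put a number on that shift is to ask what constant per-generation advantage would carry a variant from 50 to 60 percent of the population in five generations. The sketch below assumes a simple two-type model in which the mutant's odds are multiplied by a fixed relative fitness `w` every generation. That model and the function names are illustrative assumptions, not something taken from the preprint.

```python
def implied_relative_fitness(start_frac: float, end_frac: float, generations: int) -> float:
    """Constant per-generation fitness ratio (mutant vs. normal) implied by a frequency shift."""
    start_odds = start_frac / (1 - start_frac)
    end_odds = end_frac / (1 - end_frac)
    # Under constant relative fitness w, the odds are multiplied by w each generation.
    return (end_odds / start_odds) ** (1 / generations)


def mutant_fraction(start_frac: float, w: float, generations: int) -> float:
    """Mutant share of the population after some generations at relative fitness w."""
    odds = start_frac / (1 - start_frac) * w ** generations
    return odds / (1 + odds)


if __name__ == "__main__":
    w = implied_relative_fitness(0.5, 0.6, 5)
    print(f"implied advantage per generation: {(w - 1) * 100:.1f}%")
    print(f"share after 20 generations: {mutant_fraction(0.5, w, 20):.2f}")
```

Under these assumptions the stronger promoter is worth about 8 percent per generation, and the same advantage sustained for 20 generations would push the variant to roughly 84 percent of the population.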

This is artificial selection, and not open-ended evolution. But simplicity is the whole point of this paper! At every step, the researchers chose to implement a simplified biochemical system rather than mimic real cells. Their goal was to get something that works first, and then find ways to make it better over time. But with that in mind, it’s worth thinking about what needs to happen next to make SpudCell capable of long-term growth and division.

One issue is the DNA, which is currently split across seven plasmids. There are many valid reasons to split the genome in this way; it reduces crosstalk between genes, for example, and makes it easier to debug genes when they don’t work. Phi29, the DNA copier, also works best on small pieces of DNA and might struggle to replicate a single 90-kilobase genome.

But because SpudCell doesn’t have a cytoskeleton, it partitions its plasmids randomly during each division. In other words, each plasmid ends up in a random daughter cell, and there is always a chance that one of the daughters won’t get all seven plasmids. Without all the plasmids, the cell will “die.”

(Say the cell has five copies of each of the seven plasmids. In that scenario, there is only an 80 percent chance that a daughter cell will get at least one copy of each plasmid. After five rounds of cell division, only 30 percent of cells will have all seven plasmids.)
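The arithmetic behind that estimate can be sketched in a few lines of Python. This is a back-of-the-envelope model rather than the authors' analysis: it assumes every plasmid copy lands in either daughter with probability 1/2, independently of the others, and that each surviving cell is back to five copies of each plasmid before it divides again. The function names are made up for illustration.

```python
def p_daughter_complete(copies: int, plasmid_types: int = 7) -> float:
    """Chance that one daughter inherits at least one copy of every plasmid type."""
    # A daughter misses a plasmid type only if all of its copies go to the sibling.
    p_has_one_type = 1 - 0.5 ** copies
    return p_has_one_type ** plasmid_types


def p_lineage_complete(copies: int, divisions: int, plasmid_types: int = 7) -> float:
    """Chance that a lineage still carries every plasmid type after several divisions.

    Assumes, for simplicity, that copy number is restored before each division.
    """
    return p_daughter_complete(copies, plasmid_types) ** divisions


if __name__ == "__main__":
    print(f"one division:   {p_daughter_complete(5):.2f}")
    print(f"five divisions: {p_lineage_complete(5, 5):.2f}")
```

With five copies per plasmid, this gives about 0.80 for a single division and about 0.33 after five, in line with the rough figures above. Raising the copy number helps quickly: the per-type chance of a miss halves with every extra copy.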

Metabolism, though, is probably the biggest issue for long-term growth. Remember that SpudCell can’t make its own ribosomes or energy, so the researchers pump in fresh supplies constantly. This is fine, in principle, but SpudCell also can’t recycle its broken proteins. Ribosomes wear out and proteins have half-lives. When these proteins “break,” they simply sit there and fill up space.8 With each cell division, a larger and larger fraction of the cell’s internal volume becomes packed with junk. The next step to make SpudCell grow for longer, then, is probably to find a way to recycle or remove broken proteins.

This is particularly important for α-hemolysin. Remember that these proteins sit in the cell membrane, where their histidine tags link up with either feeder liposomes or streptavidin. This linkage to streptavidin, in particular, is really difficult to undo! The used-up α-hemolysin proteins just sit in the membrane and take up space, and as SpudCell divides, this useless junk studs more and more of its membrane.

If the researchers figure out how to solve these problems and stretch the division cycles to 20 or 30 generations, far more interesting experiments will become possible. For example, what if you could swap in an error-prone DNA polymerase — an enzyme that randomly introduces mutations — and then follow the mutating cells over multiple generations to see if they evolve new functions? You could even feed the SpudCells random genes, coax them to incorporate that DNA, and see whether they repurpose it for new functions.

IV.

With all these caveats, and all the comments on SpudCell being “wimpy,” it’s worth thinking about why this paper matters at all.

One reason is that this is the first time anyone has combined all the in vitro modules into a single system that works. SpudCell only sorta works, and the authors admit as much, but that’s okay; at least biologists now have a platform to add new modules, like a metabolism or a cytoskeleton, and see whether they work as part of a complete system.

For this reason, SpudCell really does feel like it could become the hydrogen atom for biology. And to make that vision a reality, Kate Adamala, together with Drew Endy, Jan Jedryszek, and Chris Raggio, is launching a nonprofit organization, called Biotic, to turn bottom-up cell assembly into an engineering discipline. Biotic has already raised more than $6 million in philanthropic funding.

“The main goal is to unify [the field],” says Adamala. “People make awesome modules, but you try to combine two different modules and you pretty much have to re-engineer half of it from scratch, because it wasn’t made to operate on the same chassis. It’s kind of like writing apps for an operating system. You’ve got to declare which operating system your app is going to work on.”

Biotic will release step-by-step protocols for building cells, open-source its plasmids and components, and fund research that directly improves SpudCell. For example, Adamala plans to consolidate the seven plasmids into a single chromosome, solve the partitioning problems, and engineer these cells to make their own ribosomes from scratch.

Progress, one hopes, will be swift. Biology has too many unknowns; most experiments don’t replicate; and few people truly work on the exact same organism using reliable, replicable methods. If we’re ever going to understand an organism completely, we may need to build it from the bottom up, using solely molecules we put there by design.

Thanks to Richard Murray for helpful comments. Mistakes are my own.

In 1970, biologists in New York published a provocative paper titled “Reassembly of Living Cells from Dissociated Components.” They basically sucked the nuclei out of Amoeba proteus cells, used a micro-needle to extract cytoplasm (about 75 percent of the total volume), and then injected both cytoplasm and a nucleus from another Amoeba back into the dead cells. The “reassembled” Amoeba survived about 80 percent of the time.

Even presidents fall victim to the hype. When Arthur Kornberg made some viral DNA and showed it was infective, Lyndon Johnson told an audience, “Some geniuses at Stanford University have created life in the test tube!”

The bulky proteins repulse each other, forcing the membrane to bend away from them.

Its natural genome has about 1.2 million basepairs of DNA.

Ribosomes are more than half RNA by mass; so they are really less protein and more ribonucleotide machine!

The feeder liposomes are studded with a molecule called nickel-nitrilotriacetic acid, which latches onto the histidine tags.

The approach was inspired by work from Reinhard Lipowsky, a German physicist. As Adamala puts it, “if you stuff giant GFP proteins onto the outside of a liposome, the membrane starts bending, and eventually dividing.”

The α-hemolysin pores can export some small molecules, but not large proteins.

Hyperspectral Biology Fund

Niko McCarty — Tue, 16 Jun 2026 18:24:42 GMT

I’m launching a $75,000 microgrant program to grow the field of hyperspectral biology.

Hyperspectral biologists aim to study life through the full spectrum of light that organisms (and their molecules) reflect. They also engineer organisms that emit molecules which absorb or reflect light in particular ways, such that we can “read out” their molecular states from far away, including from satellites orbiting Earth.

Whereas a normal camera collapses light into just three channels (red, green, and blue; mimicking the human eye), hyperspectral cameras capture the full spectrum of wavelengths for every pixel in an image.1 These special cameras enable us to see aspects of biology that are normally invisible to the naked eye. (Unfortunately, hyperspectral cameras are also expensive, costing anywhere from $10,000 to $100,000 each.)

Hyperspectral cameras collect data from many more wavelengths of light, and can thus resolve features from an image that would otherwise be invisible. Credit: Lucas Bosch, Wikimedia

NASA scientists built the first hyperspectral cameras in the early 1980s to map mineral deposits and algal blooms; each type of molecule absorbs and reflects light in a distinct way, and so you can figure out what is happening on Earth by running algorithms that deconvolute the images. Hyperspectral cameras are widely used today to quantify shifts in plant chlorophyll, which strongly absorbs blue and red light, and to monitor moisture levels in soil.

It wasn’t until last year, though, that MIT scientists (led by Yonatan Chemla, a friend of mine) merged hyperspectral cameras with synthetic biology.

For a study in Nature Biotechnology, Chemla and colleagues engineered two strains of bacteria to overproduce pigments (called biliverdin IXα and bacteriochlorophyll a) that absorb light in a distinctive way, meaning they each have a unique hyperspectral fingerprint. The researchers sprayed these engineered cells onto patches of soil at Fort Devens, a military base in Massachusetts, and then flew a hyperspectral drone overhead. By using a computer algorithm to separate the pigments’ signals from the background noise of dirt and sand, they could detect the locations of these microbes from up to 90 meters away. (In unpublished work, they did the same with a satellite orbiting hundreds of miles above Earth.)

Remote detection of bacteria emitting hyperspectral molecules from 90 meters away. Credit: Chemla et al. (2025).

But this is just the beginning of what’s possible!

Therefore, I’m giving out $75,000 in microgrants — ranging from $5,000 to $12,500 each — to help grow the field of hyperspectral biology. These grants are supported by the Experiment Foundation, and you can click here to apply for funding by July 10. I’ll entertain a huge range of ideas, but here are some things that I think would be particularly useful…

First, we need cheaper and (ideally) open-source hyperspectral cameras. This is a major bottleneck on people’s ability to enter this field. Hyperspectral cameras usually cost tens of thousands of dollars, as I said, and most academic labs are not going to shell out that kind of money to pursue a risky research project. Most labs, similarly, don’t have military connections to access satellites with hyperspectral cameras. Ideally, there would be an open-source initiative to make cheap hyperspectral cameras, give away the blueprints, and also sell them (pre-assembled) for a profit. (There was an open-source hyperspectral camera initiative, called OpenHSI, but it hasn’t been active for at least a year.)

Second, we need foundational datasets of hyperspectral profiles! We literally need to point hyperspectral cameras at different molecules and varied types of cells, collect their spectra, and then release those data publicly.

When Chemla started his experiments, he spent a few weeks searching through chemical databases to find hyperspectral data. He figured that someone must have collected spectral plots for various molecules, just to see which wavelengths they absorb most strongly. But no! He could only find about 40 molecules with any spectra at all.

This is bad. We need to collect much more spectral data on molecules and organisms. Indeed, we should collect full spectral profiles for every naturally occurring biomolecule and release the data publicly. Some molecules will have unique fingerprints that increase the resolution of this technology and let us see things more clearly. The dataset would also enable us to build machine learning models that can be used to design new types of molecules with desirable spectra.2

Third, we should “port” hyperspectral reporter genes into plants and other organisms. Chemla’s work was limited to microbes, so I would love for someone to engineer a plant (maybe tobacco or something simple) to emit a hyperspectral molecule, and then fly a drone overhead to see if you can detect the signature from far away. This is important, I think, because there are stringent regulations around releasing engineered microbes into the wild. It’s almost impossible to release microbes outside of containment vessels, unless they are for agriculture (like biopesticides) or are a food, such as a probiotic. The regulations are more navigable for plants.

And finally, there’s a lot of room to improve algorithms. We need better image detection algorithms for hyperspectral data, and especially open-source computational tools. I’d like to fund tools that enable scientists to upload images and deconvolute the data for their biomolecule of interest. It’d be cool if people could upload their “hyperspectral molecule” of choice, submit data on that molecule’s spectrum, and then automatically run an algorithm to see if the signal is visible in an image. AI tools could help a lot here. On a related note, I suspect there’s also a lot of work that could be done to use AI to design molecules with desirable spectral plots.
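To make “deconvolute the data” a bit more concrete, here is a minimal sketch of the simplest version of the idea: linear spectral unmixing for a single pixel. It assumes the pixel's reflectance spectrum is a weighted sum of one known pigment spectrum and one known background spectrum, and solves for the two weights by least squares. Real pipelines deal with many reference spectra, noise, and non-negativity constraints. The function names and all the reflectance numbers here are invented for illustration.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def unmix_two(pixel, target, background):
    """Least-squares weights (a, b) so that pixel is approximately a*target + b*background."""
    # Normal equations for a two-column design matrix, solved with Cramer's rule.
    tt, tb, bb = dot(target, target), dot(target, background), dot(background, background)
    tp, bp = dot(target, pixel), dot(background, pixel)
    det = tt * bb - tb * tb
    if abs(det) < 1e-12:
        raise ValueError("reference spectra are too similar to separate")
    a = (tp * bb - tb * bp) / det
    b = (tt * bp - tb * tp) / det
    return a, b


if __name__ == "__main__":
    # Hypothetical reflectance at six wavelengths (made-up values).
    pigment = [0.9, 0.8, 0.2, 0.1, 0.7, 0.9]    # strong dip in the middle bands
    soil = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55]  # smooth, featureless background
    # Simulate a pixel that is 20 percent pigment and 80 percent soil.
    pixel = [0.2 * p + 0.8 * s for p, s in zip(pigment, soil)]
    a, b = unmix_two(pixel, pigment, soil)
    print(f"pigment weight: {a:.2f}, soil weight: {b:.2f}")
```

A tool of the kind described above would run something like this for every pixel in an image and flag the pixels where the pigment weight rises above the noise.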

Fortunately, the things holding back hyperspectral biology seem to be the actual biology, and not all the other infrastructure required. Many companies have already launched hyperspectral satellites into space; Pixxel, for example, is building what it calls the world’s highest-resolution hyperspectral satellite constellation. There’s also Orbital Sidekick, Planet, Wyvern, and a bunch of government projects doing hyperspectral stuff. I’d estimate that there are about 60 satellites with hyperspectral cameras already in orbit.

These satellites are being used to monitor crop health and soil moisture, as I said earlier, but also to track methane emissions and oil spills. And, to make money, these companies sell hyperspectral images to speculators searching for minerals and other resources (especially lithium and copper deposits).

My dream for hyperspectral biology is to use it as a tool to monitor all the things we care about on Earth. We can engineer organisms to sense explosives, for example, and then spray them in places where unexploded landmines are known to exist. When these organisms detect an explosive, they would emit hyperspectral molecules that are visible from satellites. In Laos alone, the U.S. dropped more than 270 million cluster bomblets during the Vietnam War, and about 80 million of them did not detonate. More than 20,000 people have been killed or injured by these bombs since the war ended. Similarly, we could engineer plants to sense pathogens and release hyperspectral molecules in response. Then, we could use drones with hyperspectral cameras to detect outbreaks before they spread. The beauty of biology is that it can be engineered to sense just about anything, from ions to metals and pollutants to minerals. We can “hook up” this sensing capability to hyperspectral molecules, and thus use organisms as living biosensors (visible from space!) for anything we want.

The end goal is a planetary-scale, autonomous biosensing network, where molecules emitted from lifeforms are used to monitor the health of Earth as a whole.

My plan is to give out 8 to 12 microgrants with this first tranche of funding. Note that every recipient must release their data publicly, and I’ll check in every couple of months to see how things are going. It’s perfectly fine if experiments fail, since some of these proposals will be risky. The funds can only be used to support experiments, not events or workshops. I’m being advised on this program by Yonatan Chemla and Andrew York, a physicist who builds microscopes at the Chan Zuckerberg Biohub.

Of course, I also know that $75,000 won’t grow this field nearly as much as it deserves. It’s only a starting point, but I hope these microgrants grow interest in this field and also help to alleviate some of its major bottlenecks. DARPA has also been a major funder of hyperspectral biology; they supported Chemla’s work, for example, and have also funded groups at Purdue studying whether plants show observable spectral responses to synthetic chemicals. It’s good that people are already pushing on both fronts: engineering organisms to emit hyperspectral signatures, and also probing natural organisms for signals we can read out using hyperspectral cameras (but which are normally invisible to the naked eye and normal cameras).

If these grants work out, I hope larger philanthropists will follow and support this field. And finally, if you’d like to support these microgrants, please email niko@asimov.com to donate. Every dollar will go directly to scientists; there is no overhead.

In other words, every pixel in the image has a 2D chart, or “spectral plot,” attached to it, where the x-axis is wavelength, and the y-axis is reflectance.

By finding more molecules with unique hyperspectral fingerprints, we’ll also be able to do “multiplexed” imaging. Say we want to engineer plants that detect 10 different pathogens. If a plant detects pathogen A, it could be “programmed” to emit hyperspectral molecule A, which has a unique fingerprint, and the same for B, and so on. You could then deconvolute many different hyperspectral signatures to record even more information in a specific spatial region on Earth.

The Bitter Lesson for Biology — Adam Green on Virtual Cells and Scaling Laws

Niko McCarty — Fri, 12 Jun 2026 16:40:31 GMT

Markov Biosciences, a startup in San Francisco, is betting that biology is about to have its GPT moment. In this episode, founder Adam Green explains the "bitter lesson" for biology, the idea borrowed from Richard Sutton that large unbiased datasets and the right training objective tend to outcompete models with hard-coded rules and human priors. Green thinks, in particular, that the virtual cell field has overinvested in collecting expensive perturbation data. His argument is that data is not the limiting factor for training useful virtual cells; compute and the loss function are. By treating single-cell RNA-seq data as rankings rather than raw counts (a century-old idea traceable to a 1927 psychophysics paper), his team found that virtual cells pre-trained on plain observational data show clean scaling laws, getting monotonically better at predicting unseen perturbations as the models grow, and beating a state-of-the-art model built specifically for that task.
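For readers who haven't met ranking likelihoods before, here is a minimal sketch of the textbook Plackett-Luce model, which the episode's timestamps mention by name. It is the plain version, not the "geometric" variant discussed in the episode, and it is not Markov's training code. The gene names and scores are made up. The idea is that a model assigns each gene a score, and the likelihood of an observed expression ranking is built by picking genes one at a time, each with probability proportional to the exponential of its score among the genes not yet placed.

```python
import math


def plackett_luce_log_likelihood(scores, ranking):
    """Log-probability of an observed ranking under a plain Plackett-Luce model.

    scores:  dict mapping gene name -> real-valued score (higher means ranked earlier)
    ranking: gene names ordered from most to least expressed
    """
    ordered = [scores[g] for g in ranking]
    log_lik = 0.0
    for k in range(len(ordered)):
        tail = ordered[k:]
        # Log-sum-exp over the genes not yet placed, shifted for numerical stability.
        m = max(tail)
        log_norm = m + math.log(sum(math.exp(s - m) for s in tail))
        log_lik += ordered[k] - log_norm
    return log_lik


if __name__ == "__main__":
    # Hypothetical scores for three genes in one cell.
    scores = {"geneA": 2.0, "geneB": 1.0, "geneC": 0.0}
    print(plackett_luce_log_likelihood(scores, ["geneA", "geneB", "geneC"]))
    print(plackett_luce_log_likelihood(scores, ["geneC", "geneB", "geneA"]))
```

The ranking that agrees with the scores gets the higher log-likelihood, so training a model to maximize this quantity pushes its scores to reproduce the observed ordering of genes rather than their raw counts.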

If you’d like to sponsor a future episode, please email nsmccarty3@gmail.com. To listen to this episode, search for “The New Biology” on your favorite podcast app.

Timestamps

00:00 — Cold open

01:58 — First prospective clinical predictions from a virtual cell model

05:38 — What is a "virtual cell"?

08:01 — The problems with single-cell RNA-seq

11:31 — The urns analogy

19:54 — Why RNA, and observational vs. perturbational data

23:29 — The bitter lesson for biology

29:06 — Generative ranking and geometric Plackett-Luce

38:27 — Ablations and loss function

47:23 — Cells as specimens

59:26 — The antibody-drug conjugate case study

1:11:16 — Will we ever understand biology?

Transcript

Adam Green (00:00) Yeah, I think we’re talking past each other.

Niko McCarty (00:02) Like what Markus Covert is doing—saying, I want to simulate a cell using mathematical equations. Is that kind of thing useful?

Adam Green (00:11) I think it’s a fun abstraction. The sort of unsupervised pre-training and scaling we saw in text, images, protein sequence modeling is not going to work in the same way when we bring it to single-cell biology, and therefore we need a new approach. If you said that in 2018, it’s insane. When I say something like that now in 2026 about biological world models, people think you’re insane. As you scale the model up and it saw more and more observational data—then you fine-tune a tiny bit of perturbational data, and then you evaluate it on perturbations it has not seen. It gets monotonically better at that task. So much so that it beats the current state-of-the-art model that was pre-trained on perturbational data with multiple injected knowledge sources, specifically for the task of perturbation prediction. Yeah, I think long term the ambition is solved biology.

Niko McCarty (01:03) Today’s guest is Adam Green. He’s the founder of Markov Biosciences, a company building a virtual cell for biology. And he has some viewpoints about how to train these models that differ from the mainstream. One of the things I really want to get at in this conversation is this idea of: will we ever develop a complete understanding of the cell? And if so, how will we do that? Will we do it using black-box models with sparse autoencoders, where we can interpret the outputs of that model? Or will we ever be able to build a bottom-up mechanistic understanding of the cell? But before we go there, I want to ask Adam about a recent paper that Markov put out, where they made very specific predictions about a class of drugs known as antibody-drug conjugates in cancer. So Adam, welcome.

Adam Green (01:58) Yeah, good to be here. So we put out a paper—if you could call it that, a Twitter article currently—on a particular antibody-drug conjugate. So antibody-drug conjugates are the hottest modality in oncology right now. There are hundreds of clinical trials ongoing. The basic concept is: small molecules are promiscuous, they’re hard to target any particular cell type with. But antibodies are quite specific. And so, what if you were to conjugate an antibody with a small molecule or some kind of payload, and use it as a kind of precision-guided payload delivery system to a cancer cell? People have been trying this for, I guess, two-plus decades. And we looked at one of the most popular targets for these ADCs, which is TROP2. TROP2 shows up on many of these epithelial tumors—lung, breast, bladder. And the surprising thing: we found no one really knew how the complex of the antibody bound to the receptor internalizes into the cell to deliver the payload. It’s something you’d think would obviously be known, given that thousands of patients have been dosed. There are already approvals for these ADCs. And so we took a virtual cell and we queried it, and we said: what is providing the ride for this receptor across the membrane, and then after that, how does it traffic inside the cell to reach its destination? I think our model came up with a pretty clear prediction. It’s falsifiable. It seems to converge with other lines of evidence from clinical pharmacokinetics, tumor expression. And what makes it interesting is, I think it is the first prospective prediction from a virtual cell with real clinical stakes and large sums of pharma revenue on the line. It could pan out, it might not pan out, but as a class of thing that can be done with virtual cells, I think it is unique and the first of its kind.

Niko McCarty (04:03) What were the actual predictions, and do you know of anybody testing these? Are you planning to fund experimental studies to do this, or are you just kind of hoping that the big pharma companies will test your predictions?

Adam Green (04:18) We’ve scoped experimental packages with CROs to test the predictions, starting from the initial mechanism, which we believe is the co-localization of this receptor with a particular tetraspanin, a special type of protein that we think organizes the trafficking, or the internalization, of this complex into the cell. And so we have a bunch of experiments looking at two different drugs: Datroway, which is AstraZeneca’s drug, and Gilead’s drug, Trodelvy. We’re also testing what we think explains the difference in their pharmacokinetics and their clinical outcomes.

Niko McCarty (05:03) And just to clarify: you made these predictions using a virtual cell model that Markov trained, and you did so without any underlying biological knowledge about ADCs. What I want to ask, basically, is: okay, ninety percent of drugs fail, and everybody talks about Eroom’s law. My question is, in which ways would virtual cell models, like the kind that Markov is training, actually improve that success rate? Because we haven’t really seen evidence yet that virtual cell models actually improve clinical success rates of drugs.

Adam Green (05:38) Yeah, it’s a big question. Maybe it’d be useful to step back and define this term “virtual cell,” because it’s pretty nebulous. Personally, I’m not in favor of the term. I think it’s been so debased as to be beyond use. But generally, what people are gesturing at with this term is a machine learning model trained on some sort of biological data that does something. That’s not a very useful definition, but it’s such a big umbrella that that’s basically what it means. Two axes that you could parse this at are: the scope of the system you’re concerned with—so maybe it is actually at the level of the cell, as the name might imply; maybe it’s at the level of a spatial tumor biopsy; maybe it’s even at the level of clinical response. That’s one axis. And then the other axis is: how do we relate to these things as scientific objects? What do we expect of them? The distinction I like here is between simulators and specimens. A simulator, I think, is the dominant view of what a virtual cell is. It is a stand-in for experiments we’d otherwise have to run in the lab. And the thinking goes something like: experiments are really important for biology, for whatever reason. They’re slow, they’re noisy, they’re often costly. If you had a computational stand-in for these experiments that are costly, take a long time, and are really noisy, and you could run it at basically zero cost, it would somehow accelerate biomedical progress. The alternative view, and the one I think we subscribe to at Markov, is that virtual cells are going to be more useful as specimens. By this I mean: if you train a machine learning model on biological data the right way, it should learn—in making the loss go down—something about the nature of the underlying biological system you’re trying to model. The tough part is, how do you actually extract that understanding from the model and make it useful?

Niko McCarty (08:01) I want to understand how Markov trains a virtual cell. Obviously you have some belief that single-cell RNA-seq is the right sort of data to collect, but you feel that there are serious flaws with that data. So I want to go through all that, and then along the way kind of understand what other people are doing and how Markov differs. So let’s just start with a discussion of how people capture single-cell RNA-seq data, and what are the sources of bias in that data collection when somebody measures the transcripts within a cell?

Adam Green (08:38) So when you do this process of encapsulating a cell in a droplet and lysing its contents, there are a few sources of technical noise that can emerge. In the case of these poly(A) capture methods like 10x 3’, one step is just: how many of the transcripts do you capture? You can imagine a cell is quite literally a bag of molecules. It’s a bunch of RNAs, proteins, et cetera, floating in solution. And if you want to get an accurate representation of that cell, the thing you’d want to do is capture all of its contents. But due to quirks of the library chemistry, you usually only capture a subset. Initially this was quite low, under ten percent. With modern library prep techniques, you’re getting thirty-plus percent. So it’s better.

Niko McCarty (09:31) So how many transcripts does a typical human cell have floating around?

Adam Green (09:37) Yeah, it depends on the cell type—is it highly differentiated or not. But say, on the order of a hundred thousand is a rough estimate.

Niko McCarty (09:47) So you might capture twenty thousand of those transcripts.

Adam Green (09:50) The big question is, how many of the transcripts do you capture? And then downstream, what proportion of them get sequenced and show up? The field realized pretty quickly, by doing some really crafty studies, that we were not capturing all the transcripts. Now that we’ve improved capture rates pretty substantially, the following step is—

Niko McCarty (10:14) But sorry, where are we today? So it was twenty, thirty percent. How many transcripts do we capture today with the most current methods?

Adam Green (10:22) I think with the 10x GEM protocols it’s on the order of thirty-plus percent. The technical factor I think is more important is sequencing depth. Suppose I’ve captured thirty thousand transcripts—thirty thousand unique molecules. Now I need to read them out. Long story short, the amount of sequencing you do of the transcripts you capture directly determines the distribution of RNA counts. And the field knew this—

Niko McCarty (10:27) Okay. So not a huge boost.

Adam Green (10:51) —knew this in terms of expression and absence of expression: counts of zero versus ones or greater. This is called dropout. There are various terms for this, but the field was obsessed with dropout. They were saying, if I have a vocabulary of twenty thousand genes, why am I only seeing five thousand of them in any given cell? Why are fifteen thousand at zero? Is it because they aren’t expressed in that cell, or is it because I’m not capturing and sequencing them? And so that conversation, and the statistical models that followed trying to explain this, is the beginning of where our approach differs from all these other groups and how we train these models.
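
The question of why only a fraction of the gene vocabulary shows up in a given cell can be made concrete with a small calculation. The sketch below is illustrative only: the Zipf-like abundance profile, the gene count, and the depths are assumptions chosen for the example, not values from the conversation or from any dataset. It treats each captured molecule as an independent draw and computes how many distinct genes you would expect to see at least once.

```python
# Illustrative only: the abundance profile and depths are assumptions.

def expected_genes_detected(proportions, n_draws):
    """Expected number of genes with at least one count after n_draws.

    Draws are treated as independent (sampling with replacement), so a gene
    with proportion p is missed entirely with probability (1 - p) ** n_draws.
    """
    return sum(1.0 - (1.0 - p) ** n_draws for p in proportions)


n_genes = 20_000
weights = [1.0 / rank for rank in range(1, n_genes + 1)]  # assumed skewed profile
total = sum(weights)
proportions = [w / total for w in weights]

for depth in (1_000, 5_000, 30_000):
    detected = expected_genes_detected(proportions, depth)
    print(f"{depth:>6} draws: about {detected:,.0f} of {n_genes:,} genes seen")
```

Every gene in this toy cell is expressed at some level, yet many still read as zero at shallow depth, which is the ambiguity being described.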

Niko McCarty (11:31) So I think one useful way to understand this distribution when you’re doing single-cell RNA-seq would be—you’ve told me before about this analogy about urns. If you imagine each cell is an urn, and it has balls of twenty thousand different colors inside of it, and each ball is present in a different amount corresponding to the RNA transcripts. Tell me that story. Tell me about that analogy, and how you think about what these distributions actually are when we do single-cell RNA-seq.

Adam Green (12:04) Yeah, it’s a vivid metaphor—colored balls and urns. It comes from probability theory, but I think it illustrates the problem well. So imagine you have an urn. It has twenty thousand different colors of balls—red, midnight blue, green, taupe, if you like—at varying concentrations. Just assume the urn is a hundred thousand balls. You don’t know the underlying proportions of the different colored balls. This is our cell. I ask you to draw, say, a thousand balls from the urn. You get, let’s say, a hundred red, five blue, et cetera, et cetera, zero taupe. You look at the taupe and you’re like, why? Does this urn not have taupe? Does it not like taupe? Or did I simply not get lucky enough to pull a taupe ball? You do this across replicates. So I have ten urns, they all look the same. They’re the same type of urn, they have the same urn behavior. And then for some of them I get taupe balls. So if you do this across enough replicates and you actually plot the distribution of counts you get for taupe from each urn—let’s say I have a thousand urns, I do a thousand draws from each, and I just plot how many taupe I get in each cell—what it’s going to look like, assuming taupe actually is a pretty low proportion in the cell, is I’m going to have this massive distribution at zero, and then some kind of right-skewed distribution of one, two, three, four, five, et cetera. And what the field tried to do is they said, well, what if we fit distributions to these count data? What do they appear to be? And so the nature of these statistical distributions—and in particular, do we see more zeros than we should expect?—was one of the major questions the single-cell biology field concerned themselves with. 
What the field did not realize, and this is somewhat surprising given that you’d think they’d concern themselves with such questions, is that the number of draws you take from the urn determines not only the rate of zeros you see, but the rate of all the other integer counts you see.
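
A minimal sketch of that last point, assuming draws are independent so that the count of one color follows a binomial distribution (a simplification of drawing without replacement from a finite urn). The proportion `p_taupe` and the two depths are hypothetical.

```python
from math import comb


def count_pmf(k, n_draws, p):
    """Binomial probability of seeing exactly k balls of one color."""
    return comb(n_draws, k) * p**k * (1.0 - p) ** (n_draws - k)


p_taupe = 0.001  # assumed true proportion of taupe in every urn

for n_draws in (1_000, 10_000):
    probs = [count_pmf(k, n_draws, p_taupe) for k in range(4)]
    summary = ", ".join(f"P({k})={pr:.4f}" for k, pr in enumerate(probs))
    print(f"{n_draws:>6} draws: {summary}")
```

The urn is the same in both rows. Only the number of draws changes, and with it the probability of every count, not just the probability of zero.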

Niko McCarty (14:29) What do you mean by that? You mean that in some cells that gene is actually not expressed? What do you mean by the amount of zeros that you would expect to see?

Adam Green (14:38) So the field is wrestling with this question: we see a lot of zeros for many genes. Is this indicative of the gene not being expressed in these cells on average, or us just not capturing and sequencing it? Much of the field leaned toward the former explanation, and came up with increasingly convoluted statistical models to explain why we saw so many zeros. But what Valentine Svensson, and later Sarkar and Stephens, formalized in a really great model is that if you look at the actual distribution of expression where we should expect it to be, and you mix it with this measurement model (a model of what our drawing process from the urn is like, of how we take out the balls), the resulting distribution explains pretty well the number of zeros we see, and the full tail of counts. And then the big unlock for us was realizing that these two key levers I’m talking about, transcript capture and sequencing depth, affect not only the zero rate, but the full distribution of counts. And where this gets to virtual cell models is: if I have two identical urns and I take a thousand draws from one and ten thousand draws from the other, even though the underlying expression proportions might be identical, my view of what’s going on inside those urns is going to be radically different. And that informs how you want to train models on these data.
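
One way to see how mixing an expression distribution with a measurement model accounts for the zeros is the sketch below. It assumes Poisson sampling at a given depth and a Gamma-distributed expression proportion across cells. The Gamma choice and all parameter values are assumptions made for illustration. Under those assumptions the observed counts follow a negative binomial, so both the zero rate and the tail come out in closed form.

```python
from math import exp, lgamma, log


def observed_count_logpmf(k, depth, mean_prop, shape):
    """log P(count = k) for Poisson sampling mixed over Gamma expression.

    depth:     number of molecules read from the cell
    mean_prop: mean expression proportion of the gene (assumed)
    shape:     Gamma shape; smaller means more cell-to-cell variation (assumed)
    The mixture is a negative binomial with mean depth * mean_prop.
    """
    m = depth * mean_prop
    return (
        lgamma(k + shape) - lgamma(k + 1) - lgamma(shape)
        + shape * log(shape / (shape + m))
        + k * log(m / (shape + m))
    )


for depth in (1_000, 10_000):
    pmf = [exp(observed_count_logpmf(k, depth, 1e-3, 2.0)) for k in range(4)]
    print(depth, [round(p, 4) for p in pmf])
```

No extra zero-inflation term appears anywhere. The zero probability is simply `(1 + depth * mean_prop / shape) ** -shape`, and it moves with depth along with every other count.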

Niko McCarty (16:05) Just to take a quick synopsis here. So you’re saying that the single-cell RNA-seq field has collected data on hundreds of millions of cells. And you’re saying that some of those cells were processed with different library chemistries than others, which means that the capture rate is very different between cells.

Adam Green (16:15) Hundreds of millions.

Niko McCarty (16:34) And some of the cells have been sequenced much more shallowly than others, and some have been sequenced very deeply. And the big problem is that when people use all these data to train virtual cell models, that introduces biases, essentially. And we have to deal with those biases using new statistical frameworks.

Adam Green (16:53) Yeah. So the big question is: cells vary biologically, but they also vary due to some of these technical factors. Maybe the lab assistant was having a bad day, the mouse was having a bad day, it was warm out. And the question is, when you observe two cells, how do I know what is the signal and what is the noise?

Niko McCarty (17:16) People were treating all of them as just the same. So they were training these models using just aggregated data.

Adam Green (17:22) Right. So people tried to get around this in ways that were kind of naive. One thing you can do is say: rather than trying to model the individual counts, I can model frequencies. So maybe I only have a thousand transcripts I captured, but if I look at the rate—if I say I have five blue out of a thousand, call it a rate of point-five percent—maybe that’s more robust than modeling the individual counts. There are lots of reasons why this doesn’t work. Firstly, what do you normalize against? Secondly, as you get to the low-count regime of one to four, there are actually really great papers that come out of ecology showing why these sorts of transformations do not properly deal with the noise issue. Really, the crux was: the field tried, around 2023, to train some of these early virtual cell models, saying, given the expression of some genes in a cell, predict the other genes and their expression. The field tried that. And the conclusion they came to is that this approach is doomed to fail.
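
A rough illustration of why converting counts to rates does not remove the noise. This is a sketch under a binomial sampling assumption and is not the argument made in the ecology papers mentioned above. It computes the relative error (coefficient of variation) of an estimated rate at a given depth.

```python
from math import sqrt


def relative_error_of_rate(p, n_draws):
    """Coefficient of variation of the estimated rate k / n_draws.

    Assumes k is binomial with n_draws trials and true proportion p.
    """
    return sqrt((1.0 - p) / (n_draws * p))


p_blue = 0.005  # the "five blue out of a thousand" rate from the example

for n_draws in (1_000, 10_000):
    cv = relative_error_of_rate(p_blue, n_draws)
    print(f"{n_draws:>6} draws: relative error of the rate is {cv:.2f}")
```

At a thousand draws the estimated rate carries a relative error of roughly 45 percent, against roughly 14 percent at ten thousand draws. Two cells with the same underlying proportion therefore produce rates of very different quality, and the rate alone does not say which is which.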

Niko McCarty (18:22) When was that? Twenty twenty-three?

Adam Green (18:48) Yeah. The Geneformer paper was accepted into Nature in 2023. scGPT came out maybe a bit later. But these were pretty naive. Geneformer was prescient in that it was trying to do some kind of rank-based prediction. But in general, the conclusion the field arrived at was that fundamentally there is something wrong with these data, because when these models were benchmarked—and there are a couple of bearish benchmarking publications that came out late 2023, early 2024—it looked like these models did no better than really naive methods of just predicting the mean expression in the cell, for tasks like cell type classification, whatever. And I think the field then conflated the failure of these models to scale with a failure—or a poverty, some kind of innate badness—of the data.

Niko McCarty (19:17) Biology could not have scaling laws because it’s just more complicated than other domains.

Adam Green (19:22) Or some kind of appeal to causality. A lot of these lines of thinking have not been made clear, and they’re pretty ill-formed. When you push people on them, there appear to be gaps. But the basic position that was arrived at is that the sort of unsupervised pre-training and scaling we saw in text, images, protein sequence modeling is not going to work in the same way when we bring it to single-cell biology, and therefore we need a new approach.

Niko McCarty (19:54) At some point, the field decided that a cell is a bag of RNA transcripts. And if we capture good data and train models on just those RNA transcripts, we’ll be able to understand the cell as a whole. In other words, RNA is the Goldilocks molecule to capture in the cell to build virtual cells. And of course, there’s a lot of discussion today around building multimodal models that incorporate proteins or spatial information. But at some point the field decided that single-cell RNA-seq was going to be the data on which to train these models that do much more than just predict gene expression. So I’m kind of curious about that philosophy. Why did the field decide that this was the right medium with which to train their models?

Adam Green (20:44) Yeah, I think the evidence that people subscribe to that view is that there are billions of dollars going both to for-profit startups and nonprofits to generate RNA readout data. That definitely seems to be the consensus. But I do actually think there are reasons to believe that RNA might be preferable to protein, even if you had similar technology for doing single-cell proteomics. And this basically gets back to this statistical view, the idea of compressed sensing. Like we said before, you have these urns; in any cell, maybe you have on the order of a hundred thousand balls, and you do some draws from them. If you need to look at the proteome, you are talking two to three, maybe four orders of magnitude more balls floating around, just because every RNA transcript can be translated into many copies of a protein. Meaning that if you want to get an accurate representation of what’s going on in that cell, you need to do more draws. But I think the claim would be, and this is the strong claim I subscribe to, and I think our white paper gestured at it, that cells encode their state in RNA, in the epigenome, in the proteome. But if you learn a good enough model of a modality that has enough signal in it, it will converge to a shared statistical reality that allows you to predict something like the proteome, and maybe even subcellular localization of proteins, as we showed in our paper. Does this mean that RNA is the modality to rule them all in the limit? Probably not. There are lots of reasons why it’s suboptimal. The field has converged on RNA as being the thing they’re collecting. And then the key distinction becomes, and this relates to the earlier conversation about the nature of these data: do you just collect observational data of cells? Just cells doing their thing, hanging out? Or do you need perturbational data? I have a cell, I have some kind of exogenous intervention, I apply a small molecule, I knock out a gene, I overexpress a gene, and what happens?
The goal in either case is to learn a world model of the cell, or a virtual cell. Given that we’ve converged on RNA as being the modality, the question is, which of these two data sources do you need? Is observational sufficient, with a little bit of perturbational sprinkled on top? Or do you actually need massive amounts of perturbational data to learn a so-called causal model of the cell?

Niko McCarty (23:29) What is the thesis? So Arc Institute, of course, is training their virtual cell models using perturbation data, right? The pre-training is done using perturbation data. What is their thesis as to why that’s the right framework? Because my naive guess, as an outsider, would be to say that of course we should pre-train the model on just observational transcripts, and then fine-tune for whatever task we want that virtual cell to do. And so I’m genuinely curious as to why such a big segment of the field in virtual cells is training specifically on perturbation data.

Adam Green (24:07) Yeah. I won’t speak about specific actors, but I’ll speak about the camp as a whole, and this is the dominant camp. So I’m going to give an extended analogy to the field of NLP, natural language processing, because I think it has already undergone this kind of transition that the biological world models field, of which virtual cells are a subset, is currently undergoing. And in retrospect, why people went down this path will become clear. Imagine it’s the mid-2000s, you’re working in NLP, you want to build machine learning models for language. Say you’re really concerned with translation: I want to build a model that can translate from English to French. Naively, you might think, well, how about I go talk to a bunch of translators, collect a bunch of parallel texts, and then train a model directly on that? Or you might think, language has grammar and syntax, and we can decompose it into these Chomskian trees, so maybe I need to go collect a bunch of that data to teach my model what proper syntax and grammar are. And so this was really the dominant mode of NLP for many decades. You might call it big parallel corpora (dating back all the way to the Rosetta Stone) and big treebanks, these trees decomposing sentences into their structure. But what people started to attempt in the early 2010s is, they said, maybe we can get good general-purpose representations of language, and then fine-tune them on a tiny bit of data to do these tasks we care about, like translation. And so there’s a long history of these unsupervised language learning approaches. But basically the watershed moment was GPT-1, by Radford et al. in 2018, where they showed that if you take a large corpus of data (BookCorpus), you take a general-purpose model, the Transformer, which was invented by Google a year prior, and you do this basic prediction task of “predict the next token,” the model develops general-purpose representations that can then be transferred to downstream tasks such as translation, allowing you to get away with far less translation data to be good at translation.
Now we look back on this, now that everyone is scaling-pilled, and here I’m referring to Richard Sutton’s bitter lesson. The potted version is that the history of machine learning shows that approaches that try to build in how humans think things should work ultimately end up losing to approaches that leverage computation. The ML field has completely swallowed and assimilated the bitter lesson pill, and now quite directly believes that as you scale these models up on a big observational or unsupervised dataset with the right objective (in this case, with Radford et al., predicting the next token), they will develop arbitrarily capable representations useful for downstream tasks. If you said that in 2018, people thought you were insane. When you say something like that now, in 2026, about biological world models, people still think you’re insane.

Niko McCarty (27:32) What is the bitter lesson for biology, in your worldview? You’re saying the field is training these virtual cells on paired data, these perturbation data, but instead we need what?

Adam Green (27:45) Our belief is that biological world models—this idea of training machine learning models on data collected from biological systems—is going to undergo, or already is undergoing, the same evolution that NLP underwent, or image modeling, or protein sequence modeling, which I’d say is distinct from biological world models. And there’s going to be this sloughing off of inductive biases, of human priors on what the model should know. The things we care about will ultimately be learned through a bunch of unsupervised pre-training, followed by a tiny bit of post-training on the things you care about, like perturbation data. And so Yann LeCun’s metaphor of the cake is quite useful here. He says, if you’re making a cake, the big thing is the sponge cake—stack it up; the icing on top is secondary; and then the cherry is tertiary, on top of that. But you don’t have a cake if you don’t have the spongy layers. And so the field’s just going for straight icing. We’re saying, no, in principle you can bake a cake using these data. You just have to understand the data-generating process and match it to the right objective—just as in the history of NLP, image modeling, you had to find the right objective for the data.

Niko McCarty (29:06) And what is the right objective? Okay, let’s talk about that. What is generative ranking? This has to do with how you deal with the zeros, right?

Adam Green (29:09) Generative ranking. The zeros, and the whole distribution of counts. So to return to the colored balls and urns: imagine I have two identical urns, they have the same content inside them, the same number of balls. I do two draws of a thousand each from the two urns. I look at the data, and I’m like, okay, this one has one of this color, this is zero of that color, and so on. I know this is due to technical artifacts. How do I train a model across such data? Our claim is that if you want to train in the manner of a GPT-1-style model—large, unsupervised, learns with more data—you need a loss objective that is able to abstract away from the noise in these data, the randomness in two sets of draws from the urns, and pay attention to the signal. And so the punchline is that the ordinal structure—that is, the ranking structure of “gene A is expressed more than gene B, more than gene C, more than gene D”—is more robust to these library chemistry technical artifacts.
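
The claim about ordinal structure can be illustrated with a toy comparison. The two count vectors below are hypothetical, standing in for the same cell read at roughly one-fold and ten-fold depth. The sketch computes a Spearman correlation by hand, with tied values sharing their average rank. Real draws are noisy, so ranks are only approximately preserved in practice. This example shows the idealized case.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


shallow = [120, 40, 9, 3, 1, 0]      # hypothetical counts at low depth
deep = [1150, 410, 95, 28, 12, 2]    # hypothetical counts at about 10x depth

print("largest count gap:", max(abs(a - b) for a, b in zip(shallow, deep)))
print("rank agreement:", spearman(shallow, deep))
```

The raw counts differ by an order of magnitude, while the ordering of genes is identical. A loss that looks only at the ordering sees the two cells as the same.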

Niko McCarty (30:20) You mean more robust than either counts or frequency of counts? It’s better to just say gene A is the highest, gene B is the second highest.

Adam Green (30:29) Yes. And so in our paper we test this pretty rigorously, doing these ablations, giving the best possible chance to these other types of models at different loss functions, and we isolate pretty clearly that the loss function appears to be the missing ingredient that unlocks scaling on unsupervised data and allows you to bake this cake. Now, the way we actually got there is a really long story, and it was kind of accidental, but—

Niko McCarty (30:55) Tell me that story. How did you come up with this approach? It’s called geometric Plackett-Luce.

Adam Green (31:02) Maybe to set it up with the balls and urns, and to make it more concrete, the task is something like this. I have a hundred million urns. I do draws of varying depths from them. Now I have this task: I want to learn a general model of “urn-ness.” Like, in this type of urn, I tend to see this color ball; these two colors tend to covary across different types of urns. This is a machine learning problem. The challenge you run into is, again, the underlying signal for learning this is really noisy. So maybe in one urn, if I pay attention to the zeros, I’m just paying attention to technical noise, and that’s going to dilute my signal. Maybe there’s variation purely in who’s doing the draws from the urns, and I pick up on that. Or which lab it’s being done in. And so the task becomes: given a set of draws from an urn—let’s say I draw a thousand balls, I put them behind a veil, I don’t tell you the colors. I reveal to you the first color is red. I’m going to say red is in the middle of the ranking distribution. Given that, I want you to predict the rest of the balls and their relative orderings. So when you do this, what does your model have to learn? It’s not predicting the true underlying proportions. You can’t see those unless you draw all the balls out of the urn. But it’s trying to predict, given something I know about the library chemistry, given I know how many balls I drew from the urn, what does this type of urn tend to look like? Which balls does it tend to express? If I now know that it expresses this ball at high rank, it gives me information about the rest of the balls in the urn. 
So the way you actually parameterize this is the result of an evolution of the loss function that took multiple years of empirical iteration, and all credit here is due to Glenn, a brilliant ML engineer who figured all of this out. And the funny thing is, in the end we converged on what turned out to be a hundred-year lineage in ordinal ranking, dating all the way back to this 1927 psychophysics paper by Thurstone. He’s asking this question of inter-rater reliability. If I have balls of different weights, or if I have sounds of different pitch, and I ask person A which ball is heavier, or I ask person B which ball is heavier, which sound is higher-pitched or not, there’s going to be variation in that process. But generally speaking, these people are going to agree on which balls are heavier, which sounds are higher-pitched. Whereas if you ask them to tell me the weight in grams of this ball, or tell me the absolute pitch of that tone, there’s going to be much more noise.

Niko McCarty (33:57) I see. So you eliminate noise by asking people for just ordinal rankings.

Adam Green (34:03) Yeah. I wouldn’t say you eliminate noise, but you’re more invariant to it. And so this is a 1927 paper from psychophysics. I didn’t actually know about it until very recently. But it turns out there’s an extremely rich lineage pursuing this idea of ordinal ranking, scales of measurement, types of measurement in psychology, for instance, that eventually converged on this model we ended up using, called geometric Plackett-Luce. So there’s this really rich, nearly century-long history, dating all the way back to Thurstone, on ordinal comparisons and modeling them: preferences, Likert scale data, “rate this on a scale of one to seven,” “what is the severity of your symptoms?” And ultimately, what it converged on, in the model that we use, is geometric Plackett-Luce, whose most recent applications have been ranking chocolate puddings and golfers. So it’s not a very popular concept. It’s pretty niche. But it gives you a way to model this kind of ranking problem and express direct likelihoods of “my model predicts here’s the actual permutation of rankings of colors we expect to see,” including with ties, which is a big part of what makes geometric Plackett-Luce workable.
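
For readers unfamiliar with the family of models being named, here is a minimal sketch of the standard Plackett-Luce likelihood of a full ranking. It is the textbook version without ties, not the geometric variant with ties that Markov describes, and the gene names and scores are hypothetical. Items are chosen one at a time, each with probability proportional to the exponential of its score among the items not yet ranked.

```python
from itertools import permutations
from math import exp, log


def plackett_luce_log_likelihood(scores, ranking):
    """log P(ranking) under standard Plackett-Luce (no ties).

    scores:  dict mapping item -> real-valued score (higher ranks earlier)
    ranking: items ordered from highest to lowest
    """
    remaining = list(ranking)
    total = 0.0
    for item in ranking:
        # Probability that `item` is picked next among the unranked items.
        log_norm = log(sum(exp(scores[j]) for j in remaining))
        total += scores[item] - log_norm
        remaining.remove(item)
    return total


scores = {"gene_A": 2.0, "gene_B": 1.0, "gene_C": -0.5}  # hypothetical model outputs

observed = ("gene_A", "gene_B", "gene_C")
print("log-likelihood of observed ranking:",
      plackett_luce_log_likelihood(scores, observed))

# Sanity check: probabilities over all possible rankings sum to one.
print(sum(exp(plackett_luce_log_likelihood(scores, perm))
          for perm in permutations(scores)))
```

Because the result is a proper likelihood over permutations, its negative can be used directly as a training loss. Handling ties, which are common among low counts, needs an extension that this sketch does not attempt.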

Niko McCarty (35:30) I see. And just to clarify, Markov—so when you train a virtual cell, you’re taking existing public single-cell RNA-seq data, you’re applying this geometric Plackett-Luce thing to all the data, and then you’re doing pre-training on that. And I guess my question is just, tell me about the scaling laws that you’re seeing. Does it seem like biology has enough data? Or do we actually need more data of this kind? What are the existing flaws in the amount of data we have, and the structure of that data, that you’re observing?

Adam Green (36:12) So if you go back a couple of years, when I started training these models—I don’t have a background in ML or biology. I tried to figure out how to do this, I cracked open the PyTorch textbook, I’m hand-coding reshapes on tensors. You just try the naive thing of, maybe let’s predict raw counts, or maybe let’s do this ranking thing, but in a much more naive way of just predicting the names of the genes. And at that point in time, it wasn’t clear if these scaling laws would emerge. And so what has transpired—and again, all credit to Glenn, who has figured this out—is that once you find the right loss objective to match the data, they do appear to obey clear scaling laws. And there is plenty of juice to be squeezed out of these observational data. They learn general-purpose representations that transfer to downstream tasks. And this is—again, to map back to the NLP space—GPT-1 showed that in principle you can train a model on a bunch of these data and it transfers to these downstream tasks. What GPT-2 then showed is that as you scale up the amount of pre-training and the size of the model—the more knobs that are available to learn—the model gets better at these downstream tasks. So we looked at this for our model as we scaled it up to a billion parameters plus, and we said, how does it do on downstream tasks like perturbation prediction, which is really the motivating task that most people think about when they think about virtual cells? And what we found, quite excitingly, is that as you scale the model up and it saw more and more observational data, then you fine-tune a tiny bit of perturbational data, and then you evaluate it on perturbations it has not seen in a given cell type—it gets monotonically better at that task. So much so that it beats the current state-of-the-art model that was pre-trained on perturbational data with multiple injected knowledge sources, specifically for the task of perturbation prediction.

Niko McCarty (38:27) So to clarify: when you were initially training virtual cell models, you tried to train them using just count data, you tried frequency data—and you’re saying that you did not see scaling laws with these other ways of treating the data?

Adam Green (38:45) Not as robust. And so in the paper we do a really clean ablation, where we say, let’s keep the compute budget the same, the data the same, but only vary the loss function. And then we train negative binomial models, which is a way of parameterizing the counts; geometric models, which is the simpler special case of the negative binomial; proportions; mean squared error over the counts; and then geometric Plackett-Luce. And then you can say, given a fixed compute budget (the number of gene tokens it’s seen), how well does the model then do on some held-out evaluation set? So I think it was healthy lung, or we had a couple of cell types we held out from pre-training. And then you ask, say, I show you fifty percent of the genes in the cell, predict the remaining fifty. So that’s a binary prediction task of, is it a one or is it a zero? We also looked at a ranking evaluation using Spearman correlation. And what you see is that the geometric Plackett-Luce scales the best, and monotonically. It increases on the Spearman metric. It gets better and better as you scale it up with more pre-training. Surprisingly, the mean squared error and the proportional models do pretty well. I wouldn’t have expected that beforehand, but they don’t scale as well as the geometric Plackett-Luce. And then I think the truly surprising fact for the field is that the models that did worst were those that explicitly parameterize the count distribution. So geometric collapsed mid-training at multiple scales. And then negative binomial, which provides more distributional flexibility (there are more knobs you can fit), collapsed even more catastrophically. It’s somewhat puzzling, because the field has been climbing this distributional complexity ladder of “let’s try to fit the integer count distributions better and better and better.” But our claim, what we think the paper shows, is that no, that is simply the wrong abstraction layer to operate at. You’re climbing a ladder on the wrong wall.
In fact, the right objective is going to be fitting the ranking structure, not the raw counts. And when you do that in the right way, the model seems to scale. So that gave me conviction that there’s something there. Now, did we know earlier on? Yeah, we saw with earlier models and earlier ranking approaches that there’s some kind of scaling. But this is so robust, and scaled into the multi-billion-parameter range, that it led me to believe that this truly might be the final approach for this sort of noisy multi-set modeling.
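
To make the held-out evaluation concrete, here is a minimal sketch of the ranking metric, assuming a model has already produced one score per gene. The function names, the 50 percent split, and the toy numbers are illustrative assumptions, not the paper's evaluation code. In the real setup the model would see the visible half of the cell before scoring the hidden half; here the scores are simply passed in.

```python
import random

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def held_out_spearman(true_counts, predicted_scores, hidden_fraction=0.5, seed=0):
    """Hide a random subset of genes and score the ranking on that subset only."""
    rng = random.Random(seed)
    genes = list(range(len(true_counts)))
    rng.shuffle(genes)
    hidden = genes[: int(len(genes) * hidden_fraction)]
    return spearman([true_counts[g] for g in hidden],
                    [predicted_scores[g] for g in hidden])

# Toy cell with made-up counts; the "model" scores are a monotone transform of them.
toy_counts = [0, 5, 2, 9, 1, 7, 3, 8]
toy_scores = [2 * c for c in toy_counts]
result = held_out_spearman(toy_counts, toy_scores)
```

Because Spearman only looks at ranks, any monotone transform of the true counts scores perfectly, which is the sense in which a ranking metric ignores the raw count scale.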

Niko McCarty (41:23) So just to clarify, you’re saying the industry norm for training these models is still to include all the zeros that we see. So if an urn has twenty thousand balls and you only pulled out five thousand of them, when they do the pre-training, they keep the fifteen thousand zeros.

Adam Green (41:41) In most cases, yes. So if you’re predicting proportions, you’d say, I have gene A, I’m going to predict some proportion and then compute a loss. If it’s a zero, compute a loss against zero. Or if I have some kind of parameterized count model like geometric, I’m going to predict my single parameter for the geometric, and then I say, what is the likelihood of observing a zero given this distribution—and that goes into your loss. But yes, no one has taken ranking seriously and actually extended it to a full closed-form likelihood. Geneformer was a preliminary attempt, our earlier models were a preliminary attempt, but this is the first time it has been done performantly at scale and shown to work.
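
As a rough illustration of how zeros enter a count-based loss, the sketch below uses a geometric distribution on {0, 1, 2, ...} with success probability `p`. That parameterization, the single shared `p`, and the toy cell (15,000 zeros and 5,000 genes with a count of one) are assumptions for illustration; a real model would predict one parameter per gene.

```python
import math

def geometric_log_pmf(k, p):
    """log P(K = k) for a geometric distribution on {0, 1, 2, ...}: (1 - p)^k * p."""
    return k * math.log(1.0 - p) + math.log(p)

def zero_share_of_loss(counts, p):
    """Fraction of the total negative log-likelihood contributed by zero counts."""
    terms = [-geometric_log_pmf(k, p) for k in counts]
    zero_terms = sum(t for k, t in zip(counts, terms) if k == 0)
    return zero_terms / sum(terms)

# Toy cell: most genes are unobserved zeros, the rest have a single count.
toy_counts = [0] * 15000 + [1] * 5000
share = zero_share_of_loss(toy_counts, p=0.5)
```

In this toy setting the zeros account for more than half of the loss, even though they may reflect shallow sampling rather than true absence.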

Niko McCarty (42:27) Okay. So you put out this white paper, basically saying, here’s how we should treat our data using geometric Plackett-Luce. Do you have any evidence that the other people in this space are now adopting that to train their own models? Or is Markov kind of doing it on its own?

Adam Green (42:47) As far as I know, we’re doing it on our own. The implementation isn’t easy. We try to provide some guidance in the paper—it’s quite a long supplement. But I don’t think it’s been adopted.

Niko McCarty (42:58) Presumably people are taking what you say seriously. When you put out this white paper, you have evidence that people in the field are reading it and talking about it.

Adam Green (43:07) Yeah, I know the paper’s being read. I think some interpret the views expressed as hostile to the perturbation prediction crowd, but I think they’re complementary. But—

Niko McCarty (43:25) The only thing you’re really changing that is presumably antagonistic to the existing field is, you’re saying we should pre-train on just observational data, and then fine-tune on the perturbation data if you want your cell to do perturbation prediction. But presumably people should not take exception to the geometric Plackett-Luce stuff, because that’s just how you treat the data, right?

Adam Green (43:49) Yeah, I don’t think people take issue with the loss function. But on the perturbational data—and again, these issues are kind of related, in that what motivated collecting all this perturbational data was, in part, the assumption that training observational models will not work. And so if you show that, no, in fact there’s signal in these data that you can extract, that slightly impugns the motivations for perturbational data, saying maybe we don’t need to be allocating so much money to generating these data. And there’s a lot riding on them. Nonprofits are putting hundreds of millions of dollars into generating these data. There are startups that have raised a billion dollars-plus and built bespoke models for these data.

Niko McCarty (44:37) And tell me what their thesis is. We don’t have to name anybody, but the field is collecting perturbation data because—the way I understand it—they think that drugs, for example, are perturbations. So if we could figure out how a drug changes the transcriptome of a cell, maybe we could figure out ways to change the state of that cell back into a healthy condition. And so their assumption must be that in pre-training on perturbation data, we’ll be able to make drugs better. In silico, we’ll be able to predict which drugs are more likely to push a target cell into a particular state. Is that kind of the thesis of these players?

Adam Green (45:28) Presumably. And it gets to this specimen-versus-simulator dichotomy. But I think where that point of view—that if we had a better simulator it would accelerate drug discovery—goes wrong is when you actually play this out and you think about what leads to a drug candidate getting sent down the pipeline and succeeding. What are the actual decisions being made? And is experimental velocity the rate-limiter currently? Or is it something more like understanding, or how quickly you can use experimentation in the service of navigating through drug space? A lot of these things are blended together. And again, it’s hard to figure out people’s real motivations. They appear to be of the belief that more experimentation will lead to better drugs faster. I am of the belief that—

Niko McCarty (46:08) It—

Adam Green (46:23) —you could hand me a black-box simulator of “here’s what would happen if you take this cell state and you perturb it with this drug,” and it wouldn’t meaningfully move the needle on actual drug success rates currently.

Niko McCarty (46:35) What do you mean by “search the space”?

Adam Green (46:37) Yeah. Imagine I have all possible cell states—thousands, tens of thousands of different cell types and different microstates—and I apply, of the space of all possible small molecules, the set that are drug-like, and I can predict what would happen. Does that actually get you to drugs that make patients healthier? Well, if you can only model a single cell, probably not. If you want to extrapolate to larger and larger systems, you need some notion of mechanism. So I think the era of virtual cells as simulators will be short-lived, and probably not contribute all that much to downstream drug R&D productivity.

Niko McCarty (47:23) So your thesis, just to clarify one final time, is that by pre-training on just observational data and then building sparse autoencoders on top of it—which I’ll ask you about—we’ll be able to build up a deeper understanding of cells. So they won’t only be simulators of drug perturbations; they’ll actually be world models, or cell models—so not simulators, but what you’re calling specimens, almost like a model organism in silico. So the end goal of what you’re doing is not necessarily to make better drugs—that’s a possible outcome if it works. It’s more to understand the mechanism of how cells work.

Adam Green (48:04) Yeah, I think long term the ambition is to solve biology. Short term, make useful world models of biology that can be applied to everything from discovering better drug targets, figuring out the mechanism of action of existing drugs, patient stratification, biomarker selection. But the core philosophical belief undergirding specimens—and this is going to sound kind of highfalutin or philosophical—is that cells are biological agents. They are embedded in their environment, and they’re each doing some kind of task. This is encoded in the distribution of molecules inside them. So you’ve got a neighbor, he’s butting up against you, you get sent a paracrine signal, you send one back. But these are fundamentally agentic systems—not in the sense of reasoning agents, but technically agents that are trying to minimize free energy. So we can talk about the free energy principle. But they themselves have some model of the world encoded in the distribution of molecules. And if you can recover, using machine learning, a model of these molecules, you can in some sense recover what the cell believes, has hopes, dreams, aspirations—maybe. Probably not. And then with that, if you are able to dissect that black-box model, that’s a proxy for the cell’s internal model of its environment. Then you can figure out the best ways to poke and prod it to get it to behave how you want.

Niko McCarty (49:36) So I think what would be helpful is if you just walked me through the full steps by which you train your virtual cell. Where does the data come from? How do you treat the data, and so on from there?

Adam Green (49:48) Yeah. So back when I started training these models, and other people started training them, like 2023, the data was extremely messy. Luckily, now there are these massive standardized datasets like scBaseCount from the Arc Institute, CELLxGENE from the Biohub, that give you pretty easy access to these data. So you get these data, and your individual data object is a cell and an associated set of counts for all the genes in the vocabulary. And then you have some metadata, maybe—the patient had this disease, this is their sex, this is the tissue of origin, maybe cell type labels. And then to do the actual generative ranking under the hood, it’s just a standard transformer architecture. And then the question is, how do you feed in the data? And how do you loss it? What is the way you extract the signal from it? So the basic setup is: I have a cell, I’m going to look at its genes, I’m going to randomly permute them. Let’s say I’m going to randomly permute the expressed genes—the genes that have counts of one or greater—and I’m going to show you one of them. I’m going to tell you its rank. And then I want the model to predict the rankings of all the other genes, given this first gene. And so you can actually set this up in an autoregressive manner to make it nice and fast. And then after I see that, I make my prediction. And then I calculate my loss using this geometric Plackett-Luce objective: given the strength parameters, the thetas predicted by my model, what is the likelihood of observing the actual gene expression ranking? And then you reveal another random gene, you do that, and then you do that successively across hundreds of millions of cells. And in so doing, the model learns a surprising amount.
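
For readers unfamiliar with Plackett-Luce, the sketch below computes the log-likelihood of a full ranking under the standard (textbook) Plackett-Luce model, given one strength parameter per gene. It is not the geometric variant described in the interview, whose details are not given here, and it ignores ties and the one-gene-at-a-time reveal. The function names and toy values are assumptions.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def plackett_luce_log_likelihood(thetas, ranking):
    """Log-likelihood of `ranking` under standard Plackett-Luce.

    thetas:  one strength parameter per gene (higher means ranked earlier).
    ranking: gene indices ordered from most to least expressed.
    At each position, the gene is "chosen" from those not yet ranked with
    probability softmax(theta) over the remaining genes.
    """
    total = 0.0
    for pos in range(len(ranking)):
        remaining = [thetas[g] for g in ranking[pos:]]
        total += thetas[ranking[pos]] - logsumexp(remaining)
    return total

# Toy example with three genes and made-up strengths.
toy_thetas = [2.0, 1.0, 0.0]
agree = plackett_luce_log_likelihood(toy_thetas, [0, 1, 2])     # matches the strengths
disagree = plackett_luce_log_likelihood(toy_thetas, [2, 1, 0])  # reversed order
```

A training loss would be the negative of this quantity, averaged over cells; rankings that agree with the predicted strengths receive a higher likelihood than rankings that contradict them.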

Niko McCarty (51:41) And how do you do the fine-tuning? I guess it depends on what you want to achieve, right?

Adam Green (51:45) The only fine-tuning task we look at—you have this model, it’s learned representations, you can do a lot of different things with them. Perturbation prediction, as I said, is the main task the field has concerned itself with. It’s the one they really care about for benchmarking. Personally, I don’t care that much about perturbation prediction, but if you want to show your model’s the best, you have to do it. So what you do—people abuse language here, they mean lots of different things by what is out of distribution, what is in distribution—but the basic task for cross-cell transfer is saying, in the Replogle dataset, I have four cell types. Okay, I’ve seen roughly 2,000 perturbations in each cell type. And then per perturbation, per cell type, I’ve seen 100 of that—I’ve seen 100 cells, they’re HepG2, they’ve been perturbed with gene X, knocked it down. Now I’m going to show my model that. So I’m going to show it knockdown of gene X in HepG2, another cell type, another cell type. And then I’m going to ask you to predict what would happen if you knock down this gene in a cell type where you haven’t seen that perturbation. And so the basic fine-tuning, in this case—we want to go back to counts so we can compare our model against these other models on the same sorts of metrics. So you just take a new head at the end of the model, and instead of trying to predict ranks, it just predicts counts. And so you fine-tune it on these data, and then you evaluate it on the held-out perturbations in the cell type you’re evaluating on, then see how that performance improves as you take bigger and bigger models and do this fine-tuning.

Niko McCarty (53:19) And so what is the current state of the art on perturbation prediction with a virtual cell? And you’re saying that you actually beat that with the new Markov model?

Adam Green (53:28) The previous state of the art—it came out a couple of months ago on this particular Replogle benchmark—was Excel, from Xaira. I mentioned this model before, but it’s quite cool. They tried to scale the model up. It’s a really cutting-edge diffusion architecture. It does have these injected knowledge sources; they pre-trained on perturbation data. But when evaluated on this exact task—we matched the way they set the task up and the way they evaluated other models—we found that on the metric of mean absolute error, which is the most basic metric you can think of (what is your average L1 error in predicting gene expression in the held-out cell-by-perturbation pairs), our model beat state of the art.
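
The metric itself is simple enough to write down. The sketch below computes mean absolute error per held-out pair and then averages across pairs; that two-level averaging and the function names are assumptions, since the benchmark's exact aggregation is not described here.

```python
def mean_absolute_error(predicted, observed):
    """Average L1 error across genes for one expression profile."""
    if len(predicted) != len(observed):
        raise ValueError("profiles must have the same number of genes")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)

def benchmark_mae(pairs):
    """Average the per-profile error over held-out cell-type-by-perturbation pairs.

    pairs: list of (predicted_profile, observed_profile) tuples.
    """
    return sum(mean_absolute_error(p, o) for p, o in pairs) / len(pairs)

# Toy profiles with made-up expression values.
toy_pairs = [([1.0, 2.0], [1.0, 4.0]), ([0.0], [3.0])]
toy_score = benchmark_mae(toy_pairs)
```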

Niko McCarty (54:19) And so you’re training this model solely on single-cell RNA-seq data, but it’s learning things about the cell that are not RNA-based, right? It’s learning more than just gene expression data. So tell me about that. What is the model actually learning from the single-cell RNA-seq data?

Adam Green (54:39) Yeah. This is, I think, the most exciting thing, and really the driving motivation for me since I first started believing in this kind of agentic view of biological systems—that they themselves are modeling their environment, and if you can learn a model of that, you’ll learn something about what they are computing internally. So first we just looked at basic regulatory genomics. This has been done before. You can collect data on, say, the binding of a transcription factor—a specialized protein—upstream of a gene on the DNA. Or you can say, if I look at the DNA and I look at these flanking regions around genes, do I see certain motifs that match the transcription factors? There are nucleotide patterns that are predictive of whether a transcription factor will bind to that region, and therefore whether that transcription factor will regulate that gene. So we looked at the ENCODE dataset, which is pretty old—it came out in 2012, I think—which collects some of these functional genomic readouts. And the simplest task you can look at is: I have a set of transcription factors, I have a set of target genes; which of the transcription factors actually regulate which genes, as indicated by binding of the transcription factor to regions around that gene? So we evaluated our model on this. Using a really naive method, you just take the embeddings, the learned embeddings of the genes, you do a cosine similarity between them—between the transcription factor and the target gene—and then you compute an AUROC, which measures how well that similarity separates the target genes from the non-target genes. And surprisingly, our model did decently on this. It got the highest AUROC out of any of the models we tested. This is nowhere near, I should say, the performance of models built specifically for this task. And if you talk to the functional genomics people, they’ll say an AUROC of 0.57 is not impressive.
But to my mind, it’s the proof of principle that a model never trained on this task was capable of doing it—that is most exciting.
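
The probing recipe described above can be sketched in a few lines: score each transcription-factor and gene pair by the cosine similarity of their embeddings, then compute an AUROC against binding labels. The function names and toy inputs are hypothetical; the AUROC is computed as the probability that a randomly chosen true target outscores a randomly chosen non-target.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def probe_regulation(tf_embedding, gene_embeddings, is_target):
    """Score every candidate gene against one transcription factor and summarize with AUROC."""
    scores = [cosine_similarity(tf_embedding, g) for g in gene_embeddings]
    return auroc(scores, is_target)

# Toy, made-up embeddings: two "targets" roughly aligned with the factor, two not.
toy_tf = [1.0, 0.0]
toy_genes = [[0.9, 0.1], [0.8, 0.3], [0.0, 1.0], [-1.0, 0.2]]
toy_labels = [1, 1, 0, 0]
toy_auroc = probe_regulation(toy_tf, toy_genes, toy_labels)
```

An AUROC of 0.5 corresponds to chance and 1.0 to perfect separation, which puts the 0.57 quoted above in context.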

Niko McCarty (57:02) But your thesis presumably is that with more scaling—more compute, more data—the AUROC on these tasks that are not, like, regulatory tasks will just continue to improve. But nobody has tested that yet? Are you seeing that these improve with scaling laws? Are you getting better and better at these other tasks?

Adam Green (57:21) Yeah, where this departs from the perturbation prediction stuff—where we see really clear monotonic scaling—is that when we’re probing the model here, we’re just looking at the earliest layer, the embedding layer. What is the general representation of each gene? And surprisingly, you do a simple cosine similarity—do these vectors point in the same direction?—and that is predictive of this. But as you scale the models up, I was somewhat surprised to see this—though there’s precedent in NLP—it seems like that approach does worse and worse. Just looking at the embeddings—the knowledge seems to migrate into further layers of the model.

Niko McCarty (57:56) So you’ve written in prior essays about this idea of biocompute. My question is, are we biocompute-limited or data-limited? What do we actually need to build better virtual cell representations? Is it just that we need to start integrating multimodal data? What is your thesis about the new frontier of virtual cell models?

Adam Green (58:18) Yeah, my somewhat contrarian opinion is that we are, on the current margin, limited by engineering and compute, and there’s tons of juice we can squeeze out of these models if you train them the right way.

Niko McCarty (58:33) You mean with the existing data?

Adam Green (58:34) With the existing data. And there are billions of cells coming online by the end of 2026. I don’t think we’re lacking for data anytime soon. Now, the space of questions you can actually ask with a virtual cell model is a very small part of the space of broader questions about biology we want to ask, or that are clinically relevant. But what I think the ADC biomarker nomination—and using our model to predict how TROP2 is internalized and trafficked—led me to believe is that if you circumscribe the problem cleanly enough and you make certain assumptions, and you try to operate in a regime where you can parcel out the effects of certain variables, then there are many clinically relevant questions you can ask and get interesting answers to, with current models and current data budgets.

Niko McCarty (59:26) And that’s a good segue. Let’s talk about the antibody-drug conjugate stuff. To clarify: you built a model, you pre-trained on just observational single-cell RNA-seq. And then—I don’t know how you did the fine-tuning, if any—you made these predictions. Tell me about that.

Adam Green (59:46) Yeah, no fine-tuning. So the setup was, we were looking at: does our model know where drug targets localize in the cell? Does it know the functional coupling of different signaling molecules? And then we just asked—I mean, the original motivation of this was commercial. I think pharma partners would care about this. They don’t just want to see a cool toy. I think they’d care about antibody-drug conjugates. I didn’t know what those were; I had to look it up about a month ago. And so I found—okay, there are a lot of drugs targeting these things, lots of clinical trials. Surely we know how they work, right? So, to rehash, what is the basic process? You have an antibody, it’s coupled with a linker—a chemical tether—to a payload, or multiple of that payload, and then the goal is for the antibody to bind a receptor on the surface of the tumor. It gets internalized, endocytosed. And then, dot dot dot, endosomal biology makes its way to deliver the payload, in most cases to the lysosome. And so I looked at TROP2. I was like, okay, I didn’t find any literature explaining how it worked. For HER2, we know how it works pretty well—it’s clathrin-mediated endocytosis. Sorry—

Niko McCarty (1:01:07) Just to clarify, what is its role in a cancer cell, for example?

Adam Green (1:01:11) So TROP2 is—you could call it an adhesion molecule. It’s kind of a receptor. It shows up on a lot of these epithelial tumors—lung, bladder, breast—and it’s a desirable drug target because it’s pretty specific to tumors. If you deliver a payload to a specific cell type, it’s good to choose a target that’s specific to that cell type, expressed at sufficient density, shows up on the surface.

Niko McCarty (1:01:38) So these companies were building antibodies that bind to TROP2, and they’re carrying a payload that then kills those cells right after they get internalized. Okay. So you’re interrogating what about TROP2?

Adam Green (1:01:50) Yeah. So you can take TROP2 and you can say, which other genes in my model are functionally coupled to it—using this really simple method of, what is their cosine similarity? And again, the motivation was, I looked in the literature, I’m like, surely people knew how this works. How does this internalize? Apparently they didn’t. You look at the slides from the pharma companies, like AstraZeneca and Gilead, who are building incredible drugs here targeting this. Do they know how this works? You look at the slides, and it’s an empty box labeled “internalization” and “delivery to the lysosome.” And I’m like, that’s curious. Okay, so what does our model have to say about it? You look at its functional neighbors, and the ones that appear to show up are related to what you might call lipid rafts, or even caveolar—that’s not quite the right word—but the neighbors appear to be things on the surface of the cell that are in these small domains called tetraspanin microdomains. And in particular, there is this tetraspanin, this special type of glycoprotein I think, TM4SF1, that showed up. We said, okay, that’s interesting. What is this protein doing there? So you look at what this protein is, and it turns out it was discovered 40 years earlier in the same sorts of epithelial tumors. The literature shows this protein plays a role in organizing these little vesicles that internalize in the cell. So it’s like a little organizational, structural protein that shepherds cargo inside. So our model said this is the top tetraspanin. That’s one line of evidence, right?
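
The "functional neighbors" lookup amounts to a nearest-neighbor search in embedding space. A minimal sketch, assuming gene embeddings are available as a dictionary of vectors; the gene names and numbers below are placeholders, not outputs of any model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nearest_neighbors(embeddings, query, k=5):
    """Return the k genes whose embeddings are most similar to the query gene's."""
    query_vec = embeddings[query]
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in embeddings.items() if name != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Placeholder embeddings for illustration only.
toy_embeddings = {
    "QUERY_GENE": [1.0, 0.0],
    "GENE_A": [0.9, 0.1],
    "GENE_B": [0.0, 1.0],
    "GENE_C": [-1.0, 0.0],
}
top_two = nearest_neighbors(toy_embeddings, "QUERY_GENE", k=2)
```

The ranked list is only a starting point for hypotheses; as the interview goes on to note, the interesting step is checking the top hits against the literature.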

Niko McCarty (1:03:47) My question—and this is a detour, more about the business case—but if these big companies like AstraZeneca have already taken these antibody-drug conjugates to phase three trials—

Adam Green (1:03:58) Some are approved.

Niko McCarty —okay, some are approved. Why is it important that they know about the tetraspanin that associates with TROP2?

Adam Green (1:04:08) The motivating mystery is that you have a few of these TROP2-targeting ADCs. You have Gilead’s Trodelvy—and these are just the brand names—AstraZeneca’s Datroway, and I believe Merck has one as well; it might be approved. But what you see is a fascinating clinical double dissociation. So in the case of TROP2 ADCs, you see something quite interesting. If you restrict what you’re looking at to breast cancer, and you look at HR-positive breast cancer—which is the most common type of breast cancer, it has hormone receptors—versus triple-negative breast cancer, highly malignant basal-type cancer, and you look at the trials for these drugs and you try to match along lines of therapy, are they immunotherapy-eligible: you see that Trodelvy, Gilead’s drug, and Datroway, AstraZeneca’s drug, both do decently well in triple-negative breast cancer in terms of overall survival rate hazard ratios. But in the HR-positive cancer, Gilead’s drug does really well, and AstraZeneca’s drug doesn’t seem to do very well at all. So they’re targeting the same TROP2 on basically the same cancers—you can look at the trials and try to match for which line of therapy, the other characteristics of the patients. The question is, why? And then you look at the drugs and you say, how do they differ? And the main way they differ is in their chemistry, and the downstream pharmacokinetics.

Niko McCarty (1:05:41) And so you’re saying one of these drugs is not doing well and one is doing well, and your model can figure out maybe why that is?

Adam Green (1:05:49) Yeah. One of the drugs is doing well in certain patients. The other drug is doing well in two types of patients. But the trade-off you have is efficacy versus toxicity. So Gilead’s drug is highly labile. It is this Goldilocks linker that releases its payload in plasma—it’ll do it once it gets into early endosomes, it’ll definitely do it if it gets into late endosomes and into the lysosome. I’m reading this, I’m trying to say, does our model explain this in some way? What is the difference between HR-positive cancer, where Datroway doesn’t seem to do well but Gilead’s highly labile drug does, versus triple-negative breast cancer, where both seem to do pretty well? And then we asked our model about that.

Niko McCarty (1:06:36) The question is, can your model actually say something about how we should alter the drug to make it work against this other class of cancer? Are you making concrete predictions about the strength of the bond in that linker and its efficacy against different types of tumors?

Adam Green (1:06:55) I think our model is suggestive toward that. The thing I’d focus on first, and what we focus on in the piece, is not how should we alter the linker chemistry, but: can we find a biomarker that predicts whether a patient will be responsive to one drug or the other? Because there is an efficacy-toxicity trade-off here. Trodelvy has a high rate of toxicity events. It is a brutal drug to take. Datroway is more of a scalpel. When it hits, it hits very hard, but you’ve got to find the patients where that happens.

Niko McCarty (1:07:27) And so your thesis is that this biomarker might be this tetraspanin associated with TROP2?

Adam Green (1:07:33) I’m not committed to TM4SF1 as the biomarker that is going to explain all TROP2 ADC efficacy. I think what I was trying to do with the piece is say that virtual cells, biological world models, can be treated as specimens and then make predictions about biological wiring with clinical implications. TM4SF1 is part of that. It’s part of the internalization equation. What the pharmacokinetics get at in the linker chemistry is the downstream trafficking. So imagine the plasma membrane is like a border. Maybe you have someone who shepherds you across the border. You get inside. Okay, well, there are a lot of places you could go. Where do you end up? If you look at the tumors in which these drugs are being applied, many of them are epithelial tumors in apically polarized cells. All of them express this other protein at high levels that our model nominated, RAB25. And so this is the second axis of ADC biomarkers, which is trafficking. So once you get past the border, the question is, where do you end up? This is endosomal biology. This is really well-trodden territory. We know a lot about how this works. And the basic idea is, you have this vesicle that comes inside, and it needs to make a series of membrane fusions with endosomes—which just means “soma inside the cell,” like inner body—to eventually make its way to the lysosome. RAB25 is, I believe, a GTPase, this kind of marker that tells where the initial vesicle should go—which kind of endosome you should end up in.

Niko McCarty (1:09:32) Okay. You’ve nominated a target. You’ve shown that your virtual cell can do this sort of thing. Pharma companies might care about that in terms of investigating mechanisms of which populations are treatable by their—

Adam Green (1:09:47) Yeah, maybe I’ll ground it like this. When you look at a big patient breast cell atlas and you plot the different subtypes of cancer along these two axes, you look at expression of TM4SF1, which we argued is this organizer protein that helps shepherd TROP2, the ADC target, into the cell. And then you look at RAB25, which I’m arguing is a marker of: are you going to get to the lysosome, or are you going to get recycled back to the surface? You find that HR-positive versus triple-negative breast cancer patients are in opposite quadrants. HR-positive patients tend to be low TM4SF1, they tend to be high RAB25—there is high recycling going on. If you look at triple-negative breast cancer patients, they are low on RAB25, they are high on TM4SF1. They have a lot of the organizing protein to shepherd the ADC into the cell, and not a lot of the marker that tells it to recycle back out. So presumably they’re getting more directly routed to the lysosome. We believe this explains the clinical dissociation you see between Gilead’s drug and AstraZeneca’s drug. And our claim is that there are tons of trials that are failing. We think there are subsets of patients in these populations who will be responsive to these drugs, because they are, say, lower in RAB25, higher in the tetraspanin required to actually internalize the drug.
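
The two-axis picture can be sketched as a quadrant assignment. The median split, the function name, and the expression values below are assumptions for illustration; they are not the thresholds or data behind the claim above.

```python
from statistics import median

def assign_quadrants(samples):
    """Label each sample by where it falls on the TM4SF1 and RAB25 axes.

    samples: dict mapping sample name -> (tm4sf1_expression, rab25_expression).
    Each axis is split at the cohort median (an assumed, arbitrary cutoff).
    """
    tm4sf1_cut = median(t for t, _ in samples.values())
    rab25_cut = median(r for _, r in samples.values())
    labels = {}
    for name, (tm4sf1, rab25) in samples.items():
        t_label = "high" if tm4sf1 > tm4sf1_cut else "low"
        r_label = "high" if rab25 > rab25_cut else "low"
        labels[name] = f"{t_label} TM4SF1 / {r_label} RAB25"
    return labels

# Made-up expression values for four hypothetical samples.
toy_samples = {
    "sample_1": (8.0, 1.0),
    "sample_2": (7.0, 2.0),
    "sample_3": (1.0, 9.0),
    "sample_4": (2.0, 8.0),
}
toy_labels = assign_quadrants(toy_samples)
```

Under the argument in the interview, samples in the "high TM4SF1 / low RAB25" quadrant would be the ones expected to internalize the ADC and route it toward the lysosome.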

Niko McCarty (1:11:16) Okay, so I want to zoom out a little bit. Clearly there’s a business case here, that these virtual cell models might be useful tools for digging into mechanisms of biology, or at least proposing hypotheses for mechanistic interrogation. You’re proposing actual experiments that people can do, related to how these antibodies get internalized. But my understanding about Markov, and your motivation—and you keep using this term “specimen”—is that you seem to have this thesis that virtual cell models, these agentic models, are the way by which we will understand all of biology. The goal is to solve biology. Whereas for a lot of classical biophysicists, and through the history of molecular biology, the view has been that the best way to build understanding is to make often reductionist observations and then try to piece together those observations into higher principles. So there are kind of two approaches people are taking to build virtual cell models. What I would call the top-down—what you’re doing—which is to say, let’s train these large statistical models and then interrogate them to understand what they’re doing. And then the other approach, like by Markus Covert at Stanford, which is, let’s understand how the cell works, model it as a series of equations. So my question is, do you think that approach is wrong? Do you think it will be impossible to understand biology from bottom-up, reductionist observations?

Adam Green (1:13:03) Yeah, it’s a good question, and the one that’s motivated me for the past five years. Short answer, I think it’s going to be a hybrid between the two. The journey I’ve taken has been this kind of thesis, antithesis, synthesis toward: how do you extract mechanism from these black-box top-down models? Initially, when I got into the field and knew nothing about it, I became pretty skeptical of what was going on—just the rate of progress in the field, in this kind of infinite treadmill of “if we keep collecting data and recovering the mechanistic wiring of how these systems work, eventually that’s going to translate to real progress.” But you look at, for instance, what Francis Collins said thirty years ago and what he said ten years ago, and—

Niko McCarty (1:13:35) What do you mean by that?

Adam Green (1:14:03) —the field just never delivered on the progress it promised. And so I was somewhat puzzled by this.

Niko McCarty (1:14:09) Despite more and more data, right? Tons and tons of data, but you’re saying not necessarily a deeper understanding. And Eroom’s law continues, and most drugs fail in clinical trials.

Adam Green (1:14:20) Yeah. And the question was, if you only care about controlling biological systems toward salutary ends—like making people healthy—do you need understanding, or is some kind of black-box control sufficient? And so back in 2022, after reading hundreds of papers getting up to speed in biology, I came to the conclusion that the current approach is not going to work. It is not bearing the fruit we were promised. We need to go completely in the ML scaling direction. I became very bitter-lesson-pilled, and rejected what I called the mechanistic mind, which is this research ethos that pervades how most people think about biology—saying we want to carve nature at the joints, understand necessity, sufficiency. There are ideas like causality, if you believe in that. And people claimed that the more and more mechanistic understanding we had, eventually it would help us design better drugs and reduce cancer mortality rates. On the opposite pole is this black-box, “just throw more data at it,” top-down, as you called it, approach. I had not fleshed that out. But in pursuing this idea and learning more about how drug discovery and development is actually done, I’m like, okay, we have a lot of data, the models are decent, but I don’t know—maybe mechanism matters. You don’t just want a function that takes in patient state and desired resulting state and spits out a drug. For a lot of reasons, but primarily because drug discovery and development is iterative. You have to convince other people—like the FDA—of how these drugs work. And critically, we did not have such black-box models that bat out perfect drug candidates. So I’m like, okay, how do I reconcile that with my commitment to black-box, top-down machine learning models? And I came across mechanistic interpretability, which is this subfield of machine learning that says, we train a large language model, it learns to do interesting stuff—how does it do that? 
Can you actually crack open the black box, shine a light on it, and figure out the internal circuits it is using to compute the answer to a question? And when the ML people did this a few years ago, they found interesting concepts—like the Golden Gate Bridge is a concept inside the mind of one of these LLMs. And I thought, okay, what if we apply that to our model? And so we did that in late 2024. We applied this technique called sparse autoencoders, which says, when the model’s trying to predict gene expression, it’s moving around all these vectors of numbers—can you find a dictionary of features, or directions in space, that explain all these different vectors you see? And when we did that on one of these earlier models, it discovered what were plausible biological features related to gene regulatory modules, or, like I mentioned before, plasma cell differentiation state. And it even seemed to know some things about what transcription factors were predictive of one differentiation trajectory or the other. So that was pretty suggestive, I think—that this specimen approach, that these models recover something about biology, and if you know how to extract it from the black box, you can get a lot out of these systems. Now, where does that leave us? Are we going to—is that going to be the solution, just purely top-down probing these models? Or is there going to be some kind of complementary bottom-up approach?

Niko McCarty (1:18:01) What are the classes of things where you think we need the bottom-up approaches?

Adam Green (1:18:07) Yeah, I’d say it’s more, how do you combine priors from bottom-up approaches, or other models, with these top-down approaches?

Niko McCarty (1:18:14) So it’s not purely unsupervised. The winning solution would be to have an unsupervised—well, I don’t know the terms, but you’re saying in some way we need to feed in our priors.

Adam Green (1:18:25) I’d say you want to feed in information about other modalities, like maybe functional genomics, protein structural priors, and combine them with these predictions or priors you get from the black-box world models—to, again, point your evidentiary apparatus toward the interesting parts of biological space to run the experiments. The metaphor I like is: you’re in a fishing boat. One way to fish is to trawl a big net and catch everything and brute-force the space. Another way is to use sonar or whatever and try to find the interesting spots where the fish are swimming, and then go spearfishing there.

Niko McCarty (1:19:06) Just to try to summarize what you’re saying: we might have these black-box models that make initial, weak predictions, that we then guide using known biophysics or known biochemistry to refine the predictions. And that might be the way we make discoveries in the future. Is that kind of what you’re saying?

Adam Green (1:19:29) I think it’s the near-term approach. But in the data and compute limit, the biological world models become arbitrarily accurate representations of the true underlying biology, and it’s going to be one model to rule them all.

Niko McCarty (1:19:45) A biological world model. And then we’re going to have smaller models that complement this, right? Like we’ll have AlphaFold, we’ll have these other things that augment the predictions of the world model.

Adam Green (1:19:57) Probably not. No. My prediction would be there’s going to be a singular, unified model—we can debate what language it’s going to be trained in—that is going to be able to read out these different modalities we might care about. Like we showed in our paper, we train on mRNA, it learned something about protein, it learned something about DNA motif enrichment. This is going to be the generator of biological hypotheses. Maybe downstream, if you’re trying to select which of the possible experiments to run, you can bring to bear these other models about structural biology and get a different kind of prior. But I think the primary specimen that is going to be delivering not only new clinical hypotheses, but also basic biology knowledge, is going to be a unified biological world model.

Niko McCarty (1:20:50) Okay. And so you think the mechanistic guidance, your bottom-up—initially when I asked this question, you said you think it’ll be both, top-down and bottom-up. But you’re talking about bottom-up approaches of a specific kind. These are very narrow bottom-up tools, right? Like structural prediction tools. Are you against the vision of modeling an entire cell from the bottom up using probabilistic equations and hundreds of differential equations? You’re kind of saying that you think that won’t work out?

Adam Green (1:21:30) I just don’t think it’s useful. And I don’t think structural models or protein sequence models are bottom-up in this sense.

Niko McCarty (1:21:37) But why is the mechanistic model not useful?

Adam Green (1:21:40) Yeah, I think we’re talking past each other.

Niko McCarty (1:21:44) Like what Markus Covert is doing, saying, I want to simulate a cell using mathematical equations. Is that kind of thing useful?

Adam Green (1:21:53) I think it’s a fun abstraction. Ontologically, what is going on at the lowest level in systems is this kind of bottom-up, right? Molecules collide, proteins fold. Sure, that’s happening. Can you design a system of differential equations that, if you run it forward in time, accurately predicts what a cell will do? Maybe. Would it be useful? How much compute does it require? Do you end up with a—

Niko McCarty (1:21:55) What do you mean by that?

Adam Green (1:22:23) —a one-to-one map of the territory? I don’t know. If you take that for, like, molecular dynamics simulation, sure, that’s probably pretty useful. Maybe there are some processes where it is impossible to develop a good machine learning model of the dynamics, and you just need to run it bottom-up. And if that system’s governed by a well-known set of equations, then just run it forward in time. We have a pretty good idea of how quantum mechanics works. I think biology at the cell level and above is so much more complex and contingent that it is not expressible in a set of differential equations. The dynamics of biology are expressible in a very, very large neural network. But even to capture something as simple as a conditional, or an AND gate, using differential equations—certain things are just beyond. You cannot express them in these terms. And so I am against mechanism insofar as it restricts the set of tools we use to try to express biology in. I’m not against mechanism per se. I think it’s important for making these low-inductive-bias, very general models useful—again, we don’t just want a black box, that’s not useful. But I think, and this is what I argued in the 2022 essay, human legibility—our ability to understand a system—is a hard constraint on our models. And I think it has been limiting us. It has gotten us quite a lot. We know a tremendous amount about the cell, a tremendous amount about biology, but to truly accelerate biomedical progress, we may need to discard these assumptions about legibility, and—man, I really want to know what’s going on inside it—in favor of something that is a bit more black-box.

Niko McCarty (1:24:08) Yeah. Or we only illuminate very specific parts at a given time, you’re saying. You know, I think the main motive of the mechanistic approach is that it guides experiments. You can understand your knowledge base; you can actually run a simulation with known parameters and known equations and see if it matches experiment. And then that tells you something about the next experiment you might have to do, or the next measurement you might have to take, to decrease the error between these two things. But presumably, if you have good ways to interrogate a black box using sparse autoencoders to guide those experiments, then it’s trying to do the same thing from a different approach.

Adam Green (1:24:54) Yeah. And then the question becomes, will our ability to hold all of this mech-interp knowledge derived from the black box in our heads at once reach its limit? I think we’re in this period of what I’d call liminal legibility, where we are going to crack open these systems of, say, a single cell, we’re going to discover—

Niko McCarty (1:25:08) And what do you think about that?

Adam Green (1:25:22) —laws of biological dynamics that make sense to us. We can map them to existing paradigms we have of, like, this is how functional genomics works, or this is how protein folding works. But eventually the complexity of the dynamics is going to exceed our capacity to hold them in our heads. And we’re probably going to be offloading a lot of this to AI agents—not models of biological agents, but reasoning agents, which are quite in vogue. And they will be the ones using the mech-interp toolkit on these models, being the AI scientist, if you will, doing the research. And so maybe this is three years, five years, but at some point we’ll just relinquish control and admit that our attempts to understand these biological systems were a stopgap, and they truly exceed our comprehension—especially as you go beyond the scale of the single cell, to bigger and bigger systems.

Niko McCarty (1:26:24) And your vision is that we’re going to have some kind of autonomous lab with agents, interrogating our black-box models and then designing experiments, in some kind of distant future?

Adam Green (1:26:35) Why not the not-so-distant future?

Niko McCarty (1:26:36) You’re just kind of agnostic about the experimental component, right?

Adam Green (1:26:40) I don’t know about agnostic. I think one interesting question is, how much can you get through just reasoning alone, searching the literature, versus how much do you need world models? If I’m trying to pick the next experiment to run, can I just run an AI scientist in a data center in Texas for long enough, and will it spit out the right answer that you then go validate in the lab? Or is it going to be more of this iterative process, where probably biological world models will play an important role? But verification is going to be necessary regardless.

Niko McCarty (1:27:20) I think another related question, on this issue of what is sufficient for understanding, would be to ask: if we had a magic tool—and some people are actually trying to build this tool—that we could put into a cell and it would measure everything inside of it. So it would sequence the genome and tell you everything about which genes are active, which RNA transcripts are present, which proteins and proteoforms are present. Imagine we could just quantify every molecule in a cell and its position. Would that be sufficient for understanding? If we had a magic molecular sensor, if we could read everything, is that sufficient to then understand the cell?

Adam Green (1:28:07) I don’t think to understand it, but to model it—given sufficient data and enough capacity to learn about the dynamics that govern those data, yeah. But again, I think imposing this requirement of understanding is too much, because it’s a hard inductive bias on what we expect stuff to look like. If you say understanding is cashing something out in the language of differential equations, then you’re going to capture a lot of it—gene A regulates gene B, I can model that—but there’s a lot of the structure that you’re going to lose. And so I think understanding is a bit too high of a bar to shoot for.

Niko McCarty (1:28:51) I see. But isn’t that your bar?

Adam Green (1:28:54) No, my bar is control.

Niko McCarty (1:28:55) So, predictability—yeah, what do you mean by that? What does control mean in this context?

Adam Green (1:29:00) I have a system, I want it to do something—how do you do that? Now, it might sound like I’m arguing for the simulator-based approach. But I’m not. I’m saying that understanding the dynamics of the system is instrumental to this purpose of control. But I’m not wedded to understanding. If you gave me a black-box model that spat out drug candidates that caused some desired shift in state—

Niko McCarty (1:29:09) Of perturbations.

Adam Green —I think that’d be pretty great.

Niko McCarty Well, yeah, I think that’s a good place to end. So thank you so much, Adam.

Adam Green (1:29:32) Yeah, it was fun. Thanks.

Why Are Cells Small?

Niko McCarty — Mon, 08 Jun 2026 18:46:49 GMT

I’m writing an interactive book about the cell. This is the latest installment. All essays are available at burrito.bio, and the website versions include interactive elements that do not display on Substack!

A human body is built from 30 trillion cells — excluding microbes — that each arise from a lone, fertilized egg. These cells come in a multiplicity of shapes and sizes, with internal volumes spanning five orders of magnitude. The smallest human cell, a sperm, has a volume of just 30 µm³, whereas an oocyte has a volume of 4,000,000 µm³, making it the largest cell in the human body.1

What accounts for this huge range? A simplistic answer is that evolution has made each cell the size best suited to its function. Maybe sperm are small because the body needs to make many of them, and tiny cells cost less energy to make. (Sperm consist of little more than DNA and a few mitochondria, which are necessary for providing energy to spin their whip-like tails.) By contrast, an oocyte needs massive reserves of mitochondria and nutrients to support early embryonic growth. In short, every cell is as large or small as it needs to be — within reason.

But we can derive far more satisfying answers from physics.

The first major limit on a cell’s size is its surface area-to-volume ratio. Assuming that a cell is roughly spherical in shape, its internal volume grows proportionally to the cube of its radius, whereas its surface area grows proportionally to the square of that radius. In other words, a cell’s volume grows much faster than its surface area.
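To make the geometry concrete, here is a minimal Python sketch that assumes a perfectly spherical cell. The radii are illustrative assumptions (0.5 µm for a small bacterium, 50 µm for a large egg-like cell), not measurements, and the function names are made up for this example.

```python
import math


def surface_area(radius_um: float) -> float:
    """Surface area of a sphere, in square micrometers."""
    return 4.0 * math.pi * radius_um ** 2


def volume(radius_um: float) -> float:
    """Volume of a sphere, in cubic micrometers."""
    return (4.0 / 3.0) * math.pi * radius_um ** 3


def surface_to_volume(radius_um: float) -> float:
    """Surface area per unit volume (per micrometer); algebraically 3 / r."""
    return surface_area(radius_um) / volume(radius_um)


if __name__ == "__main__":
    # Illustrative radii (assumptions): small bacterium up to a large egg-like cell.
    for radius in (0.5, 5.0, 50.0):
        ratio = surface_to_volume(radius)
        print(f"r = {radius:5.1f} um  ->  SA:V = {ratio:.2f} per um")
```

For a sphere the ratio simplifies to 3/r, so a cell with a 100-fold larger radius has only one-hundredth as much membrane per unit of interior volume.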

This ratio has big consequences for cell survival. The cell’s membrane funnels nutrients into the cell and secretes waste. It’s also where the energy in a prokaryotic cell — like E. coli — gets made. If the interior grows too large relative to the membrane, the cell will not be able to produce enough energy or excrete waste quickly enough to maintain all the ‘stuff’ inside, and metabolism will slow down.

A second constraint is diffusion, or the tendency for molecules to migrate from areas of high concentration to areas of lower concentration. This migration dictates how quickly enzymes find substrates, how quickly signaling molecules reach receptors, and how often ribosomes collide with messenger RNAs. Inside a cell, nearly everything happens by chance encounters among molecules! As a cell’s volume grows, though, these encounters become less frequent (assuming the total number of molecules stays constant).

A molecule’s diffusion rate changes based on various factors. The cytoplasm is extremely crowded, for example, and so molecules spend lots of time ricocheting off obstacles, delaying their arrival at a distant location. Every protein in a cell collides with about 10 billion water molecules per second on average. The vast majority of proteins in a bacterium have diffusion coefficients of only 5 to 10 µm² per second (a measure of how quickly molecules spread through space). Some molecules also aggregate or stick to charged surfaces, further slowing their movement.2 In general, large molecules diffuse slower than small ones.

Metabolites in E. coli can diffuse from one side of the cell to the other in milliseconds, which means collisions (and cellular outcomes) happen quickly. A typical protein takes just 0.01 seconds to traverse a bacterium’s diameter (about 1 micrometer). But diffusion time grows with the square of the distance, so the same protein would take several hours to move one millimeter and weeks to move one centimeter. This is, in part, why cells are so tiny.
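The scaling behind these estimates can be sketched in a few lines of Python. This is a rough illustration, assuming the textbook relation for free diffusion, t ≈ x² / (2·n·D) with n = 3 dimensions, and a protein diffusion coefficient of 10 µm² per second (the upper end of the range quoted above). The function name and the choice of prefactor are assumptions, so the absolute times should be read as order-of-magnitude figures only.

```python
def diffusion_time_s(distance_um: float, d_um2_per_s: float, dims: int = 3) -> float:
    """Time at which the mean squared displacement equals distance squared.

    Uses t = x^2 / (2 * dims * D) for free (unobstructed) diffusion.
    """
    if distance_um < 0 or d_um2_per_s <= 0 or dims not in (1, 2, 3):
        raise ValueError("need distance >= 0, D > 0, and dims in {1, 2, 3}")
    return distance_um ** 2 / (2 * dims * d_um2_per_s)


if __name__ == "__main__":
    D_PROTEIN = 10.0  # um^2 per second; assumed value for a typical bacterial protein
    for label, distance_um in (("1 um", 1.0), ("1 mm", 1e3), ("1 cm", 1e4)):
        seconds = diffusion_time_s(distance_um, D_PROTEIN)
        print(f"{label}: {seconds:.3g} s  ({seconds / 3600:.3g} h)")
```

Because time scales with the square of distance, a path 1,000 times longer costs about a million times more time, whatever value of D is plugged in.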

With these constraints in mind, we can begin to speculate as to why various cells are shaped the way they are.

Red blood cells are tiny and shaped like biconcave discs to aid with diffusion; by abandoning a spherical shape and evolving more toward a ‘donut,’ they increase their surface area without compromising their compact volume. This, in turn, enhances their ability to exchange oxygen with cells in the body. Their small size (just 8 micrometers across) also helps them move through narrow capillaries.

In contrast, oocytes can grow so large (around 100 micrometers in diameter), in part, because they are less metabolically active than other types of human cells — and thus don’t depend so much on random collisions. They stockpile nutrients during oogenesis to wait out fertilization. Eukaryotic cells also grow large, in general, because they’ve evolved compartmentalization; by modularizing specific functions into organelles, they bring molecules closer together to help get the job done.

Cell sizes are not fixed, however, even within a single species. Cells often swell as they increase their production of proteins and metabolites in preparation for division. This is in line with biology’s only rule: namely, there are exceptions to every rule!

Case in point: a giant bacterium called Thiomargarita magnifica can extend about one centimeter in length, so large that it can be seen with the naked eye. It does so by breaking the surface area-to-volume rule, filling between 65 and 80 percent of its internal volume with an empty vacuole. In other words, it pushes most of its molecules to the cell periphery, thus shortening diffusion distances.3

Thiomargarita magnifica is a bacterial species that can stretch about one centimeter in length, several thousand times longer than E. coli. These microbes are visible to the naked eye. Credit: Jean-Marie Volland

Bubble algae (aka Valonia ventricosa). Credit: Trident’s Cove

Despite their variety, these architectures still hinge on molecules bumping into each other, guided by the immutable laws of physics. Or, as D’Arcy Wentworth Thompson mused in On Growth and Form (1917), “The form of an object is a ‘diagram of forces.’” Cells bear witness to both internal and external forces; they are constrained by diffusion and shaped by the delicate trade-off between volume and surface area.

1. In other words, the egg is more than 100,000-times larger than the sperm that swims to it. But it’s difficult to wrap our heads around such large numbers, and so it’s often helpful to rescale entities into everyday objects that we can more easily imagine. If a sperm cell were ‘blown up’ to the size of a glass marble, for example, then an oocyte, scaled in the same way, would be roughly the size of a modern refrigerator.

2. Several experiments have shown that molecules move more slowly near the densely-packed nucleoid, for example, and that these location-dependent diffusion rates are affected by things like charges between proteins.

3. So-called ‘bubble algae,’ or sea grapes, do something similar. They are among the largest single-celled organisms, up to 10,000-times larger than an average microbe (and visible to the naked eye!). The bubble algae achieve this by filling up 95% of their internal volume with a vacuole. The vacuole is filled with big sugar chains, which the cell uses to repair damage on its cell extremities. Each cell also has dozens or hundreds of nuclei, pressed tightly against the walls.

Magnet-Controlled Medicines — Andrew York & Maria Ingaramo

Niko McCarty — Fri, 29 May 2026 17:36:56 GMT

Nonfiction Laboratories is building a technology called “magnetogenetics” that could be used to control proteins inside the body — such as antibodies or enzymes — using small magnets. In this episode, co-founder Maria Ingaramo and scientific advisor Andrew York explain how they engineered a protein, MagLOV, that responds strongly to magnetic fields, why most prior attempts in magnetobiology have failed to replicate, and how the mechanism of magnetically-controlled proteins actually works. They also get into the “dream” use cases, like cancer drugs that activate only at a tumor, which might have lower toxicity inside the body.

I’m really happy with how this episode turned out. I had help from a producer, Chris Gates, who set up the cameras and lighting. The video quality is much higher than my first podcast. I’m also getting better at building up context during the interview so that we reach, and deeply discuss, a singular thesis by the episode’s end. In this episode, the question I really wanted to answer was: “How do magnet-controlled molecules actually work (at the atomic level) and what, specifically, will it take to move them through clinical trials?” Please reply to this email to send comments or feedback.

This podcast is made possible by .

Watch on YouTube or listen on Spotify or Apple Podcasts by searching for “The New Biology”.

Check out the readings and notes for this episode on my website.

Timestamps

00:00 - Opening
00:54 — Introduction
01:35 — The dream
05:38 — Why magnets vs. light or ultrasound
10:05 — The physics
17:48 — On the name “magnetogenetics”
21:25 — Birds and cryptochromes
27:09 — Why is the field filled with so much junk?
29:51 — Adam Cohen’s molecule
33:24 — Markus Meister’s debunking
38:06 — The experiment
46:22 — Finding the LOV domain
54:11 — Singlets, triplets, and cysteine
56:54 — What the magnet is actually doing
1:05:13 — The conformational-change red herring
1:12:46 — The Quantum Biology Institute
1:19:31 — Founding Nonfiction Labs
1:24:38 — How to convince skeptical investors
1:29:39 — What a magnetogenetic medicine might look like
1:38:50 — First clinical indications
1:45:12 — The regulatory path
1:48:01 — What the field needs
1:54:30 — Appendix: Whiteboard lecture

Transcript

ANDREW: This is like a once-in-a-lifetime thing. This is the coolest science I’ve ever been associated with. Let’s take a big stinking swing at it. Scientists are professional beggars, so a means of doing something that matters so much that we might be decoupled from the begging cycle — we might become the people that get to decide what science happens.

ANDREW: We had a lot of discussions about, “Is it real?” The discussions about “Is it real?” stopped when we started showing people this figure. It is a photography time-lapse of a plate of E. coli. Maria has taken pictures of the plate in the fluorescence channel while I’m waving a magnet around underneath the plate. There’s this obvious, enormous effect. Prior to showing this picture to people, they want to debate with me whether or not magnetic effects are real. After I show them this picture, the discussion turns to practical details of how to make this into a technology. No further questions on veracity.

---

Introduction

HOST: Today’s podcast is with Maria Ingaramo and Andrew York, who are two pioneers in the field of magnetogenetics, which essentially has this amazing promise: what if we could control molecules inside the body using magnets placed outside the body? Both of them are affiliated with a company here in San Francisco called Nonfiction Laboratories. The thing I really want to get at in this conversation is what it will actually take to get magnetically responsive molecules into the clinic, and what are the bottlenecks to achieving that.

HOST: So, Maria and Andrew, welcome.

MARIA: Thanks for having us.

ANDREW: Great to be here.

What magnetogenetics could mean for medicine

HOST: My first question is: if Nonfiction Labs were to invent a way to reliably engineer proteins to be responsive to magnetic fields — such that you could take this magnetically responsive module and attach it to anything, not just fluorescent proteins but antibodies and other types of molecules — what would that actually mean for medicine? How might it change the way we treat diseases?

MARIA: That’s a very good question. I think the dream of being able to control the activity of a drug in space, without taking other cues, would be fantastic. The example you gave with antibodies — the idea of having cancer drugs, which bring with them lots of side effects, and being able to minimize those by activating the drug just where you want it: that’s something we dream we can make possible.

Also, many drugs today are failing because of side effects and toxicity. Being able to rescue a lot of those would be valuable, because drug development is expensive. It would be nice to make drugs that currently aren’t working — because the patient cannot tolerate the side effects — into things that work.

HOST: So it’s not about just creating new types of drugs. It’s about: can we take drugs that failed clinically because of toxicity, make them magnetically responsive, and then somehow make the drug only active near a tumor, for example, but not toxic elsewhere?

MARIA: That is exactly the dream of Nonfiction Labs. It’s what we’re working towards.

ANDREW: The cancer drugs are the thing that occurs to us first. It’s the goal that’s quickest to explain. If I’m in an elevator with somebody, I lead with cancer drugs because everyone can imagine, “I’d want more medicine on the tumor, less side effects everywhere else.” The other applications might take a little longer to explain.

If you get an organ transplant: would you like the immune system suppressed in your whole body, or just near the organ? If you take a drug that has therapeutic activity through the whole body but damages one organ in particular — like if there’s a drug that damages your heart, or your liver — wouldn’t it be cool if we could take the edge off the drug’s potency just in the vulnerable organ and leave it potent everywhere else? Not that we’re working on small molecules or painkillers, but wouldn’t it be cool if Advil only worked in your head? I’d take more Advil if it didn’t damage the rest of me.

HOST: Do you have a sense of how many organ transplants fail because of — I know about xenotransplants, like there’s not a great track record of long-lived xenotransplants even when we take organs from humanized pigs. But regular transplants, I guess my biased sense would be that they’re quite successful.

ANDREW: And this is a fun one. So, taking successful therapies and just reducing the nasty side effects — whatever level of immunosuppression you require for any given transplantation, wouldn’t it be nice to have less? You see what I’m getting at. Whatever side effects you’re getting from your cancer drug — what if you got the same efficacy, less side effects? But I think it’s the rescue Maria was talking about that’s where the real chance to actually save some lives is: drugs that really could cure the disease, they just do something else that’s completely unacceptable. If we could rescue drugs out of the trash bin —

We’re basic scientists. A lot of where we’re coming from is a capability: what can we do with it? But the dream that we might actually save some lives — I’d like to do that before I die.

Magnets vs. light vs. ultrasound

HOST: Of course, you’re both working on magnetogenetics — I’m going to ask you about the name later, because I’m not sure it always makes sense. So you’re building magnetically responsive molecules. But there’s this bigger movement where people are making molecules that respond to ultrasound. Optogenetics is the classic that’s been around for 20 years. So we can use light, sound, magnetism — people are using electricity to control cells and tissues. What are the advantages of magnetogenetics, or magnets, that we don’t get with light or sound waves?

ANDREW: Okay, I’ll start with the easy one. As many people know — but I’ll pretend somebody needs to hear it — there is a powerful, mature toolbox of optogenetic proteins (I think I agree with you about the name when we get to that). Many cellular proteins have some clearly defined function; a scientist has gone to the trouble of engineering a modified version with a light-sensitive domain attached. When you shine light on the protein, you can turn the function up or down, on or off. Incredible tools for research — such a powerful tool for dissecting the function of a protein.

So optogenetics works in things that are transparent. Humans are opaque. If there was a disease where turning a protein on and off — and many drugs are made of proteins — turning a drug on and off in the human body in space and time was desirable, and the location in question was optically accessible (the back of your eye, or superficial in your skin), light can penetrate. Different people’s skin has different levels of opacity, so you have to carefully account for that.

HOST: People have built insulin sensors using light, right? For shallow applications.

ANDREW: Exactly. For shallow applications, there’s room for optogenetics to make powerful biotechnologies. But how light penetrates through tissue is a little unpredictable, a little out of control. Depending on whose tissue, how far it’ll penetrate. That’s the beautiful feature of magnets: they go through flesh like it’s not there. So that’s the obvious one.

The ultrasound one is a little more nuanced.

HOST: That’s what I was going to ask next. Ultrasound has lots of approved diagnostics, people use it for all kinds of stuff, it’s very safe. Does ultrasound have higher resolution than magnets? What are the trade-offs?

MARIA: I think ultrasound is amazing. By all means, let’s move the field forward in all directions. There are — this is not impossible to overcome — but at the moment the main problem with ultrasound, compared to our technology, is that we have a protein-based domain we can use to modulate the function of anything we want. In ultrasound, you have proteins that can respond to heat, for example, but that limits what you can couple to. There’s not a direct way to connect this domain to control the function of the protein it’s attached to. There’s beautiful work controlling transcription through heat-sensitive proteins, but all your cells respond to heat one way or another, while magnets are completely orthogonal.

The other way to couple it to the biology is by modulating ion channels, which is amazing for things like the brain, but that limits the functions you can control. With our system, we can take any protein — a structural protein, an enzyme — and change binding, change catalysis.

Another advantage: with focused ultrasound, you need to focus it, so covering a large area becomes a limiting factor. With magnets, we cannot focus magnetic fields, but we have a lot of control over the areas. We can do big areas, and we can also do areas of no magnets that are small.

ANDREW: This is cute. Just a little physics nerdery: Earnshaw’s theorem says there is no local maximum of a static magnetic field. That’s one of its consequences. The cool part is what it doesn’t say: it doesn’t say there’s no local minimum. So we can’t focus a static magnetic field to make a single focus point the same way you could with ultrasound. Seems like a weakness, but (both Adam Cohen and Hunter Davis taught us this) you can put two north poles, north-to-north, and by symmetry right in the middle, no field. Everything cancels. As you move away from that point in any direction, you’re jacketed by a non-zero region of field. So you can have a point null surrounded by a volume of non-null. You can scan that null through a sample to 3D-define, to 3D-print, the region where you want your protein to function, if your protein functions in the absence of field.

So, much like focused ultrasound, we can pick out a point, we can scan around the point to hit a region, to paint a region. Unlike focused ultrasound, Maria’s technique can also just flood the area. If you want to activate a liter, a cubic inch, a cubic centimeter — you can do that with a magnet.
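The null point Andrew describes can be checked numerically. The sketch below is a simplified illustration, not the setup used at Nonfiction Labs: it assumes each magnet is an ideal point dipole, and the magnetic moment (1 A·m²) and the 5 cm half-gap are made-up values chosen only to show the symmetry. The function names are hypothetical.

```python
import math

MU0_OVER_4PI = 1e-7  # tesla * meter / ampere


def dipole_field(moment, source, point):
    """Field (tesla) of an ideal point dipole: k * (3 (m . r_hat) r_hat - m) / r^3."""
    r = [p - s for p, s in zip(point, source)]
    dist = math.sqrt(sum(c * c for c in r))
    if dist == 0.0:
        raise ValueError("field is undefined at the dipole itself")
    r_hat = [c / dist for c in r]
    m_dot_r = sum(m * c for m, c in zip(moment, r_hat))
    return [MU0_OVER_4PI * (3.0 * m_dot_r * rh - m) / dist ** 3
            for rh, m in zip(r_hat, moment)]


def opposed_pair_field(point, half_gap_m=0.05, moment_am2=1.0):
    """Two dipoles on the z axis with opposite moments, so like poles face each other."""
    lower = dipole_field((0.0, 0.0, moment_am2), (0.0, 0.0, -half_gap_m), point)
    upper = dipole_field((0.0, 0.0, -moment_am2), (0.0, 0.0, half_gap_m), point)
    return [a + b for a, b in zip(lower, upper)]


def magnitude(vec):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(c * c for c in vec))


if __name__ == "__main__":
    # Field strength at the midpoint and 1 cm away along each axis.
    for point in ((0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (0.0, 0.01, 0.0), (0.0, 0.0, 0.01)):
        print(point, f"{magnitude(opposed_pair_field(point)) * 1e3:.3f} mT")
```

At the midpoint the two contributions cancel, while a step in any direction lands in non-zero field, which is the "point null surrounded by a volume of non-null" picture. Moving the pair of magnets relative to the sample would move that null with them.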

I had dinner with Mikhail Shapiro a couple months ago — fascinating dude, really enjoyed it — and it wasn’t like we were debating which technology was better. We were both nerds excited about what we could do with them. A static DC magnet does not require a battery. A static DC magnet is extremely cheap. Whereas if you needed a therapy that delivered focused ultrasound to one part of your tissue chronically for a week straight, I might prefer the magnet in that case.

MARIA: I’d also like to point out that the magnetic field strengths we’re using are actually quite convenient for therapeutics, because we’re in the millitesla range. A big magnet can reach through me with no problem. We do not need MRI levels of magnetic field, which would be a little more complicated.

HOST: What is millitesla range for context?

ANDREW: A strong handheld neodymium magnet can have a surface field of hundreds of millitesla. At a standoff distance, you can hit millitesla several inches from a decent magnet. If you’re trying to hit the whole torso, yeah, you’d be wearing a fairly elaborate device, but you wouldn’t be sitting in an instrument. There would be no liquid helium involved, no superconductors. Maybe try to stay home while you’re wearing one of the big boys. But for more localized therapies, the same magnet that holds your laptop shut — tape one of those to your chest, and that would define a region in the vicinity of the magnet.

MARIA: But it is also good that it is strong enough that you cannot activate it without really bringing a magnet close. Pretty much the requirements would be: if you’re wearing a pacemaker, you’ve seen all the signs — “Do not enter, be careful.” It would be the same type of precaution.

HOST: Just to build some mechanistic intuition for mechanosensitive proteins, or ultrasound- and magnetically-controlled proteins: the way ultrasound enters the body is via pressure waves passing through tissue. So the proteins you can open with that are usually mechanosensitive — ion channels that open when they’re tugged with pressure waves — or heat-sensitive. Focused ultrasound also heats locally, so people are making thermogenetic systems: you can put nanoparticles into the body carrying a payload, and they release the payload when stimulated with heat. So, just so I understand your argument: that sort of mechanism of what you can build at the molecular level to take advantage of local heating and pressure waves is more limiting than having a completely orthogonal channel, like a magnetically responsive domain where we could tune anything. That’s what you’re saying — for the sonogenetic stuff?

ANDREW: Something I want to emphasize: this is to my knowledge. We’re not specialists in sonogenetics, so if I’m speaking out of turn, I trust folks like Mikhail to correct me. But to my knowledge, for example, if you’re doing a heating-based sonogenetic perturbation — at 39 degrees Celsius, no effect; at 41 degrees Celsius, the transcription factor activates, your gene makes a lot of the protein you want; at 43 degrees Celsius, you damage the tissue, you can cook the cells. A plus-or-minus a few degrees range is a big range, but if your tissue — how well is blood circulating through that area? — if you get that a little wrong and you cook instead of activate (or do nothing), that is not a bio-orthogonal perturbation. That is a perturbation which is nuanced and related to the system in question.

Whereas the cool thing about magnets is they go through you like you’re not there. Predicting what perturbation you’re delivering at what position in the flesh is just an easier problem.

To emphasize the point Maria made earlier, the sonogenetic systems we’re familiar with are a heat-shock-protein for transcription changes in response to temperature, which is really cool. But it’s just transcription, and anything downstream of transcription is a slow process that has a lot of momentum to it. It’s not something you turn on and off in a second. The other one we’re familiar with is the channels that open in response to mechanical perturbations. That’s really stinking cool — being able to open a channel to influence whether or not a nerve is going to fire — but again, it’s limited to a channel.

The technology Maria has, when she’s saying she has this generic toolbox — there are a couple of different paths, but one of the most exciting is she has a luminescent protein. Think like a firefly — a protein that just glows from chemical energy — that turns down with a magnetic field. (At the moment, we hope eventually to have one that turns off; for now it turns down.) Take any optogenetic protein you’re interested in — the entire optogenetic toolbox, which includes enzymes, signaling molecules, cytoskeletal binding proteins that affect cellular motility, weird niche things — a huge variety. All of them that are optically responsive could be hooked, this is our vision, to this one protein she’s engineered. So essentially, 3D remote control of where blue light is expressed in the body.

HOST: So you’re using a magnet to turn on a light, which then turns on an optogenetic protein. It’s kind of like two layers of logic.

ANDREW: And there’s other ways to do it, but that’s a really promising one.

HOST: So you’re saying, even if you couldn’t make a directly responsive, magnetically controlled protein, there are ways to use magnetic control to turn on optogenetic proteins inside the body.

ANDREW: Which we can. The nanobodies that we’ve made are themselves responsive to magnetic fields, but they still need light. So one way or another, you put the light in, with either a luciferase that doesn’t respond to magnetic fields or a luciferase that responds to magnetic fields — and you get double the effect. You can stack the gates to get sharper control. If you have a 50% modulation on one and a 50% modulation on the other, put them together.
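The arithmetic behind stacking gates can be sketched as follows. This assumes that "50% modulation" means a gate passes half of its full output when turned down, and that the two gates act independently so their effects multiply. Both are simplifying assumptions, and the function names are hypothetical.

```python
def stacked_output(residual_fractions):
    """Fraction of full output left when every gate is in its turned-down state.

    residual_fractions: one value per gate, e.g. 0.5 for a gate that turns
    the signal down to 50%. Assumes independent, multiplicative gates.
    """
    remaining = 1.0
    for fraction in residual_fractions:
        remaining *= fraction
    return remaining


def contrast(residual_fractions):
    """Ratio of full output to turned-down output."""
    return 1.0 / stacked_output(residual_fractions)


print(stacked_output([0.5]))        # 0.5
print(stacked_output([0.5, 0.5]))   # 0.25
print(contrast([0.5, 0.5]))         # 4.0
```

Under these assumptions, two half-strength gates in series give a fourfold difference between the on and off states instead of twofold.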

---

On the name “magnetogenetics”

HOST: We alluded earlier to this idea of the name “magnetogenetics.” I’m assuming the name came about historically because optogenetics is genetically encoded, optically responsive proteins. But of course, in the context of magnetogenetics, the thing that makes it so interesting is that the things you’re delivering don’t have to be genetically encoded. So is a better name just “force-controlled biology”? Why call it magnetogenetics?

MARIA: Right now it means something on its own that has diverged from where it originated. Back in the day, the first opto- demonstrations were with channels. And it’s very hard to deliver a channel and have it do its life.

HOST: You can’t inject a channelrhodopsin and hope it ends up in the right membrane.

MARIA: Exactly. You were required to genetically modify the organism in order to express the channel, for it to respond to light. Since then, there was gorgeous work by other groups where they looked to plants and found that plants have these domains that respond to light, and they’re very ubiquitous. Plants, thanks to the response of those domains, can grow towards light and control their metabolism — they’re involved in a variety of processes. The idea of taking those domains and, in a sense, getting them closer or attached to other proteins to control their function — it’s very analogous to what we’re calling today magnetogenetics. But the name for that was also optogenetics, even though that protein doesn’t necessarily have to be genetically encoded. They could be injected.

HOST: And what are those plant domains?

MARIA: Aha — those are, in a sense, the exact domain I’m using for engineering the magnetic-field response domain. So we started with these LOV domains — L-O-V, light-oxygen-voltage. They are used to build anything that is not a channel (although you can also make optogenetic channels by putting this domain in channels). But everything else that has been built in optogenetics has been built with LOV domains.

HOST: A lot of them.

MARIA: Yes. When we found that we could engineer magnetic-field response in the LOV domain — even if we saw it on other proteins — it seemed the right choice, because there’s already so much work that we can build on from optogenetics.

ANDREW: I was not optimistic when she started. Her initial discoveries were in jellyfish proteins, coral proteins, which are not the basis of optogenetic tools — optically responsive protein function. (Totally agree about the nomenclature, by the way.) So when she found that the LOV domain was also magnetoresponsive — which I didn’t expect, I was not optimistic about that — that was incredibly exciting, because it went from something that’s scientifically fascinating and maybe could be adapted into a technology, into something which is obviously central to a technology we have to pursue.

---

Origins: birds, cryptochromes, and a skeptical reading of the field

HOST: Before we go into the story of the discovery and the founding of Nonfiction Labs, I want to go way back in time to the origins of magnetic biology. My understanding is that the origins of the field were birds — basically zoologists or naturalists who would watch the migrations of birds and say, “How do birds know where to fly?” The Arctic tern flies thousands of miles every year. There are birds that fly over the Himalayas, where airplanes fly. People were studying this and saying, “What’s going on?” I’m wondering: have you studied the history of this field? What have you discovered about the origin?

ANDREW: Not as much as you might think. When it comes to really knowing anything about animal behaviors and certainly animal magneto-response, I’d say we’re humble civilians.

MARIA: That is true. However, I do have to say that I’m very thankful for all the work that has gone on characterizing mechanisms and the response of those proteins, especially because they are such tiny responses. Whether the cryptochromes are the protein that is thought to be —

HOST: Yeah, tell me about cryptochromes. When did people start to — where did cryptochromes come from? Were people, because they’re in the eyes of the birds, right — so people were dissecting bird eyes?

ANDREW: Four years ago, neither one of us knew cryptochromes from Adam. Not our field, not our specialty. We have read a certain amount of literature, enough to form some opinions and some suspicions, both about the technical realities of the field and about the social factors that led to those technical realities. But we are two people who have been reading about cryptochromes to some degree for a couple of years. That’s our level of expertise on cryptochromes. So with those error bars, hit it.

MARIA: Thank you. I do appreciate all the work that has gone in, because it kind of provides a framework through which to look at our own protein. There are a lot of studies on electron transfer and effects of magnetic fields on that that come from that literature. I sympathize with them because effects are very, very small. Whether it is the protein that is responsible for birds sensing magnetic fields, I don’t know, given the very small effect.

HOST: When you say small effect, what does that mean? How are people studying cryptochrome proteins, and what is the response being measured?

MARIA: Usually in terms of biochemistry — I’m a biochemist, I like to see responses that we can repeat in a tube — one of the best and most-used assays is transient-state absorption. It’s not necessarily just a change in absorption. I wish it was just a change in absorption. It’s an absorption that happens for a very short period of time after excitation. So you’re looking for a small change. You shoot with a pulse of light, you wait a certain amount of time — could be femtoseconds, could be milliseconds, could be minutes — you shoot with a second pulse of light that might be a different color, you check how much of that second pulse of light got absorbed and how did it change if I turned the first pulse of light on and off.

ANDREW: Transient absorption. I would call it a somewhat exotic measurement that does not necessarily measure a physiologically relevant quantity. It’s a great measurement. It’ll tell you a lot about a protein. It will not establish that a protein necessarily has a given biological function.

MARIA: When you see a 40% change, that’s not even a 40% change on the bulk of the protein. It’s a 40% change of that population that you had excited for a very short period of time.

HOST: But what does that have to do with magnetic fields?

ANDREW: It’s the only signal in many systems that didn’t totally ignore magnetic fields. So, with all of the error bars I gave about my expertise, there was the hope: behavioral assays on birds suggested that birds sense magnetic fields, so how do they do it? It would be nice if we could connect that to a root mechanism: “This protein changes its behavior in this way.” A really beautiful assay would be like, “Ah, when I turn the field on and off, the phosphorylation of this protein changes by 50% in a super quantifiable way.”

I am not familiar with work that makes claims like that. We are familiar with work that says: when you turn the magnetic field on and off in a pretty big way, for a cryptochrome which is very cold, very acidic, in a weird reducing chemical environment that is not obviously physiologically relevant, you will get a tiny change in a tiny effect in an exotic measurement. That is the work I’m familiar with that shows cryptochrome does not entirely ignore magnetic fields.

HOST: Why do they study it in strange buffers?

ANDREW: When you read a paper, the paper doesn’t come right out and say it, but my suspicion — slash opinion — if you read between the lines: if it had worked at body temperature, if it had worked at physiological pH, if it had worked in physiological chemical conditions, that’s what they would have published. They don’t say they tried it and they failed, but you don’t do that stuff if it worked on easy mode. So my personal suspicion about a lot of the magneto-response field is that, whether the effects are real or not real, it’s really hard to study. And the people doing this work are heroic and working really hard.

HOST: It feels like there’s so many people who have studied this in such a rich history. Why would so many people get sucked into that if the effect is so minor — and if it doesn’t even seem like that’s the real mechanism?

ANDREW: So, before we knew anything about this, before we touched this in any way, I was asked, at a company we both worked at, to give an entertaining talk for everybody at the company — not just the scientists, the purchasing department, the security staff. Just tell a story that anybody can appreciate. I was handed an article from *Scientific American* that said, “This is how birds navigate.” I was expected to basically just read the article for a five-minute talk.

But I think one of our superpowers is — for better or worse — we don’t have a lot of respect for authority. So rather than simply recite the article as gospel truth, I figured I should at least read the source article. The source article was, I think, a *Nature* paper or a *Science* paper — in a reputable journal — that I was utterly unconvinced by. I thought, “Surely this isn’t the basis for this belief. Let me go look at their foundations.”

So I go to their list of citations and I go through looking for the smoking gun. It was like those dreams where you’re trying to throw a punch and your arms don’t work. You get to this layer of citations, I was like, “Well, none of these is obviously true. Let’s go to their layer of citations.” And you go all the way down to the bottom and find a single solid thing.

Now, perhaps because I’m a bad reader, perhaps because I’m not very perceptive, or perhaps because I didn’t find the right papers — we found one paper that was absolutely, definitely true. It had nothing to do with biology, nothing to do with biochemistry. It was a synthetic organic fluorophore. It was striking — seeing the strongest magneto-response that’s ever been published is 80%, clearly visible by eye, when a human hand just enters the frame of the video and holds a magnet up to a tube with a synthetic organic fluorophore, and it gets obviously about twice as bright. The second strongest effect is like 0.01% in a super exotic measurement with no obvious biological relevance. Just the staggering gap between the two claims was striking.

HOST: So the cryptochrome stuff wasn’t convincing, but you found stuff in organic chemistry that *was* convincing — which made you believe, presumably, “Oh, molecules in some contexts can be magnetically responsive.”

ANDREW: And luckily that context was fluorescence. And luckily Maria is a biochemist protein engineer who engineers biologically fluorescent molecules.

---

Adam Cohen’s molecule

HOST: Before we go to that — in your leap to say, “Oh, I wonder if this shows up in proteins” — what was the actual experiment that Adam Cohen did at Harvard that had this big effect? What was the experimental setup?

MARIA: So Adam engineered, as well as he could, a molecule that would do electron transfer. He assumed that you could potentially control that with magnetic fields, and it worked. The only problem with that molecule is that it only works in a very specific solvent, and it’s a little finicky to maintain the magnetic-field effect, but the effect is very, very large. He was actually the first one to use that molecule to make that magnetic null and be able to image through, like, a piece of glass that has the molecule in it.

HOST: What do you mean by this null imaging through glass?

MARIA: If we could have a molecule — let’s say it’s fluorescent — if you added a magnetic field, it would turn off. We know that if you put two magnets that are north-to-north, or south-to-south, then in the middle there’s going to be no magnetic field. And that determines where your signal can come out from. So if you have, let’s say, this fluorophore inside of a mouse, when the light tries to come out of the mouse, it gets distorted because of scattering.

ANDREW: I’ll show this demo. You’ve probably all seen this — light is quite good at passing through tissue. It is quite bad at passing through tissue in straight lines. So you know if light is generated inside the tissue, it’ll come out, but it’ll come out in a random direction at a random position. So the innovation in Adam’s case was: if you can also further define where it must have come from — because it will only glow at the magnetic null — here’s this point where, if any light came out, it must have come from here. So you just measure how much light comes out, and then you scan that null around. You can build up a map of where your sources were, where your emitters were, without any light ever traveling ballistically, without any light traveling in straight lines, which is normally what we require to see through a transparent object. Light has to go in straight lines.
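The scanning idea can be sketched in one dimension. This toy model assumes that an emitter glows only when it sits inside the null, that the detector records nothing but the total light escaping the sample, and that scattering loses no light. The emitter positions, brightness values, and null width are made up for illustration.

```python
def scan_null(emitters, positions, null_halfwidth):
    """Return the total detected light for each position of the magnetic null.

    emitters: dict mapping emitter position (mm) to brightness (arbitrary units).
    positions: null-centre positions (mm) to visit, in order.
    null_halfwidth: emitters within this distance of the null glow; the field
    keeps every other emitter dark. All values here are hypothetical.
    """
    profile = []
    for centre in positions:
        total = sum(
            brightness
            for x, brightness in emitters.items()
            if abs(x - centre) <= null_halfwidth
        )
        profile.append(total)
    return profile


emitters = {2.0: 5.0, 7.0: 1.0}  # two point sources buried in scattering tissue
positions = [float(p) for p in range(10)]
profile = scan_null(emitters, positions, null_halfwidth=0.5)

for centre, total in zip(positions, profile):
    print(f"null at {centre:4.1f} mm: detected {total:.1f}")
```

The detector never learns which direction any photon came from. The map comes entirely from knowing where the null was when the light appeared.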

MARIA: Exactly. So, in a sense, a lot of the fluorophores we were developing — you could think of them as a way to improve imaging deep in human tissue.

HOST: The chemicals would not go in the body, would they?

MARIA: The ones that he synthesized, no. But the ones that we could make out of protein could.

ANDREW: I remember very vividly: whenever Maria’s about to embark on lab work, I try to spend a little time on the phone first. Maybe somebody knows something that could save us a little time. So we called up Adam. We love Adam. And we told him, “Hey, your work’s incredible. Why aren’t you still working on this?” And he described the buffer to us, he described the conditions, he said how difficult it was to get the effect robust and strong. And Maria says, “Oh, do you think it could work in fluorescent proteins?” He says, “If you can find anything where the fluorescence responds to magnetic fields in liquid water at body temperature, you call me.” And that’s what happened. She found it. We called him and he immediately jumped right back in. He says, “Awesome.” And it’s just really beautiful work, obviously inspired by him. Then we got to play our part and pass the ball back. I love that guy. It’s nice working with people that make you glad you built on their work and glad they built on your work.

---

The Markus Meister critique

HOST: So you were giving a talk at Calico. Did you also work there, Maria?

MARIA: Yes. So we’ve been working together for a very long time.

HOST: What’s interesting is you had arrived at this thing of, like, “These papers seem fake.”

ANDREW: That was my opinion. Yeah.

HOST: Well, what’s interesting, of course, is that there’s a famous 2016 paper from Markus Meister at Caltech where he quantifies the same thing — he says most of the magnetic effects reported in papers cannot be true. For example, there was a paper where they attached ferritin to a channel and then used a magnet to open the channel by pulling on the ferritin. And Markus Meister was like, “You would need 100,000 times more iron atoms to exert enough force to open that channel.” Thermodynamically impossible. So people were already kind of criticizing this field when you came into it. It’s just kind of surprising to me that you were like, “Oh, let’s try to find this effect in biology” when Markus Meister had already said the effects in biology are kind of garbage.

ANDREW: This is a good one. So, dead on: my default position was, I would not expect to find a magnetic response in a biological system, based on both what I know from physics and what I saw in the literature. And I know you meant this in a nuanced way, but on the word “fake”: I do think a lot of magnetogenetic effects were not real. I didn’t get the impression the papers were written in bad faith. I got the impression they’re attempting a difficult task and failing in a very human, reasonable way. I know that’s what you meant.

The work from Adam — the effect was so huge, and it was a system where, sure, the chemistry is different, the context is different, but the photophysics of fluorescence is fairly universal. Proteins from jellyfish and corals are fluorescent; the synthetic organic fluorophore is fluorescent. So this clear demonstration that it was not physically impossible — it might not be biologically compatible, but it was definitely not a violation of thermodynamics, not a violation of energy conservation — gave us the confidence that we’re not doing something which is fundamentally insane. We’re just doing something I’m not super optimistic about.

One of Maria’s secret weapons here is the way she does science. She is incredibly good at making many bets in parallel, and she’s an obligate gambler. She’s a degenerate gambler. She likes to wager. When you’re doing protein engineering, there’s a lot of “mutate and screen” — you do a bunch of things that, yeah, no one of them should work, but just make it cheap and fast enough, and very reliably some of them will. I don’t know anybody that more reliably produces value every month, despite any single thing she does being borderline crazy. So just throwing one more crazy thing in the hopper was a pretty light lift for her. She’s engineering fluorescent proteins — might as well wave a magnet past them while she’s doing all the other work she’s doing.

In hindsight, the real work wasn’t protein work. The real work wasn’t the biochemistry. The real work was getting really good at measuring very small effects. But the cool thing is that a lot of her existing skills, the work she was already doing, carried over. In engineering, you always start by going from zero to one. So you measure some small effect. She’d done that with biosensors: make something which is sensitive to calcium, make something which is sensitive to metabolites like NAD. They start off insensitive, and your very first hit, where it goes from insensitive to sensitive, could be garbage, could be noise. The garbage costs you very little the way she does her science. The hits are priceless. So taking a tiny effect and engineering it to make a big effect is a skill she had been practicing for years. The additional labor to add a magnet to it, as crazy as it sounds, was pretty marginal.

---

The actual experiment

HOST: So tell me about the experiment.

MARIA: I remember this story slightly differently — because it actually took me probably at least three months to get good at measuring magnetic-field effects. The first thing I did was to wave magnets at proteins and I didn’t see anything. But I was pretty obsessed with finding it. So I had to master the art of measuring fluorescence changes induced with magnetic fields in the systems I had access to, which were flavins with tryptophan, or —

ANDREW: Oh, that’s a good point. It was well established prior to anything we did that, in addition to the system from Adam, if you just take flavin and mix it with hen egg white —

MARIA: Lysozyme.

ANDREW: Not a biochemist. A fairly boring protein mixed with a vitamin that’s in most foods we eat. It was known that the fluorescence from that system was weakly magneto-responsive, which gave her a reference to get good at, to practice with.

MARIA: We needed a positive control, and on top of it to rig something up to control the magnetic field. So it was a little bit of an investment, but I’m glad we pursued it. Once I got into the universe of being able to remotely see things that responded to magnets, then waving the magnet worked.

HOST: So tell me about the experiment. You saw it in flavins with lysozyme, and you said, “I want to find a single protein that might have the same effect.” Walk me through that. What was the experiment that you ran?

MARIA: Oh, so I had been working with lots of fluorescent proteins before — I was doing a lot of engineering in that area. So I had lots of very interesting and unusual photophysics proteins, and I expressed them all in *E. coli* and put them on the microscope. I had a little servo motor with a magnet on a popsicle stick that was moving on top of it. So I could see the *E. coli* and the lysozyme —

HOST: Wait — so you’re growing cells in a dish and they’re expressing a protein?

MARIA: Yes. So I’m growing *E. coli* like normally, and then I just scoop some *E. coli*, put it on a slide to look in the microscope, and then on the microscope I had a little servo motor that would move the magnet in and out.

HOST: And then you’re just recording this to see if there’s any fluctuations?

MARIA: Exactly.

HOST: And you have some image segmentation?

MARIA: Yes, usually I draw a little square and I’m like, “Do I see any changes?” And out of that, we got — GFP, the most common fluorescent protein ever, the one that is most widely used — was wiggling a little bit. And that became super, super interesting. The problem is, I’m a biochemist, so the first thing I did was, “I want to purify this protein and I want to look at this protein without all of the extra *E. coli* junk.” And the moment I do that, there’s no more magnetic-field effect. And the effect that I was seeing in *E. coli* was really small, but it was not noise. It was definitely something I could —

HOST: I want to ask about that, because — I’ll put the chart up on the screen — there’s a famous, well, niche, cult-classic chart, and the fluorescence is, let’s say, 44,000 units, whatever it is, and with the magnet it’s a change of like 100, which is a fraction of a percent. So most people would look at that and be like, “I don’t know — it’s so tiny.”

ANDREW: Here’s the nice part. One: totally agree, everything you said is fact. The cool part about the signal — we place a sharp distinction between signals that look like *this* and signals that look like *this*. Even if it is a one-part-in-400 change, which I think is about how big it was, it was a one-part-in-400 change that was massively bigger and cleaner-looking than the noise. So we look at it and we say, “We know damn well it isn’t noise.” That does not mean it’s real. Holy cow, did we spend a while establishing that it was something we could actually trust enough to speak aloud in public.

The major anxieties were: okay, it’s not noise, but it’s probably heating, because we use an electromagnet. Oh — well, hence the popsicle stick. No heating from the popsicle stick. Oh, well, it’s probably vibrations from the popsicle stick. Well, the electromagnet doesn’t vibrate, and the signal looks the exact same if it’s a permanent magnet or an electromagnet. Oh, maybe the magnet’s pulling on the optics. We swap in a sample — for example, Maria said the purified protein is still fluorescent but is not magnetoresponsive. We don’t see the effect. The effect follows the chemistry, not the mechanics of the setup, not the thermodynamics, not the vibration. The effect follows the chemistry of the sample. That left us feeling a lot more comfortable, a lot more confident, taking a tiny-ass signal and attributing it to the thing we wanted — which is dangerous — rather than the 10 other fake things it could easily be.
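The "small but clearly not noise" argument can be made concrete with a simulated trace. This is a minimal sketch, not the analysis the team ran: it assumes a baseline of 44,000 units and a shift of 100 with the magnet in place (the rough numbers from the chart discussion), plus a made-up per-frame noise level, and it treats the response as an instant step for simplicity.

```python
import random
import statistics


def simulate_roi_trace(n_cycles=20, frames_per_half=25, baseline=44000.0,
                       magnet_shift=100.0, noise_sd=20.0, seed=0):
    """Hypothetical ROI brightness trace; the magnet alternates out/in each half cycle."""
    rng = random.Random(seed)
    frames, magnet_on = [], []
    for _ in range(n_cycles):
        for state in (False, True):
            level = baseline + (magnet_shift if state else 0.0)
            for _ in range(frames_per_half):
                frames.append(rng.gauss(level, noise_sd))
                magnet_on.append(state)
    return frames, magnet_on


def on_off_contrast(frames, magnet_on):
    """Return (fractional change, shift expressed in standard errors)."""
    on = [f for f, m in zip(frames, magnet_on) if m]
    off = [f for f, m in zip(frames, magnet_on) if not m]
    shift = statistics.fmean(on) - statistics.fmean(off)
    std_err = (statistics.variance(on) / len(on)
               + statistics.variance(off) / len(off)) ** 0.5
    return shift / statistics.fmean(off), shift / std_err


frames, magnet_on = simulate_roi_trace()
fractional, z_score = on_off_contrast(frames, magnet_on)
print(f"fractional change: {fractional:.5f} (about 1 part in {1 / fractional:.0f})")
print(f"shift is {z_score:.0f} standard errors away from zero")
```

A shift this far from zero in standard-error terms rules out random noise, but, as Andrew says, it does not rule out systematic artifacts such as heating or vibration. Those need the control experiments he lists.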

HOST: True. But then you took it out of the cell and it doesn’t work. And then did you start to question yourself again?

MARIA: Yes. Yes. The beginnings of this project were a lot of questioning. At that time I was working with Rebecca, who was staying with me for the summer.

ANDREW: From Adam Cohen’s lab. She was interning with us. Lucky us.

MARIA: Yes, we were very lucky, and she’s a very calm, very methodical person. We saw the effect together, and she’s like, “It’s there. We just have to figure out how to bring it back.”

ANDREW: She’s used to signals that are there one day, gone the next. “Why didn’t I go work on a more robust signal?” — maybe consistent with the point you’re making. And Rebecca said, “It was there yesterday. It’ll be there tomorrow. Keep looking.”

MARIA: So what we did was screen a library of metabolites, because we were like, “There’s something inside of *E. coli*.” I knew that, because if I mushed *E. coli* and mixed it with my pure protein, like a cell-free extract, I would get the effect. And *E. coli* on its own was not giving me the magnetic-field effect, which, by the way, is not what I have learned since. Looking back, I wasn’t in the right universe: the imaging conditions need to be very good, like your power levels and where you’re looking in your spectra. They need to be right in order for you to see the effect in fluorescence. So it was pretty lucky that way. I have seen now that the autofluorescence of *E. coli* is magnetic-field dependent. You just need a lot more power than what I was using at that time. So there are actually flavins that are interacting with random proteins inside the *E. coli* cell, and they’re modulating the fluorescence a little bit.

ANDREW: It might be worth emphasizing — when we say you turn the field on and off and you see the effect go up and down, what we don’t see is a square wave. We’re not, like, “The brightness just follows the magnetic field,” because the magnetic field is basically following a square wave. What we see is this curving, sawtooth shape. So to us, that usually means: we’re not directly changing the brightness with the field. The light is photoswitching the sample. The light is changing the sample from one state to another, and that rate is what the magnetic field is affecting.

So earlier when Maria said power has to be right — a lot of the time when we send samples to other people and they don’t see a signal, the thing that goes wrong most frequently in someone else’s hands is their light wasn’t bright enough. Because the light has to be bright enough in the first place to drive switching between these two states to a substantial, measurable degree, and then the rate of that switching is what the magnet will turn on and off. So the exponential curve is what the magnet will modulate. When she says she had to be in the right universe — lucky for us, she happened to be using a bright enough light to drive the effect she observed.
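One way to picture why the trace is a curving sawtooth and not a square wave is a two-state toy model. This is a sketch under stated assumptions, not the measured mechanism: light drives a bright state into a dark state at a rate the field slightly changes, and the dark state recovers at a fixed rate. All rate constants and names are hypothetical.

```python
import math


def simulate_trace(n_cycles=3, steps_per_half=10, dt=0.1,
                   k_recover=0.5, k_switch_off=1.0, k_switch_on=0.9):
    """Bright-state fraction over time while the field toggles as a square wave.

    k_switch_off / k_switch_on: light-driven bright-to-dark rate (1/s) with the
    magnet off / on. k_recover: dark-to-bright recovery rate (1/s).
    """
    bright = k_recover / (k_switch_off + k_recover)  # start at field-off steady state
    trace, field = [], []
    for _ in range(n_cycles):
        for magnet_on in (False, True):
            k_switch = k_switch_on if magnet_on else k_switch_off
            steady = k_recover / (k_switch + k_recover)
            decay = math.exp(-(k_switch + k_recover) * dt)
            for _ in range(steps_per_half):
                # exact exponential relaxation toward the current steady state
                bright = steady + (bright - steady) * decay
                trace.append(bright)
                field.append(magnet_on)
    return trace, field


trace, field = simulate_trace()
first_on = field.index(True)
print(f"last frame before field on: {trace[first_on - 1]:.4f}")
print(f"first frame after field on: {trace[first_on]:.4f}")
print(f"end of that half cycle:     {trace[first_on + 9]:.4f}")
```

In this model the field changes the target the brightness is heading toward, and the brightness gets there along an exponential curve. If the light-driven rate is too small compared with the measurement time, the curve barely moves, which is consistent with the remark that dim illumination is the most common reason others fail to see a signal.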

---

Directed evolution and MagLOV

HOST: So after you found this protein and you said, “Okay, I’m convinced that it’s real,” the next thing you did was directed-evolution experiments where you said, “Can we make the effect much larger?” Tell me about that.

MARIA: At that time, we had identified — the result from the screening of the metabolites was, you could restore the effect on GFP by adding a few metabolites. Flavin was one of them. We like flavin because it’s cheap, easy, and there are more enzymes and genetic tools that make it nicer to work with. If you wanted to, let’s say, attach a flavin on a protein forever, there are tools for that.

ANDREW: Super biocompatible.

MARIA: There are other metabolites — probably the ones inside *E. coli* are not necessarily flavin, they’re some of the other ones — that, when brought in close contact with GFP, would make you see that magnetic-field response. But I also identified another fluorescent protein called the LOV domain, which is what we use today. And that protein, when I purified it, was still magnetoresponsive.

HOST: Without flavin?

MARIA: Without flavin. But the LOV domain has a cofactor, in a sense. The heart of the LOV domain is a flavin. So the LOV domain is just really good at sequestering its own flavin and having it be part of it.

HOST: But to clarify — the LOV domain grabs flavin from the cell?

MARIA: Yes.

HOST: And are there human cell types that do not produce flavin?

MARIA: We’re bad at making flavin, but we’re good at eating it. So it’s in you, but not because you synthesized it.

HOST: So if you were to put a LOV-domain therapy into a human, it would find flavin somehow?

MARIA: It works in mammalian cells. It definitely would find — most likely it would find its own flavin. It’s a ubiquitous vitamin used for transfer of electrons inside the mitochondria.

HOST: So you did this directed-evolution experiment on the LOV domain, and you found a protein called MagLOV that has very strong magnetoresponsive effects. Do you have a sense of what those mutations actually did that makes the protein so responsive to magnetic fields? I love this — because I literally don’t understand that. Does anyone understand the mechanism of what’s happening here for very strong effects?

ANDREW: Oh, let’s lead off with: we have a lot of information about the mechanism. A lot of data, a lot of constraints, a lot of thoughts, a lot of opinions. I will also state: we know we don’t really know. We would love to know. But it’s the hilarious thing about directed evolution — you need to be really good at measurement.

HOST: Before we go into this — how did you actually do the evolution experiment? You’re mutating the gene encoding LOV, and then you’re screening in your same assay with microscopy, and then you’re picking cells that seem to have a strong effect?

MARIA: Yes, that is pretty much it.

HOST: Wow, so tedious.

MARIA: I am a gambler. So every time I do it, it’s like, “Yes, I got a good one!”

HOST: How long did that take?

MARIA: The way I have to do this is, because I want to maintain identity between a phenotype and the DNA that encodes for that protein — because I can’t just screen for brightness and take whoever is brightest — I have to switch the magnet on and off and ask the same *E. coli*, “Are you responding?” So I do it in a photography system. You take pictures of a plate of *E. coli* with colonies, and they’re all clonal colonies, so each one represents one variant of the protein. I make libraries that are relatively small. It just means I have every single substitution at every position, but not more than one mutation per LOV domain.

HOST: So it wasn’t random mutagenesis?

MARIA: No. I don’t usually do random mutagenesis because it tends to lead to very large libraries, which is amazing if you can screen millions of them. But because I have to take a picture and move the magnet and see what happens, it becomes a little bit of a slower process. But the data is very, very rich. It moves forward. In my experience, it’s been very productive.
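The library Maria describes (every substitution at every position, never more than one change per variant) can be sketched as a simple enumeration. This is an illustration under stated assumptions: `toy_parent` is a made-up three-residue stand-in, not a LOV sequence, and the function name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_substitution_library(parent):
    """Return (label, sequence) for every one-residue change to `parent`."""
    variants = []
    for index, wild_type in enumerate(parent):
        for replacement in AMINO_ACIDS:
            if replacement == wild_type:
                continue  # skip the unchanged parent
            label = f"{wild_type}{index + 1}{replacement}"  # e.g. C3A
            mutated = parent[:index] + replacement + parent[index + 1:]
            variants.append((label, mutated))
    return variants


toy_parent = "MKC"  # hypothetical stand-in, not a real LOV sequence
library = single_substitution_library(toy_parent)
print(len(library))  # 19 substitutions at each of 3 positions
```

A library built this way grows as 19 times the protein length, which is why it stays small enough to photograph colony by colony, whereas random mutagenesis with multiple changes per variant grows combinatorially.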

ANDREW: She can do 40 of these in parallel. One iteration takes between one and two weeks. And it’s something she has taught undergrads how to do, and it works well in the hands of undergrads. So on the one hand, it’s tedious. On the other hand, taking the *Avena sativa* LOV domain straight out of the oat plant and turning it into this staggeringly magnetically responsive engineered protein — three months. Once she was good at the measurement, which took a long time, the actual engineering — snap.

That’s kind of a beautiful thing about directed evolution. As tedious as it sounds, I think there’s a reason more people don’t do it, and one of the reasons I got animated about this: Maria is one of the very few — possibly the only — people on the planet who is currently actually engineering proteins to be more immediately magnetoresponsive. It’s not that hard. It’s not trivial — there’s an activation energy, there’s a skill, there’s a craft. But why aren’t more people doing this? Tons of academics are downstream of her work, taking her proteins and studying them, asking cool questions. It sounds intimidating, but science is hard. Everything in science is a slog. Any wet lab — mouse work is soul-crushing. As tedious as this is, one to two weeks to make a substantial improvement that hundreds of thousands of people are going to build on and cite and be excited about? Could be worse.

MARIA: They’re relatively cheap experiments. I like cheap experiments because it means I can try more things at the same time.

HOST: So you found a protein that had five amino-acid substitutions from the wild type, and the effect went up to like 75% responsiveness. Where were those mutations happening? Let’s get at the mechanism.

MARIA: Evidence that we’ve gathered since we started working with these domains tells us that what we’re modifying is electron transfer. So we are, in a sense, controlling how long a molecule remains charged by adding the magnet to it. Based on a huge, very long history of studies on the LOV domain, you can trace some of those mutations and generally attribute them: “Yes, now I see why this appeared, and I see why I need this mutation.” The first mutation you always get breaks the photoswitching cycle, right?

ANDREW: Yes.

MARIA: So the LOV domain has kind of a complicated photo cycle. The LOV domain takes a photon, it gets excited. Then you can either fluoresce, or you can go into a triplet state — a lower but super-long-lived energy level. And those triplet states —

HOST: Now very reactive.

MARIA: And in the wild-type LOV domain, that can react with a cysteine, and it reacts relatively quickly. And that initiates a huge conformational change.

HOST: But to clarify — so there’s this charged — it doesn’t emit fluorescence, it enters a triplet state, and that makes it reactive. And then it dumps that energy onto a cysteine, which is just an electron transfer. When you say electron transfer, it’s just the movement of an electron. Why would the movement of one electron completely change the shape of the protein?

MARIA: That’s actually quite interesting. And I don’t necessarily have a good explanation for it.

ANDREW: Yeah, we’re not specialists. We were not specialists in the LOV domain three years ago.

MARIA: On how to simulate those changes and how that translates into a changing structure.

ANDREW: But that’s like the native protein, before she did any of the engineering. And the first mutation she finds every time she does directed evolution is: destroy that.

HOST: Destroy what?

ANDREW: The cysteine. The cysteine.

MARIA: Yeah, it’s the first mutation every time.

HOST: So it sits in this charged triplet state?

ANDREW: Exactly.

HOST: And it can’t dump its electron?

ANDREW: Yeah. It probably does charge transfer to somewhere, but we suspect the charge transfer does not result in that classic change of the bond and change of the shape of the protein. We spent a while actually trying to establish whether or not the protein still changed shape in her magnetosensitive engineered versions, because it’s believed to do so in the default wild-type version.

HOST: But in the wild-type version, you’re listing these two states — saying it can either release fluorescence or enter a triplet state. Is the triplet state exceptionally rare? Is it usually — it just gives off light?

MARIA: Yes — except in LOV domains.

ANDREW: Yeah, that is weirdo. LOV domains are weirdos. They have a —

HOST: So this is just stochastic?

MARIA: Yes. The protein is like, “Depending on local fluctuations, it might do this or it might do this.”

ANDREW: In GFP, the branching ratio is 100 singlets — 100 normal fluorescence events for every one triplet. In the LOV domain —

MARIA: It’s like 60% quantum yield.

ANDREW: According to the literature. We didn’t personally verify this.

MARIA: So in the LOV domain, most of them — half of them — are going into the triplet state, and they’re available to do this type of electron transfer that we’re controlling with the magnet.

ANDREW: Of all of those steps, the only one we believe is magnetically sensitive is whether or not that charge transfer occurs. That’s the thing we think goes up or down as we turn the magnet on or off.

HOST: Wait, so just so I understand — so you’re biasing — you can engineer the proteins to bias which state it enters, so that instead of fluorescing it will mostly drive electrons to this triplet state?

ANDREW: We hoped that would be magnetosensitive. We don’t have evidence of that, and we don’t currently believe that to be true. What seems to be the case is: if you end up in the triplet state, what you do next is magnetic-field dependent. And the cool part there is, the lifetime of the singlet is like nanoseconds; the lifetime of the triplet for the LOV domain is tens of microseconds. So way longer than fluorescence, but still pretty transient.

Whether or not this charge-transfer event happens — which we believe is often an internal event, the electron goes from this spot in the protein to that spot in the protein, that’s our cartoon model — whether or not that occurs depends on the magnetic field. If it occurs, the resulting charged form is super long-lived. Like instead of nanoseconds or microseconds, it’s seconds to minutes, depending on which mutant of the protein.

HOST: What is the magnet doing? We have a protein. It’s in this triplet state. You’ve caught it in that time where it’s like the few microseconds. You put a magnet on it. What is actually happening?

MARIA: This is where we get into quantum. I can tell you, and I can repeat my understanding. When you’re in the triplet state, you have different energy levels where your electrons can be, depending on their spin — which is the thing that magnetic fields modify. They perturb the spin a little bit. So when you add a magnet, you actually separate those energy levels a little bit more, so you can’t go back to your singlet very easily. So you’re stuck in that state for a little bit longer. So we’re pretty much capturing things.

ANDREW: It might be worth emphasizing: up until now, everything we’re telling you, if you dig a little deeper, I can tell you, “Here’s the experiments we did that convinced me this is true. Here are the supporting facts in the literature I have a lot of faith in.” What Maria’s describing now is the very root question: of all the processes that we think are not magnetosensitive, the one we think is magnetosensitive — whether or not this charge transfer occurs — the internal details of “How is it the charge transfer can be magnetosensitive?” There is exactly one known theory for how a charge transfer could be magnetosensitive. It may or may not be correct, but there’s no second theory. It’s called the radical-pair hypothesis. Magnetosensitivity of charge transfer always gets attributed to the radical-pair hypothesis because we don’t have a second hypothesis.

What I rarely see in the literature, or in our discussion with colleagues, is quantitative predictions that follow directly from the radical-pair hypothesis that are falsifiable. What everyone seems to say is, “Ah, the charge transfer seems to depend on the magnetic field — must be the radical-pair hypothesis.” More than one group, I believe, is trying to not just attribute this to a model, but falsify the model if it’s false, or support it if it’s true. I think one of the predictions is: how will the probability of charge transfer depend on the magnetic field? Because what we see is — not enough field, you don’t see anything; in a field, you see an effect; too much field doesn’t improve on it. The effect saturates. It’s not “I add more field, I get more effect.” It’s like, 10 millitesla, good effect; 0.1 millitesla, no effect; 50 millitesla, similar to 10. Diminishing returns.

One of the predictions of the radical-pair hypothesis is that below some threshold, the sign of the effect ought to reverse. I’m not personally familiar with this, but my colleagues have told me these are the predictions they’ve made. So if someone wanted to falsify the radical-pair hypothesis, measuring quantitatively how that probability of charge transfer depends on the magnetic field strength is how you’d do it. For that reason, I tend not to say the words “radical-pair hypothesis,” because I know I don’t know what I’m talking about. I strongly believe, with lots of evidence, that charge transfer depends on the magnetic field. And the literature will point you to the physics that Maria is telling you right now.
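To make the saturation Andrew describes concrete (no effect at 0.1 millitesla, a good effect at 10, little more at 50), here is a purely illustrative saturating curve. The functional form and the 2 mT half-saturation value are assumptions picked to reproduce that qualitative pattern. They are not a fit to data and not a prediction derived from the radical-pair hypothesis, and the curve deliberately ignores the low-field sign reversal mentioned above.

```python
def relative_effect(field_mt, half_saturation_mt=2.0):
    """Illustrative saturating response: 0 at zero field, approaching 1 at high field.

    Both the shape and the default half-saturation field are hypothetical.
    """
    return field_mt ** 2 / (field_mt ** 2 + half_saturation_mt ** 2)


for field_mt in (0.1, 10.0, 50.0):
    effect = relative_effect(field_mt)
    print(f"{field_mt:5.1f} mT -> {effect:.3f} of the saturated effect")
```

A falsification test of the kind Andrew mentions would compare a measured curve like this against the shape a specific model predicts, including whatever happens at the weakest fields.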

HOST: But why is this actually changing the fluorescence of the protein? You’re telling me it’s not fluorescing, it’s entering another state — and then you shine a magnet and then the fluorescence goes down. What is happening there?

MARIA: Because you’re in a sense — if you’re controlling how much is getting stuck in this other state, the long-lived charge form —

ANDREW: Super long-lived compared to any of the other states in question.

MARIA: Yes, which we believe at that point is not fluorescent. Then you’re stealing from the population that would normally be doing fluorescence.

HOST: Oh, I see. So it’s just that a subset of the proteins in your sample are not fluorescing. That’s it? And that’s why you see a dimming effect.

ANDREW: Yep. They end up charged, they end up dimmed.

HOST: So if you see a 75% effect, it just means you’re coaxing more proteins into a charged long-lived triplet state?

ANDREW: Yeah, exactly. Or a long-lived charged state — the triplet state is super short-lived. Sorry for the pedantry.

HOST: But doesn’t that mean, then, that you’re biasing the protein away from the singlet? Is that the magnetic effect?

ANDREW: This is the one that might be worth — these are all the main characters of our story. Basically: ground state, default form of the protein. Shoot the ground state with — in the case of the LOV domain — blue light. It gets excited. Default excitation is singlet. In the case of the LOV domain, there’s a hell of a lot of singlets that turn into triplets. (This is our cartoon model.) Lifetime of the singlet, few nanoseconds — goes back to the ground state, gives you a photon. Lifetime of the triplet, tens of microseconds — one way or another, goes back to the ground state. Both of these have a lot of energy in them. One of the things the triplet can do is participate in charge transfer. If it participates in charge transfer, then it ends up in this super-long-lived state — the seconds-to-minutes state. That seconds-to-minutes state, in many contexts, we believe is essentially just a non-fluorescent state that doesn’t participate in the photo cycle anymore. And just like you said, that shelved population just doesn’t participate anymore. So you go dark.
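The cast of characters above can be written as a small steady-state calculation. This is a sketch of the cartoon model only, assuming the singlet and triplet are so short-lived that only the ground state and the long-lived charged state hold meaningful population. Every number is hypothetical. The 0.5 triplet yield loosely follows the "half of them" remark earlier, which the speakers say they did not verify themselves.

```python
def shelved_fraction(excitation_rate_hz, triplet_yield,
                     p_charge_transfer, recovery_rate_hz):
    """Steady-state fraction of proteins parked in the long-lived charged state.

    Assumes singlet and triplet populations are negligible because their
    lifetimes are tiny compared with the charged state's.
    """
    # Rate at which one ground-state protein ends up charged:
    # absorb a photon, cross to the triplet, then undergo charge transfer.
    entry_rate = excitation_rate_hz * triplet_yield * p_charge_transfer
    return entry_rate / (entry_rate + recovery_rate_hz)


def fractional_dimming(excitation_rate_hz, p_ct_field_off=0.001,
                       p_ct_field_on=0.002, triplet_yield=0.5,
                       recovery_rate_hz=0.1):
    """Relative brightness drop when the field raises the charge-transfer odds."""
    bright_off = 1.0 - shelved_fraction(
        excitation_rate_hz, triplet_yield, p_ct_field_off, recovery_rate_hz)
    bright_on = 1.0 - shelved_fraction(
        excitation_rate_hz, triplet_yield, p_ct_field_on, recovery_rate_hz)
    return 1.0 - bright_on / bright_off


strong_light = fractional_dimming(excitation_rate_hz=100.0)
weak_light = fractional_dimming(excitation_rate_hz=1.0)
print(f"dimming with strong light: {strong_light:.3f}")
print(f"dimming with weak light:   {weak_light:.4f}")
```

With these made-up rates, the same change in charge-transfer probability gives a large dimming under strong light and a much smaller one under weak light, in line with the earlier point that the light has to be bright enough to drive switching in the first place.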

We don’t know as many of the properties of that state as we’d like to. So it’s a good question — when she’s mutating her protein, is she changing rates of triplet generation? Is she changing rates at which the triplet turns into the charge form? Is she changing rates at which the charge form returns to the ground state? We don’t always stop to —

This is a cool thing about directed evolution: you don’t need to understand a goddamn thing. You can just keep engineering without looking at what you’re doing at all. We love mechanism. We love understanding. We don’t *need* understanding. So a really common workflow for Maria is, she’s making a ton of progress — she and her colleagues at Nonfiction are making a ton of progress engineering proteins, making them do incredible new things, not necessarily understanding any of how they work. And in parallel with that, when we’re drinking a beer, we talk about what the data has taught us.

MARIA: And we love to work with people, because these are very specialized measurements that, even if we wanted, it would take a lot of years of expertise to develop. So, in a sense, working with people that are doing this kind of transient spectroscopy —

ANDREW: She tends to pass the things she’s engineered — that are wild and exotic and huge-effect-size — to academics. And there’s a beautiful synergy there: these academics that just kind of by identity would not have done the mindless engineering — it’s not mindless, but maybe feels to them mindless — the engineering work of just optimizing the protein, they wouldn’t do it. But now we’ve handed them a really big effect size that they can study in a very precise, relaxed way. We’ve got lots of prototypes like this.

Another nice thing — this is something I didn’t realize, but it was kind of forced on us. Good luck engineering the hell out of a system without coming to understand it by accident, just in passing. So I think it’s often regarded that a side effect of curiosity-driven basic science is invention and technology. And sometimes, yeah. A side effect of *just* engineering the damn thing is you learn a lot about the basic science, whether you want to or not.

MARIA: It is true, because before looking into this, we had no idea. We had never heard about the radical-pair mechanism. We had never read any magnetic-field-related literature. From scratch, we did this whole go-around-the-table, and before we heard about the radical pairs we were like, “This is all consistent with electron transfer from a long-lived triplet state.” It’s just funny, either way. It’s nice to have the literature to refer to and put more of a framework on things. And it’s funny that we came all the way back to flavins, because most people would start with flavins, and for us they just came out of a screen.

ANDREW: And a beautiful thing is, so many of these people that are doing the hard work of studying these tiny effects in endogenous systems — to win arguments with other academics — she just handed them easy mode. And okay, they didn’t want to understand this engineered system; they wanted to understand native biology. But it turns out it’s really productive in many cases for them to start by getting really good at and understanding the engineered system. After you’re good at easy mode, now do medium mode, now do hard mode. Just like she couldn’t see the signals when she first looked for them — now it’s second nature.

---

The conformational-change red herring

ANDREW: At one point, when she’d engineered the LOV domain — MagLOV — to be extremely magnetically responsive: we know *Avena sativa* LOV, the thing straight out of the oat plant, is believed to change shape when you shine light on it. And it’s commonly stated in the literature that that shape change is how you make other optogenetic systems. You take this light-responsive protein that changes shape when you shine light on it, you attach it to a cellular protein, and perhaps the change in shape of this protein disrupts the function of that protein.

We wanted to control the function of proteins. So we thought — we’ve changed our thinking since then, but we thought — “Oh, we need to show that the magnetic field controls the shape of this protein.” I don’t believe that anymore. We need to show that when you attach this protein to that protein, we turn the function on and off. The cool part about that approach is, just measure it. You don’t need to say anything about the mechanism. But for quite a while, several months, we were hung up on the mechanism. We wanted to show the shape change.

HOST: Because the idea was you could attach MagLOV to anything, put a magnet on it, and turn off that thing inside the body, basically?

ANDREW: That’s it. It would be changed — because at that point we thought it had to be a conformational change.

So Maria designs this beautiful, elegant assay to establish once and for all that the shape of this protein is changing not just when she shines light on it (which is already a literature-based belief) but also when you turn on a magnet. There’s this family of calcium sensors where a fluorescent protein from a jellyfish is attached to a calcium-binding domain — a fusion. Proteins from two different creatures, but you fuse them together. The idea being: this calcium-binding domain changes shape depending on the calcium, and it warps the shape, presumably, of the fluorescent protein. By looking at the changes in the fluorescence, you can infer changes in the calcium.

So she thought, “I’ll make a similar system. I want to establish a shape change. Okay, it’s not a calcium-binding domain — it’s this light-sensitive, magnetosensitive, hopefully-shape-changing domain. I’ll attach it to another fluorescent protein.” This one already fluoresces green, so she used a red fluorescent protein. She uses mScarlet. So she’s got her hopefully-shape-changing magnetosensitive domain, and her red fluorescent protein. And she attaches the two parts that are supposed to change shape to two different parts of this protein — but she doesn’t know where to attach it a priori, because we’re not protein designers. So she just tries a bunch of things — she does an insertion library, makes a bunch of different versions of the protein where they’re attached in different ways. She screens the library to check for the red fluorescence.

How is a magnetic field going to affect the red fluorescence if it doesn’t change the shape of this one? So she’s looking for hits where you shoot this guy with his color, you turn the magnet on and off, presumably change his shape — a hit would mean, “Oh, this shape change affected the red fluorescence,” which the magnet shouldn’t do anything to. This one alone is not magnetosensitive, especially if there’s no flavin. But she puts them together. She gets a hit! Super excited. I think it was right before Christmas. Leaves for Christmas. Comes back after Christmas. “Okay, let’s sequence the hit.” It’s a terminal goddamn fusion.

HOST: Why is that so disappointing?

ANDREW: It’s a one-point connection. There’s no way — pardon my language — there’s no way for the shape change here that we can imagine to change the shape of this guy. So a hit doesn’t make any sense. It’s boiled nonsense. And we realize, “Oh, goddammit — it’s probably charge transfer.” That’s how we got onto the charge-transfer kick. Good luck explaining that with a shape change, which was our dogma — our guiding light — for a while.

It really emphasized for us: focus on what you can measure. Hypotheses about intermediate states are awesome. Focus on the observable, focus on the quantifiable. Since then, Maria’s screening philosophy has shifted from reasoning about an intermediate quantity — shape change — to focusing more on a terminal quality. For example: she wants to make a drug that only works near a magnet. Just check if the drug works or not. Measure that, and then mutate the protein to turn the thing you wanted up and down. Don’t screen for an intermediate proxy that you think will lead to what you want. Screen directly for what you want.

HOST: Have you found that when you’re trying to engineer other proteins, do you always need a flavin-like thing? Or are there proteins that you’ve now found that have no cofactors that are just magnetically responsive?

MARIA: What we’ve been doing since then has been working with this LOV domain, which has the flavin cofactor. Flavin — there are other molecules that could potentially replace flavin. This is just what we’re doing because of convenience, and because we’ve engineered based on this particular protein. It’s actually turning out to be easier than I expected, in the sense that we’ve been able to control lots of protein-function activities. I assumed that each one of those activities would be an enormous feat of engineering and time, and they’re not necessarily — any of them — done amazingly, amazingly well. But the fact that we’ve had hits relatively easily, and we can control enzymatic activity, control transcription, control binding — it’s a lot of activities that very easily we get a small effect, but it’s there. So that has made me pretty excited. And then there’s the engineering to make it bigger, which takes a little bit of effort and time.

ANDREW: The hallmarks for magnetosensitivity, for everything that she’s seen — everything shares at least two features. She got the system into a very energetically excited state — the triplet state. And the triplet state participated in some type of charge transfer. That’s a magnetosensitive step. And then that charge transfer had some measurable consequence. In the case of fluorescent proteins, it’s been known for a while that when you change the charge state of a fluorescent protein, you do modify the fluorescence — usually you just turn it off.

We believe — I won’t say we have the same evidence for this — but the antibodies that she reported recently, we strongly suspect the mechanism there is: she’s generating triplets in a LOV domain which is attached to these antibodies, and whether or not they participate in charge transfer at the antibody, we suspect that is the magnetosensitive step. We suspect that a charged antibody and an uncharged antibody have different binding affinities for their target. I don’t want to say we have evidence for that, but it’s not the most insane speculation I’ve ever made.

HOST: And you trust the results?

ANDREW: It definitely — and also with enzymes.

HOST: Sorry — did you say you always have some kind of cofactor in the LOV domain?

MARIA: Yes, there’s always flavin. Always has flavin.

ANDREW: Okay. So those two hallmarks: a triplet state, some type of charge transfer, and then some consequence of charge transfer. Everything she’s doing with a LOV domain is a flavin, no matter what. In her original discovery in proteins from jellyfish, proteins from corals, flavins were certainly capable of being the cofactor. But as she mentioned briefly earlier, there were other non-flavin cofactors that we suspect are all implicated in charge transfer. We suspect anything that’s good at exchanging charge with a triplet in a GFP could be a cofactor.

The reason I got super-excited when you asked the question is something I don’t think she necessarily has the luxury of pursuing — the company is trying to cure diseases, so that’s something of a focus. But there are quite likely endogenous biological systems that produce energetically excited states as a side effect of metabolism, let’s say. Perhaps some of those are triplet states. Perhaps some of those triplet states participate in charge transfer. Perhaps those charge transfers have some measurable impact. There may well be strongly magnetoresponsive systems in the body if there are excited triplets that participate in charge transfer. It just may be that what’s downstream of them is kind of hard to measure. You see what I’m getting at? Quite possibly this particular form of magnetosensitivity is present in the body. It’s just not upstream of something that we’re good at seeing a one-part-in-400 change in.

MARIA: I was going to correct the word “strong” — I think what you mean is there are some responsive systems, but not strong.

---

The Quantum Biology Institute

HOST: Well, it’s also such a beautiful segue, because my next question is: there is this growing literature from the Quantum Biology Institute and elsewhere that shows very weak magnetic fields, weaker than the Earth’s natural magnetic field, are enough to do things like change the lag phase of *E. coli* growth, or these other physiological effects that they’ve seen with development. What is sort of the plausible mechanism, or reason, that evolution would create such a thing if these types of weak magnetic fields don’t even exist? Is it just that this is a feature of biology that has never been selected against? What is actually going on here?

ANDREW: Maria’s results are consistent with what you just said: biology did not have access to 10-millitesla fields to evolve in the presence of. And in her hands, with some systems, you try to evolve them and it doesn’t work very well. If you try to make GFP brighter, you’re almost bumping into energy conservation, and that’s tough. But maybe another reason it’s tough is, evolution already, for whatever reason, tried to make GFP pretty good at being bright. So if you want to make it a lot brighter, you’re already at the peak of a fitness landscape. Everything she’s engineered for magnetosensitivity has all the hallmarks of something which is nowhere near the peak of a fitness landscape, because why would it be? Why would it ever have been selected for that? When there’s something which depends on the protein sequence, isn’t forbidden by physics, and hasn’t previously been optimized by evolution, in her hands it usually responds like crazy to mutate-and-screen, whereas things which have been optimized don’t.

Actually, shout out to the Quantum Biology Institute. One of the things I really love about them: whenever you say “magnetosensitivity,” a lot of folks hear “tinfoil hat.” They hear crazy stuff that you shouldn’t trust. And you heard me express some skepticism about some other work earlier in the conversation. For magnetosensitivity and biological systems in particular, I think peer review is great, but reproducibility is the only thing we can really trust. Anybody that makes a claim — if it works in somebody else’s hands, that’s when you should start taking it seriously. And what I love about Maria’s work is, she makes the effects so big that it just trivially works in everybody’s hands.

What I love about the Quantum Biology Institute is, these guys are taking reproducibility very seriously in a way that I think a lot of academics don’t have the luxury of. Felice knows: if she wants to be taken seriously, she can’t just share some data and make a claim. She has to reproduce claims that have been made; she has to make claims that are reproducible. It’s the only way I’m ever going to trust that stuff. It has to work in more than one lab.

HOST: Have people reproduced that — like the lag phase of *E. coli*? It’s clear though that nobody knows the mechanism, right? It’s like we’re measuring effects, and they’re like, “Well...”

ANDREW: I don’t know shit about that. I guess what I love is, unlike a lot of groups I’m familiar with, I think they’ve done something really public. “Hey, let’s announce what we’re doing. We’re trying to reproduce this claim. We’re going to share everything we can.” I think it’s the only way to take the field out of the dark ages and into the “I actually believe it” phase. Sorry, maybe that was off-topic.

MARIA: I would take different approaches, because they’re very, very small effects. I don’t know — even selecting for *E. coli* that — pick a metric and engineer. My heart comes from an engineer. Why would we want *E. coli* that responds to magnetic fields? I don’t know — we can think of something.

ANDREW: Don’t win an argument about a small effect. Make a big effect, and then there’s no argument.

HOST: Oh, you’re saying that the effect is small, and so there could be sort of unplanned selection effects that are changing the lag phase?

MARIA: That’s fun. I didn’t think of that. Yeah. But if they are, can we make them respond to — like, be very slow, very, very —

HOST: I see. So you’re saying engineer *E. coli* to have super-strong magnetic-field effect. No one’s ever done that, right? Has anyone done organism-level selection for —

MARIA: No. But *E. coli* is perfect for that.

HOST: Yeah, you could totally put *E. coli* into a continuous bioreactor with some kind of magnetic-field setup. But I don’t even know what to expect would happen, I guess.

MARIA: I think it would probably be very interesting. Let’s assume that flavins and some other metabolites have an effect, and this is the source of the difference in lag. Then you start seeing all of these metabolic enzymes — and it will correlate. Potentially they are more magnetic-field-responsive enzymes to start with, which would be, I think, a neat result.

ANDREW: I’ll make a slightly spicy point. Take this with a grain of salt and know that I’m saying it with a smile. But a lot of folks that are interested in endogenous magnetoresponse are building on her work in an engineered system that they’re not necessarily interested in. We’re not really building on the existing literature for arguments about endogenous response. It’s more a philosophical point: it’s kind of funny how the engineering helped us with the science, and possibly just because I don’t read as much as I should, the science didn’t so much help us with engineering.

HOST: But to clarify — have you built any proteins, enzymes, antibodies completely by rational design? Or is it all still directed evolution? Or it’s like, you’re going to look for hits, because that’s not really engineering. It’s not engineering until it’s rational.

ANDREW: Well, let’s call it — pick a name and we’ll call it that. But it sure ain’t rational.

MARIA: I think what I like about what I do is that you do enough of them, and you start seeing rules. The rules don’t necessarily — like, you can start putting, you can look in the literature, “Oh, if we remove the cysteine, now the electron is hanging out here for longer. That kind of makes sense in terms of this model.” Even for the engineering itself — you do engineering, engineering, and you start getting rules of “We want it sturdier. Here we’re changing charges. We’re moving — we think we’re going from here to here on the electrons, and we’re making a path.”

---

Founding Nonfiction Labs

HOST: Tell me about founding Nonfiction Labs. You’re both at Calico, right, at the time. And you decide to leave that to do this for-profit company. Why not just stay and do the R&D and then license out the technology, or something like that?

MARIA: We were having a very, very good time at Calico. Calico was an amazing place. We had a lot of freedom, we had our time, in a sense. After we discovered the magnetic-field effect, it’s not something that is necessarily aligned to the rest of the company, but we thought it was really neat and worth doing — because we could, and to take it a step further from just fluorescence and just microscopy.

HOST: And they would have kept supporting it?

ANDREW: We had our freedom. What we didn’t have was mission alignment. This is, like, a once-in-a-lifetime thing. This is the coolest science I’ve ever been associated with. Let’s take a big stinking swing at it. I did what I could to explore the possibility of taking a big swing at it internally. It just really didn’t line up with the mission. So the idea of splitting our time between the most important thing I could imagine working on and the thing I’m supposed to do seemed like a bad idea for everyone involved. Let’s go all in. Let’s roll the dice.

MARIA: I had met Richard Fooss, my co-founder, and he seemed perfect for this. He actually was with us while all of this was being born in terms of the science. So he knew it from the beginning, and he was excited about it too. We started brainstorming and thinking what could be the most impactful application. He was amazing. He set it all up, and here we are trying our best.

ANDREW: As you might imagine, our motivations are primarily nerd-based. This is the coolest science we know how to be part of. We believe it can create just magical-seeming technologies, capabilities. And let’s not forget — it’s not that we’re motivated by money, but money is how you get cool shit done. Sometimes begging is a really effective strategy to get some beautiful science done, and I have been a professional beggar my entire life, quite happily so. Yes, I definitely want more money to do more science. But given a chance to do even better — give it a shot.

Don’t get me wrong, you know, I think Maria’s company as a nonprofit would do just fine. I think it would make just as much sense. I think it’d be just as much fun. And I think at this point it wouldn’t even make much difference. But in the event that our dreams come true, in the event that what we see is almost a technological inevitability — well, somebody’s going to get rich off it. Why not the people who invented it and discovered it? Again, not that we really care about being rich, but goddamn, would we like complete intellectual freedom and the ability to bring beautiful ideas into existence with the help of as many of our brilliant friends as we can.

HOST: We’re filming this at the Biohub, and Andrew, you work at the Biohub, and Maria, you work at Nonfiction Labs. What was the story of leaving Calico, starting Nonfiction Labs — but now you’re at the Biohub?

ANDREW: I love the people here. I’ve been collaborating with the folks here for a decade or so. Well, not — okay, as long as they’ve been here, I’ve been collaborating with them. I love the spirit of this organization. I love the mission. I love the values.

There are a lot of ways to make traditional academic progress on this incredible scientific opportunity. And I think being at a nonprofit in the past has functioned as a funding agency to bring cool science into existence, fund people to do cool science. I thought I could potentially — basically I just want to drive the field forward. It’s the coolest thing I’ve ever been associated with. How can we capitalize on this as much as possible? How can we bring as much human opportunity into existence as possible? One way to do it is to come to a place that has incredible values, incredible resources, incredible people. And yeah, I’m not in control of those people. I’m not in control of those resources, but I have a voice. If I have good ideas, maybe they’ll listen to me.

Whereas starting a company — Maria’s in 100% control, her and Richard. If they got an idea, they can try it. There’s nobody older than them, bigger than them, that’s going to stop them.

MARIA: It’s not relaxing, but I do enjoy the pace. I do enjoy the flexibility, and the fact that if you think of something that needs to be done — the only question is, why didn’t you do it yesterday?

ANDREW: At Calico we had access to incredible resources, and we had a mission that did not overlap with this mission. And it turns out that having access to far fewer resources but focusing clearly on a mission where everybody is pulling in the same direction — goddamn, has she been productive since she started the company. Whereas if we’d been in an organization that was better resourced — a wonderful organization, but an organization that fundamentally does not prioritize this work — it turns out you’re just staggeringly more impactful, even if you have tenfold less money.

---

Investors and market

HOST: I’m curious — when you go to investors for the company, what do they think of this? You’ve mentioned it’s a $100 billion market. We can make so many drugs better. But is there a poisoned-well effect, because there is so much fake stuff in the field? What do investors say when you approach them?

ANDREW: I don’t think there’s a poisoned-well effect based on previous attempts at magnetogenetics. Prior to Maria’s engineering work that made the effect enormous, we had a lot of discussions about “Is it real?” The discussions about “Is it real?” stopped when we started showing people this figure. I’ll describe it verbally: it is a time-lapse of photographs of a plate of *E. coli*. Maria’s taking pictures of the plate in the fluorescence channel while I’m waving a magnet around underneath the plate. There’s this obvious, enormous effect. Before I show people this picture, they want to debate with me whether or not magnetic effects are real. After I show them this picture, the discussion turns to practical details of how to make this into a technology. No further questions on veracity.

MARIA: But I think it is true that there’s a lot to learn on the technology, and a lot to still engineer. We’re still very early. We are still at the “get the technology working very, very well” stage. The beauty is that we can move fast, and we’re moving fast. I suspect in one year we’d be in a mouse. But right now we are at the biochemistry level, and it’s a little harder to have conversations about future drugs — especially because people are not used to this kind of super-unusual way of completely changing — it’s not an antibody.

ANDREW: If you want to make safe money, invest in real estate.

HOST: Is most of your time at the company just basic R&D right now? Like, “Let’s try to expand this to new types of proteins. Let’s figure out what’s happening.” You’ve said that you’ve now translated this effect to enzymes like luciferase, antibody binding. One thing I’ve noticed is in the antibody work, for example, in the presence of the magnet, the antibody stops binding. Do you have any sense of whether or not you can flip these effects? Or is it only one or the other?

MARIA: So we have data from maybe a few months ago where we got mutants where the effect is reversed. And it’s very interesting. We’re doing some experiments — porting certain parts of one antibody to the other.

HOST: This is for antibodies?

MARIA: Mm-hmm.

HOST: Oh, okay. So it’s like the antibody binds tighter in the presence of the magnet.

MARIA: They’re very fresh results, so —

ANDREW: We didn’t tweet those results, because we don’t — we like to have things pretty solid before we announce them. We don’t like to retract our claims.

MARIA: But we’re working hard on understanding why that happens, and how to engineer that on purpose. Even if I’m not very good at actual rational design, I like learning rules — and usually the behaviors tend to more or less boil down to specific aspects that you can try to engineer into a protein.

ANDREW: There’s two factors here, and one of them is: you get what you select for. So why do her antibodies go a particular way? Well, the first hit happened to go that way, and we’ll steer into that. Although it is certainly true that some things, the selection is a slog, and some things, the selection is a breeze. When she made the magnetically responsive fluorophore, that was a few months and it was incredible. The magnetically responsive luciferase, I think far more time has been put into it. The modulation gets bigger, the progress is beautiful — we have a nice animation of, you know, zero effect, tiny effect, medium effect, bigger effect. But had that same effort been put into a fluorescent protein, the return on investment would have been much, much higher. So we don’t know why one thing’s easy and one thing’s hard.

In a similar spirit — can things flip both ways? When I showed you the result from Adam Cohen in the synthetic organic molecule that inspired Maria to look in fluorescent proteins, that got brighter when you put a magnet near it. When she found this effect in GFP, when she found this effect in the LOV domain, you put the magnet near it, it gets dimmer. So if I were a smarter man, maybe I could tell you why. But just the fact that some are brighter, some are dimmer — perfect. We’ll get what we select for.

---

What would a medicine actually look like?

HOST: What would the end state for these be? If you are able to make a medicine, what would it actually look like? Let’s say you take an existing antibody used to treat cancer. What are the actual modifications you have to make? How do you know it’s going to persist as long in the body as just the natural antibody? What are the sorts of things that you have to think about in terms of translating this into medicines?

MARIA: The way it would look — one part you would need is the fuel. So you would need to have some sort of injectable fuel. You need the luciferase that is chewing that fuel and activating the LOV domain. So that would all be part of your antibody. In a sense, you’re injecting the antibody, you’re injecting the fuel. You could probably take the fuel by a pill — the fuel would be a small molecule, not a protein, so you wouldn’t have to inject it necessarily. Maybe in the far future. Right now we’re using something that’s not very easy to deliver.

HOST: But you think you’ll always need the light component?

MARIA: You need a way to get into the triplet state, other than external illumination — an enzyme burning fuel is the other —

HOST: Magnetic effect.

MARIA: That’s right. It only arises after stimulation with a photon, or a photon’s worth of energy.

HOST: But doesn’t that complicate things? Are there safe fuels? Will the fuel distribute through the body in the way that you need?

MARIA: I’ll give you a lame answer. It works in mice. So, whatever that’s worth, it’s a very common imaging method in mice, and you can do it repeatedly. However, yes, there are concerns. There’s also lots of concerns regarding how immunogenic these proteins may be.

ANDREW: Just to expand on that for folks listening: when we say “it works in mice,” it is a common assay in drug development to put a luminescent protein inside of a mouse. The luciferase that burns the fuel, and the luciferin (the fuel), are both totally compatible with putting into a mouse, and they produce a glow that you can measure from outside the mouse.

HOST: For example, you would do that if you wanted to figure out where an AAV capsid goes in the body — you can package it with a promoter driving luciferase, you inject it into the mice, it infects all the tissues, and then you actually record where photons are coming out of the mouse to figure out where that AAV capsid went.

ANDREW: Yeah, exactly. But you’re saying — has anybody ever done this on humans? No.

MARIA: Yeah, why would they? That would be hilariously unethical. But now that there’s an actual reason to do it, I am cautiously optimistic that there’s no reason in principle why it should fail in a human. There’s just the practical reality that going from a mouse to a human is a lot of work.

In a sense, going back to that original experiment where we were looking and asking the question of “Can we evolve *E. coli* and come out with metabolic enzymes?” — there’s no reason, like, probably FAD will be involved, but those are not the only molecules involved. The idea of having a molecule that can chew up sugar and make excited high-energy triplet states — it’s not insane. It’s something that happens all the time in metabolic reactions. There’s high energy.

ANDREW: If she had infinite money, she would probably be chasing other magnetoresponsive chemical systems. She has one that’s incredible. It’s like, “Wow, that’s as good as we need and more.” But her nature, if she had infinite funding, would be: let’s go find something that’s even better. Let’s go find something like an endogenous human system that happens to be magnetoresponsive that we can stack our engineering on top of. The dream would be a protein that *does* change conformation with a magnetic field, so that you can tether it to things and then you don’t need to worry about light.

HOST: But you’re saying, because you have to worry about the light, you need to inject fuel. Walk me through what the molecule actually looks like. You have an antibody. It’s fused to a MagLOV domain, which is fused to luciferase?

MARIA: Yeah. The current design has a nanobody with a LOV domain, and that gets controlled by light from a luciferase that is not magnetic-field dependent. But you can put both of them — you can use one to drive the other. This luciferase can have a LOV domain on its own. So you have two LOV domains on the design. It’s a nanobody, LOV domain, luciferase, LOV domain. And they’re all fused together into a single molecule.

HOST: And the idea is that this would only work for therapies that are just intravenously injected?

ANDREW: Anywhere where you can deliver a protein. Intravenous injection is a great way to deliver a protein. There are other ways you could imagine to deliver a protein. Like, what’s an RNA therapy? It’s a means of getting yourself to make the protein.

HOST: But you have to package all this.

ANDREW: Oh, sure. It’s too big to fit in an AAV, for example.

MARIA: It’s actually not that big, because all the parts here are very tiny. So yeah, it’s small, actually.

HOST: And this wouldn’t get to the brain, would it?

MARIA: If you can get a protein to the brain — there are tricks to getting proteins into the brain, potentially. Like people have used these transferrin antibodies where you can get peptides through.

ANDREW: Maybe let’s put this in the category of — I don’t know, we are not specialists in delivery, and that is not, this year, the primary bottleneck. That said, whatever the challenges or opportunities are for delivering protein therapies to the brain, this would have the exact same difficulty/achievability profile.

MARIA: That is true.

ANDREW: Maybe that’s something worth emphasizing. The fact that this is just a protein — it’s not a big chunk of metal, it’s not a nanoparticle.

HOST: Well, and the fuel? Which is a small molecule.

ANDREW: So, totally agree. But the fuel is easy mode. Getting a small molecule everywhere in the body — hell yeah. Getting a protein wherever you want it in the body — I respect that. But it’s way easier than getting a nanodiamond, a quantum dot, a microbubble.

HOST: But couldn’t you find — instead of injecting a molecule, why not just try to find some kind of light source that burns a fuel that the body already has?

MARIA: I think what could fall out of the experiments of the *E. coli* and selecting — and what could fall out of looking for endogenous triplets and molecules that change shape based on those reactions — is the fact that you’ll be using things that are already inside your cell.

ANDREW: We don’t currently — you’re making a great point — we looked for this, we talked about this, we looked for a luciferase which could burn an endogenous fuel rather than an exogenous fuel. On the one hand, we would have been hyped if we’d found it. On the other hand, there’s something kind of cool about — for example, some antibody drugs, as I understand it, have a super-long half-life in the body. You inject them, but they live in you for like a month. On the one hand, amazing. On the other hand, do you want to be magnetosensitive for a month? Maybe not. But if you need both — if you need the protein and the fuel — the protein might be annoying to deliver but stay in you for a while. If the fuel, in the fantasy now, if the fuel was a pill you took with a half-life of 30 minutes, you get the — we would love to have an endogenous fuel that we burned, but an exogenous switch with a separate half-life from the protein therapy? It’s also useful.
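As a rough illustration of the "separate half-life" point, here is a minimal sketch assuming simple first-order decay for both parts. The month-long protein half-life and the 30-minute fuel half-life are taken loosely from Andrew's remarks; multiplying the two fractions as a proxy for magnetosensitivity is an assumption made only for illustration, not a pharmacokinetic model.

```python
# Illustrative only: first-order decay with assumed half-lives.
PROTEIN_HALF_LIFE_MIN = 30 * 24 * 60  # "like a month", taken loosely as 30 days
FUEL_HALF_LIFE_MIN = 30               # the hypothetical 30-minute pill


def fraction_remaining(t_min: float, half_life_min: float) -> float:
    """Fraction of a dose left after t_min minutes of first-order decay."""
    return 0.5 ** (t_min / half_life_min)


def switch_level(t_min: float) -> float:
    """Assumed proxy: the effect needs both parts, so multiply the fractions."""
    protein = fraction_remaining(t_min, PROTEIN_HALF_LIFE_MIN)
    fuel = fraction_remaining(t_min, FUEL_HALF_LIFE_MIN)
    return protein * fuel


if __name__ == "__main__":
    for hours in (0, 0.5, 1, 2, 4):
        t = hours * 60
        print(
            f"{hours:>4} h  "
            f"protein={fraction_remaining(t, PROTEIN_HALF_LIFE_MIN):.3f}  "
            f"fuel={fraction_remaining(t, FUEL_HALF_LIFE_MIN):.3f}  "
            f"switch={switch_level(t):.3f}"
        )
```

Under these assumptions the protein barely changes over a few hours, so the short-lived fuel sets how long the switch stays on.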

MARIA: Referring to something that you said on what you would need — one of the advantages of what we’re doing is that it’s control of biology. So the antibody is one way of doing it. But you could imagine, let’s say, activating CAR-Ts at the tumor, where all your engineering happens in a cell. So the cell is engineered to express luciferase and to make luciferin. Then you could do the CAR-T activation only in the parts where you have the magnet. I think potentially there are other paths forward.

ANDREW: And as much as what she described represents an enormous amount of work to do, optogenetic CAR T-cells — basically existing tools we don’t have to reinvent. I believe there is an optogenetic CAR T-cell. So the fantasy she was painting largely already exists. It’s less of a fantasy and more of, like, composing two existing things in a difficult, laborious, but intellectually straightforward way.

HOST: If you were to deliver these as a protein therapeutic, you also have to engineer them to be long-lived. So are you now in the domain of, like, semaglutide-style engineering — of fusing fatty-acid chains, putting in unnatural amino acids so they don’t get degraded by peptidases? So it’s not actually just this little chain.

MARIA: It’s very possible. I agree. It’s very possible. I am eager to get to mouse experiments to try to get a sense of where we are in terms of aspects like the pharmacokinetics and efficacy. Even very preliminary experiments will be very, very informative.

ANDREW: The two paths — one where you try to get the systems in the body that make proteins to make this protein. We love that it’s an entirely protein system. There’s no big chunk of metal. There’s nothing you have to inject — you need the small-molecule fuel, of course. But one of the cool advantages of “Oh, if you’re going to make the protein outside of the body” — well, that opens up a whole other toolbox. Like we were talking about — isotope substitution, adding heavy atoms — things that you couldn’t get the body to add to a protein natively in ways that we know about. Oh yeah, if you throw an iodine in there, you can probably get the triplet yield through the roof — things like that. I’m not saying that’s Plan A or a major focus right now, but if it’s exogenously prepared and injected, it makes one set of things easy. If endogenously produced, proteins your own body makes might have an easier time getting the post-translational modifications.

MARIA: And it is true that other luciferases do have substrates that you can ingest — that you can feed to mice in the water, and then they will luminesce for a whole week. There are consumable luminescence substrates. It’s just a different luciferase from the one we’re using, but we made a few so we could switch.

---

First indications and regulatory path

HOST: Realistically, what is the first indication? I know you’re very much in R&D phase, but if you were just to hazard a guess, what is the application that seems most like, “Oh, we can make a really big impact here”?

MARIA: The targets that we have now that we’re making these antibodies for — they’re not necessarily the ones that we will move forward with, but they’re all cancer targets. The reason for that is, it seemed like the most straightforward biology to implement, and there’s just so much cancer research and good models for it that we picked those. It doesn’t mean that they’re the ones we’ll stick with; maybe two years from now, we switch.

ANDREW: Among the reasons they retargeted the antibody was: let’s reassure ourselves that we’re not making a hyper-specific tool for only one problem. Let’s make sure this is a general tool that can be retargeted to the target of choice, and that seemed to work quite well.

HOST: What do you mean by retargeting?

MARIA: The idea is that most of the nanobodies look very similar, and it’s only the regions that contact the antigen that vary. So these CDRs, as they’re referred to, you can switch them from one antibody to the other.

HOST: Sorry, you mean one target to another?

MARIA: Mm-hmm. So if you have two antibodies and this binds target A and this one binds target B, you can take the surface part of this one and graft it on a second one, and then it will bind that other target. We’ve actually been — I wasn’t expecting this to work so well, but it works.
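The grafting Maria describes can be pictured with a toy data structure. This is only a sketch: the `Nanobody` class, the `graft` helper, and the segment labels are hypothetical placeholders, not real sequences or the team's actual method. It assumes the usual nanobody layout of four conserved framework segments interleaved with three CDR loops.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Nanobody:
    name: str
    frameworks: tuple[str, str, str, str]  # conserved scaffold segments (placeholders)
    cdrs: tuple[str, str, str]             # antigen-contacting loops (placeholders)

    def sequence(self) -> str:
        """Interleave frameworks and CDRs in order: FR1-CDR1-FR2-CDR2-FR3-CDR3-FR4."""
        fr, cdr = self.frameworks, self.cdrs
        return "-".join([fr[0], cdr[0], fr[1], cdr[1], fr[2], cdr[2], fr[3]])


def graft(backbone: Nanobody, donor: Nanobody, name: str) -> Nanobody:
    """Keep the backbone's frameworks and take the donor's CDRs."""
    return replace(backbone, name=name, cdrs=donor.cdrs)


# Hypothetical labels standing in for real sequence segments.
magnet_backbone = Nanobody("binds-A", ("FR1", "FR2", "FR3", "FR4"), ("A1", "A2", "A3"))
other_binder = Nanobody("binds-B", ("fr1", "fr2", "fr3", "fr4"), ("B1", "B2", "B3"))

retargeted = graft(magnet_backbone, other_binder, "binds-B-on-backbone")
print(retargeted.sequence())
```

The point of the sketch is that only the three loop segments change, so whatever engineering lives in the backbone carries over unchanged in this simplified picture.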

ANDREW: You have an antibody against protein A. Can we make it instead bind protein B? And the beauty of that is that we only need to make one backbone that is good in the magnetic-field control. And then we hope —

HOST: So you have a universal tool for any protein target?

MARIA: Exactly. But we haven’t done it enough times. We’ve only switched around like three times. We’re like, “Super generalizable.”

ANDREW: You know the way a physicist counts? Zero, one, infinity.

MARIA: Yeah. So we’re at infinity.

HOST: But three is — have you failed yet?

MARIA: No, we haven’t.

HOST: So every time you tried it, it worked.

ANDREW: Perhaps.

MARIA: I made three different nanobodies that bind three different targets, all with exactly the same architecture, except the binding domain is for different targets.

HOST: Exactly. And to convince yourself that this is a thing that might be clinically plausible — what sorts of preclinical experiments would you do? How do you even do them? Do you need to do preclinical experiments that nobody has ever done before, because you’re adding in a magnetic field? What sorts of things are you thinking about?

MARIA: Definitely the experiment that I really want to get to is to have a little mouse — one tumor on the right that expresses an antigen, one tumor on the left. The tumor on the right has a magnet, and the other one doesn’t. We can look at whether one is bound more than the other one. And then compare that to the antibody without the magnetic-field effect, and hopefully we have more where we want it. That’s the experiment in my head that we are working towards.

HOST: And you would also measure the toxicity. Nobody has delivered this fuel, luciferin —

MARIA: Although in this case, in mice, we have. But I totally agree with your point.

HOST: No, but I agree.

MARIA: It’s like, we need way more strict toxicity studies on the mouse, for sure.

HOST: Yeah. Has anybody ever measured the toxicity of luciferin? Or have they not even really thought to do that because, “That’s mice. We’re not using it to make therapies”?

MARIA: I feel like there is — if you look in the literature and you start — there’s data. And depending on how — I’m not sure that there’s a very, very specific “we looked at this,” but there’s evidence for some level of toxicity, and then evidence for “we can do it many times and nothing happens, they seem okay.”

HOST: Are there specific forms of cancer that have, you know, that are treated with nanobodies, where the nanobodies have particularly potent off-target effects — where you’re like, “This seems like a plausible sort of starting point that we would target”?

ANDREW: Before, when we were just dreaming — when Maria just had the discovery and we weren’t thinking about the company yet — we spent a full week just sitting in the office, like, “What’s the best thing we can do?” Some of our colleagues at Calico were part of the creation of this drug Herceptin — that is a pretty good drug for curing certain types of cancer. And among other things, it is believed to damage the heart. The trade-off of, like, “Yeah, good treatment for cancer; take a certain amount of heart damage; worth it in many cases.” That was our first like, “Oh, obviously — wouldn’t it be great if you didn’t damage the heart? Wouldn’t it be great if there was more drug on the tumor, less damage on the heart?” I’m not saying Herceptin is the first thing she ought to do, but it’s definitely the one that left us with, “Okay, we’re onto something. That’s a problem we might be able to solve.”

HOST: And that’s just an antibody, Herceptin?

ANDREW: Antibody, yes.

HOST: Yeah. At least it’s an antibody. But you think you could target the same thing with a nanobody.

MARIA: Yes, actually.

HOST: Which kind of segues nicely into the regulatory question, which is: is Nonfiction Labs planning to develop its own drugs and bring them through trials? Or are you planning to do Phase 1 and then sell the IP? Because nobody has ever done the fuel thing. Even if you took an existing target, you’re using a new architecture for the molecule. There are so many changing variables. How do you think about that?

MARIA: Our idea is to definitely do partnerships. I personally do not want to develop myself the skill of navigating regulatory —

ANDREW: You covered a lot of ground between the quantum mechanics and cancer therapy. You don’t also need to know —

MARIA: So we might change our mind, when we talk to Richard, my co-founder.

ANDREW: I think Richard has more stomach for that work than Maria does. So I think the company has stomach for that type of work. She personally perhaps was not made to navigate a regulatory process.

MARIA: That doesn’t mean I’m not interested. It just means, like, I need someone — I think we need somebody to join the team that cares deeply about this.

ANDREW: To the point you made earlier about, of all the things you could do first, what’s the one you think you ought to do first? We’ve got some thoughts, we’ve got some opinions. But this might be a good time to emphasize — if anybody listening is rich and really wants us to prioritize a particular disease, you could get our full attention and have a tremendous influence on which thing we attack first. Just saying.

And then to the current topic: Genentech, the classic — the grandmother of all biotechs — navigated the question of, “All right, we can get bugs to poop insulin. How shall we turn this into money?” And I think the answer is, I don’t think they did. I think — who was it? — Eli Lilly got —

HOST: Eli Lilly bought it, yeah.

ANDREW: So the first — I think insulin, human growth hormone — they passed the ball to someone else. They got famous off the first couple. And then the next couple, they were in a position to do it themselves rather than pass the ball and watch somebody else run it.

HOST: Yeah, like if you have a lucrative partnership that leads to high royalties or something, it makes you rich enough that you can start doing your own drugs.

MARIA: We can start doing our thing. Yeah, that would be the dream, right.

---

What does the field need?

HOST: What’s needed to build the field?

MARIA: I think the field is getting started. It is a perfect intersection of quantum mechanics, biology, physics, engineering. So multidisciplinary labs, or a lot of people that want to work together. What I’m hoping is that there will be a very nice community around this that will be very open, with representation from many fields.

ANDREW: Something we didn’t see coming, but we have absolutely found delightful: nobody in these conversations knows everything. Nobody gets to be gatekeeping. Nobody gets to be smug and superior. Do you know quantum mechanics? Do you know immunology? Tell me what cancer target we should pick first. Tell me about the radical-pair mechanism. Oh — that’s right, there are limits to your knowledge. Perhaps all of us are worthy of respect and a voice and a place in the discussion. See, I’m a physicist — I know a thing or two. Not about immunology, which has maybe been made clear over this time together. Maria is a biochemist — quantum is not central to her training. She’s not incapable of it, but —

MARIA: And that’s not to say the field hasn’t been going on for a long time, but I think you will get a big push with big signals now.

ANDREW: Most of the folks in the field so far — because virtually everyone in this field, we’re watching it be born, and we were lucky enough that, because of the beautiful things Maria’s engineered, we’re kind of at the center of it — so everybody who wants to join the field basically asks her first, “Hey, send me some protein.” She sends them protein, so we get to know them all. They’re all really cool, all well-behaved, bright, hardworking, motivated. So far — some fields are cursed with an established old person that believes a false thing, that if you contradict them, you’re not getting funded. We don’t have that yet.

HOST: That will be you in 20 years.

ANDREW: Yeah, yeah, yeah. Looking forward to it.

HOST: You should work on your beard.

ANDREW: I will be the gray-bearded gatekeeper. “I’ve heard that radical-pair hypothesis before.”

We watched this in the aging field. There were certain reasonable scientific questions we’d ask that we’d watch our senior colleagues — who were veterans of the aging field — kind of their hackles would go up, and we’d realize, “Ah, it doesn’t matter if I’m right or not. I need to shut up.” We don’t have that yet.

So what does a field need? It needs visionaries — delusional, motivated lunatics. They’re going to try their asses off. It needs funders. Right now the funders are the typical funding sources. There’s some funding sources in the Bay that like to fund crazy new visionary ideas that could change the world. There are branches of the government that like to fund — you talk to DARPA — they like to pump a lot of money into a crazy thing that could be huge but might not work. Thank gosh that exists. Getting, for example, NIH funding for this — I think it’s quite possible, but maybe after a couple more *Nature* papers. What does the field need to come into existence?

MARIA: Honestly, like — Andrew will give protein to anybody, give advice, input to anybody. I feel like the big contribution we did to the field was to come in with protein engineering. So let’s see where that goes. More engineering — we would love to have more colleagues who are interested in doing protein engineering, because it’s not just you, but it’s not five other people. Whereas there’s, I want to say, probably hundreds of other labs that are interested in the science, the mechanism, the questions raised by magnetosensation. That’s awesome. Let’s also do some more engineering.

HOST: And it is kind of surprising that the total amount of funding, as far as I understand, is like a few million dollars for all of magnetogenetics, right? Whereas Sonogenetics now has a $40 million ARPA-H program. Mikhail just raised $250 million, I believe — Merge Labs, right? People are using focused ultrasound to control biology at huge scales now. But magnetogenetics is still like $5 million total, maybe?

ANDREW: Maria’s been printing literal miracles. Literal miracles. Back-to-back-to-back *Nature* papers, and things that were confidently predicted to be thermodynamically impossible as recently as like two years ago. She’s been doing that for a burn rate of $100,000 a month.

HOST: Is that true?

ANDREW: It is comical how little money produces how much in her hands. I’m not talking shit on sonogenetics, but we could do a lot with $250 million. Holy crap. You know what? $2 million? It’s not that expensive. Protein engineering is cheap as hell. A few smart, motivated undergrads and some training. The instrument I built that she’s done all this protein engineering with — $20,000.

MARIA: I am excited about sonogenetics. I think — give us a little bit of time and we can catch up, and then we can both —

ANDREW: I think in some categories there’s catching up to do. I think in other categories you’re already ahead. Do they have an enzyme? You know what I mean? I think we have the potential to do better.

MARIA: Yeah.

HOST: Do they have an antibody?

HOST: Well, Maria and Andrew, thank you so much.

MARIA: Thank you.

ANDREW: Goddamn — my head. That was fun.

---

Whiteboard lecture

ANDREW: So we just told a lot of stories. We referred to a lot of characters in the story. I kind of want to sketch out the characters in the story. So I’m going to show a cartoon — a very simplified cartoon — of the photo cycle of the LOV domain. And you’ll help me get it right, because you’re the biochemist; I’m just a physicist.

I like to think of the LOV domain as a little Pac-Man. It is capable of changing shape when you shine light on it. So if I shine light on the LOV domain, it can absorb the light, and the most common thing for a fluorescent protein to do when you shine light on it is, it absorbs the light and it goes into the excited singlet state. That is an enormous amount of energy for the molecule to be holding, and it tends to only hold onto that energy for a few nanoseconds. So the singlet lasts for a few nanoseconds.

The most common thing for the singlet to do is relax back down to the ground state. So the energy flies away — the photon comes in and another photon goes out. And that is fluorescence. Energy goes in, hangs out for a nanosecond, energy comes out. Fluorescence — the singlet state.

HOST: And when that energy comes out, it’s a different wavelength, a different energy level than the photon that entered, right?

ANDREW: Exactly right.

HOST: Where does that extra energy go?

ANDREW: The Stokes shift — the jargony name for the fact he’s referring to. When you go into the excited singlet state, you start in a state which is neither electronically nor vibrationally excited. That’s jargony, but it’s universal jargon at least. You go to a state which has an electronic excitation and a vibrational excitation. A vibrational excitation is like the atoms in the molecule are wiggling around. That vibrational excitation tends to last for picoseconds. So you lose a little bit of energy. Exactly as you said: the photon that comes out is not the same color as the photon that comes in. It is a lower-energy photon. For photons, lower energy means redder.

So in the case of the LOV domain, bluish light comes in — 450? — 450-nanometer light? What’s the absorption peak?

MARIA: 450.

ANDREW: And the light that comes out is greenish, 510, 515 for the LOV domain?

MARIA: Great.

ANDREW: And 510, 515 — the units there are nanometers. That’s the color of the light.
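As a rough worked example of the energy bookkeeping described here, the sketch below converts the two wavelengths mentioned in the conversation (450 nm absorbed, about 510 nm emitted) into photon energies using E = hc/λ. The function name is made up for illustration, and the constants are the standard SI values rather than anything quoted by the speakers.

```python
# Photon energy from wavelength, E = h * c / wavelength.
PLANCK_J_S = 6.62607015e-34      # Planck constant, J*s (exact SI value)
LIGHT_M_S = 299_792_458.0        # speed of light, m/s (exact SI value)
JOULES_PER_EV = 1.602176634e-19  # joules per electronvolt (exact SI value)


def photon_energy_ev(wavelength_nm: float) -> float:
    """Energy of one photon, in electronvolts, for a wavelength in nanometers."""
    wavelength_m = wavelength_nm * 1e-9
    return PLANCK_J_S * LIGHT_M_S / wavelength_m / JOULES_PER_EV


absorbed_ev = photon_energy_ev(450.0)  # bluish light going in
emitted_ev = photon_energy_ev(510.0)   # greenish light coming out
stokes_loss_ev = absorbed_ev - emitted_ev

print(f"absorbed: {absorbed_ev:.2f} eV, emitted: {emitted_ev:.2f} eV")
print(f"energy not re-emitted as light: {stokes_loss_ev:.2f} eV per photon")
```

Under these numbers the emitted photon carries roughly a third of an electronvolt less than the absorbed one, which is the Stokes shift expressed as an energy.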

However, there is another thing that can happen to a singlet. Your photon comes in, you make an excited state, but instead of lasting for a few nanoseconds, it converts into a triplet. The lifetime of a triplet for the LOV domain is tens of microseconds. There are a thousand nanoseconds in a microsecond, right? This is staggeringly longer-lived than the singlet. And depending on which protein we’re talking about, the lifetime of the triplet state can actually be millions of nanoseconds. So triplet states are super-long-lived compared to singlet states. Still really short on a human-perceptible timescale, but damn long on a fluorescence timescale.

So, apparently, according to the literature, in the LOV domain the yield of triplets is really high. I’m used to triplet yields in coral fluorescent proteins or jellyfish fluorescent proteins that are like 1%. In the LOV domain, there have been literature claims on the order of like 50%, 60%. This is accurate as you recall it? I’m not saying that’s true. I’m saying I’ve read it, and I have no reason to disbelieve it.

Anyway, neither this process nor this process, to our knowledge, has any dependence on the magnetic field. However, one of the things the triplet can do is participate in charge transfer. And in the case of the LOV domain, this charge transfer would be an electron literally moving from one location in the protein to another location in the protein. If you want to know which locations, don’t ask a physicist, but —

MARIA: Ask a biochemist. It’s very likely a lot like what happens in cryptochromes — taking electrons from a tryptophan close by. So the flavin becomes very electron-withdrawing —

ANDREW: And the key point here is that the lifetime of the charge-transferred state is potentially enormous — on the scale of seconds or even minutes, as opposed to the scale of nanoseconds or microseconds.

HOST: What do you mean by charge transfer? It means that the electron has already moved away from the triplet towards another thing, and then that other thing just sits in that charge state for one to a hundred seconds?

MARIA: Yeah. So in a sense, we think that if it’s like cryptochromes, the electron comes from a tryptophan to the flavin.

ANDREW: So next up — this charge transfer, and I’m actually conflating two things a little bit. Associated with the charge transfer, as I understand it, is a bond. And so following this charge transfer, there can be the breakage of the bond. Is that accurate?

MARIA: In the normal photo cycle, yes — right away. It reacts with the cysteine.

ANDREW: Maybe I shouldn’t focus so much on the lifetime of the charge transfer. I should say the charge transfer leads pretty directly to the breakage of a bond.

MARIA: Yes.

ANDREW: And the breakage of the bond leads to: Pac-Man opens his mouth.

MARIA: Yeah. Pretty much, you make triplet — and this can only happen when it goes to the triplet state, because the way the electrons move breaks a bond that opens up the protein. Which is really nice, because it makes the triplet state the thing that connects the conformational change to potential charge transfer and magnetic-field control. So I think it’s possible to bring back the conformational change and put it under control of the magnetic field, if I can make that system less reactive — which would be awesome.

ANDREW: It would. So I’m conflating these two things a little bit. I’m not saying I know the lifetime of this separately from the lifetime of this. What we actually observe is: if you look at fluorescence, fluorescence is cycling between the ground state and the singlet state and getting photons. Every time you end up in one of these states, you are now — in the case of the LOV domain — dimmer. You do not participate as well. I don’t know if it’s that you stop participating entirely, or if you just don’t participate as well — but basically you’re not as good at going into the singlet state, relaxing back to the ground state, and giving a photon. So what we actually observe is, the fluorophore just gets dimmer over time. And you might think, “Oh, you’re destroying it.” But the cool part is, if you turn the light off and walk away and come back later, it’ll recover back to the original state.

So we believe that over time, something happens that — I think involves oxygen, we were recently taught — but anyway, something happens where, yeah, Pac-Man will close his mouth, the bond will reform, the protein is there for you to do it again.

That’s the photo cycle of the LOV domain. As we understand it, this step — the charge-transfer step, upstream of the super-long-lived state that has super-different properties than the default state — this is the thing that depends on whether or not the magnetic field is on. That is often attributed to the radical-pair mechanism, which I’m not an authority on. I have no reason to disbelieve. We have produced no data that supports that model. We’ve produced no data that contradicts that model. What we’ve seen is, it seems like this step depends on the magnetic field.

Among the evidence for that — this scientist Jonathan Woodward, that Maria referred to earlier — he’s shown, I believe, that in systems like this (not the LOV domain necessarily, although I think he might study that in the future, we hope), if you turn the magnetic field on during the triplet lifetime, you affect whether or not charge transfer occurs. If you wait more than a few triplet lifetimes for the triplets to go away, and you turn the magnetic field on, it doesn’t do anything. That’s our basis for belief that it’s this step — whether or not the triplet participates in charge transfer — that the magnetic field affects.

And then in Maria’s case, in the LOV domain, this step is magnetosensitive, but you really gotta squint to see it. Holy cow, is it a small effect. But what Maria engineered is probably some combination of how much the magnetic field affects the probability of the charge transfer, and what the consequences of the charge transfer are — how observable the impact is. What she selected for was: you see the brightness wiggle up and down as you turn the magnetic field on and off. She selected for that being bigger. It worked like a stinking charm.

And then every other magnetoresponsive system that we’ve ever engineered — we haven’t proven, dissected the mechanism in any of them, but we strongly suspect the mechanism in all of them is this charge-transfer step.

In addition to this internal charge transfer within the LOV domain, we have evidence — for example, when she fused the LOV domain to a red fluorescent protein, we see the brightness of the red protein remembers what you did to the LOV domain. We think that this charge transfer, instead of being internal to the LOV domain, it can also hop to something nearby. So if there was a red fluorescent protein sitting over here, you can transfer charge over to the red fluorescent protein, and it’ll remember you did it for like a minute. So that’s an example of how you can control the function of a partner protein by manipulating the charge transfer from this protein.

HOST: And what is the reason you think that the deletion of a cysteine changes this? What is happening to this sort of cast of characters when you get rid of the cysteine?

MARIA: So, yeah, so if you have the cysteine, the process of reaction — once you get to the triplet state, the reaction with the cysteine is a very fast process. So it has no chance —

ANDREW: That’s the charge transfer.

MARIA: Yeah. So the triplet has no chance to sit there and take electrons from the tryptophan.

ANDREW: Actually — you’ve done some work, correct me if I’m wrong, where you’ve just deleted the entire jaw of the Pac-Man and given it no possible way for this bond to form or exist. And yet this charge-donation process — you hook it to another protein, look how the function of this protein changes based on the illumination — this protein still works. Is that accurate, for example, in the case of luminescence?

MARIA: It’s true. Yeah. So we don’t need the jaw — the part that’s doing the conformation.

ANDREW: That’s a nice example of, you could call it, basic science that emerged from a very mechanism-agnostic engineering effort. Delete the whole goddamn thing — goddamn, that makes it clear how important he was to the process.

So is there more important things we should get to? These are really the characters in the story: singlet, triplets, charge transfer, conformational change — with conformational change perhaps a red herring, because, you know what I mean, whether or not the jaw opens might not matter if you can just delete the jaw.

HOST: Is the cysteine where — just to walk me through it — the triplet — what amino acid catches the triplet? Is that the cysteine?

MARIA: Yes.

HOST: And then the cysteine dumps it to another amino acid, and that’s the charge transfer.

MARIA: So, no — so the triplet can take electrons from the cysteine. I associate the triplet with the flavin itself. Would you agree?

ANDREW: Yes.

MARIA: So in a sense, physically what it looks like is, the flavin becomes covalently bonded to that cysteine for a period of a controllable amount of time, depending on the mutation. But that reaction with the cysteine is super fast. So if you want that triplet to take electrons from something else, that cysteine is there and it’s very happy to interact with it. So you need to make that cysteine less reactive.

HOST: So when you delete the cysteine, the triplet time increases from 10 microseconds to something higher?

MARIA: Yes, actually, it would. That makes sense.

ANDREW: I don’t think we observed that. That makes total sense.

MARIA: But it happens. I agree.

HOST: How do you know?

MARIA: I think there is transient — yes.

ANDREW: Okay, you’re saying literature.

MARIA: We would expect it. It’s a good question. But there are literature observations. We didn’t measure that ourselves. And it’s actually so fast that — I think at least one paper that I remember, it’s almost impossible to measure that radical. It exists for such a short period of time. So otherwise — the moment, yeah, it just happens too fast.

Otherwise, it sits there.

ANDREW: And then — I guess I should probably draw the arrow. If there was some target protein over here where you wanted to turn his function on or off — this charge transfer, which I’ve kind of drawn as some internal charge transfer — if it was an external charge transfer (giving or taking an electron from the protein you’re attached to) — yeah, the flavin wants an electron. It’s going to take it from the cysteine if it’s close by, real quick, because it’s very well-positioned for it. If it’s not there, then it’s going to take it from something else, and that is a longer process.

How to Weigh a Cell

Niko McCarty — Wed, 27 May 2026 16:13:56 GMT

The next chapter in my book, “Biology is a Burrito & Other Essays,” is now live! It is highly interactive, so please check out the article on my website: burrito.bio. If you send me feedback that improves the essay, I will credit you in the article and in the printed books. Here is a quick preview:

Microbes are small. Tens of thousands of them fit in the period at the end of this sentence. And yet, despite their tininess, it is possible to weigh individual microbes with remarkable precision.

A single yeast cell weighs about 100 picograms. An E. coli bacterium weighs 0.55 picograms, or 100 million times less than a grain of sand. With masses so small, weighing a single cell seems an impossible task. After all, a normal kitchen scale resolves down to 0.1 grams, whereas an E. coli weighs 100 billion times less than that. Weighing a cell, then, demands eleven orders of magnitude more precision than the scale in a typical pantry can provide.
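The comparison above is easy to check by hand. The short sketch below redoes the arithmetic using only the figures quoted in the paragraph (0.55 picograms per E. coli, 100 picograms per yeast cell, 0.1 grams for the kitchen scale); the variable names are illustrative.

```python
import math

E_COLI_PG = 0.55          # mass of one E. coli, picograms (from the text)
YEAST_PG = 100.0          # mass of one yeast cell, picograms (from the text)
SCALE_RESOLUTION_G = 0.1  # kitchen-scale resolution, grams (from the text)
PG_PER_G = 1e12           # picograms per gram

scale_resolution_pg = SCALE_RESOLUTION_G * PG_PER_G
ratio = scale_resolution_pg / E_COLI_PG   # how many E. coli per scale increment
orders_of_magnitude = math.log10(ratio)
yeast_to_e_coli = YEAST_PG / E_COLI_PG

print(f"scale resolution / E. coli mass: {ratio:.2e}")
print(f"orders of magnitude: {orders_of_magnitude:.1f}")
print(f"yeast / E. coli: {yeast_to_e_coli:.0f}x")
```

The ratio comes out a little under 2 × 10^11, so "100 billion" is a round-number lower bound and "eleven orders of magnitude" holds.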

Over the last few decades, scientists have created wondrous devices to weigh individual cells with femtogram precision. Before those devices existed, though, scientists made do with whatever was lying around the lab: often just microscopes, centrifuges, and scraps of paper.

Read the rest of this essay — and play with the interactives — at burrito.bio ➔

What’s the Point of Theory in Biology?

Niko McCarty — Tue, 26 May 2026 15:27:45 GMT

The letters below were written by Noah Olsman, a scientist in the Paulsson laboratory at Harvard. They were not intended to be published, and many details could be sharpened. I’m publishing them here with Noah’s permission, in the hopes they’ll be a starting point for more discussion. Please email noah.olsman@gmail.com with feedback.

A sketch from Darwin’s notebooks, 1837, showing what may be the first phylogenetic tree.

Letter #1

What role does theory play in the sciences? I think there’s a lot more nuance to this question than most people think, and the answer has changed in the last century.

There is a widespread, unstated assumption — often taught in textbooks — that theory is the end goal of science. Experiments are used to build up a theory, and a theory passes its test when it correctly predicts experimental results. Experimentation, in this frame, plays a subservient role to theory. This is useful pedagogically, but it is worth taking theory off that pedestal for a moment and treating it as just another tool for scientific reasoning.

From this perspective, theory’s role in science falls into three buckets: explanation, interpolation, and extrapolation.1 Explanation means that theory can articulate why a model behaves as it does by deriving its general properties (e.g., when is this system stable? are there conserved quantities?). Interpolation is the marriage of theory and modeling: the search for parsimonious frameworks that connect many data points in a unified way (e.g., Einstein unifying inertial and gravitational mass, Noether’s theorem linking conservation laws to symmetry). And extrapolation is the big payoff: theory making predictions about things that have not yet happened and do not exist in current data.

Extrapolation has long been used to justify theory, and theoretical ideas have produced many conceptual breakthroughs. But today, a lot of theory exists in a bubble, with only marginal impacts on the broader science and engineering ecosystem.2 I suspect this shift began in the mid-20th century, when computers started competing with theory at extrapolation: the atomic bomb was built from theory, but the hydrogen bomb was too complex and relied heavily on simulation.3 Before that, theory was self-justifying; it was the only tool available for making predictions, and because data was limited and expensive, models had to be simple enough for theoretical analysis anyway. As computers took over, people began to ask whether theoretical abstractions were intrinsically worthwhile when simulation could be more precise.

If we fast forward to today, I think machine learning changes the paradigm again. If we can collect massive amounts of data, and our best models of that data are giant, uninterpretable statistical models, where does theory fit in? You could argue that theory will still offer deeper insight into reality, but here is where I get to our world of biology, and I think the history of systems biology is a good case study.

When systems biology began as a field, the pitch was that we could finally put together data-driven models of biological processes, and that within these models we would uncover simple and universal design principles for biological systems.4 But look at what has happened over the last two decades in the field! This vision has mostly been put out to pasture. Many big names in the field from the 2000s and 2010s have pivoted to method development. The rational actors realized that the data in the field was just too coarse-grained to generate first-principles models, and they shaped their research efforts around solving that problem.

Up until recently, there was sort of an “if you build it, they will come” approach to data generation. If we generated enough data, then we could go back to the simple theoretical models and do what we imagined in the early days of the field. But now, the game has changed and the drive is towards statistical models. Even many of the most prominent theorists from the early days of systems biology have voted with their feet. Many top quantitative biology departments have moved away from theory in favor of methods-oriented research.

It’s time to re-evaluate the role of theory in science and engineering. Within biology, we need to do a lot of soul-searching about what theory is actually contributing to the field. I say this as someone who cares a lot about theory, and who structured his work around theory! I don’t think theory is dead in the water, but I think theorists need to think hard about what they are actually contributing to their respective fields.5

Response from Niko

Noah, do you reckon there are things in biology we can only get from classical theory, and not from large statistical models?

One useful framework for thinking about this question might be the virtual cell. There are basically two approaches to building one: top-down and bottom-up.

The bottom-up or “mechanistic” approach, perhaps best embodied by Markus Covert at Stanford, aims to build a whole-cell model from equations and first principles. It feels more elegant than top-down efforts because it builds up an interpretable “knowledge base.” Basically, the model surfaces predictions, experiments reveal discrepancies, and each mismatch tells you where your understanding is wrong and which experiment might be useful to fill the gap. This is one way to grow knowledge.

It also seems like there are things we can uniquely learn from mechanistic models, and not from purely statistical ones. Markus Covert once told me the story of Neptune’s discovery, which came via discrepancies in a data set. Astronomers had a Newtonian model of the known planets, noticed that Uranus’s orbit didn’t behave as predicted, and figured out the perturbations were caused by gravity exerted by a then-unknown planet — Neptune. Covert’s point was that you could take the same data today and train a statistical model to predict planetary motion, but that model would be unlikely to take a conceptual leap of this nature and infer the existence of a missing planet. There are some discoveries that seem to require human reasoning, operating on a mechanistic framework (for now).6

Every time I sit down to write about theory in biology, and try to formulate my criticisms of statistical models, I worry readers will simply say, “Well, we can build sparse autoencoders to interpret the predictions of otherwise opaque models.”7 And maybe they’re right. But then I think about the Neptune example, and I’m not sure interpretability tools actually solve the deeper problem.

My worry with purely data-driven models is that we lose something special that is inherent in science itself. If we’re only probing a model in the context of making useful predictions or finding a cure for some disease, will we actually be asking the right questions to understand the fundamental nature of a cell? Will we even know how to wield these sparse autoencoders, or which questions to ask of them, if we only ever approach this problem from the top-down?

Letter #2

I agree that one of the great things theory provides is a way to formalize beliefs about a system, make predictions, and then see when data does or doesn’t match those predictions. But in my mind, the problem is that we can only confidently identify those discrepancies in two cases.

First, we can use theory to try and prove impossibility results. For example, we could argue that any system containing XYZ interacting components can never achieve some behavior, such as oscillations or a certain level of noise. The advantage of this theory is that you don’t need a detailed model of the whole system, just knowledge about one particular subsystem.

This paper gives a fourth-root scaling law for how much feedback systems can suppress noise: to reduce noise by a factor of 10, you need 10,000x the rate of signaling. The limitation of this kind of theory is that it is never really constructive. At best, it tells you when a given system pushes up against a fundamental limit. If you do exceed the limit, then (assuming your theoretical results are correct) it means you are wrong about the constrained part of the system.
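To make the scaling concrete: if noise falls only as the signaling rate raised to the power of minus one quarter, then cutting noise by some factor costs that factor to the fourth power in signaling. The sketch below takes the scaling exactly as stated in the letter and ignores any proportionality constants; the function name is made up for illustration.

```python
def signaling_fold_increase(noise_reduction_factor: float) -> float:
    """Fold increase in signaling rate needed if noise falls as rate ** (-1/4)."""
    return noise_reduction_factor ** 4


for factor in (2, 10, 100):
    cost = signaling_fold_increase(factor)
    print(f"{factor}x less noise -> {cost:,.0f}x more signaling")
```

Even a modest twofold reduction in noise already costs sixteen times more signaling under this assumption, which is why the bound bites so hard.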

Second, we can try to explicitly model the system. This, of course, gives far more precise results, but the validity of those results is strongly tied to the correctness of both the structure of the model and its parameters. I think this is where we get into dangerous waters with bottom-up cell models. If you could explicitly model the whole cell, you would need to be extraordinarily confident in all of the hundreds or thousands of equations that go into the model before you rigorously analyze it to compare data to theory.

This is why I am, frankly, skeptical of the whole endeavor. Sure, you can do it, and maybe you can show your whole-cell model performs well in some mean-square error sense, but if you want to do anything else with the model, you either need to be really certain that the model is correct, or have a strong foundational understanding of which parameters most strongly affect a given prediction. This gets even more sketchy when the output you are observing is an indirect proxy of what you actually care about.8

If we want to start actually modeling the cell, I think we need to start from simple processes and do the difficult work of really nailing down how accurately we can measure parameters, predict new data, perturb the experiments, and see if our predictions are still accurate. One counter-intuitive thing is that such an effort will actually push us away from purely mechanistic models, and towards more phenomenological ones.

I say this because even when we know the mechanisms of a system, if we work through what our reporters can actually tell us, we often find that certain parameter combinations are degenerate and can’t be uniquely identified from any experiment. As a simple example, if all you have is steady-state gene expression, you cannot infer production/degradation rates uniquely, as the steady state is a function of the ratio k_p/k_d. Rather than trying to just throw all of our biological knowledge into the model, we actually need to be thoughtful about how to simplify our model such that its parameters are identifiable from a given set of experiments.9
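A minimal sketch of that degeneracy, assuming the simplest possible model of expression, dx/dt = k_p - k_d * x, with x starting at zero. The two parameter sets are hypothetical and chosen only so that they share the same ratio.

```python
import math


def steady_state(k_p: float, k_d: float) -> float:
    """Steady state of dx/dt = k_p - k_d * x."""
    return k_p / k_d


def level_at(t: float, k_p: float, k_d: float) -> float:
    """Closed-form x(t) for dx/dt = k_p - k_d * x, starting from x(0) = 0."""
    return (k_p / k_d) * (1.0 - math.exp(-k_d * t))


slow = {"k_p": 10.0, "k_d": 1.0}    # hypothetical slow production and decay
fast = {"k_p": 100.0, "k_d": 10.0}  # ten times faster, same ratio

# Identical steady states: a steady-state measurement cannot tell these apart.
print(steady_state(**slow), steady_state(**fast))

# Different early dynamics: a time course could.
print(level_at(0.1, **slow), level_at(0.1, **fast))
```

Steady-state data pin down only the ratio. A time course, or an independent measurement of the degradation rate, is one way to break the tie.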

Here’s a thought experiment I like. Imagine someone invited you on a brand new aircraft, and when you asked about safety they told you they had placed sensors measuring every component of the aircraft and fed it into a giant model, and the model said it was safe to fly. Would you get in? Probably not.

Now say they fed a mechanistic model of the whole plane into a supercomputer. Would you get in? My answer is still probably no, just because trusting that simulation requires an enormous amount of faith in the modeling assumptions. The reality is that we build up complex systems by validating models of various subsystems, integrating them, testing those integrations, etc. It isn’t some grand unified framework, but it is the pathway to making predictions that are reliable.

Maybe the state-of-the-art cell models have done this, but my sense is that the literature is still a lot more of a Frankensteinian assemblage of data and assumptions. It produces something, but not an airplane you’d board willingly.10

Letter #3

I had a few follow-up thoughts that I figured I’d send along before they diffuse away. I was listening to a (very niche) podcast by a control theorist where he covers the history of one of the central results in the field, the Nyquist stability criterion. Without too much detail, this is a result that is taught in every intro control theory class. It was one of the first practical methods that allowed engineers to predict when a feedback system would be stable or unstable. While there were many 19th century results that allow you to prove stability of a given set of differential equations, they all relied on having a parameterized model of your system a priori. The beauty of Nyquist is that, while it can be used to prove the stability of a system given a model, it can also do so purely from experimental data.11
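For readers who have not seen the criterion, here is a rough sketch of the data-only version of the idea. It assumes the open-loop system is itself stable, that its response fades at high frequency, and that the frequency response has been sampled densely; under those assumptions, the closed loop is stable exactly when the measured curve does not encircle the point -1. The "measurement" below is synthetic (a made-up third-order system), standing in for real frequency-sweep data, and the function names are illustrative.

```python
import cmath
import math


def encirclements_of_minus_one(open_loop_response):
    """Net counterclockwise encirclements of -1 by the Nyquist curve.

    open_loop_response holds complex gains L(jw) at increasing positive w.
    """
    # Mirror the samples to cover negative frequencies as well.
    mirrored = [z.conjugate() for z in reversed(open_loop_response)]
    full_curve = mirrored + list(open_loop_response)
    total_angle = 0.0
    for previous, current in zip(full_curve, full_curve[1:]):
        # Angle swept, as seen from -1, between consecutive samples.
        total_angle += cmath.phase((current + 1.0) / (previous + 1.0))
    return round(total_angle / (2.0 * math.pi))


# Log-spaced frequencies from 1e-3 to 1e3 rad/s.
frequencies = [10 ** (-3 + 6 * i / 20000) for i in range(20001)]


def fake_measurement(gain):
    """Synthetic stand-in for measured data: L(s) = gain / (s + 1)^3 at s = jw."""
    return [gain / (1j * w + 1.0) ** 3 for w in frequencies]


for gain in (4.0, 16.0):
    count = encirclements_of_minus_one(fake_measurement(gain))
    verdict = "stable" if count == 0 else "unstable"
    print(f"gain {gain}: {count} encirclements -> closed loop {verdict}")
```

The function never sees the equations, only the sampled curve. For this particular synthetic system the closed loop is known to lose stability once the gain exceeds 8, so the two gains land on opposite sides of that boundary.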

I bring this up because it’s a nice historical example of how theory actually enters a discipline. A field discovers some new phenomenon that is surprisingly useful (in this case feedback control of an electrical system), builds increasingly complex technology with it, and coasts on intuition until it runs into problems that can’t be fixed by trial and error. This is when, if we are lucky, theory can jump in and play a big role. In Nyquist’s case, the telephone company was discovering instabilities in long-distance networks coming from their feedback amplifiers, but had no systematic way to fix them. Nyquist figured out how to use the existing experimental data to solve the core problem, and his work laid the foundation for control theory as a rigorous discipline. You could tell a structurally identical story about Shannon and information theory, or Maxwell and electromagnetism.12

I’ll try not to go on too long, but I think there are three idiosyncratic but interesting juxtapositions to think about today, namely economics, machine learning, and biology. This triad forms a sort of Goldilocks story of theory.

On one end, we have economics, where theory outpaced data for decades. The theory was beautiful and mathematically precise, but as data collection has improved, it turns out to be a pretty poor predictor, and the field has shifted to be more empirical.

At the other end of the spectrum, biology has remained staunchly empirical for decades, and resisted comprehensive theoretical treatment. This has left us with an explosion of experimental techniques, but general skepticism around the role of theory. It’s almost as if getting the theory right is so hard that we have largely decided it is easier to just hammer away on the experiments.

And then we have machine learning, where theory and experiments have entered this wildly virtuous flywheel, such that every marginal conceptual advancement immediately seems to produce practical improvements and drive new engineering work. Good researchers are worth astronomical sums of money just to keep the flywheel spinning. It’s probably at least somewhat a bubble, but the labor market tells us something about how productive researchers can be in the ideal circumstances.

Maybe biology will never hit that inflection point, but I suspect if it does it will have to go through the same sort of growing pains as these other success stories. By that, I mean we should acknowledge that the field will have to go through its trajectory. There probably isn’t a shortcut, but if we are aware of the template, then we can try to accelerate progress. Maybe this is well-trod ground in metascience, but I think there is something practical to learn from this kind of historical dissection of other fields’ success, in particular with an eye towards looking at what we can map onto fields that have not yet hit that point.

There needs to be some kind of serious intellectual investigation of what we are trying to accomplish in biology now. While there’s huge and well-earned excitement about the role AI will play in the future of biology, I worry we underestimate how much foundational work is still needed before those tools can deliver on their promise. It has been a bit sad seeing so many faculty staple AI onto their own work to appeal to funders, without much serious engagement with the bigger picture. I’m reminded of the opening line of Kurt Vonnegut’s God Bless You, Mr. Rosewater, which reads, “A sum of money is a leading character in this tale about people, just as a sum of honey might properly be a leading character in a tale about bees.”

This is a simplification, but note that theory doesn’t really operate on data directly, but rather on models. One major failing of science curricula is the complete absence of serious discussion of what goes into creating mathematical models. In every course I’ve ever taken, models are sort of just handed to you. You see how models fit the data, but they are just sort of a set of equations or formalisms pulled from the aether. We often conflate modeling and theory, but really theory operates on a model. As an aside, I think this is a partial answer to your question about theory in biology: if your models aren’t good, theory doesn’t have much raw material to work with.

In control theory, there is an oft-quoted statistic that >95% of all controllers in industry are PID controllers, despite the fact that PID is over 100 years old and there are many more sophisticated techniques out there now. This was always brought up as a sort of humorous self-deprecating anecdote, but it highlights the gap between the idealized version of theory and how the work of theorists plays out in the real world. Theory work stays afloat because there is enough useful practical output that we keep it going.

Von Neumann and his wife did the programming for this. It is the origin of the term “Monte Carlo simulation.”

See Uri Alon’s work from the early 2000s.

One positive example of this is control theorists starting the Learning for Dynamics and Control conference, whose sole purpose is to get control theorists in the same room as ML people to figure out how the two fields could interact productively.

Alvin Djajadikerta has written on similar ideas for Asimov Press. See his essay on “AI for Disruptive Science.”

Adam Green at Markov.bio has written a lot about this. See “Through a Glass Darkly: Mechanistic Interpretability as the Bridge to End-to-End Biology.”

The thing we use most commonly as an output today is RNA-seq, and I think we (the quantitative biology community) have invested a lot in the idea that, because we can do RNA-seq in a fast and cost-effective way, it is also able to give us reliable information about cell state. This is probably broadly true, but consider this thought experiment: imagine sometime in the future where we have perfected single-cell proteomics, and can get a readout of every protein’s state in the cell. How much would we trust RNA-seq as an assay when that exists? My guess is that transcriptomics would seem really coarse-grained and primitive in comparison. Right now, RNA-seq is the best we have, so we sort of have to trust it, but when the next thing comes along we will wonder why we ever put so much stock into it (as happened to microarrays, qPCR, and many other methods).

This sort of modeling, which sits between mechanism and whole-system simulation, sometimes gets a bad rap, since physicists tend to propose models and derive theoretical properties without taking the next step of mapping to data. The middle ground worth pursuing (and this is my bias, based on my work) is models that are both simple enough to be amenable to theory and rigorously justified by data.

I think it is telling that, for all the success of scaling laws and big general models, the first safety-critical ML systems (self-driving cars) were developed incrementally. Maybe that will change, but I think we’re still gonna need old-school engineering to understand complex systems well enough to engineer them, be they cars or CAR-T cells.

The basic idea is that you do a standardized experiment on an open-loop input/output system (say a vacuum tube amplifier), where you input sinusoids with a constant amplitude and increasing frequency. If the system is linear, the output will also be a sinusoid of the same frequency, but not necessarily with the same amplitude and phase. From this data, you can graph the relationship between the input signal’s frequency and the output signal’s amplitude and phase. This pair of plots is called a Bode plot. The question is, can you predict the stability of the closed-loop system purely from the open-loop characterization? What Nyquist realized is that you could derive a comprehensive theory based on fundamental mathematical properties of dynamical systems. This led to what is called the Nyquist stability criterion, which at a basic level says you can take those phase and magnitude plots, combine them into a trajectory in the complex plane, and count how many times that trajectory encircles the point −1. That count of encirclements maps directly onto whether or not the system is stable. If this discussion was too abstract, here is a little interactive tutorial built with Claude.
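For readers who prefer code, here is a minimal numerical sketch of that encirclement count in Python with NumPy. The open-loop system L(s) = K/(s+1)^3 is a made-up textbook-style example, not something from the discussion above, and the function names are hypothetical. It has no unstable open-loop poles, so under that assumption the number of clockwise encirclements of −1 equals the number of unstable closed-loop poles.

```python
import numpy as np

def open_loop(s, gain):
    """Hypothetical open-loop response L(s) = gain / (s + 1)^3 (assumed example)."""
    return gain / (s + 1.0) ** 3

def clockwise_encirclements(gain, n_points=20001):
    """Count clockwise encirclements of -1 by L(jw) as w sweeps from -inf to +inf."""
    # Positive frequencies on a log grid, mirrored to cover negative frequencies.
    w_pos = np.logspace(-3, 3, n_points)
    w = np.concatenate((-w_pos[::-1], w_pos))
    curve = open_loop(1j * w, gain)
    # Angle of the vector pointing from -1 to the curve, made continuous.
    angle = np.unwrap(np.angle(curve + 1.0))
    turns_counterclockwise = (angle[-1] - angle[0]) / (2.0 * np.pi)
    return int(round(-turns_counterclockwise))

if __name__ == "__main__":
    for gain in (2.0, 10.0):
        print(gain, clockwise_encirclements(gain))
```

For this example, the curve crosses the negative real axis at −K/8, so the closed loop is stable for K < 8. The sketch counts zero encirclements for K = 2 and two for K = 10, which matches the two unstable roots of (s+1)^3 + 10 = 0.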

If you haven’t read The Making of the Atomic Bomb, it is phenomenal and the first half of the book is all about this scientific history.

Even “Boring” Ideas are Surprisingly Deep

Niko McCarty — Wed, 20 May 2026 17:14:02 GMT

Photo by me.

Many ideas that seem “simple” on their surface are surprisingly deep. I suspect this is true in every scientific field, and it’s certainly true in biology.

Polymerase chain reaction, or PCR, is one example. PCR wasn’t mundane when scientists invented it in the 1980s, but even high schoolers learn it today. The technique seems simple, in part, because teachers present old methods matter-of-factly.

Maybe you learned about it like this: heat a DNA sample to around 95°C to “melt” the two strands apart. When the temperature cools, small DNA snippets called primers attach to the freed strands. Next, heat the sample again — this time to about 70°C — so an enzyme called DNA polymerase can latch onto each DNA strand and copy it. When it’s done, heat the sample back to 95°C and restart the cycle. The key thing to remember is that heat does the work of separating the strands, and the rest of PCR follows.

In a recent essay, I made the grandiose claim that PCR is a “near-optimal” technology. My core argument was that DNA polymerases are already fast, so any further speed improvements would come from the temperature cycling steps. If we could instantaneously change the temperature of DNA samples (using lasers, for example), we could shave about 20 percent off the time needed to do PCR. This is only a modest improvement, though; hence my claim that PCR is “near-optimal.”1

OH BOY WAS I WRONG!

After the essay went out, Momčilo Gavrilov — a biophysicist at Johns Hopkins University and co-founder of SHARP Diagnostics — reached out and told me he’d invented a way to do PCR at a single, fixed temperature. Instead of using heat to break apart the two DNA strands, Gavrilov uses a helicase enzyme that physically separates them, letting primers bind and polymerase copy the DNA. You just mix the ingredients and incubate them at a single temperature between 37°C and 65°C.

Gavrilov’s method seems to be a major improvement over standard PCR for three reasons. First, it works in existing thermocyclers, or in tiny devices that you could carry in a pocket. Second, isothermal PCR can be much faster; without the temperature shifts, amplification runs continuously, with no pauses between steps. And finally (this is the most important thing), ditching the high temperatures means you can drastically expand the amount of “chemistry” you can do during PCR. (For example, by ditching the high heat, biologists can now use a wider palette of polymerases for PCR. They are no longer limited to only heat-stable options.)

In short, I no longer believe that PCR is a near-optimal technology. Even methods that seem “solved” or simple can, on closer inspection, turn out to be surprisingly deep.

II.

Isothermal PCR isn’t a new idea.

In his 1987 PCR patent, Kary Mullis suggested using an enzyme to mechanically separate DNA strands, instead of heat.2 He speculated that helicase, a motor protein that unwinds DNA during genome replication, might work. At the time, though, nobody had yet discovered a helicase that could pull apart DNA strands with blunt ends (where both strands “line up” at the same point); the enzymes needed a “loose” strand of DNA to grip onto.

In 2004, New England Biolabs began shipping kits for helicase-dependent amplification, or HDA, an isothermal method built around Mullis’s original idea. The problem with HDA, though, is that the helicase was too weak. It would open up DNA, unwind the first 150 bases of DNA, and then fall off. The polymerase enzyme would begin copying this “opened” DNA, but couldn’t proceed any further than the helicase. These limits meant that HDA could only amplify small snippets of DNA. BioHelix, a spinout from New England Biolabs, commercialized the method and used it to make two FDA-approved diagnostics, including for herpes simplex virus.3

The “weak helicase” problem was later solved by scientists at Johns Hopkins. In 2015, Taekjip Ha’s lab described a “superhelicase” they had engineered. The enzyme was locked into its “unwinding” configuration, such that it couldn’t slip off the DNA. This superhelicase could open up DNA strands stretching 6,000 bases or more, but required a single-stranded DNA overhang to initiate that unwinding. It didn’t work on blunt end DNA!

A few years later, in 2022, Gavrilov engineered PcrA M6, a helicase from a “heat-loving” microbe that can open up blunt-ended DNA. This helicase — like the original superhelicase — was locked into its unwinding conformation, such that it could be used on long stretches of DNA. The helicase opens about 150 base pairs of DNA per second, but it’s likely possible to engineer a helicase that moves much faster.

Now, putting everything together, the isothermal PCR works like this:

First, mix together the DNA sequence (to be amplified) with the PcrA M6 helicase, Bst polymerase, primers, nucleotides, and SSB. Bst is a polymerase from a bacterium called Bacillus stearothermophilus. It works at moderate temperatures and, importantly, can separate paired DNA strands as it copies them, a property called “strand-displacement.” SSB, or single-strand DNA-binding protein, is a small protein that grabs onto single strands of DNA and keeps them from re-pairing. SSB is also what allows PcrA M6 to grab onto blunt-ended DNA in the first place.

Once everything is mixed together, the reaction starts. The helicase pries open a short stretch of the DNA, SSB coats the freed strands to keep them apart, primers diffuse in and bind to their target sites, and the polymerase starts copying.

But the polymerase, Bst, moves much faster than the helicase, PcrA M6. When the polymerase catches up to the helicase, it knocks it off the DNA and keeps going on its own, separating paired bases as it copies them. The displaced helicase is then free to bind to a fresh copy of DNA. This cycle repeats continuously.4

Gavrilov says his technique can amplify DNA fragments up to 6,000 bases long, or can double the amount of a 200-base fragment of DNA in under a minute. The reaction works anywhere from 30 to 65°C with standard PCR primers. In other words, it is a drop-in replacement for normal PCR.

This is important! In my prior essay, I argued that many biologists do not adopt newer methods (like photonic PCR) because the switching cost is too high. Buying new machines and troubleshooting protocols is a lot of work, and the payoff is usually minor. Why bother, then, if an existing setup already works? Gavrilov’s method, though, can run the same type of reaction in the same thermocyclers that scientists already have, using the same primers. The switching cost is low, which makes adoption much easier.
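To get a feel for the doubling figure quoted above, here is a back-of-the-envelope sketch in Python. It assumes perfect doubling every round and a single starting template, and it treats one minute per doubling as an upper bound (the claim is “under a minute” for a 200-base fragment). The function names are made up for illustration.

```python
import math

def doublings_needed(target_copies, starting_copies=1):
    """Rounds of perfect doubling needed to reach at least target_copies."""
    return math.ceil(math.log2(target_copies / starting_copies))

def minutes_to_reach(target_copies, minutes_per_doubling=1.0, starting_copies=1):
    """Idealized upper bound on reaction time; ignores any lag or reagent depletion."""
    return doublings_needed(target_copies, starting_copies) * minutes_per_doubling

if __name__ == "__main__":
    print(doublings_needed(1e9))
    print(minutes_to_reach(1e9))
```

Under these assumptions, a billion copies takes 30 doublings, so no more than about half an hour from a single molecule. The time drops with faster doublings or more starting copies.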

III.

A “true” isothermal PCR method now exists, one that works on long DNA sequences. What can we do with it?

First, we can run PCR with much better polymerases. Biologists have historically used Taq (or engineered heat-stable polymerases, like Phusion) mostly because they survive at 95°C. But these are not the best or fastest polymerases available! The 95°C step basically acts as an artificial filter that limits PCR only to heat-tolerant polymerases, rather than allowing biologists to work with the millions of polymerases that only work at lower temperatures. The native E. coli DNA polymerase copies about 1,000 nucleotides per second and is highly accurate. We can presumably find polymerases that move faster still, and engineer them to do strand-displacement if needed. At high speeds, it should be possible to make billions of copies of even a long DNA sequence (3,000+ bases) in just a few minutes.

Second, we can expand the “chemical space” of PCR. Without the 95°C step, it’s trivial to add enzymes to the tube that would normally be destroyed by high heat.

Consider DNA methyltransferase, an enzyme that tags cytosines in DNA with methyl groups, which determines which genes get switched on or off. Methyltransferases are destroyed at high temperatures, which means that PCR can copy methylated DNA but doesn’t preserve that methylation as the DNA is copied. The amplified product comes out “naked,” in other words, without methyl groups. With isothermal PCR, you can include methyltransferase in the reaction mix and, as each new strand is copied, the enzyme writes the same methylation pattern onto the newly-copied strands.

Third, many DNA sequences can’t easily be amplified with normal PCR, because highly repetitive regions (like the CAG repeats in Huntington’s) form hairpins during cooling. In a 2024 preprint, however, researchers showed that this isothermal method could amplify a 561 base pair sequence containing 91 percent As and Ts, and even amplify stretches of DNA that contain 200 consecutive AT base pairs. Normal PCR couldn’t amplify any of it.

Finally, and this is the thing I’m most excited about, isothermal PCR is just more portable. By getting rid of the normal heating and cooling of PCR, you no longer need thermocyclers, with their big metal blocks and fans and electrical cords. Instead, you can run isothermal PCR using a small heater, roughly the size of a coin, that holds a tube at 65°C for many hours using minimal electricity.

The reagents are more portable, too. Without the heat-stable polymerases, researchers can instead choose enzymes that freeze-dry easily for long-term storage at room temperature. The helicase, SSB, primers, and dNTPs can all be stored as powders, according to Gavrilov. ATP, the least stable ingredient, can also be stored as a powder. The whole reaction can therefore ship in a small tube, at room temperature, and then be rehydrated with a drop of water. It’s easy to imagine a portable and cheap diagnostic tool that could detect dozens of different pathogens — in soil or human blood, say — in just a few minutes.5

My prior essay made arguments about PCR based on its historical “shape.” I considered all the steps involved and then reasoned through ways to make each one faster or cheaper. This is a natural way of thinking, but it tends to yield only incremental improvements. It doesn’t easily reveal entirely new ways of thinking, or out-of-the-box solutions like “what if we didn’t change the temperature at all?”

It’s easy, in biology, to become enamored with the frontier. PCR is forty years old and taught to high schoolers, so why bother writing about it? Surely there’s no room left to improve. But the old, boring stuff often hides many layers of complexity, and if we haven’t yet optimized a forty-year-old technique, what makes us think we’re anywhere close to understanding the frontier? Simple ideas are rarely simple.

Some readers told me that my figures for polymerase speeds were too low. But the published data on polymerase speeds is really bad! Some companies advertise rates as high as 2–5 kb per second, which truly boggles my mind. I’ve never seen actual data to support such claims, and think biology ought to produce real benchmarks for basic claims like this.

It’s mentioned at least three times in the patent. One quote says, “…the reaction mixture may contain, in addition to the nucleic acid strand(s) containing the desired sequence, the strand-separating enzyme (e.g., helicase), an appropriate energy source for the strand-separating enzyme, such as rATP, the four nucleotides, the oligonucleotide primers in molar excess, and the inducing agent...” This is an identical match to what Gavrilov made nearly 40 years later.

There are other isothermal ways to amplify DNA. LAMP, invented in 2000, skips strand separation entirely. It uses a polymerase that moves through paired bases as it copies, but requires four to six primers designed so that the new DNA folds back on itself to create hairpin loops. Each loop creates a fresh site for the next primer, and the reaction cascades. The downside is that it’s tedious to design these primers, and the end result is a tangle of stem-loops that you can detect (such as for diagnostics) but can’t easily clone or sequence. RPA, invented in 2006, uses a recombinase, an enzyme bacteria use for DNA repair, to slip primers directly into the double helix without pulling the strands apart. RPA runs at body temperature, but it needs unusually long primers (30 to 38 bases) that tend to fold up on themselves, and like HDA, it caps out around 100 to 200 base pairs. Neither of these methods, therefore, is a direct replacement for PCR.

It’s important for the polymerase to be strand-displacing because it moves much faster than the helicase. Remember that this engineered “superhelicase” only moves at 120–150 bases per second, whereas a good polymerase can move at 1,000+ bases in the same amount of time. Some polymerases are not strand-displacing, though, and so their speeds would be bottlenecked by the helicase.
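A rough sketch of that bottleneck, in Python. It uses only the rates quoted in this note (150 bases per second for the helicase, 1,000 for the polymerase) and makes a deliberately crude assumption: a strand-displacing polymerase runs at its own speed, while a non-displacing one can never outrun the helicase. Primer binding and enzyme loading are ignored, and the names are invented for illustration.

```python
# Rates quoted in the text; treated here as rough, illustrative figures.
HELICASE_RATE = 150       # bases per second (upper end of the 120-150 range)
POLYMERASE_RATE = 1000    # bases per second ("1,000+" for a good polymerase)

def seconds_per_copy(length_bases, strand_displacing):
    """Idealized time to copy one strand of the given length."""
    if strand_displacing:
        # The polymerase opens the duplex itself, so it sets the pace.
        rate = POLYMERASE_RATE
    else:
        # The polymerase must wait for the helicase to open the DNA ahead of it.
        rate = min(POLYMERASE_RATE, HELICASE_RATE)
    return length_bases / rate

if __name__ == "__main__":
    for length in (200, 3000, 6000):
        print(length, seconds_per_copy(length, True), seconds_per_copy(length, False))
```

For a 6,000-base fragment, that works out to about 6 seconds per copy with strand displacement versus 40 seconds without it.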

The real bottleneck is probably economics. Even a mundane diagnostic, like for lead poisoning, is way too expensive to use and there is little incentive for companies to make better options. The returns on diagnostics are not nearly as good as those for a new cancer drug, for example, and so many companies don’t work on these problems.