{"id":114767,"date":"2025-03-01T05:45:00","date_gmt":"2025-03-01T10:45:00","guid":{"rendered":"https:\/\/www.freethink.com\/?post_type=ftm_article&#038;p=114767"},"modified":"2025-02-28T12:17:30","modified_gmt":"2025-02-28T17:17:30","slug":"evo-2-generative-biology","status":"publish","type":"ftm_article","link":"https:\/\/www.freethink.com\/biotech\/evo-2-generative-biology","title":{"rendered":"Arc Institute\u2019s new AI can read and write the code of life"},"content":{"rendered":"\n<p><em>This article is an installment of&nbsp;<a href=\"https:\/\/www.freethink.com\/collections\/future-explored\">Future Explored<\/a>, a weekly guide to world-changing technology. You can get stories like this one straight to your inbox every Saturday morning by&nbsp;subscribing above.<\/em><\/p>\n\n\n\n<p>It\u2019s 2040. You\u2019re at your doctor\u2019s office, going over the results of your genome analysis. An advanced AI has identified patterns in your DNA code that suggest you\u2019re at high risk of developing a certain disease in the future. 
Thankfully, the same AI can be used to design a treatment.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-generative-biology\"><strong>Generative biology<\/strong><\/h2>\n\n\n\n<p>Biology\u2014the study of living things\u2014has been going on since prehistoric times when our ancestors first determined through trial and error which plants were food and which were poison.&nbsp;<\/p>\n\n\n\n<p>Over the next tens of millennia, scientists would develop increasingly advanced new tools to help them in their quest to understand the living world, eventually leading to the breakthrough discovery that everything we could want to know about an organism is written in its DNA.&nbsp;<\/p>\n\n\n\n<p>Now, an artificial intelligence (AI) called <a href=\"https:\/\/arcinstitute.org\/news\/blog\/evo2\">Evo 2<\/a> is entering the biology lab, and the introduction of <em>this<\/em> tool could signal the start of a new era in biology, one in which scientists aren\u2019t just trying to decipher the code of life, but rewriting it from the ground up.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-where-we-ve-been\">Where we\u2019ve been<\/h3>\n\n\n\n<figure class=\"wp-block-image alignwide size-large\"><img decoding=\"async\" width=\"1800\" height=\"3330\" src=\"https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Human-genome-timeline.jpg?quality=75&amp;w=1800\" alt=\"1869 - Biochemist Friedrich Miescher discovers and isolates a previously unknown substance in cells. He called it \u201cnuclein,\u201d but the name of the molecule was later changed to deoxyribonucleic acid (DNA).\n\n1889 - Botanist Hugo de Vries hypothesizes that an organism\u2019s specific traits are packaged into particles that he called \u201cpangenes\u201d and that these particles are inherited from one\u2019s parents. 
Scientist Wilhelm Johannsen later shortens the term to \u201cgenes.\u201d\u00a0\n\n1944 - Scientist Oswald Avery determines that genes are made of DNA, connecting DNA to heredity for the first time.\n\n1953 - Biologist James Watson and physicist Francis Crick correctly propose that DNA is structured in a double helix with pairs of chemical bases\u2014adenine (A) with thymine (T) and cytosine (C) with guanine (G)\u2014connecting the two sides.\u00a0\n\n1960s - Biophysicist Margaret Oakley Dayhoff pioneers the use of computers for biology research. This makes it easier for scientists to manage and analyze biological information.\n\n1972 - Belgian molecular biologist Walter Fiers becomes the first person to sequence the DNA of a complete gene, writing out, in order, all of its bases. Advanced techniques will later allow scientists to sequence whole genomes\u2014the complete collection of genes in an organism\u2014including that of humans.\n\n1972 - Biochemist Paul Berg successfully combines the DNA of two different viruses, laying the foundation for modern genetic engineering. Before the end of the decade, scientists will engineer human insulin\u2014a life-saving treatment for diabetes\u2014and genetically modified mice\u2014a huge boon for biomedical research.\u00a0\n\n2008 - Scientist John Craig Venter and his colleagues give birth to the field of synthetic biology by creating a copy of a bacterial genome from scratch in the lab. Two years later, they successfully integrate the genome into a cell, creating the first synthetic life form.\n\n2012 - Biochemists Jennifer Doudna and Emmanuelle Charpentier propose that CRISPR-Cas9, a natural gene-editing system found in bacteria, could be programmed to edit DNA far more easily than was previously possible.\u00a0\n\n2025 - The Arc Institute open sources Evo 2, an AI trained on the DNA of more than 100,000 species. 
The model is able to spot patterns in gene sequences across species, identify disease-causing mutations in human genes, and even write brand new genomes up to 1 million base pairs long\u2014a huge milestone in the burgeoning field of generative biology.\" class=\"wp-image-114774\" srcset=\"https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Human-genome-timeline.jpg 1800w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Human-genome-timeline.jpg?resize=768,1421 768w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Human-genome-timeline.jpg?resize=830,1536 830w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Human-genome-timeline.jpg?resize=1107,2048 1107w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Human-genome-timeline.jpg?resize=320,592 320w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Human-genome-timeline.jpg?resize=600,1110 600w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Human-genome-timeline.jpg?resize=1000,1850 1000w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Human-genome-timeline.jpg?resize=1400,2590 1400w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Human-genome-timeline.jpg?resize=330,611 330w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Human-genome-timeline.jpg?resize=540,999 540w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Human-genome-timeline.jpg?resize=850,1573 850w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Human-genome-timeline.jpg?resize=175,324 175w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Human-genome-timeline.jpg?resize=275,509 275w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Human-genome-timeline.jpg?resize=400,740 400w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Human-genome-timeline.jpg?resize=360,666 360w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Human-genome-timeline.jpg?resize=500,925 
500w\" sizes=\"(max-width: 1800px) 100vw, 1800px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-where-we-re-going-maybe\">Where we\u2019re going (maybe)<\/h3>\n\n\n\n<p>The genome is like an organism\u2019s instruction manual, dictating the appearance and function of every cell in its body. While all humans have basically the same genome, <a href=\"https:\/\/www.genome.gov\/about-genomics\/educational-resources\/fact-sheets\/human-genomic-variation\">about 0.1%<\/a> of yours will differ from the reference human genome\u2014there will be spots where you have a G instead of the standard A, for example.<\/p>\n\n\n\n<p>We call these differences \u201cgenetic variants,\u201d and they play a key role in making you <em>you<\/em>, helping determine everything from your eye color to your blood type. They\u2019ve also been linked to an estimated 7,000 diseases\u2014the blood disorder sickle cell anemia, for example, is caused by variants in just one gene.&nbsp;<\/p>\n\n\n\n<p>While we\u2019ve determined that some genetic variants are benign and some put us at higher risk of certain diseases, others are \u201cvariants of unknown significance\u201d (VUS), which Patrick Hsu, head of the <a href=\"https:\/\/www.freethink.com\/science\/arc-institute-science-century-of-biology\">Arc Institute<\/a>, a nonprofit biomedical research organization, tells Freethink \u201cis a kind of fancy word for we don&#8217;t know what the hell is going on.\u201d<\/p>\n\n\n\n<p>Figuring out what, if anything, these variants do could have a huge impact on healthcare because if they <em>are<\/em> implicated in a disease, that gives us a target to treat. 
We might be able to deliver <a href=\"https:\/\/www.freethink.com\/health\/butterfly-disease\">healthy copies<\/a> of the affected gene into cells or use gene-editing tools <a href=\"https:\/\/www.freethink.com\/health\/crispr-therapy-casgevy\">like CRISPR<\/a> to correct the mutation.&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1600\" height=\"900\" src=\"https:\/\/www.freethink.com\/wp-content\/uploads\/2022\/06\/CRISPR-therapy-working-3-years-out_Web.jpg?quality=75&amp;w=1600\" alt=\"a rendering of the blood of someone with sickle cell disease\" class=\"wp-image-40063\" srcset=\"https:\/\/www.freethink.com\/wp-content\/uploads\/2022\/06\/CRISPR-therapy-working-3-years-out_Web.jpg 1600w, https:\/\/www.freethink.com\/wp-content\/uploads\/2022\/06\/CRISPR-therapy-working-3-years-out_Web.jpg?resize=768,432 768w, https:\/\/www.freethink.com\/wp-content\/uploads\/2022\/06\/CRISPR-therapy-working-3-years-out_Web.jpg?resize=1536,864 1536w, https:\/\/www.freethink.com\/wp-content\/uploads\/2022\/06\/CRISPR-therapy-working-3-years-out_Web.jpg?resize=320,180 320w, https:\/\/www.freethink.com\/wp-content\/uploads\/2022\/06\/CRISPR-therapy-working-3-years-out_Web.jpg?resize=600,338 600w, https:\/\/www.freethink.com\/wp-content\/uploads\/2022\/06\/CRISPR-therapy-working-3-years-out_Web.jpg?resize=1000,563 1000w, https:\/\/www.freethink.com\/wp-content\/uploads\/2022\/06\/CRISPR-therapy-working-3-years-out_Web.jpg?resize=1400,788 1400w, https:\/\/www.freethink.com\/wp-content\/uploads\/2022\/06\/CRISPR-therapy-working-3-years-out_Web.jpg?resize=213,120 213w, https:\/\/www.freethink.com\/wp-content\/uploads\/2022\/06\/CRISPR-therapy-working-3-years-out_Web.jpg?resize=355,200 355w, https:\/\/www.freethink.com\/wp-content\/uploads\/2022\/06\/CRISPR-therapy-working-3-years-out_Web.jpg?resize=533,300 533w, https:\/\/www.freethink.com\/wp-content\/uploads\/2022\/06\/CRISPR-therapy-working-3-years-out_Web.jpg?resize=711,400 711w, 
https:\/\/www.freethink.com\/wp-content\/uploads\/2022\/06\/CRISPR-therapy-working-3-years-out_Web.jpg?resize=1067,600 1067w, https:\/\/www.freethink.com\/wp-content\/uploads\/2022\/06\/CRISPR-therapy-working-3-years-out_Web.jpg?resize=330,186 330w, https:\/\/www.freethink.com\/wp-content\/uploads\/2022\/06\/CRISPR-therapy-working-3-years-out_Web.jpg?resize=540,304 540w, https:\/\/www.freethink.com\/wp-content\/uploads\/2022\/06\/CRISPR-therapy-working-3-years-out_Web.jpg?resize=850,478 850w, https:\/\/www.freethink.com\/wp-content\/uploads\/2022\/06\/CRISPR-therapy-working-3-years-out_Web.jpg?resize=175,98 175w, https:\/\/www.freethink.com\/wp-content\/uploads\/2022\/06\/CRISPR-therapy-working-3-years-out_Web.jpg?resize=275,155 275w, https:\/\/www.freethink.com\/wp-content\/uploads\/2022\/06\/CRISPR-therapy-working-3-years-out_Web.jpg?resize=400,225 400w, https:\/\/www.freethink.com\/wp-content\/uploads\/2022\/06\/CRISPR-therapy-working-3-years-out_Web.jpg?resize=360,203 360w, https:\/\/www.freethink.com\/wp-content\/uploads\/2022\/06\/CRISPR-therapy-working-3-years-out_Web.jpg?resize=500,281 500w\" sizes=\"(max-width: 1600px) 100vw, 1600px\" \/><div class=\"img-caption\"><figcaption class=\"wp-element-caption\">SciePro \/ Adobe Stock<\/figcaption><div class=\"img-caption__description\">In 2023, Casgevy\u2014a treatment for the blood disorder sickle cell anemia\u2014became the first CRISPR-based therapy approved by the FDA.\n<\/div><\/div><\/figure>\n\n\n\n<p>Solving the mystery can be hugely challenging, though.<\/p>\n\n\n\n<p>For one, only about 2% of the human genome contains DNA sequences that are \u201ccoding,\u201d meaning they teach cells how to make proteins (the molecules that actually do the work in cells). 
The other 98% consists of \u201cnoncoding\u201d DNA sequences, many of which have no known biological function.&nbsp;<\/p>\n\n\n\n<p>Researchers are starting to piece together the impact of some of this \u201c<a href=\"https:\/\/www.freethink.com\/health\/dry-macular-degeneration\">junk DNA<\/a>,\u201d but the bottom line is that the majority of VUS are in parts of the genome that might do something, though we don\u2019t know what, making it hard to even begin to guess how they might affect our health.<\/p>\n\n\n\n<p>Another issue is that genetic variants often don\u2019t act alone. In 2022, for example, a study of 5.4 million human genomes identified <a href=\"https:\/\/hms.harvard.edu\/news\/scientists-uncover-nearly-all-genetic-variants-linked-height\"><em>12,000 variants<\/em><\/a> that influence height.&nbsp;<\/p>\n\n\n\n<p>Heart disease, diabetes, and many other health problems are considered \u201cpolygenic\u201d\u2014caused by the combined effects of multiple genes\u2014so a researcher hoping to identify the variant(s) responsible for a disease might need to be able to spot a pattern involving thousands of them in the genomes of multiple people with that disease.<\/p>\n\n\n\n<p>That\u2019s a lot to ask of a human, but it\u2019s the sort of task an AI could excel at.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-the-technology\">The technology<\/h4>\n\n\n\n<p>OpenAI\u2019s 2022 release of ChatGPT may have propelled <a href=\"https:\/\/www.freethink.com\/robots-ai\/generative-ai-2022\">generative AI<\/a> into the mainstream, but the field really got its big break in 2017, when researchers at Google introduced the \u201c<a href=\"https:\/\/research.google\/blog\/transformer-a-novel-neural-network-architecture-for-language-understanding\/\">transformer<\/a>,\u201d a new kind of neural network architecture for language processing.<\/p>\n\n\n\n<p>Instead of analyzing a text one word after another, transformers break the whole text into small \u201ctokens\u201d (individual 
words or even punctuation marks), look at them all at once, and then determine which are the most important based on their relationships to one another.<\/p>\n\n\n\n<p>Armed with this information, a transformer-based AI can generate a response to a prompt by predicting what word is <em>most likely<\/em> to come first in an appropriate answer. It then predicts the next word and the next in the same way until it generates a complete response.&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-pullquote\"><blockquote><p>&#8220;We thought, &#8216;What would happen if we did that for DNA?'&#8221;<\/p><cite>Patrick Hsu<\/cite><\/blockquote><\/figure>\n\n\n\n<p>Google introduced transformers as a tool for language translation, but researchers soon realized the architecture could be used to create AIs capable of generating human-like text, <a href=\"https:\/\/www.freethink.com\/hard-tech\/text-to-image-ai\">images<\/a>, <a href=\"https:\/\/www.freethink.com\/robots-ai\/ai-music-generator-udio\">music<\/a>, <a href=\"https:\/\/www.freethink.com\/robots-ai\/googles-lumiere-has-created-a-new-paradigm-for-text-to-video-ai\">videos<\/a>, and more in response to prompts. The kind of token changes\u2014from words to pixels or music notes, for example\u2014but the basic operation remains the same.&nbsp;<\/p>\n\n\n\n<p>&#8220;People have been using these transformer-type architectures and these models that are trained on next-token prediction to decode many other domains, whether that&#8217;s language or vision or robotics,&#8221; Hsu tells Freethink. 
&#8220;We thought, &#8216;What would happen if we did that for DNA?'&#8221;<\/p>\n\n\n\n<p>\u201cThe effects of natural selection are transmitted throughout generations of life via DNA mutations,\u201d he adds, \u201cso, in principle, by reading across massive data sets of DNA mutations, you might be able to connect these mutations to function.\u201d<\/p>\n\n\n\n<p>To test this theory, the Arc Institute teamed up with researchers at Stanford University and the University of California, Berkeley, to create an AI model that could interpret and generate DNA sequences the same way others do text or images.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\">\n<p>&#8220;We think of this as enabling an app store for biology.&#8221;<\/p>\n<cite><em>Patrick Hsu<\/em><\/cite><\/blockquote>\n\n\n\n<p>From existing research, they knew a standard transformer architecture wasn\u2019t going to work\u2014the computational cost of analyzing long sequences of DNA was too high, and the architecture underperformed at the single-token resolution needed to make sense of genetic variants.<\/p>\n\n\n\n<p>\u201cWe had to develop a new frontier deep learning architecture beyond the vanilla transformer that is basically standard in the field,\u201d says Hsu.&nbsp;<\/p>\n\n\n\n<p>They named their new architecture \u201c<a href=\"https:\/\/www.together.ai\/blog\/stripedhyena-7b\">Striped Hyena<\/a>\u201d (a nod to the \u201c<a href=\"https:\/\/ermongroup.github.io\/blog\/hyena\/\">hyena layers<\/a>\u201d incorporated alongside the transformer layers) and used it as the basis for <a href=\"https:\/\/arcinstitute.org\/news\/blog\/evo\">Evo<\/a>, an AI model trained on the genome sequences of more than 2.7 million single-cell organisms and microbes.&nbsp;<\/p>\n\n\n\n<p>And it worked. After training, Evo was able to make accurate predictions about the relationship between an organism\u2019s genome and its function. 
It could predict which genes were essential in a bacterium, for example, and how a genetic variant would impact a gene\u2019s protein performance.<\/p>\n\n\n\n<p>It could also <em>generate<\/em> DNA sequences more than 1 million base pairs long. As a proof of concept, the researchers prompted Evo to write the code for a new <a href=\"https:\/\/www.freethink.com\/science\/jennifer-doudna\">CRISPR-Cas system<\/a>, and after synthesizing the system in the lab, the team found it to be fully functional.&nbsp;<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-the-next-evo-lution\">The next Evo-lution<\/h4>\n\n\n\n<p>The Arc team unveiled Evo in February 2024, making both the model and a large training dataset available to the public for free, and one short year later, it\u2019s back with the next iteration of the technology: <a href=\"https:\/\/arcinstitute.org\/manuscripts\/Evo2\">Evo 2<\/a>.<\/p>\n\n\n\n<p>This model\u2014created in collaboration with researchers at Stanford University, University of California, Berkeley, University of California, San Francisco, and Nvidia\u2014is trained on a massive dataset of more than 9.3 trillion DNA letters from the genomes of nearly 130,000 species across the tree of life, including humans.<\/p>\n\n\n\n<p>Thanks to an updated architecture, Striped Hyena 2, Evo 2 is able to analyze up to 1 million DNA bases at a time\u2014a significant increase over Evo 1\u2019s 131,000 limit\u2014and generate sequences as long as the genomes of some bacteria.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"2100\" height=\"1696\" src=\"https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Evo-2-training-set.jpg?quality=75&amp;w=2100\" alt=\"A scatter plot depicting clusters of genomes: Bacteria (107.5k, green), Eukarya (15k, blue), and Archaea (5.9k, red).\" class=\"wp-image-114769\" srcset=\"https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Evo-2-training-set.jpg 2100w, 
https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Evo-2-training-set.jpg?resize=768,620 768w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Evo-2-training-set.jpg?resize=1536,1241 1536w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Evo-2-training-set.jpg?resize=2048,1654 2048w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Evo-2-training-set.jpg?resize=320,258 320w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Evo-2-training-set.jpg?resize=600,485 600w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Evo-2-training-set.jpg?resize=1000,808 1000w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Evo-2-training-set.jpg?resize=1400,1131 1400w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Evo-2-training-set.jpg?resize=330,267 330w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Evo-2-training-set.jpg?resize=540,436 540w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Evo-2-training-set.jpg?resize=850,686 850w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Evo-2-training-set.jpg?resize=1800,1454 1800w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Evo-2-training-set.jpg?resize=175,141 175w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Evo-2-training-set.jpg?resize=275,222 275w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Evo-2-training-set.jpg?resize=400,323 400w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Evo-2-training-set.jpg?resize=360,291 360w, https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Evo-2-training-set.jpg?resize=500,404 500w\" sizes=\"(max-width: 2100px) 100vw, 2100px\" \/><div class=\"img-caption\"><figcaption class=\"wp-element-caption\">Arc Institute<\/figcaption><div class=\"img-caption__description\">Evo 2 is trained on data from more than 128,000 genomes across the three domains of life. 
\n<\/div><\/div><\/figure>\n\n\n\n<p>To demonstrate the potential of Evo 2\u2019s prediction power, the Arc team focused on the BRCA1 gene. A small number of variants in this gene are known to dramatically increase a person\u2019s risk of <a href=\"https:\/\/www.freethink.com\/health\/mammograms\">breast cancer<\/a>, but genetic testing often turns up many VUS, meaning there\u2019s potentially still a lot we could learn about the gene\u2019s role in the disease.<\/p>\n\n\n\n<p>\u201cThe question for folks who have these VUS mutations is, \u2018Do I do anything other than getting an annual mammogram?\u2019\u201d says Hsu.&nbsp;<\/p>\n\n\n\n<p>When they tasked Evo 2 with predicting whether a variant in BRCA1 was benign or potentially pathogenic (meaning it could cause disease), 90% of its answers matched those in a dataset of predictions based on the <a href=\"https:\/\/www.nature.com\/articles\/s41586-018-0461-z\">results of lab experiments<\/a>. Evo 2 also proved to be better than any other AI model at classifying variants in those tricky noncoding segments of the gene\u2019s DNA.<\/p>\n\n\n\n<p>\u201cEvo 2 is the only model that is able to score or predict the effects of both coding and noncoding mutations,\u201d Hsu explained during a press briefing on February 19. \u201cIt&#8217;s the second-best model for coding mutations, but it&#8217;s state-of-the-art for noncoding mutations, which this model, AlphaMissense from DeepMind, cannot score.\u201d<\/p>\n\n\n\n<p>Evo 2 achieved this without being trained on anything specifically related to BRCA1, too. 
If someone were to take the model and finetune it on data related to that particular gene, they could potentially improve its performance.<\/p>\n\n\n\n<p>\u201cWe think of [Evo 2] as the foundational layer of biological information, and people can build different applications,\u201d says Hsu, adding, \u201cWe think of this as enabling an app store for biology.\u201d<\/p>\n\n\n\n<figure class=\"wp-block-pullquote\"><blockquote><p>&#8220;On the design side, this is starting to touch things that feel much more science fiction.&#8221;<\/p><cite><em>Patrick Hsu<\/em><\/cite><\/blockquote><\/figure>\n\n\n\n<p>To demonstrate Evo 2\u2019s ability to <em>generate<\/em> DNA sequences, meanwhile, the Arc team tasked it with writing three kinds of increasingly complex genomes: a mitochondrial genome, a bacterial genome, and a yeast chromosome.<\/p>\n\n\n\n<p>The AI was able to generate sequences that encoded all of the genes you\u2019d expect to see in a real mitochondrial genome, which is about 16,000 base pairs long. Its outputs for the others weren\u2019t as realistic, but they contained many of the genes you\u2019d expect to see in nature.<\/p>\n\n\n\n<p>\u201cOn the design side, this is starting to touch things that feel much more science fiction,\u201d Hsu tells Freethink.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"h-looking-ahead\">Looking ahead<\/h4>\n\n\n\n<p>Just like it did with Evo 1, the Arc team has open-sourced Evo 2, making its code <a href=\"https:\/\/github.com\/arcinstitute\/evo2\">available on GitHub<\/a>, as well as integrating it into Nvidia\u2019s <a href=\"https:\/\/github.com\/NVIDIA\/bionemo-framework\">BioNeMo framework<\/a>. 
Researchers can also opt to interact with it using the user-friendly <a href=\"https:\/\/arcinstitute.org\/tools\/evo\/evo-designer\">Evo Designer<\/a> interface.<\/p>\n\n\n\n<p>Sudarshan Pinglay, head of the <a href=\"https:\/\/www.pinglay-lab.com\/\">Pinglay Lab<\/a> at the Seattle Hub for Synthetic Biology, is one of the researchers taking advantage of Evo 2. His team is already making some of its designs in the lab just to see what they look like, and he envisions a future in which he can use an Evo model to generate genomes unlike any that exist in nature.&nbsp;<\/p>\n\n\n\n<p>\u201cI think models like Evo will really help us design truly synthetic genomes that basically look nothing like life that was evolved,\u201d he tells Freethink, adding, \u201cI don&#8217;t think Evo is the finish line. I think it&#8217;s a starting point for models for whole genome design that basically break the shackles of evolution.\u201d<\/p>\n\n\n\n<p>The fact that genomes generated by Evo 2 were a significant improvement over Evo 1\u2019s DNA sequences suggests that this is where the technology could be heading.<\/p>\n\n\n\n<p>\u201cIt&#8217;s definitely following the scaling laws,\u201d says Hsu, \u201cwhich is another machine learning term that underpins that more compute, more parameters, and more data are all really predictable ways to improve the performance of these machine learning models.\u201d<\/p>\n\n\n\n<p>He looks forward to the point when an Evo model could be used to look at all the variants across a person\u2019s entire genome and generate <a href=\"https:\/\/www.freethink.com\/biotech\/genome-sequencing-nucleus\">risk scores<\/a> for diseases associated with multiple genes.<\/p>\n\n\n\n<p>\u201cWe showed a million token context, but the human genome is 3.2 billion bases long, so it would be nice if we had a three billion token context model,\u201d says Hsu. 
\u201cI don&#8217;t know if that&#8217;s Evo 3, but we want that Evo.\u201d<\/p>\n\n\n\n<p><em>We\u2019d love to hear from you! If you have a comment about this article or if you have a tip for a future Freethink story, please email us at&nbsp;<a href=\"mailto:tips@freethink.com\" target=\"_blank\" rel=\"noreferrer noopener\">tips@freethink.com<\/a>.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Training on the DNA of nearly 130,000 species taught Evo 2 how to generate DNA sequences the same way other AIs do text or images.<\/p>\n","protected":false},"author":25,"featured_media":114775,"template":"","ftm_taxonomy_fields":[46,57],"ftm_taxonomy_challenges":[],"ftm_taxonomy_statuses":[36],"ftm_taxonomy_hidden_tags":[1939],"class_list":["post-114767","ftm_article","type-ftm_article","status-publish","has-post-thumbnail","hentry","ftm_taxonomy_fields-ai","ftm_taxonomy_fields-biology","ftm_taxonomy_statuses-featured"],"acf":[],"apple_news_notices":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.9 (Yoast SEO v26.9) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Arc Institute\u2019s Evo 2 can read and write the code of life<\/title>\n<meta name=\"description\" content=\"Training on the DNA of nearly 130,000 species taught Evo 2 how to generate DNA sequences the same way other AIs do text or images.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.freethink.com\/biotech\/evo-2-generative-biology\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Arc Institute\u2019s new AI can read and write the code of life\" \/>\n<meta property=\"og:description\" content=\"Training on the DNA of nearly 130,000 species taught Evo 2 how to generate DNA sequences the same way other AIs do text or 
images.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.freethink.com\/biotech\/evo-2-generative-biology\" \/>\n<meta property=\"og:site_name\" content=\"Freethink\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Generative-Biology-Art-1.gif?resize=1200,630\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"630\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/gif\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:description\" content=\"By training an AI on DNA, they&#039;ve laid the foundation for &quot;an app store for biology.&quot;\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"10 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Arc Institute\u2019s Evo 2 can read and write the code of life","description":"Training on the DNA of nearly 130,000 species taught Evo 2 how to generate DNA sequences the same way other AIs do text or images.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.freethink.com\/biotech\/evo-2-generative-biology","og_locale":"en_US","og_type":"article","og_title":"Arc Institute\u2019s new AI can read and write the code of life","og_description":"Training on the DNA of nearly 130,000 species taught Evo 2 how to generate DNA sequences the same way other AIs do text or images.","og_url":"https:\/\/www.freethink.com\/biotech\/evo-2-generative-biology","og_site_name":"Freethink","og_image":[{"width":1200,"height":630,"url":"https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Generative-Biology-Art-1.gif?resize=1200,630","type":"image\/gif"}],"twitter_card":"summary_large_image","twitter_description":"By training an AI on DNA, 
they've laid the foundation for \"an app store for biology.\"","twitter_misc":{"Est. reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.freethink.com\/biotech\/evo-2-generative-biology#article","isPartOf":{"@id":"https:\/\/www.freethink.com\/biotech\/evo-2-generative-biology"},"author":{"name":"kristinhouser","@id":"https:\/\/www.freethink.com\/#\/schema\/person\/e45bf79276f6c14454ee4e1dfa7aca8c"},"headline":"Arc Institute\u2019s new AI can read and write the code of life","datePublished":"2025-03-01T10:45:00+00:00","mainEntityOfPage":{"@id":"https:\/\/www.freethink.com\/biotech\/evo-2-generative-biology"},"wordCount":2060,"publisher":{"@id":"https:\/\/www.freethink.com\/#organization"},"image":{"@id":"https:\/\/www.freethink.com\/biotech\/evo-2-generative-biology#primaryimage"},"thumbnailUrl":"https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Generative-Biology-Art-1.gif?quality=75","inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.freethink.com\/biotech\/evo-2-generative-biology","url":"https:\/\/www.freethink.com\/biotech\/evo-2-generative-biology","name":"Arc Institute\u2019s Evo 2 can read and write the code of life","isPartOf":{"@id":"https:\/\/www.freethink.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.freethink.com\/biotech\/evo-2-generative-biology#primaryimage"},"image":{"@id":"https:\/\/www.freethink.com\/biotech\/evo-2-generative-biology#primaryimage"},"thumbnailUrl":"https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Generative-Biology-Art-1.gif?quality=75","datePublished":"2025-03-01T10:45:00+00:00","description":"Training on the DNA of nearly 130,000 species taught Evo 2 how to generate DNA sequences the same way other AIs do text or 
images.","breadcrumb":{"@id":"https:\/\/www.freethink.com\/biotech\/evo-2-generative-biology#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.freethink.com\/biotech\/evo-2-generative-biology"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.freethink.com\/biotech\/evo-2-generative-biology#primaryimage","url":"https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Generative-Biology-Art-1.gif?quality=75","contentUrl":"https:\/\/www.freethink.com\/wp-content\/uploads\/2025\/02\/Generative-Biology-Art-1.gif?quality=75","width":1200,"height":675,"caption":"Jacob Hege \/ Freethink"},{"@type":"BreadcrumbList","@id":"https:\/\/www.freethink.com\/biotech\/evo-2-generative-biology#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Articles","item":"https:\/\/www.freethink.com\/articles"},{"@type":"ListItem","position":2,"name":"Arc Institute\u2019s new AI can read and write the code of life"}]},{"@type":"WebSite","@id":"https:\/\/www.freethink.com\/#website","url":"https:\/\/www.freethink.com\/","name":"Freethink","description":"Move the world","publisher":{"@id":"https:\/\/www.freethink.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.freethink.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.freethink.com\/#organization","name":"Freethink Media","url":"https:\/\/www.freethink.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.freethink.com\/#\/schema\/logo\/image\/","url":"https:\/\/www.freethink.com\/wp-content\/uploads\/2021\/06\/logo.svg","contentUrl":"https:\/\/www.freethink.com\/wp-content\/uploads\/2021\/06\/logo.svg","width":651,"height":124,"caption":"Freethink 
Media"},"image":{"@id":"https:\/\/www.freethink.com\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/www.freethink.com\/#\/schema\/person\/e45bf79276f6c14454ee4e1dfa7aca8c","name":"kristinhouser","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.freethink.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/ff88759e0ed195de655c7703310050f17b921ae4fc276d7eb5930cddafa694f9?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/ff88759e0ed195de655c7703310050f17b921ae4fc276d7eb5930cddafa694f9?s=96&d=mm&r=g","caption":"kristinhouser"},"url":"https:\/\/www.freethink.com\/author\/kristinhouser"}]}},"jetpack_sharing_enabled":true,"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.freethink.com\/wp-json\/wp\/v2\/ftm_article\/114767","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.freethink.com\/wp-json\/wp\/v2\/ftm_article"}],"about":[{"href":"https:\/\/www.freethink.com\/wp-json\/wp\/v2\/types\/ftm_article"}],"author":[{"embeddable":true,"href":"https:\/\/www.freethink.com\/wp-json\/wp\/v2\/users\/25"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.freethink.com\/wp-json\/wp\/v2\/media\/114775"}],"wp:attachment":[{"href":"https:\/\/www.freethink.com\/wp-json\/wp\/v2\/media?parent=114767"}],"wp:term":[{"taxonomy":"ftm_taxonomy_fields","embeddable":true,"href":"https:\/\/www.freethink.com\/wp-json\/wp\/v2\/ftm_taxonomy_fields?post=114767"},{"taxonomy":"ftm_taxonomy_challenges","embeddable":true,"href":"https:\/\/www.freethink.com\/wp-json\/wp\/v2\/ftm_taxonomy_challenges?post=114767"},{"taxonomy":"ftm_taxonomy_statuses","embeddable":true,"href":"https:\/\/www.freethink.com\/wp-json\/wp\/v2\/ftm_taxonomy_statuses?post=114767"},{"taxonomy":"ftm_taxonomy_hidden_tags","embeddable":true,"href":"https:\/\/www.freethink.com\/wp-json\/wp\/v2\/ftm_taxonomy_hidden_tags?post=114767"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}