Viruses are doing mysterious things everywhere – AI can help researchers understand what they’re up to in the oceans and in your gut

Libusha Kelly, Albert Einstein College of Medicine

15 May 2024 at 8:16 am

Many viral genetic sequences code for proteins that researchers haven't seen before. KTSDesign/Science Photo Library via Getty Images

Viruses are a mysterious and poorly understood force in microbial ecosystems. Researchers know they can infect, kill and manipulate human and bacterial cells in nearly every environment, from the oceans to your gut. But scientists don’t yet have a full picture of how viruses affect their surrounding environments, in large part because of their extraordinary diversity and ability to rapidly evolve.

Communities of microbes are difficult to study in a laboratory setting. Many microbes are challenging to cultivate, and their natural environment has many more features influencing their success or failure than scientists can replicate in a lab.

So systems biologists like me often sequence all the DNA present in a sample – for example, a fecal sample from a patient – separate out the viral DNA sequences, then annotate the sections of the viral genome that code for proteins. These notes on the location, structure and other features of genes help researchers understand the functions viruses might carry out in the environment and help identify different kinds of viruses. Researchers annotate viruses by matching viral sequences in a sample to previously annotated sequences available in public databases of viral genetic sequences.
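
As a rough illustration of annotation by database matching, the sketch below transfers a label from the most similar known sequence and gives up when nothing is similar enough. The sequences, the labels, the k-mer Jaccard similarity measure and the 0.3 cutoff are all invented for illustration; they are assumptions, not the tools or thresholds used in the research described here.

```python
def kmers(sequence, k=3):
    """Return the set of overlapping k-letter fragments in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(a, b):
    """Share of fragments two sets have in common (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def annotate_by_match(query, database, k=3, min_similarity=0.3):
    """Copy the label of the closest database entry, or None if none is close.

    min_similarity is an arbitrary illustrative cutoff, not a published value.
    """
    best_label, best_score = None, 0.0
    query_kmers = kmers(query, k)
    for known_sequence, label in database.items():
        score = jaccard(query_kmers, kmers(known_sequence, k))
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_similarity:
        return None, best_score
    return best_label, best_score


# Invented toy "database": these are not real viral proteins.
database = {
    "MAGSWKLTPEQDNVRHIYFC": "toy capsid protein",
    "MKRRLHTKDEQNWSPGAVIF": "toy integrase",
}

close_relative = "MAGSWKLTPEQDNVRHIYFA"   # one letter away from the first entry
distant_sequence = "MQQQPLLLWWWNNNCCC"    # shares no fragments with either entry

print(annotate_by_match(close_relative, database))
print(annotate_by_match(distant_sequence, database))
```

The close relative inherits its neighbor's label, while the distant sequence comes back unannotated. That second outcome is the gap the rest of this article is about.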

However, scientists are identifying viral sequences in DNA collected from the environment at a rate that far outpaces our ability to annotate those genes. This means researchers are publishing findings about viruses in microbial ecosystems using unacceptably small fractions of available data.

To improve researchers’ ability to study viruses around the globe, my team and I have developed a novel approach to annotate viral sequences using artificial intelligence. Through protein language models akin to large language models like ChatGPT but specific to proteins, we were able to classify previously unseen viral sequences. This opens the door for researchers to not only learn more about viruses, but also to address biological questions that are difficult to answer with current techniques.

Annotating viruses with AI

Large language models use relationships between words in large datasets of text to provide potential answers to questions they are not explicitly “taught” the answer to. When you ask a chatbot “What is the capital of France?” for example, the model is not looking up the answer in a table of capital cities. Rather, it is using its training on huge datasets of documents and information to infer the answer: “The capital of France is Paris.”

Similarly, protein language models are AI algorithms that are trained to recognize relationships between billions of protein sequences from environments around the world. Through this training, they may be able to infer something about the essence of viral proteins and their functions.

We wondered whether protein language models could answer this question: “Given all annotated viral genetic sequences, what is this new sequence’s function?”

In our proof of concept, we trained neural networks on previously annotated viral protein sequences in pre-trained protein language models and then used them to predict the annotation of new viral protein sequences. Our approach allows us to probe what the model is “seeing” in a particular viral sequence that leads to a particular annotation. This helps identify candidate proteins of interest either based on their specific functions or how their genome is arranged, winnowing down the search space of vast datasets.
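
The general idea can be sketched in a few lines: turn each sequence into a vector of numbers, learn what each annotation looks like in that vector space, then assign a new sequence the closest annotation. This is a toy under loud assumptions, not our actual pipeline. The `toy_embed` function uses simple amino-acid frequencies as a stand-in for a pre-trained protein language model, a nearest-centroid rule stands in for the neural networks, and the sequences and labels are invented.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_embed(sequence):
    """Stand-in for a protein language model: amino-acid frequency vector."""
    counts = Counter(sequence)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def train_centroids(labeled_sequences):
    """Average the vectors of all training sequences sharing a label."""
    sums, sizes = {}, Counter()
    for sequence, label in labeled_sequences:
        vector = toy_embed(sequence)
        previous = sums.get(label, [0.0] * len(vector))
        sums[label] = [p + v for p, v in zip(previous, vector)]
        sizes[label] += 1
    return {
        label: [value / sizes[label] for value in total]
        for label, total in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict(centroids, sequence):
    """Return the best-matching label and its similarity score."""
    vector = toy_embed(sequence)
    scores = {label: cosine(vector, c) for label, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Invented training data: labels and sequences are hypothetical.
training = [
    ("GGSGGAGGSGAGGS", "capsid"),
    ("GSGAGGSGGAGSGG", "capsid"),
    ("KRKRHKRLKRHRKK", "integrase"),
    ("RKKRHRKLRKKHRK", "integrase"),
]
centroids = train_centroids(training)

print(predict(centroids, "GAGGSGGSGAGG"))
print(predict(centroids, "KRHKKRLRKR"))
```

The similarity score returned alongside each label is the toy analogue of asking what the model is "seeing": it shows how strongly a new sequence resembles each known category rather than returning a bare answer.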

Microscopy image of spherical bacteria colored bright green. Prochlorococcus is one of the many species of marine bacteria with proteins that researchers haven’t seen before. Anne Thompson/Chisholm Lab, MIT via Flickr

By identifying more distantly related viral gene functions, protein language models can complement current methods to provide new insights into microbiology. For example, my team and I were able to use our model to discover a previously unrecognized integrase – a type of protein that can move genetic information in and out of cells – in the globally abundant marine picocyanobacteria Prochlorococcus and Synechococcus. Notably, this integrase may be able to move genes in and out of these populations of bacteria in the oceans and enable these microbes to better adapt to changing environments.

Our language model also identified a novel viral capsid protein that is widespread in the global oceans. We produced the first picture of how its genes are arranged, showing that it can contain different sets of genes, which we believe indicates this virus serves different functions in its environment.

These preliminary findings represent only two of thousands of annotations our approach has provided.

Analyzing the unknown

Most of the hundreds of thousands of newly discovered viruses remain unclassified. Many viral genetic sequences match protein families with no known function or have never been seen before. Our work shows that similar protein language models could help study the threat and promise of our planet’s many uncharacterized viruses.

While our study focused on viruses in the global oceans, improved annotation of viral proteins is critical for better understanding the role viruses play in health and disease in the human body. We and other researchers have hypothesized that viral activity in the human gut microbiome might be altered when you’re sick. This means that viruses may help identify stress in microbial communities.

However, our approach is also limited because it requires high-quality annotations. Researchers are developing newer protein language models that incorporate other “tasks” as part of their training, particularly predicting protein structures to detect similar proteins, to make them more powerful.

Making all AI tools available via FAIR Data Principles – data that is findable, accessible, interoperable and reusable – can help researchers at large realize the potential of these new ways of annotating protein sequences, leading to discoveries that benefit human health.

This article is republished from The Conversation, a nonprofit, independent news organization bringing you facts and trustworthy analysis to help you make sense of our complex world. It was written by: Libusha Kelly, Albert Einstein College of Medicine

Libusha Kelly receives funding from the National Institutes of Health.
