AI2 open sources text-generating AI models -- and the data used to train them

Kyle Wiggers

Updated 1 February 2024 at 12:23 pm

The Allen Institute for AI (AI2), the nonprofit AI research institute founded by the late Microsoft co-founder Paul Allen, is releasing several GenAI language models it claims are more "open" than others -- and, importantly, licensed in such a way that developers can use them unfettered for training, experimentation and even commercialization.

Called OLMo, an acronym for "Open Language Models," the models and the dataset used to train them, Dolma -- one of the largest public datasets of its kind -- were designed to study the high-level science behind text-generating AI, according to AI2 senior software engineer Dirk Groeneveld.

"'Open' is an overloaded term when it comes to [text-generating models]," Groeneveld told TechCrunch in an email interview. "We expect researchers and practitioners will seize the OLMo framework as an opportunity to analyze a model trained on one of the largest public data sets released to date, along with all the components necessary for building the models."

Open source text-generating models are becoming a dime a dozen, with organizations from Meta to Mistral releasing highly capable models for any developer to use and fine-tune. But Groeneveld makes the case that many of these models can't really be considered open because they were trained "behind closed doors" and on proprietary, opaque sets of data.

By contrast, the OLMo models, which were created with the help of partners including Harvard, AMD and Databricks, ship with the code that was used to produce their training data as well as training and evaluation metrics and logs.

In terms of performance, the most capable OLMo model, OLMo 7B, is a "compelling and strong" alternative to Meta's Llama 2, Groeneveld asserts -- depending on the application. On certain benchmarks, particularly those touching on reading comprehension, OLMo 7B edges out Llama 2. But in others, particularly question-answering tests, OLMo 7B is slightly behind.

The OLMo models have other limitations, like low-quality outputs in languages that aren't English (Dolma contains mostly English-language content) and weak code-generating capabilities. But Groeneveld stressed that it's early days.

"OLMo is not designed to be multilingual -- yet," he said. "[And while] at this stage, the primary focus of the OLMo framework [wasn't] code generation, to give a head start to future code-based fine-tuning projects, OLMo's data mix currently contains about 15% code."

I asked Groeneveld whether he was concerned that the OLMo models, which can be used commercially and are efficient enough to run on consumer GPUs like the Nvidia 3090, might be leveraged in unintended, possibly malicious ways by bad actors. A recent study by Democracy Reporting International's Disinfo Radar project, which aims to identify and address disinformation trends and technologies, found that two popular open text-generating models, Hugging Face's Zephyr and Databricks' Dolly, reliably generate toxic content -- responding to malevolent prompts with "imaginative" harmful content.

Groeneveld believes that the benefits outweigh the harms in the end.

"[B]uilding this open platform will actually facilitate more research on how these models can be dangerous and what we can do to fix them," he said. "Yes, it's possible open models may be used inappropriately or for unintended purposes. [However, this] approach also promotes technical advancements that lead to more ethical models; is a prerequisite for verification and reproducibility, as these can only be achieved with access to the full stack; and reduces a growing concentration of power, creating more equitable access."

In the coming months, AI2 plans to release larger and more capable OLMo models, including multimodal models (i.e. models that understand modalities beyond text), and additional datasets for training and fine-tuning. As with the initial OLMo and Dolma release, all resources will be made available for free on GitHub and the AI project hosting platform Hugging Face.
