Can Dolly 2.0 Disrupt OpenAI’s Hold on Language Models? Databricks Aims to Find Out

There’s a new player in the language model market trying to challenge OpenAI’s dominance.

On Wednesday, Databricks, a San Francisco-based startup that develops software used to build AI systems, publicly released, for free, a substantial dataset that can be used to train chatbots like ChatGPT. With the open-source release of Dolly 2.0, Databricks has emerged as a fresh contender in the language model industry.

Databricks Unveils Dolly 2.0 

In a move company officials said is aimed at making artificial intelligence more accessible for all, Databricks released Dolly 2.0, a large language model (LLM) that is trained on a human-generated dataset crowdsourced from its own employees and is licensed for both research and commercial use.

“We are open-sourcing the entirety of Dolly 2.0, including the training code, the dataset, and the model weights, all suitable for commercial use,” company officials said in a statement. “This means that any organization can create, own, and customize powerful LLMs that can talk to people, without paying for API access or sharing data with third parties.”

According to Databricks, Dolly 2.0 is a 12-billion-parameter language model built on the EleutherAI Pythia model family and fine-tuned exclusively on a new, high-quality dataset of human-generated instructions crowdsourced from Databricks employees.

Databricks’ New Language Model Dolly 2.0 Aims to Disrupt OpenAI’s Reign 

The announcement comes just two weeks after the launch of the original Dolly, an LLM trained on data generated with OpenAI’s API. That model could not be used in commercial applications because OpenAI’s policies prohibit using its output to build commercial AI systems that could rival its offerings.

“When we first released Dolly two weeks ago, we were immediately flooded with requests from people who wanted to try it out. The number one question was: can I use this commercially?” Databricks CEO Ali Ghodsi said. “[Because] Dolly was originally trained using a dataset that the Stanford Alpaca team had created with the OpenAI API (and its terms of service seek to prevent anyone from creating a model that competes with OpenAI), the answer was, unfortunately, no.”

That, Ghodsi said, led the company to create its own commercially usable dataset, databricks-dolly-15k, so that Dolly 2.0 could be fine-tuned exclusively on new, human-generated data.

“We’re deeply committed to making it simple for customers to use LLMs, so expect both a continued investment in open source, as well as innovations that help accelerate the application of LLMs to key business challenges,” Ghodsi said. “We also hope that the community takes the baton and helps improve our dataset and builds better models that can solve various tasks for businesses and organizations.”

