Image by Author
The war between open-source and closed-source has been going on for a while. After OpenAI launched GPT-3 as a close source model, EleutherAI launched an open-source alternative called GPT-Neo that has provided comparative results. Similarly, when DALL·E 2 was launched an open-source version of DALL·E 2 was released by Stability AI called Stable Diffusion.
We all know about ChatGPT and how people are craving to get an open-source version of the model and build their applications safely with more control. Currently, ChatGPT is offering API access and the ability to fine-tune, but you will be using their service and machine to perform all kinds of tasks.
On March 10, 2023, Together Computer released the open-source version of ChatGPT called OpenChatKit. An open-source alternative allows developers to have more control over the chatbot’s behavior and tailor it to their specific needs. Moreover, it is more accessible to a wider range of users and communities, particularly those who may not have the resources to access proprietary models.
OpenChatKit provides an open-source, powerful set of tools to create generalized and specialized chatbot applications. It is the first version of the model, and developers have released a set of tools and processes to improve the model with the help of community contribution.
Together Computer has released OpenChatKit 0.15 under an Apache-2.0 license that comes with source code, model weights, and training datasets.
You can try the based model demo on Hugging Face: OpenChatKit. It is similar to ChatGPT, where you write a prompt, and the model responds to you with the answer, code block, tables, or text.
Image by Author | OpenChatKit
OpenChatKit comes with the base bot and the building blocks to create customized chatbot applications from the base.
The kit consists of 4 components:
- Instruction-tuned large language model that is fine-tuned for a chat from EleutherAI’s GPT-NeoX-20B.
- Instruction on fine-tuning the model to achieve high accuracy on particular tasks.
- An extensible retrieval system for updating the bot response using knowledge from Wikipedia, news feeds, or sports scores.
- Fine-tuned from GPT-JT-6B for moderation purposes to filter out which questions the bot responds to.
The base of OpenChatKit is a large language model called GPT-NeoXT-Chat-Base-20B. It is based on EleutherAI’s GPT-NeoX model and fine-tuned on 43 million high-quality conversational instructions. The developer team has particularly focused on tuning several tasks such as multi-turn dialogue, question answering, classification, extraction, and summarization.
Image from TOGETHER
Out of the box, the model provides a strong base. As we can see, it has higher scores than its base model GPT-NeoX on the HELM benchmark. The GPT-NeoXT-Chat-Base-20B model has performed quite well on the question and answer, extraction, and classification tasks.
It is the first version of the model, and you will see a lot of mistakes, bugs, and appropriate answers. In this session, we will review a few areas that the model is struggling to understand.
- Knowledge-based: The chatbot might give factually incorrect results. ChatGPT has the same issues. The team is working on a retrieval system that will update the wrong information.
- Code-based: The model was not trained on a large enough corpus of source code to write accurate code. You might get frustrated.
- Context switching: If you start talking about something else during the conversation, the chatbot will not automatically switch the topic and keep giving you answers related to previous topics.
- Repetition: the chatbot sometimes repeats the response or gets stuck. You can refresh the page to reset it.
- Creative answers: Unlike ChatGPT, the chatbot does not generate essays or creative stories. It is limited to short responses.
OpenChatKit is a good initiative, and with the help of the community, we can see a better version of the chatbot soon. If you are expecting OpenChatKit to repose like ChatGPT or provide amazing answers, you will get disappointed as it is in the early stages, and it was trained on a less diverse dataset.
In this post, we have learned valuable insights about the open-source version of ChatGPT, which is great news for developers and the data science community. Moreover, we have explored how it works and delved into the four components of the kit that can help create a fully customizable chatbot, equipped with the latest news updates and moderation capabilities.
Try the demo and read more about the model to get information about model fine-tuning and other essential tools.
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master’s degree in Technology Management and a bachelor’s degree in Telecommunication Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.