An instruction-following large language model based on pythia-70m and trained on Databricks’ 15k instruction dataset, which covers the capability domains from the InstructGPT paper: brainstorming, classification, closed QA, generation, information extraction, open QA and summarization.

This model is an experiment in using a small base model (pythia-70m) to build a model similar to Databricks’ Dolly.
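Each record in Databricks’ 15k dataset has instruction, context, response and category fields, and has to be flattened into a single string before a causal language model can be trained on it. The sketch below shows one way to do that. It assumes the prompt template from Databricks’ own Dolly training code (the "### Instruction:", "Input:", "### Response:" and "### End" markers); this card does not say which template was actually used for the pythia-70m model, and format_record is a hypothetical helper.

```python
# Hypothetical helper: flattens one databricks-dolly-15k record into a training string.
# ASSUMPTION: the template mirrors Databricks' Dolly training code; the template used
# for this pythia-70m model is not documented.
INTRO = ("Below is an instruction that describes a task. "
         "Write a response that appropriately completes the request.")
INSTRUCTION_KEY = "### Instruction:"
INPUT_KEY = "Input:"
RESPONSE_KEY = "### Response:"
END_KEY = "### End"


def format_record(record: dict) -> str:
    """Join the instruction, optional context and response into one training example."""
    parts = [INTRO, f"{INSTRUCTION_KEY}\n{record['instruction']}"]
    # Categories such as closed QA, information extraction and summarization typically
    # carry a reference passage in "context"; the others usually leave it empty.
    context = record.get("context", "").strip()
    if context:
        parts.append(f"{INPUT_KEY}\n{context}")
    parts.append(f"{RESPONSE_KEY}\n{record['response']}")
    # The end marker teaches the model where an answer should stop.
    parts.append(END_KEY)
    return "\n\n".join(parts)


example = {
    "instruction": "What is the capital of France?",
    "context": "",
    "response": "The capital of France is Paris.",
    "category": "open_qa",
}
print(format_record(example))
```

Records without a context get the shorter instruction/response layout, so the model sees both shapes during training.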


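To use the model with the transformers library, first make sure you have the transformers and accelerate libraries installed:

```python
# Notebook magic; from a regular shell, run the same command with pip instead of %pip.
%pip install "accelerate>=0.16.0,<1" "transformers[torch]>=4.28.1,<5" "torch>=1.13.1,<2"
```

```python
import torch
from transformers import pipeline

# Note: this repository ID loads Databricks' dolly-v2-12b, not the pythia-70m model
# described here; substitute this model's own repository ID to try it.
generate_text = pipeline(model="databricks/dolly-v2-12b", torch_dtype=torch.bfloat16,
                         trust_remote_code=True, device_map="auto")

res = generate_text("What is the capital of France?")
print(res[0]["generated_text"])
```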
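The custom Dolly pipeline pulled in by trust_remote_code=True wraps the question in a prompt and cuts the answer out of the generated text for you. With a plain text-generation setup, both steps have to be done by hand. A minimal sketch, assuming the model was trained with Dolly’s "### Instruction:" / "### Response:" / "### End" template (an assumption, since the template is not documented here); build_prompt and extract_response are hypothetical helpers, and the string passed to extract_response below is simulated, not real model output.

```python
# Hypothetical helpers for prompting without Dolly's custom pipeline.
# ASSUMPTION: the model was trained on Dolly's prompt template.
INTRO = ("Below is an instruction that describes a task. "
         "Write a response that appropriately completes the request.")
RESPONSE_KEY = "### Response:"
END_KEY = "### End"


def build_prompt(instruction: str) -> str:
    """Wrap a bare instruction so the prompt ends where the answer should begin."""
    return f"{INTRO}\n\n### Instruction:\n{instruction}\n\n{RESPONSE_KEY}\n"


def extract_response(generated: str) -> str:
    """Keep only the text after the response marker and before the end marker."""
    _, sep, after = generated.partition(RESPONSE_KEY)
    text = after if sep else generated  # no marker found: fall back to the whole text
    return text.split(END_KEY, 1)[0].strip()


# Simulated output: prompt echoed back, then an answer, then runaway text after "### End".
raw = (build_prompt("What is the capital of France?")
       + "The capital of France is Paris.\n\n### End\n### Instruction:")
print(extract_response(raw))
```

With a stock pipeline("text-generation", model=...), you would pass build_prompt(question) in and run extract_response on the returned "generated_text", which by default includes the prompt.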



The model was trained on Databricks’ 15k instruction dataset on a recent Dell PC with 32 GB of RAM and a Core i7 CPU. Training took around 12 hours!


As expected, the model’s performance is very poor, especially when compared with Databricks’ dolly-v2-12b model.

When prompted with "What is the capital of France?", the model answers:

"The World". It is an artwork for "working time" called «The Middle East Today". It comes from Paris, Belgium, in local variation, including large cities as described in English language photographs which portray a crescent and sunrise of late note, Bangourt before Paris.
“Countries like Pakistan and throughout East Africa close to Australia have constructed a watered havock which can be felt ever longer. Bombardment and booby traps tend to occupy space by wind and water, as were effectively used for material and equipment which have a green signal leading in the images."

Compare this with the answer from Databricks’ dolly-v2-3b model:

The capital of France is Paris.


The gap between the base model used here (pythia-70m) and the base models used by Databricks (pythia-2.8b and pythia-12b) is huge, and it makes all the difference in accuracy. The only point in this model’s favor is its size: at around 160M, it is orders of magnitude smaller than the Databricks models.
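To put the size gap in numbers, the snippet below compares nominal parameter counts read off the model names (70M, 2.8B and 12B). These are name-based approximations, not exact parameter counts, and the snippet is only a back-of-the-envelope illustration.

```python
import math

# Nominal parameter counts taken from the model names (approximate, not exact counts).
BASE_MODELS = {"pythia-70m": 70e6, "pythia-2.8b": 2.8e9, "pythia-12b": 12e9}

small = BASE_MODELS["pythia-70m"]
for name in ("pythia-2.8b", "pythia-12b"):
    ratio = BASE_MODELS[name] / small
    # log10 of the ratio = number of orders of magnitude between the two models.
    print(f"{name}: {ratio:.0f}x larger, {math.log10(ratio):.1f} orders of magnitude")
```

By this rough measure, pythia-12b is about 171 times larger (over two orders of magnitude) and pythia-2.8b about 40 times larger.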