Introduction
You may have heard something about a Llama over the past couple of weeks. Llama is an open-source (ish) large language model from Facebook. As with Stable Diffusion, the open-source community has rallied to make Llama better and more accessible.
A recent paper from the Tatsu Lab introduced Alpaca, an "instruction-tuned" version of Llama. You can think of Llama as analogous to the original GPT-3: prompting it effectively requires treating it like autocomplete, i.e., having the user write the first few words of their desired output. Instruction-tuning, however, teaches the base model (Llama) to follow instructions. This enables a user experience similar to ChatGPT, where a user can ask a question and receive a response.
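To make the difference concrete, here is a small Python sketch of the two prompting styles. The instruction template below approximates the one used by the Alpaca project; treat the exact wording as illustrative rather than authoritative.

```python
# Completion-style prompting (base Llama): the user seeds the output
# and the model autocompletes from there.
completion_prompt = "The three most populous countries in the world are"

# Instruction-style prompting (Alpaca): the user asks directly.
# This template approximates the Alpaca format; exact wording may differ.
def build_alpaca_prompt(instruction: str) -> str:
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )

print(build_alpaca_prompt("List the three most populous countries in the world."))
```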
There's been a lot of fantastic work with the 7B and 13B parameter models. I wanted to release a fine-tuned version of the 30B parameter model on the Alpaca dataset, which empirically should perform better and be more capable than the smaller models. Below are the steps on how you can do the same. If you want to get straight to using the open-source model, you can run the following snippet (on an A100 40GB at least) or view the model here.
pip install git+https://github.com/huggingface/transformers
pip install bitsandbytes peft sentencepiece
git clone https://github.com/aspctu/alpaca-lora
cd alpaca-lora
python generate.py
If you've only got around 10GB of GPU memory, you can run inference on the 7B param model:
python generate.py --path_to_lora_adapters="tloen/alpaca-lora-7b" --pretrained_model="decapoda-research/llama-7b-hf"
Fine-tuning
The following section is a quick guide to how you can fine-tune your own Alpaca. Fine-tuning is a high-leverage way to improve a model's capabilities on a given task. Similar to how the Alpaca dataset teaches Llama to follow instructions, you can use fine-tuning to improve Llama on a specific task (like QA, summarization, or domain-specific generation).
Dataset
For any sort of fine-tuning (training an existing model on a smaller dataset), you need a dataset. The Tatsu Lab open-sourced their Alpaca dataset, which is what we'll use. However, after looking around, we found various issues with it, so we opted for a cleaned version of the dataset. Thanks to GitHub user gururise for going through and cleaning it.
curl -o alpaca_data.json https://raw.githubusercontent.com/gururise/alpaca-lora/992a3be8ab4dcde90d7d67d65b1f177fa7e2b5ac/alpaca_data.json
There still seem to be issues with the Alpaca dataset: it has a lot of duplicate answers and hallucinates a bit. Trying other instruction-tuning datasets is on my roadmap to see if I can improve the model even more.
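For reference, each record in the Alpaca dataset is a JSON object with instruction, input, and output fields (input is often empty). A minimal sketch of loading and inspecting the file with the standard library — the field names match the released dataset, but the sample record below is invented for illustration:

```python
import json

# One record in the Alpaca format (contents invented for illustration)
sample = """[
  {
    "instruction": "Summarize the text below in one sentence.",
    "input": "Llama is a large language model released by Facebook...",
    "output": "Llama is Facebook's large language model."
  }
]"""

records = json.loads(sample)  # for the real file: json.load(open("alpaca_data.json"))
for r in records:
    # every record carries exactly these three fields
    assert set(r) == {"instruction", "input", "output"}
    print(r["instruction"])
```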
Training
We can use a modified version of GitHub user tloen's repo to train Llama. Our fork changes a couple of variables to accommodate the larger 30B model on a single A100 80GB.
To fine-tune a 30B parameter model on a single A100 with 80GB of memory, we'll have to train with LoRA. LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that lets us train larger models on smaller GPUs. It freezes the weights of the pretrained model (in this case Llama) and injects a pair of much smaller trainable matrices alongside each frozen weight matrix. Their low-rank product approximates the weight update that full fine-tuning would have learned, so only those small matrices need to be trained.
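The idea can be sketched in a few lines of NumPy. The frozen weight W stays fixed; we train two small matrices B (d×r) and A (r×k) whose product forms the low-rank update, so the effective weight is W + BA. This is a toy illustration of the LoRA idea, not the actual implementation in the repo — the shapes are tiny and the scaling factor is omitted for simplicity.

```python
import numpy as np

d, k, r = 1024, 1024, 8          # full weight dims vs. low rank (r << d, k)
W = np.random.randn(d, k)        # pretrained weight: frozen, never updated
A = np.random.randn(r, k) * 0.01 # trainable low-rank factor
B = np.zeros((d, r))             # trainable; zero-initialized so W + B @ A == W at the start

def forward(x):
    # effective weight is W + B @ A, but only A and B ever receive gradients
    return x @ W.T + x @ A.T @ B.T

full_params = W.size
lora_params = A.size + B.size
print(f"trainable: {lora_params:,} of {full_params:,} "
      f"({100 * lora_params / full_params:.2f}%)")
```

Because r is tiny relative to the full dimensions, the trainable parameter count (and hence optimizer memory) drops by orders of magnitude, which is what makes 30B feasible on one GPU.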
Let's get our environment setup:
pip install git+https://github.com/huggingface/transformers
pip install bitsandbytes peft sentencepiece accelerate datasets
git clone https://github.com/aspctu/alpaca-lora
cd alpaca-lora
To kick off the job, you can just run:
python finetune.py --data_path="path/to/alpaca_data.json" --report_to="none"
By default, we enable WandB logging. If you want to use this, you'll need to install WandB and log in with your API key:
pip install wandb
wandb login
python finetune.py --data_path="path/to/alpaca_data.json"
Inference
To use your fine-tuned model (or our open-source model), run the generate.py script (also adapted from GitHub user tloen) in the forked Alpaca repo here. Just like above, you can run inference with your own adapters by passing two arguments to the generate.py script:
pip install git+https://github.com/huggingface/transformers
pip install bitsandbytes peft sentencepiece accelerate
git clone https://github.com/aspctu/alpaca-lora
cd alpaca-lora
python generate.py --path_to_lora_adapters="path/to/adapters/directory" --pretrained_model="the_base_model_you_trained_with"
If you have any questions or want to show off something cool you built with Alpaca-30B, feel free to reach out on Twitter.