Fine-tune Llama2 with GPT Generated Dataset
Author: Jimmy Rousseau | Published: 8/25/2023

Fine-tune Llama 2 with QLoRA using PEFT and SFT

This shows you how to use ChatGPT to generate a synthetic dataset to train Llama 2 on. Check out the Colab notebook here, from Matt Shumer.

Data Generation

Write your prompt here. Make it as descriptive as possible!

Then, choose the temperature (between 0 and 1) to use when generating data. Lower values are great for precise tasks, like writing code, whereas larger values are better for creative tasks, like writing stories.

Finally, choose how many examples you want to generate. The more you generate, a) the longer it takes and b) the more expensive data generation will be. But generally, more examples will lead to a higher-quality model. 100 is usually the minimum to start.

Run this to generate the dataset.
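The generation loop can be sketched roughly as follows. The helper below builds the chat messages for one generation call; the exact system-prompt wording, the model name, and the dashed separator format are assumptions, not the notebook's literal code.

```python
def build_generation_messages(prompt, prev_examples, max_prev=8):
    """Build the chat messages asking the model for one new training example,
    formatted as `prompt`, a dashed separator line, then `response`."""
    system = (
        "You are generating training data for a model with this task: "
        f"`{prompt}`. Return exactly one new example as a prompt and a "
        "response separated by a line of dashes. Do not repeat earlier examples."
    )
    messages = [{"role": "system", "content": system}]
    # Feed back a few previous examples so the model avoids duplicates.
    for ex in prev_examples[-max_prev:]:
        messages.append({"role": "assistant", "content": ex})
    return messages

# With the `openai` package installed, each example would then be generated
# with a call along these lines (model name and temperature are up to you):
# response = openai.ChatCompletion.create(
#     model="gpt-4",
#     messages=build_generation_messages(prompt, prev_examples),
#     temperature=temperature,
#     max_tokens=1000,
# )
# prev_examples.append(response.choices[0].message.content)
```

Repeating this call `number_of_examples` times accumulates the raw examples in `prev_examples`.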

We also need to generate a system message.
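One way to do this is to ask the chat model to compress your task description into a single-sentence system prompt. The request builder below is a sketch; the `Given $INPUT, you will $OUTPUT.` template is an assumption about the desired output shape.

```python
def build_system_message_request(prompt):
    """Build a chat request asking the model to rewrite the task description
    as a one-sentence system message (hypothetical template)."""
    return [
        {"role": "system",
         "content": ("You will rephrase a task description as a single-sentence "
                     "system prompt of the form `Given $INPUT, you will $OUTPUT.` "
                     "Reply with only that sentence.")},
        {"role": "user", "content": prompt.strip()},
    ]

# As with data generation, this would be sent via the chat completions API
# and the reply saved as `system_message`.
```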

Now let's put our examples into a DataFrame and turn them into a final pair of datasets.
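Concretely, each raw example is split on the separator into a prompt/response pair before being loaded into a DataFrame. A minimal sketch, assuming the dashed-line separator format from generation:

```python
def split_examples(prev_examples, sep="-----------"):
    """Split each generated example on the separator into a
    (prompt, response) pair, skipping malformed examples."""
    pairs = []
    for ex in prev_examples:
        parts = ex.split(sep)
        if len(parts) == 2:
            pairs.append((parts[0].strip(), parts[1].strip()))
    return pairs

# The pairs can then go into a DataFrame (assuming pandas is installed):
# df = pd.DataFrame(pairs, columns=["prompt", "response"]).drop_duplicates()
```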

Split into train and test sets.
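A simple shuffled split works here; the 90/10 ratio and fixed seed below are illustrative assumptions (the notebook may use `DataFrame.sample` instead).

```python
import random

def train_test_split(rows, test_frac=0.1, seed=42):
    """Shuffle rows deterministically and split into (train, test) lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_frac))
    return rows[n_test:], rows[:n_test]
```

With 100 generated examples, this leaves 90 for training and holds out 10 for evaluation.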

Install necessary libraries
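In a Colab cell, the installs typically look something like this; the package list and versions here are assumptions about a standard QLoRA stack, not the notebook's exact cell.

```python
# Run in a notebook cell (the leading `!` executes a shell command):
# !pip install -q transformers peft bitsandbytes trl accelerate datasets
```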

Define Hyperparameters
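The values below are illustrative defaults for QLoRA fine-tuning of Llama 2, not the notebook's exact settings; the model name in particular is an assumed ungated mirror.

```python
# Hypothetical hyperparameter choices for QLoRA fine-tuning.
hyperparams = {
    "model_name": "NousResearch/Llama-2-7b-chat-hf",  # assumed base model
    "lora_r": 64,             # LoRA adapter rank
    "lora_alpha": 16,         # LoRA scaling factor
    "lora_dropout": 0.1,
    "use_4bit": True,         # load the base model in 4-bit (the "Q" in QLoRA)
    "bnb_4bit_quant_type": "nf4",
    "learning_rate": 2e-4,
    "num_train_epochs": 1,
    "per_device_train_batch_size": 4,
}
# These values would feed into peft.LoraConfig, transformers.BitsAndBytesConfig,
# and transformers.TrainingArguments respectively.
```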

Load Datasets and Train
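Before training, each prompt/response pair is rendered into a single text string. The formatter below uses the public Llama 2 chat template; the trainer wiring in the comments is a sketch of a typical trl `SFTTrainer` setup, not the notebook's literal code.

```python
def format_example(system_message, prompt, response):
    """Render one pair in the Llama 2 chat template for supervised fine-tuning."""
    return (f"<s>[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n"
            f"{prompt} [/INST] {response} </s>")

# The formatted strings go into a `datasets.Dataset` with a "text" column and
# then into trl's SFTTrainer, roughly (assuming trl/peft/transformers):
# trainer = SFTTrainer(
#     model=model,
#     train_dataset=train_dataset,
#     eval_dataset=eval_dataset,
#     peft_config=lora_config,
#     args=training_args,
#     dataset_text_field="text",
# )
# trainer.train()
```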