Fine-tune Llama 2 with QLoRA using PEFT and SFT
This shows you how to use ChatGPT to generate a synthetic dataset to train Llama 2 on. Check out the Colab notebook here, from Matt Shumer.
Data Generation
Write your prompt here. Make it as descriptive as possible!
Then, choose the temperature (between 0 and 1) to use when generating data. Lower values are great for precise tasks, like writing code, whereas larger values are better for creative tasks, like writing stories.
Finally, choose how many examples you want to generate. The more you generate, a) the longer it takes and b) the more expensive data generation will be. But generally, more examples will lead to a higher-quality model. 100 is usually the minimum to start.
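The three choices above can be captured as plain variables. The names and values here are illustrative examples, not the notebook's exact code:

```python
# Illustrative configuration for the data-generation step.
# All names and values are example choices, not prescribed by the notebook.
prompt = (
    "A model that takes in a puzzle-like reasoning question and responds "
    "with a well-reasoned, step-by-step answer."
)  # describe the task in as much detail as possible

temperature = 0.4         # between 0 and 1: lower = more precise, higher = more creative
number_of_examples = 100  # 100 is usually the minimum to start
```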
Run this to generate the dataset.
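A minimal sketch of what the generation loop can look like, assuming the legacy `openai` client (`openai.ChatCompletion.create`), an API key configured in the environment, and a delimited prompt/response output format; the exact message wording and model name are assumptions, not the notebook's code:

```python
def build_messages(prompt, prev_examples):
    """Assemble the chat messages asking the model for one new training example."""
    system = (
        "You are generating training data for the following task:\n"
        f"`{prompt}`\n"
        "Reply with exactly one example, in this form:\n"
        "prompt\n-----------\n$prompt_goes_here\n-----------\n"
        "response\n-----------\n$response_goes_here\n-----------"
    )
    messages = [{"role": "system", "content": system}]
    # Show a few previous examples so the model produces diverse, non-duplicate data.
    for example in prev_examples[-8:]:
        messages.append({"role": "assistant", "content": example})
    return messages


def generate_example(prompt, prev_examples, temperature=0.4):
    # Requires the `openai` package and OPENAI_API_KEY to be set.
    import openai  # imported here so the helper above stays usable without it
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=build_messages(prompt, prev_examples),
        temperature=temperature,
        max_tokens=1000,
    )
    return response.choices[0].message["content"]
```

You would call `generate_example` in a loop `number_of_examples` times, appending each result to `prev_examples` as you go.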
We also need to generate a system message.
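One way to do this (a sketch under assumptions, not the notebook's exact prompt) is to ask the model to compress the task description into a single system message for the fine-tuned model:

```python
def build_system_message_request(prompt):
    """Messages asking the model to write a system prompt for the fine-tuned model.
    The instruction wording here is an illustrative assumption."""
    return [
        {
            "role": "system",
            "content": (
                "You write system messages for models that are being fine-tuned. "
                "Given a task description, reply with a single concise system "
                "message for that task, and nothing else."
            ),
        },
        {"role": "user", "content": prompt.strip()},
    ]
```

These messages would be sent through the same chat-completion call used for data generation.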
Now let's put our examples into a dataframe and turn them into a final pair of datasets.
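Assuming each generated example uses a `-----------`-delimited prompt/response format, parsing the examples into a DataFrame might look like this (the column names are illustrative choices):

```python
import pandas as pd


def examples_to_dataframe(examples):
    """Split each '-----------'-delimited example into a (prompt, response) row.
    Malformed examples are skipped; exact duplicates are dropped."""
    prompts, responses = [], []
    for example in examples:
        parts = example.split("-----------")
        # Expected layout: header, prompt text, header, response text, ...
        if len(parts) >= 4:
            prompts.append(parts[1].strip())
            responses.append(parts[3].strip())
    df = pd.DataFrame({"prompt": prompts, "response": responses})
    return df.drop_duplicates()
```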
Split into train and test sets.
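A simple way to split is a shuffled pandas sample; the 90/10 ratio and the seed below are example choices, not requirements:

```python
import pandas as pd


def split_train_test(df, train_frac=0.9, seed=42):
    """Shuffle-split a DataFrame into disjoint train and test partitions."""
    train = df.sample(frac=train_frac, random_state=seed)
    test = df.drop(train.index)
    return train.reset_index(drop=True), test.reset_index(drop=True)
```

The resulting partitions can then be saved (e.g. to JSONL or CSV) for the SFT training step.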