Finetuning LLMs

Finetune Llama, Mistral, Mixtral and more on one or more GPUs

LLMs are pre-trained on massive datasets of text and code. This makes them versatile for various tasks, but they may not perform optimally on your specific domain or data.

Finetuning allows you to train these models on your data, enhancing their performance and tailoring them to your unique requirements.

Fine-tuning with TrueFoundry allows you to bring your own data and fine-tune popular open-source LLMs such as Llama 2, Mistral, Zephyr, Mixtral, and more. We make this easy by providing pre-configured resource options and using the best available training techniques. You can perform fine-tuning using either Jobs or Notebooks, and you can easily track the progress of fine-tuning through ML Repositories.

QLoRA

For fine-tuning, TrueFoundry uses QLoRA, a technique that balances power and efficiency. QLoRA loads the base model in 4-bit precision and trains small low-rank adapter (LoRA) layers on top of it, dramatically reducing memory requirements. This lets you fine-tune on smaller hardware (even a single GPU), saving time, money, and resources, all while maintaining strong performance.
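TrueFoundry configures all of this for you, but for intuition, here is a minimal sketch of what a QLoRA setup looks like in the Hugging Face transformers/peft/bitsandbytes stack (the model name and all hyperparameter values are illustrative assumptions):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the base model quantized to 4-bit (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # illustrative base model
    quantization_config=bnb_config,
)

# Attach small trainable low-rank adapters (the "LoRA" part)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))
model.print_trainable_parameters()  # only a tiny fraction of weights are trained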

Pre-requisites

Before you begin, ensure you have the following:

  • Workspace:
    To deploy your LLM, you'll need a workspace. If you don't have one, you can create it using this guide: Create a Workspace or seek assistance from your cluster administrator.

Setting up the Training Data

We support two different data formats:

Chat

Data needs to be in JSONL format, with each line containing a whole conversation in the OpenAI Chat format.

Each line is a JSON object with a single key, messages, containing a list of messages. Each message is a dictionary with role and content keys: role can be user, assistant, or system, and content holds the message text.

Example:

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris"}, {"role": "user", "content": "Can you be more sarcastic?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "William Shakespeare"}, {"role": "user", "content": "Can you be more sarcastic?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "384,400 kilometers"}, {"role": "user", "content": "Can you be more sarcastic?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}
...
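Before uploading, it can help to sanity-check that every line parses and follows this structure. Here is a minimal validation sketch (the file name is an assumption):

import json

VALID_ROLES = {"system", "user", "assistant"}

# Check that every line is valid JSON with correctly structured messages
with open("train.jsonl") as f:  # illustrative file name
    for i, line in enumerate(f, start=1):
        record = json.loads(line)
        assert "messages" in record, f"line {i}: missing 'messages' key"
        for msg in record["messages"]:
            assert msg["role"] in VALID_ROLES, f"line {i}: invalid role {msg['role']!r}"
            assert isinstance(msg["content"], str), f"line {i}: content must be a string"
print("All lines are valid chat-format records.")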

Completion

Data needs to be in JSONL format, with each line containing a JSON object with two keys: prompt and completion.

Example:

{"prompt": "What is 2 + 2?", "completion": "The answer to 2 + 2 is 4"}
{"prompt": "Flip a coin", "completion": "I flipped a coin and the result is heads!"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
...

You can further split your data into training data and evaluation data.
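For example, a simple random 90/10 split into training and evaluation files might look like this (the file names and split ratio are illustrative assumptions):

import random

random.seed(42)  # fixed seed for a reproducible split
with open("data.jsonl") as f:
    lines = f.readlines()
random.shuffle(lines)

split = int(0.9 * len(lines))  # 90% train, 10% eval
with open("train.jsonl", "w") as f:
    f.writelines(lines[:split])
with open("eval.jsonl", "w") as f:
    f.writelines(lines[split:])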

Once your data is prepared, you need to store the data somewhere. You can choose where to store your data:

  • TrueFoundry Artifact: Upload it as a TrueFoundry artifact for easy access.
  • Cloud Storage: Upload it to a cloud storage service.
  • Local Machine: Save it directly on your computer.

Upload to a TrueFoundry Artifact

If you prefer to upload your training data directly to TrueFoundry as an artifact, follow the Add Artifacts via UI guide and upload your .jsonl training data file.

Upload to a cloud storage

You can upload your data to an S3 bucket using the following command:

aws s3 cp "path-to-training-data-file-locally" s3://bucket-name/dir-to-store-file

Once done you can generate a pre-signed URL of the S3 Object using the following command:

aws s3 presign s3://bucket-name/path-to-training-data-file-in-s3
The output of uploading the file to AWS S3 and getting the pre-signed URL

Now you can use this pre-signed URL in the fine-tuning job / notebook.

Similarly, you can upload your data to Azure Blob Storage or Google Cloud Storage (GCS).
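For illustration, equivalent uploads might look like the following with the az and gsutil CLIs (the account, container, and bucket names are assumptions):

# Azure Blob Storage
az storage blob upload --account-name myaccount --container-name mycontainer --name train.jsonl --file train.jsonl --auth-mode login

# Google Cloud Storage
gsutil cp train.jsonl gs://my-bucket/finetuning-data/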

Fine-Tuning an LLM

Now that your data is prepared, you can start fine-tuning your LLM. Here you have two options: deploying a fine-tuning notebook for experimentation, or launching a dedicated fine-tuning job.

  1. Notebooks: Experimentation Playground

Notebooks offer an ideal setup for explorative and iterative fine-tuning. You can experiment on a small subset of data, trying different hyperparameters to figure out the ideal configuration for the best performance. Thanks to the interactive setup, you can analyze the intermediate results to gain deeper insights into the LLM's behavior and response to different training parameters.

Therefore, notebooks are strongly recommended for early-stage exploration and hyperparameter tuning.

  2. Jobs: Reliable and Scalable

Once you've identified the optimal hyperparameters and configuration through experimentation, transitioning to a job lets you fine-tune on the whole dataset and enables fast, reliable training. It ensures consistent and reproducible training runs, and built-in retry mechanisms automatically handle any hiccups, so training proceeds without manual intervention.

Consequently, jobs are the preferred choice for large-scale LLM fine-tuning, particularly when the optimal configuration has been established through prior experimentation.

Hyperparameters

Fine-tuning an LLM requires adjusting key parameters to optimize its performance on your specific task. Here are some crucial hyperparameters to consider:

  • Epochs: This determines the number of times the model iterates through the entire training dataset.
    Too many epochs can lead to overfitting, and too few might leave the model undertrained. You should start with a moderate number and increase until the validation performance starts dropping.
  • Learning Rate: This defines how quickly the model updates its weights based on errors.
    Too high can cause instability and poor performance, and too low can lead to slow learning. Start small and gradually increase if the finetuning is slow.
  • Batch Size: This controls how many data points the model processes before adjusting its internal parameters. Choose a size based on memory constraints and desired training speed. Too high can strain resources, and too low might lead to unstable updates.
  • LoRA Alpha and R: These control the low-rank adapters in the LoRA architecture, which make fine-tuning large models efficient. R sets the rank (size) of the adapter matrices, and Alpha scales their contribution to the model. Values that are too high can lead to instability, while values that are too low might limit performance.
  • Max Length: This defines the maximum sequence length the model can process at once.
    Choose based on your task's typical input and output lengths. Too short can truncate context, and too long can strain resources and memory.

The optimal values for these hyperparameters depend on your specific LLM, task, and dataset. Be prepared to experiment and iteratively refine your settings for optimal performance.
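TrueFoundry exposes these as pre-configured options, but as a rough sketch of how such hyperparameters map onto a typical Hugging Face training setup (all values here are illustrative assumptions, not recommendations):

from transformers import TrainingArguments
from peft import LoraConfig

training_args = TrainingArguments(
    output_dir="./checkpoints",
    num_train_epochs=3,              # Epochs
    learning_rate=2e-4,              # Learning Rate
    per_device_train_batch_size=4,   # Batch Size
)
lora_config = LoraConfig(
    r=16,           # R: rank of the adapter matrices
    lora_alpha=32,  # LoRA Alpha: scales the adapters' contribution
    task_type="CAUSAL_LM",
)
max_length = 2048  # Max Length: longest sequence (in tokens) the model sees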

Fine-Tuning using a Notebook

Fine-Tuning using a Job

Before you start, you will first need to create an ML Repo (this will be used to store your training metrics and artifacts, such as your checkpoints and models) and give your workspace access to it. You can read more about ML Repos here.

Now that your ML Repo is set up, you can create the fine-tuning job.

Deploying the Fine-Tuned Model

Once your Fine-tuning is complete, the next step is to deploy the fine-tuned LLM.

You can learn more about how to send requests to your deployed LLM using the following guide.