Caching Hugging Face models using volumes

Reduce startup times across pod restarts by caching models

To reduce startup times and avoid model download delays across pod restarts when serving Hugging Face models from the Transformers library, follow these steps on TrueFoundry:

  1. Create a Persistent Volume: Set up a persistent volume sized to hold the models and other downloaded artifacts. To do this, go to New Deployments > Volume and enter the details. Recommended storage classes per cloud provider:
    1. AWS: efs-sc
    2. Azure: azurefile
    3. GCP: standard-rwo or premium-rwo
Creating a persistent volume on Azure cloud

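Under the hood, the volume corresponds to a Kubernetes PersistentVolumeClaim. As a rough sketch (plain Kubernetes fields, not a TrueFoundry spec; the claim name and size here are hypothetical), an equivalent PVC using the AWS storage class from the list above might look like:

```yaml
# Plain Kubernetes PVC illustrating the storage-class choice; TrueFoundry
# provisions something equivalent when you create the Volume in the UI.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hf-model-cache        # hypothetical name
spec:
  accessModes:
    - ReadWriteMany           # EFS supports shared mounts across pods
  storageClassName: efs-sc    # AWS; use azurefile / standard-rwo elsewhere
  resources:
    requests:
      storage: 50Gi           # size to your models and artifacts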
  2. Mount the Volume: While creating/editing your TrueFoundry Service, mount the created persistent volume at a preferred path, such as /data/huggingface.
Attaching volume to service deployment

  3. Set HF_HOME: As you create the service, add an environment variable named HF_HOME and set it to the mount path of the persistent volume. This ensures that any artifact downloaded from Hugging Face is stored in this folder.
Setting HF_HOME variable

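A minimal sketch of the effect of HF_HOME, assuming the /data/huggingface mount path from the previous step (the actual model load is shown commented out, since it needs the transformers package installed):

```python
import os

# HF_HOME must be in the environment before huggingface_hub/transformers are
# imported, since they read it at import time to locate the cache. In the
# TrueFoundry service this is set via the environment-variable UI, so no code
# change is needed; this sketch just shows the effect.
os.environ["HF_HOME"] = "/data/huggingface"  # the volume mount path from step 2

# Downloads land under $HF_HOME/hub, one directory per repo, e.g.
#   /data/huggingface/hub/models--bert-base-uncased
hub_cache = os.path.join(os.environ["HF_HOME"], "hub")
print(hub_cache)

# With HF_HOME set, an ordinary load call caches into the volume:
#   from transformers import AutoModel
#   model = AutoModel.from_pretrained("bert-base-uncased")
```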
  4. Deploy the Service: Proceed to deploy your service. With the model cached in the persistent volume, subsequent pod restarts won't require re-downloading the model, resulting in reduced startup times.
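The payoff can be illustrated with a small stand-in for the download logic (a toy sketch, not the Hugging Face implementation; a temporary directory stands in for the mounted volume):

```python
import pathlib
import tempfile

# A temp dir stands in for the /data/huggingface volume mount.
hf_home = pathlib.Path(tempfile.mkdtemp())
cache = hf_home / "hub" / "models--bert-base-uncased"  # hub cache layout

def load_model() -> str:
    """Toy stand-in: 'download' only when the cache misses."""
    if cache.exists():
        return "loaded from cache"   # warm start after a pod restart: fast
    cache.mkdir(parents=True)        # stand-in for the real model download
    return "downloaded"              # cold start: pays the download once

first = load_model()    # cold start
second = load_model()   # simulated restart with the volume still mounted
print(first, "/", second)
```

Because the cache lives on the persistent volume rather than the pod's ephemeral filesystem, only the first start pays the download cost.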