When we run model training jobs, each job needs to download the data from somewhere before it can train the model. Doing this once is fine, but if we plan to run multiple training jobs with different hyperparameters or other settings on the same data, it is wasteful in both time and cost for every job to download the same data again and again. This is where volumes help. We can download the data once to a volume and then mount that volume to multiple jobs. Each job can then pick up the data from the volume as if it had been downloaded to disk at the location where the volume is mounted.
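For example, the one-time download can be an ordinary data-preparation script that writes into the mounted volume. The sketch below is a minimal illustration, not platform-specific code; it assumes the volume is mounted at /data, and the dataset URL is a placeholder:

```python
import os
import urllib.request

# Hypothetical source URL -- replace with your actual data location.
DATASET_URL = "https://example.com/dataset.csv"
VOLUME_PATH = "/data"  # assumes the volume is mounted at /data

# Ensure the mount directory exists, then download the dataset once.
os.makedirs(VOLUME_PATH, exist_ok=True)
destination = os.path.join(VOLUME_PATH, "dataset.csv")
urllib.request.urlretrieve(DATASET_URL, destination)
print(f"Dataset saved to {destination}")
```

Every training job that mounts the same volume afterwards can read the file from that path without downloading it again.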
To use a persistent volume, we first need to create one and then attach it to our deployments. You can learn how to create volumes in the Creating a Volume guide.
Once you’ve attached a volume to your deployment, you can use it like any other directory within your job’s workflow. For instance, if you’ve downloaded a dataset named dataset.csv to a volume mounted at /data, you can access it directly from within the job, as shown in the code below:
```python
import os

import pandas as pd

...

# Load the dataset from the mounted volume
data = pd.read_csv("/data/dataset.csv")

# Train the model
...
```
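Because the volume persists across jobs, a common pattern is to let each job download the data only if it is not already present on the volume. Here is a minimal sketch of that guard, again assuming a mount at /data and a placeholder source URL:

```python
import os
import urllib.request

import pandas as pd

DATASET_PATH = "/data/dataset.csv"  # assumes the volume is mounted at /data
DATASET_URL = "https://example.com/dataset.csv"  # hypothetical source URL

# Download only on the first run; later jobs reuse the cached copy.
if not os.path.exists(DATASET_PATH):
    urllib.request.urlretrieve(DATASET_URL, DATASET_PATH)

# Load the dataset from the mounted volume and proceed with training.
data = pd.read_csv(DATASET_PATH)
```

With this guard in place, you can launch many hyperparameter variants of the same job against one volume, and only the first run pays the download cost.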