Uploading files and directories to a Data Directory. Downloading contents of a Data Directory to disk.
A DataDirectory is a collection of files and folders stored on remote storage (cloud buckets such as S3, GCS, or Azure Blob). DataDirectories are useful for storing data that is associated with a particular ML Repository.
Unlike Artifacts, DataDirectories are not versioned and are not associated with any Run. This makes them ideal for storing data that does not change or that stays constant across runs. For example, you can store the custom data you want to fine-tune an LLM on here, without having to create a Run and store the data in an Artifact.
Before we can start logging our data in a Data Directory, we need to create it:
To create a Data Directory you need:
To add data to a Data Directory, get the data_directory instance and then log the data to it using the .add_files method.
For this, you will require (\<source path>, \<destination path>) pairs to add files and folders to the DataDirectory contents. The first member of each pair should be a file or directory path, and the second member should be the path inside the DataDirectory contents to upload to. This would result in
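As a local illustration of the (source, destination) pair semantics, the sketch below computes where each local file would land inside a Data Directory. It does not call the real client: `resolve_uploads` and `build_demo_layout` are hypothetical helpers, and the file names are made up. It assumes that a file source is stored at exactly the destination path and that a directory source is uploaded recursively under the destination path. The actual .add_files behavior may differ.

```python
import tempfile
from pathlib import Path, PurePosixPath
from typing import Dict, Iterable, Tuple


def resolve_uploads(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each assumed remote path to the local file that would be uploaded.

    Hypothetical helper: this mirrors the pair semantics described in the
    text, not the real client implementation.
    """
    layout: Dict[str, str] = {}
    for source, destination in pairs:
        src = Path(source)
        if src.is_dir():
            # Assumption: directories are walked recursively and their
            # relative structure is preserved under the destination path.
            for file in sorted(p for p in src.rglob("*") if p.is_file()):
                relative = file.relative_to(src).as_posix()
                layout[str(PurePosixPath(destination) / relative)] = str(file)
        elif src.is_file():
            # Assumption: a single file lands at exactly the destination path.
            layout[str(PurePosixPath(destination))] = str(src)
        else:
            raise FileNotFoundError(source)
    return layout


def build_demo_layout() -> Dict[str, str]:
    """Create a small temporary tree and resolve two example pairs."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "train.csv").write_text("x,y\n1,2\n")
        (root / "images" / "cats").mkdir(parents=True)
        (root / "images" / "cats" / "cat1.txt").write_text("placeholder")
        return resolve_uploads(
            [
                (str(root / "train.csv"), "tabular/train.csv"),
                (str(root / "images"), "raw/images"),
            ]
        )


if __name__ == "__main__":
    # Print the assumed remote paths, one per line.
    for remote_path in sorted(build_demo_layout()):
        print(remote_path)
```

Resolving the layout before uploading is a cheap way to check that the destination paths are the ones you intend, since a Data Directory is not versioned.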
To download data from a Data Directory, get the data_directory instance and then download its contents using the .download method.
For this you will need the: