Uploading files and directories to a Data Directory. Downloading contents of a Data Directory to disk.
A DataDirectory is a collection of files and folders stored on remote storage (cloud buckets such as S3, GCS, or Azure Blob). DataDirectories are useful for storing data that is associated with a particular ML Repository.
Unlike Artifacts, DataDirectories are not versioned and are not associated with any Run. This makes them well suited to data that does not change or that stays constant across runs. For example, you can store the custom data you want to use to finetune an LLM here, without having to create a Run and store the data in an Artifact.
Upload files and folders with the `.add_files` method.
For this, you will require pairs of (`<source path>`, `<destination path>`) to add files and folders to the DataDirectory contents. The first member of the pair should be a file or directory path and the second member should be the path inside the DataDirectory contents to upload to.
Download the contents of a DataDirectory to disk with the `.download` method.
For this you will need the: