Truefoundry Docs

Jupyter Notebooks are probably the first tool to be used in any ML Project. TrueFoundry enables you to run Jupyter Notebooks on Kubernetes on any hardware you need, and auto-shut down when it’s not being used hence saving costs. You can read more about TrueFoundry Notebooks vs other notebook solutions here. To get started with launching a Jupyter Notebook, you can follow the steps outlined below (You can find details on the configuration options below)

The notebook will go live in a few minutes. It takes a few minutes since the image will need to be downloaded on the machine first. Once the notebook becomes live, you will see the status turn to Running and an endpoint will start showing up. Clicking on the endpoint will take you to the Jupyter Notebook.

Stop and resume your notebook

When you’re done working with a notebook instance, you can stop it to conserve resources and reduce costs. Clicking on the Stop button shuts down the notebook instance environment and releases associated resources.

Your notebook instance data (apart from the apt packages installed as a root user) persists, and you can easily restart it later by clicking the Resume button.

Configure your Notebook

In the notebook creation form, you can configure the following options:

Image

When creating a Jupyter Notebook, you can choose between three image options:

Minimal Image: A lightweight image with no pre-installed packages, providing a clean slate for customization and package installation.
Cuda 12.1 Image: A pre-configured image with Cuda 12.1 pre-installed
Custom Extended Image: You can extend an Image created by truefoundry and add your custom packages to it. You can read about extended images here.

Auto Shutdown on Inactivity

By default, Notebook instances are configured to automatically stop after 30 minutes of inactivity. This helps prevent unnecessary resource consumption when Notebook instances are left idle. You can change the Stop After (minutes of inactivity) setting within the deployment form.

Install Apt packages in the notebook

By default anything you install outside of home directory will NOT be persistent across notebook restarts. This means all apt installs (done directly from notebook) will NOT be persistent across restarts of notebook. To make these apt packages persistent across restarts, you can add a “Build Script” in the notebook as follows

For example, here is a sample build script to install ffmpeg package:

sudo apt update
sudo apt install ffmpeg

Storage Size

Specify the amount of storage space you require for your notebook instance. This is persistent storage that will be used to store your notebook files, data, and any other artifacts generated during your work.

Set resources for your Notebook

Define the computational resources allocated to your notebook instance. You can adjust the CPU and memory allocation to meet the requirements of your data science tasks.

Running a Notebook instance with GPU

In case you want to use GPU in your Notebook Instance, you can follow these steps:

Upon successful deployment, your GPU Notebook instance instance will be provisioned with the specified GPU type. You can then utilize the GPU resources to accelerate your computations, such as training deep learning models or running GPU-intensive workloads.

Access data from S3 or other clouds

In some instances, your Jupyter Notebooks may need to access data stored in S3 or other cloud storage platforms. To facilitate this access, you can employ one of two approaches:

Credential-Based Access through environment variables

This approach involves defining specific environment variables that contain the necessary credentials for accessing the cloud storage platform. For instance, to access S3, you would set environment variables for the AWS access key ID and secret access key, the environment variables being: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY

IAM Role-Based Access through Service Account

The second approach is to provide your Notebook with a Role with the necessary permission through Service Accounts. Service Accounts provide a streamlined approach to managing access without the need for tokens or complex authentication methods. This approach involves creating a Principal IAM Role within your cloud platform, granting it the necessary permissions for your project’s requirements. Here are detailed guides for creating Principal IAM Roles in your respective cloud platforms and integrating them as Service Accounts within the workspace:

Once you’ve configured the Service Account within the workspace, you can simply toggle the Show Advanced Fields option at the bottom of the form. This will reveal an expanded set of options, from which you can select the desired Service Account using the provided dropdown menu.

Access data in volume from the notebook

Mounting a volume to a notebook allows you to access data stored on the volume from within the notebook environment. This can be useful for a variety of tasks, such as loading data for analysis, training machine learning models, and deploying applications. You can follow these steps for mounting a volume to the notebook:

Once mounted, in your Notebook you will be able to access the data in your Volume from within the Notebook

Configuring Python Environments Inside a Notebook

Truefoundry’s Deployed Notebooks by default start with a conda environment with Python Version = 3.11. In case you are working on several projects simultaneously, and you want to maintain multiple Python environments for different Python versions / different sets of tasks. You can do so by following these steps:

Command to be executed:

# Create a new conda environment named "py39" with Python 3.9
# It will take around 2 minutes for the changes to sync
# After 2 minutes, You need to hard refresh after executing this command for kernel to show up in the listing page
conda create -y -n py39 python=3.9

Launching Jupyter Notebook with Custom Images

Instead of using the pre-built Jupyter Lab Images provided by TrueFoundry, you have the flexibility to create and deploy your custom images. This allows you to pre-install specific libraries, tools, and configurations within your notebook environment, tailoring it to your specific needs and project requirements. To create a custom image, start by using the TrueFoundry Jupyter Notebook Image as a base. Below are the docker images TrueFoundry uses to support Jupyter base and Jupyter full notebooks.

Image URI	Size	Jupyter Lab	CUDA 12.4 Toolkit
`tfy.jfrog.io/tfy-images/jupyter:0.4.1-py3.11.10-sudo`	~1 GB	✅
`tfy.jfrog.io/tfy-images/jupyter:0.4.1-cu124-py3.11.10-sudo`	~6 GB	✅	✅

Latest Images

https://tfy.jfrog.io/ui/packages/docker:%2F%2Fjupyter?name=jupyter&type=packages

You can use one of these images as a base for creating your custom image while keeping some invariants unchanged. Let’s take an example where we want to customize the image by including a few apt packages like ffmpeg and a few pip packages like gradio. We will create a new Dockerfile using these existing images as the base and push to a registry for later use.

FROM truefoundrycloud/jupyter:0.2.20
# Install apt packages
RUN DEBIAN_FRONTEND=noninteractive apt install -y --no-install-recommends ffmpeg
# Install pip packages
RUN python3 -m pip install --use-pep517 --no-cache-dir

Do not overwrite the ENTRYPOINT or CMD instructions. These are built into the base images and are critical for things to work correctly

Other possible customizations can be found here Build and push the image to a registry for it to be used in TrueFoundry Notebook. Make sure the destination registry is already integrated with the TrueFoundry platform. Detailed instructions are available here

Example: Installing CUDA 11.8 and cuDNN 8

FROM truefoundrycloud/jupyter:0.2.20-sudo
ENV TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6 9.0+PTX"
ENV DEBIAN_FRONTEND=noninteractive
USER root
# Install CUDA 11.8
RUN apt update && \
    apt install -y --no-install-recommends git curl wget htop && \
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb -O /tmp/cuda-keyring_1.1-1_all.deb && \
    dpkg -i /tmp/cuda-keyring_1.1-1_all.deb && \
    apt update && \
    apt install -y --no-install-recommends cuda-toolkit-11-8 libcudnn8=8.9.7.29-1+cuda11.8 libcudnn8-dev=8.9.7.29-1+cuda11.8
USER jovyan

Git integration is already installed for you via the JupyterLab Git extension. This guide outlines how to configure Git and authenticate using two recommended methods. You can access the extention from the left panel or menu bar from the top as shown below. You can clone your repository directly from this extention or via terminal.

Before using Git, set your identity:

git config --global user.name "Your Name"\
git config --global user.email "[your.email@example.com](mailto:your.email@example.com)"

Option 1: Personal Access Token (Recommended for GitHub/GitLab)

GitHub and GitLab have deprecated password authentication. Personal Access Tokens (PATs) are the most secure and recommended way. Steps:

Generate a Token:
1. For GitHub: https://github.com/settings/tokens
2. For GitLab: https://gitlab.com/-/profile/personal_access_tokens
Scope: Select at least repo or read/write permissions
Clone or Pull a Repository: Use the Git extension in JupyterLab or the terminal:

git clone <your private repository url>

Authenticate: When prompted for a username/password, paste your username in the username section and “token” in password section

Option 2: SSH Key Authentication

Use SSH for password-less authentication after one-time setup. Follow platform-specific instructions to generate and add your SSH key:

GitHub SSH Help: https://docs.github.com/en/authentication/connecting-to-github-with-ssh
GitLab SSH Guide: https://docs.gitlab.com/user/ssh/

After setup, clone your repositories using the SSH URL:

git clone git@github.com:sample-org/sample-repo.git

Getting Started

Train and Deploy Models

Service Deployment

Job Deployment

Workflow Deployment

Async Service Deployment

Volumes

ML Repository

Platform

Deploying On Your Own Cloud

Launch Jupyter Notebook

Stop and resume your notebook

Configure your Notebook

Image

Auto Shutdown on Inactivity

Install Apt packages in the notebook

Storage Size

Set resources for your Notebook

Running a Notebook instance with GPU

Access data from S3 or other clouds

Credential-Based Access through environment variables

IAM Role-Based Access through Service Account

Access data in volume from the notebook

Configuring Python Environments Inside a Notebook

Launching Jupyter Notebook with Custom Images

Latest Images

Do not overwrite the ENTRYPOINT or CMD instructions. These are built into the base images and are critical for things to work correctly

Example: Installing CUDA 11.8 and cuDNN 8

Option 1: Personal Access Token (Recommended for GitHub/GitLab)

Option 2: SSH Key Authentication

Getting Started

Train and Deploy Models

Service Deployment

Job Deployment

Workflow Deployment

Async Service Deployment

Volumes

ML Repository

Platform

Deploying On Your Own Cloud

​Stop and resume your notebook

​Configure your Notebook

​Image

​Auto Shutdown on Inactivity

​Install Apt packages in the notebook

​Storage Size

​Set resources for your Notebook

​Running a Notebook instance with GPU

​Access data from S3 or other clouds

​Credential-Based Access through environment variables

​IAM Role-Based Access through Service Account

​Access data in volume from the notebook

​Configuring Python Environments Inside a Notebook

​Launching Jupyter Notebook with Custom Images

​Latest Images

​Do not overwrite the ENTRYPOINT or CMD instructions. These are built into the base images and are critical for things to work correctly

​Example: Installing CUDA 11.8 and cuDNN 8

​Git Login Instructions for JupyterLab Users

​Option 1: Personal Access Token (Recommended for GitHub/GitLab)

​Option 2: SSH Key Authentication

Stop and resume your notebook

Configure your Notebook

Image

Auto Shutdown on Inactivity

Install Apt packages in the notebook

Storage Size

Set resources for your Notebook

Running a Notebook instance with GPU

Access data from S3 or other clouds

Credential-Based Access through environment variables

IAM Role-Based Access through Service Account

Access data in volume from the notebook

Configuring Python Environments Inside a Notebook

Launching Jupyter Notebook with Custom Images

Latest Images

Do not overwrite the ENTRYPOINT or CMD instructions. These are built into the base images and are critical for things to work correctly

Example: Installing CUDA 11.8 and cuDNN 8

Git Login Instructions for JupyterLab Users

Option 1: Personal Access Token (Recommended for GitHub/GitLab)

Option 2: SSH Key Authentication