Launch Jupyter Notebook
Jupyter Notebooks are often the first tool used in an ML project. TrueFoundry lets you run Jupyter Notebooks on Kubernetes on the hardware you need and shuts them down automatically when they are not being used, which saves costs. You can read more about TrueFoundry Notebooks vs other notebook solutions here.
To launch a Jupyter Notebook, follow the steps outlined below. Details on the configuration options appear later on this page.
The notebook goes live in a few minutes, because the image must first be downloaded to the machine. Once the notebook is live, its status changes to Running and an endpoint appears. Clicking the endpoint takes you to the Jupyter Notebook.
Stop and resume your notebook
When you're done working with a notebook instance, you can stop it to conserve resources and reduce costs. Clicking on the Stop button shuts down the notebook instance environment and releases associated resources.
Your notebook instance data (apart from the apt packages installed as a root user) persists, and you can easily restart it later by clicking the Resume button.
Configure your Notebook
In the notebook creation form, you can configure the following options:
Image
When creating a Jupyter Notebook, you can choose between three image options:
- Minimal Image: A lightweight image with no pre-installed packages, providing a clean slate for customization and package installation.
- CUDA 12.1 Image: A pre-configured image with CUDA 12.1 pre-installed.
- Custom Extended Image: You can extend an image created by TrueFoundry and add your custom packages to it. You can read about extended images here.
Auto Shutdown on Inactivity
By default, Notebook instances are configured to automatically stop after 30 minutes of inactivity. This helps prevent unnecessary resource consumption when Notebook instances are left idle. You can change the Stop After (minutes of inactivity) setting within the deployment form.
Install Apt packages in the notebook
By default, anything you install outside the home directory is NOT persisted across notebook restarts. This means apt packages installed directly from the notebook are lost when the notebook restarts.
To make apt packages persist across restarts, add a "Build Script" to the notebook as follows:
For example, here is a sample build script that installs the ffmpeg package:
sudo apt update
sudo apt install -y ffmpeg
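After a restart, one way to confirm that a build-script package is present is to look for its executable on the PATH from a notebook cell. This is a minimal sketch, assuming the package ships a binary named `ffmpeg`; the helper name `is_installed` is illustrative and not part of TrueFoundry.

```python
import shutil


def is_installed(binary: str) -> bool:
    """Return True if an executable called `binary` is found on the PATH."""
    return shutil.which(binary) is not None


if __name__ == "__main__":
    # "ffmpeg" matches the sample build script; swap in your own binary name.
    print("ffmpeg available:", is_installed("ffmpeg"))
```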
Storage Size
Specify the amount of storage space you require for your notebook instance. This is persistent storage that will be used to store your notebook files, data, and any other artifacts generated during your work.
Endpoint
Choose the endpoint to which your notebook instance will be deployed. This endpoint will determine the URL at which your notebook will be accessible.
Set resources for your Notebook
Define the computational resources allocated to your notebook instance. You can adjust the CPU and memory allocation to meet the requirements of your data science tasks.
Running a Notebook instance with GPU
To use a GPU in your Notebook instance, follow these steps:
Upon successful deployment, your GPU Notebook instance will be provisioned with the specified GPU type. You can then use the GPU to accelerate your computations, such as training deep learning models or running other GPU-intensive workloads.
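To check from inside the notebook that a GPU is actually visible, you can query the NVIDIA driver tools. The sketch below assumes `nvidia-smi` is available on the PATH of the GPU image; the helper name `list_gpus` is illustrative. It returns an empty list when no GPU is visible.

```python
import shutil
import subprocess
from typing import List


def list_gpus() -> List[str]:
    """Return one line per GPU reported by `nvidia-smi -L`, or [] if none is visible."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver tools are not on the PATH, so no GPU is visible here
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return []
    return [line for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_gpus()
    print(f"{len(gpus)} GPU(s) visible")
    for gpu in gpus:
        print(gpu)
```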
Login Credentials
Imagine a scenario where someone outside your team stumbles upon the Notebooks Endpoint. This could potentially grant them access to sensitive data or allow them to tamper with the code. Login credentials are essential in safeguarding the notebook instance from unauthorized access. Even if an unauthorized user acquires the endpoint URL, they'll be unable to access the Notebook without proper authorization.
Specify the credentials for accessing your notebook instance. This includes the username and password for logging into the notebook environment.
Access data from S3 or other clouds
In some instances, your Jupyter Notebooks may need to access data stored in S3 or other cloud storage platforms. To facilitate this access, you can employ one of two approaches:
Credential-Based Access through environment variables
This approach involves defining environment variables that contain the credentials for the cloud storage platform. For instance, to access S3, you would set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to your AWS access key ID and secret access key.
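As a rough illustration of the credential-based approach, the sketch below first checks that both variables are set and then lists object keys with `boto3`, which reads these variables automatically. It assumes `boto3` is installed in the notebook; the helper names and the bucket you pass in are placeholders.

```python
import os
from typing import List, Mapping

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def list_keys(bucket: str, prefix: str = "") -> List[str]:
    """List object keys in `bucket` (first page only); requires boto3."""
    import boto3  # imported lazily so the check above works without boto3

    response = boto3.client("s3").list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in response.get("Contents", [])]


if __name__ == "__main__":
    missing = missing_aws_vars()
    if missing:
        print("Set these environment variables first:", ", ".join(missing))
```

Calling `list_keys` with your own bucket name returns the keys on the first page of results only; larger buckets need pagination.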
IAM Role-Based Access through Service Account
The second approach is to give your Notebook a role with the necessary permissions through a Service Account.
Service Accounts provide a streamlined approach to managing access without the need for tokens or complex authentication methods. This approach involves creating a Principal IAM Role within your cloud platform, granting it the necessary permissions for your project's requirements. Here are detailed guides for creating Principal IAM Roles in your respective cloud platforms and integrating them as Service Accounts within the workspace:
- AWS: Authenticate to AWS services using IAM service account
- GCP: Authenticate to GCP using IAM service account
Once you've configured the Service Account within the workspace, toggle the Show Advanced Fields option at the bottom of the form. This reveals an expanded set of options, from which you can select the desired Service Account using the dropdown menu.
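On AWS, one way to confirm which role the notebook is running under is to ask STS for the caller identity and read the role name out of the returned ARN. This is a sketch under the assumption that `boto3` is installed and that the identity is an assumed role; the helper names are illustrative.

```python
from typing import Optional


def role_from_arn(arn: str) -> Optional[str]:
    """Extract the role name from an assumed-role ARN, or None for other ARNs."""
    # Expected shape: arn:aws:sts::<account-id>:assumed-role/<role-name>/<session>
    resource = arn.split(":", 5)[-1]
    parts = resource.split("/")
    if len(parts) >= 2 and parts[0] == "assumed-role":
        return parts[1]
    return None


def current_role() -> Optional[str]:
    """Return the role name this notebook is running as; requires boto3."""
    import boto3  # imported lazily so the parser above works without boto3

    arn = boto3.client("sts").get_caller_identity()["Arn"]
    return role_from_arn(arn)
```

If `current_role()` returns the role attached to your Service Account, no access keys need to be set in the notebook.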
Access data in volume from the notebook
Mounting a volume to a notebook allows you to access data stored on the volume from within the notebook environment. This can be useful for a variety of tasks, such as loading data for analysis, training machine learning models, and deploying applications.
You can follow these steps for mounting a volume to the notebook:
Once the volume is mounted, you can access its data from within the Notebook.
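A mounted volume appears as an ordinary directory, so standard file APIs work on it. The sketch below lists the first few files under a mount path; the function name is illustrative, and the path you pass in is whatever mount path you chose in the form.

```python
from pathlib import Path
from typing import List


def list_volume_files(mount_path: str, limit: int = 20) -> List[str]:
    """Return up to `limit` file paths under `mount_path`, relative to it and sorted."""
    root = Path(mount_path)
    if not root.is_dir():
        raise FileNotFoundError(f"{mount_path} is not a mounted directory")
    files = sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())
    return files[:limit]
```

A `FileNotFoundError` here usually means the mount path in the code does not match the one configured for the notebook.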
Configuring Python Environments Inside a Notebook
TrueFoundry's deployed Notebooks start by default with a conda environment running Python 3.11.
If you are working on several projects simultaneously and want to maintain multiple Python environments for different Python versions or different sets of tasks, you can do so by following these steps:
Command to be executed:
# Create a new conda environment named "py39" with Python 3.9.
# The change takes around 2 minutes to sync. After that, hard refresh
# the page for the new kernel to show up in the listing.
conda create -y -n py39 python=3.9
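Once the new kernel appears and you have selected it, a quick check from a notebook cell can confirm which interpreter is running. This is a minimal sketch; the helper name `kernel_matches` is illustrative, and `(3, 9)` corresponds to the `py39` example above.

```python
import sys
from typing import Tuple


def kernel_matches(expected: Tuple[int, int]) -> bool:
    """Return True if this interpreter's major.minor version equals `expected`."""
    return tuple(sys.version_info[:2]) == tuple(expected)


# Show the interpreter behind the current kernel and compare it to Python 3.9.
print("Python", sys.version.split()[0], "at", sys.executable)
print("Running on 3.9:", kernel_matches((3, 9)))
```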
Launching Jupyter Notebook with Custom Images
Instead of using the pre-built Jupyter Lab Images provided by TrueFoundry, you have the flexibility to create and deploy your custom images. This allows you to pre-install specific libraries, tools, and configurations within your notebook environment, tailoring it to your specific needs and project requirements.
To create a custom image, start by using a TrueFoundry Jupyter Notebook image as the base. Below are the Docker images TrueFoundry uses to support Jupyter base and Jupyter full notebooks.
Image URI | Size | Jupyter Lab | CUDA 12.1 Toolkit | Common ML Libraries |
---|---|---|---|---|
public.ecr.aws/truefoundrycloud/jupyter:0.3.20-sudo | ~0.7 GB | ✅ | | |
public.ecr.aws/truefoundrycloud/jupyter:0.3.0-cu121-sudo | ~6 GB | ✅ | ✅ | |
Latest Images
Please visit the following links for latest versions:
You can use one of these images as a base for creating your custom image while keeping some invariants unchanged.
Let's take an example where we want to customize the image by including a few apt packages like ffmpeg and a few pip packages like gradio. We will create a new Dockerfile using one of these existing images as the base and push the result to a registry for later use.
FROM truefoundrycloud/jupyter:0.2.20
# Switch to root to install apt packages
USER root
RUN apt update && DEBIAN_FRONTEND=noninteractive apt install -y --no-install-recommends ffmpeg
# Switch back to the notebook user and install pip packages
USER jovyan
RUN python3 -m pip install --use-pep517 --no-cache-dir gradio
Do not overwrite the ENTRYPOINT or CMD instructions. These are built into the base images and are critical for things to work correctly.
Other possible customizations can be found here.
Build and push the image to a registry so it can be used in a TrueFoundry Notebook. Make sure the destination registry is already integrated with the TrueFoundry platform. Detailed instructions are available here.
Example: Installing CUDA 11.8 and cuDNN 8
FROM truefoundrycloud/jupyter:0.2.20-sudo
ENV TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6 9.0+PTX"
ENV DEBIAN_FRONTEND=noninteractive
USER root
# Install CUDA 11.8
RUN apt update && \
    apt install -y --no-install-recommends git curl wget htop && \
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb -O /tmp/cuda-keyring_1.1-1_all.deb && \
    dpkg -i /tmp/cuda-keyring_1.1-1_all.deb && \
    apt update && \
    apt install -y --no-install-recommends cuda-toolkit-11-8 libcudnn8=8.9.7.29-1+cuda11.8 libcudnn8-dev=8.9.7.29-1+cuda11.8
USER jovyan
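After launching a notebook from an image like this, you may want to confirm which CUDA toolkit release it contains. The sketch below parses the output of `nvcc --version`. It assumes `nvcc` is reachable, which may require passing a full path such as `/usr/local/cuda/bin/nvcc` if the toolkit's bin directory is not on the PATH. The helper names are illustrative.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def installed_cuda_release(nvcc: str = "nvcc") -> Optional[Tuple[int, int]]:
    """Run `nvcc --version` and return the toolkit release, or None if nvcc is missing."""
    path = shutil.which(nvcc)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("CUDA toolkit release:", installed_cuda_release())
```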