👍

What you'll learn

  • How to set up a complete question-answering system using the Docs QA Playground.
  • Creating an ML Repo and configuring it.
  • Deploying various components such as Indexer Job, Backend service, and Frontend service.
  • Using TrueFoundry for streamlined deployment.
  • Configuring essential environment variables for seamless operation.

Overview

The Docs QA Playground is a powerful system designed for question-answering tasks. It involves various components working together to enable efficient document retrieval and answering of user queries. This documentation provides a detailed overview of the architecture and instructions on how to deploy this system using TrueFoundry, a Kubernetes-based platform.

Docs QA Playground

Architecture

To deploy the complete workflow, we need to set up various components. Here's an overview of the architecture:

High-level description of all components

Document Store:

The Document Store is where your documents will be stored. Common options include AWS S3, Google Storage Buckets, or Azure Blob Storage. In some cases, data might come in from APIs, such as Confluence docs.
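
For illustration, here is a minimal sketch of how documents could be pulled from an S3-backed Document Store with boto3 before indexing. The bucket, prefix, and destination directory are placeholders, not values from this repo:

    # Sketch: download all objects under a prefix from S3 so they can be indexed.
    # Bucket, prefix, and destination are placeholders; adapt to your own Document Store.
    import os

    import boto3

    s3 = boto3.client("s3")

    def download_documents(bucket: str, prefix: str, dest_dir: str = "./docs") -> list[str]:
        """Download every object under `prefix` into `dest_dir` and return the local paths."""
        os.makedirs(dest_dir, exist_ok=True)
        local_paths = []
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                if obj["Key"].endswith("/"):  # skip folder markers
                    continue
                local_path = os.path.join(dest_dir, os.path.basename(obj["Key"]))
                s3.download_file(bucket, obj["Key"], local_path)
                local_paths.append(local_path)
        return local_paths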

Indexer Job:

The Indexer Job takes the documents as input, splits them into chunks, calls the embedding model to embed the chunks, and stores the vectors in the VectorDB. The embedding model can be loaded in the job itself or accessed via an API to ensure scalability.
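
As a rough sketch of that flow (split, embed, upsert), the snippet below uses an OpenAI embedding model and Qdrant. The splitter, model name, collection name, and chunk sizes are illustrative assumptions, not necessarily the exact configuration used by docs-qa-playground:

    # Sketch of the Indexer Job flow: split documents -> embed chunks -> store vectors in Qdrant.
    # Model, collection name, and chunk sizes are assumptions for illustration only.
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from openai import OpenAI
    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams

    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
    qdrant = QdrantClient(url="http://qdrant.<workspace_name>.svc.cluster.local")  # in-cluster URL from this guide

    def index_documents(texts: list[str], collection: str = "docs-qa") -> None:
        chunks = [chunk for text in texts for chunk in splitter.split_text(text)]
        # Embed all chunks in one call to the OpenAI embeddings API.
        response = openai_client.embeddings.create(model="text-embedding-ada-002", input=chunks)
        # (Re)create the collection sized for ada-002 embeddings (1536 dimensions).
        qdrant.recreate_collection(
            collection_name=collection,
            vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
        )
        qdrant.upsert(
            collection_name=collection,
            points=[
                PointStruct(id=i, vector=item.embedding, payload={"text": chunk})
                for i, (chunk, item) in enumerate(zip(chunks, response.data))
            ],
        )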

Embedding Model:

If you're using OpenAI or an externally hosted model, you don't need to host a model. However, if you opt for an open-source model, you'll have to deploy it in your cloud environment.

LLM Model:

For OpenAI or hosted model APIs like Cohere and Anthropic, there's no need for additional deployment. Otherwise, you'll need to set up an open-source LLM.

Query Service:

A FastAPI service provides an API to list all indexed document collections and allows users to query over these collections. It also supports triggering new indexing jobs for additional document collections.
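
For illustration only, a minimal FastAPI sketch of such a service is shown below; the endpoint paths, request shape, and embedding model are assumptions, not the actual API of the deployed backend (see the repo's serve code for that):

    # Illustrative Query Service surface: list collections and query one of them.
    # Paths, request shape, and model are assumptions for this sketch.
    from fastapi import FastAPI
    from openai import OpenAI
    from pydantic import BaseModel
    from qdrant_client import QdrantClient

    app = FastAPI()
    openai_client = OpenAI()
    qdrant = QdrantClient(url="http://qdrant.<workspace_name>.svc.cluster.local")

    class QueryRequest(BaseModel):
        collection: str
        query: str
        top_k: int = 5

    @app.get("/collections")
    def list_collections() -> list[str]:
        # Names of all indexed document collections.
        return [c.name for c in qdrant.get_collections().collections]

    @app.post("/query")
    def query(req: QueryRequest) -> dict:
        # Embed the question with the same model used at indexing time.
        embedding = openai_client.embeddings.create(
            model="text-embedding-ada-002", input=[req.query]
        ).data[0].embedding
        # Retrieve the most similar chunks; the real service would also pass
        # them to the LLM to compose a final answer.
        hits = qdrant.search(collection_name=req.collection, query_vector=embedding, limit=req.top_k)
        return {"chunks": [hit.payload.get("text") for hit in hits]}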

VectorDB:

You can use a hosted solution like Pinecone or host an open-source VectorDB like Qdrant or Milvus to efficiently retrieve similar document chunks.

Metadata Store:

This store is essential for managing links to indexed documents and storing the configuration used to embed the chunks in those documents.
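
As a hypothetical example of what such a record could contain (field names are illustrative, not the repo's actual schema):

    # Hypothetical metadata record for an indexed collection; fields are illustrative.
    from dataclasses import dataclass

    @dataclass
    class CollectionMetadata:
        collection_name: str   # name of the vector collection in Qdrant
        source_uri: str        # where the raw documents live, e.g. an S3 prefix
        embedder: str          # embedding model used for the chunks
        chunk_size: int        # chunking configuration used at indexing time
        chunk_overlap: int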

Setup with TrueFoundry:

TrueFoundry, a Kubernetes-based platform, simplifies the deployment of ML training jobs and services at an optimal cost. You can deploy all the components mentioned above on your own cloud account using TrueFoundry. The final deployment will be a streamlined and powerful system ready to handle your question-answering needs.

Deploy on TrueFoundry

To be able to ask questions on your own documents, follow the steps below:

  1. First, clone the repo https://github.com/truefoundry/docs-qa-playground.git using the following command:
    git clone https://github.com/truefoundry/docs-qa-playground.git

  2. Register at TrueFoundry (follow the guide here)

    • Fill out the form and register as an organization (let's say <org_name>)
    • On Submit, you will be redirected to your dashboard endpoint, i.e. https://<org_name>.truefoundry.cloud
    • Complete your email verification
    • Login to the platform at your dashboard endpoint, i.e. https://<org_name>.truefoundry.cloud

    Note: Keep your dashboard endpoint handy; we will refer to it as "TFY_HOST", and it has the structure "https://<org_name>.truefoundry.cloud"

  3. Set up a cluster; use a TrueFoundry-managed cluster for a quick setup

    • Give a unique name to your Cluster and click on Launch Cluster
    • It will take a few minutes to provision a cluster for you
    • In the Configure Host Domain section, click Register for the pre-filled IP
    • Next, Add a Docker Registry to push your docker images to.
    • Next, for Deploy a Model, you can choose to Skip this step
  4. Add a Storage Integration

  5. Create a ML Repo

    • Navigate to ML Repo tab

    • Click on + New ML Repo button on top-right

    • Give a unique name to your ML Repo (say 'docs-qa-llm')

    • Select Storage Integration

    • On Submit, your ML Repo will be created

      For more details: link

  6. Create a Workspace

    • Navigate to Workspace tab
    • Click on + New Workspace button on top-right
    • Select your Cluster
    • Give a name to your Workspace (say 'docs-qa-llm')
    • Enable ML Repo Access and Add ML Repo Access
    • Select your ML Repo and role as Project Admin
    • On Submit, a new Workspace will be created. You can copy the Workspace FQN by clicking on FQN.

    For more details: link

  7. Generate an API Key

    • Navigate to Settings > API Keys tab

    • Click on Create New API Key

    • Give any name to the API Key

    • On Generate, the API Key will be generated.

    • Please save the value or download it

      Note: we will refer to it as "TFY_API_KEY"

      For more details: https://docs.truefoundry.com/docs/generate-api-key

  8. In order to use the default OpenAI embedder, you need an OpenAI API Key. You can get your API Key here

  9. Open your terminal in the parent folder

  10. Install the servicefoundry CLI

    pip install servicefoundry
    
  11. Login from the CLI

    sfy login --host <paste your TFY_HOST here>
    
  12. Fetch your Workspace FQN for the workspace we created in Step 6

  13. Set up the Vector DB; in our case we will deploy Qdrant

    servicefoundry deploy --workspace_fqn <paste your Workspace FQN here> --file qdrant.yaml --no-wait
    
  14. Deploy Indexer Job

    • Edit indexer.yaml and add the following environment variables (replace the placeholder with your workspace name)

      env:
          OPENAI_API_KEY: <OpenAI API Key>
          QDRANT_URL: qdrant.<workspace_name>.svc.cluster.local
      
    • Deploy the Indexer job

      sfy deploy --workspace_fqn <paste your Workspace FQN here> --file indexer.yaml --no-wait
      

      For more details: link

  15. Deploy Backend service

    • Edit serve.yaml and add the values of the environment variables (fill in the placeholders with the required information)

      env:
          OPENAI_API_KEY: <OpenAI API Key>
          ML_REPO: <paste your ML_Repo name>
          QDRANT_URL: qdrant.<workspace_name>.svc.cluster.local
          TFY_API_KEY: <TFY_API_KEY>
          TFY_HOST: <TFY_HOST>
      ...
      
    • Deploy the Backend service

      sfy deploy --workspace_fqn <paste your workspace fqn here> --file serve.yaml --no-wait
      
  16. Deploy Frontend service

    • Fetch the host for your frontend: navigate to Integrations > Clusters and copy the Base Domain URL from your cluster card

    • Edit frontend.yaml and add the host

      ports:
      - host: <host>
      ...
      
    • Fetch the JOB_FQN: navigate to Deployments > Jobs, click on your job llm-qa-indexer, and copy the Application FQN from the details

    • Edit frontend.yaml and add the values of the environment variables (fill in the placeholders with the required information)

      env:
          JOB_FQN: <JOB_FQN>
          ML_REPO: <ML_Repo name>
          TFY_API_KEY: <TFY_API_KEY>
          BACKEND_URL: http://llm-qa-backend.<workspace_name>.svc.cluster.local:8000
          TFY_HOST: <TFY_HOST>
      ...
      
    • Deploy the Frontend service

      sfy deploy --workspace_fqn <paste your Workspace FQN here> --file frontend.yaml --no-wait
      
  17. Visit your QnA playground

    • Navigate to Deployments > Services

    • Click on the Endpoint for your service