Introduction to LLMOps

LLMOps becomes crucial when you build LLM applications and deploy them to production. While libraries like Langchain and LlamaIndex have made it quite easy to build a demo with LLMs, getting an LLM application to production still requires significant effort. If you are working with LLMs, you are probably adopting one of the approaches below and will have to solve at least a few of the corresponding issues.

Prompt Engineering

  1. Storing all prompts and maintaining prompt versions.
  2. Implementing retries and fallbacks when calling LLM provider APIs like Cohere and Anthropic (see the sketch after this list).
  3. Deploying the LLM model if you are hosting an open source model.
  4. Logging all prompt-response pairs for auditability and later finetuning.
  5. Moderating LLM responses to remove hate speech and/or comply with brand guidelines.
  6. Monitoring costs, API requests, and latency.
  7. Caching queries and responses to reduce costs and latency.
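
A minimal sketch of the retry-and-fallback pattern from item 2, assuming you have your own thin wrappers around each provider's SDK (the `call_openai` and `call_anthropic` names below are hypothetical stand-ins):

```python
import time

def call_with_retries(call_fn, prompt, max_retries=3, base_delay=1.0):
    """Retry a single provider call with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return call_fn(prompt)
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

def call_llm(prompt, providers):
    """Try each provider in order, falling back to the next on failure."""
    last_error = None
    for call_fn in providers:
        try:
            return call_with_retries(call_fn, prompt)
        except Exception as err:
            last_error = err
    raise RuntimeError("All providers failed") from last_error

# Usage (call_openai / call_anthropic are your own provider wrappers):
# answer = call_llm("Hello!", providers=[call_openai, call_anthropic])
```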

Retrieval Augmented Generation

  1. Writing logic for data loading and chunking (see the sketch after this list).
  2. Figuring out which embedding model and LLM to use.
  3. Deploying vector databases.
  4. Building a feedback collection and evaluation system to measure the accuracy of your RAG pipeline.
  5. Semantically caching queries.
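
As a rough illustration of the chunking and retrieval logic behind items 1 and 2, here is a minimal in-memory sketch; `embed` is a hypothetical stand-in for whichever embedding model you choose, and a real system would use a vector database rather than NumPy arrays:

```python
import numpy as np

def chunk(text, size=500, overlap=50):
    """Split a document into overlapping character chunks."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def retrieve(query, chunks, embed, k=3):
    """Return the k chunks most similar to the query by cosine similarity."""
    chunk_vecs = np.array([embed(c) for c in chunks])
    query_vec = np.array(embed(query))
    sims = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]
```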

LLM Finetuning

  1. You might need to finetune models if you have a unique dataset and want to alter the behaviour of the LLM.
  2. You might want to finetune smaller LLMs for specific tasks like classification (see the sketch after this list).
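
As a rough sketch of item 2, the snippet below finetunes a small encoder for binary classification with Hugging Face Transformers; the model name and dataset are illustrative placeholders, not a recommendation:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"  # illustrative small encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Illustrative dataset; substitute your own labeled data.
dataset = load_dataset("imdb")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetune-out", num_train_epochs=1),
    train_dataset=dataset["train"].shuffle(seed=42).select(range(1000)),
)
trainer.train()
```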

Truefoundry assists you in your LLM application journey when you are looking to take a demo built with Langchain/LlamaIndex to production. The key areas where Truefoundry helps in LLM development are:

  1. Experiment with multiple LLMs (ChatGPT, Cohere, Anthropic, Llama, and other open source LLMs) and embedding models through a single unified API in Truefoundry's LLM Playground (a sketch of this pattern follows the list). You can get hosted endpoints for most popular open source models, so you don't need to deploy them yourself to try them out.
  2. Implement retries, fallbacks, and caching for your API requests to improve reliability and latency. Check out LLM Gateway.
  3. Implement logging and monitoring for all LLM requests, with an option to record user feedback on each request. Check out LLM Gateway.
  4. Build a QA system over docs using RAG - Truefoundry provides a one-click RAG production deployment setup that lets you ask questions over different sets of documents, supports incremental indexing, and provides the best accuracy. Check out Building a RAG based QA system.
  5. Deploy open source models in your own cloud environment - this matters when you cannot send your data over an API to an external provider. Truefoundry helps you deploy LLM models in a reliable and cost-effective manner on AWS, GCP, Azure, or other cloud providers. Check out Deploy LLM.
  6. Finetune LLM models - Truefoundry can help you finetune LLMs on your own data in a reliable and cost-effective way by enabling finetuning on spot instances in your own infrastructure, so your data doesn't leave your environment. Check out LLM Finetuning.
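
As an illustration of the unified-API idea in item 1, the sketch below assumes an OpenAI-compatible gateway endpoint; the base URL, API key, and model name are placeholders, so check the LLM Gateway docs for the actual values:

```python
from openai import OpenAI

# Placeholders: point the standard OpenAI client at your gateway endpoint.
client = OpenAI(
    base_url="https://<your-gateway-host>/api",  # placeholder URL
    api_key="<your-gateway-api-key>",            # placeholder key
)

response = client.chat.completions.create(
    model="llama-2-7b-chat",  # swap models without changing any other code
    messages=[{"role": "user", "content": "Summarize LLMOps in one sentence."}],
)
print(response.choices[0].message.content)
```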