How to use Chroma to store and query vector embeddings
 
  
  Key Points
What is Chroma and how does it enhance Large Language Models (LLMs)?
Chroma is an open-source embedding database that efficiently stores and queries vector embeddings, thereby providing relevant context to user inquiries in AI applications. This enhances the performance and relevance of Large Language Models by enabling them to access and utilize contextual information.
What are the prerequisites for setting up Chroma in server mode?
To set up Chroma in server mode, you need to install Git, Chroma, PyTorch, Transformers, Docker, and Docker Compose on your system.
How do you start the Chroma server after cloning the repository?
After cloning the Chroma repository, navigate to the root of the 'chroma' directory and run the command 'docker compose up --build' to start the server with uvicorn, which makes port 8000 accessible.
What is the purpose of the 'CustomEmbeddingFunction' class in the example project?
The 'CustomEmbeddingFunction' class inherits from Chroma's 'EmbeddingFunction' and is designed to tokenize input text and generate vector embeddings using a pre-trained model, thus allowing custom embedding logic to be implemented in the application.
How does Chroma enable querying for similar documents?
Chroma allows users to query collections by comparing input queries against stored documents using cosine similarity. The results are returned with similarity scores, helping users identify the most relevant documents quickly.
Chroma is an open-source embedding database designed to store and query vector embeddings efficiently, enhancing Large Language Models (LLMs) by providing relevant context to user inquiries. In this tutorial, I will explain how to use Chroma in persistent server mode using a custom embedding model within an example Python project. The companion code repository for this blog post is available on GitHub.
Prerequisites
Here are the items that you need to have installed before continuing with this tutorial:
- Git installed on your system (for cloning Chroma).
- Chroma (for our example project), PyTorch and Transformers installed in your Python environment.
- Docker installed on your system.
- Docker Compose also installed on your system.
Setting Up Chroma
Before diving into the code, we need to set up Chroma in server mode.
Create a new directory for the example project. From the root of that directory, clone the Chroma repository:
git clone git@github.com:chroma-core/chroma.git
This creates a chroma subdirectory inside your project directory. Navigate to the root of the chroma directory and run the following command to start the server:
docker compose up --build
This builds Chroma and runs it as a server with uvicorn, making port 8000 accessible outside the net Docker network. The command also mounts a persistent Docker volume for Chroma’s database, found at chroma/chroma from your project’s root.
I won’t cover how to implement authentication with Chroma in server mode, to keep this blog post simpler and more focused on exploring Chroma’s functionality. More information is available in Chroma’s authentication documentation.
Next, confirm that the server is running by executing the following in another terminal:
curl http://localhost:8000/api/v1/heartbeat
You should get a response like:
{"nanosecond heartbeat":1696129725137410131}
Now that the Chroma server is running, let’s move on to our example Python app project for creating, storing, and querying vector embeddings.
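As an aside, the heartbeat value looks like a Unix timestamp expressed in nanoseconds. Under that assumption, the small helper below (parse_heartbeat is a hypothetical name, not part of Chroma) turns a heartbeat response body into a readable UTC datetime, which can be handy when checking that the server clock looks sane:

```python
import json
from datetime import datetime, timedelta, timezone


def parse_heartbeat(body: str) -> datetime:
    """Convert a heartbeat JSON body into a timezone-aware UTC datetime.

    Assumes the "nanosecond heartbeat" value is nanoseconds since the Unix epoch.
    """
    payload = json.loads(body)
    nanoseconds = int(payload["nanosecond heartbeat"])
    # Split with integer arithmetic to avoid float rounding on large values
    seconds, remainder_ns = divmod(nanoseconds, 1_000_000_000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base + timedelta(microseconds=remainder_ns // 1000)


sample = '{"nanosecond heartbeat":1696129725137410131}'
print(parse_heartbeat(sample).isoformat())  # 2023-10-01T03:08:45.137410+00:00
```

The body string can come from the curl command above or from any HTTP client; only the JSON parsing and the unit conversion are shown here.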
Embedding Generation
In embedding_util.py, which is used by our app.py module, we define a custom embedding class (which I am calling CustomEmbeddingFunction) by inheriting from Chroma’s EmbeddingFunction class and leveraging the Transformers library. The class tokenizes the input text and generates embeddings using a pre-trained model, in this case thenlper/gte-base, one of the top-performing open-source embedding models at the time of writing and one that runs well on many consumer hardware devices. The implementation of generate_embeddings was inspired by the gte-base model card on Hugging Face.
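The body of generate_embeddings is not reproduced in this post. As a rough, dependency-free sketch of the kind of post-processing involved, the example below assumes attention-masked mean pooling followed by L2 normalization, which is my reading of the gte-base model card. The function names masked_mean_pool and l2_normalize are hypothetical, and the code works on plain Python lists rather than the PyTorch tensors the real module would use:

```python
import math
from typing import List


def masked_mean_pool(token_vectors: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors, skipping padding positions (mask value 0)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask excludes every token")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return vector if norm == 0 else [x / norm for x in vector]


# Toy input: three token vectors, the last one is padding and must be ignored
tokens = [[3.0, 0.0], [3.0, 8.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = masked_mean_pool(tokens, mask)  # mean of the first two vectors
embedding = l2_normalize(pooled)
print(pooled, embedding)
```

In the real module, the token vectors would come from the model's last hidden state and the mask from the tokenizer output.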
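from chromadb import Documents, EmbeddingFunction, Embeddings

class CustomEmbeddingFunction(EmbeddingFunction):
    def __call__(self, texts: Documents) -> Embeddings:
        # generate_embeddings is defined elsewhere in embedding_util.py
        return list(map(generate_embeddings, texts))
Creating the Chroma Client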
Now in app.py, we import the necessary modules and create a chroma client by specifying the host and port where the Chroma server is running.
from chromadb import HttpClient
from embedding_util import CustomEmbeddingFunction
client = HttpClient(host="localhost", port=8000)
We can test our client with the following heartbeat check:
print('HEARTBEAT:', client.heartbeat())
Creating Collections and Adding Documents
Once the Chroma client is created, we need to create a Chroma collection to store our documents. A collection can be created or retrieved using the get_or_create_collection method.
collection = client.get_or_create_collection(
    name="test", embedding_function=CustomEmbeddingFunction())
After creating the collection, we can add documents to it. Here, I’ve added a list of documents on various topics, each assigned a unique ID.
documents = [
    "A group of vibrant parrots chatter loudly, sharing stories of their tropical adventures.",
    "The mathematician found solace in numbers, deciphering the hidden patterns of the universe.",
    "The robot, with its intricate circuitry and precise movements, assembles the devices swiftly.",
    "The chef, with a sprinkle of spices and a dash of love, creates culinary masterpieces.",
    "The ancient tree, with its gnarled branches and deep roots, whispers secrets of the past.",
    "The detective, with keen observation and logical reasoning, unravels the intricate web of clues.",
    "The sunset paints the sky with shades of orange, pink, and purple, reflecting on the calm sea.",
    "In the dense forest, the howl of a lone wolf echoes, blending with the symphony of the night.",
    "The dancer, with graceful moves and expressive gestures, tells a story without uttering a word.",
    "In the quantum realm, particles flicker in and out of existence, dancing to the tunes of probability."]
# Every document needs a unique id for Chroma
document_ids = [f"id{i}" for i in range(len(documents))]
collection.add(documents=documents, ids=document_ids)
Querying the Collection
With our documents added, we can query the collection to find the documents most similar to a given query. Below, we execute a query and print the most similar documents along with a similarity score, which we calculate from the returned distance as cosine similarity = 1 - cosine distance. The higher the cosine similarity, the more similar the document is to the input query.
This is particularly useful for developing applications like AI-driven customer support agents, especially when utilizing existing collections of help documentation or e-commerce product listings.
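To make the relationship between cosine distance and cosine similarity concrete, here is a minimal standard-library sketch. The helper names are hypothetical and are not part of Chroma. Note that 1 - distance equals cosine similarity only when the collection is actually using cosine distance. The collection-creation call in this post does not set a distance metric explicitly, so it is worth checking which metric your Chroma version uses by default.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance is defined as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


query_vec = [1.0, 0.0]
same_direction = [2.0, 0.0]  # points the same way, different length
orthogonal = [0.0, 3.0]      # at a right angle to the query

# Recover similarity from distance, as the query loop in this post does
print(1 - cosine_distance(query_vec, same_direction))  # 1.0
print(1 - cosine_distance(query_vec, orthogonal))      # 0.0
```

Vector length does not affect the score, only direction does, which is why the two parallel vectors score a perfect 1.0.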
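query = "Give me some content about the ocean"
result = collection.query(query_texts=[query], n_results=5, include=["documents", "distances"])
# Results are nested per query text, so take index 0 for our single query
ids, documents, distances = result["ids"][0], result["documents"][0], result["distances"][0]
for id_, document, distance in zip(ids, documents, distances):
    print(f"ID: {id_}, Document: {document}, Similarity: {1 - distance}")
Running the Example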
To run our example app, first ensure you’ve installed the dependencies listed in the requirements.txt file, then run app.py using a modern Python 3 version (this example project was tested with Python 3.9.6).
python app.py
You should see output similar to the following:
HEARTBEAT: 1696127501102440278
Query: Give me some content about the ocean
Most similar sentences:
ID: id6, Document: The sunset paints the sky with shades of orange, pink, and purple, reflecting on the calm sea., Similarity: 0.6018089274366792
ID: id4, Document: The ancient tree, with its gnarled branches and deep roots, whispers secrets of the past., Similarity: 0.5219426511858611
ID: id0, Document: A group of vibrant parrots chatter loudly, sharing stories of their tropical adventures., Similarity: 0.5164872313681625
ID: id7, Document: In the dense forest, the howl of a lone wolf echoes, blending with the symphony of the night., Similarity: 0.48931321779282144
ID: id1, Document: The mathematician found solace in numbers, deciphering the hidden patterns of the universe., Similarity: 0.4799339689190174
Chroma orders the output by similarity to the input query, giving us a vector search with results sorted from most to least similar.
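Conceptually, this ranking can be pictured as scoring every stored vector against the query vector and keeping the top results. The sketch below does this with an exhaustive scan over toy two-dimensional vectors. It only illustrates the idea: as far as I know, Chroma relies on an index rather than a full scan, and the top_k helper and the sample vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec: Sequence[float], stored: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and return the k best matches."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in stored.items()]
    # Highest similarity first, mirroring the ordering of the output above
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings keyed by document id
stored = {"id0": [1.0, 0.0], "id1": [0.0, 1.0], "id2": [1.0, 1.0]}
for doc_id, score in top_k([1.0, 0.1], stored, k=2):
    print(f"ID: {doc_id}, Similarity: {score:.3f}")
```

Here id0 ranks first because it points almost exactly in the query's direction, followed by id2.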
Conclusion
Chroma provides a versatile and efficient platform for managing vector embeddings, allowing developers to easily integrate advanced search and similarity features into their applications. By following this tutorial, you can set up and interact with Chroma to explore its capabilities and adapt them to suit your project needs.
For more details and resources, visit Chroma’s official documentation and GitHub repository.
This blog post’s companion code repository is available on GitHub.
Questions or comments? Feel free to contact me or connect on social media!