How to use Weaviate to store and query vector embeddings
What you will learn
- What is the main focus of combining vector databases with pre-trained large language models?
- The main focus is to deliver unprecedented user experiences by merging the capabilities of large language models with the context of specific data.
- What is Weaviate, and why is it significant in this tutorial?
- Weaviate is an open-source vector database introduced in the tutorial for embedding texts into vectors, storing them, and performing semantic search to find the most contextually similar documents.
- How is the `GTE base` embedding model utilized in the tutorial?
- `GTE base`, an embedding model from Alibaba, is used to create vector embeddings of texts, which are then stored in Weaviate for semantic search operations.
- How do you perform a health check on the Weaviate instance in the tutorial?
- A health check on the Weaviate instance is performed by using the `client.is_ready()` command to ensure Weaviate is ready and operational.
- What are the steps involved in querying Weaviate for the most similar documents based on a query?
- Querying involves embedding the query using the same model for semantic compatibility, retrieving `DocumentSearch` objects with `source_text`, specifying the vector and a minimum certainty for filtering, and limiting the result to a defined number of similar documents.
By combining vector databases with pre-trained large language models, you can deliver unprecedented user experiences, merging the capabilities of LLMs with the context of your specific data.
In this tutorial, I introduce Weaviate, an open-source vector database, and pair it with the thenlper/gte-base embedding model from Alibaba, loaded through Hugging Face’s transformers library.
The example project for this blog post demonstrates how to embed texts into vectors, store them in Weaviate, and perform semantic search to find the most contextually similar documents to the input query. All of the code for this blog post can be found on GitHub at the companion code repository.
Setup and Preparing the Embedding Model
Before getting into the application code, we need to have a working Weaviate server running.
Setting Up Weaviate Locally with Docker Compose
Running Weaviate locally for development can be streamlined using Docker Compose.
The following section explains how to use Docker Compose to spin up a Weaviate instance, configure it for our needs, and persist data across restarts by mounting a local directory.
Docker Compose Configuration
We use the following docker-compose.yml, taken directly from the Weaviate Docker Compose docs, to define our service. I suggest copying this file into the root of a new directory for this project:
version: "3.4"
services:
  weaviate:
    command:
      - --host
      - 0.0.0.0
      - --port
      - "8080"
      - --scheme
      - http
    image: semitechnologies/weaviate:1.21.2
    ports:
      - 8080:8080
    volumes:
      - ./data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      PERSISTENCE_DATA_PATH: "/var/lib/weaviate"
      DEFAULT_VECTORIZER_MODULE: "none"
      # While these are the default enabled modules from the Weaviate docs,
      # we won't be using these but instead our custom embedding model, GTE-base
      ENABLE_MODULES: "text2vec-cohere,text2vec-huggingface,text2vec-palm,text2vec-openai,generative-openai,generative-cohere,generative-palm,ref2vec-centroid,reranker-cohere,qna-openai"
      CLUSTER_HOSTNAME: "node1"
Running Weaviate with Docker Compose
Once the docker-compose.yml file is set up, navigate to the directory containing this file and run the following command to start the Weaviate server:
docker-compose up weaviate
This command pulls the specified Weaviate Docker image (if not already local), creates the container, and starts it with the specified settings. Your Weaviate instance should now be accessible at http://localhost:8080.
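If you want to confirm the server is reachable before writing any application code, a small standard-library check is enough. This is only a sketch: it assumes the default port from the Compose file above and Weaviate's `/v1/.well-known/ready` readiness endpoint, and the helper names `ready_url` and `weaviate_ready` are mine, not part of any library.

```python
import urllib.error
import urllib.request


def ready_url(base_url="http://localhost:8080"):
    """Build the readiness URL (assumed path: /v1/.well-known/ready)."""
    return base_url.rstrip("/") + "/v1/.well-known/ready"


def weaviate_ready(base_url="http://localhost:8080", timeout=2.0):
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(ready_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response: not ready.
        return False


if __name__ == "__main__":
    print("Weaviate ready:", weaviate_ready())
```

If this prints `False` right after starting the container, give Weaviate a few seconds and try again.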
With our Weaviate server running, we can move on to the Python code of our application.
embedding_util.py
Let’s continue by looking at how we encapsulate our embedding model, the GTE base text embedding model, in our embedding_util.py Python module.
Importing Necessary Libraries
We need to import the necessary libraries and modules first:
- transformers: To use pre-trained models.
- torch and torch.nn.functional: For tensor operations and functional API.
- os: To manipulate the Python runtime environment.
- warnings: To manage warnings during runtime.
The companion code repository for this blog post includes a requirements.txt file for installing these Python dependencies.
from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F
from torch import Tensor
import os
import warnings
Handling Warnings and Parallelism
To avoid unnecessary warnings from the transformers library and to manage parallelism around our tokenizer, warnings of category ResourceWarning are ignored and tokenizers parallelism is disabled for simplicity. Our application is single-threaded, so only one thread will ever call the tokenizer.
# The transformers library internally is creating this warning, but does not
# impact our app. Safe to ignore.
warnings.filterwarnings(action='ignore', category=ResourceWarning)
# We won't have competing threads trying to use our tokenizer in this example app
os.environ["TOKENIZERS_PARALLELISM"] = "false"
Initializing Tokenizer and Model
With our imports out of the way, let’s create our tokenizer and model instances:
tokenizer = AutoTokenizer.from_pretrained('thenlper/gte-base')
model = AutoModel.from_pretrained('thenlper/gte-base')
These lines initialize the tokenizer and model using the thenlper/gte-base pre-trained model from Alibaba.
Defining Utility Functions
I’ve defined two functions that implement the functionality of embedding_util.py:
- average_pool: A function to pool the last hidden states of the model, using masking and averaging.
- generate_embeddings: A function that tokenizes the input text, generates embeddings using the pre-trained model, and normalizes them.
def average_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    last_hidden = last_hidden_states.masked_fill(
        ~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]


def generate_embeddings(text):
    inputs = tokenizer(text, return_tensors='pt',
                       max_length=512, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    attention_mask = inputs['attention_mask']
    embeddings = average_pool(outputs.last_hidden_state, attention_mask)
    # (Optionally) normalize embeddings
    embeddings = F.normalize(embeddings, p=2, dim=1)
    return embeddings.numpy().tolist()[0]
While this code may look complicated, just understand that you pass a single text string to generate_embeddings, and you get a list of floats back: our vector embedding created by the GTE-base model. If you want to dig deeper, these functions are “heavily inspired” by the GTE-base model card.
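To make the pooling and normalization steps less opaque, here is a toy version in plain Python with no torch dependency. It is a sketch of the same idea rather than the code the model actually runs: the two-dimensional "token vectors" are made up for illustration, and the helper names `masked_mean_pool` and `l2_normalize` are hypothetical.

```python
import math


def masked_mean_pool(token_vectors, attention_mask):
    """Average only the token vectors whose mask entry is 1 (padding is skipped)."""
    totals = [0.0] * len(token_vectors[0])
    kept = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            kept += 1
            for j, value in enumerate(vector):
                totals[j] += value
    return [total / kept for total in totals]


def l2_normalize(vector):
    """Scale a vector to unit length, like F.normalize(..., p=2)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


# Made-up example: two real tokens plus one padding token that must be ignored.
token_vectors = [[3.0, 0.0], [3.0, 8.0], [100.0, 100.0]]
attention_mask = [1, 1, 0]

pooled = masked_mean_pool(token_vectors, attention_mask)  # mean of the first two rows
embedding = l2_normalize(pooled)                          # unit-length result
print(pooled, embedding)
```

The padding row never influences the result, which is the point of the mask. After normalization every embedding has length 1, so cosine comparisons depend only on direction.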
Interacting with Weaviate
With our embedding utility module implemented, it’s time to move on to the app.py module, the core of our demo project.
app.py
app.py imports
We’ll import the weaviate library to create our Weaviate client instance, the json module for creating printable strings from our Python dicts, and our generate_embeddings function for creating embeddings to pass to Weaviate (later):
import weaviate
import json
from embedding_util import generate_embeddings
Setting Up Weaviate Client
A Weaviate client is initialized by providing the endpoint URL, http://localhost:8080 for our local Weaviate server. This client will allow us to interact with Weaviate, perform CRUD operations on data objects, and query the database.
client = weaviate.Client(url="http://localhost:8080")
Health Check
A simple health check ensures that Weaviate is ready and operational. The following line shows how to verify the readiness of the Weaviate server:
print('is_ready:', client.is_ready())
Creating a Schema
A schema is defined, creating a custom class named “DocumentSearch”. The specific name doesn’t matter, but it acts as the identifier we use to reference the class later. The vectorizer is set to “none” since the vectorization is done externally using our embedding model.
class_obj = {"class": "DocumentSearch", "vectorizer": "none"}
client.schema.create_class(class_obj)
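Because the Compose file persists data to ./data, the class will most likely still exist the second time you run the script, and creating it again should be rejected by Weaviate. One way to guard against that is to inspect the schema first. The sketch below assumes the dict returned by `client.schema.get()` has a top-level "classes" list whose entries carry a "class" key; the helper `class_exists` is my own name, not a client method.

```python
def class_exists(schema, class_name):
    """Return True if class_name appears in a schema dict shaped like
    the one assumed to come back from client.schema.get()."""
    return any(c.get("class") == class_name for c in schema.get("classes", []))


# Hypothetical usage with the client and class_obj from this tutorial:
#   if not class_exists(client.schema.get(), "DocumentSearch"):
#       client.schema.create_class(class_obj)

# Stand-in schema so the helper can be tried without a running server.
sample_schema = {"classes": [{"class": "DocumentSearch", "vectorizer": "none"}]}
print(class_exists(sample_schema, "DocumentSearch"))
print(class_exists(sample_schema, "SomethingElse"))
```

Keeping the check as a pure function over the schema dict means it can be tried without a running server.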
Adding Data to Weaviate
A batch is configured to add multiple data objects to Weaviate simultaneously, setting the batch size equal to the length of the documents list for this tutorial:
# Test source documents
documents = [
"A group of vibrant parrots chatter loudly, sharing stories of their tropical adventures.",
"The mathematician found solace in numbers, deciphering the hidden patterns of the universe.",
"The robot, with its intricate circuitry and precise movements, assembles the devices swiftly.",
"The chef, with a sprinkle of spices and a dash of love, creates culinary masterpieces.",
"The ancient tree, with its gnarled branches and deep roots, whispers secrets of the past.",
"The detective, with keen observation and logical reasoning, unravels the intricate web of clues.",
"The sunset paints the sky with shades of orange, pink, and purple, reflecting on the calm sea.",
"In the dense forest, the howl of a lone wolf echoes, blending with the symphony of the night.",
"The dancer, with graceful moves and expressive gestures, tells a story without uttering a word.",
"In the quantum realm, particles flicker in and out of existence, dancing to the tunes of probability."]
client.batch.configure(batch_size=len(documents))
In the batch process, for each document:
- Embeddings are generated using generate_embeddings(doc).
- An object with the original text and its corresponding embedding vector is added to Weaviate.
with client.batch as batch:
    for i, doc in enumerate(documents):
        properties = {"source_text": doc}
        vector = generate_embeddings(doc)
        batch.add_data_object(properties, "DocumentSearch", vector=vector)
Querying Weaviate
A query is embedded using the same model to ensure semantic compatibility.
query = "Give me some content about the ocean"
query_vector = generate_embeddings(query)
When a query is performed against Weaviate:
- We retrieve “DocumentSearch” objects with “source_text” as the selected property.
- with_near_vector specifies the vector and a minimum certainty for filtering results.
- with_limit(2) restricts the result to the two most similar documents.
- with_additional(['certainty', 'distance']) includes additional information in the results: the level of certainty and the cosine distance (cosine similarity can be calculated simply as 1 - cosine distance).
result = client.query.get("DocumentSearch", ["source_text"]).with_near_vector({
"vector": query_vector,
"certainty": 0.7
}).with_limit(2).with_additional(['certainty', 'distance']).do()
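It can help to see how the certainty threshold in this query relates to cosine distance. The sketch below assumes Weaviate's mapping for the cosine metric as I understand it from the docs, certainty = 1 - distance / 2; the function names are hypothetical and the vectors are toy values.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def certainty_from_distance(distance):
    """Assumed mapping for the cosine metric: certainty = 1 - distance / 2."""
    return 1.0 - distance / 2.0


def max_distance_for_certainty(certainty):
    """Invert the mapping: the largest distance that still meets a certainty."""
    return 2.0 * (1.0 - certainty)


# Identical directions have distance 0; orthogonal directions have distance 1.
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))

# The query's minimum certainty of 0.7 expressed as distance and similarity.
threshold = max_distance_for_certainty(0.7)
print(round(threshold, 6))        # 0.6
print(round(1.0 - threshold, 6))  # 0.4
```

Under that assumption, asking for a certainty of at least 0.7 is the same as asking for a cosine similarity of at least 0.4.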
Finally, the result is printed in a pretty JSON format using the json module, presenting the retrieved documents and additional information.
print(json.dumps(result, indent=4))
Your output should look something like this:
{
    "data": {
        "Get": {
            "DocumentSearch": [
                {
                    "_additional": {
                        "certainty": 0.9004524648189545,
                        "distance": 0.19909507
                    },
                    "source_text": "The sunset paints the sky with shades of orange, pink, and purple, reflecting on the calm sea."
                },
                {
                    "_additional": {
                        "certainty": 0.8804855942726135,
                        "distance": 0.23902881
                    },
                    "source_text": "The ancient tree, with its gnarled branches and deep roots, whispers secrets of the past."
                }
            ]
        }
    }
}
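In a real application you would usually pull the hits out of this nested structure rather than print the raw JSON. Here is a minimal sketch that flattens the response and converts each distance to a cosine similarity. The `extract_hits` helper is hypothetical, and the sample dict simply repeats the output shown above so the snippet runs without a server.

```python
def extract_hits(result, class_name="DocumentSearch"):
    """Flatten a Get response into a list of small dicts, one per hit."""
    hits = []
    for obj in result["data"]["Get"][class_name]:
        extra = obj["_additional"]
        hits.append({
            "text": obj["source_text"],
            "certainty": extra["certainty"],
            # Cosine similarity is 1 - cosine distance.
            "similarity": 1.0 - extra["distance"],
        })
    return hits


# The response from above, reproduced so this runs standalone.
sample = {"data": {"Get": {"DocumentSearch": [
    {"_additional": {"certainty": 0.9004524648189545, "distance": 0.19909507},
     "source_text": "The sunset paints the sky with shades of orange, pink, and purple, reflecting on the calm sea."},
    {"_additional": {"certainty": 0.8804855942726135, "distance": 0.23902881},
     "source_text": "The ancient tree, with its gnarled branches and deep roots, whispers secrets of the past."},
]}}}

for hit in extract_hits(sample):
    print(f"{hit['similarity']:.3f}  {hit['text']}")
```

With the live client, you would pass the `result` variable from the query instead of `sample`.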
Conclusion
In this tutorial, we walked through how to use a custom embedding model (thenlper/gte-base from Alibaba) with Weaviate to perform semantic search on text data.
The combination of pre-trained language models and vector databases unlocks powerful capabilities for building intelligent, language-understanding applications. From building a semantic search engine to developing knowledge graphs, the synergy between embedding models and Weaviate opens up possibilities that were previously out of reach.
Here’s the link again to the companion code repository for this blog post, available on GitHub.
Questions or comments? Feel free to contact me or connect on social media!