Enhancing GPT Assistants with Vector Search Tools

🚧 The OpenAI Assistants API is going to be sunset in 2026. The suggested alternative is to use the OpenAI Responses API instead.

Imagine an AI assistant that doesn’t just respond to queries but grasps the context with human-like understanding. With the fusion of GPT-4’s linguistic prowess and bespoke vector search capabilities, this is no longer science fiction.

In this deep dive, I peel back the layers of basic chatbot functionality to reveal the ‘VectorizedKnowledgeBot’—a GPT Assistant that leverages cutting-edge machine learning to deliver responses with uncanny relevance and precision. Step into the future of AI as I guide you through the process of supercharging your GPT Assistant with the power of semantic search, transforming the way it interacts, learns, and ultimately, understands.

We’ll construct a ‘VectorizedKnowledgeBot’: a conversational AI agent built with OpenAI’s new Assistants API that uses a custom Tool to expose vector search as an Assistant-callable function. The goal is to go beyond plain question-answering by retrieving relevant context, so that responses are more precise and relevant. All code for this blog post can be found in the companion GitHub repository.

Prerequisites

Before I explain how “VectorizedKnowledgeBot” works, here’s what you need first:

A .env file, with a valid OpenAI API Key:

OPENAI_API_KEY=YOUR_OPENAI_API_KEY

Python 3.9 installed (PyTorch, used by embedding_util.py, does not yet support Python 3.12)
Download the sqlite-vss v0.1.2 vector0 and vss0 vector sqlite extensions for your operating system
The following Python dependencies installed:

torch==2.1.0
transformers==4.34.0
sqlean.py
python-dotenv
openai

With our dependencies and local .env configured, let’s move on to the code that gives a GPT Assistant a Tool for adding context to a conversation.

Defining the `vector_search` Tool

Let’s start by examining the tool, found in vector_search.py in the companion code repository:

import json
import sqlite3
from db import open_connection
from embedding_util import generate_embeddings


def vector_search(arguments):
    arg_object = json.loads(arguments)
    query = arg_object['query']
    query_embedding = generate_embeddings(query)

    db_connection = open_connection()
    cursor = db_connection.cursor()

    try:
        # Perform the SQL query and fetch the results
        cursor.execute(
            '''
            with matches as (
                select rowid,
                distance
                from vss_posts where vss_search(content_embedding, (?))
                limit 10
            )
            select
            posts.content,
            posts.published,
            matches.distance
            from matches
            left join posts on posts.rowid = matches.rowid
            ''',
            [json.dumps(
                query_embedding)]
        )
        result = cursor.fetchall()
        return result
    except sqlite3.Error as err:
        # Log the error and return no matches, so the caller always gets a list
        print("An error occurred:", err)
        return []
    finally:
        # Close the cursor and the connection
        cursor.close()
        db_connection.close()

At a high level, what is happening here is that we create a sqlite database connection that loads sqlite-vss to handle how we query vector embeddings in our sqlite database (I’ve explained how to work with sqlite-vss for vector search with sqlite databases in another blog post).

With this database connection, we query our virtual table vss_posts to get back data from the source posts table to return.
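To make the shape of that query easier to see, here is a minimal sketch of the same rowid join using only Python’s built-in sqlite3 module. No extension is loaded here: the fake_matches table is a hypothetical stand-in for the (rowid, distance) rows that vss_search would produce, and the sample rows and distances are made up.

```python
import sqlite3

# In-memory database; no sqlite-vss extension is loaded in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("create table posts (content text, published text)")
conn.executemany(
    "insert into posts (rowid, content, published) values (?, ?, ?)",
    [
        (1, "Octopuses have three hearts.", "2023-01-09 00:00:00"),
        (2, "A post about something else.", "2023-01-10 00:00:00"),
    ],
)

# Hypothetical stand-in for the rows vss_search would return: (rowid, distance).
conn.execute("create table fake_matches (post_rowid integer, distance real)")
conn.executemany(
    "insert into fake_matches values (?, ?)", [(1, 0.12), (2, 0.87)]
)

rows = conn.execute(
    """
    with matches as (
        select post_rowid, distance
        from fake_matches
        order by distance
        limit 10
    )
    select posts.content, posts.published, matches.distance
    from matches
    left join posts on posts.rowid = matches.post_rowid
    order by matches.distance
    """
).fetchall()
conn.close()

# Each row is a (content, published, distance) tuple, closest match first.
for content, published, distance in rows:
    print(f"{distance:.2f}  {published}  {content}")
```

The point of the join is that the match rows only carry a rowid and a distance, so the readable content has to come from the posts table.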

Our input, arguments, is a JSON string. We parse it and pass the string value of its query key to generate_embeddings, which returns a vector embedding. We then use that embedding to query the virtual table and get back the original source content to give to our GPT Assistant.
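Because the arguments arrive as JSON text rather than as a Python dictionary, it can be worth validating them before generating embeddings. The sketch below is one possible way to do that; parse_query is a hypothetical helper of mine, not part of the companion repository.

```python
import json


def parse_query(arguments: str) -> str:
    """Extract the 'query' string from a JSON-encoded arguments payload."""
    arg_object = json.loads(arguments)
    query = arg_object.get("query") if isinstance(arg_object, dict) else None
    if not isinstance(query, str) or not query.strip():
        raise ValueError(
            "arguments must be a JSON object with a non-empty 'query' string"
        )
    return query.strip()


# The arguments are a JSON string, so they must be parsed before use.
print(parse_query('{"query": "sea creatures"}'))
```

Raising a ValueError early gives a clearer failure than a KeyError from deep inside the search function.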

For your own awareness, the code of embedding_util.py is heavily inspired by the Hugging Face model card of the embedding model it uses, GTE-base.

Now that we understand how our tool works at a high level, let’s move on to app.py, where we will integrate it with our GPT Assistant.

Bringing the Tool and Assistant Together

The app.py serves as the foundational code for our tutorial project, where it establishes our example database and incorporates the vector_search tool into a dictionary. This setup enables the GPT Assistant to intuitively invoke the vector_search or any tool within its “toolbox”, as directed by the provided instructions. Furthermore, it orchestrates a simulated dialogue tailored to prompt the GPT Assistant into utilizing the vector_search tool.

`app.py` Setup

The first part of our app.py:

import os
import json
from dotenv import load_dotenv
import time
import openai
from setup_db import setup_db
from vector_search import vector_search

load_dotenv()

tools = {
    "vector_search": vector_search
}

# Your OpenAI API key should be set in the .env file
openai.api_key = os.getenv("OPENAI_API_KEY")

client = openai.OpenAI(api_key=openai.api_key)

In this section, we import the necessary libraries, create a fundamental tools dictionary for invoking various tools (currently only vector_search), retrieve our OPENAI_API_KEY from the environment, and initialize a client instance from OpenAI.
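The tools dictionary is a simple name-to-function registry. As a rough sketch of how such a lookup can behave when the requested name is missing, the example below uses a hypothetical echo_tool in place of vector_search and a hypothetical call_tool helper; neither is part of the tutorial code.

```python
import json


def echo_tool(arguments: str) -> list:
    """Hypothetical stand-in tool: returns the parsed query in a list."""
    return [json.loads(arguments)["query"]]


# Same shape as the tutorial's tools dictionary: name -> callable.
tools = {"echo_tool": echo_tool}


def call_tool(name: str, arguments: str) -> str:
    """Look up a tool by name and return its result as a JSON string."""
    tool = tools.get(name)
    if tool is None:
        # Report the problem as data instead of raising KeyError.
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(tool(arguments))


print(call_tool("echo_tool", '{"query": "sea creatures"}'))
print(call_tool("missing_tool", "{}"))
```

Returning an error payload rather than raising means the conversation can continue even when an unexpected tool name is requested.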

User Prompt and Instructions

This is our test user_prompt, which should encourage the GPT Assistant to use our vector search tool to find relevant content:

user_prompt = "Do you have any content about sea creatures?"

Here’s one of the most important parts of the entire project, our instructions:

instructions = '''
**You are the 'VectorizedKnowledgeBot':** A Chatbot with the capability to perform advanced vector-based searches to provide contextually relevant answers to user queries.

**Instructions for Using the 'vector_search' Tool:**

1. **Understanding the Tool:**
   - The "vector_search" tool is designed to perform a contextually aware search based on a string input. It uses vector space modeling to understand the semantics of the query and retrieve the most relevant information.

2. **Identifying the User Query:**
   - Begin by identifying the user's query. Pay attention to the key concepts and specific details the user is interested in.

3. **Formulating the Search String:**
   - Based on the user's query, formulate a concise and targeted search string. Include the most important keywords and terms that are central to the query's context.

4. **Using the Tool:**
   - Pass the formulated search string to the "vector_search" tool as an argument. Ensure that the string is encapsulated in quotes to be recognized as a text input.

5. **Interpreting Results:**
   - Once the "vector_search" returns results, analyze the information to verify its relevance and accuracy in addressing the user's query.

6. **Communicating the Outcome:**
   - Present the findings from the "vector_search" to the user in a clear and informative manner, summarizing the context or providing a direct answer to the query.

**Example Usage:**

If the user asks about "the impact of climate change on polar bear populations," you would:

- Extract keywords: "impact," "climate change," "polar bear populations."
- Formulate the search string: `"climate change impact on polar bear populations"`
- Pass the string to the tool: `vector_search("climate change impact on polar bear populations")`
- Analyze and relay the information back to the user in response to their query.

Remember to maintain the user's original intent in the search string and to ensure that the results from the "vector_search" are well-interpreted before conveying them to the user.
'''

These instructions are extremely important. They meticulously detail the operational blueprint of our GPT Assistant (named ‘VectorizedKnowledgeBot’), guiding it on how to leverage the vector_search tool to its fullest potential. This detailed guideline ensures that every search is precise, relevant, and context-aware.

The Main Function

Our main() function is where the magic happens. It’s a sequence of orchestrated steps:

initializing the database,
creating a GPT-4 assistant with a clear set of instructions,
and setting up a communication thread that messages are added to.

The assistant is then engaged with the user’s prompt, awaiting its cue (via simple polling) to call upon the vector_search tool. As the user’s input is processed, the vector_search tool is dynamically invoked, and its findings are then translated into a coherent and informative response by the assistant.

def main():
    try:
        # Initializes our test database, "blog.sqlite"
        setup_db()

        # Create our GPT Assistant
        assistant = client.beta.assistants.create(
            model="gpt-4-1106-preview",
            name="VectorizedKnowledgeBot",
            instructions=instructions,
            tools=[{
                "type": "function",
                "function": {
                    "name": "vector_search",
                    "description": "Perform a vector-based search to retrieve contextually relevant information based on a user's query.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "query": {"type": "string", "description": "A targeted search string based on a user query."},
                        },
                        "required": ["query"]
                    }
                },
            }]
        )

        # Create a new Thread - this is required for "identifying" which messages are part of what conversation (a.k.a, "thread")
        thread = client.beta.threads.create()

        # Create our first Message from our test user prompt
        client.beta.threads.messages.create(
            thread_id=thread.id,
            role="user",
            content=user_prompt
        )

        # Create a Run, that sends our messages so far to OpenAI's GPT4 API servers
        run = client.beta.threads.runs.create(
            thread_id=thread.id,
            assistant_id=assistant.id,
        )

        # Simple polling until the "Run" is done processing by OpenAI's GPT4 API servers
        while run.status in ['queued', 'in_progress']:
            time.sleep(1)
            run = client.beta.threads.runs.retrieve(
                thread_id=thread.id,
                run_id=run.id
            )
        # The processing by OpenAI's GPT4 determines it requires an action to take
        if run.status == "requires_action":
            if run.required_action.submit_tool_outputs.tool_calls[0].type == 'function':
                # Get the name of the tool and arguments to pass to it, from GPT4's understanding from our instructions
                tool_function = run.required_action.submit_tool_outputs.tool_calls[0].function
                function_name = getattr(tool_function, 'name')
                arguments = getattr(tool_function, 'arguments')
                # Now call the function from the tools dictionary using the function name
                result = tools[function_name](arguments)
                # Pass the tool's output result for more processing
                run = client.beta.threads.runs.submit_tool_outputs(
                    thread_id=thread.id,
                    run_id=run.id,
                    tool_outputs=[
                        {
                            "tool_call_id": run.required_action.submit_tool_outputs.tool_calls[0].id,
                            "output": json.dumps(result),
                        },
                    ]
                )
                # Poll one more time until this new Run is finished processing
                while run.status in ['queued', 'in_progress']:
                    time.sleep(1)
                    run = client.beta.threads.runs.retrieve(
                        thread_id=thread.id,
                        run_id=run.id
                    )

        # Get all of the messages now available on our Thread
        messages = client.beta.threads.messages.list(thread_id=thread.id)

        # Print the Thread messages
        for message in sorted(messages.data, key=lambda x: x.created_at):
            print(
                f'{"Assistant" if message.assistant_id else "User"}: {message.content[0].text.value}')

    except Exception as err:
        print("Error:", err)


if __name__ == '__main__':
    main()

The code polls the run’s status, pausing one second between checks, and dynamically calls the custom tool function when the run requires an action. Note that it handles only the first tool call of a run. Exception handling is in place to output any errors encountered during execution.
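The two polling loops in main() have no upper bound on how long they wait. One way to factor them into a single helper with a timeout is sketched below. wait_for_run is a hypothetical helper, and fake_fetch is a stand-in for a call such as client.beta.threads.runs.retrieve, so the example runs without network access.

```python
import time
from types import SimpleNamespace


def wait_for_run(fetch_run, interval=1.0, timeout=60.0):
    """Call fetch_run() until the run leaves 'queued'/'in_progress' or time runs out."""
    deadline = time.monotonic() + timeout
    run = fetch_run()
    while run.status in ("queued", "in_progress"):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {run.status} after {timeout} seconds")
        time.sleep(interval)
        run = fetch_run()
    return run


# Hypothetical stand-in for retrieving a run: yields a fixed sequence of statuses.
statuses = iter(["queued", "in_progress", "requires_action"])


def fake_fetch():
    return SimpleNamespace(status=next(statuses))


final_run = wait_for_run(fake_fetch, interval=0.01, timeout=5.0)
print(final_run.status)
```

In app.py, the fetch_run argument could be a small lambda that wraps the retrieve call with the current thread and run IDs.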

Running the code

Running our app is as simple as:

python3 app.py

Note that it may take a while for app.py to fully run, because the vector embedding model used by embedding_util takes time to load into memory. Once the model is loaded (for example, when it is initialized once in a Python Flask or FastAPI server), subsequent calls are much faster.
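A common way to get that load-once behavior in a long-running process is to cache the loader function. The sketch below shows the idea with functools.lru_cache; get_model and the placeholder object it returns are hypothetical, and the sleep simply simulates a slow model load.

```python
import functools
import time


@functools.lru_cache(maxsize=1)
def get_model():
    """Load the model on the first call; later calls reuse the cached object."""
    time.sleep(0.2)  # stand-in for a slow model load
    return {"name": "stand-in-model"}  # hypothetical placeholder object


start = time.perf_counter()
first = get_model()
first_seconds = time.perf_counter() - start

start = time.perf_counter()
second = get_model()
second_seconds = time.perf_counter() - start

# Both calls return the same object; only the first pays the loading cost.
print(first is second)
print(f"first call: {first_seconds:.2f}s, second call: {second_seconds:.4f}s")
```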

Understanding the Output of `app.py`

If you are able to successfully run python3 app.py, you should see output very similar to the following:

User: Do you have any content about sea creatures?
Assistant: Certainly! Here's an interesting fact about sea creatures: Octopuses have three hearts, and when they swim, two of these hearts actually stop beating. This unique physiological adaptation is just one of many fascinating aspects of sea creatures that inhabit our oceans. If you're looking for more information or specific details about sea creatures, feel free to ask!

This fact about octopuses comes from our database (found in setup_db.py):

# Hardcoded data to insert into the posts table
HARDCODED_DATA = [
    ...
    (1, '2023-01-09 00:00:00',
     'Octopuses have three hearts, but two of them actually stop beating when they swim.', 'Octopus Hearts'),
     ...
]

Our vector search Tool created a query embedding that captures the meaning of “Do you have any content about sea creatures?” (through our underlying vector embedding model, GTE-base). Our sqlite-vss vector extension then compared that query embedding against the embeddings previously generated for each of the HARDCODED_DATA items we inserted, ranking them by cosine similarity.
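To build some intuition for that ranking step, here is a small pure-Python sketch of cosine similarity. The three-dimensional vectors and the two extra titles are made up for illustration; real GTE-base embeddings are much longer, and sqlite-vss performs the search internally rather than with code like this.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings" keyed by post title.
stored = {
    "Octopus Hearts": [0.9, 0.1, 0.0],
    "Sourdough Basics": [0.0, 0.2, 0.9],
    "Coral Reefs": [0.7, 0.6, 0.1],
}
query_embedding = [1.0, 0.2, 0.0]

# Highest similarity first: the closest post in meaning leads the list.
ranked = sorted(
    stored,
    key=lambda title: cosine_similarity(query_embedding, stored[title]),
    reverse=True,
)
print(ranked)
```

Vectors that point in nearly the same direction score close to 1, which is why the octopus post ranks first for this toy query.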

The retrieved output is then passed to our GPT Assistant to “strongly encourage” the next text that it generates to use our found context, here about this specific fact about octopuses.
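In app.py the rows are serialized with json.dumps(result), which turns each tuple into a bare JSON array. An optional variation, sketched below, is to label the fields so the context is easier to interpret. rows_to_context is a hypothetical helper and the distance value is made up.

```python
import json


def rows_to_context(rows):
    """Turn (content, published, distance) tuples into labeled JSON."""
    labeled = [
        {"content": content, "published": published, "distance": distance}
        for content, published, distance in rows
    ]
    return json.dumps(labeled)


# One row shaped like the tuples vector_search returns; the distance is made up.
rows = [
    (
        "Octopuses have three hearts, but two of them actually stop beating when they swim.",
        "2023-01-09 00:00:00",
        0.25,
    )
]
print(rows_to_context(rows))
```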

For contrast, a conversation with GPT-4 using the same query mentions nothing about octopuses, let alone the surprising fact that they have three hearts and that two of them stop when they swim. The difference is that we pulled this context front and center into the Assistant’s context window, which it uses to generate the next sequence of tokens.

Conclusion

By the end of this tutorial, you should have a simple, functional ‘VectorizedKnowledgeBot’, capable of conducting advanced vector-based searches to provide nuanced, contextually relevant answers. This integration of the new GPT Assistants with a custom vector search tool doesn’t just represent a leap in technology—it’s a stride towards more intelligent, responsive, and understanding AI-powered interactions, equipped with semantic context and better suited for real world business applications.
