How to Supercharge Chatbots with Mixtral 8x7B and Vector Search

The introduction of Large Language Models (LLMs) like Mistral AI’s Mixtral 8x7B marks a new era in chatbot technology, where these systems do more than just answer questions - they understand and interpret them with unparalleled depth. A crucial aspect of this advancement is the integration of vector search tools. These tools enable chatbots to conduct semantic searches with impressive precision.

In the example project accompanying this blog post, I’ll be demonstrating how Mixtral 8x7B (referred to as mistral-small on Mistral AI’s API) can be enhanced through integration with a vector search tool, significantly boosting its capabilities.

All source code, including a runnable demo, is available at my GitHub repository.

Overview of Mistral AI and the Mixtral 8x7B Model

Mistral AI is at the forefront of developing robust and open generative models, and Mixtral 8x7B is their latest model. This model, known for its multilingual prowess and code understanding, has set a new benchmark in the chatbot landscape - rivaling GPT-3.5. It’s designed to grasp the nuances of language and code, making it a versatile tool for diverse applications.

The Role of Vector Search in Enhancing Chatbot Functionality

Vector search is revolutionizing the way chatbots interpret queries. By mapping words into vector space, it allows the bot to grasp the semantic relationships between words, thereby fetching contextually relevant information. This is particularly useful in cases where the query’s intent is complex or nuanced.

Prerequisites

Getting an API Key

You’ll need an API key to follow along in this tutorial. While there is a waitlist to get access to the API offered by Mistral AI, I got my API key 24 hours after signing up. You can find more information about joining the waitlist on the Mistral AI Docs.

Once you have an api key, create a .env at the root of your project at set the following key value pair:

MISTRAL_API_KEY=YOUR_MISTRAL_API_KEY

Setting Up the Environment

Before diving into the heart of the chatbot’s functionality, it’s crucial to set up the right environment. This involves loading essential libraries and setting up a database for the bot to draw information from. The dotenv library plays a key role here, managing environment variables securely.

You need to have Python 3.9 installed on your system. Once you have that installed, you can just install the required packages from the requirements.txt file at the root of the repo:

pip install -r requirements.txt

Understanding the Core Functionality of the Code

First off, we’ll import the essential modules and load the API key from the environment and specify which model to use from Mistral AI, in our app.py:

import os
import json
from dotenv import load_dotenv
from setup_db import setup_db
from vector_search import vector_search
from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

load_dotenv()

api_key = os.environ["MISTRAL_API_KEY"]

# This is actually "Mixtral 8x7B", served via endpoint
model = "mistral-small"

I’m reusing setup_db, for setting up and seeding a local SQLite database, and vector_search, for actually performing the vector searches from my other blog post on giving LLMs vector search tools. If you’d like to know how these functions work, check out that post!

Detailed Explanation of Chatbot Instructions

Next, let’s discuss the instructions for the chatbot:

instructions = '''
**You are the 'VectorizedKnowledgeBot':** An advanced Chatbot capable of conducting vector-based searches, providing contextually relevant answers to user queries or interpreting tool output.

**Instructions for Employing the 'vector_search' Tool:**

1. **Understanding the Tool:**
   - The "vector_search" tool is adept at context-aware searches using vector space modeling. It interprets the semantics of a query or tool output to find the most relevant information.

2. **Identifying the User Query or Tool Output:**
   - Begin by recognizing either the user's query or the output provided by the tool. Focus on key concepts and specific details.
   - If given the key words "CONTEXT:" and "USER_QUERY:" then consider the "CONTEXT:" as the tool output. Else respond to pass a JSON object to the tool.

3. **Formulating the Search Query:**
   - Transform the user's query or interpret the tool output into a concise, targeted search query. Highlight the main keywords and terms crucial to the query's context.

4. **Utilizing the Tool:**
   - Input the formulated search query into the "vector_search" tool, ensuring it's enclosed in quotes as text input.

5. **Interpreting the Results:**
   - Analyze the results from "vector_search" to ensure their relevance and accuracy in addressing the user's query or the context of the tool output.

6. **Communicating the Outcome:**
   - Present the findings from the "vector_search" in a clear, informative manner, summarizing the context or providing a direct response to the query or tool output.

**Example Application:**

If a user asks about "the impact of climate change on polar bear populations" or if the tool output pertains to this topic, you would:

- Extract key terms: "impact," "climate change," "polar bear populations."
- Develop the search query: `"climate change impact on polar bear populations"`
- Execute the query through the tool: `vector_search("climate change impact on polar bear populations")`

Respond with the search query in a JSON format:

```json
{ "query": "climate change impact on polar bear populations" }
```

If there is no "CONTEXT:" and no "USER_QUERY:", then only output something like the above. In JSON. Only JSON.

If given "CONTEXT:" and "USER_QUERY:" like the following:

```plaintext
CONTEXT: Climate change has decreased polar bear populations by 12.3% since 2022.

USER_QUERY: How has climate change impacted polar bear populations?

```
Then respond with something like:

{ "answer": "Polar bear populations have declined by 12.3% since 2022" }

Always respond exclusively in JSON format, no matter what. Only use CONTEXT if available.
'''

Very similar to system instructions for OpenAI’s GPT Assistants, these system instructions outline how the bot should interpret user queries or tool outputs, transform these into search queries, and then use the vector_search tool to find the most relevant information. This set of instructions is crucial for the bot to understand and execute its tasks effectively.

Initializing the Mistral AI Client

We initialize the Mistral AI Client with the following:

client = MistralClient(api_key=api_key)

Very similiar to instantiating the OpenAI SDK client, just passing the api_key.

The Chatbot’s User Interaction Process

The process begins with the chatbot receiving both a system prompt and a user prompt:

user_prompt = "Do you have any content about sea creatures?"
print("PROMPT:")
print(user_prompt)
messages = [
    ChatMessage(role="system", content=instructions),
    ChatMessage(
        role="user", content=user_prompt),
]

chat_response = client.chat(
    model=model,
    messages=messages,
)

It then discerns the core elements of the query, formulating a precise search query, or as you’ll see in the next section, if our vector search tool locates similar text, it provides this as context. This is where the bot’s understanding of language and context comes into play, determining the relevance and accuracy of the search results.

Integrating Vector Search with Mixtral 8x7B

Integrating vector search with Mixtral 8x7B can be a simple process. Once the chatbot formulates a search query, it inputs this into the vector_search tool. The results are then analyzed for relevance before being relayed back to the user, ensuring that the response is both accurate and contextually appropriate. As illustrated in the following code snippet:

response_content = chat_response.choices[0].message.content

if 'query' in response_content:
    context = vector_search(arguments=response_content)
    messages = [ChatMessage(role='user', content=f'''CONTEXT: ${json.dumps(context)}

    USER_QUERY: ${user_prompt}''')]
    chat_response = client.chat(
        model=model,
        messages=messages,
    )

    print("RESPONSE:")
    print(chat_response.choices[0].message.content)

All of the code of the main function is wrapped in a try clause, printing an exception if any.

Running the app

Assuming your Mistral AI API key is set and Python packages are installed correctly, you can run the app from the root of the repo:

python3 app.py

You should see terminal output similar to the following:

PROMPT:
Do you have any content about sea creatures?
RESPONSE:
Yes, here is some interesting information about octopuses:

Octopuses are cephalopods, which means they are closely related to squids and cuttlefish. One of the most distinctive features of octopuses is that they have three hearts. Two of these hearts pump blood to the gills, while the third pumps it to the rest of the body. However, when an octopus swims, the heart that delivers blood to the rest of the body stops beating, which is why octopuses prefer to crawl rather than swim, as swimming exhausts them.

Is there anything specific you would like to know about octopuses or other sea creatures?

In our seed data, we vectorize and store a fact about octopuses having three hearts. Our vector search tool has pulled this fact into the prompt, and now this fact about octopuses is “on the top of mind” of Mixtral 8x7B to use for answering the user’s prompt.

If you run this app repeatedly, you’ll consistently see this fact about octopuses is precisely what it answers with.

Real-world Applications and Implications

The practical applications of a chatbot powered by Mixtral 8x7B and vector search are vast. From customer service to data analysis, the potential to streamline and enhance various processes is immense. This technology paves the way for more intuitive, efficient, and accurate data retrieval systems across industries, leveraging more open models.

Benefits of Mixtral 8x7B

If Mixtral 8x7B isn’t the best LLM out there, why use it? There are a couple of reasons to consider it:

Not Proprietary: Since it isn’t a closed model - and if self-hosted - you have a much greater control of your data, keeping it private.
Cheap: If using Mistral AI’s API service, compared to GPT-3.5 turbo, it’s much cheaper on input tokens and slightly cheaper on output tokens (for the mistral-small model).
Good Enough: Given the other pros, in testing mistral-small for this blog post I found it to be “good enough” at following instructions.

Conclusion

The integration of Mixtral 8x7B with a vector search tool enables you to create AI applications backed by an open model, leveraging context stored as vector embeddings.

By understanding not just the words but the intent and context of user queries, these bots are set to redefine the boundaries of machine-assisted communication and data retrieval.