Building a Reliable and Accurate LLM Application with Voting Systems

Stephen Collins, Jun 27, 2024
What you will learn

  • Why use a voting system in LLM applications?
  • Voting systems combine the strengths of multiple models to produce a more reliable and accurate outcome. Benefits include increased accuracy, robustness against individual model errors, and better generalization across different inputs.
  • What are the first steps to implementing a voting system for LLM applications?
  • The first steps include choosing your LLM APIs from different providers and initializing the APIs with the necessary API keys.
  • Which LLM providers are used in the tutorial for the voting system?
  • The tutorial uses Google Gemini 1.5 Flash, Anthropic Claude 3.5 Sonnet, and OpenAI GPT-4o as the voting models.
  • How does the majority voting system work in the tutorial?
  • The majority voting system sends the same input query to each LLM API and collects their responses. The final answer is whichever response receives the most votes.
  • What is the main difference between majority and weighted voting systems in the context of LLM applications?
  • The main difference is that while a majority voting system gives equal weight to each model's decision, a weighted voting system assigns different weights to responses based on the model's reliability, potentially giving more influence to certain models.

Ensuring the reliability and accuracy of applications powered by large language models (LLMs) is of the utmost importance. One effective way to improve both is to implement a voting system that uses multiple LLMs. This tutorial will guide you through setting up a simple yet powerful voting system with several LLM APIs, helping your application deliver more consistent, higher-quality results.

All of the code for this tutorial is available in my GitHub repo.

Why Use a Voting System?

Voting systems combine the strengths of multiple models, leveraging their diverse perspectives to produce a more reliable and accurate outcome. Here are some key benefits:

  • Increased Accuracy: Aggregating outputs from multiple models often yields better performance than any single model.
  • Robustness: The system is less prone to individual model errors, ensuring more stable predictions.
  • Better Generalization: Combining multiple models helps in capturing a broader range of knowledge, improving generalization to new inputs.
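To put a rough number on the accuracy claim, here is a minimal sketch that computes how often a strict majority of voters is correct. It assumes every voter errs independently of the others, which is an idealization: real LLMs often share blind spots, so actual gains are usually smaller. The function name `majority_accuracy` and the 80% figures are illustrative, not taken from the tutorial's repo.

```python
from itertools import product


def majority_accuracy(accuracies):
    """Probability that a strict majority of independent voters is correct.

    `accuracies` holds each voter's chance of answering correctly.
    Independence between voters is an assumption of this sketch.
    """
    n = len(accuracies)
    total = 0.0
    # Enumerate every combination of right/wrong outcomes across the voters.
    for outcome in product([True, False], repeat=n):
        if sum(outcome) * 2 > n:  # more than half of the voters are correct
            p = 1.0
            for correct, acc in zip(outcome, accuracies):
                p *= acc if correct else (1.0 - acc)
            total += p
    return total


# Three hypothetical models that are each right 80% of the time.
print(round(majority_accuracy([0.8, 0.8, 0.8]), 3))  # 0.896
```

Under these assumptions, three 80%-accurate voters reach about 89.6% as a group, which is the intuition behind the "Increased Accuracy" benefit.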

Step-by-Step Guide to Implementing a Voting System

Step 1: Choose Your LLM APIs

First, select multiple LLM APIs from different providers. For this tutorial, we’ll use Google Gemini 1.5 Flash, Anthropic Claude 3.5 Sonnet, and OpenAI GPT-4o, one model from each of three popular LLM providers.

Step 2: Initialize the APIs

Make sure you have the necessary credentials (Anthropic’s API key, Google’s credential JSON, and OpenAI’s API key), then initialize a client for each provider. Here is the initial setup for our AI-powered image processing application:

from dotenv import load_dotenv
from ai.factory import create_ai_processor

# Load API keys and credentials from a local .env file into the environment
load_dotenv()

# One processor per vendor; each processor casts one vote
google_processor = create_ai_processor("google", "gemini-1.5-flash-001")
openai_processor = create_ai_processor("openai", "gpt-4o")
anthropic_processor = create_ai_processor(
    "anthropic", "claude-3-5-sonnet-20240620")
voters = [google_processor, openai_processor, anthropic_processor]

We create one processor per vendor. Check out the abstract base class AIProcessor to see how to implement your own processor.
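The real AIProcessor lives in the repo and may look different. As a rough guide, the following sketch infers an interface from the three methods this tutorial calls on each voter (`process`, `get_vendor`, `get_model_name`). The names `AIProcessorSketch` and `FixedAnswerProcessor` are hypothetical and exist only for illustration; the stub makes it possible to try the voting logic without any API keys.

```python
from abc import ABC, abstractmethod


class AIProcessorSketch(ABC):
    """Hypothetical base class, inferred from the calls used in this tutorial."""

    def __init__(self, vendor, model_name):
        self._vendor = vendor
        self._model_name = model_name

    def get_vendor(self):
        return self._vendor

    def get_model_name(self):
        return self._model_name

    @abstractmethod
    def process(self, prompt, image):
        """Send the prompt and image to the model and return its text reply."""


class FixedAnswerProcessor(AIProcessorSketch):
    """Offline stub that always returns the same answer (assumption: for testing only)."""

    def __init__(self, answer):
        super().__init__("stub", "fixed-answer")
        self._answer = answer

    def process(self, prompt, image):
        # No network call: ignore the inputs and return the canned reply.
        return self._answer


stub = FixedAnswerProcessor("3")
print(stub.get_vendor(), stub.get_model_name(), stub.process("How many coins?", b""))
```

A list of such stubs can stand in for `voters` while you develop the voting functions.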

Step 3: Define Functions to Get Responses

Create functions to send the same input query to each API and collect their responses:

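def majority_voting_system_votes(prompt, image):
    """Ask every voter the same question and return the most common answer."""
    votes = []
    for voter in voters:
        # Strip whitespace so a reply such as "3\n" is still parsed as a number
        vote = voter.process(prompt, image).strip()
        votes.append(int(vote) if vote.isdigit() else vote)
        print(f"VENDOR: {voter.get_vendor()} MODEL: {voter.get_model_name()} VOTE: {vote}")
    # Pick the answer with the highest count; ties are broken arbitrarily
    return max(set(votes), key=votes.count)

def weighted_voting_system_votes(prompt, image, weights):
    """Return the answer whose voters carry the largest total weight."""
    # zip() would silently drop voters if the two lists differed in length
    if len(weights) != len(voters):
        raise ValueError("Provide exactly one weight per voter")
    weighted_responses = {}

    for voter, weight in zip(voters, weights):
        vote = voter.process(prompt, image).strip()
        vote = int(vote) if vote.isdigit() else vote
        print(f"VENDOR: {voter.get_vendor()} MODEL: {voter.get_model_name()} VOTE: {vote} WEIGHT: {weight}")
        # Add this voter's weight to the running total for its answer
        weighted_responses[vote] = weighted_responses.get(vote, 0) + weight

    return max(weighted_responses, key=weighted_responses.get)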
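One case the functions above do not handle explicitly is a tie, for example when all three models give different answers. The sketch below shows one possible way to detect that situation with `collections.Counter`; the helper name `tally_votes` and the choice to return `None` on a tie are assumptions for illustration, not part of the tutorial's repo.

```python
from collections import Counter


def tally_votes(votes):
    """Return (winner, counts); winner is None when the top spot is tied.

    Hypothetical helper: it works on already-collected votes, so it can be
    tried without calling any LLM API.
    """
    if not votes:
        raise ValueError("No votes to tally")
    counts = Counter(votes)
    top_count = max(counts.values())
    # Collect every answer that shares the highest count.
    winners = [vote for vote, count in counts.items() if count == top_count]
    winner = winners[0] if len(winners) == 1 else None
    return winner, counts


print(tally_votes([3, 3, 4])[0])  # 3
print(tally_votes([3, 4, 5])[0])  # None
```

Returning `None` lets the caller decide what to do next, such as asking the models again or falling back to the weighted system.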

Step 4: Test the System

Now that your voting system is set up, test it with an example prompt:

# Example usage
prompt = "How many coins are in the image? Only respond with a number."

with open("./images/coins.png", "rb") as image_file:
    # Read the raw image bytes (assumes the processors accept bytes)
    image = image_file.read()

final_vote = majority_voting_system_votes(prompt, image)
print("Majority Voting Final Vote:", final_vote)

# Example weights for Google Gemini, OpenAI GPT-4o, and Claude Sonnet respectively
weights = [0.4, 0.3, 0.3]
final_vote = weighted_voting_system_votes(prompt, image, weights)
print("Weighted Voting Final Vote:", final_vote)

If the app is working as expected, you should see output similar to the following:

VENDOR: google MODEL: gemini-1.5-flash-001 VOTE: 3
VENDOR: openai MODEL: gpt-4o VOTE: 3
VENDOR: anthropic MODEL: claude-3-5-sonnet-20240620 VOTE: 3
Majority Voting Final Vote: 3
VENDOR: google MODEL: gemini-1.5-flash-001 VOTE: 3 WEIGHT: 0.4
VENDOR: openai MODEL: gpt-4o VOTE: 3 WEIGHT: 0.3
VENDOR: anthropic MODEL: claude-3-5-sonnet-20240620 VOTE: 3 WEIGHT: 0.3
Weighted Voting Final Vote: 3
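In the run above all three models agree, so both systems return the same answer. The difference between majority and weighted voting only shows up when the votes split. The following sketch replays the two aggregation rules on made-up votes and weights (not real model output) so the effect is visible without calling any API; `majority_vote` and `weighted_vote` are illustrative names.

```python
def majority_vote(votes):
    """Return the most frequent vote (same rule as the tutorial's majority system)."""
    return max(set(votes), key=votes.count)


def weighted_vote(votes, weights):
    """Return the vote with the largest total weight."""
    totals = {}
    for vote, weight in zip(votes, weights):
        totals[vote] = totals.get(vote, 0) + weight
    return max(totals, key=totals.get)


# Hypothetical split: the first voter says 3, the other two say 4.
votes = [3, 4, 4]
# Hypothetical weights: the first voter is trusted far more than the others.
weights = [0.6, 0.2, 0.2]

print("Majority:", majority_vote(votes))           # Majority: 4
print("Weighted:", weighted_vote(votes, weights))  # Weighted: 3
```

With these numbers the single heavily weighted voter (0.6) outweighs the two others combined (0.4), so the weighted system overrules the majority.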

Enhancing the Voting System

While majority and weighted voting are good starting points, you can enhance your system with techniques like performance monitoring.

Performance Monitoring

Monitor the performance of each model by keeping track of their accuracy over time. Adjust the weights based on this performance data to ensure the most reliable models have more influence on the final decision.
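As one possible way to do this, the sketch below counts how often each voter matches a known correct answer and turns those accuracies into normalized weights. It assumes you have labelled examples to score against. The class name `AccuracyTracker`, the voter names, and the add-one smoothing (so that a voter with no history starts at 0.5) are all illustrative choices rather than part of the tutorial's repo.

```python
class AccuracyTracker:
    """Hypothetical helper that tracks per-voter accuracy on labelled examples."""

    def __init__(self, voter_names):
        self.correct = {name: 0 for name in voter_names}
        self.total = {name: 0 for name in voter_names}

    def record(self, name, vote, expected):
        """Store whether `vote` matched the known correct answer."""
        self.total[name] += 1
        if vote == expected:
            self.correct[name] += 1

    def accuracy(self, name):
        # Add-one smoothing: a voter with no history is treated as 50% accurate.
        return (self.correct[name] + 1) / (self.total[name] + 2)

    def weights(self):
        """Return weights proportional to accuracy that sum to 1."""
        accuracies = {name: self.accuracy(name) for name in self.total}
        norm = sum(accuracies.values())
        return {name: acc / norm for name, acc in accuracies.items()}


tracker = AccuracyTracker(["google", "openai", "anthropic"])
# Made-up history: (voter, vote, expected answer)
history = [("google", 3, 3), ("openai", 3, 3), ("anthropic", 4, 3),
           ("google", 5, 5), ("openai", 4, 5), ("anthropic", 4, 5)]
for name, vote, expected in history:
    tracker.record(name, vote, expected)

print(tracker.weights())
```

The resulting values can be passed, in the same order as `voters`, as the `weights` argument of `weighted_voting_system_votes`.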

Best Practices

  • Model Diversity: Use different types of LLMs to benefit from diverse perspectives.
  • Data Management: Properly handle training and validation data to avoid data leakage and ensure fair evaluation.
  • Regular Evaluation: Continuously evaluate the performance of your voting system and individual models to maintain high accuracy and reliability.


Implementing a voting system with multiple LLMs can significantly enhance the performance of your application. By leveraging the strengths of different models, you can achieve higher accuracy, improved robustness, and better generalization. Start with a simple majority voting system and gradually incorporate more sophisticated techniques like weighted voting and performance monitoring to optimize your application further.

By following this tutorial, you’ll be well on your way to building a reliable and accurate LLM application. Whether you’re working on an image recognition system, a sentiment classification tool, or any other AI-powered application, a voting system can help ensure your results are consistently top-notch.