The ultimate LangChain series — chat with your data

Learn how to use the data you stored in a vector DB as context for a chat bot


Introduction

Our previous articles have journeyed through the fascinating world of language models, vectors, and vector databases. We've learned how to load and split YouTube videos and other kinds of documents, generate embeddings, and store them in a vector database using LangChain and OpenAI. Now, it's time to take the next exciting step: using the data stored in the vector database to answer questions. Sounds like a thrilling project, right? Let's dive right in!

📚
Check out the previous articles to learn how to set up your projects:

This guide will use the Activeloop vector store we set up in the previous article as context for our QA bot.

Setting Up the Environment

Before starting, we must ensure our environment is properly set up. We'll be using the dotenv library to load environment variables from a .env file. This file should contain your OpenAI API key, Activeloop token, and the name of the language model you're using.

Here is a sample of the .env file you can use:

# OpenAI 
OPENAI_API_KEY="YOUR_OPENAI_KEY"
EMBEDDINGS_MODEL="text-embedding-ada-002"
LANGUAGE_MODEL="gpt-3.5-turbo" # Options: "gpt-3.5-turbo" or "gpt-4"

# Deeplake vector DB
ACTIVELOOP_TOKEN="YOUR_ACTIVELOOP_TOKEN"
DATASET_PATH="./podcast_vector_db" # Use "hub://USER_ID/custom_dataset" with your user ID to target the cloud DB

Check the first article in the series to make sure your environment is ready.

The ultimate LangChain series — Environment setup

Getting started with the chat file

In your project root, create a new file named chat.py; this is the file you will interact with. The logic works as follows:

  • Import the necessary tools from LangChain

  • Set up the path to interact with the vector DB

  • Set up the retriever

  • Set up the QA chain

  • Allow the user to ask questions

Imports used for QA in LangChain

Start by importing the required tools and environment variables:

import os
from dotenv import load_dotenv
from langchain.vectorstores import DeepLake
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.embeddings import OpenAIEmbeddings
from langchain.callbacks import get_openai_callback

# Load environment variables from .env file
load_dotenv()

# Set environment variables
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY')
os.environ['ACTIVELOOP_TOKEN'] = os.getenv('ACTIVELOOP_TOKEN')
language_model = os.getenv('LANGUAGE_MODEL')

You should be familiar with most of these by now; the only new ones are ConversationalRetrievalChain and get_openai_callback.

What is ConversationalRetrievalChain

The ConversationalRetrievalChain is a key component of LangChain designed to interact with data housed in a vector store. Its primary function is to retrieve the information most relevant to a user's query, and it applies context-aware filtering and ranking to make sure the results it delivers are accurate and meaningful.

The ConversationalRetrievalChain also takes the conversation's history and context into account, which lets it make sense of follow-up questions and keep its answers consistent with the flow of the conversation.
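
To make this concrete, here is a minimal sketch of how such a chain is used; the retriever here stands for the vector store retriever we build later in this article, and the question is just a placeholder.

from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain

# `retriever` is assumed to be an already-initialized vector store retriever
llm = ChatOpenAI(temperature=0)
qa_chain = ConversationalRetrievalChain.from_llm(llm, retriever=retriever)

chat_history = []  # list of (question, answer) tuples
result = qa_chain({"question": "Who are the speakers?", "chat_history": chat_history})
chat_history.append(("Who are the speakers?", result["answer"]))
# On the next call, the chain condenses the new question and the history
# into a standalone query before searching the vector store.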

What is get_openai_callback

The get_openai_callback function, a handy utility provided by the LangChain library, is like a meticulous accountant for your OpenAI API calls. It keeps track of token usage, ensuring you're always aware of how many tokens are consumed during specific interactions with the OpenAI API and even the total cost in USD.

Here's a quick example of how to use get_openai_callback:

from langchain.callbacks import get_openai_callback
from langchain.llms import OpenAI

llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)

with get_openai_callback() as cb:
    result = llm("Tell me a joke")
    print(cb)

In this snippet, we're asking an OpenAI model to tell us a joke. The get_openai_callback function is keeping track of the tokens used in this interaction.

The output might look something like this:

Tokens Used: 42
    Prompt Tokens: 4
    Completion Tokens: 38
Successful Requests: 1
Total Cost (USD): $0.00084

This output tells us that the joke cost us 42 tokens, with 4 tokens for the prompt and 38 for completion. It also tells us that the total cost of this interaction was $0.00084.

You can also use get_openai_callback to track token usage for multiple calls in sequence or within a chain or agent with multiple steps.
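
For example, here is a minimal sketch, reusing the llm defined in the snippet above, where a single callback context wraps two calls and reports the combined totals:

with get_openai_callback() as cb:
    llm("Tell me a joke")
    llm("Tell me another joke")
    # cb accumulates usage across every call made inside the context
    print(f"Total tokens: {cb.total_tokens}")
    print(f"Total cost (USD): {cb.total_cost}")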

💡
Keep in mind that the cost estimate might not always be accurate.

Initializing DeepLake and OpenAI Embeddings

Next, we'll initialize our OpenAI embeddings and DeepLake vector store. Remember, DeepLake is our vector database where we've stored our embeddings. We'll set read_only=True since we're only retrieving data, not adding or modifying it.

# Set DeepLake dataset path
DEEPLAKE_PATH = os.getenv('DATASET_PATH')

# Initialize OpenAI embeddings and disallow special tokens
EMBEDDINGS = OpenAIEmbeddings(disallowed_special=())

# Initialize DeepLake vector store with OpenAI embeddings
deep_lake = DeepLake(
    dataset_path=DEEPLAKE_PATH,
    read_only=True,
    embedding_function=EMBEDDINGS,
)

Setting Up the Retriever

Now, we'll set up our retriever. The retriever is the component that will search our vector database for the most relevant vectors to our query. We'll set the distance_metric to cos, which stands for cosine similarity, a common measure of vector similarity. We'll also set fetch_k to 100, meaning the retriever will initially fetch the top 100 most similar vectors. Then, using Maximal Marginal Relevance (MMR), it will refine this to the top 10 (k=10) most diverse and relevant vectors.

# Initialize retriever and set search parameters
retriever = deep_lake.as_retriever()
retriever.search_kwargs.update({
    'distance_metric': 'cos',
    'fetch_k': 100,
    'maximal_marginal_relevance': True,
    'k': 10,
})

Initializing the Language Model

Now it's time to initialize the language model using the ChatOpenAI class. We'll set the temperature parameter to 0.2, which controls the randomness of the model's output; a lower temperature results in more deterministic responses. When working with your own data, you can also set the temperature to 0 to reduce the chance of the model hallucinating.

📘
By the way, hallucination in LLMs refers to the generation of information or content that wasn't present or implied in the input data. It's as if the model is dreaming up details, hence the term "hallucination". This can sometimes lead to creative and unexpected outputs but can also result in inaccuracies or inconsistencies in the generated text.

# Initialize ChatOpenAI model
model = ChatOpenAI(model_name=language_model, temperature=0.2) 

# We set gpt-3.5-turbo by default in the env variables. 
# Use gpt-4 for better and more accurate responses.

Creating the Conversational Retrieval Chain

Let's start with the Conversational Retrieval Chain. This is where the magic happens! The Conversational Retrieval Chain takes our question, retrieves the most relevant vectors from our database, and uses them as context to generate an answer from our language model.

# Initialize ConversationalRetrievalChain
qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)

# Initialize chat history
chat_history = []

Helper functions for user interaction

This is where we set up how the user can ask questions and how the answers will be displayed.

We'll use two helper functions, get_user_input() and print_answer(): one collects the user's question and the other displays the answer.

def get_user_input():
    """Get user input and handle 'quit' command."""
    question = input("\nPlease enter your question (or 'quit' to stop): ")
    if question.lower() == 'quit':
        return None
    return question

def print_answer(question, answer):
    """Format and print question and answer."""
    print(f"\nQuestion: {question}\nAnswer: {answer}\n")

The get_user_input() function is the gateway for user interaction. It prompts the user with the message "Please enter your question (or 'quit' to stop): " and handles a special command, 'quit'. If the user types 'quit', the function returns None, signaling the program to stop taking further input; this is a cleaner way to exit than interrupting with Ctrl + C. Otherwise, it returns the question entered by the user.

The print_answer(question, answer) function, on the other hand, is responsible for presenting the response to the user. It takes two parameters: the original question asked by the user and the answer generated by the system. It then formats these two pieces of information in a readable manner and prints them out. This function ensures the user can clearly see their original question and the corresponding answer, providing a smooth and understandable user experience.

Running the Program

Finally, we're ready to run our program! We'll create a loop that prompts the user for a question, uses our Conversational Retrieval Chain to generate an answer, and then prints the answer. We'll also print the token usage for each question, which can help us keep track of our API usage.

def main():
    """Main program loop."""
    while True:
        question = get_user_input()
        if question is None:  # User has quit
            break

        # Display token usage and approximate costs
        with get_openai_callback() as tokens_usage:
            result = qa({"question": question, "chat_history": chat_history})
            chat_history.append((question, result['answer']))
            print_answer(question, result['answer'])
            print(tokens_usage)

if __name__ == "__main__":
    main()

Now you just need to run it with python3 chat.py and ask some questions! Here is an example of a few interactions:

Question: can you summarize what this episode of the podcast is about?

Answer: This episode of the podcast is mainly focused on web3 security, specifically discussing a faucet that was recently drained for almost 200 in gorillaith. The speakers talk about how it happened, why it's important to take precautionary measures, and the need for usability and mainstream adoption in the web3 space. They also mention previous episodes where they discussed AI adventures and the chain sector plugin.

Question: who are the speakers?

Answer: The speakers of this episode of the podcast are Tabasco (Ethan), Ache (Director of Developer Experience), and David (Developer Advocate).

And it will look like this in the terminal:

Question: who are the speakers?
Answer: The speakers of this episode of the podcast are Tabasco (Ethan), Ache (Director of Developer Experience), and David (Developer Advocate).

Tokens Used: 2577
        Prompt Tokens: 2537
        Completion Tokens: 40
Successful Requests: 2
Total Cost (USD): $0.005154000000000001

We can see that the responses are extremely relevant and well formatted; considering how simple the application is, this is a great starting point.

But you will notice that some details are wrong, and this is not a problem with the LLM or our app. Since we rely on YouTube's transcription of the video's audio, names in particular are often transcribed incorrectly; for example, 'gorillaith' should be 'Goerli ETH', and the name of Chainstack's Director of Developer Experience is 'Ake', not 'Ache'. We can easily fix these details when preprocessing the data we scrape.
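
One straightforward way to do that, sketched below with a hypothetical CORRECTIONS map, is to run a find-and-replace pass over the transcript text before splitting and embedding it:

# Hypothetical map of known transcription mistakes to their corrections
CORRECTIONS = {
    "gorillaith": "Goerli ETH",
    "Ache": "Ake",
}

def clean_transcript(text: str) -> str:
    """Apply known corrections to a raw YouTube transcript."""
    for wrong, right in CORRECTIONS.items():
        text = text.replace(wrong, right)
    return text

You would call clean_transcript() on each document's text during the loading step from the previous article, before generating the embeddings.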

Go check the podcast on YouTube to see how accurate our QA bot is!

FAFO & Chill Ep3

This code interacts with a local vector DB; to use a cloud instance instead, simply point DATASET_PATH in the environment variables to your cloud vector DB, just like we did in the previous article explaining how to use Activeloop.

And you can use the same principle to interact with a Pinecone vector DB; check the article dedicated to Pinecone.
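
As a rough sketch, assuming you already created and populated a Pinecone index as covered in that article (the index name and environment variable names below are placeholders), only the vector store and retriever change; the ConversationalRetrievalChain setup stays exactly the same:

import os
import pinecone
from langchain.vectorstores import Pinecone
from langchain.embeddings import OpenAIEmbeddings

# Placeholder credentials and index name; replace with your own values
pinecone.init(api_key=os.getenv("PINECONE_API_KEY"), environment=os.getenv("PINECONE_ENV"))

embeddings = OpenAIEmbeddings()
vector_store = Pinecone.from_existing_index("podcast-index", embeddings)
retriever = vector_store.as_retriever()

# qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)  # unchanged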

Full code

import os
from dotenv import load_dotenv
from langchain.vectorstores import DeepLake
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.embeddings import OpenAIEmbeddings
from langchain.callbacks import get_openai_callback

# Load environment variables from .env file
load_dotenv()

# Set environment variables
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY')
os.environ['ACTIVELOOP_TOKEN'] = os.getenv('ACTIVELOOP_TOKEN')
language_model = os.getenv('LANGUAGE_MODEL')

# Set DeepLake dataset path
DEEPLAKE_PATH = os.getenv('DATASET_PATH')

# Initialize OpenAI embeddings and disallow special tokens
EMBEDDINGS = OpenAIEmbeddings(disallowed_special=())

# Initialize DeepLake vector store with OpenAI embeddings
deep_lake = DeepLake(
    dataset_path=DEEPLAKE_PATH,
    read_only=True,
    embedding_function=EMBEDDINGS,
)

# Initialize retriever and set search parameters
retriever = deep_lake.as_retriever()
retriever.search_kwargs.update({
    'distance_metric': 'cos',
    'fetch_k': 100,
    'maximal_marginal_relevance': True,
    'k': 10,
})

# Initialize ChatOpenAI model
model = ChatOpenAI(model_name=language_model, temperature=0.2) # gpt-3.5-turbo by default. Use gpt-4 for better and more accurate responses 

# Initialize ConversationalRetrievalChain
qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)

# Initialize chat history
chat_history = []

def get_user_input():
    """Get user input and handle 'quit' command."""
    question = input("\nPlease enter your question (or 'quit' to stop): ")
    if question.lower() == 'quit':
        return None
    return question

def print_answer(question, answer):
    """Format and print question and answer."""
    print(f"\nQuestion: {question}\nAnswer: {answer}\n")

def main():
    """Main program loop."""
    while True:
        question = get_user_input()
        if question is None:  # User has quit
            break

        # Display token usage and approximate costs
        with get_openai_callback() as tokens_usage:
            result = qa({"question": question, "chat_history": chat_history})
            chat_history.append((question, result['answer']))
            print_answer(question, result['answer'])
            print(tokens_usage)

if __name__ == "__main__":
    main()

Conclusion

And there you have it! You've built a powerful question-answering system using LangChain, OpenAI, and DeepLake. This system can take a user's question, search a vector database for relevant context, and generate a detailed answer. It's like having your very own AI-powered oracle! In the next articles I'll show you some other use cases and applications!
