# The ultimate LangChain series — chat with your data

## Introduction

Our previous articles have journeyed through the fascinating world of language models, vectors, and vector databases. We've learned how to load and split YouTube videos and other kinds of documents, generate embeddings, and store them in a vector database using LangChain and OpenAI. Now, it's time to take the next exciting step: using the data stored in the vector database to answer questions. Sounds like a thrilling project, right? Let's dive right in!

<div data-node-type="callout">
<div data-node-type="callout-emoji">📚</div>
<div data-node-type="callout-text">Check out the previous articles to learn how to set up your projects:</div>
</div>

* [The ultimate LangChain series — Environment setup](https://blog.davideai.dev/the-ultimate-langchain-series-environment-setup)
    
* [The ultimate LangChain series — Projects structure](https://blog.davideai.dev/the-ultimate-langchain-series-projects-structure)
    
* [The ultimate LangChain series — data loaders](https://blog.davideai.dev/the-ultimate-langchain-series-data-loaders)
    
* [The ultimate LangChain series — text splitters](https://blog.davideai.dev/the-ultimate-langchain-series-text-splitters)
    
* [The ultimate LangChain series — Embeddings & vector stores](https://blog.davideai.dev/the-ultimate-langchain-series-embeddings-vector-stores)
    
* [The ultimate LangChain series — Pinecone vector database](https://blog.davideai.dev/the-ultimate-langchain-series-pinecone-vector-database)
    

This guide will use the Activeloop vector store we set up in the previous article as context for our QA bot.

## Setting Up the Environment

Before starting, we must ensure our environment is properly set up. We'll be using the `dotenv` library to load environment variables from a `.env` file. This file should contain your OpenAI API key, Activeloop token, and the name of the language model you're using.

Here is a sample of the `.env` file you can use:  

```python
# OpenAI 
OPENAI_API_KEY="YOUR_OPENAI_KEY"
EMBEDDINGS_MODEL="text-embedding-ada-002"
LANGUAGE_MODEL="gpt-3.5-turbo" # gpt-4 gpt-3.5-turbo

# Deeplake vector DB
ACTIVELOOP_TOKEN="YOUR_ACTIVELOOP_TOKEN"
DATASET_PATH="./podcast_vector_db" # "hub://USER_ID/custom_dataset"   Edit with your user id if you want to use the cloud db.
```
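
A missing key usually only shows up later as an authentication error, so it can help to check the variables up front. The snippet below is a minimal sketch of one way to do that with the standard library only; the `missing_env_vars` helper and the `REQUIRED_VARS` tuple are illustrative names, not part of `dotenv` or LangChain.

```python
import os

# Variable names taken from the sample .env file above.
# ACTIVELOOP_TOKEN is assumed to be required here; adjust the tuple to your setup.
REQUIRED_VARS = ("OPENAI_API_KEY", "ACTIVELOOP_TOKEN", "LANGUAGE_MODEL", "DATASET_PATH")


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print(f"Missing environment variables: {', '.join(missing)}")
    else:
        print("Environment looks complete.")
```

You would call a check like this right after `load_dotenv()`, before any client is created.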

> Check the first article in the series to make sure your environment is ready.
> 
> [The ultimate LangChain series — Environment setup](https://blog.davideai.dev/the-ultimate-langchain-series-environment-setup)

## Getting started with the chat file

In your project root, create a new file named `chat.py`. This is the file you will interact with; its logic works as follows:

* Import the necessary tools from LangChain
    
* Set up the path to interact with the vector DB
    
* Set up the retriever
    
* Set up the QA chain
    
* Allow the user to ask questions
    

### Imports used for QA in LangChain

Start by importing the required tools and environment variables:

```python
import os
from dotenv import load_dotenv
from langchain.vectorstores import DeepLake
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.embeddings import OpenAIEmbeddings
from langchain.callbacks import get_openai_callback

# Load environment variables from .env file
load_dotenv()

# Set environment variables
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY')
os.environ['ACTIVELOOP_TOKEN'] = os.getenv('ACTIVELOOP_TOKEN')
language_model = os.getenv('LANGUAGE_MODEL')
```

You should be familiar with most of these by now; the only new ones are `ConversationalRetrievalChain` and `get_openai_callback`.

### What is `ConversationalRetrievalChain`

The `ConversationalRetrievalChain` is the LangChain chain designed to answer questions over data housed in a vector store. Its primary job is to retrieve the chunks most relevant to the user's query, through the retriever you give it, and pass them to the language model as context for the answer.

The `ConversationalRetrievalChain` also takes the conversation's history into account. When there is previous history, it first rephrases the follow-up question into a standalone question, so that a follow-up such as "who are the speakers?" still retrieves the right context and the answer fits the flow of the conversation.
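
Conceptually, a conversational retrieval flow can be pictured as three steps: condense the follow-up into a standalone question, retrieve context, and answer. The sketch below illustrates that idea in plain Python. It is not LangChain's implementation; `format_history`, `answer_with_history`, and the three callables are hypothetical stand-ins for the LLM and the retriever.

```python
def format_history(chat_history):
    """Render (question, answer) tuples as a plain-text transcript."""
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)


def answer_with_history(question, chat_history, condense, retrieve, answer):
    """Hypothetical three-step flow: condense, retrieve, answer.

    `condense`, `retrieve`, and `answer` are caller-supplied callables
    standing in for the LLM calls and the vector store retriever.
    """
    if chat_history:
        # A follow-up may only make sense together with the earlier turns.
        standalone = condense(format_history(chat_history), question)
    else:
        standalone = question
    documents = retrieve(standalone)
    return answer(standalone, documents)


if __name__ == "__main__":
    history = [("What is this episode about?", "Web3 security.")]
    result = answer_with_history(
        "Who are the speakers?",
        history,
        condense=lambda transcript, q: f"{q} (context: {transcript})",
        retrieve=lambda q: ["chunk about the hosts"],
        answer=lambda q, docs: f"Answered '{q}' using {len(docs)} chunk(s)",
    )
    print(result)
```

The history is kept as a list of `(question, answer)` tuples, the same shape used for `chat_history` later in this article.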

### What is `get_openai_callback`

The `get_openai_callback` function, a handy utility provided by the LangChain library, is like a meticulous accountant for your OpenAI API calls. It keeps track of token usage, ensuring you're always aware of how many tokens are consumed during specific interactions with the OpenAI API and even the total cost in USD.

Here's a quick example of how to use `get_openai_callback`:

```python
from langchain.callbacks import get_openai_callback
from langchain.llms import OpenAI

llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)

with get_openai_callback() as cb:
    result = llm("Tell me a joke")
    print(cb)
```

In this snippet, we're asking an OpenAI model to tell us a joke. The `get_openai_callback` function is keeping track of the tokens used in this interaction.

The output might look something like this:

```python
Tokens Used: 42
    Prompt Tokens: 4
    Completion Tokens: 38
Successful Requests: 1
Total Cost (USD): $0.00084
```

This output tells us that the joke cost us 42 tokens, with 4 tokens for the prompt and 38 for completion. It also tells us that the total cost of this interaction was $0.00084.
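
The figures above are consistent with a flat price applied to the total token count. The following sketch reproduces that arithmetic; the `cost_usd` helper is illustrative, and the rate of $0.02 per 1,000 tokens is simply the one implied by the sample output, not a current price list.

```python
def cost_usd(total_tokens, usd_per_1k_tokens):
    """Estimate cost when prompt and completion tokens share one rate."""
    return total_tokens / 1000 * usd_per_1k_tokens


# Rate implied by the output above: $0.00084 for 42 tokens is $0.02 per 1K tokens.
print(f"${cost_usd(42, 0.02):.5f}")  # prints $0.00084
```

Many models price prompt and completion tokens differently, and prices change over time, so treat a single flat rate as a simplification.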

You can also use `get_openai_callback` to track token usage for multiple calls in sequence or within a chain or agent with multiple steps.

<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Keep in mind that the cost estimate might not always be accurate.</div>
</div>

### Initializing DeepLake and OpenAI Embeddings

Next, we'll initialize our OpenAI embeddings and DeepLake vector store. Remember, DeepLake is our vector database where we've stored our embeddings. We'll set `read_only=True` since we're only retrieving data, not adding or modifying it.

```python
# Set DeepLake dataset path
DEEPLAKE_PATH = os.getenv('DATASET_PATH')

# Initialize OpenAI embeddings and disallow special tokens
EMBEDDINGS = OpenAIEmbeddings(disallowed_special=())

# Initialize DeepLake vector store with OpenAI embeddings
deep_lake = DeepLake(
    dataset_path=DEEPLAKE_PATH,
    read_only=True,
    embedding_function=EMBEDDINGS,
)
```

### Setting Up the Retriever

Now, we'll set up our retriever. The retriever is the component that will search our vector database for the most relevant vectors to our query. We'll set the `distance_metric` to `cos`, which stands for cosine similarity, a common measure of vector similarity. We'll also set `fetch_k` to 100, meaning the retriever will initially fetch the top 100 most similar vectors. Then, using Maximal Marginal Relevance (MMR), it will refine this to the top 10 (`k=10`) most diverse and relevant vectors.

```python
# Initialize retriever and set search parameters
retriever = deep_lake.as_retriever()
retriever.search_kwargs.update({
    'distance_metric': 'cos',
    'fetch_k': 100,
    'maximal_marginal_relevance': True,
    'k': 10,
})
```
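
To make cosine similarity and MMR more concrete, here is a small pure-Python sketch of the textbook formulation. It is an illustration under simplifying assumptions, not DeepLake's or LangChain's actual code; `cosine_similarity`, `mmr_select`, and the `lambda_mult` trade-off parameter are names chosen for this example. The candidate list plays the role of the `fetch_k` pool, and `k` is the number of results kept.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr_select(query, candidates, k, lambda_mult=0.5):
    """Pick up to k candidate indices, balancing relevance and diversity.

    lambda_mult=1.0 ranks by relevance only; lower values favor diversity.
    """
    relevance = [cosine_similarity(query, c) for c in candidates]
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            # Penalize candidates that resemble something already chosen.
            redundancy = max(
                (cosine_similarity(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    query = [1.0, 0.0]
    # The first two candidates are near-duplicates; the third is less similar.
    candidates = [[1.0, 0.1], [1.0, 0.11], [0.7, -0.7]]
    print(mmr_select(query, candidates, k=2))                   # [0, 2]
    print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # [0, 1]
```

With the default trade-off, the near-duplicate at index 1 is skipped in favor of the more distinct vector at index 2; with `lambda_mult=1.0` the selection falls back to plain similarity ranking.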

### Initializing the Language Model

Now is the time to initialize the language model using the `ChatOpenAI` class. We'll set the `temperature` parameter to 0.2, which controls the randomness of the model's output. A lower temperature results in more deterministic responses. When you use your own data, you can also set the `temperature` to `0` to ensure that the model has fewer chances to hallucinate.

<div data-node-type="callout">
<div data-node-type="callout-emoji">📘</div>
<div data-node-type="callout-text">By the way, hallucination in LLMs refers to the generation of information or content that wasn't present or implied in the input data. It's as if the model is dreaming up details, hence the term "hallucination". This can sometimes lead to creative and unexpected outputs but can also result in inaccuracies or inconsistencies in the generated text.</div>
</div>

```python
# Initialize ChatOpenAI model
model = ChatOpenAI(model_name=language_model, temperature=0.2) 

# We set gpt-3.5-turbo by default in the env variables. 
# Use gpt-4 for better and more accurate responses.
```
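
To build some intuition for why a lower temperature gives more deterministic output, the sketch below applies the standard softmax-with-temperature formula to a few made-up scores. How the OpenAI API applies temperature internally is not covered here; `softmax_with_temperature` and the example logits are assumptions for illustration, and the sketch only handles positive temperatures.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
    for temperature in (1.0, 0.2):
        probs = softmax_with_temperature(logits, temperature)
        print(temperature, [round(p, 3) for p in probs])
```

At the lower temperature, almost all of the probability mass lands on the highest-scoring token, which is why repeated runs tend to produce the same answer.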

### Creating the Conversational Retrieval Chain

Next, let's create the Conversational Retrieval Chain. This is where the magic happens! The Conversational Retrieval Chain takes our question, retrieves the most relevant vectors from our database, and uses them as context to generate an answer from our language model.

```python
# Initialize ConversationalRetrievalChain
qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)

# Initialize chat history
chat_history = []
```

### Helper functions for user interaction

This is where we set up how the user can ask questions and how the answers will be displayed.

We'll have two functions, `get_user_input()` and `print_answer()`. These functions take user input, process it, and provide a response.

```python
def get_user_input():
    """Get user input and handle 'quit' command."""
    question = input("\nPlease enter your question (or 'quit' to stop): ")
    if question.lower() == 'quit':
        return None
    return question

def print_answer(question, answer):
    """Format and print question and answer."""
    print(f"\nQuestion: {question}\nAnswer: {answer}\n")
```

The `get_user_input()` function is the gateway for user interaction. It prompts the user to enter a question with the message "Please enter your question (or 'quit' to stop): ". It also handles a special command, 'quit': if the user types 'quit', the function returns `None`, signaling the program to stop taking further input; otherwise, it returns the question entered by the user. This is a cleaner way to exit the program than pressing `Ctrl` + `C`.

The `print_answer(question, answer)` function, on the other hand, is responsible for presenting the response to the user. It takes two parameters: the original question asked by the user and the answer generated by the system. It then formats these two pieces of information in a readable manner and prints them out. This function ensures the user can clearly see their original question and the corresponding answer, providing a smooth and understandable user experience.
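
The quit handling in `get_user_input()` matches the exact word only, so input such as `" quit "` would be sent to the model as a question. As an optional variation, the parsing step could be pulled into a small pure function that is easy to test. This is only a sketch; `parse_question` and `QUIT_COMMANDS` are hypothetical names, and the `exit` alias is an addition not present in the original helper.

```python
QUIT_COMMANDS = {"quit", "exit"}  # "exit" is an extra alias added for this sketch


def parse_question(raw):
    """Return the cleaned question, or None when the user wants to stop."""
    cleaned = raw.strip()
    if cleaned.lower() in QUIT_COMMANDS:
        return None
    return cleaned


if __name__ == "__main__":
    print(parse_question("  Quit "))
    print(parse_question(" Who are the speakers? "))
```

Inside `get_user_input()`, you would then return `parse_question(input(...))` instead of comparing the raw string.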

## Running the Program

Finally, we're ready to run our program! We'll create a loop that prompts the user for a question, uses our Conversational Retrieval Chain to generate an answer, and then prints the answer. We'll also print the token usage for each question, which can help us keep track of our API usage.

```python
def main():
    """Main program loop."""
    while True:
        question = get_user_input()
        if question is None:  # User has quit
            break

        # Display token usage and approximate costs
        with get_openai_callback() as tokens_usage:
            result = qa({"question": question, "chat_history": chat_history})
            chat_history.append((question, result['answer']))
            print_answer(question, result['answer'])
            print(tokens_usage)

if __name__ == "__main__":
    main()
```

Now you just need to run it with `python3 chat.py` and ask some questions! Here is an example of a few interactions:

> Question: can you summarize what this episode of the podcast is about?
> 
> Answer: This episode of the podcast is mainly focused on web3 security, specifically discussing a faucet that was recently drained for almost 200 in gorillaith. The speakers talk about how it happened, why it's important to take precautionary measures, and the need for usability and mainstream adoption in the web3 space. They also mention previous episodes where they discussed AI adventures and the chain sector plugin.
> 
> Question: who are the speakers?
> 
> Answer: The speakers of this episode of the podcast are Tabasco (Ethan), Ache (Director of Developer Experience), and David (Developer Advocate).

And it will look like this in the terminal:

```python
Question: who are the speakers?
Answer: The speakers of this episode of the podcast are Tabasco (Ethan), Ache (Director of Developer Experience), and David (Developer Advocate).

Tokens Used: 2577
        Prompt Tokens: 2537
        Completion Tokens: 40
Successful Requests: 2
Total Cost (USD): $0.005154000000000001
```

We can see that the responses are extremely relevant and well formatted; considering how simple the application is, this is a great starting point.

But you will notice some details are partially wrong, and this is not a problem with the LLM or our app. Since we rely on the YouTube transcription of the video's audio, some words, names especially, might be transcribed incorrectly; for example, 'gorillaith' should be 'Goerli ETH', and the name of Chainstack's Director of Developer Experience is 'Ake', not 'Ache'. We can fix these details when scraping the data we need.

> Go check the podcast on YouTube to see how accurate our QA bot is!
> 
> [**FAFO & Chill Ep3**](https://www.youtube.com/watch?v=nFFA0lFswSA)

This code interacts with a local vector DB, but all you have to do to use a cloud instance is to change the path in the environment variables to your cloud vector DB, just like we did in the previous article explaining how to use Activeloop.

And you can use the same principle to interact with a Pinecone vector DB; check the article dedicated to Pinecone.

* [The ultimate LangChain series — Pinecone vector database](https://blog.davideai.dev/the-ultimate-langchain-series-pinecone-vector-database)
    

## Full code

```python
import os
from dotenv import load_dotenv
from langchain.vectorstores import DeepLake
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.embeddings import OpenAIEmbeddings
from langchain.callbacks import get_openai_callback

# Load environment variables from .env file
load_dotenv()

# Set environment variables
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY')
os.environ['ACTIVELOOP_TOKEN'] = os.getenv('ACTIVELOOP_TOKEN')
language_model = os.getenv('LANGUAGE_MODEL')

# Set DeepLake dataset path
DEEPLAKE_PATH = os.getenv('DATASET_PATH')

# Initialize OpenAI embeddings and disallow special tokens
EMBEDDINGS = OpenAIEmbeddings(disallowed_special=())

# Initialize DeepLake vector store with OpenAI embeddings
deep_lake = DeepLake(
    dataset_path=DEEPLAKE_PATH,
    read_only=True,
    embedding_function=EMBEDDINGS,
)

# Initialize retriever and set search parameters
retriever = deep_lake.as_retriever()
retriever.search_kwargs.update({
    'distance_metric': 'cos',
    'fetch_k': 100,
    'maximal_marginal_relevance': True,
    'k': 10,
})

# Initialize ChatOpenAI model
model = ChatOpenAI(model_name=language_model, temperature=0.2) # gpt-3.5-turbo by default. Use gpt-4 for better and more accurate responses 

# Initialize ConversationalRetrievalChain
qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)

# Initialize chat history
chat_history = []

def get_user_input():
    """Get user input and handle 'quit' command."""
    question = input("\nPlease enter your question (or 'quit' to stop): ")
    if question.lower() == 'quit':
        return None
    return question

def print_answer(question, answer):
    """Format and print question and answer."""
    print(f"\nQuestion: {question}\nAnswer: {answer}\n")

def main():
    """Main program loop."""
    while True:
        question = get_user_input()
        if question is None:  # User has quit
            break

        # Display token usage and approximate costs
        with get_openai_callback() as tokens_usage:
            result = qa({"question": question, "chat_history": chat_history})
            chat_history.append((question, result['answer']))
            print_answer(question, result['answer'])
            print(tokens_usage)

if __name__ == "__main__":
    main()
```

## Conclusion

And there you have it! You've built a powerful question-answering system using LangChain, OpenAI, and DeepLake. This system can take a user's question, search a vector database for relevant context, and generate a detailed answer. It's like having your very own AI-powered oracle! In the next articles I'll show you some other use cases and applications!
