The ultimate LangChain series — Environment setup
Set up the perfect Python environment to develop with LangChain
Let me introduce you to something that's been shaking up the developer scene: LangChain. It's this awesome framework built just for AI applications. From chatbots to recommendation engines, you name it, LangChain has got the tools to make it happen. And the best part? It's super easy to use. I'm all about getting the most bang for my buck, and LangChain fits the bill perfectly.
This is the first guide in the series. I want to keep it short and sweet so it's easy to follow and I can go into good detail. This guide is Python-centered, and today we'll go over how to create the perfect Python environment for developing with LangChain.
Check the LangChain docs as well.
What is LangChain?
First of all, what the heck is LangChain?
LangChain is a pretty versatile framework built with the aim of making it easier to develop apps that use language models like OpenAI's GPT-3 and GPT-4. It's all about being data-aware and agentic: it's designed to help language models connect with all kinds of data sources and interact with their environments. The modules it comes with handle different jobs, such as supporting different model types, managing prompts, keeping memory states persistent, and more.
What's cool is that LangChain can be used in all sorts of ways - think autonomous agents, personal assistants, question-answering systems, and even chatbots. It's super helpful for tricky tasks like querying structured data, understanding code, working with APIs, pulling out info, summarizing documents, and checking out generative models. If you want to get the most out of language models in your apps, LangChain is your go-to toolbox. It's a great resource, whether you're just starting out or you've been at this for a while.
Now, for our project today, we're gonna be using LangChain for all our 'AI-related' tasks. We're talking about generating embedding vectors, creating and querying a vector database - that kind of stuff.
If you are curious about what you can do with LangChain, check out the tool I made: Scrape and chat with repositories.
Prerequisites for using LangChain
Python can be a bit tricky to set up, and a solid foundation can avoid many headaches. Make sure to check the following boxes:
Operating System: Python is pretty flexible when it comes to operating systems, but some offer a better development experience than others. The following systems are listed in order of my personal preference, but you can make it work with any OS you prefer:
macOS: In my experience (I've worked with all three), macOS is the best option, giving fewer conflicts and a smoother setup.
Linux: This is the second-best option if you don't have access to a fancy Mac, and it will still offer a nice experience if your Linux is set up properly.
Windows with Windows Subsystem for Linux (WSL): If you're on Windows, no problem. Python works fine there, and I worked with it on Windows for a long time, but I recommend you have at least WSL set up. It's a compatibility layer that lets you run Linux binary executables natively on Windows, which will help.
Python 3.7 or higher: We're going to be working with many Python modules, so make sure your Python is up to date; I recommend Python 3.10.10.
Pip: This is Python's package installer. We'll need it to install some of the libraries that LangChain relies on. If you've got Python, chances are you've got Pip. If not, Python's built-in ensurepip module can usually install it.
Python Virtual Environment (venv): Trust me, you'll want to keep your project and its dependencies nicely contained. That's where Python's venv comes in handy. If you've never used it before, don't sweat it, it's really easy, and it comes with Python.
Note that everything in this guide is based on macOS.
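If you'd like to tick those boxes from code, here is a minimal sketch that uses only the standard library. The function name check_prerequisites is made up for this example, and the 3.7 minimum is simply the number from the checklist above.

```python
import importlib.util
import sys

# Minimum version taken from the checklist above.
MIN_PYTHON = (3, 7)


def check_prerequisites():
    """Return a dict mapping each prerequisite to True or False."""
    return {
        "python": sys.version_info >= MIN_PYTHON,
        # find_spec returns None when a module cannot be found.
        "pip": importlib.util.find_spec("pip") is not None,
        "venv": importlib.util.find_spec("venv") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Run it with the same python3 you plan to use for the project, since each interpreter has its own pip.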
Set up the perfect Python environment
So, once you've checked all the previous boxes, let's jump into the action.
Python virtual environment
The best way to keep your Python experience smooth is to always work in a virtual environment for every project you work on. My workflow is as follows:
Create a directory for my new shiny project that nobody is ever going to use LOL
Create a new virtual environment in it; I always try to name it something that reminds me of my project.
Activate the virtual environment.
Install the required packages and dependencies.
This has worked very well for me lately, since it keeps every project isolated in its own little box.
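The first two steps of that workflow can also be scripted. This is only a sketch, assuming you want to drive it from Python with the standard library's venv module; create_project and its arguments are hypothetical names for illustration. Activating the environment and installing packages still happen in your shell.

```python
import venv
from pathlib import Path


def create_project(root, env_name, with_pip=True):
    """Create a project directory with a virtual environment inside it."""
    project_dir = Path(root)
    # Step 1: the project directory.
    project_dir.mkdir(parents=True, exist_ok=True)
    # Step 2: the virtual environment, similar to `python3 -m venv <env_name>`.
    env_dir = project_dir / env_name
    venv.create(env_dir, with_pip=with_pip)
    return env_dir
```

For example, create_project("langchain-series", "langchain") would leave you with a langchain-series/langchain environment that you can then activate from the terminal.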
Create a new virtual environment
Once you decide where to keep your project, create a new virtual environment in the root directory. Python comes with the venv module built-in, and you can create a new one with this command:
python3 -m venv YOUR_VENV_NAME
So let's create a new virtual environment for our LangChain project:
python3 -m venv langchain
Then activate it:
source langchain/bin/activate
Now your terminal will display that you are in the virtual environment:
(langchain) soos@MacBookPro langchain-series %
Here you go; now you have a brand new and shiny Python environment and are ready to install some dependencies.
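If you ever doubt whether the environment is really active, you can ask Python itself. A small sketch, assuming the environment was created with venv as above (the function name in_virtual_environment is just for illustration):

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("interpreter:", sys.executable)
```

With the environment activated, the interpreter path printed at the end should live inside your langchain directory.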
Install the main LangChain modules
LangChain has a lot of modules and tools built in, but in general, you will need some extra packages. You can install all of them with Pip, and the following are the packages that I usually install right away before starting development.
python3 -m pip install --upgrade langchain deeplake pinecone-client openai tiktoken python-dotenv
Let's go over what we are doing here:
python3 -m pip: This runs the pip module using Python 3. This is a reliable way to ensure you're using the correct version of pip for your Python 3 interpreter.
install: This is the pip command to install packages.
--upgrade: This flag tells pip to upgrade any of these packages that are already installed to their latest versions.
langchain deeplake pinecone-client openai tiktoken python-dotenv: These are the names of the Python packages that you want to install. This command will install the latest versions of these packages.
Here's a quick summary of what each package is used for:
langchain: This is the main LangChain module, where we'll be able to use document loaders, text splitters, and other utilities.
deeplake: Deep Lake is a vector database provider; we need to store the processed data as vectors, so we need a vector database. Deep Lake allows you to easily create indexes locally or on the cloud, so it's usually my choice.
pinecone-client: This is the Python client for Pinecone, another vector database for machine learning applications and probably the best-known one. It's reliable and performant, but you might need to sign up on a waitlist to get a free account, so I don't always use it. I still like to have it available in case I need it.
openai: This is the official Python client library for the OpenAI API, which allows you to access powerful AI models; we'll use it for embedding models, chat models, and so on.
tiktoken: A library from OpenAI that lets you count the number of tokens in a text string without making an API call.
python-dotenv: This package allows you to specify environment variables in a .env file, which can be useful for managing secret keys and other configuration values.
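To double-check that everything landed in the environment, here is a rough sketch that looks up each package with the standard library's importlib.metadata (available from Python 3.8, so it assumes a slightly newer Python than the 3.7 minimum). The helper name installed_versions is made up for this example.

```python
from importlib import metadata  # standard library since Python 3.8

# Distribution names exactly as passed to pip in the install command above.
PACKAGES = [
    "langchain",
    "deeplake",
    "pinecone-client",
    "openai",
    "tiktoken",
    "python-dotenv",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Anything reported as not installed usually means the install ran with a different interpreter than the one in your virtual environment.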
And that's it; now you have the perfect Python environment to start with LangChain and language model applications.
In the next episode, we'll learn how to set up a project, import the main modules, and set up the environment variables.
Conclusion
And there you have it, folks. We've just dipped our toes into the exciting world of LangChain, a game-changer in the AI applications landscape. You'll understand why I keep saying this once you realize how easily you can make powerful apps with it. It's versatile, easy to use, and packed with a variety of modules to help you build anything from chatbots to powerful question-answering systems.
In this guide, we've covered the basics, from setting up your Python environment to understanding the key modules you'll need to start with LangChain. We've taken a look at how to create a Python virtual environment — a crucial step to keep your projects neat and tidy.