Building a QA Chatbot with Memory Using LangChain, FAISS, Streamlit and OpenAI (Retrieval-Augmented Generation)

Jeevaharan
5 min read · Feb 12, 2024


We will be using the following tech stack to build the AI-powered chatbot:

  1. LangChain (a framework for developing applications powered by language models)
  2. Streamlit (for building machine learning and data science web apps)
  3. Vector database (FAISS — Facebook AI Similarity Search)
  4. Large language model (OpenAI GPT)

Retrieval-Augmented Generation (RAG)

Retrieval-augmented generation (RAG) improves the accuracy of generative AI models by supplying them with the custom data they need. In simple terms, it lets LLMs chat with our local or domain-specific data. Although LLMs like GPT are trained on huge datasets, they may not have access to all the information we care about, for example, domain-specific knowledge or confidential data within a company. RAG comes in handy in such cases: we provide the external data, and the relevant pieces are retrieved at query time. Additionally, RAG is a cost-effective way to maintain the accuracy and relevance of the LLM's output.

RAG Workflow

High-level overview of the chatbot:

Chatbot architecture

Setup:

Before building the application, you will need to install the following Python libraries: OpenAI, LangChain, FAISS, and Streamlit.
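For a typical setup, the packages can be installed with pip (exact package names can vary by version; faiss-cpu is the CPU-only FAISS build, and pypdf is required by the PyPDFLoader used below):

pip install openai langchain faiss-cpu streamlit pypdf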

  1. Loading data using PyPDFLoader

LangChain provides various document loaders, such as TextLoader, CSVLoader, UnstructuredHTMLLoader, JSONLoader, and more, depending on the input type. We will be using PyPDFLoader to load the PDF document.

os.environ['OPENAI_API_KEY'] = "<OpenAI API Key>"

def document_data(query, chat_history):

    pdf_path = '<Pdf Path>'
    loader = PyPDFLoader(file_path=pdf_path)
    doc = loader.load()

Refer to the link below to learn more about the types of loaders in LangChain.

https://python.langchain.com/docs/modules/data_connection/document_loaders/
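As a quick illustration, the other loaders follow the same load() pattern. This is a minimal sketch using TextLoader on a hypothetical notes.txt file (the path is a placeholder, not from this project):

from langchain.document_loaders import TextLoader

# Hypothetical example: load a plain-text file instead of a PDF
loader = TextLoader(file_path="notes.txt")  # "notes.txt" is a placeholder path
docs = loader.load()
print(len(docs), docs[0].metadata)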

2. Text Splitting using RecursiveCharacterTextSplitter

We will be using RecursiveCharacterTextSplitter. It splits the input text recursively, trying each separator in the list in order until the chunks are small enough.

Chunk Size: The maximum number of characters a chunk can contain.
Chunk Overlap: The number of characters shared between adjacent chunks.
Choosing an appropriate chunk size and overlap is crucial and depends on the problem you are solving; experiment with different values to find what works best, as in the sketch after the next snippet.

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000,
                                               chunk_overlap=100,
                                               separators=["\n\n", "\n", " ", ""])
text = text_splitter.split_documents(documents=doc)
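To get a feel for how these parameters interact, here is a small standalone sketch (the tiny chunk size and sentence are toy values chosen only to make the overlap visible, not the settings used in the app):

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=20,
                                          chunk_overlap=5,
                                          separators=["\n\n", "\n", " ", ""])
chunks = splitter.split_text("LangChain splits long documents into smaller overlapping chunks.")
for chunk in chunks:
    print(repr(chunk))  # adjacent chunks share a few trailing/leading characters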

3. Creating embeddings using OpenAI and storing the vectors in FAISS

We will be using OpenAI embeddings to create the vectors. In general, embeddings capture how closely words and sentences are related to each other. These embeddings are stored in vector databases in numerical form: the smaller the distance between two vectors, the more closely related they are. This is how large language models can retrieve the specific information they need from huge amounts of data.

A vector database is designed to store and query vector data, enabling accurate retrieval. There are several vector databases on the market, such as Pinecone, FAISS, Weaviate, and ChromaDB. We will be using FAISS (Facebook AI Similarity Search), which performs similarity search to find embeddings that are close to each other. If you are working with the same input data and want to avoid the cost of creating embeddings each time, you can create them once, save the vectors locally, and load them whenever required.

embeddings = OpenAIEmbeddings()

vectorstore = FAISS.from_documents(text, embeddings)
vectorstore.save_local("vectors")
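To see what "closeness" means in practice, here is a small sketch comparing two embeddings with cosine similarity (it assumes OPENAI_API_KEY is set and numpy is installed; the sentences are made up for illustration):

import numpy as np
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
# Two hypothetical sentences with similar meaning
v1 = embeddings.embed_query("The cat sat on the mat")
v2 = embeddings.embed_query("A kitten rested on the rug")
# Cosine similarity near 1.0 means the vectors are closely related
similarity = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(f"cosine similarity: {similarity:.3f}")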

4. ConversationalRetrievalChain

To have a conversation with a document, we can use ConversationalRetrievalChain. It takes the input query along with the whole chat history, so the chatbot understands the conversation and answers questions in context. You can also use prompt engineering techniques to communicate more effectively with the LLM. This allows large language models to generate precise answers to user questions.

# Loading the saved embeddings
loaded_vectors = FAISS.load_local("vectors", embeddings)

# ConversationalRetrievalChain
qa = ConversationalRetrievalChain.from_llm(
    llm=OpenAI(),
    retriever=loaded_vectors.as_retriever()
)

return qa({"question": query, "chat_history": chat_history})
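To illustrate how the chat history gives the chain memory, here is a hypothetical two-turn session with the document_data function above (the questions are made up; the second only makes sense if the chain can see the first):

chat_history = []

# Turn 1: a standalone question
result = document_data("What is the refund policy?", chat_history)
chat_history.append(("What is the refund policy?", result["answer"]))

# Turn 2: "it" refers to the previous turn, so the chain needs the history
result = document_data("Does it apply to digital products?", chat_history)
print(result["answer"])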

Refer to the link below to understand the different chains and their use cases.

https://python.langchain.com/docs/modules/chains

5. User Interface: Streamlit

Several libraries are available for creating data science web apps, such as Gradio, Chainlit, Streamlit, and Flask. For our chatbot, we will be using Streamlit, an open-source Python library that lets you create and share web apps in minutes.
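As a minimal sketch of the two Streamlit chat primitives we rely on (st.chat_input and st.chat_message), independent of any RAG logic, here is a tiny echo app:

import streamlit as st

st.header("Echo demo")
user_text = st.chat_input("Say something")
if user_text:
    # Render the user's message, then a trivial "assistant" reply
    st.chat_message("user").write(user_text)
    st.chat_message("assistant").write(f"You said: {user_text}")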

Getting the input (User Query)

if __name__ == '__main__':

    st.header("QA ChatBot")
    # ChatInput
    prompt = st.chat_input("Enter your questions here")

    if "user_prompt_history" not in st.session_state:
        st.session_state["user_prompt_history"] = []
    if "chat_answers_history" not in st.session_state:
        st.session_state["chat_answers_history"] = []
    if "chat_history" not in st.session_state:
        st.session_state["chat_history"] = []

Appending the response, user query, and chat history

    if prompt:
        with st.spinner("Generating......"):
            output = document_data(query=prompt, chat_history=st.session_state["chat_history"])

            # Storing the questions, answers and chat history
            st.session_state["chat_answers_history"].append(output['answer'])
            st.session_state["user_prompt_history"].append(prompt)
            st.session_state["chat_history"].append((prompt, output['answer']))

Displaying the chat

We can display the user queries and the bot's replies by looping through the chat_answers_history and user_prompt_history objects.

    if st.session_state["chat_answers_history"]:
        for i, j in zip(st.session_state["chat_answers_history"], st.session_state["user_prompt_history"]):
            message1 = st.chat_message("user")
            message1.write(j)
            message2 = st.chat_message("assistant")
            message2.write(i)

Please find the complete code below.

import os
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.document_loaders import PyPDFLoader
from langchain.vectorstores import FAISS
from langchain.chains import ConversationalRetrievalChain
from langchain.llms import OpenAI
import streamlit as st


os.environ['OPENAI_API_KEY'] = "<OpenAI API Key>"

def document_data(query, chat_history):

    pdf_path = '<Pdf Path>'
    loader = PyPDFLoader(file_path=pdf_path)
    doc = loader.load()

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000,
                                                   chunk_overlap=100,
                                                   separators=["\n\n", "\n", " ", ""])
    text = text_splitter.split_documents(documents=doc)

    # Creating embeddings using OpenAI
    embeddings = OpenAIEmbeddings()

    vectorstore = FAISS.from_documents(text, embeddings)
    vectorstore.save_local("vectors")
    print("Embeddings successfully saved in vector database and saved locally")

    # Loading the saved embeddings
    loaded_vectors = FAISS.load_local("vectors", embeddings)

    # ConversationalRetrievalChain
    qa = ConversationalRetrievalChain.from_llm(
        llm=OpenAI(),
        retriever=loaded_vectors.as_retriever()
    )

    return qa({"question": query, "chat_history": chat_history})


if __name__ == '__main__':

    st.header("QA ChatBot")
    # ChatInput
    prompt = st.chat_input("Enter your questions here")

    if "user_prompt_history" not in st.session_state:
        st.session_state["user_prompt_history"] = []
    if "chat_answers_history" not in st.session_state:
        st.session_state["chat_answers_history"] = []
    if "chat_history" not in st.session_state:
        st.session_state["chat_history"] = []

    if prompt:
        with st.spinner("Generating......"):
            output = document_data(query=prompt, chat_history=st.session_state["chat_history"])

            # Storing the questions, answers and chat history
            st.session_state["chat_answers_history"].append(output['answer'])
            st.session_state["user_prompt_history"].append(prompt)
            st.session_state["chat_history"].append((prompt, output['answer']))

    # Displaying the chat history
    if st.session_state["chat_answers_history"]:
        for i, j in zip(st.session_state["chat_answers_history"], st.session_state["user_prompt_history"]):
            message1 = st.chat_message("user")
            message1.write(j)
            message2 = st.chat_message("assistant")
            message2.write(i)

Run the command below to start the Streamlit app:

streamlit run <file name.py>

GitHub Repository Link

Chatbot’s UI

I hope this blog helped you understand the concept of RAG and how to develop chatbots with custom data. Thanks for reading!

Check out my other blog on building a RAG chatbot using Cohere with conversation buffer memory and a prompt template, linked below.
