RAG chatbot using Cohere with ConversationBufferMemory and prompt template

Jeevaharan
Apr 27, 2024


In this blog post, we will discuss how to build a RAG chatbot using the Command model from Cohere. Our approach uses LangChain's prompt template to instruct the LLM and ConversationBufferMemory to store the chat conversation. Additionally, we will use Streamlit to build the chatbot UI. I have also written another blog post covering RAG, LangChain document loaders, text splitters, embeddings, and vector databases; you can find the link to that post below.

Cohere API:

Create your API key using the link given below.

Dot-env:

The dotenv package is a great way to keep API keys and other sensitive data out of your code. Install it using the command given below.

pip install python-dotenv

.env file
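A minimal .env might look like the following; COHERE_API_KEY is the environment variable the LangChain Cohere integrations look for, and the value here is just a placeholder:

COHERE_API_KEY=your-cohere-api-key

Then load it in your script: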

from dotenv import load_dotenv

load_dotenv()

The load_dotenv() function will load the env variables from the .env file.

Embeddings:

We will be using the “multilingual-22-12” model to create embeddings. A FAISS vector store will hold the embeddings so we can run similarity search over them.

embeddings = CohereEmbeddings(
    model="multilingual-22-12"
)
vectorstore = FAISS.from_documents(text_chunks, embeddings)
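As an optional sanity check (assuming text_chunks has already been produced by the text splitter), you can query the vector store directly; the query string below is made up purely for illustration:

# Retrieve the 3 chunks most similar to an example query
query = "What does the document say about refunds?"  # replace with your own question
docs = vectorstore.similarity_search(query, k=3)
for doc in docs:
    print(doc.page_content[:200])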

Prompt Template:

We can use the prompt template to guide the language model to respond in a specific manner. With a prompt template, the language model takes in both the retrieved context and the user query and can provide more accurate responses. A PromptTemplate requires a template string and its input variables.

prompt_template = """Text: {context}
Question: {question}
you are a chatbot designed to assist the users.
Answer only the questions based on the text provided. If the text doesn't contain the answer,
reply that the answer is not available.
keep the answers precise to the question"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
chain_type_kwargs = {"prompt": PROMPT}

The template above only instructs the LLM. There is also an option to provide the LLM with a few worked examples to help it understand the task, which is known as few-shot prompting. For more information on few-shot prompt templates, please refer to the link below.
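As a rough sketch of what that could look like with LangChain's FewShotPromptTemplate (the question/answer pairs below are made up purely for illustration):

from langchain.prompts import FewShotPromptTemplate, PromptTemplate

# Made-up examples showing the answer style we want
examples = [
    {"question": "What is the refund period?",
     "answer": "The refund period is 30 days."},
    {"question": "Who is the CEO?",
     "answer": "The answer is not available in the provided text."},
]

# How each individual example is rendered
example_prompt = PromptTemplate(
    template="Question: {question}\nAnswer: {answer}",
    input_variables=["question", "answer"],
)

few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix="Answer only from the text provided.\nText: {context}",
    suffix="Question: {question}\nAnswer:",
    input_variables=["context", "question"],
)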

Memory and Model:

Model:

We will be using the Command model from Cohere as our LLM. Command is built for language tasks such as conversation and summarization, with an emphasis on reliability. Cohere also offers other models in the Command family, such as Command-light and Command-nightly; check out the link below to explore models that may be more suitable for your specific use case.

Conversation Buffer Memory:

ConversationBufferMemory keeps the entire conversation in the buffer, up to the token limit of the model. ‘chat_history’ is used as the memory key under which all human and AI messages are stored. Because it holds the whole conversation, responses can be slower and the cost higher compared to the other memory types in LangChain.
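To see what the buffer actually stores, you can exercise the memory on its own; the sample exchange below is made up for illustration:

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)
# Save one made-up human/AI turn and inspect the buffer
memory.save_context({"input": "Hi there"}, {"output": "Hello! How can I help?"})
# Prints the stored HumanMessage/AIMessage list under the 'chat_history' key
print(memory.load_memory_variables({}))

Wiring the model and the memory into the retrieval chain looks like this: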

llm = Cohere(model="command", temperature=0)
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)
conversation_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    memory=memory,
    combine_docs_chain_kwargs=chain_type_kwargs
)

LangChain also provides other memory types such as ConversationBufferWindowMemory and ConversationSummaryMemory; a small sketch of the window variant is shown below.
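ConversationBufferWindowMemory keeps only the last k exchanges instead of the full history; this is a minimal sketch, and k=3 is just an arbitrary choice:

from langchain.memory import ConversationBufferWindowMemory

# Keep only the last 3 human/AI exchanges in the buffer
window_memory = ConversationBufferWindowMemory(
    k=3,
    memory_key='chat_history',
    return_messages=True,
)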

We will be using Streamlit to build the UI and store the chat history. Run the command below to start the Streamlit app:

streamlit run <file_name>.py

The complete code is given below:

import os
import streamlit as st
from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.prompts import PromptTemplate
from langchain_community.llms import Cohere
from langchain.embeddings.cohere import CohereEmbeddings
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from langchain_community.document_loaders import PyPDFLoader


def get_pdf_text():
    pdf_path = "<Mention your PDF path here>"  # path to your PDF file
    loader = PyPDFLoader(file_path=pdf_path)
    doc = loader.load()
    return doc


def get_text_chunks(text):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000,
                                                   chunk_overlap=500,
                                                   separators=["\n\n", "\n", " ", ""])
    text = text_splitter.split_documents(documents=text)
    return text


def get_vectorstore(text_chunks, query):
    embeddings = CohereEmbeddings(
        model="multilingual-22-12"
    )
    vectorstore = FAISS.from_documents(text_chunks, embeddings)

    # Prompt template
    prompt_template = """Text: {context}
Question: {question}
you are a chatbot designed to assist the users.
Answer only the questions based on the text provided. If the text doesn't contain the answer,
reply that the answer is not available.
keep the answers precise to the question"""

    PROMPT = PromptTemplate(
        template=prompt_template, input_variables=["context", "question"]
    )
    chain_type_kwargs = {"prompt": PROMPT}

    # LLM
    llm = Cohere(model="command", temperature=0)
    memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)
    conversation_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vectorstore.as_retriever(),
        memory=memory,
        combine_docs_chain_kwargs=chain_type_kwargs
    )

    response = conversation_chain({"question": query})
    return response.get("answer")

def main():
    load_dotenv()
    st.set_page_config(page_title="Chat Assistant")

    # get pdf text
    raw_text = get_pdf_text()

    # get the text chunks
    text_chunks = get_text_chunks(raw_text)

    user_question = st.chat_input("Ask a Question")

    if "messages" not in st.session_state.keys():
        st.session_state["messages"] = [{"role": "assistant",
                                         "content": "Hello there, how can I help you?"}]

    if "messages" in st.session_state.keys():
        for message in st.session_state.messages:
            with st.chat_message(message["role"]):
                st.write(message["content"])

    if user_question is not None:
        st.session_state.messages.append({
            "role": "user",
            "content": user_question
        })

        with st.chat_message("user"):
            st.write(user_question)

    if st.session_state.messages[-1]["role"] != "assistant":
        with st.chat_message("assistant"):
            with st.spinner("Loading"):
                output = get_vectorstore(text_chunks, user_question)
                ai_response = output
                st.write(ai_response)

        new_ai_message = {"role": "assistant", "content": ai_response}
        st.session_state.messages.append(new_ai_message)


if __name__ == '__main__':
    main()

GitHub Repository Link:

I hope this blog helped you understand Cohere, prompt templates, and buffer memory.

Thanks for reading!
