RAG chatbot using Cohere with ConversationBufferMemory and prompt template

Jeevaharan
Apr 27, 2024


In this blog post, we will discuss how to build a RAG chatbot using the Command model from Cohere. Our approach uses LangChain's prompt template to instruct the LLM and ConversationBufferMemory to store the chat conversation. Additionally, we will use Streamlit to build the chatbot UI. I have also written another blog post covering RAG, LangChain document loaders, text splitters, embeddings, and vector databases; you can find the link to that post below.

Cohere API:

Create your API key using the link given below.

Dot-env:

The dotenv package is a great way to keep API keys and other sensitive data out of your code. Install it using the command given below.

pip install python-dotenv

.env file
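A minimal .env might look like the following; COHERE_API_KEY is the environment variable the LangChain Cohere integrations look for, and the value here is just a placeholder:

COHERE_API_KEY=your-cohere-api-key

Then load it in your script: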

from dotenv import load_dotenv

load_dotenv()

The load_dotenv() function will load the env variables from the .env file.

Embeddings:

We will be using the “multilingual-22-12” model to create embeddings. A FAISS vector store will hold the embeddings so we can run similarity search over them.

embeddings = CohereEmbeddings(
    model="multilingual-22-12"
)
vectorstore = FAISS.from_documents(text_chunks, embeddings)
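As an optional sanity check (assuming text_chunks has already been produced by the text splitter), you can query the vector store directly; the query string below is made up purely for illustration:

# Retrieve the 3 chunks most similar to an example query
query = "What does the document say about refunds?"  # replace with your own question
docs = vectorstore.similarity_search(query, k=3)
for doc in docs:
    print(doc.page_content[:200])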

Prompt Template:

We can use the prompt template to guide the language model to respond in a specific manner. With a prompt template, the language model takes in both the retrieved context and the user query and can provide more accurate responses. A PromptTemplate requires a template string and its input variables.

prompt_template = """Text: {context}
Question: {question}
you are a chatbot designed to assist the users.
Answer only the questions based on the text provided. If the text doesn't contain the answer,
reply that the answer is not available.
keep the answers precise to the question"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
chain_type_kwargs = {"prompt": PROMPT}

The template above only instructs the LLM. There is also an option to provide the LLM with a few worked examples to help it understand the task, which is known as few-shot prompting. For more information on few-shot prompt templates, please refer to the link below.
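As a rough sketch of what that could look like with LangChain's FewShotPromptTemplate (the question/answer pairs below are made up purely for illustration):

from langchain.prompts import FewShotPromptTemplate, PromptTemplate

# Made-up examples showing the answer style we want
examples = [
    {"question": "What is the refund period?",
     "answer": "The refund period is 30 days."},
    {"question": "Who is the CEO?",
     "answer": "The answer is not available in the provided text."},
]

# How each individual example is rendered
example_prompt = PromptTemplate(
    template="Question: {question}\nAnswer: {answer}",
    input_variables=["question", "answer"],
)

few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix="Answer only from the text provided.\nText: {context}",
    suffix="Question: {question}\nAnswer:",
    input_variables=["context", "question"],
)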

Memory and Model:

Model:

We will be using the Command model from Cohere as our LLM. Command is built for language tasks such as conversation and summarization, with an emphasis on reliability. Cohere also offers other models in the Command family, such as Command-light and Command-nightly; check out the link below to explore models that may be more suitable for your specific use case.

Conversation Buffer Memory:

ConversationBufferMemory keeps the entire conversation in the buffer, up to the token limit of the model. ‘chat_history’ is used as the memory key under which all human and AI messages are stored. Because it holds the whole conversation, responses can be slower and the cost higher compared to the other memory types in LangChain.
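To see what the buffer actually stores, you can exercise the memory on its own; the sample exchange below is made up for illustration:

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)
# Save one made-up human/AI turn and inspect the buffer
memory.save_context({"input": "Hi there"}, {"output": "Hello! How can I help?"})
# Prints the stored HumanMessage/AIMessage list under the 'chat_history' key
print(memory.load_memory_variables({}))

Wiring the model and the memory into the retrieval chain looks like this: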

llm = Cohere(model="command", temperature=0)
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)
conversation_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    memory=memory,
    combine_docs_chain_kwargs=chain_type_kwargs
)

LangChain also provides other memory types such as ConversationBufferWindowMemory and ConversationSummaryMemory; a small sketch of the window variant is shown below.
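ConversationBufferWindowMemory keeps only the last k exchanges instead of the full history; this is a minimal sketch, and k=3 is just an arbitrary choice:

from langchain.memory import ConversationBufferWindowMemory

# Keep only the last 3 human/AI exchanges in the buffer
window_memory = ConversationBufferWindowMemory(
    k=3,
    memory_key='chat_history',
    return_messages=True,
)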

We will be using Streamlit to build the UI and store the chat history. Run the command below to start the Streamlit app:

streamlit run <file_name>.py

The complete code is given below:

import os
import streamlit as st
from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.prompts import PromptTemplate
from langchain_community.llms import Cohere
from langchain.embeddings.cohere import CohereEmbeddings
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from langchain_community.document_loaders import PyPDFLoader


def get_pdf_text():
    pdf_path = "<Mention your PDF path here>"  # path to your PDF file
    loader = PyPDFLoader(file_path=pdf_path)
    doc = loader.load()
    return doc


def get_text_chunks(text):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000,
                                                   chunk_overlap=500,
                                                   separators=["\n\n", "\n", " ", ""])
    text = text_splitter.split_documents(documents=text)
    return text


def get_vectorstore(text_chunks, query):
    embeddings = CohereEmbeddings(
        model="multilingual-22-12"
    )
    vectorstore = FAISS.from_documents(text_chunks, embeddings)

    # Prompt template
    prompt_template = """Text: {context}
Question: {question}
you are a chatbot designed to assist the users.
Answer only the questions based on the text provided. If the text doesn't contain the answer,
reply that the answer is not available.
keep the answers precise to the question"""

    PROMPT = PromptTemplate(
        template=prompt_template, input_variables=["context", "question"]
    )
    chain_type_kwargs = {"prompt": PROMPT}

    # LLM
    llm = Cohere(model="command", temperature=0)
    memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)
    conversation_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vectorstore.as_retriever(),
        memory=memory,
        combine_docs_chain_kwargs=chain_type_kwargs
    )

    response = conversation_chain({"question": query})
    return response.get("answer")

def main():
    load_dotenv()
    st.set_page_config(page_title="Chat Assistant")

    # get pdf text
    raw_text = get_pdf_text()

    # get the text chunks
    text_chunks = get_text_chunks(raw_text)

    user_question = st.chat_input("Ask a Question")

    if "messages" not in st.session_state.keys():
        st.session_state["messages"] = [{"role": "assistant",
                                         "content": "Hello there, how can I help you?"}]

    if "messages" in st.session_state.keys():
        for message in st.session_state.messages:
            with st.chat_message(message["role"]):
                st.write(message["content"])

    if user_question is not None:
        st.session_state.messages.append({
            "role": "user",
            "content": user_question
        })

        with st.chat_message("user"):
            st.write(user_question)

    if st.session_state.messages[-1]["role"] != "assistant":
        with st.chat_message("assistant"):
            with st.spinner("Loading"):
                output = get_vectorstore(text_chunks, user_question)
                ai_response = output
                st.write(ai_response)

        new_ai_message = {"role": "assistant", "content": ai_response}
        st.session_state.messages.append(new_ai_message)


if __name__ == '__main__':
    main()

GitHub Repository Link:

I hope this blog helped you understand Cohere, prompt templates, and buffer memory.

Thanks for reading!
