import os
import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI
from langchain_text_splitters import RecursiveCharacterTextSplitter
load_dotenv()
What is Retrieval Augmented Generation (RAG)?
Retrieval Augmented Generation (RAG) is one of the most popular approaches to providing LLMs with external information before they generate a response.
RAG is a technique where you retrieve the information required to solve a user’s query, then augment the context of the LLM with that information, and generate a response. In this tutorial, you’ll learn why RAG is useful, when to use it, and how to build your own RAG pipeline, step-by-step, using Python.
Let’s get started!
What is RAG?
It’s a technique to improve LLM answers by providing them with external information before they generate a response. It consists of three steps:
- Retrieve: The system starts by searching a specific knowledge base for relevant information about the query.
- Augment: This retrieved information is added to the context that the LLM uses to generate a response.
- Generate: The LLM uses both your question and the provided information to generate an answer.
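To make the three steps concrete, here is a toy sketch. It assumes a tiny in-memory knowledge base and a fake LLM so it runs offline; the function names and the word-overlap ranking are illustrative stand-ins, not the pipeline built later in this tutorial.

```python
def retrieve(query: str, knowledge_base: list[str], top_k: int = 1) -> list[str]:
    """Rank passages by how many query words they share (a toy stand-in for real retrieval)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        knowledge_base,
        key=lambda passage: len(query_words & set(passage.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def augment(query: str, passages: list[str]) -> str:
    """Build the prompt: the question plus the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Question: {query}\nContext:\n{context}"


def generate(prompt: str, llm) -> str:
    """Send the augmented prompt to any callable that maps a prompt to text."""
    return llm(prompt)


# Toy data, made up for illustration.
knowledge_base = [
    "The card has a daily purchase limit of 1,000 euros.",
    "The bank was founded a long time ago.",
]
query = "What is the daily purchase limit?"
passages = retrieve(query, knowledge_base)
prompt = augment(query, passages)
# A fake LLM so the sketch runs offline; swap in a real model call in practice.
answer = generate(prompt, llm=lambda p: f"(answer based on {len(passages)} passage)")
```

Each function maps to one step, so you can swap any of them (for example, replacing the word-overlap ranking with a vector search) without touching the others.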
In addition to reducing costs and latency, RAG is useful because it reduces hallucinations, lets you use current data, and builds trust with users by (potentially) providing citations.
Vector databases
A vector database (VectorDB) is a database designed to store and query data as vector embeddings (numerical representations). So, provided with a user query, it’s the engine you use to find the most similar data in your database. It’s one of the most popular components of the retrieval step in RAG pipelines.
In recent years, many new vector databases have been created. But, in most cases, they had to re-discover that many of the ideas in the old generation of vector databases such as BM25-based retrieval were still valid and useful.
Some popular vector databases are:
- New generation: Qdrant, Chroma, Pinecone, Weaviate.
- Old generation: Elasticsearch/OpenSearch and Postgres+PGVector
In this tutorial, you’ll use Chroma. For client projects, I’ve used Elasticsearch, Postgres, Weaviate, and Qdrant. Many companies are already using Elasticsearch or Postgres, so it’s often easier to get started with them.
Why use a VectorDB?
If you have a small dataset, there’s no real reason to use a vector database. But if you’re dealing with thousands or millions of documents, you’ll need to use a vector database to efficiently retrieve the most relevant documents.
Retrieving only the most relevant documents is useful because:
- The more noise in the context provided to the LLM, the more likely it is to produce bad output.
- It takes more time to process a longer context.
- It costs more to process a longer context.
Retrieval
Retrieval is the process of finding the most relevant documents in the vector database. There are two main approaches when dealing with text-based data: term-based retrieval and embedding-based retrieval.
Term-based retrieval
Term-based retrieval is a technique that uses the terms in the query to find the most relevant documents in the vector database.
It’s based on the following ideas:
- TF-IDF: Combines term frequency (TF), how often a term appears in a document, with inverse document frequency (IDF), how rare the term is across all documents. It highlights terms that are important and distinctive for a specific document.
- Okapi BM25: Expands TF-IDF to introduce a weighting mechanism for term saturation and document length.
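As a rough illustration of how these ideas fit together, here is a minimal BM25 scorer in plain Python. It assumes whitespace tokenization, the common defaults k1=1.5 and b=0.75, and the non-negative IDF variant; production search engines add proper tokenization and indexing on top of this.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    avg_len = sum(len(tokens) for tokens in tokenized) / len(tokenized)
    n_docs = len(tokenized)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Number of documents that contain the term.
            df = sum(1 for other in tokenized if term in other)
            # IDF: rare terms get a higher weight (the "+ 1" keeps it non-negative).
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            # k1 controls term saturation, b controls document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy documents, made up for illustration.
docs = [
    "daily purchase limit of the card",
    "fees for using the card abroad",
    "how to block the card",
]
scores = bm25_scores("purchase limit", docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Only the first document contains the query terms, so it is the only one with a non-zero score.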
Embedding-based retrieval
Embedding-based retrieval is a technique that uses the embedding of the query to find the most relevant documents in the vector database.
For small datasets, you can use the k-Nearest Neighbors (k-NN) approach to find the most relevant documents: calculate the similarity score between the query vector and every other vector stored in the VectorDB, sort all the vectors by these scores, and return the ‘k’ most similar vectors (relative to the query).
For larger datasets, you can use Approximate Nearest Neighbors (ANN) such as Locality-Sensitive Hashing (LSH) or Hierarchical Navigable Small World (HNSW) to find the most relevant documents.
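The exact k-NN search described above can be sketched in a few lines. This version assumes cosine similarity as the score and uses tiny two-dimensional vectors made up for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn(query_vector: list[float], vectors: list[list[float]], k: int = 2) -> list[int]:
    """Return the indices of the k vectors most similar to the query (exact search)."""
    scored = [(cosine_similarity(query_vector, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]


vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
nearest = knn([1.0, 0.05], vectors, k=2)  # indices of the two closest vectors
```

Because every stored vector is compared with the query, the cost grows linearly with the size of the collection, which is why ANN methods become attractive for larger datasets.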
Prerequisites
To follow this tutorial you’ll need to:
- Sign up and generate an API key in OpenAI.
- Set the API key as an environment variable called OPENAI_API_KEY.
- Download the sample PDF file.
- Create a virtual environment in Python and install the requirements:
python -m venv venv
source venv/bin/activate
pip install langchain chromadb langchain-openai langchain-community python-dotenv pypdf jupyter
Once you’ve completed the steps above, you can copy and paste the code from the next sections and run it. You can also download the notebook from here.
RAG without vector database
Let’s go through an example without a VectorDB. We’ll simply augment with the full text of the document.
First, import the necessary libraries and load the required variables from the .env file.
Read the document (retrieval)
Next, we’ll use a LangChain DocumentLoader to load the document. Since we’re dealing with a PDF file, we’ll use the PyPDFLoader.
There are many DocumentLoaders available in LangChain. You can find the full list here.
A document loader is a class that processes a document and returns a list of Document objects. In the case of the PyPDFLoader, it will read each page of the PDF file and return the text of each page with some additional metadata.
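A minimal sketch of the loading step could look like this. The file name bbva.pdf and both helper names are assumptions for illustration; the only LangChain calls used are PyPDFLoader(path) and its load() method.

```python
def load_pages(pdf_path: str):
    """Load a PDF as a list of LangChain Document objects, one per page."""
    # Imported here so the helper below can be used without LangChain installed.
    from langchain_community.document_loaders import PyPDFLoader

    return PyPDFLoader(pdf_path).load()


def summarize_pages(pages) -> list[tuple]:
    """Return (page_label, character_count) for each loaded page."""
    return [(page.metadata.get("page_label"), len(page.page_content)) for page in pages]


# Usage (assumes the sample PDF was saved as "bbva.pdf", a hypothetical path):
# pages = load_pages("bbva.pdf")
# print(summarize_pages(pages))
```

The rest of this section assumes the loaded pages are stored in a variable called pages.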
A single page will look like this:
{'id': None,
'metadata': {'producer': 'Adobe PDF Library 15.0',
'creator': 'Adobe InDesign 16.1 (Windows)',
'creationdate': '2021-03-24T14:51:54+01:00',
'moddate': '2021-03-24T14:51:54+01:00',
'trapped': '/False',
'source': '../_extras/what-is-rag/bbva.pdf',
'total_pages': 4,
'page': 0,
'page_label': '1'},
'page_content': "EDICIÓN AQUA PREP 01-01\nBANCO BILBAO VIZCAYA ARGENTARIA, S.A. - Plaza de San Nicolás, 4 - 48005 BILBAO\nReg. Mer. Vizcaya -T omo 3858, Folio 1, Hoja BI-17 BIS-A, Inscripción 1035ª C.I.F.: A48265169\n1 / 4\nThis document contains the Pre-contractual information and the Prior General Information of the Aqua Pre-paid Card contract \n(hereinafter, the Card) in accordance with the provisions of the Ministerial Order ECE/1263/2019, on the transparency of \ninformation conditions applicable to payment services, and Bank of Spain Circular 5/2012, on the transparency of banking services \nand responsibility in the granting of loans.\nThe information highlighted in bold is especially important, in accordance with Circular 5/2012\n1. ON THE PAYMENT SERVICE PROVIDER\n1.1 Details and registration\nBANCO BILBAO VIZCAYA ARGENTARIA, S.A.\nAddress: Plaza San Nicolás, 4 - 48005 BILBAO. \nPhone number: 900 102 801\nWebsite address: www.bbva.es\nRegistered in the Biscay Commercial Register, Volume 2083, \nFolio 1, Sheet BI-17-A, Entry 1\n1.2 Supervisory Authorities:\nBanco de España (Registry 0182)\n[Spanish National Securities Market Commission]\n2. ON THE USE OF THE PAYMENT SERVICES\n2.1 Main characteristics: PREPAID CARD .\nThe Holder may specify that the card be physical or virtual. 
\nT erms and conditions governing the availability of funds: in \nother words, when and how the holder will obtain the money:\na) The Card, against a balance previously loaded on it, \nmay be used to purchase goods or services in any of \nthe physical or virtual establishments affiliated with the \ncard systems to which the Card belongs and that are \nshown on it.\nb) T o make online payments with the Card, the Account \nHolder must consult the details pertaining to the card \nnumber, expiration date and CVV via the BBVA website \nor mobile app.\nc) Withdraw money from ATMs, Bank branches and \nany other entities that allow it against the balance \npreviously loaded on it.\nT ransactions carried out with the Card will reduce the \navailable balance.\nUnder no circumstances may transactions be carried out \nin excess of the current unused loaded balance at any time \n(available balance).\n2.2 Conducting transactions. Consent.\nT o withdraw money or pay with the Card in physical \nestablishments, you must present the Card and enter your \npersonal identification number (PIN).\nThe Card's contactless technology can be used to pay or \nwithdraw cash with the Card without having to enter the PIN for \ntransactions under 50 euros.\nFor online shop purchases, you must identify yourself in the \nmanner indicated by the Bank, enter the security password and \nfollow the procedure specified by the Bank..\n2.3 Execution period\nThe transactions will be charged to the Direct Debit Account on \nthe date on which they were executed.\nPre-contractual information and \ninformation booklet prior to \nconcluding the payment services \ncontract\nAQUA PRE-PAID CARD",
'type': 'Document'}
You can see that in addition to the page content, it includes metadata about the source file, the page number, etc.
This document describes the conditions of a specific banking product, a prepaid card. We’ll use it to answer questions about that product.
Augment the context
Now that we have all the pages of the PDF available as text, let’s build the context we’ll use to generate a response.
We’ll define a system and a user prompt. In the system prompt, we’ll define the role of the assistant and in the user prompt, we’ll provide the user question and the documents.
system_prompt = """
You are a helpful assistant that can answer questions about the provided context.
Please cite the page number used to answer the question. Write the page number in the format "Page X" at the end of your answer.
If the answer is not found in the context, please say so.
"""
user_prompt = """
Please answer the following question based on the context provided:
Question: {question}
Documents:
{documents}
"""
pages_str = ""
for i, page in enumerate(pages):
    pages_str += f"--- PAGE {i + 1} ---\n{page.page_content}\n\n"
We’ve set up the system and user prompts, and a variable that stores the pages we extracted as a single string. When we make a request to the model, we’ll combine all of these into messages and send them to the model.
Now, we’re ready to generate a response.
Generate response
To generate a response, we’ll use gpt-4.1-mini and combine the system and user prompts we’ve built to augment the model’s context.
model = ChatOpenAI(model="gpt-4.1-mini", temperature=0)
def get_response(context_vars: dict):
    messages = [
        SystemMessage(content=system_prompt),
        HumanMessage(content=user_prompt.format(**context_vars)),
    ]
    response = model.invoke(messages)
    return response.content
question = "What is the main idea of the document?"
response = get_response({"question": question, "documents": pages_str})
print(response)
The main idea of the document is to provide the pre-contractual and general information regarding the Aqua Pre-paid Card offered by Banco Bilbao Vizcaya Argentaria, S.A. (BBVA). It outlines the terms and conditions of the card, including its features, usage, fees, security measures, responsibilities of the cardholder and the bank, contract duration, amendments, termination, applicable law, dispute resolution procedures, and other important legal aspects. The document aims to ensure transparency and inform potential cardholders about their rights and obligations before entering into the contract.
Page 1 to Page 4
In this code, we’ve combined the system, user prompt, the pages extracted from the document, and a user question (“What is the main idea of the document?”) into messages the model can understand.
If you run the code, you’ll get an accurate answer from the model. Try running it with a different question.
question = "What are the daily transaction limits?"
response = get_response({"question": question, "documents": pages_str})
print(response)
The daily transaction limits for the Aqua Pre-paid Card are as follows: The daily purchase limit will be determined by the Card's balance and up to a maximum of 1,000 euros per day. The Holder and the Bank may modify the initially specified limits. Additionally, the monthly limit for collecting lottery and gambling prizes is ten thousand euros. (Page 2)
As long as the document contains the information you need, you will likely get an accurate answer from the model.
But you can do better. Right now, the model is using the full text of the document to answer the question. Most questions only require a few sentences from the document.
To answer “What are the daily transaction limits?”, the model used 3,528 input tokens, even though it needed fewer than 500.
For small documents such as this one, the difference isn’t a big deal. But when you’re dealing with thousands of documents and potentially millions of tokens, the difference can be significant in terms of costs, latency, and accuracy.
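A quick back-of-the-envelope calculation shows how much of that prompt was unnecessary. It uses the token count reported above and treats 500 tokens as an upper bound on what the answer needed.

```python
full_document_tokens = 3528  # input tokens used when sending the whole document
needed_tokens = 500          # upper bound on the tokens the answer needed

# Share of the input that was not needed to answer the question.
savings = 1 - needed_tokens / full_document_tokens
# How many times larger the prompt was than necessary.
ratio = full_document_tokens / needed_tokens

print(f"Unneeded input tokens: {savings:.0%}")          # prints "Unneeded input tokens: 86%"
print(f"Prompt size vs. what was needed: {ratio:.1f}x")  # prints "... 7.1x"
```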
Let’s see how we can use a VectorDB to improve this.
RAG with vector search
You’ll need to start by doing two things: defining an embedding function, and creating a VectorDB.
In this example, we’ll use the OpenAIEmbeddingFunction to create embeddings and Chroma to store them.
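A minimal sketch of that setup might look like the following. The collection name and the embedding model (text-embedding-3-small) are assumptions, not values from this tutorial, and the function name is hypothetical.

```python
import os


def create_collection(name: str = "what-is-rag"):
    """Create an in-memory Chroma collection that embeds text with OpenAI."""
    api_key = os.environ["OPENAI_API_KEY"]  # fails fast if the key is not set

    # Imported here so a missing key is reported before anything else is loaded.
    import chromadb
    from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

    embedding_function = OpenAIEmbeddingFunction(
        api_key=api_key,
        model_name="text-embedding-3-small",  # assumed model, pick the one you prefer
    )
    client = chromadb.Client()  # ephemeral client: data lives in memory only
    return client.get_or_create_collection(
        name=name,  # hypothetical collection name
        embedding_function=embedding_function,
    )


# Usage:
# collection = create_collection()
```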
In this code, you’ve set up the embedding function and created a VectorDB. The embedding function converts chunks of text from the document into vectors. The VectorDB stores these vectors and allows you to query them based on similarity to the question.
Next, you’ll need to split the pages into smaller chunks that you can query the VectorDB with.
Split and index documents
The RecursiveCharacterTextSplitter is a class that splits text into chunks of a specified size. It’s a recursive approach that splits the text into smaller chunks using a hierarchy of delimiters (e.g., "\n\n", "\n", ".", etc.).
In this example, we’ll use a chunk size of 1,000 characters and an overlap of 200 characters. However, in practice bigger chunks seem to work better. Popular embedding functions can handle up to 8,192 tokens, which is ~32,000 characters. You might want to start there.
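With LangChain, the split itself is typically a call such as RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(pages). To show the idea behind it, here is a simplified pure-Python sketch of recursive splitting. It is not LangChain's implementation: it ignores overlap and drops empty pieces, and the separator list is an assumption.

```python
def recursive_split(text: str, chunk_size: int, separators=("\n\n", "\n", ". ", " ")) -> list[str]:
    """Split text into chunks of at most chunk_size characters, trying coarse separators first."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    separator, finer_separators = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(separator):
        candidate = part if not current else current + separator + part
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging parts while they fit
            continue
        if current:
            chunks.append(current)
        if len(part) <= chunk_size:
            current = part
        else:
            # The part alone is too big: split it again with finer separators.
            chunks.extend(recursive_split(part, chunk_size, finer_separators))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph.\n\nSecond paragraph is a bit longer.\n\nThird."
chunks = recursive_split(sample, chunk_size=40)
```

With a 40-character limit, each paragraph in the sample ends up in its own chunk because no two neighbors fit together.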
The splitter returns the resulting chunks, which we store in all_splits. Then you need to add those chunks to your VectorDB.
ChromaDB provides you with a simple way to add chunks to your VectorDB:
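One way to do this is a small helper around collection.add. The helper name is hypothetical, and using the chunk's position as its ID is an assumption; any unique strings would work.

```python
def index_chunks(collection, chunks) -> list[str]:
    """Add chunks to a Chroma collection and return the generated IDs."""
    # Assumption: the position of each chunk, as a string, is used as its ID.
    ids = [str(i) for i in range(len(chunks))]
    collection.add(
        ids=ids,
        documents=[chunk.page_content for chunk in chunks],  # text to embed
        metadatas=[chunk.metadata for chunk in chunks],      # page, source, etc.
    )
    return ids


# Usage:
# index_chunks(collection, all_splits)
```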
This will add the chunks to your VectorDB. In addition to the chunks, this will add the metadata of each chunk and generate unique IDs for each chunk.
Query the database
Once the chunks are in the VectorDB, you can query them with the question.
{'ids': [['4']],
'embeddings': None,
'documents': [["EDICIÓN AQUA PREP 01-01\nBANCO BILBAO VIZCAYA ARGENTARIA, S.A. - Plaza de San Nicolás, 4 - 48005 BILBAO\nReg. Mer. Vizcaya -T omo 3858, Folio 1, Hoja BI-17 BIS-A, Inscripción 1035ª C.I.F.: A48265169\n2 / 4 \n2.4 T ransaction limits. \nThe daily purchase limit will be determined by the Card's \nbalance and up to a maximum of 1,000 euros per day. The \nHolder and the Bank may modify the initially specified limits. \nThe monthly limit for collecting lottery and gambling prizes is \nten thousand euros.\n2.5 T o sign up for the card, you do not need to take out \nany other accessory service.\n3. ON COSTS AND INTEREST AND EXCHANGE RATES\nMonthly top-up limit: Minimum of 6, maximum of 1000\nThe applicable fees for using the card may be:\na) Pre-paid card issue and maintenance fee: 5 euros.\nb) Fee for issuance of duplicates: 4 euros.\nc) Fee for using the card outside the Eurozone: 3% \napplicable to the exchange value in euros.\nd) Fees to withdraw cash against the card balance at ATMs:"]],
'uris': None,
'included': ['metadatas', 'documents', 'distances'],
'data': None,
'metadatas': [[{'page_label': '2',
'source': '../_extras/what-is-rag/bbva.pdf',
'producer': 'Adobe PDF Library 15.0',
'total_pages': 4,
'trapped': '/False',
'creationdate': '2021-03-24T14:51:54+01:00',
'page': 1,
'creator': 'Adobe InDesign 16.1 (Windows)',
'moddate': '2021-03-24T14:51:54+01:00'}]],
'distances': [[0.3241901397705078]]}
You can even query it with multiple questions at once:
collection.query(
query_texts=["What are the daily transaction limits?", "What is the maximum amount I can withdraw?"],
n_results=1,
)
{'ids': [['4'], ['4']],
'embeddings': None,
'documents': [["EDICIÓN AQUA PREP 01-01\nBANCO BILBAO VIZCAYA ARGENTARIA, S.A. - Plaza de San Nicolás, 4 - 48005 BILBAO\nReg. Mer. Vizcaya -T omo 3858, Folio 1, Hoja BI-17 BIS-A, Inscripción 1035ª C.I.F.: A48265169\n2 / 4 \n2.4 T ransaction limits. \nThe daily purchase limit will be determined by the Card's \nbalance and up to a maximum of 1,000 euros per day. The \nHolder and the Bank may modify the initially specified limits. \nThe monthly limit for collecting lottery and gambling prizes is \nten thousand euros.\n2.5 T o sign up for the card, you do not need to take out \nany other accessory service.\n3. ON COSTS AND INTEREST AND EXCHANGE RATES\nMonthly top-up limit: Minimum of 6, maximum of 1000\nThe applicable fees for using the card may be:\na) Pre-paid card issue and maintenance fee: 5 euros.\nb) Fee for issuance of duplicates: 4 euros.\nc) Fee for using the card outside the Eurozone: 3% \napplicable to the exchange value in euros.\nd) Fees to withdraw cash against the card balance at ATMs:"],
["EDICIÓN AQUA PREP 01-01\nBANCO BILBAO VIZCAYA ARGENTARIA, S.A. - Plaza de San Nicolás, 4 - 48005 BILBAO\nReg. Mer. Vizcaya -T omo 3858, Folio 1, Hoja BI-17 BIS-A, Inscripción 1035ª C.I.F.: A48265169\n2 / 4 \n2.4 T ransaction limits. \nThe daily purchase limit will be determined by the Card's \nbalance and up to a maximum of 1,000 euros per day. The \nHolder and the Bank may modify the initially specified limits. \nThe monthly limit for collecting lottery and gambling prizes is \nten thousand euros.\n2.5 T o sign up for the card, you do not need to take out \nany other accessory service.\n3. ON COSTS AND INTEREST AND EXCHANGE RATES\nMonthly top-up limit: Minimum of 6, maximum of 1000\nThe applicable fees for using the card may be:\na) Pre-paid card issue and maintenance fee: 5 euros.\nb) Fee for issuance of duplicates: 4 euros.\nc) Fee for using the card outside the Eurozone: 3% \napplicable to the exchange value in euros.\nd) Fees to withdraw cash against the card balance at ATMs:"]],
'uris': None,
'included': ['metadatas', 'documents', 'distances'],
'data': None,
'metadatas': [[{'source': '../_extras/what-is-rag/bbva.pdf',
'page': 1,
'total_pages': 4,
'creator': 'Adobe InDesign 16.1 (Windows)',
'creationdate': '2021-03-24T14:51:54+01:00',
'moddate': '2021-03-24T14:51:54+01:00',
'producer': 'Adobe PDF Library 15.0',
'page_label': '2',
'trapped': '/False'}],
[{'page_label': '2',
'trapped': '/False',
'source': '../_extras/what-is-rag/bbva.pdf',
'creator': 'Adobe InDesign 16.1 (Windows)',
'total_pages': 4,
'creationdate': '2021-03-24T14:51:54+01:00',
'moddate': '2021-03-24T14:51:54+01:00',
'producer': 'Adobe PDF Library 15.0',
'page': 1}]],
'distances': [[0.3241901397705078], [0.416978657245636]]}
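Notice that every field in the result is a list of lists, with one inner list per question. A small helper can pair each question with its best match. The helper name is hypothetical, and the result dictionary below is a shortened stand-in for the real output, with rounded distances.

```python
def best_match_per_query(questions: list[str], results: dict) -> list[dict]:
    """Pair each question with its top-ranked document and distance."""
    return [
        {"question": question, "document": documents[0], "distance": distances[0]}
        for question, documents, distances in zip(
            questions, results["documents"], results["distances"]
        )
    ]


questions = [
    "What are the daily transaction limits?",
    "What is the maximum amount I can withdraw?",
]
# Shortened stand-in for the output of collection.query shown above.
results = {
    "documents": [["2.4 Transaction limits ..."], ["2.4 Transaction limits ..."]],
    "distances": [[0.32], [0.42]],
}
matches = best_match_per_query(questions, results)
```

A lower distance means a closer match, so the first question is a better fit for this chunk than the second.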
Now, let’s add the VectorDB into our RAG pipeline.
RAG pipeline
First, start by defining a function that does the retrieval of the most relevant documents.
def get_relevant_docs(question: str, top_k: int = 1):
    relevant_docs = collection.query(query_texts=question, n_results=top_k)
    documents = relevant_docs["documents"][0]
    metadatas = relevant_docs["metadatas"][0]
    return [
        {"page_content": doc, "type": "Document", "metadata": metadata}
        for doc, metadata in zip(documents, metadatas)
    ]
This function will take a question and return the top_k most relevant chunks from the document. Here’s an example:
[{'page_content': "EDICIÓN AQUA PREP 01-01\nBANCO BILBAO VIZCAYA ARGENTARIA, S.A. - Plaza de San Nicolás, 4 - 48005 BILBAO\nReg. Mer. Vizcaya -T omo 3858, Folio 1, Hoja BI-17 BIS-A, Inscripción 1035ª C.I.F.: A48265169\n2 / 4 \n2.4 T ransaction limits. \nThe daily purchase limit will be determined by the Card's \nbalance and up to a maximum of 1,000 euros per day. The \nHolder and the Bank may modify the initially specified limits. \nThe monthly limit for collecting lottery and gambling prizes is \nten thousand euros.\n2.5 T o sign up for the card, you do not need to take out \nany other accessory service.\n3. ON COSTS AND INTEREST AND EXCHANGE RATES\nMonthly top-up limit: Minimum of 6, maximum of 1000\nThe applicable fees for using the card may be:\na) Pre-paid card issue and maintenance fee: 5 euros.\nb) Fee for issuance of duplicates: 4 euros.\nc) Fee for using the card outside the Eurozone: 3% \napplicable to the exchange value in euros.\nd) Fees to withdraw cash against the card balance at ATMs:",
'type': 'Document',
'metadata': {'producer': 'Adobe PDF Library 15.0',
'creationdate': '2021-03-24T14:51:54+01:00',
'creator': 'Adobe InDesign 16.1 (Windows)',
'moddate': '2021-03-24T14:51:54+01:00',
'page': 1,
'trapped': '/False',
'source': '../_extras/what-is-rag/bbva.pdf',
'page_label': '2',
'total_pages': 4}}]
After you’ve retrieved the relevant chunks, you’ll want to combine them into a single string that you can pass to the model. You can use get_context to do that.
"--- PAGE 1 ---\nEDICIÓN AQUA PREP 01-01\nBANCO BILBAO VIZCAYA ARGENTARIA, S.A. - Plaza de San Nicolás, 4 - 48005 BILBAO\nReg. Mer. Vizcaya -T omo 3858, Folio 1, Hoja BI-17 BIS-A, Inscripción 1035ª C.I.F.: A48265169\n2 / 4 \n2.4 T ransaction limits. \nThe daily purchase limit will be determined by the Card's \nbalance and up to a maximum of 1,000 euros per day. The \nHolder and the Bank may modify the initially specified limits. \nThe monthly limit for collecting lottery and gambling prizes is \nten thousand euros.\n2.5 T o sign up for the card, you do not need to take out \nany other accessory service.\n3. ON COSTS AND INTEREST AND EXCHANGE RATES\nMonthly top-up limit: Minimum of 6, maximum of 1000\nThe applicable fees for using the card may be:\na) Pre-paid card issue and maintenance fee: 5 euros.\nb) Fee for issuance of duplicates: 4 euros.\nc) Fee for using the card outside the Eurozone: 3% \napplicable to the exchange value in euros.\nd) Fees to withdraw cash against the card balance at ATMs:\n\n--- PAGE 2 ---\nBBVA app or website, or via the phone numbers shown on the \ncards, and in any case within a maximum period of thirteen \nmonths after the date of the debit entry.\n5.3 Liability of the Bank in the event of unauthorized \npayment transactions.\nIf an unauthorized payment transaction is carried out, the \nBank will refund the amount of the unauthorized transaction.\n5.4 Liability of the Holder in the event of unauthorized \ntransactions.\nThe Account Holder will be liable for losses arising from \nunauthorized payment transactions made with the Card up \nto a maximum of 50 euros.\nThe Holder will be liable without any limitations in the \nevent of fraud or gross negligence on their part in meeting \ntheir obligations as respects the security credentials and \nsafekeeping if this situation is not reported to the Bank \nwithout delay.\n5.5 Blocking the Card.\nThe Bank reserves the right to block the Card on objectively \njustified grounds related to the 
security measures taken\n\n--- PAGE 2 ---\nEDICIÓN AQUA PREP 01-01\nBANCO BILBAO VIZCAYA ARGENTARIA, S.A. - Plaza de San Nicolás, 4 - 48005 BILBAO\nReg. Mer. Vizcaya -T omo 3858, Folio 1, Hoja BI-17 BIS-A, Inscripción 1035ª C.I.F.: A48265169\n3 / 4 \nd) Notify the Bank of any loss, theft or copying of the \nCard or misappropriation of the PIN and/or passwords \nwithout undue delay as soon as they become aware \nof it, at any of the Bank's branches during customer \nservice hours or via the phone numbers shown on the \nCard.\n5.2 Notify the Bank of any unauthorized transactions \nor incorrectly executed payment transactions.\nThe Holder must notify the Bank as soon as they become \naware of the posting of any unauthorized transaction to the \nDirect Debit Account of the Card without undue delay at any \nbranch of the Bank during customer service hours, on the \nBBVA app or website, or via the phone numbers shown on the \ncards, and in any case within a maximum period of thirteen \nmonths after the date of the debit entry.\n\n"
This will generate a string similar to the one we used in the previous example.
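The definition of get_context isn't shown here, so the following is a minimal sketch reconstructed from the output above. It assumes each header uses the page value from the chunk's metadata, which matches the "PAGE 1" and "PAGE 2" labels in that output.

```python
def get_context(relevant_docs: list[dict]) -> str:
    """Join the retrieved chunks into one string, each under a page header."""
    context = ""
    for doc in relevant_docs:
        # Assumption: the header uses metadata["page"], which is zero-based.
        context += f"--- PAGE {doc['metadata']['page']} ---\n{doc['page_content']}\n\n"
    return context


example = get_context([{"page_content": "2.4 Transaction limits.", "metadata": {"page": 1}}])
```

Because page is zero-based, the label is one less than the printed page number (the chunk above comes from the page marked "2 / 4"). If you want citations to match the printed pages, you could use metadata["page_label"] instead.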
Finally, you can adapt get_response to use these new steps in the RAG pipeline.
def get_messages(question: str, relevant_docs: list[dict]):
    context_vars = {"question": question, "documents": get_context(relevant_docs)}
    messages = [
        SystemMessage(content=system_prompt),
        HumanMessage(content=user_prompt.format(**context_vars)),
    ]
    return messages


def get_response(question: str):
    # Retrieve the relevant chunks, build the messages, and call the model
    relevant_docs = get_relevant_docs(question)
    messages = get_messages(question, relevant_docs)
    response = model.invoke(messages)
    return response.content
question = "What are the daily transaction limits?"
response = get_response(question)
print(response)
The daily purchase limit for transactions is determined by the Card's balance and can be up to a maximum of 1,000 euros per day. Additionally, the monthly limit for collecting lottery and gambling prizes is ten thousand euros. The Holder and the Bank may modify the initially specified limits.
(Page 1)
And, you’re done! You’ve built a RAG pipeline that can answer questions about a document.
Conclusion
In this post, you’ve learned what RAG is, why and when you’d want to use it, and how to implement it in Python.
You’ve walked through the process of:
- Extracting text from a PDF file
- Creating embeddings for the chunks
- Storing the embeddings in a VectorDB
- Querying the VectorDB to find the most relevant chunks
- Using the model to generate a response
I hope you found this article useful. If you have any questions or comments, put them in the comments section below.
Citation
@online{castillo2025,
author = {Castillo, Dylan},
title = {What Is {Retrieval} {Augmented} {Generation} {(RAG)?}},
date = {2025-06-29},
url = {https://dylancastillo.co/posts/what-is-rag.html},
langid = {en}
}