Building a Smart Retriever in LangChain using Gemini Embeddings and ChromaDB

Retrieval-Augmented Generation (RAG) is one of the most exciting and practical applications of AI today, allowing chatbots and assistants to “recall” facts from custom data instead of relying only on model memory.
In this step-by-step tutorial, we’ll explore how to build a Retriever system using LangChain, Google Gemini embeddings, and ChromaDB, a fast and lightweight vector store.
By the end of this guide, you’ll understand how retrieval works, how to index data, and how to use it to power intelligent, context-aware applications.
What You’ll Learn
What a Retriever is and how it fits in the RAG pipeline
How embeddings work with Gemini
How to create and query a Chroma vector store
How to retrieve the most relevant documents for your queries

Understanding Retrievers in LangChain
A retriever is a LangChain component that finds relevant pieces of information from a knowledge base when given a query.
It doesn’t generate text; it retrieves it.
It’s typically used in the RAG pipeline, where the process looks like this:
User Query → Embedding → Retriever → Relevant Documents → LLM → Final Answer
Think of it as your AI’s memory search engine: before answering a question, the retriever looks up relevant facts from your stored documents.
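Concretely, a retriever takes a query string and returns a list of Document objects rather than generated text. Here is a quick preview of the call we’ll make once the retriever is built in Step 5:
# `retriever` is created in Step 5 below; this is just the shape of the call.
relevant_docs = retriever.get_relevant_documents("What does Chroma do?")
for doc in relevant_docs:
    print(doc.page_content)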
Step 1: Setting Up the Environment
Let’s start by preparing our workspace.
mkdir langchain_retriever
cd langchain_retriever
python -m venv venv
source venv/bin/activate # Mac/Linux
venv\Scripts\activate # Windows
Now install the required dependencies:
pip install langchain langchain-community langchain-google-genai chromadb python-dotenv
Step 2: Set Up Your .env File
Create a .env file to store your Google API Key securely:
GOOGLE_API_KEY=your_api_key_here
Note:
Never upload your .env file to GitHub.
Instead, create a .gitignore file and add this line:
.env
This ensures your API key remains private and safe.
Step 3: Creating and Embedding Your Documents
Let’s create some sample documents to test our retriever.
from langchain.docstore.document import Document
documents = [
    Document(page_content="LangChain helps developers build LLM applications easily."),
    Document(page_content="Chroma is a vector database optimized for LLM-based search."),
    Document(page_content="Embeddings convert text into high-dimensional vectors."),
    Document(page_content="Google Gemini provides powerful embedding models."),
]
Now we’ll create the Gemini Embedding Model:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from dotenv import load_dotenv
import os
load_dotenv()
embedding_model = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key=os.getenv("GOOGLE_API_KEY")
)
This embedding model will convert each document into numeric vector form so it can be stored and compared in the vector database.
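You can sanity-check the embedding model by embedding a single string yourself. A quick sketch (the exact vector length depends on the model you chose):
# Embed one piece of text and inspect the resulting vector.
sample_vector = embedding_model.embed_query("What is LangChain?")
print(len(sample_vector))   # dimensionality of the embedding space
print(sample_vector[:5])    # first few floating-point values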

Step 4: Building the Vector Store (ChromaDB)
Now, let’s store and index these embeddings in ChromaDB:
from langchain_community.vectorstores import Chroma
vectorstore = Chroma.from_documents(
    documents=documents,
    embedding=embedding_model,
    collection_name="my_collection"
)
Here’s what’s happening:
Each document is converted to a high-dimensional vector.
The vectors are stored in Chroma under the collection my_collection.
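By default this collection lives in memory for the life of the process. If you want the index to survive restarts, Chroma can also write to disk via the persist_directory argument; here’s a sketch (the path ./chroma_db is just an example):
# Optional: persist the index to a local folder so it can be reloaded later.
vectorstore = Chroma.from_documents(
    documents=documents,
    embedding=embedding_model,
    collection_name="my_collection",
    persist_directory="./chroma_db",
)

# Later, reload the same collection without re-embedding the documents.
vectorstore = Chroma(
    collection_name="my_collection",
    embedding_function=embedding_model,
    persist_directory="./chroma_db",
)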
Step 5: Creating the Retriever
We now turn our vector store into a retriever:
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
This means that for each query, the retriever will return the top 2 most similar documents, ranked by vector similarity between the query embedding and the stored document embeddings.
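The default strategy is plain similarity search; as_retriever also supports other search types. For example, maximal marginal relevance (MMR) trades a little relevance for more diversity in the returned documents; a sketch:
# Alternative: MMR retrieval, which penalizes near-duplicate results.
mmr_retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 2, "fetch_k": 4},  # fetch 4 candidates, return the 2 most diverse
)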
Step 6: Querying the Retriever
Now, let’s see it in action:
query = "What does Chroma do?"
retrieved_docs = retriever.get_relevant_documents(query)
for i, doc in enumerate(retrieved_docs, 1):
    print(f"Document {i}: {doc.page_content}\n")
Output might look like:
Document 1: Chroma is a vector database optimized for LLM-based search.
Document 2: Embeddings convert text into high-dimensional vectors.
This shows the retriever found and ranked the most relevant chunks for your query, all powered by Gemini embeddings.
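If you want to see how strong each match actually is, you can query the vector store directly and get a distance score alongside each document:
# Inspect raw scores (in Chroma's default metric, a lower distance means a closer match).
results = vectorstore.similarity_search_with_score(query, k=2)
for doc, score in results:
    print(f"{score:.4f}  {doc.page_content}")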
Step 7: Combine Retriever with an LLM
To create a full RAG pipeline, you can feed these retrieved documents to a Gemini chat model:
from langchain.chains import RetrievalQA
from langchain_google_genai import ChatGoogleGenerativeAI
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash", google_api_key=os.getenv("GOOGLE_API_KEY"))
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever
)
response = qa_chain.invoke({"query": "What is Chroma in LangChain?"})
print(response["result"])
Now the model doesn’t just “guess”; it retrieves and then reasons over your actual data.
Output might look like:
Based on the provided context:
* **Chroma** is a vector database optimized for LLM-based search.
* **LangChain** helps developers build LLM applications easily.
The provided context does not specify what Chroma is *in* LangChain, or how they are directly related within the LangChain framework.
Common Mistakes to Avoid
Forgetting to load your .env file (calling load_dotenv() is essential; see the sanity-check sketch after this list)
Using mismatched embedding and vectorstore dimensions
Forgetting to call .as_retriever() before querying
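A quick sanity check for the first two points, reusing the names from the code above:
import os
from dotenv import load_dotenv

load_dotenv()
assert os.getenv("GOOGLE_API_KEY"), "GOOGLE_API_KEY not found - is your .env file in place?"

# Every vector in a Chroma collection must have the same length, so the
# embedding model used for indexing and for querying has to be the same one.
print("Embedding dimension:", len(embedding_model.embed_query("dimension check")))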
Conclusion
You’ve just built a working Retriever system using:
LangChain for orchestration
Gemini embeddings for representation
ChromaDB for efficient search
Retrievers are the backbone of any intelligent, data-aware AI application.
From here, you can experiment with:
Multi-query retrievers
Context compression
Document chunking and metadata filtering (see the sketch below)
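As a starting point for chunking and metadata filtering, here’s a hedged sketch using LangChain’s RecursiveCharacterTextSplitter (installed alongside the langchain package) and a metadata filter on the retriever; the chunk sizes and the "source" metadata key are illustrative choices, not requirements:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split longer documents into overlapping chunks before embedding them.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)

# Attach metadata so you can filter at query time.
for chunk in chunks:
    chunk.metadata["source"] = "tutorial_notes"  # illustrative key/value

chunked_store = Chroma.from_documents(chunks, embedding_model, collection_name="chunked")
filtered_retriever = chunked_store.as_retriever(
    search_kwargs={"k": 2, "filter": {"source": "tutorial_notes"}}
)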
Together, LangChain and Gemini make RAG systems simpler, faster, and smarter.