Building a Smart Retriever in LangChain using Gemini Embeddings and ChromaDB

Retrieval-Augmented Generation (RAG) is one of the most exciting and practical applications of AI today, allowing chatbots and assistants to “recall” facts from custom data instead of relying only on model memory.
In this step-by-step tutorial, we’ll explore how to build a Retriever system using LangChain, Google Gemini embeddings, and ChromaDB, a fast and lightweight vector store.
By the end of this guide, you’ll understand how retrieval works, how to index data, and how to use it to power intelligent, context-aware applications.
What You’ll Learn
What a Retriever is and how it fits in the RAG pipeline
How embeddings work with Gemini
How to create and query a Chroma vector store
How to retrieve the most relevant documents for your queries

Understanding Retrievers in LangChain
A retriever is a LangChain component that finds relevant pieces of information from a knowledge base when given a query.
It doesn’t generate text; it retrieves it.
It’s typically used in the RAG pipeline, where the process looks like this:
User Query → Embedding → Retriever → Relevant Documents → LLM → Final Answer
Think of it as your AI’s memory search engine: before answering a question, the retriever looks up relevant facts from your stored documents.
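Concretely, a retriever takes a query string and returns a list of Document objects rather than generated text. Here is a quick preview of the call we’ll make once the retriever is built in Step 5:
# `retriever` is created in Step 5 below; this is just the shape of the call.
relevant_docs = retriever.get_relevant_documents("What does Chroma do?")
for doc in relevant_docs:
    print(doc.page_content)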
Step 1: Setting Up the Environment
Let’s start by preparing our workspace.
mkdir langchain_retriever
cd langchain_retriever
python -m venv venv
source venv/bin/activate # Mac/Linux
venv\Scripts\activate # Windows
Now install the required dependencies:
pip install langchain langchain-community langchain-google-genai chromadb python-dotenv
Step 2: Set Up Your .env File
Create a .env file to store your Google API Key securely:
GOOGLE_API_KEY=your_api_key_here
Note:
Never upload your .env file to GitHub.
Instead, create a .gitignore file and add this line:
.env
This ensures your API key remains private and safe.
Step 3: Creating and Embedding Your Documents
Let’s create some sample documents to test our retriever.
from langchain.docstore.document import Document
documents = [
    Document(page_content="LangChain helps developers build LLM applications easily."),
    Document(page_content="Chroma is a vector database optimized for LLM-based search."),
    Document(page_content="Embeddings convert text into high-dimensional vectors."),
    Document(page_content="Google Gemini provides powerful embedding models."),
]
Now we’ll create the Gemini Embedding Model:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from dotenv import load_dotenv
import os
load_dotenv()
embedding_model = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key=os.getenv("GOOGLE_API_KEY")
)
This embedding model will convert each document into numeric vector form so it can be stored and compared in the vector database.
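You can sanity-check the embedding model by embedding a single string yourself. A quick sketch (the exact vector length depends on the model you chose):
# Embed one piece of text and inspect the resulting vector.
sample_vector = embedding_model.embed_query("What is LangChain?")
print(len(sample_vector))   # dimensionality of the embedding space
print(sample_vector[:5])    # first few floating-point values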

Step 4: Building the Vector Store (ChromaDB)
Now, let’s store and index these embeddings in ChromaDB:
from langchain_community.vectorstores import Chroma
vectorstore = Chroma.from_documents(
    documents=documents,
    embedding=embedding_model,
    collection_name="my_collection"
)
Here’s what’s happening:
Each document is converted to a high-dimensional vector.
The vectors are stored in Chroma under the collection my_collection.
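By default this collection lives in memory for the life of the process. If you want the index to survive restarts, Chroma can also write to disk via the persist_directory argument; here’s a sketch (the path ./chroma_db is just an example):
# Optional: persist the index to a local folder so it can be reloaded later.
vectorstore = Chroma.from_documents(
    documents=documents,
    embedding=embedding_model,
    collection_name="my_collection",
    persist_directory="./chroma_db",
)

# Later, reload the same collection without re-embedding the documents.
vectorstore = Chroma(
    collection_name="my_collection",
    embedding_function=embedding_model,
    persist_directory="./chroma_db",
)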
Step 5: Creating the Retriever
We now turn our vector store into a retriever:
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
This means that for each query, the retriever will return the top 2 most similar documents, ranked by vector similarity between the query embedding and the stored document embeddings.
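The default strategy is plain similarity search; as_retriever also supports other search types. For example, maximal marginal relevance (MMR) trades a little relevance for more diversity in the returned documents; a sketch:
# Alternative: MMR retrieval, which penalizes near-duplicate results.
mmr_retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 2, "fetch_k": 4},  # fetch 4 candidates, return the 2 most diverse
)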
Step 6: Querying the Retriever
Now, let’s see it in action:
query = "What does Chroma do?"
retrieved_docs = retriever.get_relevant_documents(query)
for i, doc in enumerate(retrieved_docs, 1):
    print(f"Document {i}: {doc.page_content}\n")
Output might look like:
Document 1: Chroma is a vector database optimized for LLM-based search.
Document 2: Embeddings convert text into high-dimensional vectors.
This shows the retriever found and ranked the most relevant chunks for your query, all powered by Gemini embeddings.
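If you want to see how strong each match actually is, you can query the vector store directly and get a distance score alongside each document:
# Inspect raw scores (in Chroma's default metric, a lower distance means a closer match).
results = vectorstore.similarity_search_with_score(query, k=2)
for doc, score in results:
    print(f"{score:.4f}  {doc.page_content}")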
Step 7: Combine Retriever with an LLM
To create a full RAG pipeline, you can feed these retrieved documents to a Gemini chat model:
from langchain.chains import RetrievalQA
from langchain_google_genai import ChatGoogleGenerativeAI
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash", google_api_key=os.getenv("GOOGLE_API_KEY"))
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever
)
response = qa_chain.invoke({"query": "What is Chroma in LangChain?"})
print(response["result"])
Now the model doesn’t just “guess”; it retrieves and then reasons over your actual data.
Output might look like:
Based on the provided context:
* **Chroma** is a vector database optimized for LLM-based search.
* **LangChain** helps developers build LLM applications easily.
The provided context does not specify what Chroma is *in* LangChain, or how they are directly related within the LangChain framework.
Common Mistakes to Avoid
Forgetting to load your .env file (calling load_dotenv() is essential; see the sanity-check sketch after this list)
Using mismatched embedding and vectorstore dimensions
Forgetting to call .as_retriever() before querying
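A quick sanity check for the first two points, reusing the names from the code above:
import os
from dotenv import load_dotenv

load_dotenv()
assert os.getenv("GOOGLE_API_KEY"), "GOOGLE_API_KEY not found - is your .env file in place?"

# Every vector in a Chroma collection must have the same length, so the
# embedding model used for indexing and for querying has to be the same one.
print("Embedding dimension:", len(embedding_model.embed_query("dimension check")))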
Conclusion
You’ve just built a working Retriever system using:
LangChain for orchestration
Gemini embeddings for representation
ChromaDB for efficient search
Retrievers are the backbone of any intelligent, data-aware AI application.
From here, you can experiment with:
Multi-query retrievers
Context compression
Document chunking and metadata filtering (see the sketch below)
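As a starting point for chunking and metadata filtering, here’s a hedged sketch using LangChain’s RecursiveCharacterTextSplitter (installed alongside the langchain package) and a metadata filter on the retriever; the chunk sizes and the "source" metadata key are illustrative choices, not requirements:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split longer documents into overlapping chunks before embedding them.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)

# Attach metadata so you can filter at query time.
for chunk in chunks:
    chunk.metadata["source"] = "tutorial_notes"  # illustrative key/value

chunked_store = Chroma.from_documents(chunks, embedding_model, collection_name="chunked")
filtered_retriever = chunked_store.as_retriever(
    search_kwargs={"k": 2, "filter": {"source": "tutorial_notes"}}
)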
Together, LangChain and Gemini make RAG systems simpler, faster, and smarter.