Building a Context-Aware Retriever in LangChain using Gemini and FAISS

Shahab Afridy
October 28, 2025
5 min read

Retrieval-Augmented Generation (RAG) systems are only as powerful as the retrievers that feed them information.
A good retriever doesn’t just find data; it filters, compresses, and contextualizes it to help the language model focus on what really matters.

In this tutorial, you’ll learn how to build a Contextual Compression Retriever in LangChain, powered by Gemini embeddings, FAISS, and Gemini Flash as the compressor LLM.

This method ensures that your chatbot or AI assistant retrieves only the most relevant text segments, cutting out noise while keeping context intact.

What You’ll Learn

By following this tutorial, you’ll understand:

  • How FAISS retrieves text chunks efficiently

  • How to use Gemini Embeddings to represent documents

  • How Contextual Compression works in LangChain

  • How to combine a retriever with Gemini Flash for better accuracy

  • How to integrate the retriever with an LLM for Q&A


1. What is Contextual Compression?

When you run a normal retriever (like FAISS or Chroma), it fetches the top-k most similar documents.
But often, those documents are long and contain irrelevant sentences.

For example:
If you ask “What is photosynthesis?” a document about “The Grand Canyon and photosynthesis” might appear, but only one line in it is relevant.

A Contextual Compression Retriever solves this by combining two steps:

  1. Retrieve top documents using a standard retriever (FAISS in our case).

  2. Compress them using a language model, extracting only the parts that are relevant to the query.

The result is a clean, focused, and context-rich set of documents ready for the LLM.

2. Setting Up Your Environment

You’ll need the following packages installed:

pip install langchain langchain-community langchain-google-genai faiss-cpu python-dotenv

Make sure your .env file contains your Google API key:

GOOGLE_API_KEY=your_google_api_key_here

Note: If you’re pushing code to GitHub, include .env in your .gitignore to keep your API keys private.
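
If you want to fail fast when the key is missing, a quick sanity check after loading the .env file can help. This is a small optional sketch, not part of the original setup:

from dotenv import load_dotenv
import os

load_dotenv()  # reads variables from the local .env file

# Stop early with a clear message if the key was not picked up
if not os.getenv("GOOGLE_API_KEY"):
    raise RuntimeError("GOOGLE_API_KEY is not set - check your .env file")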

3. Importing the Required Libraries

We’ll use LangChain’s retriever components, FAISS for vector search, and Gemini for embeddings and LLM responses.

from langchain_community.vectorstores import FAISS
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_core.documents import Document
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from dotenv import load_dotenv
import os

load_dotenv()

4. Creating Example Documents

We’ll create a few sample text documents that mix multiple topics, only some of which are related to photosynthesis.
This will show how the retriever focuses on relevant text.

docs = [
    Document(page_content=(
        """The Grand Canyon is one of the most visited natural wonders in the world.
        Photosynthesis is the process by which green plants convert sunlight into energy.
        Millions of tourists travel to see it every year. The rocks date back millions of years."""
    ), metadata={"source": "Doc1"}),

    Document(page_content=(
        """In medieval Europe, castles were built primarily for defense.
        The chlorophyll in plant cells captures sunlight during photosynthesis.
        Knights wore armor made of metal. Siege weapons were often used to breach castle walls."""
    ), metadata={"source": "Doc2"}),

    Document(page_content=(
        """Basketball was invented by Dr. James Naismith in the late 19th century.
        It was originally played with a soccer ball and peach baskets. NBA is now a global league."""
    ), metadata={"source": "Doc3"}),

    Document(page_content=(
        """The history of cinema began in the late 1800s. Silent films were the earliest form.
        Thomas Edison was among the pioneers. Photosynthesis does not occur in animal cells.
        Modern filmmaking involves complex CGI and sound design."""
    ), metadata={"source": "Doc4"})
]

5. Creating the FAISS Vector Store

We’ll use Gemini embeddings to convert the text into numerical vectors, which FAISS can index and search efficiently.

embedding_model = GoogleGenerativeAIEmbeddings(
    model="models/gemini-embedding-001",
    google_api_key=os.getenv("GOOGLE_API_KEY")
)

vectorstore = FAISS.from_documents(docs, embedding_model)
base_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

Here:

  • FAISS stores the document embeddings.

  • k=5 asks for the top 5 most similar chunks per query (with only four sample documents here, all of them will be returned).
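
To see what FAISS returns before any compression, you can run a raw similarity search with scores. This is an optional check; with the default index the score is an L2 distance, so lower means more similar:

# Inspect the raw FAISS results for a sample query
results = vectorstore.similarity_search_with_score("What is photosynthesis?", k=2)
for doc, score in results:
    print(f"{doc.metadata['source']} (score: {score:.4f})")
    print(doc.page_content[:80], "...")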


6. Creating the Compressor with Gemini Flash

The compressor uses a language model to extract only the parts of the retrieved text that directly answer your question.

llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash-lite",
    api_key=os.getenv("GOOGLE_API_KEY")
)

compressor = LLMChainExtractor.from_llm(llm)

Gemini Flash is fast, efficient, and ideal for contextual summarization tasks like this.
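
You can also try the compressor on its own before wiring it into a retriever. LLMChainExtractor exposes a compress_documents method that takes a list of documents and a query; documents judged irrelevant are dropped entirely. A minimal sketch using the docs defined earlier:

# Run the extractor directly on our sample documents
compressed = compressor.compress_documents(docs, "What is photosynthesis?")
for doc in compressed:
    print(doc.metadata["source"], "->", doc.page_content)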

7. Building the Contextual Compression Retriever

Now, we combine the FAISS retriever with our LLM-based compressor.

compression_retriever = ContextualCompressionRetriever(
    base_retriever=base_retriever,
    base_compressor=compressor
)

8. Querying the Retriever

Let’s test it with a query about photosynthesis.

query = "What is photosynthesis?"
compressed_results = compression_retriever.invoke(query)

print("\n===== Contextually Compressed Results =====")
for i, doc in enumerate(compressed_results):
    print(f"\n--- Result {i+1} ---")
    print(doc.page_content)

Example Output

--- Result 1 ---
Photosynthesis is the process by which green plants convert sunlight into energy.

--- Result 2 ---
The chlorophyll in plant cells captures sunlight during photosynthesis.

--- Result 3 ---
Photosynthesis does not occur in animal cells.

The retriever correctly extracts only the relevant text segments and ignores unrelated parts like castles or basketball.
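
To see the difference compression makes, run the same query against the plain FAISS retriever and compare. This optional check returns the full documents, castles and basketball included:

# Compare against the uncompressed base retriever
raw_results = base_retriever.invoke(query)

print("\n===== Raw FAISS Results (no compression) =====")
for i, doc in enumerate(raw_results):
    print(f"\n--- Result {i+1} ({doc.metadata['source']}) ---")
    print(doc.page_content)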


9. Integrating with an LLM for Q&A

Finally, we’ll connect this retriever to a RetrievalQA chain, allowing the LLM to generate a final summarized answer.

from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=compression_retriever,
    chain_type="stuff"
)

response = qa_chain.invoke({"query": "Explain photosynthesis simply."})

print("\n===== LLM Response =====")
print(response["result"])

Example Output

Photosynthesis is the process through which plants use sunlight, water, and carbon dioxide to produce their own food and release oxygen.

This shows how Gemini efficiently uses compressed, relevant context for an accurate and simple explanation.
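
If you also want to see which compressed snippets the answer was grounded in, RetrievalQA can return them alongside the result. A small optional variation using return_source_documents:

# Return the compressed source snippets together with the answer
qa_chain_with_sources = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=compression_retriever,
    chain_type="stuff",
    return_source_documents=True
)

response = qa_chain_with_sources.invoke({"query": "Explain photosynthesis simply."})
print(response["result"])
for doc in response["source_documents"]:
    print("-", doc.metadata["source"], ":", doc.page_content)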

10. How It All Works Together

  1. Documents are embedded using Gemini embeddings.

  2. FAISS retrieves the top relevant chunks.

  3. Gemini Flash (LLM) compresses those chunks contextually.

  4. The QA chain uses this clean, compressed text to generate the answer.

This architecture improves efficiency, accuracy, and interpretability, a key advantage for scalable AI systems.

Conclusion

You’ve now built a context-aware retriever pipeline using LangChain, FAISS, and Gemini.
Unlike a traditional retriever, this setup doesn’t just fetch; it understands and filters context, ensuring your language model sees only what’s truly relevant.

This same principle powers modern RAG-based chat systems and intelligent assistants.
