Advanced RAG
Graph RAG — Structured Knowledge for Better Reasoning
Knowledge graphs provide structured, relational knowledge that complements unstructured text retrieval. Graph RAG combines graph-based reasoning with LLMs for multi-hop, complex question answering.
- Entity Extraction — Identify entities and relationships from text
- Graph Construction — Build knowledge graphs from documents
- Community Detection — Find thematic clusters for global queries
The best retrieval understands not just what was said, but how things are connected.
Graph RAG and Knowledge Graphs
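Traditional RAG retrieves text chunks by semantic similarity to the query. Graph RAG instead retrieves along relationships between entities. This supports multi-hop reasoning and dataset-wide (global) questions, which flat chunk retrieval handles poorly: the facts that connect two entities may sit in different chunks, and no single chunk summarizes a whole corpus.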
Definition: Graph RAG
Graph RAG combines knowledge graph structure with retrieval-augmented generation. It extracts entities and relationships from documents, builds a graph, and uses graph traversal and community detection to retrieve contextually relevant information for complex queries.
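The sketch below makes the multi-hop idea concrete. It is minimal and dependency-free: extracted facts are stored as (subject, relation, object) triples, and a two-hop question is answered by following a chain of relation types. The entities, relation names, and the `follow_relations` helper are illustrative assumptions, not part of any Graph RAG library.

```python
# Toy knowledge graph as (subject, relation, object) triples.
# The facts and relation names are made up for illustration.
TRIPLES = [
    ("Alice", "ceo_of", "Acme"),
    ("Acme", "headquartered_in", "Berlin"),
    ("Bob", "ceo_of", "Globex"),
    ("Globex", "headquartered_in", "Paris"),
]


def follow_relations(triples, start, relation_chain):
    """Return every entity reachable from `start` by following `relation_chain` in order.

    Each step moves the current frontier one hop along a single relation type,
    so a chain of length k answers a k-hop question.
    """
    frontier = {start}
    for relation in relation_chain:
        frontier = {obj for subj, rel, obj in triples if subj in frontier and rel == relation}
        if not frontier:  # chain broken: no entity satisfies this hop
            break
    return frontier


# "Where is the company that Alice runs headquartered?" is a two-hop question
print(follow_relations(TRIPLES, "Alice", ["ceo_of", "headquartered_in"]))  # {'Berlin'}
```

A similarity search over chunks would need to surface both facts, possibly from different documents, before the LLM could connect them. The graph makes the connecting hop explicit.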
Knowledge Graph Construction
Entity and Relationship Extraction
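```python
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def extract_entities_and_relations(text, model="gpt-4o"):
    """Ask the LLM to return the entities and relations in `text` as JSON."""
    # Doubled braces {{ }} produce literal braces inside the f-string
    prompt = f"""Extract all entities and relationships from this text.
Text: {text}
Return JSON format:
{{
  "entities": [
    {{"name": "Entity Name", "type": "PERSON|ORG|LOCATION|CONCEPT", "description": "Brief description"}}
  ],
  "relations": [
    {{"source": "Entity1", "target": "Entity2", "relation": "relationship type", "weight": <number between 0.0 and 1.0>}}
  ]
}}"""
    # JSON mode (response_format={"type": "json_object"}) requires a model that
    # supports it; older models such as the original gpt-4 do not
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```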
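LLM output does not always follow the requested schema. Relations may point at entities that were never declared, and weights may fall outside [0, 1]. The sketch below is a minimal clean-up pass, assuming the JSON shape requested in the prompt above. `clean_extraction` is a hypothetical helper. Dropping dangling relations, rather than creating bare nodes for them, is one design option among several.

```python
import json


def clean_extraction(raw_json):
    """Parse and sanity-check the extractor's JSON output (hypothetical helper).

    - keeps only entities that have a non-empty name
    - drops relations whose source or target is not a declared entity
    - clamps weights into [0.0, 1.0], defaulting to 0.5 when missing or non-numeric
    """
    data = json.loads(raw_json)
    entities = [e for e in data.get("entities", []) if e.get("name")]
    names = {e["name"] for e in entities}

    relations = []
    for rel in data.get("relations", []):
        if rel.get("source") not in names or rel.get("target") not in names:
            continue  # dangling edge: an endpoint was never declared as an entity
        try:
            weight = float(rel.get("weight", 0.5))
        except (TypeError, ValueError):
            weight = 0.5
        relations.append({**rel, "weight": min(max(weight, 0.0), 1.0)})

    return {"entities": entities, "relations": relations}


raw = json.dumps({
    "entities": [{"name": "Acme", "type": "ORG"}, {"name": "Berlin", "type": "LOCATION"}],
    "relations": [
        {"source": "Acme", "target": "Berlin", "relation": "headquartered_in", "weight": 1.7},
        {"source": "Acme", "target": "Mars", "relation": "operates_on", "weight": 0.3},
    ],
})
# Keeps only the Acme -> Berlin edge, with its weight clamped to 1.0
print(clean_extraction(raw)["relations"])
```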
Graph Construction
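```python
import networkx as nx


class KnowledgeGraph:
    def __init__(self):
        self.graph = nx.DiGraph()

    def add_document(self, doc_id, text):
        """Extract entities and relations from a document and merge them into the graph."""
        extraction = extract_entities_and_relations(text)
        for entity in extraction["entities"]:
            name = entity["name"]
            if name not in self.graph:
                self.graph.add_node(name)
            node = self.graph.nodes[name]
            # Keep the first type/description seen, but record every source document
            # (calling add_node again would overwrite these attributes)
            node.setdefault("type", entity.get("type"))
            node.setdefault("description", entity.get("description", ""))
            node.setdefault("documents", [])
            if doc_id not in node["documents"]:
                node["documents"].append(doc_id)
        for relation in extraction["relations"]:
            source, target = relation["source"], relation["target"]
            if self.graph.has_edge(source, target):
                # Repeated relation: accumulate evidence in the edge weight
                self.graph[source][target]["weight"] += relation["weight"]
            else:
                self.graph.add_edge(
                    source,
                    target,
                    relation=relation["relation"],
                    weight=relation["weight"],
                )

    def find_path(self, entity1, entity2, max_hops=3):
        """Return the shortest path between two entities, or None if it needs more than max_hops edges."""
        try:
            path = nx.shortest_path(self.graph, entity1, entity2)
        except (nx.NetworkXNoPath, nx.NodeNotFound):
            return None
        # A path with k edges has k + 1 nodes; reject long paths instead of truncating them
        return path if len(path) - 1 <= max_hops else None

    def get_entity_context(self, entity_name, max_hops=2):
        """Describe every relation within max_hops of an entity, in either direction."""
        if entity_name not in self.graph:
            return ""
        # ego_graph keeps all nodes within `radius` hops; undirected=True also
        # follows incoming edges (e.g. "X --[acquired]--> entity_name")
        neighborhood = nx.ego_graph(self.graph, entity_name, radius=max_hops, undirected=True)
        context = []
        for source, target, edge_data in neighborhood.edges(data=True):
            description = self.graph.nodes[target].get("description", "")
            context.append(f"{source} --[{edge_data['relation']}]--> {target} ({description})")
        return "\n".join(context)
```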
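Retrieving an entity's neighborhood out to `max_hops` is a breadth-first search with a depth cutoff. The dependency-free sketch below makes that explicit over a plain adjacency dict. It follows edges in both directions, which is an assumption: a directed-only walk would miss facts where the entity is the object. `k_hop_neighborhood` and the sample edges are hypothetical.

```python
from collections import deque


def k_hop_neighborhood(edges, start, max_hops=2):
    """Return {entity: hop_distance} for every entity within max_hops of start.

    `edges` is a list of (source, target) pairs. Edges are treated as undirected,
    so incoming facts ("X -> start") are found as well as outgoing ones.
    """
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)

    if start not in adjacency:
        return {}

    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # depth limit reached: do not expand this node
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


EDGES = [("Alice", "Acme"), ("Acme", "Berlin"), ("Berlin", "Germany"), ("Bob", "Acme")]
# Germany is 3 hops from Alice, so it is excluded at max_hops=2
print(k_hop_neighborhood(EDGES, "Alice", max_hops=2))
```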
Community Detection for Global Queries
Definition: Graph Communities
Graph communities are groups of densely connected nodes that represent thematic clusters. For global queries (e.g., "What are the main themes in this dataset?"), community summaries provide comprehensive coverage.
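```python
# python-louvain package: pip install python-louvain (imported as `community`).
# Recent networkx versions also provide nx.community.louvain_communities.
import community as community_louvain


def detect_communities(knowledge_graph):
    """Group entities into communities with the Louvain method."""
    # Louvain expects an undirected graph; edge "weight" attributes are used by default
    partition = community_louvain.best_partition(knowledge_graph.graph.to_undirected())
    communities = {}
    for node, comm_id in partition.items():
        communities.setdefault(comm_id, []).append(node)
    return communities
```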
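The Louvain method behind `best_partition` greedily searches for a partition with high modularity. Modularity compares the number of edges inside each community with the number expected in a random graph with the same node degrees. For an undirected, unweighted graph with m edges:

Q = sum over communities c of [L_c / m - (d_c / 2m)^2]

Here L_c is the number of edges inside community c and d_c is the sum of its nodes' degrees. The stdlib-only sketch below evaluates that formula so you can sanity-check a partition. It is not the Louvain search itself, and `modularity` is a hypothetical helper.

```python
def modularity(edges, partition):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once
    partition: dict mapping node -> community id
    Q = sum over communities c of L_c / m - (d_c / (2m)) ** 2
    """
    m = len(edges)
    internal = {}    # L_c: edges with both endpoints in community c
    degree_sum = {}  # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Two triangles joined by a single bridge edge (c-d)
TWO_TRIANGLES = [("a", "b"), ("b", "c"), ("a", "c"),
                 ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}
print(round(modularity(TWO_TRIANGLES, split), 3))   # 0.357
print(round(modularity(TWO_TRIANGLES, merged), 3))  # 0.0
```

Splitting along the bridge scores higher than lumping every node together. Louvain is designed to find partitions of that kind.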
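```python
def summarize_community(community_nodes, knowledge_graph, llm):
    """Generate a summary of a community's entities and the relations among them."""
    graph = knowledge_graph.graph
    entities_text = "\n".join(
        f"{node}: {graph.nodes[node].get('description', '')}" for node in community_nodes
    )
    # Include intra-community edges so the LLM can describe relationships, not just entities
    relations_text = "\n".join(
        f"{src} --[{data.get('relation', 'related_to')}]--> {dst}"
        for src, dst, data in graph.subgraph(community_nodes).edges(data=True)
    )
    prompt = f"""Summarize the following group of related entities and their relationships.
Entities:
{entities_text}
Relationships:
{relations_text}
Provide a concise summary of the key themes and relationships:"""
    # `llm` is assumed to expose generate(prompt) -> str
    return llm.generate(prompt)
```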
Microsoft's GraphRAG paper (Edge et al., 2024) reported that answering global queries from community summaries produced substantially more comprehensive and diverse answers than a conventional vector RAG baseline. Global queries are those that require understanding the entire dataset.
Graph-Based Retrieval Strategies
Local Retrieval
Definition: Local Graph Retrieval
Local retrieval finds entities mentioned in the query and retrieves their immediate neighborhood (1-2 hops) from the knowledge graph. Best for specific factual questions.
```python
def local_retrieval(query, knowledge_graph, llm):
    # Extract entities from the query. extract_entities_from_query is not defined
    # in this article; the names it returns must match graph node names exactly.
    query_entities = extract_entities_from_query(query, llm)
    context_parts = []
    for entity in query_entities:
        # Request the entity's neighborhood up to 2 hops away
        context = knowledge_graph.get_entity_context(entity, max_hops=2)
        if context:
            context_parts.append(context)
    return "\n\n".join(context_parts)
```
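`local_retrieval` depends on an `extract_entities_from_query` helper that this article does not define, and whatever it returns must match node names in the graph exactly. One cheap stand-in, assumed here rather than taken from any library, is to look up known node names directly in the query text using case-insensitive, whole-word matching. It misses paraphrases and aliases, which is where an LLM-based extractor earns its cost. `match_query_entities` is hypothetical.

```python
import re


def match_query_entities(query, node_names):
    """Return graph node names that appear verbatim in the query (hypothetical helper).

    Matching is case-insensitive and whole-word (\\b assumes names start and end
    with word characters). Results are ordered by first position in the query.
    """
    lowered = query.lower()
    hits = []
    for name in node_names:
        match = re.search(r"\b" + re.escape(name.lower()) + r"\b", lowered)
        if match:
            hits.append((match.start(), name))
    return [name for _, name in sorted(hits)]


nodes = ["Acme", "Berlin", "Alice", "Globex"]
print(match_query_entities("Why did Alice move Acme to Berlin?", nodes))
# ['Alice', 'Acme', 'Berlin']
```

With the `KnowledgeGraph` class, the candidate names would be `list(knowledge_graph.graph.nodes)`.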
Global Retrieval
Definition: Global Graph Retrieval
Global retrieval identifies which communities are relevant to the query and retrieves community summaries. Best for broad, thematic questions.
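```python
def global_retrieval(query, community_summaries, llm, threshold=0.5):
    """Return the summaries of communities judged relevant to a broad query."""
    # compute_relevance is not defined in this article; it is assumed to return
    # a score in [0, 1] (e.g. an LLM rating or an embedding similarity)
    scored = []
    for comm_id, summary in community_summaries.items():
        relevance = compute_relevance(query, summary, llm)
        if relevance > threshold:
            scored.append((relevance, comm_id, summary))
    # Put the most relevant summaries first in the combined context
    scored.sort(key=lambda item: item[0], reverse=True)
    return "\n\n".join(summary for _, _, summary in scored)
```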
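`global_retrieval` assumes a `compute_relevance(query, summary, llm)` function that returns a score between 0 and 1; the article leaves it undefined. In practice this is often an LLM rating or an embedding similarity. The sketch below is a dependency-free placeholder that scores a summary by the fraction of the query's content words it contains. The stop-word list and the names `lexical_relevance` and `select_communities` are assumptions for illustration only.

```python
import re

# Small illustrative stop-word list (an assumption; real systems use larger lists)
STOP_WORDS = {"a", "an", "and", "are", "how", "in", "is", "of", "on", "or",
              "the", "this", "to", "what", "which"}


def content_words(text):
    """Lowercased alphanumeric tokens, minus stop words."""
    return {tok for tok in re.findall(r"[a-z0-9]+", text.lower()) if tok not in STOP_WORDS}


def lexical_relevance(query, summary):
    """Fraction of the query's content words that also occur in the summary (0.0 to 1.0)."""
    query_words = content_words(query)
    if not query_words:
        return 0.0
    return len(query_words & content_words(summary)) / len(query_words)


def select_communities(query, community_summaries, threshold=0.5):
    """Return (community_id, summary) pairs scoring above threshold, best first."""
    scored = [(lexical_relevance(query, s), cid, s) for cid, s in community_summaries.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(cid, s) for score, cid, s in scored if score > threshold]


SUMMARIES = {
    0: "Battery suppliers and electric vehicle makers form a tightly linked supply chain.",
    1: "Central banks, interest rates and inflation policy dominate this cluster.",
}
QUERY = "How are electric vehicle supply chains linked?"
print(select_communities(QUERY, SUMMARIES))  # only community 0 (score 0.8) passes
```

Swapping in an LLM- or embedding-based scorer only changes `lexical_relevance`; the thresholding and ordering stay the same.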
Hybrid Retrieval
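```python
def hybrid_graph_retrieval(query, knowledge_graph, text_index, llm, top_k=5):
    """Combine graph and text retrieval.

    `text_index` is assumed to expose search(query, top_k) returning a list of
    text chunks (strings); adapt this to your vector store's actual API.
    """
    # Graph retrieval: explicit facts around entities named in the query
    graph_context = local_retrieval(query, knowledge_graph, llm)
    # Text retrieval: supporting passages by semantic similarity
    text_context = "\n\n".join(text_index.search(query, top_k=top_k))
    # Concatenate both sources; no re-ranking is done here
    return f"Graph Knowledge:\n{graph_context}\n\nSupporting Text:\n{text_context}"
```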
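Concatenating both sources with no limit can overflow the model's context window and repeat facts. The sketch below shows one way to merge them, assuming graph facts and text chunks arrive as lists of strings. It puts graph facts first, drops exact duplicates, and respects a character budget. `assemble_context` and the budget value are hypothetical, and models actually limit tokens, not characters.

```python
def assemble_context(graph_facts, text_chunks, max_chars=2000):
    """Merge graph facts and text chunks into one prompt context (hypothetical helper).

    Graph facts are placed first because they are short and explicit. Exact
    duplicates are skipped, and any item that would push the running total of
    item characters (headers not counted) past max_chars is dropped.
    """
    sections = {"Graph Knowledge": [], "Supporting Text": []}
    seen = set()
    used = 0
    for label, items in (("Graph Knowledge", graph_facts), ("Supporting Text", text_chunks)):
        for item in items:
            item = item.strip()
            if not item or item in seen:
                continue
            if used + len(item) > max_chars:
                continue  # over budget: skip this item but still try shorter ones
            seen.add(item)
            used += len(item)
            sections[label].append(item)
    return "\n\n".join(
        f"{label}:\n" + "\n".join(items) for label, items in sections.items() if items
    )


facts = ["Alice --[ceo_of]--> Acme", "Acme --[headquartered_in]--> Berlin"]
chunks = ["Acme builds industrial robots.", "Acme builds industrial robots.", "A" * 5000]
print(assemble_context(facts, chunks))
```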
Graph RAG vs Standard RAG
| Feature | Standard RAG | Graph RAG |
|---|---|---|
| Retrieval basis | Semantic similarity | Entity relationships |
| Multi-hop reasoning | Limited | Excellent |
| Global queries | Poor | Excellent |
| Entity relationships | Implicit | Explicit |
| Indexing complexity | Low | High |
| Query latency | Low | Medium |
| Best for | Specific facts | Complex reasoning |
Graph RAG is most valuable when queries require understanding relationships between entities (e.g., "How does Company A's acquisition of Company B affect their competitors?"). For simple factual retrieval, standard RAG is often sufficient.
Practice Exercises
- Graph Construction: Build a knowledge graph from 10 Wikipedia articles about a specific topic. Extract entities, relationships, and identify communities.
- Multi-Hop Query: Using your knowledge graph, answer a 3-hop question: "Entity A is related to Entity B, which is located in Entity C. What companies operate in C?"
- Community Summarization: Detect communities in your graph and generate summaries. How well do community summaries capture the main themes?
- Hybrid Retrieval: Compare standard RAG vs Graph RAG on a set of queries that require multi-hop reasoning. How do answer accuracy and latency differ?
Key Takeaways
Summary: Graph RAG and Knowledge Graphs
- Knowledge graphs provide structured, relational knowledge
- Entity extraction identifies entities and relationships from text
- Community detection finds thematic clusters for global queries
- Local retrieval finds entity neighborhoods for specific questions
- Global retrieval uses community summaries for broad questions
- Hybrid retrieval combines graph and text for best coverage
- Graph RAG excels at multi-hop reasoning and global understanding
- Standard RAG is sufficient for simple factual retrieval
What to Learn Next
-> RAG System Design: Advanced RAG architecture and design patterns.
-> Retrieval-Augmented Generation: RAG fundamentals and basic implementation.
-> Multi-Modal RAG: RAG with images, audio, and other modalities.
-> Self-RAG and Adaptive Retrieval: When to retrieve and when to rely on parametric knowledge.
-> Graph Neural Networks: Learning on graph-structured data.
-> Agentic RAG Systems: Agent-based approaches to retrieval and generation.