Document RAG Guide
Query documents using vector embeddings and semantic search
Document RAG (also called “basic RAG”, “naive RAG”, or simply “RAG”) is a retrieval-augmented generation approach that uses vector embeddings to find relevant document chunks and provides them as context to an LLM for generating responses.
What is Document RAG?
Document RAG works by:
- Chunking documents into smaller pieces
- Embedding each chunk as a vector
- Storing vectors in a vector database
- Retrieving similar chunks based on query embedding
- Generating responses using retrieved context
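The five steps above can be sketched end-to-end. This is a minimal illustration, not TrustGraph's actual implementation: it uses a toy bag-of-words "embedding" and brute-force cosine similarity in place of a real embedding model and vector database.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words term counts (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def chunk(text, size=40):
    """Split a document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

# Steps 1-3: chunk the document, embed each chunk, store the vectors
document = "TrustGraph chunks documents. Each chunk is embedded as a vector. Vectors live in a vector store."
store = [(c, embed(c)) for c in chunk(document)]

# Step 4: retrieve the chunks most similar to the query embedding
query = embed("how are chunks embedded")
top = sorted(store, key=lambda item: cosine(query, item[1]), reverse=True)[:2]

# Step 5: the retrieved chunks would be passed as context to an LLM
context = "\n".join(c for c, _ in top)
print(context)
```

A real deployment swaps in a trained embedding model and a vector database index, but the retrieve-then-generate shape is the same.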
When to Use Document RAG
✅ Use Document RAG when:
- You need semantic search over documents
- Questions can be answered from isolated passages
- You want simple, fast implementation
- Document context is self-contained
⚠️ Consider alternatives when:
- You need to understand relationships between entities → Use Graph RAG
- You need structured schema-based extraction → Use Ontology RAG
- Answers require connecting information across documents → Use Graph RAG
Prerequisites
Before starting:
- ✅ TrustGraph deployed (Quick Start)
- ✅ Understanding of Core Concepts
- ✅ Documents ready to load
Step-by-Step Guide
Step 1: Prepare Your Documents
TrustGraph supports multiple document formats:
- PDF files (.pdf)
- Text files (.txt)
- Markdown (.md)
- HTML (.html)
Best practices:
- Keep documents focused on specific topics
- Use clear formatting and structure
- Remove unnecessary metadata or headers
- Ensure text is extractable (not scanned images)
Step 2: Configure Document Processing
Configure chunking parameters in your flow:
Chunk Size: Number of characters per chunk
- Small (500-800): Better precision, more chunks
- Medium (1000-1500): Balanced approach (recommended)
- Large (2000-3000): More context, fewer chunks
Chunk Overlap: Characters shared between consecutive chunks
- Typical: 50-100 characters
- Purpose: Ensures context continuity at boundaries
Example configuration:
chunker:
  type: recursive
  chunk_size: 1000
  overlap: 50
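The chunking behavior described above can be sketched as a simple character chunker; the recursive chunker in the configuration also splits on natural boundaries (paragraphs, sentences), but the size/overlap mechanics are the same. The parameter values mirror the example configuration.

```python
def chunk_text(text, chunk_size=1000, overlap=50):
    """Split text into chunks of chunk_size characters, where each chunk
    shares `overlap` trailing/leading characters with its neighbor."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Synthetic 2500-character document: chunks start at 0, 950, 1900
doc = "".join(chr(65 + i % 26) for i in range(2500))
chunks = chunk_text(doc, chunk_size=1000, overlap=50)
print(len(chunks))  # → 3
```

Note how the last 50 characters of one chunk repeat as the first 50 of the next; that repetition is what preserves context across chunk boundaries.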
Step 3: Load Documents
Using CLI
Load a single PDF:
tg-load-pdf my-document.pdf
Load from a directory:
for file in documents/*.pdf; do
tg-load-pdf "$file"
done
Load with specific collection:
tg-load-pdf --collection my-project document.pdf
Using the Workbench
- Navigate to the Library page at http://localhost:8888
- Click Upload or drag-and-drop documents
- Documents appear in the library
- Select documents and click Submit
- Choose a processing flow
- Click Submit to start processing
Step 4: Process Documents
Documents must be processed to create embeddings:
Using CLI:
# Check flow status
tg-show-flows
# Start the default flow
tg-start-flow default-flow
# Monitor processing
tg-show-processor-state
Using Workbench:
- Go to Library page
- Select unprocessed documents
- Click Submit in action bar
- Select processing flow
- Click Submit
Monitor in Grafana:
- Access http://localhost:3000
- Watch processing backlog
- Track chunk embeddings created
- Monitor LLM token usage
Step 5: Query Using Document RAG
CLI Method
Basic query:
tg-invoke-document-rag "What is the main topic of these documents?"
Query specific collection:
tg-invoke-document-rag --collection my-project "Summarize the key findings"
Adjust number of retrieved chunks:
tg-invoke-document-rag --limit 5 "What are the main conclusions?"
API Method
Endpoint: /api/document-rag
Request:
{
"query": "What is the main topic?",
"collection": "my-project",
"limit": 3
}
Response:
{
"answer": "The main topic is...",
"sources": [
{
"text": "Relevant chunk...",
"score": 0.85,
"document": "document-name.pdf"
}
]
}
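If you are calling the endpoint from application code, the request and response shapes above can be exercised with the standard library alone. The host, port, and path here are assumptions taken from the examples in this guide and may differ in your deployment.

```python
import json
from urllib import request

API_URL = "http://localhost:8888/api/document-rag"  # assumed gateway address

payload = {
    "query": "What is the main topic?",
    "collection": "my-project",
    "limit": 3,
}
body = json.dumps(payload).encode("utf-8")
req = request.Request(API_URL, data=body,
                      headers={"Content-Type": "application/json"})

def ask():
    """Send the query; run only against a live TrustGraph deployment."""
    with request.urlopen(req) as resp:
        result = json.loads(resp.read())
    print(result["answer"])
    for src in result.get("sources", []):
        print(f'{src["score"]:.2f}  {src["document"]}')
```

Calling `ask()` prints the generated answer followed by each source chunk's similarity score and origin document.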
Workbench Method
- Navigate to Document RAG tab
- Select collection (optional)
- Enter your question
- Click Submit
- View answer and source chunks
- Click sources to see context
Step 6: Verify and Refine
Check retrieval quality:
# View vector search results
tg-invoke-vector-search "your query term"
Tune parameters if needed:
- Increase chunk size if answers lack context
- Decrease chunk size if results are too broad
- Adjust overlap if context boundaries are poor
- Increase retrieval limit if missing relevant information
Understanding Document RAG Results
Source Attribution
Document RAG returns:
- Answer: LLM-generated response
- Sources: Retrieved chunks used for context
- Scores: Similarity scores for each chunk
- Documents: Origin documents for each chunk
Confidence Indicators
High confidence (score > 0.8):
- Query closely matches document content
- Retrieved chunks directly relevant
Medium confidence (score 0.6-0.8):
- Semantic similarity present
- May need broader context
Low confidence (score < 0.6):
- Weak match to query
- Consider query reformulation
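The confidence bands above are easy to apply programmatically when filtering or flagging sources. A small helper, using the thresholds from this guide:

```python
def confidence(score):
    """Map a similarity score to the confidence bands described above."""
    if score > 0.8:
        return "high"
    if score >= 0.6:
        return "medium"
    return "low"

# Example: annotate retrieved sources (scores here are illustrative)
sources = [
    {"document": "report.pdf", "score": 0.85},
    {"document": "notes.txt", "score": 0.72},
    {"document": "misc.md", "score": 0.41},
]
for s in sources:
    print(s["document"], confidence(s["score"]))
```

Low-confidence sources are a signal to reformulate the query or adjust chunking rather than to trust the generated answer less outright.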
Common Patterns
Multi-Document Search
Query across all documents:
tg-invoke-document-rag "What trends appear across all reports?"
Collection-Specific Queries
Query within a specific project:
tg-invoke-document-rag --collection project-2024 "What are the Q4 results?"
Iterative Refinement
Start broad, then narrow:
# Broad query
tg-invoke-document-rag "What topics are covered?"
# Focused follow-up
tg-invoke-document-rag "Explain the methodology in detail"
Troubleshooting
Poor Retrieval Quality
Problem: Irrelevant chunks retrieved
Solutions:
- Verify documents processed successfully: tg-show-processor-state
- Check embedding quality: tg-invoke-vector-search "test query"
- Adjust chunk size in flow configuration
- Reformulate query for better semantic match
Missing Context
Problem: Answers lack necessary context
Solutions:
- Increase chunk size (e.g., 1000 → 1500)
- Increase retrieval limit (more chunks)
- Increase chunk overlap (50 → 100)
- Use Graph RAG for relationship-based context
Slow Queries
Problem: Document RAG queries take too long
Solutions:
- Reduce number of documents in collection
- Optimize vector database configuration
- Use more powerful hardware
- Consider indexing strategies
Empty Results
Problem: No results returned
Solutions:
- Verify documents are processed: tg-show-processor-state
- Check collection name is correct
- Verify embeddings created: tg-show-graph
- Check for processing errors in logs
Advanced Configuration
Custom Embedding Models
Configure different embedding models in your flow:
embeddings:
  model: sentence-transformers/all-mpnet-base-v2
  dimension: 768
Popular choices:
- all-mpnet-base-v2: Balanced quality/speed (768d)
- all-MiniLM-L6-v2: Fast, smaller (384d)
- bge-large-en: High quality (1024d)
Retrieval Tuning
Adjust retrieval parameters:
# Get more context (more chunks)
tg-invoke-document-rag --limit 10 "query"
# Focus on top matches (fewer chunks)
tg-invoke-document-rag --limit 2 "query"
Collection Management
Create collection:
tg-set-collection my-project
List collections:
tg-list-collections
Delete collection:
tg-delete-collection my-project
Document RAG vs. Other Approaches
| Aspect | Document RAG | Graph RAG | Ontology RAG |
|---|---|---|---|
| Retrieval | Vector similarity | Graph relationships | Schema-based |
| Context | Isolated chunks | Connected entities | Structured data |
| Best for | Semantic search | Complex relationships | Typed extraction |
| Setup | Simple | Medium | Complex |
| Speed | Fast | Medium | Medium |
Use multiple approaches:
- Document RAG for quick semantic search
- Graph RAG when relationships matter
- Ontology RAG for structured extraction
Next Steps
Explore Other RAG Types
- Graph RAG - Leverage knowledge graph relationships
- Ontology RAG - Use structured schemas for extraction
Advanced Features
- Structured Processing - Extract typed objects
- Agent Extraction - AI-powered extraction workflows
- Object Extraction - Domain-specific extraction
API Integration
- Document RAG API - API reference
- CLI Reference - Command-line tools
- Examples - Code samples
Related Resources
- Core Concepts - Understanding embeddings and chunks
- Vector Search - How semantic search works
- Deployment - Scaling for production
- Troubleshooting - Common issues