đźš§ This page needs work
Note: Verify AI-generated output
Ontology RAG Guide
Query structured data using schema-based extraction and typed entities
Ontology RAG (also called “Structured RAG” or “Schema-based RAG”) uses predefined schemas to extract and query typed, structured data from unstructured documents. Unlike basic RAG which retrieves text chunks, Ontology RAG extracts structured objects that conform to your defined schema.
What is Ontology RAG?
Ontology RAG works by:
- Defining schemas for entity types (products, people, events, etc.)
- Extracting objects that match schema definitions
- Storing typed entities with validated structure
- Querying structured data using natural language or GraphQL
- Returning typed results in structured formats
How Ontology RAG Works
Schema Definition → Document Processing → Entity Extraction → Validation → Storage → Structured Query
- Define schema: Specify entity types, fields, and relationships
- Process documents: Extract entities matching schema
- Validate data: Ensure extracted data conforms to schema
- Store objects: Save typed entities in object store
- Query: Use natural language queries or GraphQL
- Return: Get structured JSON/CSV results
Ontology RAG vs. Other Approaches
| Aspect | Document RAG | Graph RAG | Ontology RAG |
|---|---|---|---|
| Output | Text chunks | Relationships | Structured objects |
| Schema | None | Implicit | Explicit types |
| Validation | None | Minimal | Type-checked |
| Query | Semantic | Relationships | Structured |
| Format | Text | Triples | JSON/CSV |
| Best for | Reading | Connections | Data extraction |
When to Use Ontology RAG
âś… Use Ontology RAG when:
- You need structured, typed data extraction
- Output should be in database/spreadsheet format
- You want type validation and consistency
- Need to extract specific entity types (products, people, financial data)
- Building data pipelines or integration workflows
- Require queryable structured data
⚠️ Consider alternatives when:
- Just need semantic search → Use Document RAG
- Need relationship understanding → Use Graph RAG
- Documents don’t have structured data to extract
Prerequisites
Before starting:
- âś… TrustGraph deployed (Quick Start)
- âś… Understanding of structured processing
- âś… SDL (Schema Definition Language) basics
- âś… Documents with extractable structured data
Step-by-Step Guide
Step 1: Define Your Schema
Create a schema using SDL (Schema Definition Language):
Example: Product schema
type Product {
name: String!
price: Float
category: String
manufacturer: String
description: String
}
Example: Person schema
type Person {
name: String!
title: String
organization: String
email: String
location: String
}
Example: Financial data schema
type FinancialRecord {
company: String!
revenue: Float
profit: Float
quarter: String
year: Int
}
Schema guidelines:
- Use
!for required fields - Choose appropriate types: String, Float, Int, Boolean
- Keep schemas focused (one entity type per schema)
- Use clear, descriptive field names
See Schema Guide for complete syntax.
Step 2: Configure Extraction Flow
Set up a flow with your schema:
Using CLI:
# Set the schema
tg-set-schema product-schema products.sdl
# Configure flow to use schema
tg-put-flow-class product-extraction \
--schema product-schema \
--collection products
Schema location: Store schemas in a dedicated directory:
schemas/
├── products.sdl
├── people.sdl
└── financials.sdl
Step 3: Load and Process Documents
Load documents for extraction:
# Load a single document
tg-load-doc --schema product-schema --collection products catalog.pdf
# Load multiple documents
for file in catalogs/*.pdf; do
tg-load-doc --schema product-schema --collection products "$file"
done
Monitor processing:
# Check processing status
tg-show-processor-state
# View extracted objects
tg-invoke-objects-query --collection products "Show all products"
Step 4: Query Structured Data
Natural Language Queries
Using NLP Query (converts natural language to GraphQL):
# Simple query
tg-invoke-nlp-query --collection products "Show all products"
# Filtered query
tg-invoke-nlp-query --collection products "Products over $100"
# Aggregation
tg-invoke-nlp-query --collection products "Average price by category"
# Sorting
tg-invoke-nlp-query --collection products "Top 10 most expensive products"
Direct GraphQL Queries
Using Objects Query:
# Get all products
tg-invoke-objects-query --collection products "{ products { name price } }"
# Filter by criteria
tg-invoke-objects-query --collection products \
"{ products(where: { price: { gt: 100 } }) { name price } }"
# Complex query
tg-invoke-objects-query --collection products \
"{ products(where: { category: \"Electronics\" }, orderBy: price_DESC, limit: 10) { name price manufacturer } }"
API Method
Endpoint: /api/nlp-query or /api/objects-query
NLP Query Request:
{
"query": "Show all products over $100",
"collection": "products",
"format": "json"
}
GraphQL Request:
{
"query": "{ products(where: { price: { gt: 100 } }) { name price } }",
"collection": "products"
}
Response:
{
"data": {
"products": [
{
"name": "Product A",
"price": 150.00
},
{
"name": "Product B",
"price": 200.00
}
]
}
}
Step 5: Export Structured Data
Export to JSON:
tg-invoke-objects-query --collection products \
--format json "{ products { name price } }" > products.json
Export to CSV:
tg-invoke-objects-query --collection products \
--format csv "{ products { name price } }"
Export for analysis:
# Get all data
tg-invoke-objects-query --collection products \
"{ products { name price category manufacturer } }" | jq '.' > all_products.json
# Load into pandas, Excel, or database
Common Patterns
Product Catalog Extraction
Schema:
type Product {
name: String!
sku: String
price: Float
category: String
inStock: Boolean
}
Queries:
# All products
tg-invoke-nlp-query "Show all products"
# Out of stock
tg-invoke-nlp-query "Products that are out of stock"
# Price range
tg-invoke-nlp-query "Products between $50 and $200"
Financial Data Extraction
Schema:
type FinancialRecord {
company: String!
revenue: Float
profit: Float
quarter: String
year: Int
}
Queries:
# Q4 results
tg-invoke-nlp-query "Q4 2024 financial results"
# Profitable companies
tg-invoke-nlp-query "Companies with profit over 1 million"
# Revenue comparison
tg-invoke-nlp-query "Compare revenue across companies"
People/Contact Extraction
Schema:
type Person {
name: String!
title: String
organization: String
email: String
phone: String
}
Queries:
# All contacts
tg-invoke-nlp-query "Show all people"
# By organization
tg-invoke-nlp-query "People at Acme Corp"
# By title
tg-invoke-nlp-query "All CEOs"
Event/Meeting Extraction
Schema:
type Meeting {
title: String!
date: String
attendees: [String]
location: String
agenda: String
}
Queries:
# Upcoming meetings
tg-invoke-nlp-query "Meetings in December"
# By attendee
tg-invoke-nlp-query "Meetings with John Smith"
Advanced Usage
Complex Schemas
Nested objects:
type Company {
name: String!
headquarters: Address
revenue: Float
}
type Address {
street: String
city: String
country: String
}
Arrays:
type Product {
name: String!
categories: [String]
tags: [String]
prices: [PricePoint]
}
type PricePoint {
amount: Float!
currency: String!
date: String
}
Combining with Other RAG Types
Use Ontology RAG + Graph RAG:
# Extract structured data
tg-invoke-nlp-query "All products"
# Understand relationships
tg-invoke-graph-rag "How are products related to manufacturers?"
Use Ontology RAG + Document RAG:
# Get structured data
tg-invoke-nlp-query "Q4 revenue by company"
# Get context/explanation
tg-invoke-document-rag "Why did Q4 revenue increase?"
Validation and Quality Control
Check extraction quality:
# Count extracted objects
tg-invoke-objects-query "{ products { count } }"
# Sample extracted data
tg-invoke-objects-query "{ products(limit: 10) { name price } }"
# Check for missing fields
tg-invoke-nlp-query "Products without prices"
Improve extraction:
- Refine schema definitions
- Improve extraction prompts
- Add validation rules
- Use better source documents
Troubleshooting
No Objects Extracted
Problem: Schema defined but no objects extracted
Solutions:
- Verify schema is loaded:
tg-show-schemas - Check processing status:
tg-show-processor-state - Review extraction prompt configuration
- Ensure documents contain relevant data
- Check logs for extraction errors
Incorrect Field Values
Problem: Extracted data has wrong types or values
Solutions:
- Refine schema field types
- Add field descriptions to guide extraction
- Improve source document quality
- Adjust extraction prompt
- Add validation rules
Query Returns No Results
Problem: NLP queries return empty results
Solutions:
- Verify objects exist:
tg-invoke-objects-query "{ products { count } }" - Check collection name is correct
- Try direct GraphQL query first
- Simplify natural language query
- Check field names match schema
Poor NLP Query Translation
Problem: Natural language doesn’t convert to correct GraphQL
Solutions:
- Use more explicit field names in query
- Try direct GraphQL query instead
- Add more context to natural language
- Use simpler query structure
- Check NLP query examples
Schema Best Practices
Schema Design
Keep schemas focused:
- âś… One entity type per schema
- âś… Clear, descriptive field names
- âś… Appropriate data types
- âś… Required fields marked with
!
Avoid:
- ❌ Overly complex nested structures
- ❌ Ambiguous field names
- ❌ Too many optional fields
- ❌ Mixing multiple entity types
Field Naming
Good names:
- âś…
firstNameandlastName(specific) - âś…
priceUSD(includes unit) - âś…
publishedDate(clear type) - âś…
isActive(boolean convention)
Poor names:
- ❌
data(too generic) - ❌
value(ambiguous) - ❌
field1(meaningless) - ❌
info(vague)
Type Selection
| Data | SDL Type | Example |
|---|---|---|
| Text | String | “Product Name” |
| Number (int) | Int | 42 |
| Number (decimal) | Float | 19.99 |
| True/False | Boolean | true |
| List | [String] | [“tag1”, “tag2”] |
| Date | String | “2024-12-01” |
Comparing Approaches
When to Use Each
| Need | Use This |
|---|---|
| Semantic search | Document RAG |
| Relationship queries | Graph RAG |
| Structured extraction | Ontology RAG |
| “Tell me about X” | Document RAG |
| “How is X related to Y” | Graph RAG |
| “Extract all X entities” | Ontology RAG |
| Text summaries | Document RAG |
| Connected information | Graph RAG |
| Database-like queries | Ontology RAG |
Combined Workflow
# 1. Extract structured data (Ontology RAG)
tg-invoke-nlp-query "All products over $100" > products.json
# 2. Understand relationships (Graph RAG)
tg-invoke-graph-rag "How are these products related to manufacturers?"
# 3. Get detailed context (Document RAG)
tg-invoke-document-rag "Detailed specifications for product X"
Next Steps
Explore Related Guides
- Schema Definition - Complete SDL syntax
- Document RAG - Semantic search basics
- Graph RAG - Relationship-aware retrieval
Advanced Features
- Agent Extraction - AI-powered extraction workflows
- Object Extraction - Domain-specific extraction patterns
- Structured Processing - Complete structured data workflow
API Integration
- NLP Query API - Natural language query API
- Objects Query API - GraphQL query API
- CLI Reference - Command-line tools
Related Resources
- SDL Reference - Schema definition language
- Structured Query - Query concepts
- Examples - Code samples
- Troubleshooting - Common issues