Querying Structured Data
Learn how to query structured data that has been extracted using TrustGraph’s schema-based extraction capabilities.
This feature was introduced in TrustGraph 1.3.
Overview
TrustGraph provides capabilities for querying structured data using defined schemas.
Note: TrustGraph 1.3 introduces fully integrated query capabilities for structured data. You can now query extracted data using natural language, GraphQL, or direct object queries through the CLI commands.
This guide walks through defining extraction schemas, loading structured data, processing documents, and querying the extracted data using TrustGraph’s integrated query tools.
What You’ll Learn
- How to query extracted data using natural language, GraphQL, and object queries
Prerequisites
Before starting this guide, ensure you have:
- A running TrustGraph instance version 1.3 or later (see Installation Guide)
- Python 3.10 or later with the TrustGraph CLI tools installed (pip install trustgraph-cli)
- Sample documents or structured data files to process
Workbench
The Structured Query page on the workbench UI allows you to run the queries used in this guide. Make sure that:
- You have set the collection parameter correctly in the session state popover, top-right.
- You have selected a flow which has object processing enabled, e.g. the obj-ex flow which you created if you are following this guide.
NLP query operation
This operation takes a natural language query and uses an LLM prompt to convert it to a GraphQL query. The conversion relies on the defined schemas, so the schemas must already be loaded as described in the previous guide steps.
This is a building block for more complete functionality, but inspecting the converted queries can also help you check that your application is performing well.
tg-invoke-nlp-query -f obj-ex -q 'Cities with more than 22.8m people'
If successful the output is something like…
Generated GraphQL Query:
----------------------------------------
query { cities(where: {population: {gt: 22800000}}) { city country population } }
----------------------------------------
Detected Schemas: cities
Confidence: 95.00%
Querying the pies data:
tg-invoke-nlp-query -f obj-ex \
-q 'Which pies have more than 20cm diameter?'
If successful the output is something like…
Generated GraphQL Query:
----------------------------------------
query { pies(where: {diameter_cm: {gt: 20}}) { pie_type region diameter_cm } }
----------------------------------------
Detected Schemas: pies
Confidence: 95.00%
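If you want to inspect or reuse only the generated GraphQL, the text between the two dashed separator lines can be pulled out with a small filter. This is a minimal sketch that assumes the output layout matches the samples shown above; the function name extract_graphql is hypothetical and not part of the TrustGraph CLI.

```shell
#!/bin/sh
# extract_graphql: read tg-invoke-nlp-query output on standard input and
# print only the line(s) found between the two dashed separator lines.
# Assumes the separators are lines made up entirely of '-' characters.
extract_graphql() {
    awk '
        /^-+$/ { inside = !inside; next }   # toggle state at each separator
        inside { print }                     # emit lines between separators
    '
}

# Example use (requires a running TrustGraph instance):
#   tg-invoke-nlp-query -f obj-ex -q 'Cities with more than 22.8m people' \
#       | extract_graphql
```

Filtering the output this way makes it easier to log the converted queries or compare them over time.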
Objects query operation
This operation takes a GraphQL query, and executes it on the object store.
City example:
tg-invoke-objects-query -f obj-ex --collection cities -q '
{
cities(where: {population: {gt: 22800000}}) { city country population }
}
'
+-----------+------------+------------+
| city      | country    | population |
+-----------+------------+------------+
| Shanghai  | China      | 30482140   |
| São Paulo | Brazil     | 22990007   |
| Delhi     | India      | 34665569   |
| Tokyo     | Japan      | 37036204   |
| Dhaka     | Bangladesh | 24652864   |
| Cairo     | Egypt      | 23074225   |
+-----------+------------+------------+
Pies example:
tg-invoke-objects-query -f obj-ex \
--collection uk-pies \
-q '
{
pies (where: {diameter_cm: {gt: 20}})
{ pie_type region diameter_cm }
}'
If successful the output is something like…
+-------------------+-----------+-------------+
| pie_type          | region    | diameter_cm |
+-------------------+-----------+-------------+
| Veggie Wellington | London    | 25.0        |
| Toad in the Hole  | Yorkshire | 22.0        |
+-------------------+-----------+-------------+
You can use --format to request CSV or JSON output.
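Hand-writing GraphQL inside shell quotes is easy to get wrong, so a small helper can assemble the query text for the simple "greater than" filters used in the examples above. This is a sketch only: build_gt_query is a hypothetical helper name, and it covers just the gt filter shape that appears in this guide.

```shell
#!/bin/sh
# build_gt_query SCHEMA FIELD THRESHOLD "FIELDS TO RETURN"
# Prints a GraphQL query selecting rows where FIELD is greater than THRESHOLD.
build_gt_query() {
    schema=$1
    field=$2
    threshold=$3
    fields=$4
    # Accept only digits and dots so the threshold cannot break the query syntax.
    case $threshold in
        ''|*[!0-9.]*)
            echo "threshold must be numeric" >&2
            return 1
            ;;
    esac
    printf '{ %s(where: {%s: {gt: %s}}) { %s } }\n' \
        "$schema" "$field" "$threshold" "$fields"
}

# Example use (requires a running TrustGraph instance):
#   tg-invoke-objects-query -f obj-ex --collection uk-pies \
#       -q "$(build_gt_query pies diameter_cm 20 'pie_type region diameter_cm')"
```

Building the query in one place keeps the quoting consistent when the same filter is run with different thresholds or collections.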
Structured query operation
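This is an API which runs the two operations above in sequence: the natural language query is converted to GraphQL, and the resulting query is executed on the object store.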
Cities example:
tg-invoke-structured-query -f obj-ex --collection cities \
-q 'Cities with more than 22.8m people'
+-----------+------------+------------+
| city      | country    | population |
+-----------+------------+------------+
| Shanghai  | China      | 30482140   |
| São Paulo | Brazil     | 22990007   |
| Delhi     | India      | 34665569   |
| Tokyo     | Japan      | 37036204   |
| Dhaka     | Bangladesh | 24652864   |
| Cairo     | Egypt      | 23074225   |
+-----------+------------+------------+
Pies example:
tg-invoke-structured-query -f obj-ex \
--collection uk-pies \
-q 'Which pies have more than 20cm diameter?'
If successful the output is something like…
+-------------------+-----------+-------------+
| pie_type          | region    | diameter_cm |
+-------------------+-----------+-------------+
| Veggie Wellington | London    | 25.0        |
| Toad in the Hole  | Yorkshire | 22.0        |
+-------------------+-----------+-------------+
You can use --format to request CSV or JSON output.
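To see how the structured query relates to the two earlier operations, the same sequence can be reproduced by hand: convert the question with tg-invoke-nlp-query, then pass the generated GraphQL to tg-invoke-objects-query. The sketch below is illustrative and not how TrustGraph implements the operation internally. It assumes the NLP output layout shown earlier in this guide, and the function name two_step_query and the NLP_CMD and OBJ_CMD variables are hypothetical.

```shell
#!/bin/sh
# Command names can be overridden, e.g. to substitute stubs when testing.
NLP_CMD=${NLP_CMD:-tg-invoke-nlp-query}
OBJ_CMD=${OBJ_CMD:-tg-invoke-objects-query}

# two_step_query FLOW COLLECTION QUESTION
two_step_query() {
    flow=$1
    collection=$2
    question=$3
    # Step 1: convert the question to GraphQL and keep only the text
    # printed between the dashed separator lines.
    graphql=$("$NLP_CMD" -f "$flow" -q "$question" |
        awk '/^-+$/ { inside = !inside; next } inside { print }')
    if [ -z "$graphql" ]; then
        echo "no GraphQL query was generated" >&2
        return 1
    fi
    # Step 2: execute the generated query on the object store.
    "$OBJ_CMD" -f "$flow" --collection "$collection" -q "$graphql"
}

# Example use (requires a running TrustGraph instance):
#   two_step_query obj-ex uk-pies 'Which pies have more than 20cm diameter?'
```

Running the steps separately is mainly useful for debugging, since it lets you inspect the intermediate GraphQL before it is executed.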
With collections
Using the same schema with different collections allows you to group data:
tg-invoke-structured-query -f obj-ex \
--collection fr-pies \
-q 'Which pies have more than 20cm diameter?'
+-----------------------+------------+-------------+
| pie_type              | region     | diameter_cm |
+-----------------------+------------+-------------+
| Tarte Flambée         | Alsace     | 28.0        |
| Tarte Alsacienne      | Alsace     | 20.5        |
| Quiche Lorraine       | Lorraine   | 22.0        |
| Pissaladière          | Provence   | 25.0        |
| Galette des Rois      | Nationwide | 21.0        |
| Flamiche aux Poireaux | Picardy    | 22.5        |
+-----------------------+------------+-------------+
tg-invoke-structured-query -f obj-ex \
--collection uk-pies \
-q 'Which pies have more than 20cm diameter?'
+-------------------+-----------+-------------+
| pie_type          | region    | diameter_cm |
+-------------------+-----------+-------------+
| Veggie Wellington | London    | 25.0        |
| Toad in the Hole  | Yorkshire | 22.0        |
+-------------------+-----------+-------------+
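Because the same question applies to every collection that shares a schema, the two commands above can be folded into a loop. This is a minimal sketch: query_collections and the QUERY_CMD variable are hypothetical names, and the command line used is the same one shown in the examples above.

```shell
#!/bin/sh
# The command name can be overridden, e.g. to substitute a stub when testing.
QUERY_CMD=${QUERY_CMD:-tg-invoke-structured-query}

# query_collections FLOW QUESTION COLLECTION...
# Runs the same natural language question against each collection in turn.
query_collections() {
    flow=$1
    question=$2
    shift 2
    for collection in "$@"; do
        # Label each result so the tables can be told apart.
        echo "== $collection =="
        "$QUERY_CMD" -f "$flow" --collection "$collection" -q "$question"
    done
}

# Example use (requires a running TrustGraph instance):
#   query_collections obj-ex 'Which pies have more than 20cm diameter?' \
#       fr-pies uk-pies
```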
Best Practices
Schema Design
- Keep schemas focused on specific domains
- Use clear, descriptive property names
- Include helpful descriptions for each property
- Start simple and iterate
Further Reading
- tg-invoke-structured-query - Query structured data using natural language
- tg-invoke-nlp-query - Convert natural language to GraphQL
- tg-invoke-objects-query - Query objects in collections
- TrustGraph CLI Reference - Complete CLI documentation