Object Storage API

The Object Storage service provides a persistent store for structured objects extracted from documents or created through agent workflows. It supports schema-based storage, querying, and import/export of structured data.

Overview

The Object Storage service:

Stores structured objects with defined schemas
Supports bulk import of objects
Enables querying objects by schema type
Integrates with extraction workflows
Maintains metadata and relationships

REST API

Object Import Endpoint

Import structured objects into the knowledge graph.

POST /api/v1/flow/{flow-id}/objects/import

Request Format

{
  "metadata": {
    "id": "import-001",
    "user": "system",
    "collection": "products",
    "metadata": {
      "source": "catalog.pdf",
      "extraction_date": "2024-01-15"
    }
  },
  "schema_name": "Product",
  "values": [
    {
      "id": "prod-1",
      "name": "Laptop",
      "price": 1299.99,
      "category": "Electronics",
      "description": "High-performance laptop"
    },
    {
      "id": "prod-2",
      "name": "Mouse",
      "price": 29.99,
      "category": "Electronics",
      "description": "Wireless mouse"
    }
  ]
}

Request Parameters

Parameter	Type	Required	Description
`metadata`	object	Yes	Import metadata including ID, user, and collection
`metadata.id`	string	Yes	Unique identifier for this import batch
`metadata.user`	string	Yes	User or system initiating the import
`metadata.collection`	string	Yes	Target collection name
`metadata.metadata`	object	No	Additional metadata (source, date, etc.)
`schema_name`	string	Yes	Name of the schema these objects conform to
`values`	array	Yes	Array of objects to import
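
As a minimal sketch of how the parameters above fit together, the helper below assembles an import request with the standard library and checks the required fields before anything is sent. The helper name `build_import_request`, the `Content-Type` header, and the use of `http://localhost:8088` as the REST base URL are illustrative assumptions, not part of the documented API.

```python
import json
import urllib.request

# Keys the parameter table marks as required inside "metadata".
REQUIRED_METADATA_KEYS = ("id", "user", "collection")


def build_import_request(base_url, flow_id, metadata, schema_name, values):
    """Build (but do not send) a POST request for the import endpoint."""
    missing = [k for k in REQUIRED_METADATA_KEYS if k not in metadata]
    if missing:
        raise ValueError(f"metadata is missing required keys: {missing}")
    if not schema_name:
        raise ValueError("schema_name is required")
    if not isinstance(values, list):
        raise ValueError("values must be a list of objects")

    body = {"metadata": metadata, "schema_name": schema_name, "values": values}
    url = f"{base_url.rstrip('/')}/api/v1/flow/{flow_id}/objects/import"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


request = build_import_request(
    "http://localhost:8088",  # assumed base URL
    "default",
    {"id": "import-001", "user": "system", "collection": "products"},
    "Product",
    [{"id": "prod-1", "name": "Laptop", "price": 1299.99}],
)
# Sending is left to the caller, e.g. urllib.request.urlopen(request)
```

Separating construction from sending keeps the required-field checks testable without a running service.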

Object Query Endpoint

Query stored objects by schema and filters.

POST /api/v1/flow/{flow-id}/objects/query

Request Format

{
  "schema_name": "Product",
  "filters": {
    "category": "Electronics",
    "price": {"$gt": 100}
  },
  "limit": 50,
  "offset": 0
}

Request Parameters

Parameter	Type	Required	Description
`schema_name`	string	Yes	Schema type to query
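`filters`	object	No	Filter criteria keyed by field name; a value is either matched directly or given as an operator object such as `{"$gt": 100}`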
`limit`	integer	No	Maximum results (default: 100)
`offset`	integer	No	Pagination offset (default: 0)
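
To make the filter shape concrete, here is a small local matcher that reads a `filters` object the way the request example suggests: a plain value means equality and `{"$gt": n}` means strictly greater than `n`. This is only a sketch of the assumed semantics; the function name `matches` is hypothetical, the service's actual operator set is not listed here, and any operator other than `$gt` is rejected rather than guessed at.

```python
def matches(obj, filters):
    """Return True if `obj` satisfies every condition in `filters`."""
    for field, condition in filters.items():
        if field not in obj:
            return False
        value = obj[field]
        if isinstance(condition, dict):
            # Operator form, e.g. {"$gt": 100}
            for operator, operand in condition.items():
                if operator == "$gt":
                    if not value > operand:
                        return False
                else:
                    raise ValueError(f"operator not covered by this sketch: {operator}")
        elif value != condition:
            # Plain value: assumed to mean equality
            return False
    return True


filters = {"category": "Electronics", "price": {"$gt": 100}}
laptop = {"id": "prod-1", "category": "Electronics", "price": 1299.99}
mouse = {"id": "prod-2", "category": "Electronics", "price": 29.99}
results = [matches(laptop, filters), matches(mouse, filters)]
```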

Response Format

{
  "objects": [
    {
      "id": "prod-1",
      "name": "Laptop",
      "price": 1299.99,
      "category": "Electronics",
      "description": "High-performance laptop",
      "_metadata": {
        "created_at": "2024-01-15T10:30:00Z",
        "updated_at": "2024-01-15T10:30:00Z",
        "collection": "products"
      }
    }
  ],
  "total": 25,
  "limit": 50,
  "offset": 0
}
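
The `total`, `limit`, and `offset` fields make it possible to page through a large result set. The sketch below assumes that `total` counts all matching objects rather than just the current page; `iter_all_objects` and the `run_query` callable are hypothetical names, and `fake_query` stands in for the real endpoint so the example runs without a server.

```python
def iter_all_objects(run_query, schema_name, filters=None, page_size=100):
    """Yield every matching object by advancing `offset` until `total` is reached."""
    offset = 0
    while True:
        page = run_query(
            {
                "schema_name": schema_name,
                "filters": filters or {},
                "limit": page_size,
                "offset": offset,
            }
        )
        objects = page["objects"]
        yield from objects
        offset += len(objects)
        # Stop when the page is empty or everything has been seen.
        if not objects or offset >= page["total"]:
            break


# Stand-in for the query endpoint: serves slices of an in-memory list.
_FAKE_STORE = [{"id": f"prod-{n}"} for n in range(1, 8)]


def fake_query(request):
    start, size = request["offset"], request["limit"]
    return {
        "objects": _FAKE_STORE[start:start + size],
        "total": len(_FAKE_STORE),
        "limit": size,
        "offset": start,
    }


all_ids = [obj["id"] for obj in iter_all_objects(fake_query, "Product", page_size=3)]
```

In real use, `run_query` would wrap a POST to the query endpoint or a call to `api.query_objects`.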

Python API

Importing Objects

from trustgraph.api import Api

api = Api("http://localhost:8088/").flow().id("default")

# Import product data
products = [
    {
        "id": "p1",
        "name": "Laptop",
        "price": 1299.99,
        "category": "Electronics"
    },
    {
        "id": "p2",
        "name": "Chair",
        "price": 199.99,
        "category": "Furniture"
    }
]

response = api.import_objects(
    metadata={
        "id": "batch-001",
        "user": "admin",
        "collection": "products"
    },
    schema_name="Product",
    values=products
)

print(f"Imported {len(products)} products")

Querying Objects

# Query products by category
response = api.query_objects(
    schema_name="Product",
    filters={"category": "Electronics"},
    limit=10
)

products = response["objects"]
for product in products:
    print(f"{product['name']}: ${product['price']}")

Bulk Import from Files

import json

# Load data from JSON file
with open("products.json", "r") as f:
    product_data = json.load(f)

# Import in batches
batch_size = 100
for i in range(0, len(product_data), batch_size):
    batch = product_data[i:i+batch_size]
    
    api.import_objects(
        metadata={
            "id": f"batch-{i//batch_size}",
            "user": "import-script",
            "collection": "products",
            "metadata": {
                "source": "products.json",
                "batch_number": i//batch_size
            }
        },
        schema_name="Product",
        values=batch
    )
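
The slicing loop above needs the whole dataset in memory as a list. When records come from a generator or a streamed file, a lazy batching helper may be a better fit. The following is a sketch only; `iter_batches` is a hypothetical helper, not part of the client library.

```python
from itertools import islice


def iter_batches(items, batch_size=100):
    """Yield (batch_number, batch) pairs from any iterable, lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    batch_number = 0
    while True:
        # Pull at most batch_size items without materialising the rest.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch_number, batch
        batch_number += 1


records = ({"id": f"p{n}"} for n in range(250))  # a generator, not a list
sizes = [(number, len(batch)) for number, batch in iter_batches(records, 100)]
```

Each `(batch_number, batch)` pair can be passed to `api.import_objects` in the same way as in the loop above, using `batch_number` to build the batch ID.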

Integration with Extraction Workflows

Objects produced by extraction workflows are stored in the Object Storage service automatically and can then be queried like any imported object:

Agent-Based Extraction

# document_text is the raw text of the source document (a str), loaded beforehand
# Agent extracts structured data from document
agent_response = api.invoke_agent(
    prompt="Extract product information as structured data",
    text=document_text
)

# Extracted objects are automatically stored
extracted_objects = agent_response["extracted_objects"]

# Query the stored objects
products = api.query_objects(
    schema_name="Product",
    filters={"_metadata.extraction_id": agent_response["extraction_id"]}
)

Object Extraction Process

# Configure extraction with schema
extraction_config = {
    "schema": "Product",
    "fields": ["name", "price", "category", "description"],
    "extraction_rules": {
        "price": {"type": "number", "format": "currency"},
        "category": {"type": "enum", "values": ["Electronics", "Furniture", "Clothing"]}
    }
}

# Extract and store objects
response = api.extract_objects(
    document=document_text,
    config=extraction_config
)

# Objects are automatically stored and queryable
object_ids = response["stored_object_ids"]

Schema Management

Defining Schemas

# Define a product schema
product_schema = {
    "name": "Product",
    "fields": {
        "id": {"type": "string", "required": True},
        "name": {"type": "string", "required": True},
        "price": {"type": "number", "required": True},
        "category": {"type": "string"},
        "description": {"type": "text"},
        "tags": {"type": "array", "items": {"type": "string"}}
    },
    "indexes": ["category", "price"]
}

api.create_schema(product_schema)

Schema Validation

Objects are validated against their schema during import:

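# ValidationError is the client's schema-validation exception and must be
# in scope (imported) for this handler to work
try:
    api.import_objects(
        metadata={...},  # import metadata, as in the examples above
        schema_name="Product",
        values=[
            {"name": "Invalid Product"}  # Missing required 'id' and 'price'
        ]
    )
except ValidationError as e:
    print(f"Validation failed: {e.errors}")
    # Reports the missing required fields: id, price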
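
Checking objects locally before import can catch missing required fields without a round trip. The sketch below assumes the schema dictionary layout shown under Defining Schemas and covers only the `required` flag, not types; `missing_required_fields` is a hypothetical helper and does not replace server-side validation.

```python
def missing_required_fields(schema, obj):
    """Return the names of required schema fields that `obj` does not provide."""
    return [
        name
        for name, spec in schema["fields"].items()
        if spec.get("required", False) and name not in obj
    ]


product_schema = {
    "name": "Product",
    "fields": {
        "id": {"type": "string", "required": True},
        "name": {"type": "string", "required": True},
        "price": {"type": "number", "required": True},
        "category": {"type": "string"},
    },
}

problems = missing_required_fields(product_schema, {"name": "Invalid Product"})
```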

Metadata Management

Each stored object includes system metadata:

Field	Description
`_id`	Unique object identifier
`_schema`	Schema name
`_created_at`	Creation timestamp
`_updated_at`	Last update timestamp
`_collection`	Collection name
`_user`	User who created/imported
`_metadata`	Custom metadata
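
Because the system fields share an underscore prefix, a client can separate them from an object's own fields with a simple key check. This is a sketch under the assumption that user-defined field names never begin with an underscore; `split_system_fields` is a hypothetical helper and the sample object is illustrative.

```python
def split_system_fields(stored_object):
    """Separate underscore-prefixed system fields from the object's own fields."""
    data, system = {}, {}
    for key, value in stored_object.items():
        target = system if key.startswith("_") else data
        target[key] = value
    return data, system


stored = {
    "id": "prod-1",
    "name": "Laptop",
    "price": 1299.99,
    "_schema": "Product",
    "_collection": "products",
    "_metadata": {"source": "catalog.pdf"},
}
data, system = split_system_fields(stored)
```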

Error Codes

Code	Description
400	Invalid request format or schema validation failure
404	Schema or collection not found
409	Duplicate object ID
413	Import batch too large
500	Internal storage error
503	Storage service unavailable
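
One way a client might act on these codes is sketched below. The retry policy is an assumption rather than documented behaviour: 500 and 503 are treated as transient, 413 as a signal to split the batch, and 400, 404, and 409 as errors that need a changed request. The names `next_action` and `backoff_delays` are hypothetical.

```python
RETRYABLE = {500, 503}      # server-side problems; a later attempt may succeed
SPLIT_AND_RETRY = {413}     # batch too large; send smaller batches instead
# 400, 404 and 409 need a change to the request, so they are not retried.


def next_action(status_code):
    """Map an HTTP status code to a coarse client action."""
    if 200 <= status_code < 300:
        return "ok"
    if status_code in RETRYABLE:
        return "retry"
    if status_code in SPLIT_AND_RETRY:
        return "split"
    return "fail"


def backoff_delays(attempts, base=0.5, cap=30.0):
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... capped."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


actions = {code: next_action(code) for code in (200, 400, 404, 409, 413, 500, 503)}
```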

Best Practices

Batch Imports: Import objects in batches of 100-1000 for optimal performance
Schema Design: Define clear schemas before importing data
Unique IDs: Ensure object IDs are unique within a collection
Metadata: Include source and timestamp metadata for traceability
Indexing: Define indexes on frequently queried fields
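
The unique-ID guideline can be checked on the client before a batch is sent, which avoids a rejected import. The following is a minimal sketch; `duplicate_ids` is a hypothetical helper, and it only detects duplicates within the batch itself, not against objects already stored in the collection.

```python
from collections import Counter


def duplicate_ids(objects, id_field="id"):
    """Return the sorted IDs that occur more than once in `objects`."""
    counts = Counter(obj[id_field] for obj in objects)
    return sorted(object_id for object_id, n in counts.items() if n > 1)


batch = [
    {"id": "p1", "name": "Laptop"},
    {"id": "p2", "name": "Chair"},
    {"id": "p1", "name": "Laptop (refurbished)"},
]
repeated = duplicate_ids(batch)
```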
