TrustGraph Core Import/Export API
This API provides bulk import and export capabilities for TrustGraph knowledge cores. It handles efficient transfer of both RDF triples and graph embeddings using the MessagePack binary format for high-performance data exchange.
Overview
The Core Import/Export API enables:
- Bulk Import: Import large knowledge cores from binary streams
- Bulk Export: Export knowledge cores as binary streams
- Efficient Format: Uses MessagePack for compact, fast serialization
- Dual Data Types: Handles both RDF triples and graph embeddings
- Streaming: Supports streaming for large datasets
Import Endpoint
Endpoint: POST /api/v1/import-core
Query Parameters:
- id: Knowledge core identifier
- user: User identifier
Content-Type: application/octet-stream
Request Body: MessagePack-encoded binary stream
Import Process
- Stream Processing: Reads binary data in 128KB chunks
- MessagePack Decoding: Unpacks binary data into structured messages
- Knowledge Storage: Stores data via Knowledge API
- Response: Returns success/error status
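The steps above can be exercised from the client side with a streaming upload, so that a large core file is never loaded into memory at once. The following is a minimal sketch, assuming the gateway accepts chunked transfer encoding; the helper name stream_core_file and the base URL default are illustrative, not part of the API.

import requests

def stream_core_file(path, core_id, user, token,
                     base_url="http://api-gateway:8080"):
    # Yield the file in 128KB blocks, mirroring the server-side chunk size
    def chunks(chunk_size=128 * 1024):
        with open(path, "rb") as f:
            while True:
                block = f.read(chunk_size)
                if not block:
                    break
                yield block

    response = requests.post(
        f"{base_url}/api/v1/import-core",
        params={"id": core_id, "user": user},
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/octet-stream",
        },
        data=chunks(),  # generator body => chunked upload
    )
    response.raise_for_status()
    return response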
Import Data Format
The import stream contains MessagePack-encoded tuples with type indicators:
Triples Data
("t", {
"m": { # metadata
"i": "core-id",
"m": [], # metadata triples
"u": "user",
"c": "collection"
},
"t": [ # triples array
{
"s": {"value": "subject", "is_uri": true},
"p": {"value": "predicate", "is_uri": true},
"o": {"value": "object", "is_uri": false}
}
]
})
Graph Embeddings Data
("ge", {
"m": { # metadata
"i": "core-id",
"m": [], # metadata triples
"u": "user",
"c": "collection"
},
"e": [ # entities array
{
"e": {"value": "entity", "is_uri": true},
"v": [[0.1, 0.2, 0.3]] # vectors
}
]
})
Export Endpoint
Endpoint: GET /api/v1/export-core
Query Parameters:
- id: Knowledge core identifier
- user: User identifier
Content-Type: application/octet-stream
Response Body: MessagePack-encoded binary stream
Export Process
- Knowledge Retrieval: Fetches data via Knowledge API
- MessagePack Encoding: Encodes data into binary format
- Streaming Response: Sends data as binary stream
- Type Identification: Uses type prefixes for data classification
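The same incremental decoding works on the client side: the response can be consumed chunk by chunk and fed into a MessagePack unpacker, so large cores never have to fit in memory. This is a minimal sketch assuming the export stream is a sequence of (type, data) tuples as described above; iter_core_messages is an illustrative name, not part of the API.

import msgpack
import requests

def iter_core_messages(core_id, user, token,
                       base_url="http://api-gateway:8080"):
    response = requests.get(
        f"{base_url}/api/v1/export-core",
        params={"id": core_id, "user": user},
        headers={"Authorization": f"Bearer {token}"},
        stream=True,  # do not buffer the whole body
    )
    response.raise_for_status()

    unpacker = msgpack.Unpacker()
    for chunk in response.iter_content(chunk_size=128 * 1024):
        unpacker.feed(chunk)            # incremental decoding
        for msg_type, msg_data in unpacker:
            yield msg_type, msg_data

# Usage: count triples without holding the full core in memory
# total = sum(len(d["t"]) for t, d in
#             iter_core_messages("core-123", "alice", "your-token") if t == "t")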
Usage Examples
Import Knowledge Core
# Import from file
curl -X POST \
-H "Authorization: Bearer your-token" \
-H "Content-Type: application/octet-stream" \
--data-binary @knowledge-core.msgpack \
"http://api-gateway:8080/api/v1/import-core?id=core-123&user=alice"
Export Knowledge Core
# Export to file
curl -H "Authorization: Bearer your-token" \
"http://api-gateway:8080/api/v1/export-core?id=core-123&user=alice" \
-o knowledge-core.msgpack
Python Integration
Import Example
import msgpack
import requests
def import_knowledge_core(core_id, user, triples_data, embeddings_data, token):
    # Prepare data
    messages = []

    # Add triples
    if triples_data:
        messages.append(("t", {
            "m": {
                "i": core_id,
                "m": [],
                "u": user,
                "c": "default"
            },
            "t": triples_data
        }))

    # Add embeddings
    if embeddings_data:
        messages.append(("ge", {
            "m": {
                "i": core_id,
                "m": [],
                "u": user,
                "c": "default"
            },
            "e": embeddings_data
        }))

    # Pack data
    binary_data = b''.join(msgpack.packb(msg) for msg in messages)

    # Upload
    response = requests.post(
        f"http://api-gateway:8080/api/v1/import-core?id={core_id}&user={user}",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/octet-stream"
        },
        data=binary_data
    )

    return response.status_code == 200

# Usage
triples = [
    {
        "s": {"value": "Person1", "is_uri": True},
        "p": {"value": "hasName", "is_uri": True},
        "o": {"value": "John Doe", "is_uri": False}
    }
]

embeddings = [
    {
        "e": {"value": "Person1", "is_uri": True},
        "v": [[0.1, 0.2, 0.3, 0.4]]
    }
]

success = import_knowledge_core("core-123", "alice", triples, embeddings, "your-token")
Export Example
import msgpack
import requests
def export_knowledge_core(core_id, user, token):
    response = requests.get(
        f"http://api-gateway:8080/api/v1/export-core?id={core_id}&user={user}",
        headers={"Authorization": f"Bearer {token}"}
    )

    if response.status_code != 200:
        return None

    # Decode MessagePack stream
    data = response.content
    unpacker = msgpack.Unpacker()
    unpacker.feed(data)

    triples = []
    embeddings = []

    for unpacked in unpacker:
        msg_type, msg_data = unpacked
        if msg_type == "t":
            triples.extend(msg_data["t"])
        elif msg_type == "ge":
            embeddings.extend(msg_data["e"])

    return {
        "triples": triples,
        "embeddings": embeddings
    }

# Usage
data = export_knowledge_core("core-123", "alice", "your-token")
if data:
    print(f"Exported {len(data['triples'])} triples")
    print(f"Exported {len(data['embeddings'])} embeddings")
Data Format Specification
MessagePack Tuples
Each message is a tuple: (type_indicator, data_object)
Type Indicators:
"t"
: RDF triples data"ge"
: Graph embeddings data
Metadata Structure
{
    "i": "core-identifier",  # ID
    "m": [...],              # Metadata triples array
    "u": "user-identifier",  # User
    "c": "collection-name"   # Collection
}
Triple Structure
{
    "s": {"value": "subject", "is_uri": boolean},
    "p": {"value": "predicate", "is_uri": boolean},
    "o": {"value": "object", "is_uri": boolean}
}
Entity Embedding Structure
{
    "e": {"value": "entity", "is_uri": boolean},
    "v": [[float, float, ...]]  # Array of vectors
}
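To sanity-check this layout, one message of each type can be round-tripped through MessagePack locally. The sketch below is illustrative only: the field names come from this specification, while the values and vector length are made up.

import msgpack

metadata = {"i": "core-123", "m": [], "u": "alice", "c": "default"}

triple_msg = ("t", {
    "m": metadata,
    "t": [{
        "s": {"value": "Person1", "is_uri": True},
        "p": {"value": "hasName", "is_uri": True},
        "o": {"value": "John Doe", "is_uri": False},
    }],
})

embedding_msg = ("ge", {
    "m": metadata,
    "e": [{
        "e": {"value": "Person1", "is_uri": True},
        "v": [[0.1, 0.2, 0.3]],
    }],
})

# Concatenated packed tuples form a valid import stream
stream = msgpack.packb(triple_msg) + msgpack.packb(embedding_msg)

unpacker = msgpack.Unpacker()
unpacker.feed(stream)
for msg_type, msg_data in unpacker:
    print(msg_type, sorted(msg_data.keys()))  # 't' ['m', 't'] then 'ge' ['e', 'm']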
Performance Characteristics
Import Performance
- Streaming: Processes data in 128KB chunks
- Memory Efficient: Incremental unpacking
- Concurrent: Multiple imports can run simultaneously
- Error Handling: Robust error recovery
Export Performance
- Direct Streaming: Data streamed directly from knowledge store
- Efficient Encoding: MessagePack for minimal overhead
- Large Dataset Support: Streaming keeps memory usage bounded even for very large cores
Error Handling
Import Errors
- Format Errors: Invalid MessagePack data
- Type Errors: Unknown type indicators
- Storage Errors: Knowledge API failures
- Authentication: Invalid user credentials
Export Errors
- Not Found: Core ID doesn’t exist
- Access Denied: User lacks permissions
- System Errors: Knowledge API failures
Error Responses
- HTTP 400: Bad request (invalid parameters)
- HTTP 401: Unauthorized access
- HTTP 404: Core not found
- HTTP 500: Internal server error
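Clients can translate these status codes into application-level errors before decoding any payload. The sketch below is illustrative; the exception names are not part of the TrustGraph API.

import requests

class CoreNotFound(Exception):
    pass

class CoreAccessDenied(Exception):
    pass

def check_core_response(response):
    # Map the documented status codes onto exceptions
    if response.status_code == 400:
        raise ValueError(f"Bad request: {response.text}")
    if response.status_code == 401:
        raise CoreAccessDenied("Unauthorized: check the bearer token")
    if response.status_code == 404:
        raise CoreNotFound("Knowledge core not found")
    if response.status_code >= 500:
        raise RuntimeError(f"Server error: {response.status_code}")
    response.raise_for_status()  # catch anything else unexpected
    return response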
Use Cases
Data Migration
- System Upgrades: Export/import during system migrations
- Environment Sync: Copy cores between environments
- Backup/Restore: Full knowledge core backup operations
Batch Processing
- Bulk Loading: Load large knowledge datasets efficiently
- Data Integration: Merge knowledge from multiple sources
- ETL Pipelines: Extract-Transform-Load operations
Performance Optimization
- Faster Than JSON REST: Binary MessagePack encoding reduces transfer size and parsing time
- Atomic Operations: Complete import/export as single operation
- Resource Efficient: Minimal memory footprint during transfer
Security Considerations
- Authentication Required: Bearer token authentication
- User Isolation: Access restricted to user’s own cores
- Data Validation: Input validation on import
- Audit Logging: Operations logged for security auditing