tg-load-doc-embeds

Loads document embeddings from a MessagePack file into TrustGraph processing pipelines.

Synopsis

tg-load-doc-embeds -i INPUT_FILE [options]

Description

The tg-load-doc-embeds command loads document embeddings from MessagePack files into a running TrustGraph system. This is typically used to restore previously saved document embeddings or to load embeddings generated by external systems.

The command reads the document embedding records from the input file and streams them over a WebSocket connection to TrustGraph’s document embeddings import API for the selected flow.

Options

Required Arguments

Option                   Description
-i, --input-file FILE    Input file containing document embeddings (MessagePack by default; see --format)

Optional Arguments

Option                    Default                                     Description
-u, --url URL             $TRUSTGRAPH_API or http://localhost:8088/   TrustGraph API URL
-t, --token TOKEN         $TRUSTGRAPH_TOKEN                           Authentication token
-f, --flow-id ID          default                                     Flow instance ID to use
--format FORMAT           msgpack                                     Input format: msgpack or json
--user USER               (from input)                                Override the user ID from the input data
--collection COLLECTION   (from input)                                Override the collection ID from the input data
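To make the override semantics concrete, here is a minimal Python sketch, assuming the record layout described under MessagePack Structure: --user replaces the u field and --collection replaces the c field of each record's metadata, while everything else passes through unchanged. The helper apply_overrides is hypothetical and only illustrates the documented behaviour; it is not taken from the TrustGraph source.

```python
from typing import Any, List, Optional


def apply_overrides(record: List[Any], user: Optional[str] = None,
                    collection: Optional[str] = None) -> List[Any]:
    """Hypothetical helper: return a copy of a ["de", {...}] record with the
    user and/or collection in its metadata replaced.

    Mirrors the documented --user / --collection behaviour; it is not the
    CLI's actual code.
    """
    tag, body = record
    if tag != "de":
        raise ValueError(f"unexpected record type: {tag!r}")
    metadata = dict(body["m"])  # copy so the input record is left untouched
    if user is not None:
        metadata["u"] = user
    if collection is not None:
        metadata["c"] = collection
    return [tag, {**body, "m": metadata}]


record = ["de", {
    "m": {"i": "doc-1", "u": "alice", "c": "default"},
    "c": [{"c": "text chunk content", "v": [0.1, 0.2, 0.3]}],
}]
updated = apply_overrides(record, collection="research-docs")
print(updated[1]["m"])  # {'i': 'doc-1', 'u': 'alice', 'c': 'research-docs'}
```

Only the metadata map is copied; the chunk list is shared with the original record, which is enough here because overrides never touch chunks.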

Examples

Load Document Embeddings

tg-load-doc-embeds -i document-embeddings.msgpack

Load with Custom Flow

tg-load-doc-embeds \
  -i embeddings.msgpack \
  -f "document-processing-flow"

Override Collection

tg-load-doc-embeds \
  -i embeddings.msgpack \
  --collection "research-docs"

Load from JSON Format

tg-load-doc-embeds \
  -i embeddings.json \
  --format json

Input Data Format

MessagePack Structure

Each record is a two-element MessagePack array: the record type tag "de", followed by a map holding the document metadata and its embedded chunks:

["de", {
  "m": {
    "i": "document-id",
    "u": "user-id",
    "c": "collection-id"
  },
  "c": [{
    "c": "text chunk content",
    "v": [0.1, 0.2, 0.3, ...]
  }]
}]

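Components:

  • Type tag ("de"): marks the record as document embeddings
  • Metadata (m): document ID (i), user (u), and collection (c)
  • Chunks (c): a list of text chunks, each holding the chunk text (c) and its vector embedding (v)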
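The record layout above can be produced without any third-party library. The following is a minimal Python sketch of a MessagePack encoder covering only the types a "de" record uses (maps, arrays, strings, floats, non-negative integers, booleans and nil); the function pack is hypothetical, and in practice the msgpack package is the usual choice. It also assumes, as the structure above suggests, that an input file holds such records written back to back.

```python
import struct
from typing import Any


def pack(obj: Any) -> bytes:
    """Hypothetical minimal MessagePack encoder for "de" records."""
    if obj is None:
        return b"\xc0"                                   # nil
    if isinstance(obj, bool):                            # check before int
        return b"\xc3" if obj else b"\xc2"
    if isinstance(obj, int):
        if 0 <= obj < 0x80:
            return struct.pack("B", obj)                 # positive fixint
        if 0 <= obj <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", obj)      # uint 32
        raise ValueError("integer out of range for this sketch")
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)          # float 64
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        n = len(data)
        if n < 32:
            head = struct.pack("B", 0xA0 | n)            # fixstr
        elif n < 0x100:
            head = b"\xd9" + struct.pack("B", n)         # str 8
        elif n < 0x10000:
            head = b"\xda" + struct.pack(">H", n)        # str 16
        else:
            head = b"\xdb" + struct.pack(">I", n)        # str 32
        return head + data
    if isinstance(obj, (list, tuple)):
        n = len(obj)
        if n < 16:
            head = struct.pack("B", 0x90 | n)            # fixarray
        elif n < 0x10000:
            head = b"\xdc" + struct.pack(">H", n)        # array 16
        else:
            head = b"\xdd" + struct.pack(">I", n)        # array 32
        return head + b"".join(pack(item) for item in obj)
    if isinstance(obj, dict):
        n = len(obj)
        if n < 16:
            head = struct.pack("B", 0x80 | n)            # fixmap
        elif n < 0x10000:
            head = b"\xde" + struct.pack(">H", n)        # map 16
        else:
            head = b"\xdf" + struct.pack(">I", n)        # map 32
        return head + b"".join(pack(k) + pack(v) for k, v in obj.items())
    raise TypeError(f"unsupported type: {type(obj).__name__}")


record = ["de", {
    "m": {"i": "document-id", "u": "user-id", "c": "collection-id"},
    "c": [{"c": "text chunk content", "v": [0.1, 0.2, 0.3]}],
}]
payload = pack(record)

# Write one record; further records would be appended back to back.
with open("document-embeddings.msgpack", "wb") as f:
    f.write(payload)
```

The payload starts with 0x92 (a two-element array), then the fixstr "de", then 0x82 (a two-entry map). Whether tg-load-doc-embeds accepts a file built this way is an assumption here; the page only specifies the per-record structure.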

Environment Variables

  • TRUSTGRAPH_API: Default API URL
  • TRUSTGRAPH_TOKEN: Default authentication token
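The precedence between command-line options and these variables can be sketched in Python as follows. This assumes that an explicit -u or -t value takes priority over the environment, as the Default column of the options table implies, and that no token is sent when neither -t nor TRUSTGRAPH_TOKEN is set; resolve_connection is a hypothetical helper, not part of the CLI.

```python
import os
from typing import Optional, Tuple

DEFAULT_API_URL = "http://localhost:8088/"


def resolve_connection(url: Optional[str] = None,
                       token: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: pick the API URL and token for a load.

    Explicit values (as passed via -u / -t) win; otherwise fall back to
    TRUSTGRAPH_API / TRUSTGRAPH_TOKEN, and for the URL to the local default.
    """
    api_url = url or os.environ.get("TRUSTGRAPH_API", DEFAULT_API_URL)
    api_token = token or os.environ.get("TRUSTGRAPH_TOKEN")  # None if unset
    return api_url, api_token


os.environ["TRUSTGRAPH_API"] = "http://trustgraph.example:8088/"
print(resolve_connection())                        # URL from TRUSTGRAPH_API
print(resolve_connection(url="http://localhost:8088/"))  # explicit -u wins
```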

API Integration

This command uses the Document Embeddings Import API, streaming records to it over a WebSocket connection.