Flow Class Configuration

Flow classes define complete dataflow pattern templates in TrustGraph. When instantiated, they create interconnected networks of processors that handle data ingestion, processing, storage, and querying as a unified system.

Overview

A flow class serves as a blueprint for creating flow instances. Each flow class defines:

Shared services that are used by all flow instances of the same class
Flow-specific processors that are unique to each flow instance
Interfaces that define how external systems interact with the flow
Queue patterns that route messages between processors

Flow classes are stored in TrustGraph’s configuration system with the configuration type flow-classes and are managed through dedicated CLI commands.

Structure

Every flow class definition has four main sections:

1. Class Section

Defines shared service processors that are instantiated once per flow class. These processors handle requests from all flow instances of this class.

{
  "class": {
    "embeddings:{class}": {
      "request": "non-persistent://tg/request/embeddings:{class}",
      "response": "non-persistent://tg/response/embeddings:{class}"
    },
    "text-completion:{class}": {
      "request": "non-persistent://tg/request/text-completion:{class}",
      "response": "non-persistent://tg/response/text-completion:{class}"
    }
  }
}

Characteristics:

Shared across all flow instances of the same class
Typically expensive or stateless services (LLMs, embedding models)
Use {class} template variable for queue naming
Examples: embeddings:{class}, text-completion:{class}, graph-rag:{class}

2. Flow Section

Defines flow-specific processors that are instantiated for each individual flow instance. Each flow gets its own isolated set of these processors.

{
  "flow": {
    "chunker:{id}": {
      "input": "persistent://tg/flow/chunk:{id}",
      "output": "persistent://tg/flow/chunk-load:{id}"
    },
    "pdf-decoder:{id}": {
      "input": "persistent://tg/flow/document-load:{id}",
      "output": "persistent://tg/flow/chunk:{id}"
    }
  }
}

Characteristics:

Unique instance per flow
Handle flow-specific data and state
Use {id} template variable for queue naming
Examples: chunker:{id}, pdf-decoder:{id}, kg-extract-relationships:{id}

3. Interfaces Section

Defines the entry points and interaction contracts for the flow. These form the API surface for external systems and internal component communication.

Interfaces can take two forms:

Fire-and-Forget Pattern (single queue):

{
  "interfaces": {
    "document-load": "persistent://tg/flow/document-load:{id}",
    "triples-store": "persistent://tg/flow/triples-store:{id}"
  }
}

Request/Response Pattern (object with request/response fields):

{
  "interfaces": {
    "embeddings": {
      "request": "non-persistent://tg/request/embeddings:{class}",
      "response": "non-persistent://tg/response/embeddings:{class}"
    },
    "text-completion": {
      "request": "non-persistent://tg/request/text-completion:{class}",
      "response": "non-persistent://tg/response/text-completion:{class}"
    }
  }
}

Types of Interfaces:

Entry Points: Where external systems inject data (document-load, agent)
Service Interfaces: Request/response patterns for services (embeddings, text-completion)
Data Interfaces: Fire-and-forget data flow connection points (triples-store, entity-contexts-load)

4. Metadata

Additional information about the flow class:

{
  "description": "Standard RAG pipeline with document processing and query capabilities",
  "tags": ["rag", "document-processing", "embeddings", "graph-query"]
}

Parameters

New in v1.4: Flow classes can define configurable parameters that allow customization of flow behavior without modifying the flow class definition. Parameters enable users to select different LLM models, adjust processing settings, and control flow behavior when starting flow instances.

Parameter Definition Schema

Parameters are defined in the flow class definition using this structure:

{
  "description": "Flow class description",
  "tags": ["tag1", "tag2"],
  "parameters": {
    "param-name": {
      "type": "parameter-type-ref",
      "description": "Human-readable description",
      "order": 1,
      "controlled-by": "other-param-name"
    }
  },
  "class": { ... },
  "flow": { ... },
  "interfaces": { ... }
}

Parameter Fields

type (required)

Reference to a parameter type definition stored in the configuration system. Parameter types define the schema, validation rules, default values, and allowed values.

Example:

"parameters": {
  "model": {
    "type": "llm-model"
  }
}

The llm-model type is looked up in the parameter type configuration, which defines valid models, defaults, and constraints.

description (optional)

Human-readable description of what this parameter controls in the context of this flow. Overrides or supplements the parameter type’s description.

Example:

"parameters": {
  "model": {
    "type": "llm-model",
    "description": "LLM model for document analysis and extraction"
  }
}

order (optional)

Display order for the parameter in user interfaces and CLI output. Parameters are shown in ascending order.

Example:

"parameters": {
  "model": {
    "type": "llm-model",
    "order": 1
  },
  "temperature": {
    "type": "temperature",
    "order": 2
  },
  "chunk-size": {
    "type": "chunk-size",
    "order": 3
  }
}

controlled-by (optional)

Indicates that this parameter’s value is automatically inherited from another parameter. Used when multiple services in a flow should use the same setting.

Example:

"parameters": {
  "llm-model": {
    "type": "llm-model",
    "description": "Primary LLM model",
    "order": 1
  },
  "rag-model": {
    "type": "llm-model",
    "description": "Model for RAG queries",
    "order": 2,
    "controlled-by": "llm-model"
  }
}

When controlled-by is specified:

The parameter inherits the value from the controlling parameter
Users can optionally override the inherited value
UI can display the inheritance relationship

Complete Parameter Example

{
  "description": "Customizable RAG pipeline with LLM selection",
  "tags": ["rag", "configurable"],
  "parameters": {
    "llm-model": {
      "type": "llm-model",
      "description": "Primary language model for processing",
      "order": 1
    },
    "rag-model": {
      "type": "llm-model",
      "description": "Model for RAG query generation",
      "order": 2,
      "controlled-by": "llm-model"
    },
    "temperature": {
      "type": "temperature",
      "description": "Response randomness (0.0 = deterministic, 2.0 = very random)",
      "order": 3
    },
    "chunk-size": {
      "type": "chunk-size",
      "description": "Maximum text chunk size for processing",
      "order": 4
    },
    "embedding-model": {
      "type": "embedding-model",
      "description": "Model for generating document embeddings",
      "order": 5
    }
  },
  "class": { ... },
  "flow": { ... },
  "interfaces": { ... }
}

Parameter Types

Parameter types are centrally defined in the configuration system with type parameter-types. Each parameter type specifies:

Data type: string, number, integer, boolean, array, object
Default value: Value used when not specified by user
Enum values: List of allowed values with descriptions
Constraints: Validation rules (min/max, length, pattern, required)

Common parameter types include:

Type	Description	Example Values
`llm-model`	LLM model selection	`gpt-4`, `claude-3-opus`, `mistral-large`
`temperature`	LLM temperature	`0.0` to `2.0` (default: `0.7`)
`chunk-size`	Text chunking size	`100` to `10000` (default: `1000`)
`embedding-model`	Embedding model	`text-embedding-ada-002`, `text-embedding-3-large`

See Parameter Types for complete parameter type documentation.

Parameter Resolution

When a flow instance is started, parameters are resolved in this order:

User-provided values: Explicit values from tg-start-flow --param or API
Default values: From parameter type definitions
Controlled-by relationships: Inherited from controlling parameters
Required validation: Error if required parameters are missing

Example:

Given parameter definitions with defaults:

llm-model: default gpt-4
temperature: default 0.7
chunk-size: default 1000

Starting a flow with:

tg-start-flow -n my-flow -i flow1 -d "Test" --param llm-model=claude-3-opus

Results in:

llm-model: claude-3-opus (user-provided)
temperature: 0.7 (default)
chunk-size: 1000 (default)

Using Parameters in Flow Definitions

Parameters can be referenced in flow class definitions using the {param:name} syntax. This allows queue names, processor configurations, and other settings to be parameterized.

Example:

{
  "parameters": {
    "model": {
      "type": "llm-model",
      "order": 1
    }
  },
  "class": {
    "text-completion:{class}": {
      "request": "non-persistent://tg/request/text-completion:{class}",
      "response": "non-persistent://tg/response/text-completion:{class}",
      "config": {
        "model": "{param:model}"
      }
    }
  }
}

When the flow is started with --param model=gpt-4, the configuration becomes:

{
  "config": {
    "model": "gpt-4"
  }
}

Parameter Storage

All parameter values are stored as strings internally, regardless of their input format. When starting flows:

Numbers: --param temperature=0.7 → stored as "0.7"
Booleans: --param enabled=true → stored as "true"
Strings: --param model=gpt-4 → stored as "gpt-4"

Processors are responsible for converting string values to appropriate types based on parameter type definitions.

Benefits of Parameters

Flexibility: Customize flow behavior without modifying flow classes
Reusability: Single flow class supports multiple configurations
Consistency: Centralized parameter type definitions ensure validation
Discoverability: Users can see available parameters with tg-show-flow-classes
Documentation: Parameter types include descriptions and constraints

Template Variables

Flow class definitions use template variables that are replaced when flow instances are created:

{id}

Purpose: Creates isolated resources for each flow instance
Usage: Flow-specific processors and data pathways
Example: persistent://tg/flow/chunk-load:{id} becomes persistent://tg/flow/chunk-load:customer-A-flow

{class}

Purpose: Creates shared resources across flows of the same class
Usage: Shared services and expensive processors
Example: non-persistent://tg/request/embeddings:{class} becomes non-persistent://tg/request/embeddings:standard-rag

Queue Patterns

Flow classes use Apache Pulsar for messaging. Queue names follow the Pulsar format:

<persistence>://<tenant>/<namespace>/<topic>

Queue Components

Component	Description	Examples
persistence	Pulsar persistence mode	`persistent`, `non-persistent`
tenant	Organization identifier	`tg` (TrustGraph)
namespace	Messaging pattern	`flow`, `request`, `response`
topic	Queue/topic name	`chunk-load:{id}`, `embeddings:{class}`

Persistent Queues

Used for fire-and-forget services and durable data flow:

persistent://tg/flow/<topic>:{id}

Characteristics:

Data persists in Pulsar storage across restarts
Used for document processing pipelines
Ensures data durability and reliability
Examples: persistent://tg/flow/chunk-load:{id}, persistent://tg/flow/triples-store:{id}

Non-Persistent Queues

Used for request/response messaging patterns:

non-persistent://tg/request/<topic>:{class}
non-persistent://tg/response/<topic>:{class}

Characteristics:

Ephemeral, not persisted to disk
Lower latency, suitable for RPC-style communication
Used for shared services like embeddings and LLM calls
Examples: non-persistent://tg/request/embeddings:{class}, non-persistent://tg/response/text-completion:{class}

Complete Example

Here’s a simplified flow class definition for a standard RAG pipeline:

{
  "description": "Standard RAG pipeline with document processing and query capabilities",
  "tags": ["rag", "document-processing", "embeddings"],
  
  "class": {
    "embeddings:{class}": {
      "request": "non-persistent://tg/request/embeddings:{class}",
      "response": "non-persistent://tg/response/embeddings:{class}"
    },
    "text-completion:{class}": {
      "request": "non-persistent://tg/request/text-completion:{class}",
      "response": "non-persistent://tg/response/text-completion:{class}"
    }
  },
  
  "flow": {
    "pdf-decoder:{id}": {
      "input": "persistent://tg/flow/document-load:{id}",
      "output": "persistent://tg/flow/chunk:{id}"
    },
    "chunker:{id}": {
      "input": "persistent://tg/flow/chunk:{id}",
      "output": "persistent://tg/flow/chunk-load:{id}"
    },
    "vectorizer:{id}": {
      "input": "persistent://tg/flow/chunk-load:{id}",
      "output": "persistent://tg/flow/doc-embeds-store:{id}"
    }
  },
  
  "interfaces": {
    "document-load": "persistent://tg/flow/document-load:{id}",
    "embeddings": {
      "request": "non-persistent://tg/request/embeddings:{class}",
      "response": "non-persistent://tg/response/embeddings:{class}"
    },
    "text-completion": {
      "request": "non-persistent://tg/request/text-completion:{class}",
      "response": "non-persistent://tg/response/text-completion:{class}"
    }
  }
}

Flow Instantiation

When a flow instance is created from this class:

Given:

Flow Instance ID: customer-A-flow
Flow Class: standard-rag

Template Expansions:

persistent://tg/flow/chunk-load:{id} → persistent://tg/flow/chunk-load:customer-A-flow
non-persistent://tg/request/embeddings:{class} → non-persistent://tg/request/embeddings:standard-rag

Result:

Isolated document processing pipeline for customer-A-flow
Shared embedding service for all standard-rag flows
Complete dataflow from document ingestion through querying

Dataflow Architecture

Flow classes create unified dataflows where:

Document Processing Pipeline: Flows from ingestion through transformation to storage
Query Services: Integrated processors that query the same data stores and services
Shared Services: Centralized processors that all flows can utilize
Storage Writers: Persist processed data to appropriate stores

All processors (both {id} and {class}) work together as a cohesive dataflow graph, not as separate systems.

Benefits

Resource Efficiency

Expensive services (LLMs, embedding models) are shared across flows
Reduces computational costs and resource usage

Flow Isolation

Each flow has its own data processing pipeline
Prevents data mixing between different flows

Scalability

Can instantiate multiple flows from the same template
Horizontal scaling by adding more flow instances

Modularity

Clear separation between shared and flow-specific components
Easy to modify and extend flow capabilities

Unified Architecture

Query and processing are part of the same dataflow
Consistent data handling across ingestion and retrieval

Common Patterns

Standard RAG Flow

Document ingestion → chunking → embedding → storage
Query interface for retrieval and generation

Knowledge Graph Flow

Document ingestion → entity extraction → relationship extraction → graph storage
Query interface for graph traversal and reasoning

Object Extraction Flow

Document ingestion → structured data extraction → object storage
Query interface for structured data retrieval

Best Practices

Queue Design

Use persistent queues for data that must survive restarts
Use non-persistent queues for fast request/response patterns
Include template variables in queue names for proper isolation

Share expensive services (LLMs, embeddings) at the class level
Keep data processing isolated at the flow level

Interface Design

Provide clear entry points for external systems
Use request/response patterns for synchronous operations
Use fire-and-forget patterns for asynchronous data flow

Template Variables

Use {id} for flow-specific resources
Use {class} for shared resources
Be consistent with naming conventions

Flow Class Configuration

Overview

Structure

1. Class Section

2. Flow Section

3. Interfaces Section

4. Metadata

Parameters

Parameter Definition Schema

Parameter Fields

type (required)

description (optional)

order (optional)

controlled-by (optional)

Complete Parameter Example

Parameter Types

Parameter Resolution

Using Parameters in Flow Definitions

Parameter Storage

Benefits of Parameters

Template Variables

{id}

{class}

Queue Patterns

Queue Components

Persistent Queues

Non-Persistent Queues

Complete Example

Flow Instantiation

Dataflow Architecture

Benefits

Resource Efficiency

Flow Isolation

Scalability

Modularity

Unified Architecture

Common Patterns

Standard RAG Flow

Knowledge Graph Flow

Object Extraction Flow

Best Practices

Queue Design

Service Sharing

Interface Design

Template Variables

See Also