tg-load-text

Loads text documents into TrustGraph processing pipelines with rich metadata support.

Synopsis

tg-load-text [options] file1 [file2 ...]

Description

The tg-load-text command loads text documents into TrustGraph for processing. It creates a SHA256 hash-based document ID and supports comprehensive metadata including copyright information, publication details, and keywords.

Note: Consider using tg-add-library-document followed by tg-start-library-processing for better document management and processing control.

Options

Connection & Flow

  • -u, --url URL: TrustGraph API URL (default: $TRUSTGRAPH_URL or http://localhost:8088/)
  • -f, --flow-id FLOW: Flow ID for processing (default: default)
  • -U, --user USER: User identifier (default: trustgraph)
  • -C, --collection COLLECTION: Collection identifier (default: default)

Document Metadata

  • --name NAME: Document name/title
  • --description DESCRIPTION: Document description
  • --document-url URL: Document source URL
  • --copyright-notice NOTICE: Copyright notice text
  • --copyright-holder HOLDER: Copyright holder name
  • --copyright-year YEAR: Copyright year
  • --license LICENSE: Copyright license

Publication Information

  • --publication-organization ORG: Publishing organization
  • --publication-description DESC: Publication description
  • --publication-date DATE: Publication date

Keywords

  • --keyword KEYWORD [KEYWORD ...]: Document keywords (can specify multiple)

Arguments

  • file1 [file2 ...]: One or more text files to load

Examples

Basic Document Loading

tg-load-text document.txt

Loading with Metadata

tg-load-text \
  --name "Research Paper on AI" \
  --description "Comprehensive study of machine learning algorithms" \
  --keyword "AI" "machine learning" "research" \
  research-paper.txt

Complete Metadata Example

tg-load-text \
  --name "TrustGraph Documentation" \
  --description "Complete user guide for TrustGraph system" \
  --copyright-holder "TrustGraph Project" \
  --copyright-year "2024" \
  --license "MIT" \
  --publication-organization "TrustGraph Foundation" \
  --publication-date "2024-01-15" \
  --keyword "documentation" "guide" "tutorial" \
  --flow-id research-flow \
  trustgraph-guide.txt

Multiple Files

tg-load-text chapter1.txt chapter2.txt chapter3.txt

Custom Flow and Collection

tg-load-text \
  --flow-id medical-research \
  --user researcher \
  --collection medical-papers \
  medical-study.txt

Output

For each file processed, the command outputs:

Success

document.txt: Loaded successfully.

Failure

document.txt: Failed: Connection refused

Document Processing

  1. File Reading: Reads the text file content
  2. Hash Generation: Creates SHA256 hash for unique document ID
  3. URI Creation: Converts hash to document URI format
  4. Metadata Assembly: Combines all metadata into RDF triples
  5. API Submission: Sends to TrustGraph via Text Load API

Document ID Generation

Documents are assigned IDs based on their content hash:

  • SHA256 hash of file content
  • Converted to TrustGraph document URI format
  • Example: http://trustgraph.ai/d/abc123...

Metadata Format

The metadata is stored as RDF triples including:

Standard Properties

  • dc:title: Document name
  • dc:description: Document description
  • dc:creator: Copyright holder
  • dc:date: Publication date
  • dc:rights: Copyright notice
  • dc:license: License information

Keywords

  • dc:subject: Each keyword as separate triple

Organization Information

  • foaf:Organization: Publication organization details

Error Handling

File Errors

document.txt: Failed: No such file or directory

Solution: Verify the file path exists and is readable.

Connection Errors

document.txt: Failed: Connection refused

Solution: Check the API URL and ensure TrustGraph is running.

Flow Errors

document.txt: Failed: Invalid flow

Solution: Verify the flow exists and is running using tg-show-flows.

Environment Variables

  • TRUSTGRAPH_URL: Default API URL

API Integration

This command uses the Text Load API to submit documents for processing. The text content is base64-encoded for transmission.

Use Cases

Academic Research

tg-load-text \
  --name "Climate Change Impact Study" \
  --publication-organization "University Research Center" \
  --keyword "climate" "research" "environment" \
  climate-study.txt

Corporate Documentation

tg-load-text \
  --name "Product Manual" \
  --copyright-holder "Acme Corp" \
  --license "Proprietary" \
  --keyword "manual" "product" "guide" \
  product-manual.txt

Technical Documentation

tg-load-text \
  --name "API Reference" \
  --description "Complete API documentation" \
  --keyword "API" "reference" "technical" \
  api-docs.txt

Best Practices

  1. Use Descriptive Names: Provide clear document names and descriptions
  2. Add Keywords: Include relevant keywords for better searchability
  3. Complete Metadata: Fill in copyright and publication information
  4. Batch Processing: Load multiple related files together
  5. Use Collections: Organize documents by topic or project using collections