tg-load-text
Loads text documents into TrustGraph processing pipelines with rich metadata support.
Synopsis
tg-load-text [options] file1 [file2 ...]
Description
The tg-load-text command loads text documents into TrustGraph for processing. It derives each document ID from a SHA-256 hash of the file content and accepts optional metadata, including copyright information, publication details, and keywords.
Note: Consider using tg-add-library-document followed by tg-start-library-processing for better document management and processing control.
Options
Connection & Flow
-u, --url URL: TrustGraph API URL (default: $TRUSTGRAPH_URL or http://localhost:8088/)
-f, --flow-id FLOW: Flow ID for processing (default: default)
-U, --user USER: User identifier (default: trustgraph)
-C, --collection COLLECTION: Collection identifier (default: default)
Document Metadata
--name NAME: Document name/title
--description DESCRIPTION: Document description
--document-url URL: Document source URL
Copyright Information
--copyright-notice NOTICE: Copyright notice text
--copyright-holder HOLDER: Copyright holder name
--copyright-year YEAR: Copyright year
--license LICENSE: Copyright license
Publication Information
--publication-organization ORG: Publishing organization
--publication-description DESC: Publication description
--publication-date DATE: Publication date
Keywords
--keyword KEYWORD [KEYWORD ...]: Document keywords (can specify multiple)
Arguments
file1 [file2 ...]: One or more text files to load
Examples
Basic Document Loading
tg-load-text document.txt
Loading with Metadata
tg-load-text \
--keyword "AI" "machine learning" "research" \
--name "Research Paper on AI" \
--description "Comprehensive study of machine learning algorithms" \
research-paper.txt
Complete Metadata Example
tg-load-text \
--name "TrustGraph Documentation" \
--description "Complete user guide for TrustGraph system" \
--copyright-holder "TrustGraph Project" \
--copyright-year "2024" \
--license "MIT" \
--publication-organization "TrustGraph Foundation" \
--publication-date "2024-01-15" \
--keyword "documentation" "guide" "tutorial" \
--flow-id research-flow \
trustgraph-guide.txt
Multiple Files
tg-load-text chapter1.txt chapter2.txt chapter3.txt
Custom Flow and Collection
tg-load-text \
--flow-id medical-research \
--user researcher \
--collection medical-papers \
medical-study.txt
Output
For each file processed, the command outputs:
Success
document.txt: Loaded successfully.
Failure
document.txt: Failed: Connection refused
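When loading many files from a script, it can help to turn these status lines into structured results. The following is a minimal sketch that assumes the output has exactly the two shapes shown above; the `LoadResult` type and `parse_result` function are hypothetical helpers, not part of TrustGraph.

```python
from typing import NamedTuple, Optional

SUCCESS_SUFFIX = ": Loaded successfully."
FAILURE_MARKER = ": Failed: "


class LoadResult(NamedTuple):
    """One parsed status line (hypothetical helper type)."""
    filename: str
    ok: bool
    error: Optional[str]


def parse_result(line: str) -> LoadResult:
    """Parse one tg-load-text status line, assuming the two formats shown above."""
    line = line.rstrip("\n")
    if line.endswith(SUCCESS_SUFFIX):
        return LoadResult(line[: -len(SUCCESS_SUFFIX)], True, None)
    # Split on the first failure marker; everything after it is the reason.
    name, sep, reason = line.partition(FAILURE_MARKER)
    if sep:
        return LoadResult(name, False, reason)
    raise ValueError(f"unrecognised status line: {line!r}")
```

A wrapper script could collect the failed results and retry only those files.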
Document Processing
- File Reading: Reads the text file content
- Hash Generation: Creates SHA256 hash for unique document ID
- URI Creation: Converts hash to document URI format
- Metadata Assembly: Combines all metadata into RDF triples
- API Submission: Sends to TrustGraph via Text Load API
Document ID Generation
Documents are assigned IDs based on their content hash:
- SHA256 hash of file content
- Converted to TrustGraph document URI format
- Example:
http://trustgraph.ai/d/abc123...
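The ID derivation described above can be sketched roughly as follows. This is an illustration, not the tool's actual code: it assumes the hash is rendered as a hexadecimal digest and appended to the `http://trustgraph.ai/d/` prefix seen in the example, and the `document_id` function name is hypothetical.

```python
import hashlib

# Assumed prefix, taken from the example URI above; the real format may differ.
DOC_URI_PREFIX = "http://trustgraph.ai/d/"


def document_id(content: bytes) -> str:
    """Return a content-derived document URI (illustrative only)."""
    # Assumption: the SHA-256 digest is encoded as lowercase hex.
    digest = hashlib.sha256(content).hexdigest()
    return DOC_URI_PREFIX + digest


def document_id_for_file(path: str) -> str:
    """Hash the raw bytes of a file and build its document URI."""
    with open(path, "rb") as f:
        return document_id(f.read())
```

Because the ID depends only on the content, loading the same file twice would yield the same ID under this scheme, while any change to the text yields a different one.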
Metadata Format
The metadata is stored as RDF triples including:
Standard Properties
dc:title: Document name
dc:description: Document description
dc:creator: Copyright holder
dc:date: Publication date
dc:rights: Copyright notice
dc:license: License information
Keywords
dc:subject: Each keyword as separate triple
Organization Information
foaf:Organization: Publication organization details
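To make the mapping above concrete, here is a small sketch that assembles subject-predicate-object tuples from a few of the command-line fields. It keeps the prefixed names (`dc:title`, `dc:subject`, and so on) as plain strings rather than expanding them to full URIs, and the `metadata_triples` function is a hypothetical helper; how TrustGraph represents triples internally is not shown in this document.

```python
from typing import List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]


def metadata_triples(
    doc_uri: str,
    name: Optional[str] = None,
    description: Optional[str] = None,
    copyright_holder: Optional[str] = None,
    keywords: Sequence[str] = (),
) -> List[Triple]:
    """Build (subject, predicate, object) tuples for the supplied metadata."""
    triples: List[Triple] = []
    if name:
        triples.append((doc_uri, "dc:title", name))
    if description:
        triples.append((doc_uri, "dc:description", description))
    if copyright_holder:
        triples.append((doc_uri, "dc:creator", copyright_holder))
    # Each keyword becomes its own dc:subject triple.
    for keyword in keywords:
        triples.append((doc_uri, "dc:subject", keyword))
    return triples
```

Fields that are not supplied simply produce no triple, which matches the optional nature of the metadata flags.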
Error Handling
File Errors
document.txt: Failed: No such file or directory
Solution: Verify the file path exists and is readable.
Connection Errors
document.txt: Failed: Connection refused
Solution: Check the API URL and ensure TrustGraph is running.
Flow Errors
document.txt: Failed: Invalid flow
Solution: Verify the flow exists and is running using tg-show-flows.
Environment Variables
TRUSTGRAPH_URL: Default API URL
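The lookup order implied by the --url option (explicit flag, then TRUSTGRAPH_URL, then the built-in default) might be expressed like this. The `resolve_api_url` function is a hypothetical illustration of that precedence, not code from the tool.

```python
import os
from typing import Mapping, Optional

# Fallback documented for the --url option.
DEFAULT_URL = "http://localhost:8088/"


def resolve_api_url(
    cli_url: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Pick the API URL: command-line value, then TRUSTGRAPH_URL, then default."""
    if cli_url:
        return cli_url
    return env.get("TRUSTGRAPH_URL", DEFAULT_URL)
```

Passing the environment as a parameter keeps the function easy to exercise without modifying the real process environment.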
Related Commands
tg-add-library-document - Add documents to library (recommended)
tg-load-pdf - Load PDF documents
tg-show-library-documents - List loaded documents
tg-start-library-processing - Start document processing
API Integration
This command uses the Text Load API to submit documents for processing. The text content is base64-encoded for transmission.
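The base64 step can be illustrated with the standard library. The sketch below only shows the encoding itself; the payload field names (`id`, `text`, `user`, `collection`) and the `build_payload` function are assumptions made for the example and may not match the actual Text Load API request format.

```python
import base64
from typing import Dict


def build_payload(
    doc_id: str,
    text: str,
    user: str = "trustgraph",
    collection: str = "default",
) -> Dict[str, str]:
    """Build an illustrative request body with base64-encoded text.

    The field names here are hypothetical, not the documented API schema.
    """
    # Encode the UTF-8 bytes, then decode to ASCII so the value is JSON-safe.
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return {
        "id": doc_id,
        "text": encoded,
        "user": user,
        "collection": collection,
    }
```

Base64 lets arbitrary text, including non-ASCII characters and newlines, travel safely inside a JSON string; the receiver reverses it with `base64.b64decode`.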
Use Cases
Academic Research
tg-load-text \
--keyword "climate" "research" "environment" \
--name "Climate Change Impact Study" \
--publication-organization "University Research Center" \
climate-study.txt
Corporate Documentation
tg-load-text \
--keyword "manual" "product" "guide" \
--name "Product Manual" \
--copyright-holder "Acme Corp" \
--license "Proprietary" \
product-manual.txt
Technical Documentation
tg-load-text \
--keyword "API" "reference" "technical" \
--name "API Reference" \
--description "Complete API documentation" \
api-docs.txt
Best Practices
- Use Descriptive Names: Provide clear document names and descriptions
- Add Keywords: Include relevant keywords for better searchability
- Complete Metadata: Fill in copyright and publication information
- Batch Processing: Load multiple related files together
- Use Collections: Organize documents by topic or project using collections
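The batch-processing and collection practices above lend themselves to scripting. As a rough sketch, the hypothetical `build_load_command` helper below assembles an argument list for one tg-load-text invocation using only flags documented on this page; it builds the command but does not run it.

```python
from typing import List, Optional, Sequence


def build_load_command(
    files: Sequence[str],
    flow_id: Optional[str] = None,
    user: Optional[str] = None,
    collection: Optional[str] = None,
) -> List[str]:
    """Assemble a tg-load-text argument list for a batch of related files."""
    if not files:
        raise ValueError("at least one file is required")
    cmd: List[str] = ["tg-load-text"]
    # Only add flags that were given, so the tool's own defaults still apply.
    if flow_id:
        cmd += ["--flow-id", flow_id]
    if user:
        cmd += ["--user", user]
    if collection:
        cmd += ["--collection", collection]
    cmd += list(files)
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd)`, which avoids shell quoting problems for file names that contain spaces.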