tg-load-text
Loads text documents into TrustGraph processing pipelines with rich metadata support.
Synopsis
tg-load-text [options] file1 [file2 ...]
Description
The tg-load-text
command loads text documents into TrustGraph for processing. It creates a SHA256 hash-based document ID and supports comprehensive metadata including copyright information, publication details, and keywords.
Note: Consider using tg-add-library-document
followed by tg-start-library-processing
for better document management and processing control.
Options
Connection & Flow
-u, --url URL
: TrustGraph API URL (default:$TRUSTGRAPH_URL
orhttp://localhost:8088/
)-f, --flow-id FLOW
: Flow ID for processing (default:default
)-U, --user USER
: User identifier (default:trustgraph
)-C, --collection COLLECTION
: Collection identifier (default:default
)
Document Metadata
--name NAME
: Document name/title--description DESCRIPTION
: Document description--document-url URL
: Document source URL
Copyright Information
--copyright-notice NOTICE
: Copyright notice text--copyright-holder HOLDER
: Copyright holder name--copyright-year YEAR
: Copyright year--license LICENSE
: Copyright license
Publication Information
--publication-organization ORG
: Publishing organization--publication-description DESC
: Publication description--publication-date DATE
: Publication date
Keywords
--keyword KEYWORD [KEYWORD ...]
: Document keywords (can specify multiple)
Arguments
file1 [file2 ...]
: One or more text files to load
Examples
Basic Document Loading
tg-load-text document.txt
Loading with Metadata
tg-load-text \
--name "Research Paper on AI" \
--description "Comprehensive study of machine learning algorithms" \
--keyword "AI" "machine learning" "research" \
research-paper.txt
Complete Metadata Example
tg-load-text \
--name "TrustGraph Documentation" \
--description "Complete user guide for TrustGraph system" \
--copyright-holder "TrustGraph Project" \
--copyright-year "2024" \
--license "MIT" \
--publication-organization "TrustGraph Foundation" \
--publication-date "2024-01-15" \
--keyword "documentation" "guide" "tutorial" \
--flow-id research-flow \
trustgraph-guide.txt
Multiple Files
tg-load-text chapter1.txt chapter2.txt chapter3.txt
Custom Flow and Collection
tg-load-text \
--flow-id medical-research \
--user researcher \
--collection medical-papers \
medical-study.txt
Output
For each file processed, the command outputs:
Success
document.txt: Loaded successfully.
Failure
document.txt: Failed: Connection refused
Document Processing
- File Reading: Reads the text file content
- Hash Generation: Creates SHA256 hash for unique document ID
- URI Creation: Converts hash to document URI format
- Metadata Assembly: Combines all metadata into RDF triples
- API Submission: Sends to TrustGraph via Text Load API
Document ID Generation
Documents are assigned IDs based on their content hash:
- SHA256 hash of file content
- Converted to TrustGraph document URI format
- Example:
http://trustgraph.ai/d/abc123...
Metadata Format
The metadata is stored as RDF triples including:
Standard Properties
dc:title
: Document namedc:description
: Document descriptiondc:creator
: Copyright holderdc:date
: Publication datedc:rights
: Copyright noticedc:license
: License information
Keywords
dc:subject
: Each keyword as separate triple
Organization Information
foaf:Organization
: Publication organization details
Error Handling
File Errors
document.txt: Failed: No such file or directory
Solution: Verify the file path exists and is readable.
Connection Errors
document.txt: Failed: Connection refused
Solution: Check the API URL and ensure TrustGraph is running.
Flow Errors
document.txt: Failed: Invalid flow
Solution: Verify the flow exists and is running using tg-show-flows
.
Environment Variables
TRUSTGRAPH_URL
: Default API URL
Related Commands
tg-add-library-document
- Add documents to library (recommended)tg-load-pdf
- Load PDF documentstg-show-library-documents
- List loaded documentstg-start-library-processing
- Start document processing
API Integration
This command uses the Text Load API to submit documents for processing. The text content is base64-encoded for transmission.
Use Cases
Academic Research
tg-load-text \
--name "Climate Change Impact Study" \
--publication-organization "University Research Center" \
--keyword "climate" "research" "environment" \
climate-study.txt
Corporate Documentation
tg-load-text \
--name "Product Manual" \
--copyright-holder "Acme Corp" \
--license "Proprietary" \
--keyword "manual" "product" "guide" \
product-manual.txt
Technical Documentation
tg-load-text \
--name "API Reference" \
--description "Complete API documentation" \
--keyword "API" "reference" "technical" \
api-docs.txt
Best Practices
- Use Descriptive Names: Provide clear document names and descriptions
- Add Keywords: Include relevant keywords for better searchability
- Complete Metadata: Fill in copyright and publication information
- Batch Processing: Load multiple related files together
- Use Collections: Organize documents by topic or project using collections