tg-add-library-document

Adds documents to the TrustGraph library with comprehensive metadata support.

Synopsis

tg-add-library-document [options] file1 [file2 ...]

Description

The tg-add-library-document command adds documents to the TrustGraph library system, which provides persistent document storage with rich metadata management. Unlike direct document loading, the library approach offers better document lifecycle management, metadata preservation, and processing control.

Documents added to the library can later be processed using tg-start-library-processing for controlled batch processing operations.

Options

Connection & User

  • -u, --url URL: TrustGraph API URL (default: $TRUSTGRAPH_URL or http://localhost:8088/)
  • -U, --user USER: User identifier (default: trustgraph)

Document Information

  • --name NAME: Document name/title
  • --description DESCRIPTION: Document description
  • --id ID: Custom document identifier (if not specified, uses content hash)
  • --kind MIMETYPE: Document MIME type (auto-detected if not specified)
  • --tags TAGS: Comma-separated list of tags
  • --copyright-notice NOTICE: Copyright notice text
  • --copyright-holder HOLDER: Copyright holder name
  • --copyright-year YEAR: Copyright year
  • --license LICENSE: Copyright license

Publication Information

  • --publication-organization ORG: Publishing organization name
  • --publication-description DESC: Publication description
  • --publication-date DATE: Publication date
  • --publication-url URL: Publication URL

Document Source

  • --document-url URL: Original document source URL
  • --keyword KEYWORDS: Document keywords (space-separated)

Arguments

  • file1 [file2 ...]: One or more files to add to the library

Examples

Basic Document Addition

tg-add-library-document report.pdf

With Complete Metadata

tg-add-library-document \
  --name "Annual Research Report 2024" \
  --description "Comprehensive analysis of research outcomes" \
  --copyright-holder "Research Institute" \
  --copyright-year "2024" \
  --license "CC BY 4.0" \
  --tags "research,annual,analysis" \
  --keyword "research" "analysis" "2024" \
  annual-report.pdf

Academic Paper

tg-add-library-document \
  --name "Machine Learning in Healthcare" \
  --description "Study on ML applications in medical diagnosis" \
  --publication-organization "University Medical School" \
  --publication-date "2024-03-15" \
  --copyright-holder "Dr. Jane Smith" \
  --tags "machine-learning,healthcare,medical" \
  --keyword "ML" "healthcare" "diagnosis" \
  ml-healthcare-paper.pdf

Multiple Documents with Shared Metadata

tg-add-library-document \
  --publication-organization "Tech Company" \
  --copyright-holder "Tech Company Inc." \
  --copyright-year "2024" \
  --license "Proprietary" \
  --tags "documentation,technical" \
  manual-v1.pdf manual-v2.pdf manual-v3.pdf

Custom Document ID

tg-add-library-document \
  --id "PROJ-2024-001" \
  --name "Project Specification" \
  --description "Technical requirements document" \
  project-spec.docx

Document Processing

  1. File Reading: Reads document content as binary data
  2. ID Generation: Creates SHA256 hash-based ID (unless custom ID provided)
  3. Metadata Assembly: Combines all metadata into structured format
  4. Library Storage: Stores document and metadata in library system
  5. URI Creation: Generates TrustGraph document URI

Document ID Generation

  • Automatic: SHA256 hash of file content converted to TrustGraph URI
  • Custom: Use --id parameter for specific identifiers
  • Format: http://trustgraph.ai/d/[hash-or-custom-id]

MIME Type Detection

The system automatically detects document types:

  • PDF: application/pdf
  • Word: application/vnd.openxmlformats-officedocument.wordprocessingml.document
  • Text: text/plain
  • HTML: text/html

Override with --kind parameter if needed.

Metadata Format

Metadata is stored as RDF triples including:

Dublin Core Properties

  • dc:title: Document name
  • dc:description: Document description
  • dc:creator: Copyright holder
  • dc:date: Publication date
  • dc:rights: Copyright notice
  • dc:license: License information
  • dc:subject: Keywords and tags

Organization Information

  • foaf:Organization: Publisher details
  • foaf:name: Organization name
  • vcard:hasURL: Organization website

Document Properties

  • bibo:doi: DOI if applicable
  • bibo:url: Document source URL

Output

For each successfully added document:

report.pdf: Loaded successfully.

For failures:

invalid.pdf: Failed: File not found

Error Handling

File Errors

document.pdf: Failed: No such file or directory

Solution: Verify file path exists and is readable.

Permission Errors

document.pdf: Failed: Permission denied

Solution: Check file permissions and user access rights.

Connection Errors

document.pdf: Failed: Connection refused

Solution: Verify API URL and ensure TrustGraph is running.

Library Errors

document.pdf: Failed: Document already exists

Solution: Use different ID or update existing document.

Library Management Workflow

1. Add Documents

tg-add-library-document research-paper.pdf

2. Verify Addition

tg-show-library-documents

3. Start Processing

tg-start-library-processing --flow-id research-flow

4. Monitor Processing

tg-show-library-processing

Environment Variables

  • TRUSTGRAPH_URL: Default API URL

API Integration

This command uses the Librarian API with the add-document operation to store documents with metadata.

Use Cases

Research Document Management

tg-add-library-document \
  --name "Climate Change Analysis" \
  --publication-organization "Climate Research Institute" \
  --tags "climate,research,environment" \
  climate-study.pdf

Corporate Documentation

tg-add-library-document \
  --name "Product Manual v2.1" \
  --copyright-holder "Acme Corporation" \
  --license "Proprietary" \
  --tags "manual,product,v2.1" \
  product-manual.pdf
tg-add-library-document \
  --name "Contract Template" \
  --description "Standard service agreement template" \
  --copyright-holder "Legal Department" \
  --tags "legal,contract,template" \
  contract-template.docx

Academic Paper Collection

tg-add-library-document \
  --publication-organization "IEEE" \
  --copyright-year "2024" \
  --tags "academic,ieee,conference" \
  paper1.pdf paper2.pdf paper3.pdf

Best Practices

  1. Consistent Metadata: Use standardized metadata fields for better organization
  2. Meaningful Tags: Add relevant tags for document discovery
  3. Copyright Information: Include complete copyright details for legal compliance
  4. Batch Operations: Process related documents together with shared metadata
  5. Version Control: Use clear naming and tagging for document versions
  6. Library Organization: Use collections and user assignments for multi-tenant systems

Advantages over Direct Loading

Library Benefits

  • Persistent Storage: Documents preserved in library system
  • Metadata Management: Rich metadata storage and querying
  • Processing Control: Controlled batch processing with start/stop
  • Document Lifecycle: Full document management capabilities
  • Search and Discovery: Better document organization and retrieval

When to Use Library vs Direct Loading

  • Use Library: For document management, metadata preservation, controlled processing
  • Use Direct Loading: For immediate processing, simple workflows, temporary documents