tg-dump-msgpack

Reads and analyzes knowledge core files in MessagePack format for diagnostic purposes.

Synopsis

tg-dump-msgpack -i INPUT_FILE [options]

Description

The tg-dump-msgpack command is a diagnostic utility that reads knowledge core files stored in MessagePack format and outputs their contents in JSON format or provides a summary analysis. This tool is primarily used for debugging, data inspection, and understanding the structure of knowledge cores.

MessagePack is a binary serialization format that TrustGraph uses for efficient storage and transfer of knowledge graph data.

Options

Required Arguments

  • -i, --input-file FILE: Input MessagePack file to read

Optional Arguments

  • -s, --summary: Show a summary analysis of the file contents
  • -r, --records: Dump individual records in JSON format (default behavior)

Examples

Dump Records as JSON

tg-dump-msgpack -i knowledge-core.msgpack

Show Summary Analysis

tg-dump-msgpack -i knowledge-core.msgpack --summary

Save Output to File

tg-dump-msgpack -i knowledge-core.msgpack > analysis.json

Analyze Multiple Files

for file in *.msgpack; do
  echo "=== $file ==="
  tg-dump-msgpack -i "$file" --summary
  echo
done

Output Formats

Record Output (Default)

With -r or --records (default behavior), the command writes one record per line. Each record is a two-element JSON array: a type tag followed by the record payload:

["t", {"m": {"m": [{"s": {"v": "uri1"}, "p": {"v": "predicate"}, "o": {"v": "object"}}]}}]
["ge", {"v": [[0.1, 0.2, 0.3, ...]]}]
["de", {"metadata": {...}, "chunks": [...]}]

Summary Output

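With -s or --summary, the command prints an overview of the file: the embedding vector dimension, followed by the labels found in the file (one per line, each prefixed with "-"):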

Vector dimension: 384
- NASA Challenger Report
- Technical Documentation
- Safety Engineering Guidelines
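
The summary is plain text, so its two parts can be separated with awk and sed. This sketch assumes the layout matches the sample above exactly (a "Vector dimension:" line, then "- " prefixed labels); summary_dimension and summary_labels are hypothetical helper names.

```shell
# Both helpers read summary text on stdin.
# Assumption: the summary layout is the one shown in the sample output.

# Print the value after "Vector dimension:" (prints nothing if the line is absent).
summary_dimension() {
  awk -F': *' '/^Vector dimension:/ { print $2; exit }'
}

# Print each label with its leading "- " removed.
summary_labels() {
  sed -n 's/^- //p'
}

# Usage (assumes tg-dump-msgpack is on PATH):
#   tg-dump-msgpack -i knowledge-core.msgpack --summary | summary_dimension
#   tg-dump-msgpack -i knowledge-core.msgpack --summary | summary_labels
```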

Record Types

MessagePack files may contain different types of records:

Triple Records (“t”)

RDF triples representing knowledge graph relationships:

["t", {
  "m": {
    "m": [{
      "s": {"v": "http://example.org/subject"},
      "p": {"v": "http://example.org/predicate"}, 
      "o": {"v": "object value"}
    }]
  }
}]
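
In the structure above, the inner "m" array holds a list, so a single "t" record may carry more than one triple and counting lines can understate the number of triples. The sketch below counts subject keys instead. It is a rough text-based approximation that assumes the key names and the ": " spacing shown in the example; count_triples is a hypothetical helper name. Where jq is available, summing `.[1].m.m | length` over the "t" records is more robust.

```shell
# Count triples (not records) in record output read from stdin.
# Assumption: each triple serializes with a "s": { ... } subject key,
# spaced as in the example above. count_triples is a hypothetical helper.
count_triples() {
  grep '^\["t"' |        # keep triple records only
    grep -o '"s": *{' |  # one match per triple subject
    wc -l | tr -d ' '
}

# Usage (assumes tg-dump-msgpack is on PATH):
#   tg-dump-msgpack -i knowledge-core.msgpack | count_triples
```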

Graph Embeddings (“ge”)

Vector embeddings for graph entities:

["ge", {
  "v": [[0.1, 0.2, 0.3, 0.4, ...]]
}]
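
The length of the first vector in a "ge" record should agree with the "Vector dimension" reported by --summary. As a hedged sketch, the helper below counts the comma-separated values inside the first nested array of the first "ge" record. It assumes the first "[[" on the line opens the vector, as in the example; ge_dimension is a hypothetical helper name.

```shell
# Print the length of the first vector in the first "ge" record on stdin.
# Assumption: the first "[[" on the line opens the vector, as in the example.
# ge_dimension is a hypothetical helper, not part of TrustGraph.
ge_dimension() {
  awk '
    /^\["ge"/ {
      open_at = index($0, "[[")
      if (open_at == 0) next
      rest = substr($0, open_at + 2)       # text after the opening [[
      close_at = index(rest, "]")          # end of the first inner array
      inner = (close_at > 0) ? substr(rest, 1, close_at - 1) : ""
      print split(inner, parts, ",")       # number of comma-separated values
      exit
    }
  '
}

# Usage (assumes tg-dump-msgpack is on PATH):
#   tg-dump-msgpack -i knowledge-core.msgpack | ge_dimension
```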

Document Embeddings (“de”)

Document chunk embeddings with metadata:

["de", {
  "metadata": {
    "id": "doc-123",
    "user": "trustgraph",
    "collection": "default"
  },
  "chunks": [{
    "chunk": "text content",
    "vectors": [0.1, 0.2, 0.3, ...]
  }]
}]

Use Cases

Data Inspection

# Quick peek at file structure
tg-dump-msgpack -i mystery-core.msgpack --summary

# Detailed record analysis
tg-dump-msgpack -i knowledge-core.msgpack | head -20

Debugging Knowledge Cores

# Check if file contains expected data types
tg-dump-msgpack -i core.msgpack | grep -o '^\["[^"]*"' | sort | uniq -c

# Find specific entities
tg-dump-msgpack -i core.msgpack | grep "NASA"

# Check vector dimensions
tg-dump-msgpack -i core.msgpack --summary | grep "Vector dimension"

Quality Assurance

# Validate file completeness
validate_msgpack() {
  local file="$1"
  
  echo "Validating: $file"
  
  # Check file exists and is readable
  if [ ! -r "$file" ]; then
    echo "Error: Cannot read file $file"
    return 1
  fi
  
  # Get summary (the assignment preserves the command's exit status)
  local summary dim label_count
  if ! summary=$(tg-dump-msgpack -i "$file" --summary 2>/dev/null); then
    echo "Error: Failed to read MessagePack file"
    return 1
  fi
  
  # Check for vector dimension (indicates embeddings present)
  dim=$(echo "$summary" | awk '/Vector dimension:/ {print $3; exit}')
  if [ -n "$dim" ]; then
    echo "✓ Contains embeddings (dimension: $dim)"
  else
    echo "⚠ No embeddings found"
  fi
  
  # Count labels (indicates entities present)
  label_count=$(echo "$summary" | grep -c "^-")
  echo "✓ Found $label_count labeled entities"
  
  return 0
}

# Validate multiple files
for file in cores/*.msgpack; do
  validate_msgpack "$file"
done

Data Migration

# Convert MessagePack to JSON for processing
convert_to_json() {
  local input="$1"
  local output="$2"
  
  echo "Converting $input to $output..."
  tg-dump-msgpack -i "$input" > "$output"
  
  # Add array wrapper for valid JSON array
  sed -i '1i[' "$output"
  sed -i '$a]' "$output"
  sed -i 's/$/,/' "$output"
  sed -i '$s/,$//' "$output"
  
  echo "Conversion complete"
}

convert_to_json "knowledge.msgpack" "knowledge.json"

Analysis and Reporting

# Generate comprehensive analysis report
analyze_msgpack() {
  local file="$1"
  local report_file="${file%.msgpack}_analysis.txt"
  
  echo "MessagePack Analysis Report" > "$report_file"
  echo "File: $file" >> "$report_file"
  echo "Generated: $(date)" >> "$report_file"
  echo "=============================" >> "$report_file"
  echo "" >> "$report_file"
  
  # Summary information
  echo "Summary:" >> "$report_file"
  tg-dump-msgpack -i "$file" --summary >> "$report_file"
  echo "" >> "$report_file"
  
  # Record type analysis
  echo "Record Type Distribution:" >> "$report_file"
  tg-dump-msgpack -i "$file" | \
    grep -o '^\["[^"]*"' | \
    sort | uniq -c | \
    awk '{print "  " $2 ": " $1 " records"}' >> "$report_file"
  echo "" >> "$report_file"
  
  # File statistics (stat -c and numfmt are GNU coreutils; on BSD/macOS use stat -f%z)
  local file_size
  file_size=$(stat -c%s "$file")
  echo "File Statistics:" >> "$report_file"
  echo "  Size: $file_size bytes" >> "$report_file"
  echo "  Size (human): $(numfmt --to=iec-i --suffix=B "$file_size")" >> "$report_file"
  
  echo "Analysis saved to: $report_file"
}

# Analyze all MessagePack files
for file in *.msgpack; do
  analyze_msgpack "$file"
done

Comparative Analysis

# Compare two knowledge cores
compare_msgpack() {
  local file1="$1"
  local file2="$2"
  
  echo "Comparing MessagePack files:"
  echo "File 1: $file1"
  echo "File 2: $file2"
  echo "=========================="
  
  # Compare summaries
  echo "Summary comparison:"
  echo "File 1:"
  tg-dump-msgpack -i "$file1" --summary | sed 's/^/  /'
  echo ""
  echo "File 2:"
  tg-dump-msgpack -i "$file2" --summary | sed 's/^/  /'
  echo ""
  
  # Compare record counts
  echo "Record type comparison:"
  echo "File 1:"
  tg-dump-msgpack -i "$file1" | \
    grep -o '^\["[^"]*"' | \
    sort | uniq -c | \
    awk '{print "  " $2 ": " $1}' | \
    sort
  
  echo "File 2:"
  tg-dump-msgpack -i "$file2" | \
    grep -o '^\["[^"]*"' | \
    sort | uniq -c | \
    awk '{print "  " $2 ": " $1}' | \
    sort
}

compare_msgpack "core1.msgpack" "core2.msgpack"

Advanced Usage

Large File Processing

# Process large files in chunks
process_large_msgpack() {
  local file="$1"
  local chunk_size=1000
  
  echo "Processing large file: $file"
  
  # Count total records first
  total_records=$(tg-dump-msgpack -i "$file" | wc -l)
  echo "Total records: $total_records"
  
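  # Split into chunk files inside a private temporary directory, so cleanup
  # cannot delete unrelated files named chunk_* in the current directory
  local tmpdir
  tmpdir=$(mktemp -d) || return 1
  tg-dump-msgpack -i "$file" | \
    split -l "$chunk_size" - "$tmpdir/chunk_"
  
  echo "Split into chunks of $chunk_size records each"
  
  # Process each chunk
  for chunk in "$tmpdir"/chunk_*; do
    [ -e "$chunk" ] || continue   # no chunks were produced
    echo "Processing $chunk..."
    # Add your processing logic here
    wc -l "$chunk"
  done
  
  # Clean up
  rm -rf "$tmpdir"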
}

Data Extraction

# Extract specific data types
extract_triples() {
  local file="$1"
  local output="triples.json"
  
  echo "Extracting triples from $file..."
  tg-dump-msgpack -i "$file" | \
    grep '^\["t"' > "$output"
  
  echo "Triples saved to: $output"
}

extract_embeddings() {
  local file="$1"
  local output="embeddings.json"
  
  echo "Extracting embeddings from $file..."
  tg-dump-msgpack -i "$file" | \
    grep -E '^\["(ge|de)"' > "$output"
  
  echo "Embeddings saved to: $output"
}

# Extract all data types
extract_triples "knowledge.msgpack"
extract_embeddings "knowledge.msgpack"

Integration with Other Tools

# Convert MessagePack to formats for other tools
msgpack_to_turtle() {
  local input="$1"
  local output="$2"
  
  echo "Converting MessagePack to Turtle format..."
  
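  # Extract triples and convert to Turtle.
  # Literal objects go through tojson so embedded quotes, backslashes,
  # and newlines are escaped instead of producing invalid output.
  tg-dump-msgpack -i "$input" | \
    grep '^\["t"' | \
    jq -r '.[1].m.m[] |
      "<" + .s.v + "> <" + .p.v + "> " +
      (if .o.e then "<" + .o.v + ">" else (.o.v | tojson) end) + " ."' \
    > "$output"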
  
  echo "Turtle format saved to: $output"
}

msgpack_to_turtle "knowledge.msgpack" "knowledge.ttl"

Error Handling

File Not Found

Exception: [Errno 2] No such file or directory: 'missing.msgpack'

Solution: Check file path and ensure the file exists.

Invalid MessagePack Format

Exception: Unpack failed

Solution: Verify the file is a valid MessagePack file and not corrupted.

Memory Issues with Large Files

MemoryError: Unable to allocate memory

Solution: Process large files in chunks or use streaming approaches.
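
One streaming approach is to pipe the record output through a tool that reads a line at a time, so the downstream step never holds more than one record in memory. The sketch below gathers basic size statistics this way. record_size_stats is a hypothetical helper name, and it only bounds the memory of the processing step; it makes no claim about how tg-dump-msgpack itself reads the file.

```shell
# Stream record output from stdin and report the record count, the total
# number of characters, and the longest record. awk keeps only the current
# line in memory. record_size_stats is a hypothetical helper name.
record_size_stats() {
  awk '
    {
      n++
      len = length($0)
      total += len
      if (len > max) { max = len; at = NR }
    }
    END {
      printf "records=%d total_chars=%.0f longest=%d (line %d)\n", n, total, max, at
    }
  '
}

# Usage (assumes tg-dump-msgpack is on PATH):
#   tg-dump-msgpack -i large-core.msgpack | record_size_stats
```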

Permission Errors

Exception: [Errno 13] Permission denied

Solution: Check file permissions and ensure read access.
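
The file-related errors above can be caught before the tool runs. The following is a minimal sketch of such a pre-flight check. preflight_msgpack and its return codes are hypothetical choices made for this example, not something tg-dump-msgpack defines.

```shell
# Check that an input file exists, is a regular file, is readable, and is
# not empty before handing it to tg-dump-msgpack.
# Return codes (hypothetical, chosen for this sketch):
#   0 ok, 2 missing, 3 not a regular file, 4 unreadable, 5 empty
preflight_msgpack() {
  local file="$1"

  if [ ! -e "$file" ]; then
    echo "Error: no such file: $file" >&2
    return 2
  fi
  if [ ! -f "$file" ]; then
    echo "Error: not a regular file: $file" >&2
    return 3
  fi
  if [ ! -r "$file" ]; then
    echo "Error: permission denied: $file" >&2
    return 4
  fi
  if [ ! -s "$file" ]; then
    echo "Error: file is empty: $file" >&2
    return 5
  fi
  return 0
}

# Usage (assumes tg-dump-msgpack is on PATH):
#   preflight_msgpack core.msgpack && tg-dump-msgpack -i core.msgpack --summary
```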

Performance Considerations

File Size Optimization

# Check file compression efficiency
check_compression() {
  local file="$1"
  local original_size compressed_size ratio
  
  original_size=$(stat -c%s "$file")
  if [ "$original_size" -eq 0 ]; then
    echo "File is empty: $file"
    return 1
  fi
  
  # Test compression: gzip to a pipe and count bytes, so no temporary
  # .gz file is written next to the original
  compressed_size=$(gzip -c "$file" | wc -c)
  
  ratio=$(echo "scale=2; $compressed_size * 100 / $original_size" | bc)
  
  echo "Original: $(numfmt --to=iec-i --suffix=B "$original_size")"
  echo "Compressed: $(numfmt --to=iec-i --suffix=B "$compressed_size")"
  echo "Compression ratio: ${ratio}%"
}

Processing Speed

# Time processing operations
time_msgpack_ops() {
  local file="$1"
  
  echo "Timing MessagePack operations for: $file"
  
  # Time summary generation
  echo "Summary generation:"
  time tg-dump-msgpack -i "$file" --summary > /dev/null
  
  # Time full dump
  echo "Full record dump:"
  time tg-dump-msgpack -i "$file" > /dev/null
}

Best Practices

  1. File Validation: Always validate MessagePack files before processing
  2. Memory Management: Be cautious with large files to avoid memory issues
  3. Backup: Keep backups of original MessagePack files before analysis
  4. Incremental Processing: Process large files incrementally when possible
  5. Documentation: Document the structure and content of your MessagePack files
  6. Version Control: Track changes in MessagePack file formats over time

Troubleshooting

Corrupted Files

# Test file integrity
if tg-dump-msgpack -i "test.msgpack" --summary > /dev/null 2>&1; then
  echo "File appears valid"
else
  echo "File may be corrupted"
fi

Empty or Incomplete Files

# Check for empty files
if [ ! -s "test.msgpack" ]; then
  echo "File is empty"
fi

# Check record count
record_count=$(tg-dump-msgpack -i "test.msgpack" 2>/dev/null | wc -l)
echo "Records found: $record_count"

Format Issues

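# Validate JSON output (checks the first record only).
# jq -e also fails when there is no input, so an empty dump is not reported as valid.
if tg-dump-msgpack -i "test.msgpack" | head -1 | jq -e . > /dev/null 2>&1; then
  echo "JSON output is valid"
else
  echo "JSON output may be malformed"
fi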
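
Where jq is not installed, a coarse structural check can still flag lines that do not look like records. This sketch is not JSON validation: it only tests that each line starts with a quoted type tag inside "[" and ends with "]", as in the record examples earlier. check_record_shape is a hypothetical helper name.

```shell
# Report lines of record output (stdin) that do not look like
# ["<tag>", ...] and exit non-zero if any are found.
# This is a shape check only, not a JSON parser.
# check_record_shape is a hypothetical helper, not part of TrustGraph.
check_record_shape() {
  awk '
    !/^\["[^"]+",.*\]$/ {
      printf "line %d: unexpected shape\n", NR
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage (assumes tg-dump-msgpack is on PATH):
#   tg-dump-msgpack -i test.msgpack | check_record_shape && echo "All records look well-formed"
```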