Benchmarks

Data ingest token rates

Why?

This benchmark measures the token rate while loading a standard document. TrustGraph knowledge extraction builds a large processing backlog, so it makes a good load test. In practice, knowledge extraction (definitions and relationships) is the bottleneck, so this test is largely measuring:

  • The LLM service itself
  • How well TrustGraph can load the LLM service
  • The configuration that we’re testing with

Results

Platform | GPU | Model | Config | Token Rate | Time to Process
vLLM on Intel Gaudi 2 🏆 | Gaudi 2, 8 cards | meta-llama/Llama-3.3-70B-Instruct | TC1 | In: 1493.6; Out: 1545.8; Total: 3039.5 | 8.5 min
vllm-server on NVidia | H100-SXM5-80GB (Tensordock) | TheBloke/Mistral-7B-v0.1-AWQ | TC1 | In: 304.3; Out: 1845.6; Total: 2150.0 | 12.0 min
VertexAI | n/a | Gemini 2.0 Flash | TC1 | In: 216.2; Out: 155.8; Total: 372.0 | 69.4 min
LMStudio | Radeon RX 7900 XTX | Gemma3 4B QAT | TC1 | In: 116.2; Out: 133.9; Total: 250.1 | 103.3 min
Granite Ridge | 128 Xeon Gen 6 CPU | mistralai/Mistral-7B-Instruct-v0.3 | TC1 | In: 117.7; Out: 90.0; Total: 207.8 | 124.3 min
LMStudio | Radeon RX 7900 XTX | Gemma2 9B | TC1 | In: 119.6; Out: 73.0; Total: 192.6 | 134.1 min
Granite Ridge | 128 Xeon Gen 6 CPU | meta-llama/Llama-3.3-70B-Instruct | TC1 | In: 67.0; Out: 22.4; Total: 89.3 | 289.3 min
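
The Time to Process figures are consistent with dividing the TC1 source size (1,549,890 tokens) by the total token rate, if that rate is read as tokens per second. This is an inference from the published numbers, not something the source states. A minimal Python sketch of the arithmetic, where the function name and the dictionary are illustrative only:

```python
# Assumption: "Total" token rate is in tokens per second, and
# Time to Process = source tokens / total rate. Inferred from the table.
SOURCE_TOKENS = 1_549_890  # TC1 source material size


def minutes_to_process(total_tokens_per_second, source_tokens=SOURCE_TOKENS):
    """Estimated minutes to process the source at a steady total token rate."""
    if total_tokens_per_second <= 0:
        raise ValueError("token rate must be positive")
    return source_tokens / total_tokens_per_second / 60.0


# Total token rates taken from the Results table.
total_rates = {
    "vLLM on Intel Gaudi 2, Llama-3.3-70B-Instruct": 3039.5,
    "vllm-server on NVidia H100, Mistral-7B-v0.1-AWQ": 2150.0,
    "VertexAI, Gemini 2.0 Flash": 372.0,
}

for name, rate in total_rates.items():
    print(f"{name}: {minutes_to_process(rate):.1f} min")
```

The same calculation can be used to estimate how long a run will take once the throughput has plateaued, before the run completes.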

Test Configurations

Config ID | Flow | Source Material
TC1 | document-rag+graph-rag | NASA Challenger Report Volume 1 (1,549,890 tokens)

Procedure

How it works:

  • Start TrustGraph
  • Load the Challenger report volume 1
  • Submit with the default flow (document RAG + graph RAG)
  • Monitor the throughput in Grafana.
    • With this document, expect an early boost from the contents and preface pages, after which throughput plateaus. Leaving it for a couple of minutes lets this phase pass.
    • Some cloud model-as-a-service offerings allow a high initial burst rate and then throttle it, so again, let this phase subside. It is important to watch the pub/sub backlog chart so you can see when things have settled.
  • Start tg-show-token-rate
  • Wait for it to finish; it runs for 1 minute
  • Record the last line produced, which is an average across the period.
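
The "wait until things have settled" step in the procedure above is a judgment call made by eye on the Grafana charts. As a rough illustration only, the Python sketch below shows one way to express that check and the averaging step. It does not describe how tg-show-token-rate or the Grafana dashboards work: the function names, the window size, the 10% tolerance and the sample readings are all hypothetical.

```python
from statistics import mean


def has_settled(samples, window=6, tolerance=0.10):
    """True when the last `window` samples all lie within `tolerance`
    (a fraction) of their own mean. Window and tolerance are assumptions."""
    if len(samples) < window:
        return False  # not enough data to judge yet
    recent = samples[-window:]
    centre = mean(recent)
    return all(abs(s - centre) <= tolerance * abs(centre) for s in recent)


def rate_from_counters(start_count, end_count, elapsed_seconds):
    """Average rate over a period, from two cumulative counter readings."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (end_count - start_count) / elapsed_seconds


# Hypothetical throughput readings: an early burst, then a plateau.
readings = [900, 700, 400, 310, 300, 305, 295, 302, 298]

print(has_settled(readings[:6]))  # still includes the early burst
print(has_settled(readings))      # last six readings are on the plateau
print(rate_from_counters(1_000_000, 1_018_000, 60))  # tokens over a 60 s window
```

Only start the one-minute measurement once a check like this would pass; otherwise the recorded average includes the early burst and overstates the sustained rate.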