Benchmarks
Data ingest token rates
Why?
This benchmark measures the token rate while TrustGraph loads a standard document. TrustGraph knowledge extraction builds up a large backlog of work to process, which makes it a good load test. In practice, knowledge extraction (definitions and relationships) is the bottleneck, so this test is largely measuring:
- The LLM service itself
- How well TrustGraph can keep the LLM service loaded
- The configuration that we’re testing with
Results
Platform | GPU | Model | Config | Token Rate (tokens/s) | Time to Process |
---|---|---|---|---|---|
vLLM on Intel Gaudi 2 🏆 | Gaudi 2, 8 cards | meta-llama/Llama-3.3-70B-Instruct | TC1 | In: 1493.6 Out: 1545.8 Total: 3039.5 | 8.5 min |
vllm-server on NVIDIA | H100-SXM5-80GB (Tensordock) | TheBloke/Mistral-7B-v0.1-AWQ | TC1 | In: 304.3 Out: 1845.6 Total: 2150.0 | 12.0 min |
VertexAI | n/a | Gemini 2.0 Flash | TC1 | In: 216.2 Out: 155.8 Total: 372.0 | 69.4 min |
LMStudio | Radeon RX 7900 XTX | Gemma3 4B QAT | TC1 | In: 116.2 Out: 133.9 Total: 250.1 | 103.3 min |
Granite Ridge | 128 Xeon Gen 6 CPU | mistralai/Mistral-7B-Instruct-v0.3 | TC1 | In: 117.7 Out: 90.0 Total: 207.8 | 124.3 min |
LMStudio | Radeon RX 7900 XTX | Gemma2 9B | TC1 | In: 119.6 Out: 73.0 Total: 192.6 | 134.1 min |
Granite Ridge | 128 Xeon Gen 6 CPU | meta-llama/Llama-3.3-70B-Instruct | TC1 | In: 67.0 Out: 22.4 Total: 89.3 | 289.3 min |
Test Configurations
Config ID | Flow | Source Material |
---|---|---|
TC1 | document-rag+graph-rag | NASA Challenger Report Volume 1 (1,549,890 tokens) |
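The Time to Process figures in the results table are consistent with dividing the TC1 token count by the total token rate. The sketch below shows that arithmetic. It assumes the rate is in tokens per second, which is what makes the numbers line up with the table. The function name `estimated_minutes` is hypothetical and is not part of TrustGraph.

```python
# Token count for the TC1 source material, from the Test Configurations table.
TC1_TOKENS = 1_549_890


def estimated_minutes(total_tokens_per_second: float,
                      document_tokens: int = TC1_TOKENS) -> float:
    """Estimate the minutes needed to process a document at a steady total rate.

    Assumption: the rate is in tokens per second and stays constant
    for the whole run.
    """
    if total_tokens_per_second <= 0:
        raise ValueError("token rate must be positive")
    return document_tokens / total_tokens_per_second / 60


if __name__ == "__main__":
    # Total rates copied from three rows of the results table.
    for platform, total_rate in [
        ("vLLM on Intel Gaudi 2", 3039.5),
        ("VertexAI", 372.0),
        ("Granite Ridge, Llama-3.3-70B-Instruct", 89.3),
    ]:
        print(f"{platform}: {estimated_minutes(total_rate):.1f} min")
```

Because the estimate assumes a constant rate, it describes the settled phase of a run and ignores any early burst.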
Procedure
How it works:
- Start TrustGraph
- Load the Challenger report volume 1
- Submit with the default flow (document RAG + graph RAG)
- Monitor the throughput in Grafana.
- With this document, expect an early boost in throughput from the contents and preface pages, after which throughput plateaus. Leave it running for a couple of minutes so that this phase passes.
- Some cloud model-as-a-service offerings allow an initial burst at a high rate and then reduce the rate, so again, let this phase subside. It is important to watch the pub/sub backlog chart so you can see when things have settled.
- Start tg-show-token-rate
- Wait for it to finish; it runs for 1 minute.
- Record the last line produced, which is an average across the period.
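Deciding when throughput has settled is a judgment call made by looking at the charts. As a rough illustration of one way to make that call repeatable, the sketch below checks whether the most recent throughput samples stay within a tolerance of their own mean. The function `has_settled`, the window size, the tolerance, and the sample values are all hypothetical. The code does not query Grafana or TrustGraph, and it assumes you have collected the samples some other way.

```python
from statistics import mean
from typing import Sequence


def has_settled(samples: Sequence[float],
                window: int = 5,
                tolerance: float = 0.1) -> bool:
    """Return True if the last `window` samples are all within
    `tolerance` (as a fraction) of their own mean.

    Hypothetical helper: the window and tolerance are illustrative
    defaults, not values recommended by TrustGraph.
    """
    if window <= 0 or len(samples) < window:
        return False  # not enough data to judge
    recent = list(samples[-window:])
    centre = mean(recent)
    if centre <= 0:
        return False  # no throughput to measure yet
    return all(abs(s - centre) <= tolerance * centre for s in recent)


if __name__ == "__main__":
    # Made-up samples: an early burst followed by a plateau.
    throughput = [900, 700, 400, 300, 305, 298, 302, 300]
    print(has_settled(throughput[:5]))  # burst still inside the window
    print(has_settled(throughput))      # plateau fills the window
```

A check like this only complements watching the pub/sub backlog chart. It says nothing about whether a provider will throttle the rate later.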