Benchmarks

Data ingest token rates

Why?

This benchmark measures the token rate while TrustGraph loads a standard document. TrustGraph knowledge extraction builds up a large backlog of work to process, which makes it a good load test. In practice, knowledge extraction (definitions and relationships) is the bottleneck, so this test largely measures:

The LLM service itself
How well TrustGraph can load the LLM service
The configuration that we’re testing with

Results

Platform	GPU	Model	Config	Token Rate (tokens/s)	Time to Process
vLLM on Intel Gaudi 2 🏆	Gaudi 2, 8 cards	meta-llama/Llama-3.3-70B-Instruct	TC1	In: 1493.6 Out: 1545.8 Total: 3039.5	8.5 min
vllm-server on NVIDIA	H100-SXM5-80GB (Tensordock)	TheBloke/Mistral-7B-v0.1-AWQ	TC1	In: 304.3 Out: 1845.6 Total: 2150.0	12.0 min
VertexAI	n/a	Gemini 2.0 Flash	TC1	In: 216.2 Out: 155.8 Total: 372.0	69.4 min
LMStudio	Radeon RX 7900 XTX	Gemma3 4B QAT	TC1	In: 116.2 Out: 133.9 Total: 250.1	103.3 min
Granite Ridge	128 Xeon Gen 6 CPU	mistralai/Mistral-7B-Instruct-v0.3	TC1	In: 117.7 Out: 90.0 Total: 207.8	124.3 min
LMStudio	Radeon RX 7900 XTX	Gemma2 9B	TC1	In: 119.6 Out: 73.0 Total: 192.6	134.1 min
Granite Ridge	128 Xeon Gen 6 CPU	meta-llama/Llama-3.3-70B-Instruct	TC1	In: 67.0 Out: 22.4 Total: 89.3	289.3 min

Test Configurations

Config ID	Flow	Source Material
TC1	document-rag+graph-rag	NASA Challenger Report Volume 1 (1,549,890 tokens)
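
The Time to Process figures in the results table are consistent with dividing the source document's token count by the total token rate, assuming that rate is in tokens per second. The sketch below reproduces that arithmetic for two rows. The function name minutes_to_process is made up for illustration and is not part of TrustGraph.

```python
# Token count of the TC1 source document, from the Test Configurations table.
DOCUMENT_TOKENS = 1_549_890


def minutes_to_process(total_tokens_per_second: float,
                       document_tokens: int = DOCUMENT_TOKENS) -> float:
    """Estimate processing time in minutes.

    Assumes the total token rate is expressed in tokens per second.
    """
    if total_tokens_per_second <= 0:
        raise ValueError("token rate must be positive")
    return document_tokens / total_tokens_per_second / 60.0


# Total rates copied from the Results table.
for platform, rate in [("vLLM on Intel Gaudi 2", 3039.5), ("VertexAI", 372.0)]:
    print(f"{platform}: {minutes_to_process(rate):.1f} min")
# prints:
# vLLM on Intel Gaudi 2: 8.5 min
# VertexAI: 69.4 min
```

The same calculation matches the remaining rows to one decimal place, which suggests the time column is derived from the measured rate rather than timed end to end.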

Procedure

How it works:

Start TrustGraph
Load the Challenger report Volume 1
Submit it with the default flow (document RAG + graph RAG)
Monitor the throughput in Grafana
- With this document, expect an early boost from the contents and preface pages, after which throughput plateaus. Leaving it for a couple of minutes lets this phase pass.
- Some cloud model-as-a-service providers allow an initial burst at a high rate and then reduce the rate, so again, let this phase subside. It is important to watch the pub/sub backlog chart so you can see when things have settled.
Start tg-show-token-rate
Wait for it to finish; it runs for 1 minute
Record the last line produced, which is an average across the period
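
To illustrate what an average across the measurement period means, here is a minimal sketch that computes a mean token rate from cumulative token counts sampled over a window. It is not the implementation of tg-show-token-rate; the function name average_rate, the sample layout, and the sample values are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def average_rate(samples: Sequence[Tuple[float, float]]) -> float:
    """Mean tokens per second over a sampling window.

    samples holds (timestamp_seconds, cumulative_token_count) pairs
    ordered by time. This layout is hypothetical.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t_start, count_start), (t_end, count_end) = samples[0], samples[-1]
    elapsed = t_end - t_start
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    return (count_end - count_start) / elapsed


# Made-up samples covering a 60-second window.
window = [(0.0, 1_000.0), (30.0, 4_000.0), (60.0, 7_000.0)]
print(f"{average_rate(window):.1f} tokens/s")  # prints: 100.0 tokens/s
```

Averaging over the whole window, rather than reading a single instantaneous value, smooths out short bursts, which is why the procedure waits for throughput to settle before starting the measurement.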