Self-Hosting with OpenVINO and Intel GPU
Self-hosting TrustGraph with OpenVINO on Intel GPU accelerators
Difficulty: Advanced
Estimated time: 2 - 4 hr
Prerequisites:
- Intel GPU with sufficient VRAM (e.g., Intel Arc Pro B60 24 GB)
- Intel GPU drivers installed
- Python 3.11+ for CLI tools
- Basic command-line familiarity
Deploy TrustGraph with OpenVINO running on Intel GPU hardware for high-performance local inference.
Why Intel GPU?
Intel’s edge GPUs may not match the raw specifications of NVIDIA and AMD offerings, but they provide a compelling option for Edge AI deployments. With a lower price point and strong compute performance per watt, Intel Arc GPUs are well suited to scenarios where power efficiency and cost-effectiveness are priorities. Intel’s OpenVINO toolkit rounds this out with a polished model-hosting experience.
What is OpenVINO?
OpenVINO (Open Visual Inference and Neural Network Optimization) is an open-source toolkit developed by Intel for optimizing and deploying deep learning models. It supports inference on Intel CPUs, integrated and discrete GPUs, and NPU accelerators.
OpenVINO provides flexible model support, allowing you to use models trained with popular frameworks such as PyTorch, TensorFlow, and ONNX. It can directly integrate models from Hugging Face using Optimum Intel. The toolkit accelerates AI inference with lower latency and higher throughput while maintaining accuracy and optimizing hardware utilization.
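No manual conversion is needed for this guide, because the container pulls a pre-converted model, but as an illustration of the Optimum Intel workflow, a Hugging Face model can be exported to OpenVINO format from the command line. This is a minimal sketch: the model ID and output directory are examples, and it assumes the optimum package is installed with OpenVINO support.
pip install "optimum[openvino]"
optimum-cli export openvino --model mistralai/Mistral-Nemo-Instruct-2407 --weight-format int4 mistral-nemo-int4-ov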
The model: Mistral Nemo 12B
Mistral Nemo is a 12 billion parameter large language model developed by Mistral AI. It offers strong performance across a range of tasks while remaining small enough to run on consumer hardware. In this guide, we will use the 4-bit quantized version of the model, which significantly reduces memory requirements while maintaining good quality output.
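As a rough sanity check (assuming about 0.5 bytes per parameter for 4-bit weights, plus additional memory for the KV cache and runtime buffers), the quantized weights occupy on the order of 12B × 0.5 bytes ≈ 6 GB, which fits comfortably within a 24 GB card such as the Arc Pro B60.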
Roadmap
In this guide we will:
- Pull the OpenVINO model serving image
- Get a Hugging Face token (if you don’t already have one)
- Run some CLI commands to verify the GPU is visible
- Launch the OpenVINO container and check it’s running
- Build and deploy TrustGraph
Deploying
Intel GPU drivers
Intel GPUs require the standard driver package to be installed. Intel provides Edge Developer Kit Reference Scripts to assist with setup. Note that these are reference setups; depending on how your host was provisioned, similar drivers may already be installed.
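As a quick, distribution-agnostic check that a kernel driver (i915 or xe) is bound to the card, you can inspect the PCI device:
lspci -k | grep -EA3 'VGA|Display'
The output for the Intel GPU should include a line such as "Kernel driver in use: i915" (or "xe" on newer kernels).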
Hugging Face token
You will need a Hugging Face token to access the model, which will be downloaded from Hugging Face when the container starts. If you don’t already have an account, sign up at huggingface.co. Once logged in, navigate to your Access Tokens settings and create a new token. Keep this token safe as you will need it later.
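If you want to confirm the token is valid before using it, one option is to query the Hugging Face whoami endpoint (replace the placeholder with your actual token):
curl -s -H "Authorization: Bearer <your-huggingface-token>" https://huggingface.co/api/whoami-v2
A valid token returns a JSON description of your account; an invalid one returns an error.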
Pull the OpenVINO model server image
Pull the OpenVINO model server image with GPU support:
Docker:
docker pull docker.io/openvino/model_server:latest-gpu
Podman:
podman pull docker.io/openvino/model_server:latest-gpu
Verify GPU is accessible
Before running the model server, verify that your Intel GPU is visible to the system.
First, check that the GPU render devices are present:
ls -l /dev/dri/render*
You should see one or more render device files, for example:
crw-rw---- 1 root render 226, 128 Jan 29 13:18 /dev/dri/renderD128
You can also use xpu-smi to check GPU status and statistics:
xpu-smi discovery
This lists available Intel GPUs. You should see output similar to:
+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information |
+-----------+--------------------------------------------------------------------------------------+
| 0 | Device Name: Intel(R) Arc(TM) Pro B60 Graphics |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-002c-0000-0000e2118086 |
| | PCI BDF Address: 0000:2c:00.0 |
| | DRM Device: /dev/dri/card1 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
To view statistics for device 0:
xpu-smi stats -d 0
For example, to check memory usage:
xpu-smi stats -d 0 | grep -E 'Memory'
| GPU Memory Temperature (C) | 32 |
| GPU Memory Read (kB/s) | 6420 |
| GPU Memory Write (kB/s) | 1136 |
| GPU Memory Bandwidth (%) | 0 |
| GPU Memory Used (MiB) | 230 |
| GPU Memory Util (%) | 1 |
If these commands are not available, you may need to install Intel's xpu-smi tool.
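On Ubuntu, assuming Intel's graphics package repository is already configured, xpu-smi is typically available as a package of the same name (package naming may differ across distributions and releases):
sudo apt install xpu-smi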
Run the OpenVINO container
First, set your Hugging Face token as an environment variable:
export HF_TOKEN=your-huggingface-token
Then launch the OpenVINO model server container:
Docker:
docker run --user $(id -u):$(id -g) -d \
--device /dev/dri \
--group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \
--rm -p 7000:7000 \
-v $(pwd)/models:/models:rw \
-e HF_TOKEN=$HF_TOKEN \
docker.io/openvino/model_server:latest-gpu \
--source_model llmware/mistral-nemo-instruct-2407-ov \
--model_repository_path models \
--task text_generation \
--rest_port 7000 \
--target_device GPU \
--cache_size 2
Podman:
podman run --user $(id -u):$(id -g) -d \
--device /dev/dri \
--group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \
--rm -p 7000:7000 \
-v $(pwd)/models:/models:rw \
-e HF_TOKEN=$HF_TOKEN \
docker.io/openvino/model_server:latest-gpu \
--source_model llmware/mistral-nemo-instruct-2407-ov \
--model_repository_path models \
--task text_generation \
--rest_port 7000 \
--target_device GPU \
--cache_size 2
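A brief note on the key options: --device /dev/dri exposes the GPU device nodes to the container, --group-add adds the host's render group so the non-root container user can access them, the volume mount stores downloaded model files under ./models on the host so they are reused on restart, and --target_device GPU directs inference to the GPU rather than the CPU. The --cache_size value controls, in GB, how much memory the server reserves for its generation cache; see the OpenVINO Model Server documentation before changing it.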
Verify OpenVINO is running
The first time you run the container, it will download the model from Hugging Face, which can take several minutes depending on your connection speed.
You can check the container logs to monitor progress:
Docker:
docker logs -f <container-id>
Podman:
podman logs -f <container-id>
Once the server is ready, you can verify it is responding by querying the API:
curl http://localhost:7000/v3/models
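You can also send a quick test request to the OpenAI-compatible chat completions endpoint. The model name below is assumed to match the source model identifier; if the models listing above reports a different name, use that instead:
curl http://localhost:7000/v3/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llmware/mistral-nemo-instruct-2407-ov", "messages": [{"role": "user", "content": "Say hello"}], "max_tokens": 50}'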
Deploy TrustGraph
With the OpenVINO model server running, you can now deploy TrustGraph by following the Docker/Podman Compose deployment guide.
When configuring TrustGraph:
- Select the OpenAI integration for the LLM
- Before launching, set the OpenAI base URL to point to your OpenVINO server:
export OPENAI_BASE_URL=http://localhost:7000/v3
Then continue with the rest of the compose deployment guide to launch TrustGraph.
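As a minimal sketch of that final sequence, assuming the compose file from the deployment guide has been saved as docker-compose.yaml in the current directory:
export OPENAI_BASE_URL=http://localhost:7000/v3
docker compose -f docker-compose.yaml up -d
Swap in podman compose (or podman-compose) if you are using Podman instead of Docker.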