Self-Hosting with OpenVINO and Intel GPU
Self-hosting TrustGraph with OpenVINO on Intel GPU accelerators
Difficulty: Advanced
Estimated time: 2 - 4 hr
Prerequisites:
- Intel GPU with sufficient VRAM (e.g., Intel Arc Pro B60 24 GB)
- Intel GPU drivers installed
- Python 3.11+ for CLI tools
- Basic command-line familiarity
Deploy TrustGraph with OpenVINO running on Intel GPU hardware for high-performance local inference.
Why Intel GPU?
Intel’s edge GPUs may not match the raw specifications of NVIDIA and AMD offerings, but they provide a compelling option for Edge AI deployments. With a lower price point and strong compute performance per watt, Intel Arc GPUs are well suited to scenarios where power efficiency and cost-effectiveness are priorities. Intel’s OpenVINO toolkit rounds this out with a polished model-hosting experience.
What is OpenVINO?
OpenVINO (Open Visual Inference and Neural Network Optimization) is an open-source toolkit developed by Intel for optimizing and deploying deep learning models. It supports inference on Intel CPUs, integrated and discrete GPUs, and NPU accelerators.
OpenVINO provides flexible model support, allowing you to use models trained with popular frameworks such as PyTorch, TensorFlow, and ONNX. It can directly integrate models from Hugging Face using Optimum Intel. The toolkit accelerates AI inference with lower latency and higher throughput while maintaining accuracy and optimizing hardware utilization.
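No manual conversion is needed for this guide, because the container pulls a pre-converted model, but as an illustration of the Optimum Intel workflow, a Hugging Face model can be exported to OpenVINO format from the command line. This is a minimal sketch: the model ID and output directory are examples, and it assumes the optimum package is installed with OpenVINO support.
pip install "optimum[openvino]"
optimum-cli export openvino --model mistralai/Mistral-Nemo-Instruct-2407 --weight-format int4 mistral-nemo-int4-ov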
The model: Mistral Nemo 12B
Mistral Nemo is a 12 billion parameter large language model developed by Mistral AI. It offers strong performance across a range of tasks while remaining small enough to run on consumer hardware. In this guide, we will use the 4-bit quantized version of the model, which significantly reduces memory requirements while maintaining good quality output.
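As a rough sanity check (assuming about 0.5 bytes per parameter for 4-bit weights, plus additional memory for the KV cache and runtime buffers), the quantized weights occupy on the order of 12B × 0.5 bytes ≈ 6 GB, which fits comfortably within a 24 GB card such as the Arc Pro B60.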
Roadmap
In this guide we will:
- Pull the OpenVINO model serving image
- Get a Hugging Face token (if you don’t already have one)
- Run some CLI commands to verify the GPU is visible
- Launch the OpenVINO container and check it’s running
- Build and deploy TrustGraph
Deploying
Intel GPU drivers
Intel GPUs require the standard driver package to be installed. Intel provides Edge Developer Kit Reference Scripts to assist with setup. Note that these are reference setups; depending on how your host was provisioned, similar drivers may already be installed.
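As a quick, distribution-agnostic check that a kernel driver (i915 or xe) is bound to the card, you can inspect the PCI device:
lspci -k | grep -EA3 'VGA|Display'
The output for the Intel GPU should include a line such as "Kernel driver in use: i915" (or "xe" on newer kernels).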
Hugging Face token
You will need a Hugging Face token to access the model, which will be downloaded from Hugging Face when the container starts. If you don’t already have an account, sign up at huggingface.co. Once logged in, navigate to your Access Tokens settings and create a new token. Keep this token safe as you will need it later.
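If you want to confirm the token is valid before using it, one option is to query the Hugging Face whoami endpoint (replace the placeholder with your actual token):
curl -s -H "Authorization: Bearer <your-huggingface-token>" https://huggingface.co/api/whoami-v2
A valid token returns a JSON description of your account; an invalid one returns an error.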
Pull the OpenVINO model server image
Pull the OpenVINO model server image with GPU support:
Docker:
docker pull docker.io/openvino/model_server:latest-gpu
Podman:
podman pull docker.io/openvino/model_server:latest-gpu
Verify GPU is accessible
Before running the model server, verify that your Intel GPU is visible to the system.
First, check that the GPU render devices are present:
ls -l /dev/dri/render*
You should see one or more render device files, for example:
crw-rw---- 1 root render 226, 128 Jan 29 13:18 /dev/dri/renderD128
You can also use xpu-smi to check GPU status and statistics:
xpu-smi discovery
This lists available Intel GPUs. You should see output similar to:
+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information |
+-----------+--------------------------------------------------------------------------------------+
| 0 | Device Name: Intel(R) Arc(TM) Pro B60 Graphics |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-002c-0000-0000e2118086 |
| | PCI BDF Address: 0000:2c:00.0 |
| | DRM Device: /dev/dri/card1 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
To view statistics for device 0:
xpu-smi stats -d 0
For example, to check memory usage:
xpu-smi stats -d 0 | grep -E 'Memory'
| GPU Memory Temperature (C) | 32 |
| GPU Memory Read (kB/s) | 6420 |
| GPU Memory Write (kB/s) | 1136 |
| GPU Memory Bandwidth (%) | 0 |
| GPU Memory Used (MiB) | 230 |
| GPU Memory Util (%) | 1 |
If these commands are not available, you may need to install Intel's xpu-smi tool.
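On Ubuntu, assuming Intel's graphics package repository is already configured, xpu-smi is typically available as a package of the same name (package naming may differ across distributions and releases):
sudo apt install xpu-smi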
Run the OpenVINO container
First, set your Hugging Face token as an environment variable:
export HF_TOKEN=your-huggingface-token
Then launch the OpenVINO model server container:
Docker:
docker run --user $(id -u):$(id -g) -d \
--device /dev/dri \
--group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \
--rm -p 7000:7000 \
-v $(pwd)/models:/models:rw \
-e HF_TOKEN=$HF_TOKEN \
docker.io/openvino/model_server:latest-gpu \
--source_model llmware/mistral-nemo-instruct-2407-ov \
--model_repository_path models \
--task text_generation \
--rest_port 7000 \
--target_device GPU \
--cache_size 2
Podman:
podman run --user $(id -u):$(id -g) -d \
--device /dev/dri \
--group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) \
--rm -p 7000:7000 \
-v $(pwd)/models:/models:rw \
-e HF_TOKEN=$HF_TOKEN \
docker.io/openvino/model_server:latest-gpu \
--source_model llmware/mistral-nemo-instruct-2407-ov \
--model_repository_path models \
--task text_generation \
--rest_port 7000 \
--target_device GPU \
--cache_size 2
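A brief note on the key options: --device /dev/dri exposes the GPU device nodes to the container, --group-add adds the host's render group so the non-root container user can access them, the volume mount stores downloaded model files under ./models on the host so they are reused on restart, and --target_device GPU directs inference to the GPU rather than the CPU. The --cache_size value controls, in GB, how much memory the server reserves for its generation cache; see the OpenVINO Model Server documentation before changing it.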
Verify OpenVINO is running
The first time you run the container, it will download the model from Hugging Face, which can take several minutes depending on your connection speed.
You can check the container logs to monitor progress:
Docker:
docker logs -f <container-id>
Podman:
podman logs -f <container-id>
Once the server is ready, you can verify it is responding by querying the API:
curl http://localhost:7000/v3/models
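You can also send a quick test request to the OpenAI-compatible chat completions endpoint. The model name below is assumed to match the source model identifier; if the models listing above reports a different name, use that instead:
curl http://localhost:7000/v3/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llmware/mistral-nemo-instruct-2407-ov", "messages": [{"role": "user", "content": "Say hello"}], "max_tokens": 50}'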
Deploy TrustGraph
With the OpenVINO model server running, you can now deploy TrustGraph by following the Docker/Podman Compose deployment guide.
When configuring TrustGraph:
- Select the OpenAI integration for the LLM
- Before launching, set the OpenAI base URL to point to your OpenVINO server:
export OPENAI_BASE_URL=http://localhost:7000/v3
Then continue with the rest of the compose deployment guide to launch TrustGraph.
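As a minimal sketch of that final sequence, assuming the compose file from the deployment guide has been saved as docker-compose.yaml in the current directory:
export OPENAI_BASE_URL=http://localhost:7000/v3
docker compose -f docker-compose.yaml up -d
Swap in podman compose (or podman-compose) if you are using Podman instead of Docker.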