WebGPU Execution Provider
The WebGPU Execution Provider enables hardware-accelerated inference of ONNX models on a wide range of GPUs by targeting the WebGPU API. On native platforms it uses Dawn (Google’s WebGPU implementation), which in turn dispatches to D3D12, Vulkan, or Metal, depending on the platform. The same execution provider can also power the webgpu backend of ONNX Runtime Web inside the browser.
Compared to other GPU execution providers, WebGPU EP aims to be cross-vendor and cross-platform: a single build can target any GPU supported by the platform’s native graphics API, without requiring a vendor-specific SDK on the end user’s machine.
If you are using ONNX Runtime Web (JavaScript/TypeScript in the browser), see the dedicated tutorial Using the WebGPU Execution Provider for browser-specific topics such as
onnxruntime-web/webgpuimports,Tensor.fromGpuBuffer, andenv.webgpuflags. The rest of this page focuses on the native (C/C++/Python/C#) WebGPU EP.
Contents
- Install
- Requirements
- Build from source
- Usage
- Configuration options
- Graph capture
- Profiling and debugging
- Notes
- Additional resources
Install
The native WebGPU EP is distributed as a plugin EP: a shared library that is registered at runtime alongside a compatible core ONNX Runtime install.
- Python:
pip install onnxruntime onnxruntime-ep-webgpu - .NET: add a reference to
Microsoft.ML.OnnxRuntime.EP.WebGputogether with your existingMicrosoft.ML.OnnxRuntimepackage. - C/C++: consume the plugin EP shared library (
onnxruntime_providers_webgpu.{dll,so,dylib}) directly. It ships in the runtime files of the NuGet package above and can also be built from source.
For ONNX Runtime Web (browser), no separate install is required beyond the onnxruntime-web package — see the JavaScript quickstart.
Requirements
The WebGPU EP requires a GPU and driver that support one of Dawn’s native backends:
| Platform | Backend used by Dawn |
|---|---|
| Windows | Direct3D 12, Vulkan |
| Linux | Vulkan |
| Android | Vulkan |
| macOS | Metal |
| iOS | Metal |
In the browser, the underlying browser must implement WebGPU; see the WebGPU support matrix for ORT Web specifics and webgpu.io/status for browser availability.
Build from source
Use tools/ci_build/build.py with the --use_webgpu flag. See the BUILD page for the general build workflow.
# Build WebGPU plugin EP
python tools/ci_build/build.py --build_dir build/webgpu_plugin_ep --config RelWithDebInfo \
--build_shared_lib --use_webgpu shared_lib
Build flags
| Flag | Description |
|---|---|
--use_webgpu [static_lib\|shared_lib] | Enable the WebGPU EP. static_lib (the default if no value is given) builds the WebGPU EP into the main onnxruntime library. shared_lib builds the WebGPU EP as a separate plugin EP library (onnxruntime_USE_EP_API_ADAPTERS=ON) and is what produces the onnxruntime_providers_webgpu.{dll,so,dylib} shipped in the onnxruntime-ep-webgpu / Microsoft.ML.OnnxRuntime.EP.WebGpu packages. shared_lib is not supported for WebAssembly builds. Sets the CMake flag onnxruntime_USE_WEBGPU=ON. |
--use_external_dawn | Link against an externally-provided Dawn instead of building Dawn from source. Requires --use_webgpu. Sets onnxruntime_USE_EXTERNAL_DAWN=ON. |
--enable_pix_capture | Windows only. Build with PIX GPU debugger support. Requires --use_webgpu. |
Usage
The WebGPU EP is added to a session via the plugin EP APIs: the shared library is registered at runtime with register_execution_provider_library / RegisterExecutionProviderLibrary, and one or more OrtEpDevice entries are then attached to a SessionOptions. The onnxruntime-ep-webgpu (Python) and Microsoft.ML.OnnxRuntime.EP.WebGpu (.NET) packages bundle the shared library and provide helpers that return its path and the EP name to use.
See Using a Plugin Execution Provider Library for the general plugin EP workflow.
Python
import onnxruntime as ort
import onnxruntime_ep_webgpu as webgpu_ep
# Register the plugin EP library with ONNX Runtime
ort.register_execution_provider_library("webgpu_ep_registration", webgpu_ep.get_library_path())
# Discover WebGPU devices
webgpu_devices = [d for d in ort.get_ep_devices() if d.ep_name == webgpu_ep.get_ep_name()]
# Create a session using the WebGPU EP
sess_options = ort.SessionOptions()
sess_options.add_provider_for_devices(webgpu_devices, {
"preferredLayout": "NHWC",
"enableGraphCapture": "1",
})
session = ort.InferenceSession("model.onnx", sess_options=sess_options)
C# / .NET
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.EP.WebGpu;
var env = OrtEnv.Instance();
env.RegisterExecutionProviderLibrary("webgpu_ep_registration", WebGpuEp.GetLibraryPath());
OrtEpDevice? webGpuDevice = null;
foreach (var d in env.GetEpDevices())
{
if (d.EpName == WebGpuEp.GetEpName())
{
webGpuDevice = d;
break;
}
}
using var sessionOptions = new SessionOptions();
sessionOptions.AppendExecutionProvider(env, new[] { webGpuDevice },
new Dictionary<string, string>
{
["preferredLayout"] = "NHWC",
["enableGraphCapture"] = "1",
});
using var session = new InferenceSession("model.onnx", sessionOptions);
C++
The C++ pattern is the generic plugin EP idiom — the host application is responsible for locating the onnxruntime_providers_webgpu.{dll,so,dylib} shared library (from the NuGet package’s runtime files, a manual build, etc.):
#include "onnxruntime_cxx_api.h"
Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "webgpu_sample");
env.RegisterExecutionProviderLibrary("webgpu_ep_registration",
ORT_TSTR("onnxruntime_providers_webgpu.dll"));
std::vector<Ort::ConstEpDevice> ep_devices = env.GetEpDevices();
std::vector<Ort::ConstEpDevice> selected_ep_devices{};
for (auto ep_device : ep_devices) {
if (std::strcmp(ep_device.EpName(), "WebGpuExecutionProvider") == 0) {
selected_ep_devices.push_back(ep_device);
break;
}
}
Ort::KeyValuePairs ep_options;
ep_options.Add("preferredLayout", "NHWC");
ep_options.Add("enableGraphCapture", "1");
Ort::SessionOptions session_options;
session_options.AppendExecutionProvider_V2(env, selected_ep_devices, ep_options);
Ort::Session session(env, ORT_TSTR("model.onnx"), session_options);
JavaScript / TypeScript (ONNX Runtime Web)
For browser usage, import onnxruntime-web/webgpu and list 'webgpu' in executionProviders. See Using the WebGPU Execution Provider for the complete guide, including IO binding with GPU buffers.
Configuration options
Provider options are read from the session’s config entries with the key prefix ep.webgpuexecutionprovider.*. Use the short names below (without the prefix) when passing the provider options to the plugin EP APIs shown in Usage; the prefix is added internally.
General options
| Option | Allowed values | Description |
|---|---|---|
preferredLayout | NCHW, NHWC | Preferred data layout for layout-sensitive kernels. |
enableGraphCapture | 0, 1 | Enable graph capture for models with static shapes that run entirely on WebGPU. |
enableInt64 | 0, 1 | Enable native int64 support in WGSL kernels (requires a device that exposes the matching feature). Forced on when enableGraphCapture is 1, regardless of this setting. |
multiRotaryCacheConcatOffset | non-negative integer | Offset used by multi-rotary cache concatenation kernels (advanced tuning option). |
forceCpuNodeNames | newline-separated list | Force the listed node names to run on the CPU EP fallback instead of WebGPU. Each line is one node name; empty lines are ignored. |
enablePIXCapture | 0, 1 | Enable per-run PIX captures. Only meaningful in a Windows build configured with --enable_pix_capture. |
WebGPU context options
These options configure the underlying Dawn/WebGPU instance, adapter, and device shared across sessions with the same deviceId.
| Option | Allowed values | Description |
|---|---|---|
deviceId | integer | Selects a WebGPU context to use. Sessions sharing the same deviceId share an underlying device and buffer cache. Defaults to 0. |
powerPreference | high-performance, low-power | Hint passed to requestAdapter. Defaults to the implementation default. |
webgpuInstance | pointer (as integer) | Bring-your-own WGPUInstance. Encoded as the decimal representation of the pointer value. Use when sharing a Dawn instance with the host application. |
webgpuDevice | pointer (as integer) | Bring-your-own WGPUDevice. Encoded as the decimal representation of the pointer value. Use together with webgpuInstance to share an existing device. |
preserveDevice | 0, 1 | When 1, keep the underlying WebGPU device alive after the last session that owns the context is released. Useful when sessions are created and destroyed repeatedly. |
validationMode | disabled, wgpuOnly, basic, full | Controls WGSL/runtime validation. disabled skips ONNX-side validation, wgpuOnly relies on Dawn’s validation, basic enables lightweight ONNX checks, full enables all available checks. Defaults to a build-dependent value. |
maxStorageBufferBindingSize | integer (bytes) | Override the requested maxStorageBufferBindingSize device limit. Cannot exceed the adapter’s reported limit. |
Pointer-valued options (
webgpuInstance,webgpuDevice) are parsed as base-10 integers — pass the decimal representation of the pointer value. Hex literals (e.g.0x...) are not accepted.
Buffer cache modes
The EP maintains four pooled-buffer caches. Each accepts the same set of modes:
| Option | Allowed values |
|---|---|
storageBufferCacheMode | disabled, lazyRelease, simple, bucket |
uniformBufferCacheMode | disabled, lazyRelease, simple, bucket |
queryResolveBufferCacheMode | disabled, lazyRelease, simple, bucket |
defaultBufferCacheMode | disabled, lazyRelease, simple, bucket |
Modes, in increasing order of caching aggressiveness:
disabled— buffers are destroyed immediately on release; lowest memory footprint, highest allocation overhead.lazyRelease— buffers are released at the end of the next run.simple— single free-list keyed on exact size.bucket— buffers are bucketed by power-of-two size for fast reuse with bounded waste. Recommended for inference workloads with stable shapes.
Graph capture
When a model has fully static shapes and all kernels run on WebGPU, setting enableGraphCapture to 1 records the WebGPU command sequence during an initial run and replays the recorded commands on subsequent runs, significantly reducing per-run CPU overhead.
If any kernel falls back to CPU, or any input shape changes between runs, graph capture may fail or fall back to regular execution. In that case, leave enableGraphCapture unset or set it to 0.
The same feature is exposed in ORT Web via the enableGraphCapture session option — see enableGraphCapture.
Profiling and debugging
- Generic ORT profiling. ORT’s built-in profiler (
SessionOptions::EnableProfiling) works with the WebGPU EP and produces per-op timing. - Native PIX capture (Windows). Build with
--use_webgpu --enable_pix_capture, attach PIX to the host process, and setenablePIXCaptureto1on the session to capture WebGPU/D3D12 work for individual runs. - WebGPU validation. Tune
validationModeto surface device-side issues during development; turn it down todisabledfor production benchmarking to remove validation overhead. - Browser profiling. For ORT Web, see WebGPU profiling.
Notes
WebGPU EP vs. JSEP
ONNX Runtime Web historically powered its webgpu backend through the JavaScript Execution Provider (JSEP), which is enabled at build time with --use_jsep. The native WebGPU EP (--use_webgpu) is a separate, C++-side implementation built on Dawn. Today both flags can be enabled in the same build (a transitional state); a future change is expected to make them mutually exclusive. Use --use_webgpu for native and WebAssembly+JSEP builds for the browser path that ORT Web currently ships.