How we score your machine.
Everything below runs in your browser. Nothing about your hardware is sent to a server. We modeled the math after the openly published approach behind canirun.ai, then adapted it for diffusion workloads, where the bottleneck is VRAM + memory bandwidth instead of KV-cache size.
1 · Hardware detection
We use three browser APIs to fingerprint your machine, and never transmit the result anywhere.
// WebGL: works everywhere, sometimes obfuscated
const canvas = document.createElement("canvas");
const gl = canvas.getContext("webgl2");
// getExtension() returns null when the browser hides the renderer string
const dbg = gl && gl.getExtension("WEBGL_debug_renderer_info");
const renderer = dbg ? gl.getParameter(dbg.UNMASKED_RENDERER_WEBGL) : null;
// → "ANGLE (NVIDIA, NVIDIA GeForce RTX 4090 Direct3D11 vs_5_0 ps_5_0...)"
// WebGPU: richer, when the browser supports it
const adapter = navigator.gpu ? await navigator.gpu.requestAdapter() : null;
const info = adapter ? adapter.info : null;
// → { vendor: "nvidia", architecture: "ada-lovelace", device: "..." }
// Navigator: RAM (rounded), CPU cores
const ramGB = navigator.deviceMemory;
const cores = navigator.hardwareConcurrency;
Renderer strings are matched against a database of 92 GPUs (NVIDIA RTX 50/40/30/20/GTX 10, AMD RDNA 2/3/4, Intel Arc, and 15 Apple silicon variants). Each entry has hand-curated VRAM and memory bandwidth from official datasheets.
Caveat: Safari and several privacy extensions strip the renderer string. If we can’t identify your GPU, you can pick it manually — that’s why the override dropdowns exist.
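A minimal sketch of what the renderer-string lookup could look like. The two entries, the `GPU_DB` name, the regex patterns, and the `matchGpu` helper are illustrative assumptions, not the actual 92-entry database. A `null` result is the case where the manual override dropdown takes over.

```javascript
// Hypothetical two-entry database; field names are assumptions for illustration.
const GPU_DB = [
  { name: "NVIDIA GeForce RTX 4090", pattern: /rtx\s*4090/i, vramGB: 24, bandwidthGBs: 1008 },
  // Negative lookahead keeps "RTX 3060 Ti" from matching the plain 3060 entry.
  { name: "NVIDIA GeForce RTX 3060", pattern: /rtx\s*3060(?!\s*ti)/i, vramGB: 12, bandwidthGBs: 360 },
];

// Returns the first matching entry, or null when the string is missing
// (stripped by the browser) or matches nothing in the database.
function matchGpu(renderer) {
  if (!renderer) return null;
  return GPU_DB.find((gpu) => gpu.pattern.test(renderer)) ?? null;
}
```

Matching on a short model token rather than the whole string is one way to tolerate the ANGLE wrapper text, which varies by browser and driver.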
2 · VRAM math
For every model + quantization combination, we estimate the actual VRAM the model takes during inference:
The 0.5 GB constant covers CUDA/Metal context plus the inference engine itself; the 10% multiplier covers the KV cache, intermediate activations, and a safety buffer for resolution and batch growth. For diffusion models we add it on top of the base weight VRAM because the U-Net / DiT works on full-resolution feature maps.
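One plausible reading of the description above, as a sketch: the 10% multiplier is applied to the weight size and the 0.5 GB constant is added afterwards. The order of operations, the constant names, and the `estimateVramGB` helper are assumptions.

```javascript
// Assumed constants, taken from the prose: 0.5 GB fixed overhead, 10% on top of weights.
const OVERHEAD_GB = 0.5;
const ACTIVATION_MULTIPLIER = 1.10;

// weightsGB: size of the model weights at the chosen quantization, in GB.
function estimateVramGB(weightsGB) {
  return weightsGB * ACTIVATION_MULTIPLIER + OVERHEAD_GB;
}
```

Under this reading, a model with 10 GB of weights would be estimated at roughly 11.5 GB during inference.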
We track three quantization tiers per model:
- FP16: Native precision, max quality, max VRAM.
- FP8: ~½ the VRAM, near-identical quality on most models.
- Q4: GGUF 4-bit. ~¼ the VRAM. Slight softness/text-rendering penalty.
3 · Fit classification
The best quant for your GPU is the largest one that still fits. “Fits” depends on whether you’re on a discrete GPU or Apple unified memory.
function fits(vramUsed, gpu) {
  // Apple Silicon shares memory with the OS, so leave 25% headroom
  const cap = gpu.kind === "apple-silicon"
    ? gpu.unifiedRAM * 0.75
    : gpu.vram;
  const ratio = vramUsed / cap;
  if (gpu.kind === "apple-silicon") {
    return ratio <= 0.70 ? "fits"
      : ratio <= 1.00 ? "tight"
      : "no-fit";
  }
  return ratio <= 0.85 ? "fits"
    : ratio <= 1.10 ? "tight"
    : "no-fit";
}
Discrete GPUs are allowed to run closer to capacity (thresholds of 0.85 and 1.10, against 0.70 and 1.00 on Apple) because dedicated VRAM has no competition from the OS. Apple chips need more headroom because the operating system, browser, and apps share the same pool.
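A sketch of how "the largest quant that still fits" could be selected using `fits()`. The `pickBestQuant` helper, the `QUANT_ORDER` list, and the choice to prefer a clean "fits" over a larger "tight" result are assumptions. The text does not say whether a tight fit counts as fitting.

```javascript
// fits() is reproduced from the section above so the example is self-contained.
function fits(vramUsed, gpu) {
  const cap = gpu.kind === "apple-silicon" ? gpu.unifiedRAM * 0.75 : gpu.vram;
  const ratio = vramUsed / cap;
  if (gpu.kind === "apple-silicon") {
    return ratio <= 0.70 ? "fits" : ratio <= 1.00 ? "tight" : "no-fit";
  }
  return ratio <= 0.85 ? "fits" : ratio <= 1.10 ? "tight" : "no-fit";
}

// Ordered from largest to smallest VRAM footprint.
const QUANT_ORDER = ["FP16", "FP8", "Q4"];

// vramByQuant: estimated inference VRAM in GB per tier, e.g. { FP16: 26.9, FP8: 13.7 }.
// Returns the largest tier classified "fits"; otherwise the largest "tight" tier;
// otherwise { quant: null, fit: "no-fit" }.
function pickBestQuant(vramByQuant, gpu) {
  let tightFallback = null;
  for (const quant of QUANT_ORDER) {
    const vram = vramByQuant[quant];
    if (vram === undefined) continue; // tier not published for this model
    const fit = fits(vram, gpu);
    if (fit === "fits") return { quant, fit };
    if (fit === "tight" && !tightFallback) tightFallback = { quant, fit };
  }
  return tightFallback ?? { quant: null, fit: "no-fit" };
}
```

For example, on a 24 GB discrete card with hypothetical footprints of 26.9 GB (FP16) and 13.7 GB (FP8), FP16 lands above the 1.10 ratio and FP8 is returned as a clean fit.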
4 · Scoring (0–100)
Every model gets a composite 0–100 score against your hardware. The weights are:
Final score = weighted composite, ×0.65 if “tight fit”.
Tight-fit models get a 35% penalty because in practice they OOM at higher resolutions, longer videos, or with control adapters attached. Models that don’t fit at all are capped at 18.
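The two fit-based adjustments can be sketched as follows. The base composite is taken as an input because its weights are not listed here. The `adjustScore` helper, the constant names, the clamping, and the rounding step are assumptions.

```javascript
const TIGHT_MULTIPLIER = 0.65; // the 35% tight-fit penalty
const NO_FIT_CAP = 18;         // ceiling for models that do not fit

// baseScore: the weighted composite before fit adjustments (0–100).
// fit: "fits" | "tight" | "no-fit", as returned by fits().
function adjustScore(baseScore, fit) {
  const clamped = Math.min(100, Math.max(0, baseScore));
  if (fit === "tight") return Math.round(clamped * TIGHT_MULTIPLIER);
  if (fit === "no-fit") return Math.min(clamped, NO_FIT_CAP);
  return clamped;
}
```

Under these assumptions a base score of 80 becomes 52 for a tight fit, and any no-fit model ends at 18 or below.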
5 · Speed estimation
Diffusion is bandwidth-bound: each step reads the full model weights from VRAM. So:
Efficiency captures driver and runtime overhead. We track it as a 2-D table keyed by (chip, runtime):
- NVIDIA discrete · ComfyUI: 0.70
- Apple Silicon · MPS/MLX: 0.65
- AMD discrete · ROCm/DirectML: 0.55
- Intel Arc · oneAPI: 0.45
- Apple A-series · native iOS app (Core ML / MLX): 0.55
- Apple A-series · mobile Safari WebGPU: 0.30
- Snapdragon Adreno · native Android app (QNN): 0.40
- Snapdragon · mobile Chrome WebGPU: 0.20
- MediaTek / Tensor · native app: 0.30
- MediaTek / Tensor · mobile browser: 0.15
These estimates don’t replicate exact benchmark throughput; they’re a relative-comparison proxy. Trust the ranking, not the absolute numbers.
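Taking "each step reads the full model weights from VRAM" literally, one way to sketch the estimate is steps per second ≈ (memory bandwidth × efficiency) ÷ weight size. This formula, the `estimateStepsPerSecond` helper, and the key names in the lookup are assumptions. Only the efficiency values come from the list above, and only the desktop rows are included.

```javascript
// Desktop rows of the efficiency table; key names are hypothetical.
const EFFICIENCY = {
  "nvidia-discrete": 0.70,
  "apple-silicon": 0.65,
  "amd-discrete": 0.55,
  "intel-arc": 0.45,
};

// bandwidthGBs: memory bandwidth in GB/s; weightsGB: weight size at the chosen quant.
// Returns an estimated steps-per-second figure, or null for unknown chips or bad input.
function estimateStepsPerSecond(bandwidthGBs, weightsGB, chipKey) {
  const efficiency = EFFICIENCY[chipKey];
  if (efficiency === undefined || weightsGB <= 0) return null;
  return (bandwidthGBs * efficiency) / weightsGB;
}
```

Because the same formula is applied to every model, the output is useful mainly for ordering models on one machine.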
6 · Mobile assumptions
Phones and tablets aren’t laptops. Three things change in the math when the active chip is an A-series, Snapdragon, MediaTek, or Tensor:
- Usable VRAM shrinks. Native apps get ~60% of the chip’s representative unified memory (the OS reserves more than on macOS, and we leave thermal headroom). Mobile browsers cap single allocations at ~3 GB regardless of how much RAM the phone has: Safari and Chrome both refuse single buffers larger than that on iOS / Android.
- Efficiency drops. iOS apps that talk to the Apple Neural Engine + Metal hit ~0.55. Browser WebGPU on iOS / Android hits 0.20–0.30. Android NNAPI / QNN coverage for diffusion is still narrow — only a handful of apps (MLC, AI Edge, Local Dream) actually use it.
- Catalog shrinks. Most desktop models can’t fit; we expose a separate list of phone-friendly entries (SD 1.5 / 2.1 mobile builds, BK-SDM Tiny, SD-Turbo, SDXL-Lightning, FLUX.1 schnell Q4) only when the active chip is mobile, and recommend on-device apps that load them.
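The usable-memory rule for mobile chips can be sketched as below. Treating the ~3 GB browser limit as a cap on top of the 60% native budget is an assumption, as are the `usableMobileMemoryGB` helper and the constant names.

```javascript
const NATIVE_SHARE = 0.60;  // share of unified memory assumed usable by a native app
const BROWSER_CAP_GB = 3;   // approximate single-allocation limit in mobile browsers

// unifiedGB: the chip's representative unified memory in GB.
// runtime: "native" or "browser".
function usableMobileMemoryGB(unifiedGB, runtime) {
  const nativeBudget = unifiedGB * NATIVE_SHARE;
  return runtime === "browser"
    ? Math.min(nativeBudget, BROWSER_CAP_GB)
    : nativeBudget;
}
```

With these numbers, an 8 GB phone would get about 4.8 GB in a native app and 3 GB in a browser.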
7 · Models we track
We start from ComfyUI’s officially supported lineup, then add first-class HuggingFace models for audio and a few image/video extras with mature local-inference paths.
- 38 image, 17 video, and 4 editing models — SD 1.5 through FLUX.2, Wan 2.2 A14B, HunyuanVideo, LTX-Video.
- 12 text-to-speech models — Kokoro 82M, XTTS v2, F5-TTS, Bark, Parler, Orpheus, Sesame CSM, Spark, Fish, Kani, Chatterbox.
- 10 music generators — ACE-Step, MusicGen, Stable Audio Open, YuE, DiffRhythm, Magenta RT.
- Source mix: 46 ComfyUI-native, 35 Hugging Face (Diffusers / Transformers / vLLM).
VRAM figures come from official model cards, ComfyUI’s blog, the HF model pages, and community benchmarks. Quantized numbers reflect city96 GGUF builds and Kijai FP8 builds for diffusion; vLLM / llama.cpp builds for audio LLMs.
8 · Privacy
All detection and scoring happens client-side. No analytics, no telemetry, no API calls. You can verify by opening DevTools → Network and watching: nothing goes out.