Can I Generate?
Methodology

How we score your machine.

Everything below runs in your browser. Nothing about your hardware is sent to a server. We modeled the math after the openly published approach behind canirun.ai, then adapted it for diffusion workloads, where the limits are VRAM capacity and memory bandwidth rather than KV-cache size.

1 · Hardware detection

We use three browser APIs to fingerprint your machine, and never transmit the result anywhere.

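// WebGL: works everywhere, but the renderer string is sometimes masked
const canvas = document.createElement("canvas");
const gl = canvas.getContext("webgl2") ?? canvas.getContext("webgl");
const dbg = gl?.getExtension("WEBGL_debug_renderer_info");
const renderer = dbg
  ? gl.getParameter(dbg.UNMASKED_RENDERER_WEBGL)
  : gl?.getParameter(gl.RENDERER);  // fallback: often a generic string
//   → "ANGLE (NVIDIA, NVIDIA GeForce RTX 4090 Direct3D11 vs_5_0 ps_5_0...)"

// WebGPU: richer, when the browser supports it
// (top-level await needs an ES module or the DevTools console)
const adapter = await navigator.gpu?.requestAdapter();  // may resolve to null
const info = adapter?.info;
//   → { vendor: "nvidia", architecture: "ada-lovelace", device: "..." }

// Navigator: approximate RAM in GB (rounded; Chromium only), logical CPU cores
const ramGB = navigator.deviceMemory;         // undefined in Safari and Firefox
const cores = navigator.hardwareConcurrency;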

Renderer strings are matched against a database of 92 GPUs (NVIDIA RTX 50/40/30/20 and GTX 10 series, AMD RDNA 2/3/4, Intel Arc, and 15 Apple Silicon variants). Each entry has hand-curated VRAM and memory bandwidth figures from official datasheets.

Caveat: Safari and several privacy extensions mask or strip the renderer string. If we can’t identify your GPU, you can pick it manually; that’s why the override dropdowns exist.
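A minimal sketch of the matching step, assuming the database is a lookup keyed by lowercase model substrings. `GPU_DB` and `matchGpu` are hypothetical names, and the three entries stand in for the real 92 (their VRAM and bandwidth figures are NVIDIA's published specs). Preferring the longest matching key keeps "RTX 4070 Ti" from being read as a plain "RTX 4070"; a `null` result is the case where the manual override takes over.

```javascript
// Hypothetical excerpt of a GPU table. The figures are NVIDIA's published
// specs, but the structure is illustrative, not the site's actual database.
const GPU_DB = {
  "rtx 4090":    { vramGB: 24, bandwidthGBs: 1008 },
  "rtx 4070 ti": { vramGB: 12, bandwidthGBs: 504 },
  "rtx 4070":    { vramGB: 12, bandwidthGBs: 504 },
};

// Return the entry whose key is the longest substring of the renderer
// string, or null when nothing matches (e.g. a masked or generic string).
function matchGpu(renderer) {
  if (!renderer) return null;
  const normalized = renderer.toLowerCase().replace(/\s+/g, " ");
  let best = null;
  for (const key of Object.keys(GPU_DB)) {
    if (normalized.includes(key) && (!best || key.length > best.length)) {
      best = key;
    }
  }
  return best ? { name: best, ...GPU_DB[best] } : null;
}
```

A real matcher would also need to handle suffixes such as "Laptop GPU" or "SUPER", which change VRAM and bandwidth.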

2 · VRAM math

For every model + quantization combination, we estimate the VRAM the model needs during inference:

VRAM(GB) = weights × 1.10 + 0.5 GB runtime,   where weights = params × bits ÷ 8 ÷ 1024³

The 0.5 GB constant covers the CUDA/Metal context plus the inference engine itself; the 10% multiplier covers intermediate activations, the KV cache (for the autoregressive audio models), and a safety buffer for resolution and batch growth. For diffusion models we add it on top of the base weight VRAM because U-Net / DiT activations grow with the latent grid, and therefore with output resolution.
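The formula can be sketched as below, assuming the 10% overhead multiplies only the weight term (our reading of "on top of the base weight VRAM") and the 0.5 GB runtime constant is added afterwards. `estimateVramGB`, `vramByQuant`, and `QUANT_BITS` are illustrative names, not the site's code.

```javascript
// Bits per weight for the three tracked tiers. Real GGUF Q4 files store a
// little more than 4 bits/weight (block scales), so Q4 is a slight underestimate.
const QUANT_BITS = { FP16: 16, FP8: 8, Q4: 4 };

const RUNTIME_GB = 0.5; // CUDA/Metal context + inference engine
const OVERHEAD = 1.1;   // +10% for activations, KV cache, safety buffer

// Estimated VRAM in GB (units of 1024³ bytes) for `params` weights at `bits` each.
function estimateVramGB(params, bits) {
  const weightsGB = (params * bits) / 8 / 1024 ** 3;
  return weightsGB * OVERHEAD + RUNTIME_GB;
}

// VRAM estimate for every tracked tier of one model.
function vramByQuant(params) {
  const out = {};
  for (const [quant, bits] of Object.entries(QUANT_BITS)) {
    out[quant] = estimateVramGB(params, bits);
  }
  return out;
}
```

For a hypothetical 12B-parameter model, `vramByQuant(12e9)` gives roughly 25.1 GB at FP16, 12.8 GB at FP8, and 6.6 GB at Q4. The fixed 0.5 GB is why the lower tiers land slightly above ½ and ¼ of the FP16 figure.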

We track three quantization tiers per model:

  • FP16: native precision, max quality, max VRAM.
  • FP8: ~½ the weight memory of FP16, near-identical quality on most models.
  • Q4: GGUF 4-bit, ~¼ the weight memory of FP16, with a slight softness/text-rendering penalty.

3 · Fit classification

The best quant for your GPU is the highest-precision one that still fits. “Fits” depends on whether you’re on a discrete GPU or Apple unified memory.

function fits(vramUsed, gpu) {
  // Apple Silicon shares memory with the OS: leave 25% headroom
  const cap = gpu.kind === "apple-silicon"
    ? gpu.unifiedRAM * 0.75
    : gpu.vram;
  const ratio = vramUsed / cap;   // share of the usable pool this quant needs

  if (gpu.kind === "apple-silicon") {
    return ratio <= 0.70 ? "fits"
         : ratio <= 1.00 ? "tight"
         : "no-fit";
  }
  // Discrete GPUs: a larger share is allowed before the verdict drops
  return ratio <= 0.85 ? "fits"
       : ratio <= 1.10 ? "tight"
       : "no-fit";
}

Discrete GPUs get a more generous window because dedicated VRAM has no competition from the OS. Apple chips need more headroom because the operating system, browser, and apps share the same pool.
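One way to turn the fit verdicts into a recommended quant is sketched below. It restates `fits()` compactly as `classifyFit` so the block runs on its own. `pickBestQuant` and its preference order are assumptions: a clean "fits" at a lower tier beats a "tight" fit at a higher one, and a tight fit is returned only when nothing fits cleanly.

```javascript
// Same thresholds as fits() above, restated so this block is self-contained.
function classifyFit(vramUsed, gpu) {
  const apple = gpu.kind === "apple-silicon";
  const cap = apple ? gpu.unifiedRAM * 0.75 : gpu.vram;
  const ratio = vramUsed / cap;
  const [fitMax, tightMax] = apple ? [0.7, 1.0] : [0.85, 1.1];
  return ratio <= fitMax ? "fits" : ratio <= tightMax ? "tight" : "no-fit";
}

// Tiers ordered from highest to lowest precision.
const TIERS = ["FP16", "FP8", "Q4"];

// Hypothetical policy: highest-precision tier that fits cleanly; otherwise
// the highest-precision "tight" tier; otherwise null (nothing runs).
function pickBestQuant(vramByQuant, gpu) {
  for (const wanted of ["fits", "tight"]) {
    for (const quant of TIERS) {
      if (quant in vramByQuant && classifyFit(vramByQuant[quant], gpu) === wanted) {
        return { quant, fit: wanted };
      }
    }
  }
  return null;
}
```

For a hypothetical model needing 20 / 10 / 5 GB at FP16 / FP8 / Q4, a 22 GB discrete card gets FP16 as "tight" but FP8 as "fits", so this policy recommends FP8.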

4 · Scoring (0–100)

Every model gets a composite 0–100 score against your hardware. The weights are:

  • Speed (55%): bandwidth-bound throughput at the chosen quant.
  • Memory headroom (35%): more room means stability at high resolutions and on long videos.
  • Quality bonus (10%): logarithmic bonus for larger model parameter counts.

score = 0.55·speed + 0.35·headroom + 0.10·quality
score = score × 0.65 if the model is a “tight fit”

Tight-fit models get a 35% penalty because in practice they run out of memory (OOM) at higher resolutions, on longer videos, or with control adapters attached. Models that don’t fit at all are capped at a score of 18.
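A minimal sketch of the composite score, assuming each component is already normalized to 0–100. The exact logarithmic curve for the quality bonus isn't given here, so the hypothetical `qualityBonus` maps parameter count onto 0–100 between placeholder bounds of 0.1B and 20B parameters; `compositeScore` is likewise an illustrative name.

```javascript
// Hypothetical log-scale quality bonus: 0 at 0.1B params, 100 at 20B,
// clamped outside that range. The bounds are placeholders.
const LOG_MIN = Math.log10(1e8);
const LOG_MAX = Math.log10(2e10);

function qualityBonus(params) {
  const t = (Math.log10(params) - LOG_MIN) / (LOG_MAX - LOG_MIN);
  return 100 * Math.min(1, Math.max(0, t));
}

// Composite 0–100 score. `fit` is "fits", "tight", or "no-fit".
function compositeScore({ speed, headroom, quality }, fit) {
  let score = 0.55 * speed + 0.35 * headroom + 0.1 * quality;
  if (fit === "tight") score *= 0.65;                 // 35% penalty
  if (fit === "no-fit") score = Math.min(score, 18);  // hard cap
  return Math.round(score);
}
```

Applying the cap with `Math.min` means a no-fit model can score below 18 but never above it.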

5 · Speed estimation

We treat diffusion as bandwidth-bound: each denoising step reads the full model weights from VRAM. So:

steps/s ≈ bandwidth(GB/s) ÷ model_VRAM(GB) × efficiency

Efficiency captures driver and runtime overhead. We track it as a 2-D table keyed by (chip, runtime):

  • NVIDIA discrete · ComfyUI: 0.70
  • Apple Silicon · MPS/MLX: 0.65
  • AMD discrete · ROCm/DirectML: 0.55
  • Intel Arc · oneAPI: 0.45
  • Apple A-series · native iOS app (Core ML / MLX): 0.55
  • Apple A-series · mobile Safari WebGPU: 0.30
  • Snapdragon Adreno · native Android app (QNN): 0.40
  • Snapdragon · mobile Chrome WebGPU: 0.20
  • MediaTek / Tensor · native app: 0.30
  • MediaTek / Tensor · mobile browser: 0.15

These numbers won’t match measured steps/s benchmarks; they’re a relative-comparison proxy. Trust the ranking, not the absolute numbers.
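The estimate reduces to one table lookup and one division. In this sketch the `chip|runtime` key strings and `estimateStepsPerSec` are hypothetical identifiers; the efficiency factors are the ones listed above. An unknown pair returns `null` rather than guessing an efficiency.

```javascript
// Efficiency factors from the (chip, runtime) table; key strings are hypothetical.
const EFFICIENCY = {
  "nvidia|comfyui": 0.70,
  "apple-silicon|mps-mlx": 0.65,
  "amd|rocm-directml": 0.55,
  "intel-arc|oneapi": 0.45,
  "apple-a|native": 0.55,
  "apple-a|webgpu": 0.30,
  "snapdragon|native": 0.40,
  "snapdragon|webgpu": 0.20,
  "mediatek-tensor|native": 0.30,
  "mediatek-tensor|browser": 0.15,
};

// Bandwidth-bound proxy: each step streams the full weights once.
// Returns null for an unknown (chip, runtime) pair or a non-positive size.
function estimateStepsPerSec(bandwidthGBs, modelVramGB, chip, runtime) {
  const eff = EFFICIENCY[`${chip}|${runtime}`];
  if (eff === undefined || modelVramGB <= 0) return null;
  return (bandwidthGBs / modelVramGB) * eff;
}
```

When two GPUs run the same model, the model-size term cancels, so their ranking depends only on bandwidth × efficiency, which is the comparison the proxy is meant to support.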

6 · Mobile assumptions

Phones and tablets aren’t laptops. Three things change in the math when the active chip is an A-series, Snapdragon, MediaTek, or Tensor:

  • Usable VRAM shrinks. Native apps get ~60% of the chip’s representative unified memory (the OS reserves more than on macOS, and we leave thermal headroom). Mobile browsers cap single allocations at ~3 GB regardless of how much RAM the phone has; Safari and Chrome both refuse single buffers larger than that on iOS / Android.
  • Efficiency drops. iOS apps that talk to the Apple Neural Engine + Metal hit ~0.55. Browser WebGPU on iOS / Android hits 0.20–0.30. Android NNAPI / QNN coverage for diffusion is still narrow — only a handful of apps (MLC, AI Edge, Local Dream) actually use it.
  • Catalog shrinks. Most desktop models can’t fit, so we expose a separate list of phone-friendly entries (SD 1.5 / 2.1 mobile builds, BK-SDM Tiny, SD-Turbo, SDXL-Lightning, FLUX.1 schnell Q4) only when the active chip is mobile, and recommend on-device apps that load them.
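The usable-memory rule can be sketched as a small helper. The 60% share and the ~3 GB browser ceiling come from the text above; taking the smaller of the two for browsers is our assumption about how the rules combine, and `mobileUsableGB` / `fitsOnPhone` are hypothetical names.

```javascript
const NATIVE_SHARE = 0.6;  // native apps get ~60% of unified memory
const BROWSER_CAP_GB = 3;  // approximate single-allocation ceiling in mobile browsers

// Usable memory in GB for a phone/tablet. `runtime` is "native" or "browser".
// Assumption: a browser gets the smaller of the native share and the cap.
function mobileUsableGB(unifiedGB, runtime) {
  const nativeGB = unifiedGB * NATIVE_SHARE;
  return runtime === "browser" ? Math.min(nativeGB, BROWSER_CAP_GB) : nativeGB;
}

// Whether a model's estimated VRAM fits in that budget.
function fitsOnPhone(modelVramGB, unifiedGB, runtime) {
  return modelVramGB <= mobileUsableGB(unifiedGB, runtime);
}
```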

7 · Models we track

We start from ComfyUI’s officially supported lineup, then add first-class Hugging Face models for audio and a few image/video extras with mature local-inference paths.

  • 38 image, 17 video, and 4 editing models, including SD 1.5 through FLUX.2, Wan 2.2 A14B, HunyuanVideo, and LTX-Video.
  • 12 text-to-speech models, including Kokoro 82M, XTTS v2, F5-TTS, Bark, Parler, Orpheus, Sesame CSM, Spark, Fish, Kani, and Chatterbox.
  • 10 music generators, including ACE-Step, MusicGen, Stable Audio Open, YuE, DiffRhythm, and Magenta RT.
  • Source mix: 46 ComfyUI-native, 35 Hugging Face (Diffusers / Transformers / vLLM).

VRAM figures come from official model cards, ComfyUI’s blog, Hugging Face model pages, and community benchmarks. Quantized numbers reflect city96’s GGUF builds and Kijai’s FP8 builds for diffusion models, and vLLM / llama.cpp builds for audio LLMs.

8 · Privacy

All detection and scoring happens client-side. No analytics, no telemetry, no API calls. You can verify by opening DevTools → Network and watching: nothing goes out.

Now you know how it works.