Image
Stable Cascade Stability AI · 5.9B · 2024
Three-stage cascade (Würstchen architecture) with strong prompt adherence. Because the stages run one after another, it is lighter to run than its 5.9B total suggests.
1024×1024 13 GB disk 16 GB RAM ✓ Stability AI Non-Commercial
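This is a rough sketch of why a sequential cascade can be lighter than its total size suggests. If only the active stage sits on the GPU while the others are offloaded, peak weight memory tracks the largest stage rather than the sum. The per-stage parameter counts below are approximate figures assumed for illustration: Stage C ≈ 3.6B, Stage B ≈ 1.5B, Stage A ≈ 0.02B, text encoder ≈ 0.7B. Activations are ignored.

```python
# Approximate parameter counts in billions; assumed for illustration only.
STAGES: dict[str, float] = {
    "text_encoder": 0.7,
    "stage_c": 3.6,   # text-conditional generator in the compressed latent space
    "stage_b": 1.5,   # decodes Stage C latents into the Stage A latent space
    "stage_a": 0.02,  # small VQGAN decoder back to pixels
}
BYTES_PER_PARAM_FP16 = 2


def weights_gb(params_billion: float) -> float:
    """Weight-only footprint in decimal GB at FP16 (activations ignored)."""
    return params_billion * BYTES_PER_PARAM_FP16


def all_resident_gb(stages: dict[str, float]) -> float:
    """Every stage kept on the GPU at the same time."""
    return sum(weights_gb(p) for p in stages.values())


def sequential_peak_gb(stages: dict[str, float]) -> float:
    """Only the currently running stage on the GPU; the rest offloaded."""
    return max(weights_gb(p) for p in stages.values())


if __name__ == "__main__":
    print(f"all resident:    {all_resident_gb(STAGES):.2f} GB")
    print(f"sequential peak: {sequential_peak_gb(STAGES):.2f} GB")
```

With these assumed figures, keeping everything resident needs about 11.6 GB of FP16 weights. One-stage-at-a-time offloading peaks around 7.2 GB, at the cost of extra transfer time between stages.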
Image
Z-Image Turbo Tongyi Lab · 6B · 2025
Distilled few-step model. FP16 fits comfortably in 16GB. Apache-licensed.
1024×1024 fast 14 GB disk 16 GB RAM ✓ Apache 2.0
Image ★ Most popular community model
Stable Diffusion XL Stability AI · 3.5B · 2023
Q4 GGUF · ~3.5 GB VRAM
The workhorse. Sharp 1024px output, huge fine-tune ecosystem (Pony, Illustrious, Juggernaut).
1024×1024 7 GB disk 16 GB RAM ✓ CreativeML Open RAIL++-M
Image
SDXL Turbo Stability AI · 3.5B · 2023
1-step distilled SDXL. Generates in under a second on midrange GPUs. Lower fidelity than full SDXL.
512×512 real-time 7 GB disk 16 GB RAM ✓ Stability AI Non-Commercial
Edit
OmniGen 2 BAAI · 3.8B · 2025
Unified text-to-image and image-editing model: one model handles generation, editing, and composition.
1024×1024 11 GB disk 16 GB RAM ✓ MIT
Video
CogVideoX-5B THUDM (Tsinghua) · 5B · 2024
Q4 GGUF · ~5.5 GB VRAM
5B T2V/I2V model from Tsinghua, aimed at mid-range hardware. Generates 6-second clips (49 frames at 8 fps) at 720×480.
720×480 · 6s 16 GB disk 16 GB RAM ✓ CogVideoX License
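A 6-second clip at 8 fps is 49 frames: 48 frame intervals plus the first frame. The sketch below shows how that maps to latent size. It assumes a causal 3D VAE that encodes the first frame alone, then compresses time by 4× and space by 8×. Those are the factors described for CogVideoX, but treat them as assumptions here.

```python
def latent_shape(frames: int, height: int, width: int,
                 t_down: int = 4, s_down: int = 8) -> tuple[int, int, int]:
    """Return (latent_frames, latent_height, latent_width).

    A causal video VAE encodes the first frame on its own and then groups
    the remaining frames t_down at a time, so valid clip lengths have the
    form t_down * k + 1.
    """
    if frames < 1 or (frames - 1) % t_down != 0:
        raise ValueError(f"frame count must be {t_down}*k + 1, got {frames}")
    if height % s_down or width % s_down:
        raise ValueError(f"height and width must be multiples of {s_down}")
    return ((frames - 1) // t_down + 1, height // s_down, width // s_down)


if __name__ == "__main__":
    fps, seconds = 8, 6
    frames = fps * seconds + 1              # 49 frames
    print(latent_shape(frames, 480, 720))   # (13, 60, 90)
```

The `4k + 1` constraint is why such models ask for frame counts like 49 rather than 48.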
Image
Stable Diffusion 3 Medium Stability AI · 2B · 2024
MMDiT architecture with triple text encoders. Better text rendering than SDXL.
1024×1024 11 GB disk 16 GB RAM ✓ Stability AI Community
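SD3 combines its three text encoders roughly as follows:

- The CLIP-L and CLIP-G token features are concatenated channel-wise, zero-padded to the T5 width, and appended to the T5 tokens along the sequence axis.
- The two pooled CLIP vectors are concatenated into a single global vector.

This shape-only sketch assumes widths of 768 (CLIP-L), 1280 (CLIP-G) and 4096 (T5-XXL), 77 CLIP tokens, and a configurable T5 token count. The names are illustrative, not a library API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderOut:
    tokens: int  # sequence length
    width: int   # hidden size per token


# Assumed encoder output shapes.
CLIP_L = EncoderOut(tokens=77, width=768)
CLIP_G = EncoderOut(tokens=77, width=1280)


def sd3_context_shape(clip_l: EncoderOut, clip_g: EncoderOut,
                      t5: EncoderOut) -> tuple[tuple[int, int], int]:
    """Return ((seq_len, width) of the token context, pooled vector width)."""
    if clip_l.tokens != clip_g.tokens:
        raise ValueError("CLIP streams must have the same token count")
    clip_width = clip_l.width + clip_g.width  # channel-wise concat: 2048
    if clip_width > t5.width:
        raise ValueError("CLIP features wider than T5 features")
    # CLIP features are zero-padded up to the T5 width, then the two
    # streams are concatenated along the sequence axis.
    context = (clip_l.tokens + t5.tokens, t5.width)
    pooled = clip_l.width + clip_g.width      # pooled CLIP vectors concatenated
    return context, pooled


if __name__ == "__main__":
    print(sd3_context_shape(CLIP_L, CLIP_G, EncoderOut(tokens=77, width=4096)))
```

With 77 T5 tokens this yields a 154 × 4096 context and a 2048-wide pooled vector. Raising the T5 token budget lengthens only the sequence axis.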
Image
Stable Diffusion 3.5 Medium Stability AI · 2.5B · 2024
SD 3.5 refresh of the medium model (MMDiT-X architecture), with better composition and aesthetics than SD3 Medium.
1024×1024 12.5 GB disk 16 GB RAM ✓ Stability AI Community
Image
Lumina Image 2.0 Alpha-VLLM · 2.6B · 2025
Compact 2.6B DiT with strong photographic realism. Apache-licensed, so commercial use is allowed.
1024×1024 12 GB disk 16 GB RAM ✓ Apache 2.0
Video
LTX-Video Lightricks · 2B · 2024
Real-time-capable video model: on an H100 it generates clips faster than they take to play back. The speed champion.
768×512 real-time 11 GB disk 16 GB RAM ✓ Lightricks Open License
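Part of LTX-Video's speed comes from a heavily compressed latent space. Its authors describe each latent token as covering 32 × 32 pixels and 8 frames, about 1:192 compression. The sketch below computes the resulting sequence length under those factors, assuming the first frame is encoded separately. The 121-frame clip length and the 2-second generation time are hypothetical values for illustration.

```python
def latent_tokens(frames: int, height: int, width: int,
                  t_down: int = 8, s_down: int = 32) -> int:
    """Count latent tokens, assuming the first frame is encoded on its own and
    each later token covers t_down frames x s_down x s_down pixels."""
    if frames < 1 or (frames - 1) % t_down:
        raise ValueError(f"frame count must be {t_down}*k + 1, got {frames}")
    if height % s_down or width % s_down:
        raise ValueError(f"height and width must be multiples of {s_down}")
    latent_frames = (frames - 1) // t_down + 1
    return latent_frames * (height // s_down) * (width // s_down)


def real_time_factor(frames: int, fps: float, generation_seconds: float) -> float:
    """Playback duration divided by generation time; above 1.0 beats real time."""
    return (frames / fps) / generation_seconds


if __name__ == "__main__":
    print(latent_tokens(121, 512, 768))  # 16 * 16 * 24 = 6144 tokens
    # Hypothetical timing: a 121-frame, 24 fps clip generated in 2 seconds.
    print(f"{real_time_factor(121, 24, 2.0):.2f}x real time")
```

A short token sequence keeps the transformer's per-step cost low, which is what makes faster-than-playback generation plausible.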
Image
Hunyuan-DiT Tencent · 1.5B · 2024
Bilingual (Chinese + English) DiT. Strong at Chinese text rendering, light to run.
1024×1024 11 GB disk 16 GB RAM ✓ Tencent Hunyuan Community
Video
Stable Video Diffusion Stability AI · 1.5B · 2023
Image-to-video, ~14–25 frames at 576×1024. The original consumer-friendly local video model.
576×1024 · 25 frames 10 GB disk 16 GB RAM ✓ Stability AI Non-Commercial
Video
Wan 2.1 T2V 1.3B Alibaba · 1.3B · 2025
Tiny T2V model and the most broadly compatible video model here: it fits on 8GB GPUs. Apache-licensed.
480p 9 GB disk 16 GB RAM ✓ Apache 2.0
Image
Stable Diffusion 1.5 Stability AI / RunwayML · 0.86B · 2022
The classic. Tiny, fast, runs on almost anything. Massive ecosystem of LoRAs and fine-tunes.
512×512 very fast 4.5 GB disk 8 GB RAM ✓ CreativeML Open RAIL-M
Image
Stable Diffusion 2.1 Stability AI · 0.86B · 2022
Successor to SD 1.5 with native 768px training. Smaller community than 1.5 but still light on hardware.
768×768 very fast 5.5 GB disk 8 GB RAM ✓ CreativeML Open RAIL++-M
Image
PixArt-Σ PixArt-α team · 0.6B · 2024
Featherweight 0.6B DiT with a T5-XXL text encoder. Produces up to 4K output, impressive for its parameter count.
Up to 4K 11.5 GB disk 16 GB RAM ✓ OpenRAIL++
Image
Stable Diffusion 3.5 Large Stability AI · 8B · 2024
Q4 GGUF · ~8.0 GB VRAM
Stability's flagship 8B MMDiT. Sharp text, strong composition. Triple text encoder (CLIP-L, CLIP-G, T5-XXL).
1024×1024 20 GB disk RAM tight · −15 Stability AI Community
Image
AuraFlow v0.3 Fal.ai · 6.8B · 2024
Fully open-source (Apache 2.0) flow-matching model, and one of the largest fully Apache-licensed image models at its 2024 release.
1024×1024 16 GB disk RAM tight · −15 Apache 2.0
Image
FLUX.1 schnell Black Forest Labs · 12B · 2024
Q4 GGUF · ~7.0 GB VRAM
4-step distilled FLUX. The fastest way to get FLUX-tier quality. Apache-licensed.
1024×1024 fast 33 GB disk RAM tight · −15 Apache 2.0
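Why can a distilled flow model get away with 4 steps? Here is a toy illustration, not FLUX's actual sampler or network. On a straight rectified-flow path x_t = (1 - t)·x0 + t·noise, the velocity is constant, so even a plain Euler integrator lands exactly on the target in very few steps. The "ideal" velocity field below is computed from a known target point. A trained model can only approximate it, and few-step distillation roughly aims to make the model's trajectories behave like this straight-line case.

```python
def ideal_velocity(x: list[float], t: float, target: list[float]) -> list[float]:
    """Velocity of the straight path through x that reaches `target` at t = 0.

    Toy stand-in for a trained network; valid for t > 0.
    """
    return [(xi - ti) / t for xi, ti in zip(x, target)]


def euler_sample(noise: list[float], target: list[float], steps: int) -> list[float]:
    """Integrate from t = 1 (pure noise) to t = 0 with uniform Euler steps."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    x = list(noise)
    ts = [1.0 - i / steps for i in range(steps + 1)]  # 1.0, ..., 0.0
    for t, t_next in zip(ts[:-1], ts[1:]):
        v = ideal_velocity(x, t, target)
        x = [xi + (t_next - t) * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    print(euler_sample([0.3, -1.2], [2.0, 0.5], steps=4))
```

On a curved trajectory, the same 4 Euler steps would overshoot or undershoot. That is why undistilled models typically need many more steps.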
Image ★ Flagship FLUX
FLUX.1 dev Black Forest Labs · 12B · 2024
Q4 GGUF · ~7.0 GB VRAM
The model to beat. State-of-the-art prompt adherence and photorealism. Hungry but worth it.
1024×1024 33 GB disk RAM tight · −15 FLUX.1 Non-Commercial
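The `Q4 GGUF · ~7.0 GB` figure is consistent with a simple weight-only estimate. GGUF's Q4_0 format packs 32 weights into 16 bytes of 4-bit values plus a 2-byte FP16 scale, i.e. 4.5 bits per weight. So 12B weights × 4.5 bits ≈ 6.75 GB. The sketch below shows that arithmetic. Real GGUF files often keep some tensors at higher precision, and the text encoders, VAE and activations add more on top, so treat the result as a lower bound.

```python
# Bits per weight implied by each format's block layout: 32 weights per block,
# plus one FP16 (16-bit) scale per block for the quantized formats.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": (32 * 8 + 16) / 32,  # 8.5
    "Q4_0": (32 * 4 + 16) / 32,  # 4.5
}


def weight_gb(params: float, fmt: str) -> float:
    """Weight-only size in decimal gigabytes for `params` weights stored as `fmt`."""
    return params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


if __name__ == "__main__":
    for fmt in BITS_PER_WEIGHT:
        print(f"12B @ {fmt}: {weight_gb(12e9, fmt):.2f} GB")
```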
Edit
FLUX.1 Kontext dev Black Forest Labs · 12B · 2025
Q4 GGUF · ~7.5 GB VRAM
Image editing variant of FLUX. Reference image + prompt → edited result.
1024×1024 33 GB disk RAM tight · −15 FLUX.1 Non-Commercial
Image
HiDream-I1 HiDream-ai · 17B (8.5B active) · 2025
Q4 GGUF · ~11.0 GB VRAM
Hybrid DiT + MoE. Reported to outperform FLUX.1 dev on several benchmarks. MIT-licensed, so commercial use is allowed.
1024×1024 38 GB disk RAM tight · −15 MIT
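For MoE listings like "17B (8.5B active)", the two numbers govern different costs. Every expert's weights must be stored, but each token only passes through the active ones. The sketch below assumes 2 bytes per weight (FP16/BF16) and the common approximation of about 2 FLOPs (one multiply, one add) per active weight per token. Attention over the sequence is ignored.

```python
def moe_costs(total_params: float, active_params: float,
              bytes_per_param: float = 2.0) -> tuple[float, float]:
    """Return (weight_bytes, approx_forward_flops_per_token)."""
    if not 0 < active_params <= total_params:
        raise ValueError("need 0 < active_params <= total_params")
    weight_bytes = total_params * bytes_per_param  # all experts are stored
    flops_per_token = 2 * active_params            # only routed experts run
    return weight_bytes, flops_per_token


if __name__ == "__main__":
    moe = moe_costs(17e9, 8.5e9)
    dense = moe_costs(17e9, 17e9)
    print(f"MoE:   {moe[0] / 1e9:.0f} GB weights, {moe[1] / 1e9:.0f} GFLOPs/token")
    print(f"dense: {dense[0] / 1e9:.0f} GB weights, {dense[1] / 1e9:.0f} GFLOPs/token")
```

An MoE model therefore needs the memory of its total size but, per token, roughly the compute of its active size.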
Image
Qwen-Image Alibaba · 20B · 2025
Q4 GGUF · ~14.0 GB VRAM
20B MMDiT from Alibaba with best-in-class text rendering: it can render whole paragraphs of text in an image.
1328×1328 45 GB disk RAM low · −25 Apache 2.0
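A larger native canvas also means a longer transformer sequence. The sketch below shows the arithmetic under two assumptions:

- An 8× downsampling VAE followed by 2 × 2 patchification. This is the FLUX-style layout, assumed here rather than taken from Qwen-Image's documentation.
- Self-attention cost grows with the square of the token count.

```python
def image_tokens(height: int, width: int, vae_down: int = 8, patch: int = 2) -> int:
    """Transformer tokens for an image after VAE downsampling and patchification."""
    step = vae_down * patch
    if height % step or width % step:
        raise ValueError(f"height and width must be multiples of {step}")
    return (height // step) * (width // step)


def attention_cost_ratio(tokens_a: int, tokens_b: int) -> float:
    """Relative self-attention cost, which grows with sequence length squared."""
    return (tokens_a / tokens_b) ** 2


if __name__ == "__main__":
    big, base = image_tokens(1328, 1328), image_tokens(1024, 1024)
    print(big, base, f"{attention_cost_ratio(big, base):.2f}x")
```

Under these assumptions, 1328×1328 gives 83 × 83 = 6,889 tokens versus 4,096 at 1024×1024. Each attention layer therefore does roughly 2.8× the work.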
Edit
Qwen-Image-Edit Alibaba · 20B · 2025
Q4 GGUF · ~14.0 GB VRAM
Edit variant of Qwen-Image. Replace, add, restyle objects via natural prompts.
1328×1328 45 GB disk RAM low · −25 Apache 2.0
Image
Hunyuan Image 2.1 Tencent · 17B · 2025
Q4 GGUF · ~11.0 GB VRAM
Tencent's 17B image flagship, built for native 2K output. Strong at composition and Chinese text.
2048×2048 36 GB disk RAM tight · −15 Tencent Hunyuan Community
Image ★ 80B MoE giant
Hunyuan Image 3 Tencent · 80B (13B active) · 2025
Q4 GGUF · ~32.0 GB VRAM
Massive 80B MoE image model with 13B parameters active per token. Quality rivals closed-source flagships. Needs serious hardware.
1024×1024 180 GB disk RAM low · −25 Tencent Hunyuan Community
Image ★ Next-gen FLUX
FLUX.2 dev Black Forest Labs · 32B · 2025
Q4 GGUF · ~22.0 GB VRAM
Successor to FLUX.1 dev. Higher fidelity, longer context. Heavy hardware required.
1024×1024 72 GB disk RAM low · −25 FLUX.2 Non-Commercial
Image
ERNIE-Image Baidu · 10B · 2025
Baidu's 10B image model. Strong at Chinese prompts and stylized output.
1024×1024 22 GB disk RAM tight · −15 Baidu Community
Edit
HiDream-E1.1 HiDream-ai · 17B (8.5B active) · 2025
Q4 GGUF · ~11.0 GB VRAM
Editing variant of HiDream-I1. Same MoE backbone tuned for image editing.
1024×1024 36 GB disk RAM tight · −15 MIT
Video
Wan 2.1 T2V 14B Alibaba · 14B · 2025
Q4 GGUF · ~12.0 GB VRAM
Full-size Wan T2V. Strong motion, 720p output. Quantization makes it viable down to 12GB.
720p 30 GB disk RAM tight · −15 Apache 2.0
Video
Wan 2.1 I2V 14B Alibaba · 14B · 2025
Q4 GGUF · ~12.0 GB VRAM
Image-to-video flagship. Best open I2V motion in early 2025.
720p 30 GB disk RAM tight · −15 Apache 2.0
Video ★ Best open video < 8B
Wan 2.2 TI2V 5B Alibaba · 5B · 2025
Q4 GGUF · ~7.0 GB VRAM
Unified text + image to video. Best-in-class T2V/I2V at 5B. Fits on a 24GB GPU at FP16.
720p 13 GB disk RAM tight · −15 Apache 2.0
Video
Wan 2.2 T2V A14B (MoE) Alibaba · 27B (14B active) · 2025
Q4 GGUF · ~18.0 GB VRAM
MoE flagship with 27B total and 14B active parameters per step. Top-tier open video quality.
720p 56 GB disk RAM low · −25 Apache 2.0
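Wan 2.2's "27B total / 14B active" split works differently from token-routed MoE. The model is described as pairing two experts:

- a high-noise expert for the early, noisy denoising steps;
- a low-noise expert for the later refinement steps.

Each step therefore runs one 14B expert. The sketch below shows that switch. The 0.875 boundary and the expert names are hypothetical placeholders, not Wan's actual configuration.

```python
def pick_expert(t: float, boundary: float = 0.875) -> str:
    """Choose an expert for normalized timestep t (1.0 = pure noise, 0.0 = clean).

    The boundary value is a hypothetical placeholder.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be in [0, 1]")
    return "high_noise_expert" if t >= boundary else "low_noise_expert"


def expert_schedule(steps: int, boundary: float = 0.875) -> list[str]:
    """Expert used at each of `steps` uniformly spaced denoising steps."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    return [pick_expert(1.0 - i / steps, boundary) for i in range(steps)]


if __name__ == "__main__":
    print(expert_schedule(8))
```

Only one expert runs per step, but both must be available (resident or swapped in). That is why memory needs track the 27B total, while per-step compute tracks the 14B active.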
Video
LTX-2 Lightricks · 8B · 2026
Distilled FP8 · ~12.0 GB VRAM
Successor to LTX-Video. 4K-capable on high-end hardware. Distilled variants run on 12GB.
1080p–4K 28 GB disk RAM tight · −15 Lightricks Open License
Video
HunyuanVideo Tencent · 13B · 2024
Q4 GGUF · ~12.0 GB VRAM
13B T2V model; at release, widely seen as the closest open rival to Sora. Slow but cinematic.
720p 50 GB disk RAM tight · −15 Tencent Hunyuan Community
Video ★ Consumer-friendly Hunyuan
HunyuanVideo 1.5 Tencent · 8.3B · 2025
Q4 GGUF · ~9.0 GB VRAM
Lighter, sharper successor to HunyuanVideo: 8.3B parameters and a 14GB VRAM minimum, so it runs on consumer GPUs.
720p 22 GB disk RAM tight · −15 Tencent Hunyuan Community
Video
Mochi 1 Genmo · 10B · 2024
Q4 GGUF · ~9.0 GB VRAM
10B asymmetric DiT. Ambitious motion. Apache-licensed for any use.
480p–720p 22 GB disk RAM tight · −15 Apache 2.0
Video
Pyramid Flow Pyramid Flow team · 2B · 2024
Pyramidal flow-matching for efficient long videos. Strong T2V quality at 2B.
768p · 10s 11 GB disk RAM tight · −15 MIT