OptimizedLLM

Will it run on your machine?

Pick your hardware. See every local model that fits, the quant to use, and the one actually worth running.

How it works

For each model and quant, the estimate is weights + KV cache + overhead, compared against the memory your hardware can actually give the model.
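A minimal sketch of that arithmetic in Python. The names (`ModelArch`, `estimate_bytes`, `fit_status`) are hypothetical, and so are the following values, which are illustrative assumptions rather than the site's actual code:

* the flat 0.5 GiB overhead
* the FP16 KV cache
* the 10% "tight" margin
* the example model: an 8B, grouped-query-attention shape at about 4.85 bits per weight

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes in one GiB


@dataclass
class ModelArch:
    # Hypothetical architecture record; real values come from each model's config.
    params_billion: float  # total parameter count, in billions
    n_layers: int          # transformer layers
    n_kv_heads: int        # key/value heads (fewer than query heads under GQA)
    head_dim: int          # dimension of each attention head


def weights_bytes(arch: ModelArch, bits_per_weight: float) -> float:
    # Quantized weights: parameter count x average bits per weight, converted to bytes.
    return arch.params_billion * 1e9 * bits_per_weight / 8


def kv_cache_bytes(arch: ModelArch, context_tokens: int, bytes_per_elem: int = 2) -> int:
    # One K and one V vector per layer, per KV head, per cached token (FP16 assumed).
    return 2 * arch.n_layers * arch.n_kv_heads * arch.head_dim * context_tokens * bytes_per_elem


def estimate_bytes(arch: ModelArch, bits_per_weight: float, context_tokens: int,
                   overhead_bytes: float = 0.5 * GIB) -> float:
    # Total = weights + KV cache + a flat overhead (the 0.5 GiB default is an assumption).
    return (weights_bytes(arch, bits_per_weight)
            + kv_cache_bytes(arch, context_tokens)
            + overhead_bytes)


def fit_status(needed_bytes: float, usable_bytes: float, tight_margin: float = 0.10) -> str:
    # "no" if it exceeds usable memory, "tight" if it lands in the last 10%, else "fits".
    if needed_bytes > usable_bytes:
        return "no"
    if needed_bytes > usable_bytes * (1 - tight_margin):
        return "tight"
    return "fits"


# Usage: a hypothetical 8B model with 32 layers, 8 KV heads, head_dim 128.
arch = ModelArch(params_billion=8.0, n_layers=32, n_kv_heads=8, head_dim=128)
need = estimate_bytes(arch, bits_per_weight=4.85, context_tokens=8192)
print(f"{need / GIB:.2f} GiB")                   # 6.02 GiB
print(fit_status(need, usable_bytes=7.2 * GIB))  # fits
```

The `usable_bytes` argument stands in for "the memory your hardware can actually give the model". How that number is derived for a given GPU or unified-memory machine is left out of this sketch.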

These are close estimates, not exact VRAM accounting. We model the common K-quants; the IQ-quant family (not modeled here) can pack models even smaller at very low bit rates. Architecture values are best-effort. Sliding-window models like Gemma are flagged on their own cards, since their long-context cache is smaller than shown. Real usage also shifts with your runtime and settings. The model list is current as of June 2026, and this space moves fast. When a result says tight, leave yourself a little headroom.
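Here is a rough sketch of why the sliding-window caveat matters. In such models only some layers attend over the full context. The remaining layers keep a fixed window of recent tokens, so their share of the cache stops growing once the context passes the window.

The following are all hypothetical, not any specific Gemma release's configuration:

* the function name `kv_cache_bytes_swa`
* the 8 global / 40 local layer split
* the 1024-token window

Runtimes also differ in whether they actually shrink the cache this way.

```python
GIB = 1024 ** 3  # bytes in one GiB


def kv_cache_bytes_swa(n_global_layers: int, n_local_layers: int, n_kv_heads: int,
                       head_dim: int, context_tokens: int, window_tokens: int,
                       bytes_per_elem: int = 2) -> int:
    # Bytes for one token's K and V in one layer (FP16 assumed).
    per_token_layer = 2 * n_kv_heads * head_dim * bytes_per_elem
    # Global-attention layers cache every token in the context.
    global_part = n_global_layers * context_tokens
    # Sliding-window layers only keep the most recent `window_tokens` tokens.
    local_part = n_local_layers * min(context_tokens, window_tokens)
    return per_token_layer * (global_part + local_part)


# Hypothetical 48-layer model at a 32K context.
naive = kv_cache_bytes_swa(48, 0, 8, 128, 32768, 1024)  # every layer treated as global
swa = kv_cache_bytes_swa(8, 40, 8, 128, 32768, 1024)    # 8 global + 40 sliding-window
print(naive / GIB, swa / GIB)                           # 6.0 1.15625
```

An estimate that treats every layer as global is therefore an upper bound for these models. The gap widens as the context grows past the window.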

You know those sites you use once and never open again? Yeah, this is one of those. I built it for fun. If it saved you a little time, you can buy me a coffee.