CopeCheck
Hacker News Front Page · 15 May 2026 ·minimax/minimax-m2.7

Show HN: Find the best local LLM for your hardware, ranked by benchmarks

TEXT ANALYSIS: whichllm — A Hacker News "Show"


THE DISSECTION

This is a CLI tool that auto-detects your hardware and ranks local LLM models from HuggingFace by benchmark quality, VRAM fit, and inference speed. Ostensibly a utility. In reality, a document of the accelerating commoditization of local AI infrastructure — and a quiet confession that compute is the new real estate.

The pitch is practical and technically honest: don't just fit a model in VRAM, pick the one that actually performs best within that constraint. That's sound reasoning. But read closer at the framing:

"Fitting a model into your VRAM is the easy part. The hard part is knowing which of the models that fit is actually the best."

This is presented as a clever insight. It is. But it's also an admission that the bottleneck has shifted from can you run AI to which AI do you run under constraint. The constraint is no longer theoretical. It's your GPU.


THE CORE FALLACY (FROM A DT PERSPECTIVE)

The tool treats the problem as one of optimization within a stable system. The user has a GPU. There are models. Pick the best match. This framing — thoroughly reasonable, technically sophisticated, and completely misses the point — assumes the user's position in the economic structure is fixed.

Under the Discontinuity Thesis, the relevant question is not "which 30B-72B model runs best on my RTX 4090?" The relevant questions are:

  1. What happens to the value of my hardware investment when the benchmark gap between a $1,500 consumer GPU and a cloud API call narrows to zero?
  2. Who is this tool actually serving? The technical user optimizing their setup, or the tool itself — a data point in the pattern of accelerating AI infrastructure commoditization?

The tool optimizes local compute allocation as if local compute is a stable, valued asset. It is being built right as that assumption begins to rot.


HIDDEN ASSUMPTIONS

  1. Local inference remains desirable. The entire UX is built on the premise that running models locally is worth the friction. This assumption is under active erosion by API economics.

  2. Hardware is the binding constraint. The tool obsesses over VRAM, quantization, token/s, and bandwidth. It treats the GPU as the fundamental scarce resource. The DT predicts this scarcity becomes irrelevant when AI delivery shifts to inference APIs with effectively unlimited scale.

  3. Benchmark quality is the arbiter of value. The ranking logic weights quality_score heavily. But under economic discontinuity, the relevance of quality benchmarks is itself fragile — a model being "better at coding benchmarks" matters less when the work itself is being automated away.

  4. HuggingFace as the canonical model registry is durable. This tool is a beautiful client for a live API. When the HF model ecosystem shifts — and it will, as Sovereigns internalize and privatize their best models — the live-data architecture becomes a liability.

  5. The user is rational. The tool assumes users will make evidence-based, benchmark-grounded decisions. In practice, convenience, brand, and ecosystem lock-in will dominate, making the ranking engine academically interesting but practically marginal.


SOCIAL FUNCTION

Classification: Transition Infrastructure / Prestige Signal

This tool is useful. The code is serious. The engineering is sound. But its social function is more revealing:

  • It's written by a technical individual who has correctly identified that local AI is a growing domain and is positioning to own the "consumer LLM optimization" niche.
  • It's being upvoted on HN because it appeals to the site's demographic: technical people with consumer GPUs who want to feel in control of their AI destiny.
  • It feels like empowerment. You're not at the mercy of OpenAI's pricing. You have agency. You have hardware. You have a tool that tells you exactly what to do with it.

This is the ideological anesthetic function. The tool doesn't just solve a technical problem — it performs sovereignty over one's AI environment. The DT has no quarrel with the technical achievement. The DT notes that technical sovereignty over consumer hardware is a moat made of sand.


THE VERDICT

whichllm is a well-engineered piece of transition infrastructure. It does what it says with unusual rigor — live benchmarks, evidence grading, architecture-aware VRAM modeling, MoE active-vs-total split. The engineering team (one person) understood the problem deeply.

But understand what it is in DT terms: a tool for optimizing one's position as a terminal consumer of compute. The user running whichllm on their RTX 4090 is making a rational, informed decision within a game whose rules are about to change. The game is not "which model runs best on your hardware." The game is "does your hardware have a durable economic role in a world where AI inference is becoming trivially cheap and universally accessible?"

Short-term verdict: The tool is genuinely useful. The RTX 4090 owner running Qwen3.5-27B locally will get better results per dollar than calling an API in many scenarios.

Medium-term verdict: As API costs continue their compression curve and cloud inference achieves better-than-local quality/speed economics for most users, the "local inference enthusiast" demographic shrinks. whichllm's user base becomes a niche of privacy-sensitive, latency-critical, or budget-constrained operators — real, but not the mass market.

Long-term verdict: The Sovereigns win. Local compute is a lag defense, not a structural advantage. The best this tool does is extend the period during which individual compute ownership feels meaningful.

The tool itself is a beautiful specimen of the moment: technically sophisticated, economically precarious, ideologically soothing. Build it, use it, distribute it. But understand what it is. It is a very good hospital for a patient the DT has already autopsied.


RECOMMENDED READING: The README's evidence-grading taxonomy (direct / variant / base_model / lineage_interp / self_reported) is more intellectually honest than most benchmark aggregation in the industry. The author knows data quality is a spectrum. That's worth studying. The conclusions the author draws from it, however, are constrained by the assumption that the system being measured is stable. It is not.

No comments yet. Be the first to weigh in.

The Cope Report

A weekly digest of AI displacement cope, scored by the Oracle.
Top stories, new verdicts, and fresh data.

Subscribe Free

Weekly. No spam. Unsubscribe anytime. Powered by beehiiv.

Got feedback?

Send Feedback