
mp-diagnose

Use when chasing hard bugs or perf regressions — disciplined reproduce→minimise→hypothesise→instrument→fix→regression-test loop, with feedback loop quality treated as the actual skill.

One-line install
curl --create-dirs -fsSL https://skillmake.xyz/i/mp-diagnose -o ~/.claude/skills/mp-diagnose/SKILL.md

This content is pinned by hash: the file served at /api/marketplace/mp-diagnose-f722db68/raw always matches sha:f722db687d0c5700.

---
name: mp-diagnose
description: Use when chasing hard bugs or perf regressions — disciplined reproduce→minimise→hypothesise→instrument→fix→regression-test loop, with feedback loop quality treated as the actual skill.
source: https://github.com/mattpocock/skills/blob/main/skills/engineering/diagnose/SKILL.md
generated: 2026-05-12T18:04:58.022Z
category: concept
audience: engineers
---

## Tutorials

- https://skillmake.xyz/v/mp-diagnose.mp4

## When to use

- User says 'diagnose this' / 'debug this' or reports something throwing, failing, or broken
- Performance regression where timing changed and you need a reliable measurement harness
- Non-deterministic / flaky bug that needs a higher reproduction rate before it's debuggable
- Bug that's been re-fixed twice and you want a regression test seam that survives refactors

## Key concepts

### feedback loop (Phase 1)

The actual skill. A fast, deterministic, agent-runnable pass/fail signal for the bug. Bisection, hypothesis-testing, and instrumentation only consume this signal — without one, no amount of code reading helps. Treat the loop as a product: make it faster, sharper, more deterministic. A 2-second deterministic loop is a debugging superpower; a 30-second flaky one is barely better than nothing.

### ways to construct a loop

In rough order of preference:

- failing test at the right seam
- curl/HTTP script
- CLI invocation with snapshot diff
- headless browser
- captured-trace replay
- throwaway harness
- property/fuzz loop
- git-bisect harness
- differential old-vs-new run
- as a last resort, a human-in-the-loop (HITL) bash script that drives a human

Be aggressive, creative, and refuse to give up.
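The git-bisect harness composes directly with the Phase 1 loop. A sketch, assuming a `repro.sh` with the exit-0-pass / exit-1-fail convention and a hypothetical last-known-good tag `v1.4.0`:

```shell
# git bisect run treats exit 0 as "good" and exit 1-124 as "bad",
# so a Phase 1 loop script plugs in unchanged.
git bisect start
git bisect bad HEAD          # bug present at the tip
git bisect good v1.4.0       # hypothetical last-known-good ref
git bisect run ./repro.sh    # bisect drives the loop to the culprit commit
git bisect reset             # restore the original checkout
```

Keep `repro.sh` untracked so the same script survives every commit bisect checks out.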

### non-deterministic bugs

The goal is not a clean repro but a higher reproduction rate. Loop the trigger 100×, parallelise, add stress, narrow timing windows, inject sleeps. A 50%-flake bug is debuggable; 1% is not — keep raising the rate until it is.
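The rate-raising step can start as a dumb serial loop. A sketch with a hypothetical `./trigger.sh` that exits non-zero whenever the bug fires:

```shell
# Count failures over N serial runs to measure the reproduction rate.
# Once any signal appears, parallelise (xargs -P, background jobs) and
# add stress to push the rate higher.
n=100
fails=0
for i in $(seq "$n"); do
  ./trigger.sh >/dev/null 2>&1 || fails=$((fails + 1))
done
echo "reproduction rate: $fails/$n"
```

The printed rate is also the progress metric: each change to the harness either raises it or gets reverted.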

### ranked hypotheses (Phase 3)

Generate 3–5 falsifiable hypotheses before testing any of them — single-hypothesis generation anchors on the first plausible idea. Each must state a prediction: 'If X is the cause, then changing Y makes the bug disappear / changing Z makes it worse.' Show the ranked list to the user before testing; they often re-rank instantly.

### tagged debug logs

Every probe gets a unique prefix like `[DEBUG-a4f2]`. Cleanup at the end becomes a single grep. Untagged logs survive into prod; tagged logs die. One breakpoint beats ten logs — prefer debugger/REPL inspection when the env supports it, then targeted logs at hypothesis-distinguishing boundaries. Never 'log everything and grep'.
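The cleanup-by-grep promise only holds if the tag is searched as a literal string. A sketch (the tag value and the `src/` path are illustrative):

```shell
# The tag contains regex metacharacters ([ ]), so grep with
# --fixed-strings (-F) to match it literally.
DEBUG_TAG='[DEBUG-a4f2]'
# List every surviving probe with file and line number:
grep -rn -F "$DEBUG_TAG" src/
# Gate (pre-commit hook / CI): fail if any probe survived.
if grep -rq -F "$DEBUG_TAG" src/; then
  echo "error: debug instrumentation still present" >&2
  exit 1
fi
```

The same one-liner doubles as the Phase 6 checklist item for removing instrumentation.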

### correct seam for regression test

A seam where the test exercises the real bug pattern as it occurs at the call site. A single-caller test for a bug that needs multiple callers, or a unit test that can't replicate the triggering chain, gives false confidence. If no correct seam exists, that itself is the finding — flag it and hand off to architecture work after the fix.

### perf branch

For performance regressions, logs are usually wrong. Establish a baseline measurement (timing harness, performance.now(), profiler, query plan), then bisect. Measure first, fix second.
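A baseline harness can be a few lines of shell. A sketch assuming a hypothetical `./cmd-under-test` and GNU `date` (for `%N` nanoseconds):

```shell
# Median-of-N wall-clock timing. Record this number before touching code,
# then re-measure after every candidate fix: diff the medians, not log lines.
runs=9
./cmd-under-test >/dev/null 2>&1    # one warm-up run for caches/JIT
for i in $(seq "$runs"); do
  start=$(date +%s%N)               # nanoseconds; GNU date only
  ./cmd-under-test >/dev/null 2>&1
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
done | sort -n | awk -v n="$runs" 'NR == int((n + 1) / 2) { print "median:", $1, "ms" }'
```

Medians resist outlier runs; if variance stays high, raise `runs` or isolate the machine before trusting any comparison.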

## API reference

```
Phase 5 — write the regression test before the fix
```

Run this phase only when a correct seam exists; if none does, document the absence instead (the Phase 6 checklist checks for this).

```
1. Turn the minimised repro into a failing test at the correct seam
2. Watch it fail
3. Apply the fix
4. Watch it pass
5. Re-run the Phase 1 feedback loop against the original (un-minimised) scenario
```

```
Phase 6 — done-checklist before declaring done
```

Cleanup gate. All five boxes plus the post-mortem question must be answered before claiming the bug is fixed.

```
- [ ] Original repro no longer reproduces (re-run Phase 1 loop)
- [ ] Regression test passes (or absence of seam is documented)
- [ ] All [DEBUG-...] instrumentation removed (grep the prefix)
- [ ] Throwaway prototypes deleted or clearly marked
- [ ] Correct hypothesis stated in the commit / PR message

Then: what would have prevented this bug?
```

## Gotchas

- Do not proceed past Phase 1 without a feedback loop you believe in — staring at code without a signal wastes hours.
- Confirm the loop reproduces the SAME failure mode the user described, not a different failure that happens to be nearby. Wrong bug = wrong fix.
- Don't anchor on the first plausible hypothesis. Generate 3–5 ranked, falsifiable ones before testing any.
- Tag every debug log with a unique prefix like [DEBUG-a4f2] so cleanup is a single grep — untagged logs end up in production.
- For perf regressions, logs lie. Measure first with a baseline harness, then bisect.
- Change one variable at a time when instrumenting; multi-variable probes destroy signal.
- Writing a regression test at the wrong seam gives false confidence — if no correct seam exists, that's the finding, not 'good enough'.
- If you genuinely can't build a loop, stop and say so explicitly. Ask for env access, a captured artifact (HAR / log dump / core dump), or permission to add temporary prod instrumentation.

---
Generated by SkillMake from https://github.com/mattpocock/skills/blob/main/skills/engineering/diagnose/SKILL.md on 2026-05-12T18:04:58.022Z.
Verify against source before relying on details.

File: ~/.claude/skills/mp-diagnose/SKILL.md