engineerstoolCommunitysha:085e279a848cb08dmanual

anthropic-skill-creator

Use when authoring, evaluating, or packaging a new Claude agent skill — the official Anthropic skill-creator walks you from intent capture to a graded eval and a shippable bundle.

source: https://github.com/anthropics/skills/tree/main/skills/skill-creator ↗anthropics/skills· ★ 136k

Install confidence

curl --create-dirs -fsSL https://skillmake.xyz/i/anthropic-skill-creator -o ~/.claude/skills/anthropic-skill-creator/SKILL.md

Pinned content

sha:085e279a848cb08d

Generated with

manual

Source

github.com

The file served at /api/marketplace/anthropic-skill-creator-085e279a/raw matches this hash. Inspect before install, then copy the command.

4,142 chars · ~1,036 tokens

---
name: anthropic-skill-creator
description: Use when authoring, evaluating, or packaging a new Claude agent skill — the official Anthropic skill-creator walks you from intent capture to a graded eval and a shippable bundle.
source: https://github.com/anthropics/skills/tree/main/skills/skill-creator
generated: 2026-05-17T04:18:09.113Z
category: tool
audience: engineers
---

## When to use

- Drafting a new SKILL.md and unsure what frontmatter and structure Anthropic expects
- Tightening a skill's description so it triggers reliably without overfiring
- Running a side-by-side eval comparing with-skill vs baseline runs of Claude
- Aggregating benchmark scores across an iteration directory of eval runs
- Packaging a skill folder into a distributable artifact
- Iterating on a skill by spawning new iteration-N directories and re-running evals

## Key concepts

### SKILL.md frontmatter

Every skill starts with YAML frontmatter containing at minimum `name` and `description`. The reference skill uses: name: skill-creator, description: 'Create new skills, modify and improve existing skills, and measure skill performance.'

### Pushy descriptions

The skill-creator advises writing the description 'a little bit pushy' to combat undertriggering. A passive description leaves the model uncertain about when to load the skill; an assertive one biases toward correct activation.

### Five-stage workflow

Authoring follows Capture Intent → Interview & Research → Write SKILL.md → Test & Evaluate → Iterate. Each stage gates the next: you do not write SKILL.md until edge cases, formats, and success criteria are explicit.

### Parallel eval runs

When evaluating, you 'Spawn all runs (with-skill AND baseline) in the same turn' so they execute in parallel subagents. Outputs are graded with assertions, aggregated into benchmarks, and viewed in an HTML viewer.

### Iteration directories

Each round of evals writes into a new iteration-N folder inside the workspace. This preserves history so you can compare iteration-1 vs iteration-2 scores instead of overwriting prior runs.

### Lean prompts with explained why

Rather than rigid MUSTs, the skill instructs authors to explain the reasoning behind each rule so the model can generalize. 'Keep the prompt lean' by deleting instructions that don't measurably improve outputs.

## API reference

```
python -m scripts.run_loop --eval-set <path> --skill-path <path>
```

Run the eval loop against a skill, executing all configured test cases with and without the skill loaded.

```
python -m scripts.run_loop --eval-set ./evals --skill-path ./skills/my-skill
```

```
python -m scripts.aggregate_benchmark <workspace>/iteration-N
```

Aggregate grades from a single iteration directory into a benchmark summary.

```
python -m scripts.aggregate_benchmark workspace/iteration-2
```

```
python <skill-creator-path>/eval-viewer/generate_review.py
```

Generate the HTML eval-viewer page that lets you inspect with-skill vs baseline runs side by side.

```
python skills/skill-creator/eval-viewer/generate_review.py
```

```
python -m scripts.package_skill <path/to/skill-folder>
```

Package a finished skill folder into a distributable artifact ready to share or install.

```
python -m scripts.package_skill ./skills/my-skill
```

## Gotchas

- A weak, passive description is the #1 reason skills undertrigger — write it pushier than feels natural
- Keep SKILL.md under ~500 lines; longer prompts hurt rather than help triggering and adherence
- Baselines matter: without a no-skill control, you cannot tell whether the skill is actually moving the needle
- On Claude.ai (no subagents) you must run test cases sequentially yourself, skip baselines, and skip the HTML viewer
- Don't overwrite iteration folders — each eval run should land in a new iteration-N to preserve regression history
- Explain the why behind rules instead of stacking MUSTs; rigid imperatives without rationale degrade model behavior

---
Generated by SkillMake from https://github.com/anthropics/skills/tree/main/skills/skill-creator on 2026-05-17T04:18:09.113Z.
Verify against source before relying on details.

File: ~/.claude/skills/anthropic-skill-creator/SKILL.md