aaddrick.com

Hobby maximalist and serial task jumper.

This is a walkthrough of the patterns I've built in .claude/ over the past several months. It's not a spec. It's what actually worked.

https://github.com/aaddrick/claude-pipeline

You can replicate the structure for any stack. The specifics here are Laravel/PHP, but the patterns are universal. My current project is a Laravel 11 app with PHP 8.2+, PostgreSQL, and AWS integrations. The domain has security and compliance constraints that shaped a lot of the agent design.

Three Ways to Build Skills

I've found there are really three paths to building useful skills, and most of the good ones combine two or three:

Adapt from community. Stuff like TDD, debugging workflows, and git patterns are pretty universal. I grabbed several from obra/superpowers, dropped them into .claude/skills/, and tweaked the test runner commands and linting tools for my project. Took maybe 2-3 refinement rounds before they felt right. The systematic-debugging, test-driven-development, dispatching-parallel-agents, and using-git-worktrees skills all started this way.

Extract from books or docs. I had a copy of "Handcrafted CSS: More Bulletproof Web Design" that I converted from PDF to text. Fed it to Claude and asked for a skill extraction. The first draft was generic CSS patterns. Took 5+ rounds to get it project-specific: design tokens, no-Tailwind rule, Blade template conventions, coordination with the backend agent. That's how the bulletproof-frontend/ and ui-design-fundamentals/ skills were born. The UI design skill alone has 13 supporting files — buttons, cards, colors, forms, typography, navigation, shadows, and more.

Capture from experience. The GitHub workflow skills came from doing the work. handle-issues/, implement-issue/, process-pr/ — these all started as "how do we actually do this" and got written down after enough repetitions. Anti-patterns came from real mistakes, not theory.

Pick the approach that fits the skill type. Methodologies adapt well from community. Domain expertise extracts well from books. Team workflows you just have to capture by doing the work.

The Improvement Loop

This is probably the most important pattern in the whole project.

When a skill or agent produces bad output, don't immediately edit it. Work through to the correct solution first. Then update the skill with what you learned.

The instinct is to jump into the skill file and start tweaking. I've done it. It doesn't work well because you end up encoding partial understanding or wrong assumptions.

Here's what works:

  1. Agent produces wrong output
  2. Debug it, understand why it's wrong
  3. Iterate until you reach the correct solution
  4. Now you actually understand the problem
  5. Update the skill with specific guidance, an anti-pattern entry, and an example

A real example: my backend agent kept using Model::all(). Works fine in dev with 100 rows. Falls over in production with a million. I had to work through pagination vs chunking vs cursor-based approaches before I understood the right pattern for different use cases. Then I added a "NEVER use Model::all() on large tables" anti-pattern with the correct alternatives.

That issue showed up three separate times across different services before the anti-pattern caught it consistently. Track your failures. They're the best source material.
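Step 5 in miniature: append what you learned to the skill file. The skill path and wording here are hypothetical (pick whatever layout you use); the alternatives listed are standard Eloquent query methods.

```shell
#!/usr/bin/env bash
# Illustrative only: recording an anti-pattern in a skill file.
# The path is hypothetical; adapt to your own skill layout.
set -euo pipefail

SKILL="${SKILL:-.claude/skills/laravel-patterns/anti-patterns.md}"
mkdir -p "$(dirname "$SKILL")"

cat >> "$SKILL" <<'EOF'

## NEVER use Model::all() on large tables

Loads every row into memory. Fine at 100 rows in dev, fatal at a
million in production.

Use instead:
- Model::query()->paginate(50) for user-facing lists
- Model::query()->chunkById(500, ...) for batch jobs
- Model::query()->cursor() for single-pass streaming
EOF

echo "anti-pattern recorded in $SKILL"
```

The point is that the entry only gets written after you've worked through the fix yourself, so it encodes the correct alternatives rather than a guess.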

What's in the Directory

.claude/
├── agents/              # 10 specialized subagent personas
├── hooks/               # 2 lifecycle hooks + settings.json config
├── prompts/
│   └── frontend/        # 3 Blade refactoring prompt templates
├── scripts/
│   ├── implement-issue-orchestrator.sh   # The big one (1600+ lines)
│   ├── batch-orchestrator.sh             # Parallel issue processing
│   ├── batch-runner.sh                   # Simple parallel wrapper
│   ├── schemas/         # 13 JSON schemas for stage validation
│   └── implement-issue-test/             # 9 BATS test files + fixtures
├── skills/              # 19 skill directories
│   └── [name]/
│       ├── SKILL.md
│       └── *.md         # Supporting docs
└── settings.json        # Hook configuration (PreToolUse, PostToolUse, Notification)

Nothing exotic. The key is that each piece has a clear job and they compose together.

The 10 Agents

This is where the system gets specific. Each agent has a clear persona, explicit scope boundaries, deferral rules, and anti-patterns from real mistakes.

Agent                            Model    Role
laravel-backend-developer        inherit  PHP/Laravel backend specialist with full project context
bulletproof-frontend-developer   inherit  Semantic CSS, Blade templates, "Handcrafted CSS" principles
code-reviewer                    inherit  Plan alignment + code quality review
spec-reviewer                    inherit  Goal achievement validation (not code quality)
fsa-code-simplifier              opus     Simplifies PHP code for clarity while preserving behavior
php-test-validator               opus     Two-phase test execution + quality auditing
bats-test-validator              opus     Same concept but for BATS/bash tests
bash-script-craftsman            opus     Shell scripting with style.ysap.sh conventions
cc-orchestration-writer          opus     Creates Claude CLI orchestration scripts
phpdoc-writer                    sonnet   PHPDoc blocks optimized for onboarding

The model assignments matter. Opus handles the complex reasoning tasks: test validation, code simplification, orchestration writing. Sonnet handles documentation where speed matters more than depth. Most agents inherit the session model.
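An agent definition itself is just a markdown file with YAML frontmatter. A minimal sketch of the shape, using the frontmatter fields from Claude Code's subagent format; the body text is a condensed illustration, not the actual phpdoc-writer prompt:

```shell
#!/usr/bin/env bash
# Minimal shape of an agent definition file. Frontmatter fields follow
# Claude Code's subagent format; the body text is illustrative.
set -euo pipefail

mkdir -p .claude/agents

cat > .claude/agents/phpdoc-writer.md <<'EOF'
---
name: phpdoc-writer
description: Writes PHPDoc blocks for PHP classes and methods. Use after implementation is complete, before the PR stage.
model: sonnet
---

You write PHPDoc blocks optimized for onboarding new developers.

Not in scope: changing behavior, refactoring, styling.
Defer implementation changes to laravel-backend-developer.
EOF

echo "agent file created"
```

Note the explicit "Not in scope" plus a named deferral target; that's what makes the coordination described below predictable.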

How agents coordinate

Agents explicitly defer to each other. The backend agent doesn't touch CSS — that goes to the frontend agent. The frontend agent doesn't write PHP — that goes to the backend agent. The code reviewer flags Tailwind as a Critical/blocking issue and delegates refactoring to the frontend agent.

The backend agent has a blocking coordination protocol with a Knowledge Manager agent. After creating or modifying services, controllers, middleware, or models, it has to request a knowledge base update and wait for confirmation before proceeding. This keeps architectural documentation in sync.

The spec reviewer is intentionally narrow. It only cares about one question: "Did we achieve what we set out to do?" It doesn't review code quality, performance, or style. That's the code reviewer's job. Separating these concerns catches different kinds of problems.

What makes agents useful

Generic advice like "write clean code" does nothing. Specific anti-patterns change behavior.

The Laravel backend agent has been refined 15+ times. Each round came from a real issue: N+1 queries, env() outside config files, missing RefreshDatabase in tests, Model::all() on large tables. The agent also carries full project context: the directory structure, auth flow, database patterns, deployment scripts, and domain-specific constraints. That project context is what makes it actually useful instead of just another "senior Laravel developer" prompt.

The no-Tailwind rule is enforced project-wide. It's not just a preference — it's a hard policy. The code reviewer rejects any Tailwind utility classes as Critical/must-fix. The frontend agent refactors them to semantic CSS. This came from a team decision that utility classes violate separation of concerns.

The 19 Skills

Skills tell Claude when and how to apply a process. Here's what I've got:

Adapted from community (obra/superpowers):

  • systematic-debugging/ — Root cause analysis before fixes (9 supporting files including pressure tests)
  • test-driven-development/ — TDD workflow with anti-patterns file
  • dispatching-parallel-agents/ — When and how to parallelize
  • using-git-worktrees/ — Branch isolation for features
  • executing-plans/ — Running implementation plans with review checkpoints
  • subagent-driven-development/ — Task execution with spec + quality reviewers (3 supporting files)
  • writing-plans/ — Creating multi-step implementation plans
  • writing-skills/ — Meta-skill for creating new skills (5 supporting files including Anthropic best practices)
  • using-skills/ — The gateway skill that enforces skill usage
  • brainstorming/ — Structured ideation before building

Extracted from books:

  • bulletproof-frontend/ — From "Handcrafted CSS" (4 supporting files: CSS architecture, accessibility, responsive patterns, reference)
  • ui-design-fundamentals/ — Component design patterns (13 supporting files covering buttons, cards, colors, forms, grids, hero sections, modals, navigation, pricing, search, shadows, style guides, typography)

Captured from experience:

  • handle-issues/ — Batch GitHub issue processing
  • implement-issue/ — End-to-end implementation orchestration
  • process-pr/ — PR review workflow
  • review-ui/ — Parallel UI review with criteria files
  • write-docblocks/ — PHPDoc generation workflow
  • investigating-codebase-for-user-stories/ — Reverse-engineering requirements from code
  • writing-agents/ — Creating agent definitions (includes templates)

The YAML frontmatter description should focus on WHEN to use the skill, not WHAT it does. That's how Claude matches skills to situations.
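A WHEN-focused description names the trigger situations rather than the feature list. A sketch (the body is a condensed illustration, not the full skill):

```shell
#!/usr/bin/env bash
# Sketch of WHEN-focused skill frontmatter. The description states the
# situations that should trigger the skill, not what the skill contains.
set -euo pipefail

mkdir -p .claude/skills/systematic-debugging

cat > .claude/skills/systematic-debugging/SKILL.md <<'EOF'
---
name: systematic-debugging
description: Use when a test fails, a bug report arrives, or output is wrong and the root cause is not yet known.
---

Reproduce the failure, form a hypothesis, and verify it with evidence
before editing any code.
EOF

echo "skill file created"
```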

The using-skills skill deserves special mention. It's enforced aggressively — wrapped in <EXTREMELY_IMPORTANT> tags and injected into every conversation via the session-start hook. It includes a rationalization table that catches excuses like "this is just a simple question" or "I'll just do this one thing first." Sounds heavy-handed, but without it Claude would skip skills constantly.

Keep the main SKILL.md concise. Put detailed reference material in supporting files. The UI design skill has 13 separate docs. The systematic debugging skill has pressure tests to verify the skill holds up under urgency.

Hooks and Settings

The settings.json configures five hooks across three lifecycle events:

PostToolUse:

  • After any Edit or Write: Runs Laravel Pint on PHP files. Auto-formatting so I don't think about style.
  • After any Bash command: Runs post-pr-simplify.sh, which watches for gh pr create commands. When it detects a new PR, it triggers the fsa-code-simplifier agent to review the changed PHP files.

PreToolUse:

  • Before any Edit or Write: Blocks editing .env, .git/, credentials, or package-lock.json. Catches accidental sensitive file modifications.
  • Before any Bash command: Blocks any command containing deploy_to_production. Safety net for production deployments.
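A guard like this is a small script. A sketch of the shape, wrapped in a function so it can be demoed without real stdin; the JSON field names and the exit-code-2 blocking convention follow my reading of the hooks contract, so verify against the current docs before relying on them:

```shell
#!/usr/bin/env bash
# Sketch of a PreToolUse guard. Claude Code pipes the tool call to the
# hook as JSON on stdin; by my understanding of the hook contract, exit
# code 2 blocks the call and stderr is fed back to the model.
set -euo pipefail

guard() {
  # $1: the JSON payload Claude Code would normally pipe on stdin
  local path
  path=$(printf '%s' "$1" | python3 -c \
    'import json,sys; print(json.load(sys.stdin).get("tool_input", {}).get("file_path", ""))')
  case "$path" in
    *.env|*.env.*|*/.git/*|*credentials*|*package-lock.json)
      echo "Blocked: $path is a protected file" >&2
      return 2 ;;
  esac
  return 0
}

# A write to .env is rejected; an ordinary source file passes
guard '{"tool_name":"Edit","tool_input":{"file_path":"app/.env"}}' 2>/dev/null && echo allow || echo block
guard '{"tool_name":"Edit","tool_input":{"file_path":"app/Models/User.php"}}' && echo allow || echo block
```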

Notification:

  • On permission prompts or idle: Fires notify-send for a desktop notification. Useful when a long-running task needs my attention.

The session-start hook (hooks/session-start.sh) injects the using-skills skill content into every conversation context. It reads the skill file, escapes it for JSON, and outputs it as additionalContext so Claude has skill awareness from the first message.
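A minimal sketch of that hook. The hookSpecificOutput/additionalContext field names follow my reading of the SessionStart hook schema, so check the current docs; a demo skill file is created here so the sketch runs standalone:

```shell
#!/usr/bin/env bash
# Sketch of a session-start hook: read the skill file, JSON-escape it,
# and emit it as additionalContext. A demo file stands in for the real
# .claude/skills/using-skills/SKILL.md so this is self-contained.
set -euo pipefail

demo_dir=$(mktemp -d)
skill="$demo_dir/SKILL.md"
printf '%s\n' '<EXTREMELY_IMPORTANT>Always check for a matching skill first.</EXTREMELY_IMPORTANT>' > "$skill"

# python3 handles the JSON escaping (quotes, newlines, unicode) reliably
out=$(python3 - "$skill" <<'PY'
import json, sys
content = open(sys.argv[1]).read()
print(json.dumps({
    "hookSpecificOutput": {
        "hookEventName": "SessionStart",
        "additionalContext": content,
    }
}))
PY
)
printf '%s\n' "$out"
```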

The Implement-Issue Orchestrator

This is the most complex piece: a bash state machine, 1600+ lines with resume capability, that takes a GitHub issue from assignment to merged PR.

The pipeline has these stages:

Stage        Agent                Purpose
setup        default              Create worktree, fetch issue, explore codebase
research     default              Understand context
evaluate     default              Assess approach options
plan         default              Create implementation plan
implement    per-task             Execute each task from the plan
task-review  spec-reviewer        Verify task achieved its goal
fix          per-task             Address review findings
simplify     fsa-code-simplifier  Clean up code
test         php-test-validator   Run tests + quality audit
review       code-reviewer        Internal code review
docs         phpdoc-writer        Add PHPDoc blocks
pr           default              Create or update PR
spec-review  spec-reviewer        Verify PR achieves issue goals
code-review  code-reviewer        Final quality check
complete     default              Post summary

Each stage validates output against a JSON schema in scripts/schemas/. There are 13 schemas total covering every stage output format. Fail fast on bad data instead of letting garbage flow downstream.
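The fail-fast idea in miniature. The required keys below are hypothetical; the real schemas in scripts/schemas/ are much fuller, but the shape of the check is the same:

```shell
#!/usr/bin/env bash
# Minimal fail-fast sketch: reject a stage's JSON output if required
# keys are missing, before anything downstream consumes it.
set -euo pipefail

validate_stage_output() {
  # $1: JSON produced by a stage; remaining args: keys that must exist
  local payload=$1; shift
  printf '%s' "$payload" | python3 -c '
import json, sys
data = json.load(sys.stdin)
missing = [k for k in sys.argv[1:] if k not in data]
if missing:
    sys.exit("stage output missing keys: " + ", ".join(missing))
' "$@"
}

# Good output passes; bad output stops the pipeline immediately
validate_stage_output '{"status":"ok","tasks":[]}' status tasks && echo valid
validate_stage_output '{"status":"ok"}' status tasks || echo "invalid: halting pipeline"
```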

Resume capability

Long-running AI workflows get interrupted. Rate limits, service outages, network drops, or you just need to stop. The orchestrator saves state to status.json after every operation. Run --resume and it picks up where it left off. You can also --resume-from <log-dir> to restart from a specific log directory.

I've had rate limits hit during task 5 of 8 and resume saved me 20+ minutes of redone work. Without it you'd lose 30-60 minutes, waste API quota, and regenerate the same code.
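The pattern in miniature. Field names and the stage list here are illustrative, not the orchestrator's actual status format:

```shell
#!/usr/bin/env bash
# Sketch of the resume pattern: persist stage completion to status.json
# after every stage and skip finished stages on the next run.
set -euo pipefail

STATUS_FILE="${STATUS_FILE:-$(mktemp -d)/status.json}"
STAGES=(setup research plan implement test review pr)

stage_done() {
  python3 - "$STATUS_FILE" "$1" <<'PY'
import json, os, sys
path, stage = sys.argv[1], sys.argv[2]
state = json.load(open(path)) if os.path.exists(path) else {}
sys.exit(0 if state.get(stage) == "done" else 1)
PY
}

mark_done() {
  python3 - "$STATUS_FILE" "$1" <<'PY'
import json, os, sys
path, stage = sys.argv[1], sys.argv[2]
state = json.load(open(path)) if os.path.exists(path) else {}
state[stage] = "done"
json.dump(state, open(path, "w"))
PY
}

run_stage() { echo "running $1"; }   # stand-in for the real stage logic

# First pass runs everything; rerunning the loop after an interruption
# skips every stage already marked done.
for stage in "${STAGES[@]}"; do
  if stage_done "$stage"; then
    echo "skipping $stage (already done)"
    continue
  fi
  run_stage "$stage"
  mark_done "$stage"
done
```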

Quality loops with iteration limits

Prevents runaway processes:

  • Task review: max 3 attempts
  • Quality iterations: max 5
  • Test iterations: max 10
  • PR review iterations: max 3
  • Stage timeout: 1 hour per stage
  • Rate limit default wait: 1 hour
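A bounded loop looks like this. The review function is a stand-in for the real spec-reviewer invocation; the cap mirrors the "max 3 attempts" limit above:

```shell
#!/usr/bin/env bash
# Sketch of a bounded quality loop: retry until the check passes or the
# attempt cap is hit, then fail loudly instead of spinning forever.
set -euo pipefail

MAX_TASK_REVIEW_ATTEMPTS=3
attempt=0
passed=false

review_task() {
  # Stand-in for the spec-reviewer agent; here it passes on attempt 2
  [ "$1" -ge 2 ]
}

while [ "$attempt" -lt "$MAX_TASK_REVIEW_ATTEMPTS" ]; do
  attempt=$((attempt + 1))
  if review_task "$attempt"; then
    passed=true
    break
  fi
  echo "task review failed (attempt $attempt), retrying"
done

if [ "$passed" = true ]; then
  echo "task review passed on attempt $attempt"
else
  echo "task review exceeded $MAX_TASK_REVIEW_ATTEMPTS attempts; aborting stage" >&2
  exit 1
fi
```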

The test harness

The orchestrator itself has a test suite in scripts/implement-issue-test/. Nine BATS test files covering argument parsing, JSON parsing, status functions, stage runner, quality loops, rate limits, comment helpers, constants, and integration. Plus 11 fixture files for different scenarios (success, failure, rate limit, review approved, changes requested).

I test the orchestrator like I test application code. It's infrastructure that needs to be reliable.

Real-world performance

Simple features (2-3 tasks) take 10-15 minutes. Medium features (5-7 tasks) take 25-35 minutes. Complex features (10+ tasks) take 45-60 minutes.

Test Validation

This one took a while to get right. The problem is that "tests pass" doesn't mean "tests are good."

I've seen tests with assertTrue(true) pass CI. Tests that only check assertOk() on API responses without verifying actual data. Tests that mock the system under test so they literally test nothing. All green. All worthless.

The php-test-validator agent (runs on opus for deep reasoning) operates in two phases. First, execute the tests — it must run the test suite before doing anything else. Then audit quality: scan for TODOs, hollow assertions, missing edge cases, mock abuse, and missing negative tests.

The bats-test-validator agent does the same thing for bash/BATS tests. Same two-phase approach, same philosophy.

Seven things I flag as automatic failures:

  1. TODO/FIXME markers in test files
  2. Hollow assertions (no real verification)
  3. Missing edge cases (only happy path tested)
  4. Mock abuse (mocking the system under test)
  5. Missing negative tests (no error condition coverage)
  6. Empty data providers
  7. Timing-based tests (sleep instead of condition-based waiting)
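A grep-level approximation of a few of these checks. The real validator reasons about test semantics on opus; pattern matching like this only catches the crudest cases, but it shows the audit's shape:

```shell
#!/usr/bin/env bash
# Grep-level approximation of the quality audit: flag TODO markers,
# assertTrue(true), and sleep-based waits in a test file.
set -euo pipefail

scan_test_file() {
  # Returns the number of smells found (0 = clean); prints offenders
  local file=$1 problems=0
  grep -nE 'TODO|FIXME' "$file" && problems=$((problems + 1)) || true
  grep -nF 'assertTrue(true)' "$file" && problems=$((problems + 1)) || true
  grep -nF 'sleep(' "$file" && problems=$((problems + 1)) || true
  return "$problems"
}

# Demo fixture containing two of the smells
demo=$(mktemp)
cat > "$demo" <<'EOF'
public function test_user_creation(): void
{
    // TODO: assert on the created record
    $this->assertTrue(true);
}
EOF

scan_test_file "$demo" && echo "clean" || echo "flagged problems"
```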

Prompt Templates

Three Blade refactoring prompts in prompts/frontend/:

  • audit-blade.md — Systematic Blade template analysis
  • refactor-blade-basic.md — Quick refactoring for simple cases
  • refactor-blade-thorough.md — Deep refactoring with sibling page consistency checks, CSS grid fixes, and semantic HTML improvements

The thorough refactoring prompt includes a page structure consistency step: before refactoring, read a sibling page in the same directory and match wrapper classes, section patterns, and layout structure. Prevents drift.

Patterns Worth Knowing

A few patterns show up across the whole project:

Aggressive skill enforcement. The using-skills skill and session-start hook work together to make skill usage non-optional. Sounds extreme, but Claude will rationalize skipping skills without it. The rationalization table in the skill catches a dozen common excuses.

Explicit agent deferral. Every agent says what it does and what it doesn't do. The backend agent lists "Not in scope: CSS, Tailwind refactoring, styling..." and names the agent to defer to. Prevents overlap and makes coordination predictable.

Model selection by complexity. Opus for deep reasoning (test validation, code simplification, orchestration). Sonnet for high-volume lower-complexity work (documentation). Inherit for most agents so the session model carries through.

PreToolUse guards. Blocking dangerous operations before they happen. Editing .env, deploying to production — these get caught by PreToolUse hooks, not by hoping the agent remembers.

Supporting files for complex skills. Main SKILL.md stays concise. My UI design skill has 13 separate docs. Systematic debugging has 9, including pressure test scenarios that verify the skill holds up when someone says "just fix it quickly."

Two-stage review. Separate spec compliance from code quality. The spec reviewer asks "did we build the right thing?" The code reviewer asks "did we build it well?" Different concerns, different reviewers, different problems caught.

BATS tests for infrastructure. The orchestrator has its own test suite. If it's complex enough to break, it's complex enough to test.

Write a CLAUDE.md Companion Doc

One thing I'd strongly recommend: write a CLAUDE.md (or similar) file at the project root that serves as a quick-reference companion to your .claude/ configuration. Mine covers every agent, skill, hook, and script in a scannable format.

The .claude/ directory is the machine-readable configuration. The CLAUDE.md is the human-readable guide. It answers "what do I have and how do I use it?" without reading through 19 skill files and 10 agent definitions.

I organize mine with these sections:

  • Quick start for the flagship workflow (implement-issue invocation, monitoring commands)
  • Agent summaries — one paragraph each: purpose, when to use, what it defers to
  • Skill list with invocation syntax and brief descriptions
  • Hook descriptions — what triggers each hook and what it does
  • Script usage — command-line flags, resume modes, configuration constants
  • Directory tree — the full layout so you can see everything at a glance

Keep it updated as you add components. I've found that a stale CLAUDE.md is worse than none at all because it builds wrong assumptions. If you add an agent, add it to the doc in the same session.

The CLAUDE.md also helps when someone new joins the project. They can read one file and understand the whole AI-assisted workflow without digging through individual agent and skill definitions.

Getting Started

If you're building your own .claude/ setup:

  1. Create the directory structure: mkdir -p .claude/{agents,hooks,prompts,scripts,skills}
  2. Grab foundational skills from community sources like obra/superpowers
  3. Update test runner commands and linting tools for your stack
  4. Build agents for your domain specialists — start broad, add anti-patterns from real issues
  5. Set up using-skills and a session-start hook early. Skill discovery is the foundation.
  6. Add PreToolUse guards for dangerous operations (.env, production deployment)
  7. Add PostToolUse hooks when you notice repetitive manual steps (formatting, linting)
  8. Build orchestration scripts when you have a pipeline that needs state management
  9. Write a CLAUDE.md companion doc once you have a few components in place

Don't try to build everything at once. Start with skills and agents. The rest comes naturally when you need it.

What I've Learned

The first version of any skill is a starting point. The Laravel backend agent started as a generic PHP developer prompt. After 15+ rounds of real code review issues, it carries full project context — auth flows, database patterns, security constraints — plus specific anti-patterns that actually prevent mistakes.

More complex components need more refinement. Skills might take 3-5 rounds. Agents need 8-15. Orchestration scripts needed 20+. That's just how it goes.

Here's the rough breakdown:

Component             Approach         Rounds  What Improved
TDD, debugging        Community        2-4     Project test runners, pressure tests
UI design (13 files)  Book extraction  5+      Design tokens, no Tailwind, Blade
Frontend agent        Book extraction  5+      CSS architecture, agent coordination
GitHub workflows      Experience       10+     Real failures became anti-patterns
Laravel agent         Created          15+     N+1, env(), RefreshDatabase, auth, security
Test validators       Created          8+      Hollow assertions, mock abuse, two-phase
Orchestrator          From scratch     20+     Resume, rate limits, state, BATS tests

The improvement loop is the whole point. Initial drafts get you started. Working through failures and encoding what you learned — that's what makes the system valuable.

After about 6 months, the generic pieces became project-specific. Anti-patterns reflect actual mistakes. Agents coordinate smoothly. Workflows handle edge cases. It's infrastructure that gets better with use instead of documentation that rots.

The effort compounds. That's the key takeaway.