My Claude Project Implementation Patterns Guide
I was asked how I put together my .claude/ folder for a specific project and how I built my skills, agents, and so on. I worked with Claude to analyze my implementation, git history, and general background to produce the document below.
Claude Project Implementation Patterns Guide
Overview
This document explains the architectural patterns and implementation concepts used in Aaddrick's .claude/ project configuration. These patterns can be replicated for any technology stack or project type.
Key Focus: This guide shows multiple paths to building a .claude/ project configuration. Skills and agents can be:
- Adapted from community sources like obra/superpowers
- Created from books, documentation, or domain expertise
- Iteratively refined through multiple rounds of project-specific customization
Table of Contents
- Quick Reference: Three Paths to Skills
- Core Philosophy
- Directory Structure & Purpose
- Implementation Concepts
- Common Patterns Across Files
- Cross-Stack Replication Guide
- Key Success Factors
- Conclusion
Quick Reference: Three Paths to Skills
| Approach | Best For | Example Sources | This Project Examples |
|---|---|---|---|
| Adapt from Community | Universal methodologies | obra/superpowers, open-source projects | TDD, debugging, git workflows, parallel dispatch |
| Extract from Books/Docs | Domain expertise | Technical books, framework docs | UI design (from "Handcrafted CSS"), frontend patterns |
| Capture from Experience | Team workflows | Retrospectives, incidents, reviews | GitHub workflow, PR process, deployment pipeline |
All approaches benefit from iterative refinement: initial draft → project context → team patterns → anti-patterns → testing → continuous improvement.
The Most Valuable Patterns
- Multiple Skill Sources - Community adaptation, book extraction, experience capture
- Project-Specific Agents - Creating specialized subagents with domain expertise
- Workflow Automation - Hooks, orchestration scripts, and state management
- Iterative Refinement - Starting with drafts and evolving through real usage
This project demonstrates all three approaches: foundational skills adapted from community sources (TDD, debugging), domain-specific skills extracted from source material (UI design from "Handcrafted CSS"), and workflow skills captured from experience (frontend patterns, GitHub issue handling). All skills evolved through multiple refinement rounds.
Core Philosophy
The project follows three fundamental principles:
- Test-Driven Documentation - Skills and agents are validated through testing before deployment
- Autonomous Workflow Orchestration - Multi-stage pipelines with state management and error recovery
- Specialization Through Composition - Small, focused components that combine for complex behaviors
Directory Structure & Purpose
.claude/
├── agents/ # Specialized subagent personas
├── hooks/ # Lifecycle automation scripts
├── prompts/ # Reusable prompt templates
├── scripts/ # Orchestration and automation
│ ├── schemas/ # JSON schemas for validation
│ └── *-test/ # Test harnesses
├── skills/ # Reusable process documentation
│ └── [skill-name]/ # Each skill in its own directory
│ ├── SKILL.md # Main skill documentation
│ └── *.md # Supporting documentation
└── settings.json # Hook configuration and automation
Implementation Concepts
1. Skill Creation Approaches
Concept: Skills can be built through three main approaches, each with different strengths.
Approach A: Adapt from Community
Sources:
- obra/superpowers - Community-maintained skill library
- Claude.ai skill marketplace (when available)
- Other open-source .claude/ projects
Process:
- Browse community repositories for relevant patterns
- Copy to your `.claude/skills/` directory
- Modify descriptions to match your project triggers
- Adapt examples to your tech stack (Laravel vs Django, React vs Vue)
- Add project-specific conventions and anti-patterns
- Test with your actual codebase
Example (skills/test-driven-development/, skills/systematic-debugging/):
- Copied from obra/superpowers
- Updated test runner commands for project
- Added project-specific test patterns
- Minimal changes, mostly works as-is
Best for: Foundational methodologies (TDD, debugging, git workflows) that are universal across projects.
Approach B: Extract from Books/Documentation
Sources:
- Technical books (PDF → text conversion)
- Official framework documentation
- Architecture guides and papers
- Domain-specific references
Process:
- Convert source material to text (if needed)
- Feed to Claude with extraction prompt
- Review and structure initial draft
- Add project-specific context and examples
- Include team patterns and anti-patterns
- Iterate through multiple refinement rounds
Example (skills/ui-design-fundamentals/, skills/bulletproof-frontend/):
Source: "Handcrafted CSS: More Bulletproof Web Design" (book)
Process:
1. Converted PDF to text
2. Asked Claude: "Extract key concepts and guidance into a skill and agent"
3. Initial draft had generic CSS patterns
4. Added: Project's design system tokens
5. Added: "No Tailwind" anti-pattern from code reviews
6. Added: Blade template specifics for Laravel
7. Added: Coordination with laravel-backend-developer agent
Result: Skill adapted to project's semantic CSS architecture
Supporting Files Pattern:
skills/ui-design-fundamentals/
SKILL.md # Overview + quick reference
buttons.md # Extracted button patterns from book
forms.md # Form patterns from book
colors.md # Color theory + project tokens
typography.md # Type scale + project fonts
Best for: Domain-specific expertise (design, security, performance) where authoritative sources exist.
Approach C: Capture from Experience
Sources:
- Team retrospectives and lessons learned
- Code review feedback patterns
- Bug post-mortems
- Workflow pain points
Process:
- Identify recurring issues or decisions
- Document the pattern that solves them
- Write skill with clear triggering conditions
- Include red flags and anti-patterns from real mistakes
- Test with team members
- Refine based on actual usage
Example (skills/handle-issues/, skills/process-pr/, skills/implement-issue/):
- Created from team's GitHub workflow
- Captures multi-stage process evolved over time
- Includes specific GitHub CLI commands
- References project's actual agents and scripts
- Anti-patterns from actual workflow failures
Best for: Workflows and processes unique to your team that aren't documented elsewhere.
Common Patterns Across All Approaches
YAML Frontmatter:
---
name: skill-name
description: Use when [triggering conditions]
---
CSO (Claude Search Optimization):
- Descriptions focus on WHEN to use, not WHAT it does
- Include concrete triggers, symptoms, and situations
- Written in third person (injected into system prompt)
Iterative Refinement: All approaches benefit from multiple rounds:
- Initial draft (adapted/extracted/captured)
- Add project-specific context
- Test with real tasks
- Add anti-patterns from failures
- Refine based on usage
- Repeat
Replication Strategy:
Choose your approach based on the skill type:
- Universal methodologies → Adapt from community
- Domain expertise → Extract from authoritative sources
- Team workflows → Capture from experience
- Mix and match → Most skills combine multiple sources
2. Agents System
Concept: Specialized subagent personas with defined roles, scope, and coordination protocols.
Key Patterns:
- Clear Persona Definition - specific expertise and project context
- Explicit Scope Boundaries - what the agent does AND doesn't do
- Deferral Rules - when to hand off to other agents
- Anti-Patterns Section - domain-specific mistakes to avoid
- Project Context - structure, commands, and conventions
Example Application (agents/code-reviewer.md):
---
name: code-reviewer
description: Use when a major project step has been completed and needs review
model: inherit
---
You are a Senior Code Reviewer...
## CORE COMPETENCIES
- Plan alignment analysis
- Code quality assessment
**Not in scope** (defer to bulletproof-frontend-developer):
- CSS architecture refactoring
Why This Works:
- Agents maintain consistent behavior through clear personas
- Scope boundaries prevent overlap and enable specialization
- Anti-patterns capture domain expertise
Replication Strategy:
- Research domain best practices via web search
- Explore codebase to understand project patterns
- Define clear persona with project-specific context
- List specific anti-patterns (not generic advice)
- Establish coordination protocols with other agents
File-Specific Applications:
- laravel-backend-developer.md: Backend specialist with PHP/Laravel expertise, SQL optimization rules
- bulletproof-frontend-developer.md: Frontend specialist deferring backend work, CSS architecture focus
- bash-script-craftsman.md: Shell scripting specialist with POSIX compliance and security patterns
- spec-reviewer.md: Validates implementation against specifications (no code quality concerns)
3. Hook System
Concept: Lifecycle-triggered automation that runs at specific points in the development workflow.
Key Patterns:
- PreToolUse Hooks - validation before actions (prevent accidents)
- PostToolUse Hooks - cleanup after actions (formatting, simplification)
- Notification Hooks - user alerts for specific conditions
- SessionStart Hooks - context injection at conversation start
Example Application (hooks/session-start.sh):
#!/usr/bin/env bash
# Injects using-skills content into conversation context
using_skills_content=$(cat ".claude/skills/using-skills/SKILL.md")
cat <<EOF
{
"hookSpecificOutput": {
"hookEventName": "SessionStart",
"additionalContext": "...$using_skills_escaped..."
}
}
EOF
Why This Works:
- Skills automatically available without explicit invocation
- Prevents dangerous operations (editing .env, production deployments)
- Ensures consistency (auto-formatting, linting)
Configuration (settings.json):
{
"hooks": {
"PostToolUse": [
{
"matcher": "Edit|Write",
"hooks": [
{
"type": "command",
"command": "lint or format the file",
"timeout": 30
}
]
}
]
}
}
Replication Strategy:
- Identify repetitive tasks in your workflow
- Create hook scripts that output JSON
- Configure matchers in settings.json
- Use PreToolUse for validation, PostToolUse for cleanup (a minimal sketch follows this list)
- Keep timeouts short to avoid blocking
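To make the PreToolUse point concrete, here is a minimal sketch of a validation hook. It assumes the hook receives the tool input as JSON on stdin and that a non-zero exit signals the action should be blocked; the field name and protected paths are illustrative, not this project's actual hook.
#!/usr/bin/env bash
# Hypothetical PreToolUse hook: refuse edits to protected files
input=$(cat)  # hook payload arrives on stdin
file_path=$(jq -r '.tool_input.file_path // empty' <<< "$input")
case "$file_path" in
  *.env|*/.env.*|*production*.yml)
    echo "Blocked: $file_path is a protected file" >&2
    exit 2   # non-zero exit: do not proceed with the edit
    ;;
esac
exit 0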
File-Specific Applications:
- session-start.sh: Injects core skill into every conversation
- post-pr-simplify.sh: Triggered after bash commands, simplifies complex diffs
4. Orchestration Scripts
Concept: Multi-stage workflow automation with state management, error recovery, and progress tracking.
Key Patterns:
- State Machine Design - clear stages with status tracking
- JSON Status Files - real-time progress visibility
- Schema Validation - JSON schemas for each stage output
- Iteration Limits - prevent infinite loops (quality_iterations, test_iterations)
- Resume Capability - restart from failure point
- Rate Limit Handling - exponential backoff and retry logic
Example Application (scripts/implement-issue-orchestrator.sh):
#!/usr/bin/env bash
# Orchestrates multi-stage issue implementation
# State tracking
init_status() {
jq -n \
--arg state "initializing" \
--argjson issue "$ISSUE_NUMBER" \
'{
state: $state,
issue: $issue,
stages: {
setup: {status: "pending"},
research: {status: "pending"},
plan: {status: "pending"},
implement: {status: "pending"},
test_loop: {status: "pending", iteration: 0},
pr: {status: "pending"}
}
}' > "$STATUS_FILE"
}
# Stage execution with error handling
run_stage() {
local stage="$1"
update_stage "$stage" "in_progress"
if claude_cli_invoke "$stage" > "$stage_log" 2>&1; then
update_stage "$stage" "completed"
return 0
else
update_stage "$stage" "failed"
return 1
fi
}
Why This Works:
- State files enable inspection during long-running processes
- JSON schemas validate stage outputs (fail fast)
- Iteration limits prevent runaway processes
- Resume capability saves time and API costs
Replication Strategy:
- Define workflow stages as a state machine
- Create JSON schemas for each stage output
- Implement status file with stage tracking
- Add iteration limits for loops
- Enable resume from status file
- Handle rate limits with exponential backoff (a minimal sketch follows this list)
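A minimal sketch of the backoff idea, assuming a claude_cli_invoke helper like the one used elsewhere in the orchestrator (function names, retry counts, and delays are illustrative):
retry_with_backoff() {
  local stage="$1" attempt=1 max_attempts=5 delay=60
  while (( attempt <= max_attempts )); do
    if claude_cli_invoke "$stage"; then
      return 0
    fi
    log "Attempt $attempt for $stage failed; retrying in ${delay}s"
    sleep "$delay"
    delay=$(( delay * 2 ))   # 60s, 120s, 240s, ...
    (( attempt++ ))
  done
  return 1
}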
File-Specific Applications:
- implement-issue-orchestrator.sh: 11-stage workflow (setup → research → evaluate → plan → implement → quality_loop → test_loop → docs → pr → pr_review → complete)
- batch-orchestrator.sh: Parallel task execution with progress tracking
- batch-runner.sh: Simple parallel execution wrapper
5. Implement-Issue: End-to-End Workflow
Concept: A complete end-to-end orchestration system for taking a GitHub issue from assignment to merged PR, combining multiple skills, agents, and quality loops.
This is the most complex pattern in the project - a production-grade workflow orchestrator that demonstrates how all the components work together.
🔄 CRITICAL FEATURE: Resume Capability
The architecture is designed to handle interruptions gracefully. If the workflow is interrupted by:
- Rate limits (Claude API throttling)
- Service outages (Claude services temporarily unavailable)
- System crashes (computer loses power, process killed)
- Network failures (internet disconnection)
You can resume exactly where you left off:
./implement-issue-orchestrator.sh --resume
The orchestrator reads status.json, validates the worktree still exists, and continues from the last completed stage. This saves 20-60 minutes of redundant work and preserves all progress. State is synced to disk after every operation, so no work is lost.
Architecture Overview
Three Layers:
- Skill Layer (skills/implement-issue/SKILL.md) - User-facing interface
- Orchestrator Script (scripts/implement-issue-orchestrator.sh) - 1600+ line bash state machine
- Schema Layer (scripts/schemas/implement-issue-*.json) - Stage output validation
User invokes skill
↓
Skill launches orchestrator script
↓
Orchestrator runs 11-stage pipeline
↓
Each stage validated against schema
↓
State tracked in status.json
↓
GitHub comments provide visibility
The 11-Stage Pipeline
| Stage | Purpose | Agent | Output Schema |
|---|---|---|---|
| 1. Setup | Create worktree, fetch issue | default | implement-issue-setup.json |
| 2. Research | Explore codebase context | default | implement-issue-research.json |
| 3. Evaluate | Assess approach options | default | implement-issue-evaluate.json |
| 4. Plan | Create implementation plan | default | implement-issue-plan.json |
| 5. Implement | Execute each task | per-task | implement-issue-implement.json |
| 6. Task Review | Verify task met spec | spec-reviewer | implement-issue-task-review.json |
| 7. Simplify | Clean up code | fsa-code-simplifier | implement-issue-simplify.json |
| 8. Test Loop | Run tests → fix → repeat | php-test-validator | implement-issue-test.json |
| 9. Docs | Add documentation | phpdoc-writer | (inline) |
| 10. PR | Create/update PR | default | implement-issue-pr.json |
| 11. PR Review | Spec + quality review | reviewers | implement-issue-review.json |
Quality Loops (Prevent Infinite Iterations)
Per-Task Quality Loop (runs after each task during implement):
for each task:
1. Implement task (agent per task type)
2. Task review (spec-reviewer checks requirements met)
- If failed: Fix and re-review (max 3 attempts)
3. Simplify code (fsa-code-simplifier)
4. Code review (code-reviewer checks quality)
- If failed: Fix and re-review (max 5 iterations)
→ Move to next task
Test Loop (runs once after all tasks):
loop (max 10 iterations):
1. Run test suite (php-test-validator)
- If failed: Fix tests → continue loop
2. Validate test quality (php-test-validator scoped to issue)
- If failed: Improve tests → continue loop
3. If both passed: exit loop
PR Review Loop (runs at end):
loop (max 3 iterations):
1. Spec review (spec-reviewer: does PR meet issue goals?)
- If failed: Fix implementation → continue
2. Code review (code-reviewer: quality check)
- If failed: Fix quality issues → continue
3. If both approved: complete
State Management
status.json Structure:
{
"state": "running",
"issue": 123,
"branch": "feature/issue-123-...",
"worktree": "/path/to/worktree",
"current_stage": "implement",
"current_task": 2,
"stages": {
"setup": {
"status": "completed",
"started_at": "2025-01-15T10:00:00Z",
"completed_at": "2025-01-15T10:02:00Z"
},
"implement": {
"status": "in_progress",
"task_progress": "2/5"
},
"test_loop": {
"status": "pending",
"iteration": 0
}
},
"tasks": [
{
"id": 1,
"description": "Add user profile endpoint",
"agent": "laravel-backend-developer",
"status": "completed",
"review_attempts": 1
},
{
"id": 2,
"description": "Create profile view",
"agent": "bulletproof-frontend-developer",
"status": "in_progress",
"review_attempts": 0
}
],
"quality_iterations": 2,
"test_iterations": 1,
"pr_review_iterations": 0,
"log_dir": "logs/implement-issue/issue-123-20250115-100000"
}
Resume Capability:
# Original run fails at task 3
./implement-issue-orchestrator.sh --issue 123 --branch main
# [Interrupted: Rate limit hit, or service timeout, or Ctrl+C]
# Resume from where it left off
./implement-issue-orchestrator.sh --resume
# Reads status.json, validates worktree, continues from task 3
What Gets Preserved:
- ✓ Worktree and branch
- ✓ All completed stages
- ✓ Completed tasks (doesn't redo work)
- ✓ Iteration counts (quality, test, PR review)
- ✓ GitHub PR number (if already created)
- ✓ Log directory and context
Real-World Resume Scenarios:
- Rate Limit Hit (Most Common)
  - Task 5 of 8 implementation → Rate limit (429 error)
  - Status: Saved after task 4 completion
  - Resume: Continues from task 5
  - Time Saved: ~25 minutes (4 completed tasks not redone)
- Claude Service Outage
  - During test loop iteration 3 → Service unavailable (503)
  - Status: Saved after iteration 2 completion
  - Resume: Continues test loop from iteration 3
  - Time Saved: ~15 minutes (prior test fixes preserved)
- Computer Crash / Power Loss
  - During PR creation → Computer loses power
  - Status: Last sync after task review completion
  - Resume: Skips all completed tasks, proceeds to PR creation
  - Time Saved: ~40 minutes (all implementation preserved)
- Network Failure
  - During task 7 implementation → Internet disconnects
  - Status: Saved after task 6 completion
  - Resume: Validates worktree, continues from task 7
  - Time Saved: ~30 minutes
- Manual Interruption (Ctrl+C)
  - You need to stop and check something → Ctrl+C
  - Status: Last completed stage saved
  - Resume: Pick up exactly where stopped
  - Time Saved: Flexibility to pause/resume workflow
How Resume Works Internally:
# Load state from status.json
load_resume_state() {
ISSUE_NUMBER=$(jq -r '.issue' status.json)
BRANCH=$(jq -r '.branch' status.json)
WORKTREE=$(jq -r '.worktree' status.json)
CURRENT_STAGE=$(jq -r '.current_stage' status.json)
COMPLETED_STAGES=$(jq -r '.stages | to_entries |
map(select(.value.status == "completed")) |
map(.key)' status.json)
}
# Skip completed stages
for stage in "${stages[@]}"; do
if is_stage_completed "$stage"; then
log "Skipping $stage (already completed)"
continue
fi
run_stage "$stage"
done
State Sync Strategy (Why Nothing Is Lost):
# After EVERY operation, sync to disk
update_stage() {
  local stage="$1" status="$2"
  # Update status.json atomically (write temp file, then move into place)
  jq --arg stage "$stage" --arg status "$status" \
     '.stages[$stage].status = $status' status.json > tmp
  mv tmp status.json
# Immediately sync to log directory
cp status.json "$LOG_DIR/status.json"
}
# Even if process killed mid-operation, worst case:
# - Last completed stage is preserved
# - Current stage marked "in_progress" (safe to restart)
# - No data corruption (atomic file moves)
Resume Validation:
Before resuming, the orchestrator validates:
- ✓ status.json exists and is valid JSON
- ✓ Required fields present (issue, branch, worktree)
- ✓ Worktree still exists at path
- ✓ Worktree is a valid git worktree
- ✓ State is resumable (not already completed)
If validation fails, the orchestrator prints a clear error message with remediation steps.
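An illustrative sketch of those checks (not the actual script; the helper name and error wording are examples):
validate_resume_state() {
  jq -e . status.json > /dev/null 2>&1 \
    || { echo "status.json missing or not valid JSON" >&2; return 1; }
  local field
  for field in issue branch worktree; do
    [[ -n "$(jq -r --arg f "$field" '.[$f] // empty' status.json)" ]] \
      || { echo "status.json missing required field: $field" >&2; return 1; }
  done
  local worktree
  worktree=$(jq -r '.worktree' status.json)
  [[ -d "$worktree" ]] || { echo "Worktree not found: $worktree" >&2; return 1; }
  git -C "$worktree" rev-parse --is-inside-work-tree > /dev/null 2>&1 \
    || { echo "Not a valid git worktree: $worktree" >&2; return 1; }
  [[ "$(jq -r '.state' status.json)" != "completed" ]] \
    || { echo "Workflow already completed; nothing to resume" >&2; return 1; }
}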
Why This Architecture Matters:
Long-running AI workflows (30-60 minutes) face inevitable interruptions:
- API rate limits are unpredictable
- Service outages happen
- Local issues occur (power, network, crashes)
Without resume capability:
- ❌ Lose 30-60 minutes of work
- ❌ Waste API quota redoing completed work
- ❌ Regenerate same code multiple times
- ❌ Re-run tests that already passed
- ❌ Create duplicate GitHub comments
With resume capability:
- ✅ Continue exactly where stopped
- ✅ Preserve all completed work
- ✅ Save API quota
- ✅ Save time (20-60 minutes)
- ✅ Maintain clean GitHub comment history
- ✅ Handle interruptions gracefully
GitHub Integration
Automatic Comments (14 comment points throughout workflow):
- Starting automated processing
- Evaluation: Best path
- Implementation plan (with collapsible full plan)
- Task list (markdown checklist)
- Per-task: Implementation summary
- Per-task: Spec review results
- Per-task: Simplification summary
- Per-task: Code review results
- Test loop: Test results (each iteration)
- Test loop: Validation results
- Test loop: Fix summaries
- PR created/updated
- PR spec review
- PR code review
Comment Format:
### Stage: Description
✅ **Result:** success
Summary of what happened...
_— agent-name_
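A minimal sketch of how a comment in that format might be posted with the GitHub CLI (the helper name and variables are illustrative; the orchestrator's actual wording may differ):
post_stage_comment() {
  local stage="$1" result="$2" summary="$3" agent="$4"
  gh issue comment "$ISSUE_NUMBER" --body "$(printf '### %s\n✅ **Result:** %s\n\n%s\n\n_— %s_' \
    "$stage" "$result" "$summary" "$agent")"
}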
Error Handling
Rate Limits:
handle_rate_limit() {
  local wait_time="${1:-3600}" # Default 1 hour
  log "Rate limit hit. Waiting ${wait_time}s..."
  # Update status to show waiting (write temp file, then move into place)
  jq --arg state "rate_limited" \
     --argjson wait "$wait_time" \
     '.state = $state | .wait_until = ((now + $wait) | todate)' \
     "$STATUS_FILE" > "${STATUS_FILE}.tmp" && mv "${STATUS_FILE}.tmp" "$STATUS_FILE"
  sleep "$wait_time"
  # Resume
  jq '.state = "running"' "$STATUS_FILE" > "${STATUS_FILE}.tmp" && mv "${STATUS_FILE}.tmp" "$STATUS_FILE"
}
Max Iterations:
# Prevents infinite loops
readonly MAX_TASK_REVIEW_ATTEMPTS=3
readonly MAX_QUALITY_ITERATIONS=5
readonly MAX_TEST_ITERATIONS=10
readonly MAX_PR_REVIEW_ITERATIONS=3
if (( iteration > MAX_ITERATIONS )); then
log_error "Exceeded max iterations"
set_final_state "max_iterations_exceeded"
exit 2
fi
Schema Validation:
validate_output() {
local output="$1"
local schema="$2"
if ! jq -e . <<< "$output" > /dev/null 2>&1; then
log_error "Invalid JSON output"
return 1
fi
# Validate against schema using ajv or similar
if ! validate_json_schema "$output" "$SCHEMA_DIR/$schema"; then
log_error "Output doesn't match schema: $schema"
return 1
fi
}
Logging System
Log Directory Structure:
logs/implement-issue/issue-123-20250115-100000/
├── orchestrator.log # Main orchestrator log
├── stages/
│ ├── 01-setup.log # Stage outputs
│ ├── 02-research.log
│ ├── 03-evaluate.log
│ ├── 04-plan.log
│ ├── 05-implement-task-1.log
│ ├── 06-task-review-1.log
│ ├── 07-simplify-1.log
│ └── ...
├── context/
│ ├── setup-output.json # Parsed stage results
│ ├── research-output.json
│ ├── plan-output.json
│ ├── tasks.json # Task list
│ └── review-comments.json
└── status.json # Final status snapshot
Log Synchronization:
sync_status_to_log() {
if [[ -n "$LOG_BASE" ]]; then
cp "$STATUS_FILE" "$LOG_BASE/status.json"
fi
}
# Called after every status update
update_stage "setup" "completed"
sync_status_to_log # Ensures log directory always has latest state
Monitoring
Watch Progress:
# Simple JSON view
watch -n 5 'jq . status.json'
# Focused view
watch -n 5 'jq -c "{
state,
stage:.current_stage,
task:.current_task,
quality:.quality_iterations,
test:.test_iterations
}" status.json'
# Stage completion status
jq '.stages | to_entries | map({
stage: .key,
status: .value.status,
started: .value.started_at
})' status.json
Log Tailing:
# Follow orchestrator log
tail -f logs/implement-issue/issue-123-*/orchestrator.log
# Follow current stage
tail -f logs/implement-issue/issue-123-*/stages/$(ls -t logs/.../stages/ | head -1)
Integration with Other Components
Skills Used:
- using-git-worktrees - Worktree creation and management
- writing-plans - Implementation plan generation
- subagent-driven-development - Task execution pattern
- test-driven-development - Test-first enforcement
- requesting-code-review - Review prompt templates
Agents Invoked:
- laravel-backend-developer - Backend task implementation
- bulletproof-frontend-developer - Frontend task implementation
- spec-reviewer - Spec compliance verification
- code-reviewer - Code quality assessment
- fsa-code-simplifier - Code simplification (FSA = Feature Spec Adherence)
- php-test-validator - Test execution and validation
- phpdoc-writer - Documentation generation
Hooks Triggered:
- PostToolUse on file edits - Auto-formatting with Pint
- PostToolUse on bash - PR simplification check
Key Design Decisions
Why Bash for Orchestration?
- Native GitHub CLI integration
- Easy file system operations (worktrees, logs)
- jq for JSON manipulation
- Shell portability
- Direct command execution without subprocess overhead
Why Per-Task Quality Loops Instead of End-to-End?
- Catch issues early (cheaper to fix)
- Smaller context per review (more focused)
- Prevent cascading errors
- Better progress granularity
Why Separate Spec and Quality Reviews?
- Different concerns: "right thing" vs "right way"
- Spec review prevents over/under-building
- Quality review ensures maintainability
- Two reviews catch different issue types
Why JSON Schemas?
- Fail fast on malformed outputs
- Self-documenting stage contracts
- Enables reliable automation
- Validates before expensive operations
Replication for Your Stack
1. Define Your Pipeline Stages:
# Example for a different stack
stages=(
"setup" # Clone/setup workspace
"analysis" # Static analysis
"plan" # Implementation plan
"implement" # Code generation
"test" # Unit tests
"integration" # Integration tests
"security" # Security scan
"docs" # Documentation
"pr" # Pull request
)
2. Create Stage Schemas:
// schemas/your-workflow-implement.json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"required": ["status", "summary", "files_changed"],
"properties": {
"status": {"enum": ["success", "failed"]},
"summary": {"type": "string"},
"files_changed": {"type": "array", "items": {"type": "string"}}
}
}
3. Build State Machine:
main() {
init_status
for stage in "${stages[@]}"; do
if is_stage_completed "$stage"; then
log "Skipping $stage (already completed)"
continue
fi
run_stage "$stage" || handle_error "$stage"
done
finalize
}
4. Add Quality Loops:
run_quality_loop() {
local max_iterations=5
for (( i=1; i<=max_iterations; i++ )); do
result=$(run_quality_check)
if [[ "$result" == "passed" ]]; then
return 0
fi
apply_fixes "$result"
done
return 1
}
5. Implement Resume:
load_resume_state() {
CURRENT_STAGE=$(jq -r '.current_stage' status.json)
COMPLETED_STAGES=$(jq -r '.stages | to_entries |
map(select(.value.status == "completed")) |
map(.key) | .[]' status.json)
}
Real-World Performance
Typical Execution:
- Simple feature (2-3 tasks): 10-15 minutes
- Medium feature (5-7 tasks): 25-35 minutes
- Complex feature (10+ tasks): 45-60 minutes
Iteration Counts (from actual usage):
- Quality loop iterations: Average 1-2, max 5
- Test loop iterations: Average 1-3, max 10
- PR review iterations: Average 1, max 3
Resume Scenarios:
- Rate limit hit during task 5 of 8: Resume saves ~20 minutes
- Test failures after 7 tasks complete: Resume saves ~30 minutes
- API timeout during PR creation: Resume completes in ~2 minutes
Why This Pattern Matters
This orchestrator demonstrates:
- Production-Grade Automation - Not a toy, handles real complexity
- State Machine Design - Clear stages, resumable, monitorable
- Quality Gates - Multiple checkpoints prevent bad code
- Error Recovery - Graceful handling of failures
- Integration - All components (skills, agents, hooks, schemas) working together
- Observability - Real-time state, comprehensive logging, GitHub visibility
It's the culmination of all the other patterns in this guide, showing how they compose into a working system.
6. Test Validation Methodology
Concept: Automated test quality validation that goes beyond "tests pass" to ensure tests actually catch bugs.
Core Principle: Tests that don't catch bugs are worse than no tests—they provide false confidence.
This pattern demonstrates how to build AI agents that audit test quality, not just run tests. The methodology applies across languages (shown here with PHP/PHPUnit, but adaptable to pytest, Jest, Go testing, etc.).
The Problem with "Tests Pass"
# This passes, but catches nothing
public function test_user_creation(): void
{
$this->assertTrue(true); // TODO: implement
}
# This passes, but is hollow
public function test_api_endpoint(): void
{
$response = $this->get('/api/users');
$response->assertOk(); // What about the data?
}
# This passes, but mocks the system under test
public function test_service(): void
{
$mock = $this->createMock(UserService::class);
$mock->method('create')->willReturn(new User());
$result = $mock->create($data); // Tests nothing!
}
All three tests pass. None catch bugs. Traditional CI/CD only checks "did tests pass?" not "are tests meaningful?"
Two-Phase Validation
Phase 1: Execution (Does it work?)
- Run the full test suite
- Check for failures, errors, skipped tests
- Validate tests complete successfully
- Capture runtime metrics
Phase 2: Quality Audit (Does it catch bugs?)
- Scan for TODO/FIXME/incomplete markers
- Detect hollow assertions (assertTrue(true))
- Check for missing edge cases
- Identify mock abuse patterns
- Verify negative test cases exist
- Validate assertion meaningfulness
Test Validator Agent
Agent: php-test-validator (uses Opus model for deep reasoning)
Responsibilities:
- Run tests first (mandatory) - static analysis alone is insufficient
- Audit test quality - check for anti-patterns
- Check coverage - every public method has tests
- Validate edge cases - null, empty, negative, boundary conditions
- Detect cheating - mocks that bypass actual logic
- Report actionable findings - specific file:line issues
Output Format:
## Test Validation Report
**Verdict:** PASS | FAIL
### Test Suite Execution
Tests: 42 passed, 2 failed, 1 incomplete
### Critical Issues (Must Fix)
1. Incomplete test: tests/Unit/UserTest.php:45
- `$this->markTestIncomplete('TODO')`
- Fix: Implement the test
2. Hollow assertion: tests/Feature/ApiTest.php:67
- Only checks response code, not data
- Fix: Add assertions for returned user data
### Coverage Gaps
| Method | Test Coverage | Gap |
|--------|---------------|-----|
| `UserService::create()` | ✓ Tested | - |
| `UserService::delete()` | ✗ Missing | No test exists |
| `UserService::validate()` | △ Partial | No edge cases |
The Seven Deadly Test Sins
1. TODO/FIXME/Incomplete Tests (Automatic Failure)
// FAIL: Deferred testing
public function test_feature(): void
{
$this->markTestIncomplete('TODO: implement');
}
// FAIL: Placeholder
public function test_something(): void
{
$this->assertTrue(true); // Will do later
}
Detection: Scan for markTestIncomplete(), markTestSkipped(), TODO comments, assertTrue(true) patterns.
2. Hollow Assertions
// FAIL: No assertions
public function test_operation(): void
{
$service->doSomething(); // Passes if no exception
}
// FAIL: Tautological
public function test_calculation(): void
{
$result = $service->calculate(10, 20);
$this->assertNotNull($result); // But is it correct?
}
Detection: Tests with zero assertions, or only existence checks without value validation.
3. Missing Edge Cases
// Code handles edge cases
public function process(?int $value): int {
if ($value === null) return 0;
if ($value < 0) throw new Exception();
return $value * 2;
}
// FAIL: Only happy path tested
public function test_process(): void
{
$this->assertEquals(20, $service->process(10));
// Missing: null, negative, zero, large numbers
}
Detection: Compare test cases against branches/conditions in implementation.
4. Mock Abuse
// FAIL: Mocking the system under test
public function test_user_service(): void
{
$service = $this->createMock(UserService::class);
$service->method('createUser')->willReturn(new User());
$result = $service->createUser($data); // Tests nothing!
}
// FAIL: Mock returns exactly what test expects
public function test_validation(): void
{
$validator = $this->mock(Validator::class);
$validator->shouldReceive('validate')->andReturn(true);
// Never tests if validation logic actually works
}
Detection: Mocking the class being tested, or mocking with predetermined results that bypass logic.
5. Missing Negative Tests
// Code has error handling
public function create(array $data): User {
if (empty($data['email'])) throw new ValidationException();
if (User::where('email', $data['email'])->exists()) {
throw new DuplicateException();
}
return User::create($data);
}
// FAIL: Only success case tested
public function test_create_user(): void
{
$user = $service->create(['email' => 'test@test.com']);
$this->assertInstanceOf(User::class, $user);
// Missing: empty email, duplicate email
}
Detection: Exception/error handling in code without corresponding expectException tests.
6. Empty or Broken Data Providers
// FAIL: Empty provider
#[DataProvider('userDataProvider')]
public function test_validates_user(array $data): void { }
public static function userDataProvider(): array
{
return []; // No test data!
}
Detection: DataProvider annotation without method, or provider returning empty array.
7. Brittle or Flaky Patterns
// FAIL: Timing-based tests
public function test_async_operation(): void
{
$service->startAsync();
sleep(2); // Hope it finishes?
$this->assertTrue($service->isComplete());
}
// FAIL: Order-dependent tests
#[Depends('test_creates_user')]
public function test_updates_user(): void
{
// Breaks if test order changes
}
Detection: sleep()/usleep() calls, @depends annotations, missing database refresh traits.
Validation Process (Five Steps)
Step 1: Run Test Suite (Mandatory First)
cd project && php artisan test
# or for specific files
php artisan test --filter=UserServiceTest
Capture output:
- Total passed/failed/skipped/incomplete
- Risky tests (no assertions) flagged by PHPUnit
- Execution time (unusually fast = potentially hollow)
- Any runtime warnings
Step 2: Identify Test-Implementation Pairs
app/Services/UserService.php
→ tests/Unit/Services/UserServiceTest.php
app/Http/Controllers/UserController.php
→ tests/Feature/Http/Controllers/UserControllerTest.php
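A small sketch of deriving the expected test path from a source path under the conventions above (paths and the fallback message are illustrative):
src="app/Services/UserService.php"
test_file="tests/Unit/${src#app/}"          # tests/Unit/Services/UserService.php
test_file="${test_file%.php}Test.php"       # tests/Unit/Services/UserServiceTest.php
[[ -f "$test_file" ]] || echo "Missing test for $src"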
Step 3: Coverage Check
For each public method:
- Is there at least one test?
- Are edge cases covered?
- Are error conditions tested?
- Do assertions verify actual behavior?
Step 4: Quality Audit
For each test method:
- Has meaningful assertions (not just assertOk())
- Tests behavior, not implementation details
- Mocks appropriately (dependencies, not system under test)
- Would catch a bug if code broke
Step 5: Pattern Detection
Scan test files for (a grep sketch follows this list):
- TODO/FIXME markers
- assertTrue(true) patterns
- markTestIncomplete() / markTestSkipped()
- Missing assertions after operations
- Mock abuse (mocking system under test)
- Sleep/timing dependencies
- Hardcoded IDs or database-dependent values
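A grep-based sketch of that scan for PHPUnit (patterns are examples, not an exhaustive or project-exact list):
grep -rn "TODO\|FIXME" tests/
grep -rn "assertTrue(true)" tests/
grep -rn "markTestIncomplete\|markTestSkipped" tests/
grep -rn "sleep(\|usleep(" tests/
grep -rn "createMock(.*::class" tests/   # review hits: is the mocked class the system under test?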
Integration with Implement-Issue Workflow
The test validator runs in Test Loop (Stage 8 of implement-issue):
loop (max 10 iterations):
1. Run tests (php-test-validator)
→ If failed: laravel-backend-developer fixes → re-test
2. Validate test quality (php-test-validator)
→ If hollow/incomplete: laravel-backend-developer improves → re-validate
3. Both passed: exit loop
Example Iteration:
Iteration 1:
- Tests run: 45 passed, 3 failed
- Fix: laravel-backend-developer addresses failures
- Re-run: 48 passed
Iteration 2:
- Tests passed
- Quality audit: Found 2 TODO tests, 1 hollow assertion
- Fix: laravel-backend-developer completes TODOs, adds assertions
- Re-validate: All quality checks passed
Loop complete: Tests pass AND quality validated
Decision Framework
PASS when:
- ✓ All tests pass (no failures, no errors)
- ✓ Zero incomplete/skipped tests
- ✓ Zero TODO/FIXME markers
- ✓ All test methods have meaningful assertions
- ✓ Edge cases covered
- ✓ Error conditions tested
- ✓ No mock abuse detected
- ✓ No timing dependencies
FAIL when:
- ✗ Any test failures
- ✗ Tests marked incomplete/skipped
- ✗ TODO/FIXME in test files
- ✗ Tests without assertions
- ✗ Only happy path tested
- ✗ Mocking system under test
- ✗ PHPUnit reports "risky" tests
- ✗ Tests would pass even with broken code
Cross-Language Adaptation
Python (pytest):
# Similar anti-patterns
def test_user_creation():
pass # FAIL: Empty test
def test_validation():
assert True # FAIL: Hollow assertion
def test_api(mocker):
service = mocker.Mock(UserService)
service.create.return_value = User()
# FAIL: Mocking system under test
JavaScript (Jest):
// Similar detection
test('creates user', () => {
// FAIL: No expectations
service.createUser(data);
});
test('validates input', () => {
expect(result).toBeTruthy(); // FAIL: Vague assertion
});
test('service method', () => {
const mock = jest.fn().mockReturnValue(user);
// FAIL: Mock bypasses logic
});
Go (testing package):
// Similar patterns
func TestUserCreation(t *testing.T) {
// FAIL: No assertions
service.CreateUser(data)
}
func TestValidation(t *testing.T) {
if result != nil {
// FAIL: Only checking existence
}
}
Key Insights
Why This Matters:
- Traditional CI only checks "tests pass"
- Passing tests ≠ good tests
- Bad tests provide false confidence
- Bugs slip through to production
- Technical debt accumulates
What's Different:
- Two-phase validation (execution + quality)
- Automated quality auditing
- Agent detects anti-patterns
- Actionable, specific feedback
- Prevents "checkbox testing"
Benefits:
- Catches hollow tests before merge
- Enforces meaningful test coverage
- Reduces false confidence
- Improves actual test quality
- Teaches better testing patterns
Replication Strategy
1. Define Anti-Patterns for Your Stack:
# .claude/agents/test-validator.md
Anti-patterns:
- TODO markers
- Empty test bodies
- Hollow assertions
- Mock abuse
- Missing edge cases
- No negative tests
2. Build Test Runner + Auditor:
# Step 1: Run tests
pytest --verbose
# Step 2: Static analysis
grep -r "TODO\|FIXME" tests/
grep -r "assert True" tests/
# Step 3: Coverage check
pytest --cov=src --cov-report=term-missing
3. Create Quality Schemas:
{
"verdict": "pass|fail",
"test_execution": {
"passed": 45,
"failed": 0,
"skipped": 0
},
"quality_issues": [
{
"type": "hollow_assertion",
"file": "tests/test_user.py",
"line": 67,
"fix_required": "Add specific value assertions"
}
]
}
4. Integrate with Workflow:
# After implementation (illustrative helper names)
if not run_tests():
    fix_and_retest()
if not validate_test_quality():
    improve_tests()
5. Track Metrics:
- Test quality improvements over time
- Common anti-patterns in your codebase
- Effectiveness of different agents
- Time saved catching issues early
7. Project-Specific Skills
Concept: While many foundational skills come from the community, project-specific skills capture your unique workflow, conventions, and domain knowledge.
Custom Skills in This Project:
Created from Books/Documentation:
- bulletproof-frontend/ - CSS architecture from "Handcrafted CSS: More Bulletproof Web Design"
- Process: PDF → text → Claude extraction → 5+ refinement rounds
- Initial draft: Generic CSS patterns
- Iteration 1: Added project design tokens
- Iteration 2: Added "No Tailwind" anti-pattern
- Iteration 3: Added Blade template specifics
- Iteration 4: Added coordination with Laravel agent
- Iteration 5+: Real code review feedback incorporated
- ui-design-fundamentals/ - Component patterns from same book
- Multiple supporting files (buttons.md, forms.md, colors.md, typography.md)
- Each component extracted separately then refined with project examples
Created from Team Experience:
- handle-issues/ - GitHub issue workflow specific to the team
- implement-issue/ - End-to-end implementation pipeline for this project
- process-pr/ - Pull request review process matching team standards
- review-ui/ - UI review criteria specific to design system
- write-docblocks/ - Documentation standards for this codebase
- brainstorming/ - Structured ideation process for this team
Key Differences from Community Skills:
- Reference project-specific tools and conventions
- Include actual file paths and directory structures
- Mention specific agents by name for coordination
- Capture team-specific anti-patterns learned from real mistakes
- Integrate with project automation (hooks, scripts)
Example: From Book to Skill (skills/bulletproof-frontend/SKILL.md):
---
name: bulletproof-frontend
description: Use for CSS architecture, responsive design, Blade templates
---
# Created from "Handcrafted CSS: More Bulletproof Web Design"
# Refined through multiple rounds with project specifics
## Project Context
**Tech Stack**: Laravel Blade, PostCSS, No Tailwind (semantic CSS only)
**Design System**: Custom tokens in /resources/css/tokens/
**Browser Support**: Last 2 versions, IE11 graceful degradation
**Coordination**: Defer PHP logic to laravel-backend-developer agent
## Anti-Patterns (from actual code reviews)
- **NEVER use Tailwind utility classes** - converts to semantic CSS
- **Avoid inline styles** - all styling in dedicated CSS files
- **No !important** - specificity issues indicate architecture problem
Book Extraction Process:
- Convert PDF to text (if needed)
- Feed to Claude: "Extract key CSS concepts into skill format"
- Review initial draft (generic patterns)
- Add project tech stack and tooling
- Include team conventions (no Tailwind, semantic CSS)
- Add coordination rules (defer to backend agent)
- Test with real refactoring tasks
- Incorporate feedback from code reviews
- Iterate (this skill had 5+ refinement rounds)
Why Project-Specific Skills Matter:
- Capture institutional knowledge that isn't generic
- Enable new team members (or AI) to understand conventions quickly
- Coordinate with project's specific agent ecosystem
- Reference actual project structure and tools
- Evolve with the project through continuous refinement
Replication Strategy:
- Start with appropriate source: community for methodology, books for domain expertise, experience for workflows
- Create initial draft through adaptation/extraction/capture
- Add project context (tech stack, tools, directory structure)
- Document coordination with your specific agents
- Capture anti-patterns from actual code reviews
- Reference real file paths and commands
- Test with real tasks
- Refine through multiple iterations
- Keep refining as project evolves
8. Prompt Templates
Concept: Reusable, parameterized prompts for common tasks.
Key Patterns:
- Placeholder Syntax - {{variable_name}} for parameter substitution
- Context Sections - structured information (issue, requirements, constraints)
- Output Format - explicit structure requirements
- Example Responses - show expected output format
Example Application (prompts/frontend/refactor-blade-thorough.md):
# Blade Refactoring Prompt
## Context
File: {{file_path}}
Issues: {{identified_issues}}
## Requirements
- Convert utility classes to semantic CSS
- Follow design system patterns
- Maintain accessibility
## Output Format
- Files changed
- CSS added
- Classes replaced
- Testing performed
Why This Works:
- Consistency across invocations
- Easy to maintain and update
- Clear expectations for outputs
- Parameterization enables reuse
Replication Strategy:
- Identify repetitive prompt patterns
- Extract parameters as placeholders
- Include context, requirements, and output format
- Provide examples of expected outputs
- Store in prompts/ directory by category
File-Specific Applications:
- prompts/frontend/audit-blade.md: Systematic Blade template analysis
- prompts/frontend/refactor-blade-basic.md: Quick refactoring for simple cases
- prompts/frontend/refactor-blade-thorough.md: Deep refactoring with testing
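To make the placeholder substitution concrete, here is a minimal sketch of rendering a template before sending it (the helper name, sed approach, and argument values are illustrative):
render_prompt() {
  local template="$1" file_path="$2" issues="$3"
  sed -e "s|{{file_path}}|$file_path|g" \
      -e "s|{{identified_issues}}|$issues|g" \
      "$template"
}
render_prompt .claude/prompts/frontend/refactor-blade-thorough.md \
  "resources/views/users/index.blade.php" "utility classes, inline styles"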
9. Foundational Skills from Multiple Sources
Concept: Several foundational skills came from different sources and were adapted/refined for the project.
Skills and Their Origins:
From Community (obra/superpowers):
- test-driven-development - TDD workflow, customized with project test runners
- systematic-debugging - Root cause analysis, adapted with project debugging tools
- dispatching-parallel-agents - Parallel execution framework
- subagent-driven-development - Multi-agent coordination, adapted for Laravel workflow
- writing-skills / writing-agents - Meta-skills for extending the system
- using-skills - Skill discovery and invocation patterns
From Books/Documentation:
- ui-design-fundamentals - Extracted from "Handcrafted CSS: More Bulletproof Web Design"
- bulletproof-frontend - CSS architecture patterns from same book
- Both refined through 5+ rounds of adding project specifics
From Experience:
- handle-issues - GitHub workflow captured from team process
- process-pr - PR review process from actual code reviews
- implement-issue - End-to-end pipeline evolved over multiple iterations
- brainstorming - Team ideation process documented
The Refinement Pattern:
All skills, regardless of origin, went through similar evolution:
- Initial draft (adapted/extracted/captured)
- Project context (tech stack, directory structure, tools)
- Team patterns (conventions from code reviews)
- Anti-patterns (mistakes from actual failures)
- Coordination (references to specific agents)
- Testing (validation with real tasks)
- Iteration (multiple refinement rounds)
Why Multiple Sources Work:
- Community skills provide proven methodologies
- Books provide authoritative domain expertise
- Experience captures unique team workflows
- All need project-specific customization to be effective
10. Git Worktrees
Key Patterns:
- Branch Isolation - each worktree on different branch
- Shared Git State - common .git directory
- Parallel Development - work on multiple features simultaneously
- Clean Switching - no stashing required
Example Application (skills/using-git-worktrees/SKILL.md):
# Create worktree for feature branch
git worktree add ../project-feature-x feature/x
# Work in that directory
cd ../project-feature-x
# When done, remove worktree
git worktree remove ../project-feature-x
Why This Works:
- No branch switching interrupts work
- Can test multiple branches simultaneously
- Clean separation of concerns
- Easier subagent coordination (each in own worktree)
Replication Strategy:
- Document worktree commands for your workflow
- Explain when to use vs regular branching
- Include cleanup procedures
- Show integration with orchestration scripts (a minimal sketch follows this list)
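A minimal sketch of that integration, assuming one worktree per issue (the branch and path naming, and the base branch, are illustrative):
setup_worktree() {
  local issue="$1"
  local branch="feature/issue-${issue}"
  local worktree="../project-issue-${issue}"
  git worktree add -b "$branch" "$worktree" main
  echo "$worktree"
}
cleanup_worktree() {
  git worktree remove "$1"   # run once the PR is merged
}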
11. Schema Validation
Concept: JSON schemas define expected structure for stage outputs, enabling validation.
Key Patterns:
- One Schema Per Stage - explicit output structure
- Type Safety - validate data types
- Required Fields - prevent missing data
- Format Constraints - URLs, dates, enums
Example Application (scripts/schemas/implement-issue-setup.json):
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"required": ["branch_name", "worktree_path", "tasks"],
"properties": {
"branch_name": {
"type": "string",
"pattern": "^feature/issue-[0-9]+"
},
"worktree_path": {
"type": "string"
},
"tasks": {
"type": "array",
"items": {
"type": "object",
"required": ["description", "agent"],
"properties": {
"description": {"type": "string"},
"agent": {"type": "string"}
}
}
}
}
}
Why This Works:
- Fail fast on invalid outputs
- Self-documenting stage contracts
- Enables reliable orchestration
- Catches errors before downstream stages
Replication Strategy:
- Define one schema per workflow stage
- Specify all required fields
- Add format validations (patterns, enums)
- Validate in orchestration scripts (a minimal sketch follows this list)
- Use schemas as documentation
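A minimal sketch of the validation step, assuming ajv-cli is installed (any JSON Schema validator can stand in; the helper name is illustrative):
validate_stage_output() {
  local output_file="$1" schema_file="$2"
  jq -e . "$output_file" > /dev/null || return 1        # well-formed JSON?
  ajv validate -s "$schema_file" -d "$output_file"      # matches the stage schema?
}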
File-Specific Applications (all in scripts/schemas/):
- implement-issue-setup.json: Branch and worktree creation
- implement-issue-plan.json: Implementation plan structure
- implement-issue-implement.json: Task completion tracking
- implement-issue-test.json: Test results and coverage
- implement-issue-pr.json: Pull request metadata
Cross-Stack Replication Guide
For Any Language/Framework
1. Create Directory Structure
mkdir -p .claude/{agents,hooks,prompts,scripts,skills}
2. Build Your Skill Library (Choose Your Approach)
Approach A: Adapt from Community
# Browse and copy foundational skills
cp -r superpowers/skills/using-skills .claude/skills/
cp -r superpowers/skills/test-driven-development .claude/skills/
cp -r superpowers/skills/systematic-debugging .claude/skills/
# Customize for your project
# - Update test runner commands (pytest, jest, cargo test)
# - Add your linting/formatting tools
# - Include your debugging tools and workflows
Approach B: Extract from Books/Documentation
# Example: Extract React patterns from official docs
# 1. Copy React documentation sections to file
# 2. Use Claude to extract patterns:
"I have the React documentation on hooks. Please extract:
- Key concepts into a skill (skills/react-hooks/SKILL.md)
- Common patterns and anti-patterns
- Project-specific: We use TypeScript strict mode
- Include examples using our design system"
# 3. Iterate through 3-5 refinement rounds
# 4. Add team-specific patterns from code reviews
Approach C: Capture from Experience
# Document your unique workflows
mkdir .claude/skills/deployment-workflow
mkdir .claude/skills/incident-response
# Write skills capturing your team's actual process
# Include: Tools used, commands, anti-patterns from actual incidents
3. Create Language-Specific Agents
# Example: Python Django Agent
---
name: django-backend-developer
description: Senior Python/Django developer. Use for models, views, serializers, middleware, ORM queries, migrations, and pytest.
---
You are a senior Python/Django developer with expertise in Django 5.x, Python 3.12, and PostgreSQL...
## Anti-Patterns to Avoid
- **N+1 queries** - always use `select_related()` and `prefetch_related()`
- **Never use `.filter().count()`** - use `.count()` directly
- **Use `get_object_or_404()`** - not try/except DoesNotExist
4. Configure Hooks
{
"hooks": {
"PostToolUse": [
{
"matcher": "Edit|Write",
"hooks": [{
"type": "command",
"command": "black $file && isort $file",
"timeout": 30
}]
}
]
}
}
5. Build Orchestration Scripts
- Adapt state machine to your workflow stages
- Use JSON schemas for validation
- Implement resume capability
- Add rate limit handling
Technology-Specific Examples
React/TypeScript Project:
.claude/
├── agents/
│ ├── react-component-developer.md
│ ├── typescript-type-architect.md
│ └── jest-test-specialist.md
├── skills/
│ ├── test-driven-development/ # Adapted from community
│ ├── react-patterns/ # Extracted from React docs
│ ├── typescript-patterns/ # Extracted from TS handbook
│ ├── component-testing/ # Captured from team experience
│ └── deployment-workflow/ # Captured from team process
└── settings.json (ESLint + Prettier hooks)
Python Data Science Project:
.claude/
├── agents/
│ ├── data-engineer.md
│ ├── ml-model-developer.md
│ └── jupyter-notebook-specialist.md
├── skills/
│ ├── test-driven-development/ # Adapted from community
│ ├── data-validation/ # Extracted from "Data Quality" book
│ ├── model-evaluation/ # Extracted from ML textbooks
│ ├── visualization-patterns/ # Captured from team standards
│ └── experiment-tracking/ # Captured from workflow
└── settings.json (black + mypy hooks)
DevOps/Infrastructure Project:
.claude/
├── agents/
│ ├── terraform-architect.md
│ ├── kubernetes-operator.md
│ └── ci-cd-engineer.md
├── skills/
│ ├── systematic-debugging/ # Adapted from community
│ ├── infrastructure-as-code/ # Extracted from HashiCorp docs
│ ├── deployment-strategies/ # Extracted from "Release It!" book
│ ├── incident-response/ # Captured from actual incidents
│ └── monitoring-observability/ # Captured from team runbooks
└── settings.json (terraform fmt hooks)
Skill Source Strategy by Domain:
| Domain | Adapt from Community | Extract from Books/Docs | Capture from Experience |
|---|---|---|---|
| Methodology | TDD, debugging, git | N/A | Team retrospectives |
| Framework | General patterns | Official documentation | Project conventions |
| Design | Basic principles | Design books, style guides | Design system |
| Architecture | SOLID, patterns | Architecture books | System decisions |
| DevOps | Git workflows | Tool documentation | Incident runbooks |
| Domain Logic | N/A | Domain textbooks | Business rules |
Key Success Factors
1. Choose the Right Source for Each Skill
Don't force one approach for everything:
- Methodologies (TDD, debugging) → Adapt from community
- Domain expertise (CSS, security, ML) → Extract from books
- Team workflows (deployment, PR process) → Capture from experience
- Most skills combine multiple sources through iteration
2. Expect Multiple Refinement Rounds
Initial drafts are starting points, not final products:
- Round 1: Get the basic structure (adapt/extract/capture)
- Round 2: Add project tech stack and tools
- Round 3: Include team conventions and patterns
- Round 4: Add anti-patterns from real code reviews
- Round 5+: Continuous refinement based on usage
Example: UI design skill evolution
- Draft: Generic CSS patterns from book
- Round 1: Project design tokens
- Round 2: "No Tailwind" from team decision
- Round 3: Blade template specifics
- Round 4: Coordination with backend agent
- Round 5: Real refactoring examples
3. Work Through Issues to Completion, Then Update
The Continuous Improvement Loop - Most Important Pattern
When skills, agents, or workflows fail or produce incorrect results, follow this process:
Step 1: Don't Update Yet - Solve the Problem First
❌ WRONG: Agent fails → immediately edit agent → hope it works
✅ RIGHT: Agent fails → work through to correct solution → update agent
Why this matters: You need to understand the correct solution before you can teach it. Updating before solving often encodes incorrect assumptions or partial solutions.
Step 2: Work Through to the Correct Solution
Use Claude to iteratively debug and reach the right answer:
Agent produces incorrect code → Run tests (fail)
↓
Analyze failure → Understand root cause
↓
Try fix attempt 1 → Run tests (still fail, different error)
↓
Analyze new failure → Refine understanding
↓
Try fix attempt 2 → Run tests (pass)
↓
Verify solution is correct, not just passing
↓
NOW you have the correct solution
Step 3: Ask Claude to Update the Skill/Agent
Once you have the correct solution, prompt:
"I just encountered this issue: [describe problem]
The agent/skill did: [incorrect behavior]
The correct solution was: [working solution]
Please update [skill/agent name] to prevent this issue. Add:
1. Specific guidance that would have caught this
2. An anti-pattern entry for the incorrect approach
3. An example showing the correct pattern
4. A red flag if this is a common rationalization"
Real Example from Project:
Issue Encountered:
// Agent wrote this (seems to work, but breaks in production)
public function getUsers() {
return User::all(); // Works in dev (100 users), OOM in prod (1M users)
}
Work Through Process:
Iteration 1: Add pagination
public function getUsers() {
return User::paginate(50); // Better, but breaks API contract
}
Iteration 2: Add chunking
public function getUsers() {
return User::chunk(1000, function($users) {
// Process batch
});
} // Wrong pattern for this use case
Iteration 3: Correct solution
public function getUsers(int $page = 1, int $perPage = 50) {
return User::paginate($perPage, ['*'], 'page', $page);
// Returns paginated response, maintains API contract
}
Update Agent:
## Anti-Patterns to Avoid
- **NEVER use `Model::all()` on large tables**
- Problem: Loads entire table into memory (OOM in production)
- Symptom: Works in dev, fails in production with large datasets
- Solution: Always use pagination: `Model::paginate($perPage)`
- Red flag: "It works in my local database"
## Red Flags - STOP and Reconsider
- "It works with my test data" → Test with production-scale data
- "Model::all() is simpler" → Simplicity that breaks at scale is complexity
Step 4: Test the Update
Run the same scenario with updated skill/agent:
- Does it now produce correct code?
- Does it catch the anti-pattern?
- Does it provide the right guidance?
If not, refine the update and retest.
4. Build Knowledge from Failures
Failure → Refinement Cycle:
digraph improvement {
rankdir=LR;
"Use skill/agent" [shape=box];
"Issue occurs" [shape=diamond];
"Work through to correct solution" [shape=box, style=filled, fillcolor=yellow];
"Understand root cause" [shape=box, style=filled, fillcolor=yellow];
"Update skill/agent" [shape=box, style=filled, fillcolor=lightgreen];
"Add anti-pattern" [shape=box];
"Add red flag" [shape=box];
"Test updated version" [shape=box];
"Use skill/agent" -> "Issue occurs";
"Issue occurs" -> "Continue working" [label="no issue"];
"Issue occurs" -> "Work through to correct solution" [label="issue found"];
"Work through to correct solution" -> "Understand root cause";
"Understand root cause" -> "Update skill/agent";
"Update skill/agent" -> "Add anti-pattern";
"Add anti-pattern" -> "Add red flag";
"Add red flag" -> "Test updated version";
"Test updated version" -> "Use skill/agent" [label="improvement verified"];
}
Track Patterns Across Failures:
Keep a log of common issues:
```markdown
## Common Issues Log

### Issue: Agent uses Model::all() on large tables
- Occurred: 3 times (UserService, OrderService, ProductService)
- Root cause: Agent doesn't consider production data scale
- Solution: Added "NEVER use Model::all()" anti-pattern
- Prevention: Added red flag "Works in dev" → "Test at scale"
- Result: Zero occurrences after update

### Issue: Tests with sleep() instead of event-based waiting
- Occurred: 5 times (async operations, polling, race conditions)
- Root cause: Agent defaults to timing instead of conditions
- Solution: Added condition-based-waiting skill
- Prevention: Red flag "Use sleep() to wait"
- Result: All new tests use proper wait patterns
```
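For illustration, the condition-based-waiting fix looks roughly like this in a test. This is a minimal sketch; `ExportUsersJob` and the export path are hypothetical names, not from the project.

```php
use Illuminate\Support\Facades\Storage;

// Anti-pattern: timing-based wait is flaky under load and slow when the job is fast
public function test_export_completes_with_sleep(): void
{
    dispatch(new ExportUsersJob()); // hypothetical job
    sleep(5); // hopes five seconds is enough
    $this->assertTrue(Storage::exists('exports/users.csv'));
}

// Condition-based wait: poll for the observable result with a bounded timeout
public function test_export_completes_when_file_appears(): void
{
    dispatch(new ExportUsersJob());

    $deadline = microtime(true) + 5.0;
    while (!Storage::exists('exports/users.csv')) {
        $this->assertLessThan($deadline, microtime(true), 'Export did not finish within 5s');
        usleep(100_000); // re-check every 100 ms
    }

    $this->assertTrue(Storage::exists('exports/users.csv'));
}
```

The key difference: the test waits for the condition it actually cares about, with a failure message when the deadline passes, instead of guessing a duration.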
5. Examples of Iterative Refinement
Example 1: Laravel Backend Agent
Initial Version (Generic):
```markdown
description: PHP/Laravel backend developer

Anti-patterns:
- Write clean code
- Follow best practices
```
After Issue #1 (N+1 queries in UserController):
```markdown
Anti-patterns:
- **N+1 prevention** — Always eager load with `with()`
- Never use `Model::all()` on large tables
```
After Issue #2 (Used env() in Service class):
```markdown
Anti-patterns:
- **N+1 prevention** — Always eager load with `with()`
- **Never use `env()`** outside config files — Use `config()` helper
- Never use `Model::all()` on large tables
```
After Issue #3 (Tests lacked RefreshDatabase):
```markdown
Anti-patterns:
- **N+1 prevention** — Always eager load with `with()`
- **Never use `env()`** outside config files
- Never use `Model::all()` on large tables
- **Missing `RefreshDatabase`** in feature tests — Tests contaminate each other

Red Flags:
- "Tests pass locally but fail in CI" → Missing RefreshDatabase
- "Works in dev" → Test with production-scale data
```
Example 2: Test Validator Agent
Initial Version:
Validate tests have assertions
After Hollow Test Issue:
### Hollow Assertions
Tests that pass but don't verify behavior:
```php
// FAIL: Only asserting response code, not content
public function test_api_returns_users(): void
{
    $response = $this->get('/api/users');
    $response->assertOk(); // What about the users?
}
```
Flag: Response checks without data validation
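For contrast, a version the validator should accept asserts the payload as well. This sketch assumes the endpoint wraps results in a `data` key:

```php
// PASS: Assert the payload, not just the status code
public function test_api_returns_users(): void
{
    $users = User::factory()->count(3)->create();

    $response = $this->get('/api/users');

    $response->assertOk()
        ->assertJsonCount(3, 'data')
        ->assertJsonFragment(['email' => $users->first()->email]);
}
```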
**After Mock Abuse Issue**:
````markdown
### Hollow Assertions
[previous content]

### Brittle/Cheating Mocks
Mocks that bypass the actual logic being tested:
```php
// FAIL: Mocking the system under test
public function test_user_service(): void
{
    $service = $this->createMock(UserService::class);
    $service->method('createUser')->willReturn(new User());
    // Tests nothing!
}
```
Flag: Mocking the class being tested
````
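The corrected pattern keeps the service real and mocks only its boundary. A minimal sketch; `MailerInterface` and the `UserService` constructor are illustrative, not the project's actual classes:

```php
// PASS: the service under test is real; only its external boundary is mocked
public function test_user_service_creates_user_and_sends_welcome_mail(): void
{
    $mailer = $this->createMock(MailerInterface::class);
    $mailer->expects($this->once())->method('send');

    $service = new UserService($mailer);           // real class under test
    $user = $service->createUser('ada@example.com');

    $this->assertSame('ada@example.com', $user->email);
}
```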
6. When to Update vs When to Discard
**Update When**:
- Issue is fixable with clearer guidance
- Pattern is close but needs refinement
- Anti-pattern can prevent future occurrences
- Skill/agent is fundamentally sound
**Discard/Rewrite When**:
- Fundamental approach is wrong
- Skill fights against better patterns
- Multiple unrelated issues from same skill
- Easier to start fresh than patch
**Example - Discard**:
```markdown
# Original skill: "Always use mocks in unit tests"
# After issues: Actually need real objects for domain logic tests
# Decision: Skill fundamentally wrong, rewrite with nuance
# New skill: "Use mocks for boundaries, real objects for domain"
```
7. Pressure Testing After Updates
After updating a skill/agent, test under pressure:
Create Scenarios That Previously Failed:
Updated agent to avoid Model::all()
↓
Test: "Create a service that fetches all users"
↓
Does agent now use pagination?
↓
YES: Update verified
NO: Refine update, add more explicit guidance
Test Related Patterns:
Updated: "Never use env() outside config"
↓
Test: "Read database connection settings in service"
↓
Does agent use config('database.default')?
↓
Verify it doesn't fall back to env()
Test Under Time Pressure:
Add pressure to the test prompt: "This is urgent, just make it work"
↓
Does the agent slip back into the anti-patterns?
↓
If yes: Add stronger language to the skill and make the rule non-negotiable
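As a concrete retest target for the env() rule, the expected output looks roughly like this (the `services.payment` key is illustrative; `database.default` is Laravel's standard key for the default connection):

```php
// Anti-pattern: env() outside config/ returns null once `php artisan config:cache` runs
$apiKey = env('PAYMENT_API_KEY');

// Preferred: read through config(), which is cache-safe
$connection = config('database.default');        // default DB connection name
$apiKey     = config('services.payment.key');    // defined in config/services.php, backed by env() there
```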
8. Track Improvements Over Time
Maintain a changelog for each skill/agent:
```markdown
## CHANGELOG

### 2025-01-15: Added N+1 query prevention
- Issue: UserController loaded all orders without eager loading
- Solution: Added "Always use with()" anti-pattern
- Verification: Tested with large datasets, no more N+1s

### 2025-01-20: Added env() restriction
- Issue: Service class called env('API_KEY') directly
- Solution: Added "Never use env() outside config" rule
- Verification: Scanned codebase, all env() calls in config/

### 2025-01-25: Added RefreshDatabase reminder
- Issue: Feature tests contaminating each other
- Solution: Added "Missing RefreshDatabase" anti-pattern
- Verification: All new tests include trait
```
Benefits:
- See evolution over time
- Understand why rules exist
- Share learning with team
- Identify patterns in failures
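For reference, the RefreshDatabase fix from the 2025-01-25 entry looks roughly like this in a feature test (a sketch; the route, model, and table names are illustrative):

```php
use App\Models\User;
use Illuminate\Foundation\Testing\RefreshDatabase;
use Tests\TestCase;

class CreateOrderTest extends TestCase
{
    // Migrates once, then wraps each test in a transaction so tests cannot contaminate each other
    use RefreshDatabase;

    public function test_an_order_can_be_created(): void
    {
        $user = User::factory()->create();

        $this->actingAs($user)
            ->post('/orders', ['product_id' => 1, 'quantity' => 2])
            ->assertCreated();

        $this->assertDatabaseHas('orders', ['user_id' => $user->id, 'quantity' => 2]);
    }
}
```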
9. Test with Real Tasks
Validate before relying on skills/agents:
- Skills: Run pressure scenarios with subagents
- Agents: Test on representative domain tasks
- Hooks: Verify in actual workflow
- Keep iterating until they work reliably
10. Document Rationale and Sources
Make origins and reasoning clear:
- Note source (community/book/experience)
- Explain why patterns exist
- Document what problems they solve
- Include attribution for adapted/extracted content
11. Maintain Discoverability
Keep skills findable:
- Rich, searchable descriptions
- Clear naming conventions
- Cross-references between components
- Regular pruning of unused skills
12. Balance Generic and Specific
Find the right level of abstraction:
- Too generic → Not actionable for your project
- Too specific → Breaks when project evolves
- Sweet spot → Project-specific but adaptable
Example balance:
```
# Too generic (not useful)
"Write good CSS"

# Too specific (breaks easily)
"Use class .btn-primary-lg-blue from line 47 of app.css"

# Right balance (project-specific but adaptable)
"Use semantic button classes from design system tokens
- .btn--primary for main actions
- .btn--secondary for supporting actions
See /resources/css/components/buttons.css"
```
Common Patterns Across Files
Pattern: Flowchart-Driven Decision Making
Files: All major skills (TDD, subagent-driven-development, dispatching-parallel-agents)
Concept: Visual flowcharts clarify when to use a pattern and how to execute it.
Implementation:
```dot
digraph decision {
  "Have plan?" [shape=diamond];
  "Tasks independent?" [shape=diamond];
  "Use subagent workflow" [shape=box];
}
```
Why: Reduces cognitive load, provides clear decision criteria, visually communicates process.
Replicate For: Any multi-step process with decision points.
Pattern: Red Flags / Rationalization Tables
Files: TDD, writing-skills, subagent-driven-development
Concept: Anticipate and counter common justifications for skipping best practices.
Implementation:
| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
Why: Pre-emptively addresses resistance to discipline, makes violations obvious.
Replicate For: Any prescriptive methodology that might be circumvented under pressure.
Pattern: Skill-Specific Supporting Files
Files: systematic-debugging, ui-design-fundamentals, bulletproof-frontend
Concept: Main SKILL.md stays concise, detailed patterns in separate files.
Implementation:
skills/
ui-design-fundamentals/
SKILL.md # Overview + quick reference
buttons.md # Button-specific patterns
forms.md # Form-specific patterns
navigation.md # Navigation patterns
Why: Keeps main file scannable while providing depth when needed.
Replicate For: Skills with multiple sub-domains or extensive reference material.
Pattern: Explicit State Tracking
Files: implement-issue-orchestrator.sh, subagent-driven-development
Concept: Maintain explicit state that persists across subagent invocations.
Implementation:
```bash
# Track branch name explicitly
FEATURE_BRANCH="feature/issue-123"

# Include in every subagent dispatch
dispatch_implementer "$FEATURE_BRANCH" "$task_text"
```
Why: Subagents have no memory, must receive all context explicitly.
Replicate For: Multi-step workflows with fresh subagent per step.
Pattern: Two-Stage Review
Files: subagent-driven-development, code-reviewer
Concept: Separate spec compliance from code quality - different concerns, different reviewers.
Implementation:
1. Implement task
2. Spec reviewer: Does it match requirements?
3. Code quality reviewer: Is it well-built?
Why: Spec compliance prevents over/under-building, quality review ensures good implementation.
Replicate For: Any implementation workflow where "right thing" differs from "right way".
Conclusion
This project demonstrates a practical AI development system built through:
- Multiple Skill Sources - Community adaptation, book extraction, experience capture
- Project-Specific Automation - Custom agents, hooks, and orchestration scripts
- Workflow Integration - Multi-stage pipelines with state management
- Domain Specialization - Agents with clear scope and coordination protocols
- Continuous Improvement - Failure-driven refinement loop
The Three-Path Strategy:
Skills can be built through different approaches depending on the type:
Path 1: Adapt from Community
- Start: Browse obra/superpowers for foundational patterns
- Customize: Update examples to your tech stack
- Example: TDD, debugging, git workflows
Path 2: Extract from Books/Docs
- Start: Convert authoritative source to text
- Process: Have Claude extract concepts into skill format
- Refine: Add project context through multiple rounds
- Example: UI design fundamentals from "Handcrafted CSS"
Path 3: Capture from Experience
- Start: Identify recurring team patterns
- Document: Write skill with real anti-patterns
- Example: GitHub workflow, PR processes
The Critical Improvement Loop:
The most important pattern: Work through issues to the correct solution BEFORE updating skills/agents.
Issue occurs → Work through to correct solution → Understand root cause
↓
Update skill/agent → Add anti-pattern → Add red flag → Test update
↓
Verify improvement → Log the learning → Continue
This loop is what makes the system self-improving:
- Skills get better with each failure
- Anti-patterns accumulate real experience
- Red flags prevent future rationalizations
- The system learns from actual usage
What Worked in This Project:
| Skill Type | Approach | Refinement Rounds | Key Improvements |
|---|---|---|---|
| TDD methodology | Adapted from community | 2-3 | Added project test runners |
| Debugging patterns | Adapted from community | 3-4 | Added project-specific tools |
| UI design fundamentals | Book extraction | 5+ | Design tokens, no Tailwind, Blade specifics |
| Frontend architecture | Book extraction | 5+ | Project CSS architecture, agent coordination |
| GitHub workflows | Experience capture | 10+ | Real workflow failures → anti-patterns |
| Laravel backend agent | Created + experience | 15+ | N+1 queries, env() usage, RefreshDatabase |
| Test validator agent | Created + experience | 8+ | Hollow assertions, mock abuse, TODOs |
| Orchestration scripts | Created from scratch | 20+ | Resume capability, rate limits, state management |
Notice: More complex components (agents, orchestrators) had more refinement rounds because they encountered more real-world issues.
The Real Work:
Regardless of approach, the value comes from:
- Iterative refinement - Initial draft → project context → team patterns → anti-patterns from failures
- Working through issues - Don't update until you understand the correct solution
- Testing with real tasks - Pressure test skills, validate agents on actual work
- Capturing failures - Each issue becomes an anti-pattern or red flag
- Coordination between components - Agents reference skills, hooks trigger scripts, schemas validate
- Building institutional knowledge - System improves as it encounters and solves problems
Getting Started:
- Pick 5-10 foundational skills - TDD, debugging, git workflows (adapt from community)
- Create 2-3 domain skills - Your framework/language expertise (extract from books/docs)
- Document 1-2 team workflows - Your unique processes (capture from experience)
- Build 2-3 specialized agents - With project context and coordination rules
- Add automation hooks - For repetitive quality gates
- Use the system - Let it fail, work through issues, update, repeat
- Track improvements - Log common issues and how skills evolved
Key Success Factor:
Don't expect perfection on round 1. The first version of a skill/agent is a hypothesis. Real usage reveals issues. Working through those issues to the correct solution, then encoding that knowledge back into the skill/agent—that's what makes the system valuable.
Initial drafts get you started. The improvement loop is what makes it great.
After 6 months of use:
- Generic skills become project-specific
- Anti-patterns reflect actual mistakes
- Red flags catch real rationalizations
- Agents coordinate smoothly
- Workflows handle edge cases
- The system embodies team knowledge
This is infrastructure that improves with use, not documentation that rots. The effort compounds.