I was asked how I put together my .claude/ folder for a specific project and how I built my skills, agents, and so on. Worked with Claude to analyze my implementation, git history, and general background to produce the document below. # Claude Project Implementation Patterns Guide ## Overview This document explains the architectural patterns and implementation concepts used in Aaddrick's `.claude/` project configuration. These patterns can be replicated for any technology stack or project type. **Key Focus**: This guide shows multiple paths to building a .claude/ project configuration. Skills and agents can be: - **Adapted** from community sources like obra/superpowers - **Created** from books, documentation, or domain expertise - **Iteratively refined** through multiple rounds of project-specific customization ### Table of Contents 1. [Quick Reference: Three Paths to Skills](#quick-reference-three-paths-to-skills) 2. [Core Philosophy](#core-philosophy) 3. [Directory Structure & Purpose](#directory-structure--purpose) 4. [Implementation Concepts](#implementation-concepts) - [Skill Creation Approaches](#1-skill-creation-approaches) - [Agents System](#2-agents-system) - [Hook System](#3-hook-system) - [Orchestration Scripts](#4-orchestration-scripts) - [Implement-Issue: End-to-End Workflow](#5-implement-issue-end-to-end-workflow) - [Test Validation Methodology](#6-test-validation-methodology) - [Project-Specific Skills](#7-project-specific-skills) - [Prompt Templates](#8-prompt-templates) - [Git Worktrees](#9-git-worktrees) - [Schema Validation](#10-schema-validation) - [Foundational Skills from Multiple Sources](#11-foundational-skills-from-multiple-sources) 5. [Common Patterns Across Files](#common-patterns-across-files) 6. [Cross-Stack Replication Guide](#cross-stack-replication-guide) 7. [Key Success Factors](#key-success-factors) 8. [Conclusion](#conclusion) ### Quick Reference: Three Paths to Skills | Approach | Best For | Example Sources | This Project Examples | |----------|----------|----------------|---------------------| | **Adapt from Community** | Universal methodologies | obra/superpowers, open-source projects | TDD, debugging, git workflows, parallel dispatch | | **Extract from Books/Docs** | Domain expertise | Technical books, framework docs | UI design (from "Handcrafted CSS"), frontend patterns | | **Capture from Experience** | Team workflows | Retrospectives, incidents, reviews | GitHub workflow, PR process, deployment pipeline | All approaches benefit from **iterative refinement**: initial draft → project context → team patterns → anti-patterns → testing → continuous improvement. ### The Most Valuable Patterns 1. **Multiple Skill Sources** - Community adaptation, book extraction, experience capture 2. **Project-Specific Agents** - Creating specialized subagents with domain expertise 3. **Workflow Automation** - Hooks, orchestration scripts, and state management 4. **Iterative Refinement** - Starting with drafts and evolving through real usage This project demonstrates both approaches: foundational skills adapted from community sources (TDD, debugging), and domain-specific skills created from source material (UI design from "Handcrafted CSS", frontend patterns from experience). All skills evolved through multiple refinement rounds. ## Core Philosophy The project follows three fundamental principles: 1. **Test-Driven Documentation** - Skills and agents are validated through testing before deployment 2. 
**Autonomous Workflow Orchestration** - Multi-stage pipelines with state management and error recovery 3. **Specialization Through Composition** - Small, focused components that combine for complex behaviors ## Directory Structure & Purpose ``` .claude/ ├── agents/ # Specialized subagent personas ├── hooks/ # Lifecycle automation scripts ├── prompts/ # Reusable prompt templates ├── scripts/ # Orchestration and automation │ ├── schemas/ # JSON schemas for validation │ └── *-test/ # Test harnesses ├── skills/ # Reusable process documentation │ └── [skill-name]/ # Each skill in its own directory │ ├── SKILL.md # Main skill documentation │ └── *.md # Supporting documentation └── settings.json # Hook configuration and automation ``` ## Implementation Concepts ### 1. Skill Creation Approaches **Concept**: Skills can be built through three main approaches, each with different strengths. #### Approach A: Adapt from Community **Sources**: - [obra/superpowers](https://github.com/obra/superpowers) - Community-maintained skill library - Claude.ai skill marketplace (when available) - Other open-source .claude/ projects **Process**: 1. Browse community repositories for relevant patterns 2. Copy to your `.claude/skills/` directory 3. Modify descriptions to match your project triggers 4. Adapt examples to your tech stack (Laravel vs Django, React vs Vue) 5. Add project-specific conventions and anti-patterns 6. Test with your actual codebase **Example** (`skills/test-driven-development/`, `skills/systematic-debugging/`): - Copied from obra/superpowers - Updated test runner commands for project - Added project-specific test patterns - Minimal changes, mostly works as-is **Best for**: Foundational methodologies (TDD, debugging, git workflows) that are universal across projects. #### Approach B: Extract from Books/Documentation **Sources**: - Technical books (PDF → text conversion) - Official framework documentation - Architecture guides and papers - Domain-specific references **Process**: 1. Convert source material to text (if needed) 2. Feed to Claude with extraction prompt 3. Review and structure initial draft 4. Add project-specific context and examples 5. Include team patterns and anti-patterns 6. Iterate through multiple refinement rounds **Example** (`skills/ui-design-fundamentals/`, `skills/bulletproof-frontend/`): ``` Source: "Handcrafted CSS: More Bulletproof Web Design" (book) Process: 1. Converted PDF to text 2. Asked Claude: "Extract key concepts and guidance into a skill and agent" 3. Initial draft had generic CSS patterns 4. Added: Project's design system tokens 5. Added: "No Tailwind" anti-pattern from code reviews 6. Added: Blade template specifics for Laravel 7. Added: Coordination with laravel-backend-developer agent Result: Skill adapted to project's semantic CSS architecture ``` **Supporting Files Pattern**: ``` skills/ui-design-fundamentals/ SKILL.md # Overview + quick reference buttons.md # Extracted button patterns from book forms.md # Form patterns from book colors.md # Color theory + project tokens typography.md # Type scale + project fonts ``` **Best for**: Domain-specific expertise (design, security, performance) where authoritative sources exist. #### Approach C: Capture from Experience **Sources**: - Team retrospectives and lessons learned - Code review feedback patterns - Bug post-mortems - Workflow pain points **Process**: 1. Identify recurring issues or decisions 2. Document the pattern that solves them 3. Write skill with clear triggering conditions 4. 
Include red flags and anti-patterns from real mistakes 5. Test with team members 6. Refine based on actual usage **Example** (`skills/handle-issues/`, `skills/process-pr/`, `skills/implement-issue/`): - Created from team's GitHub workflow - Captures multi-stage process evolved over time - Includes specific GitHub CLI commands - References project's actual agents and scripts - Anti-patterns from actual workflow failures **Best for**: Workflows and processes unique to your team that aren't documented elsewhere. #### Common Patterns Across All Approaches **YAML Frontmatter**: ```yaml --- name: skill-name description: Use when [triggering conditions] --- ``` **CSO (Claude Search Optimization)**: - Descriptions focus on WHEN to use, not WHAT it does - Include concrete triggers, symptoms, and situations - Written in third person (injected into system prompt) **Iterative Refinement**: All approaches benefit from multiple rounds: 1. Initial draft (adapted/extracted/captured) 2. Add project-specific context 3. Test with real tasks 4. Add anti-patterns from failures 5. Refine based on usage 6. Repeat **Replication Strategy**: Choose your approach based on the skill type: - **Universal methodologies** → Adapt from community - **Domain expertise** → Extract from authoritative sources - **Team workflows** → Capture from experience - **Mix and match** → Most skills combine multiple sources ### 2. Agents System **Concept**: Specialized subagent personas with defined roles, scope, and coordination protocols. **Key Patterns**: - **Clear Persona Definition** - specific expertise and project context - **Explicit Scope Boundaries** - what the agent does AND doesn't do - **Deferral Rules** - when to hand off to other agents - **Anti-Patterns Section** - domain-specific mistakes to avoid - **Project Context** - structure, commands, and conventions **Example Application** (`agents/code-reviewer.md`): ```yaml --- name: code-reviewer description: Use when a major project step has been completed and needs review model: inherit --- You are a Senior Code Reviewer... ## CORE COMPETENCIES - Plan alignment analysis - Code quality assessment **Not in scope** (defer to bulletproof-frontend-developer): - CSS architecture refactoring ``` **Why This Works**: - Agents maintain consistent behavior through clear personas - Scope boundaries prevent overlap and enable specialization - Anti-patterns capture domain expertise **Replication Strategy**: 1. Research domain best practices via web search 2. Explore codebase to understand project patterns 3. Define clear persona with project-specific context 4. List specific anti-patterns (not generic advice) 5. Establish coordination protocols with other agents **File-Specific Applications**: - **laravel-backend-developer.md**: Backend specialist with PHP/Laravel expertise, SQL optimization rules - **bulletproof-frontend-developer.md**: Frontend specialist deferring backend work, CSS architecture focus - **bash-script-craftsman.md**: Shell scripting specialist with POSIX compliance and security patterns - **spec-reviewer.md**: Validates implementation against specifications (no code quality concerns) ### 3. Hook System **Concept**: Lifecycle-triggered automation that runs at specific points in the development workflow. 
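A minimal sketch of what such a hook can look like, before the recurring patterns. This assumes the hook receives the pending tool call as JSON on stdin and that a non-zero exit blocks the action; the blocked path is purely illustrative, so adapt both to your own hook contract and project.

```bash
#!/usr/bin/env bash
# Hypothetical PreToolUse guard: refuse edits to vendored/generated files.
# Assumes the tool call arrives as JSON on stdin with a tool_input.file_path
# field, and that a non-zero exit rejects the action.
input=$(cat)
file_path=$(jq -r '.tool_input.file_path // empty' <<< "$input")

if [[ "$file_path" == */vendor/* ]]; then
  echo "Blocked: $file_path is vendored; edit the source package instead." >&2
  exit 2
fi
exit 0
```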
**Key Patterns**:
- **PreToolUse Hooks** - validation before actions (prevent accidents)
- **PostToolUse Hooks** - cleanup after actions (formatting, simplification)
- **Notification Hooks** - user alerts for specific conditions
- **SessionStart Hooks** - context injection at conversation start

**Example Application** (`hooks/session-start.sh`):
```bash
#!/usr/bin/env bash
# Injects using-skills content into conversation context
using_skills_content=$(cat ".claude/skills/using-skills/SKILL.md")

# Emit the skill content so it is added to the session context
cat <<EOF
$using_skills_content
EOF
```

### 4. Orchestration Scripts

**Concept**: Multi-stage workflow automation driven by a bash state machine, with progress tracked in a status file and stage outputs validated against JSON schemas.

**Example Application** (orchestrator script):
```bash
# Track stage progress in the status file
update_stage() {
    local stage="$1" status="$2"
    jq --arg stage "$stage" --arg status "$status" \
        '.stages[$stage].status = $status' \
        "$STATUS_FILE" > "${STATUS_FILE}.tmp"
    mv "${STATUS_FILE}.tmp" "$STATUS_FILE"
}

# Stage execution with error handling
run_stage() {
    local stage="$1"
    update_stage "$stage" "in_progress"
    if claude_cli_invoke "$stage" > "$stage_log" 2>&1; then
        update_stage "$stage" "completed"
        return 0
    else
        update_stage "$stage" "failed"
        return 1
    fi
}
```

**Why This Works**:
- State files enable inspection during long-running processes
- JSON schemas validate stage outputs (fail fast)
- Iteration limits prevent runaway processes
- Resume capability saves time and API costs

**Replication Strategy**:
1. Define workflow stages as a state machine
2. Create JSON schemas for each stage output
3. Implement status file with stage tracking
4. Add iteration limits for loops
5. Enable resume from status file
6. Handle rate limits with exponential backoff

**File-Specific Applications**:
- **implement-issue-orchestrator.sh**: 11-stage workflow (setup → research → evaluate → plan → implement → quality_loop → test_loop → docs → pr → pr_review → complete)
- **batch-orchestrator.sh**: Parallel task execution with progress tracking
- **batch-runner.sh**: Simple parallel execution wrapper

### 5. Implement-Issue: End-to-End Workflow

**Concept**: A complete end-to-end orchestration system for taking a GitHub issue from assignment to merged PR, combining multiple skills, agents, and quality loops. This is the most complex pattern in the project - a production-grade workflow orchestrator that demonstrates how all the components work together.

> **🔄 CRITICAL FEATURE: Resume Capability**
>
> The architecture is designed to handle interruptions gracefully. If the workflow is interrupted by:
> - **Rate limits** (Claude API throttling)
> - **Service outages** (Claude services temporarily unavailable)
> - **System crashes** (computer loses power, process killed)
> - **Network failures** (internet disconnection)
>
> You can resume exactly where you left off:
> ```bash
> ./implement-issue-orchestrator.sh --resume
> ```
>
> The orchestrator reads `status.json`, validates the worktree still exists, and continues from the last completed stage. This saves 20-60 minutes of redundant work and preserves all progress. State is synced to disk after every operation, so no work is lost.

#### Architecture Overview

**Three Layers**:
1. **Skill Layer** (`skills/implement-issue/SKILL.md`) - User-facing interface
2. **Orchestrator Script** (`scripts/implement-issue-orchestrator.sh`) - 1600+ line bash state machine
3. **Schema Layer** (`scripts/schemas/implement-issue-*.json`) - Stage output validation

```
User invokes skill
  ↓
Skill launches orchestrator script
  ↓
Orchestrator runs 11-stage pipeline
  ↓
Each stage validated against schema
  ↓
State tracked in status.json
  ↓
GitHub comments provide visibility
```

#### The 11-Stage Pipeline

| Stage | Purpose | Agent | Output Schema |
|-------|---------|-------|---------------|
| **1. Setup** | Create worktree, fetch issue | default | `implement-issue-setup.json` |
| **2. Research** | Explore codebase context | default | `implement-issue-research.json` |
| **3.
Evaluate** | Assess approach options | default | `implement-issue-evaluate.json` | | **4. Plan** | Create implementation plan | default | `implement-issue-plan.json` | | **5. Implement** | Execute each task | per-task | `implement-issue-implement.json` | | **6. Task Review** | Verify task met spec | spec-reviewer | `implement-issue-task-review.json` | | **7. Simplify** | Clean up code | fsa-code-simplifier | `implement-issue-simplify.json` | | **8. Test Loop** | Run tests → fix → repeat | php-test-validator | `implement-issue-test.json` | | **9. Docs** | Add documentation | phpdoc-writer | (inline) | | **10. PR** | Create/update PR | default | `implement-issue-pr.json` | | **11. PR Review** | Spec + quality review | reviewers | `implement-issue-review.json` | #### Quality Loops (Prevent Infinite Iterations) **Per-Task Quality Loop** (runs after each task during implement): ```bash for each task: 1. Implement task (agent per task type) 2. Task review (spec-reviewer checks requirements met) - If failed: Fix and re-review (max 3 attempts) 3. Simplify code (fsa-code-simplifier) 4. Code review (code-reviewer checks quality) - If failed: Fix and re-review (max 5 iterations) → Move to next task ``` **Test Loop** (runs once after all tasks): ```bash loop (max 10 iterations): 1. Run test suite (php-test-validator) - If failed: Fix tests → continue loop 2. Validate test quality (php-test-validator scoped to issue) - If failed: Improve tests → continue loop 3. If both passed: exit loop ``` **PR Review Loop** (runs at end): ```bash loop (max 3 iterations): 1. Spec review (spec-reviewer: does PR meet issue goals?) - If failed: Fix implementation → continue 2. Code review (code-reviewer: quality check) - If failed: Fix quality issues → continue 3. If both approved: complete ``` #### State Management **status.json Structure**: ```json { "state": "running", "issue": 123, "branch": "feature/issue-123-...", "worktree": "/path/to/worktree", "current_stage": "implement", "current_task": 2, "stages": { "setup": { "status": "completed", "started_at": "2025-01-15T10:00:00Z", "completed_at": "2025-01-15T10:02:00Z" }, "implement": { "status": "in_progress", "task_progress": "2/5" }, "test_loop": { "status": "pending", "iteration": 0 } }, "tasks": [ { "id": 1, "description": "Add user profile endpoint", "agent": "laravel-backend-developer", "status": "completed", "review_attempts": 1 }, { "id": 2, "description": "Create profile view", "agent": "bulletproof-frontend-developer", "status": "in_progress", "review_attempts": 0 } ], "quality_iterations": 2, "test_iterations": 1, "pr_review_iterations": 0, "log_dir": "logs/implement-issue/issue-123-20250115-100000" } ``` **Resume Capability**: ```bash # Original run fails at task 3 ./implement-issue-orchestrator.sh --issue 123 --branch main # [Interrupted: Rate limit hit, or service timeout, or Ctrl+C] # Resume from where it left off ./implement-issue-orchestrator.sh --resume # Reads status.json, validates worktree, continues from task 3 ``` **What Gets Preserved**: - ✓ Worktree and branch - ✓ All completed stages - ✓ Completed tasks (doesn't redo work) - ✓ Iteration counts (quality, test, PR review) - ✓ GitHub PR number (if already created) - ✓ Log directory and context **Real-World Resume Scenarios**: 1. **Rate Limit Hit** (Most Common) ``` Task 5 of 8 implementation → Rate limit (429 error) Status: Saved after task 4 completion Resume: Continues from task 5 Time Saved: ~25 minutes (4 completed tasks not redone) ``` 2. 
**Claude Service Outage** ``` During test loop iteration 3 → Service unavailable (503) Status: Saved after iteration 2 completion Resume: Continues test loop from iteration 3 Time Saved: ~15 minutes (prior test fixes preserved) ``` 3. **Computer Crash / Power Loss** ``` During PR creation → Computer loses power Status: Last sync after task review completion Resume: Skips all completed tasks, proceeds to PR creation Time Saved: ~40 minutes (all implementation preserved) ``` 4. **Network Failure** ``` During task 7 implementation → Internet disconnects Status: Saved after task 6 completion Resume: Validates worktree, continues from task 7 Time Saved: ~30 minutes ``` 5. **Manual Interruption** (Ctrl+C) ``` You need to stop and check something → Ctrl+C Status: Last completed stage saved Resume: Pick up exactly where stopped Time Saved: Flexibility to pause/resume workflow ``` **How Resume Works Internally**: ```bash # Load state from status.json load_resume_state() { ISSUE_NUMBER=$(jq -r '.issue' status.json) BRANCH=$(jq -r '.branch' status.json) WORKTREE=$(jq -r '.worktree' status.json) CURRENT_STAGE=$(jq -r '.current_stage' status.json) COMPLETED_STAGES=$(jq -r '.stages | to_entries | map(select(.value.status == "completed")) | map(.key)' status.json) } # Skip completed stages for stage in "${stages[@]}"; do if is_stage_completed "$stage"; then log "Skipping $stage (already completed)" continue fi run_stage "$stage" done ``` **State Sync Strategy** (Why Nothing Is Lost): ```bash # After EVERY operation, sync to disk update_stage() { # Update in-memory status.json jq '.stages[$stage].status = $status' status.json > tmp mv tmp status.json # Immediately sync to log directory cp status.json "$LOG_DIR/status.json" } # Even if process killed mid-operation, worst case: # - Last completed stage is preserved # - Current stage marked "in_progress" (safe to restart) # - No data corruption (atomic file moves) ``` **Resume Validation**: Before resuming, the orchestrator validates: 1. ✓ status.json exists and is valid JSON 2. ✓ Required fields present (issue, branch, worktree) 3. ✓ Worktree still exists at path 4. ✓ Worktree is a valid git worktree 5. ✓ State is resumable (not already completed) If validation fails, provides clear error message with remediation steps. **Why This Architecture Matters**: Long-running AI workflows (30-60 minutes) face inevitable interruptions: - API rate limits are unpredictable - Service outages happen - Local issues occur (power, network, crashes) Without resume capability: - ❌ Lose 30-60 minutes of work - ❌ Waste API quota redoing completed work - ❌ Regenerate same code multiple times - ❌ Re-run tests that already passed - ❌ Create duplicate GitHub comments With resume capability: - ✅ Continue exactly where stopped - ✅ Preserve all completed work - ✅ Save API quota - ✅ Save time (20-60 minutes) - ✅ Maintain clean GitHub comment history - ✅ Handle interruptions gracefully #### GitHub Integration **Automatic Comments** (14 comment points throughout workflow): 1. Starting automated processing 2. Evaluation: Best path 3. Implementation plan (with collapsible full plan) 4. Task list (markdown checklist) 5. Per-task: Implementation summary 6. Per-task: Spec review results 7. Per-task: Simplification summary 8. Per-task: Code review results 9. Test loop: Test results (each iteration) 10. Test loop: Validation results 11. Test loop: Fix summaries 12. PR created/updated 13. PR spec review 14. 
PR code review

**Comment Format**:
```markdown
### Stage: Description ✅

**Result:** success

Summary of what happened...

_— agent-name_
```

#### Error Handling

**Rate Limits**:
```bash
handle_rate_limit() {
    local wait_time="${1:-3600}"  # Default 1 hour
    log "Rate limit hit. Waiting ${wait_time}s..."

    # Update status to show waiting
    jq --arg state "rate_limited" \
       --argjson wait "$wait_time" \
       '.state = $state | .wait_until = ((now + $wait) | todate)' \
       "$STATUS_FILE" > "${STATUS_FILE}.tmp"
    mv "${STATUS_FILE}.tmp" "$STATUS_FILE"

    sleep "$wait_time"

    # Resume
    jq '.state = "running"' "$STATUS_FILE" > "${STATUS_FILE}.tmp"
    mv "${STATUS_FILE}.tmp" "$STATUS_FILE"
}
```

**Max Iterations**:
```bash
# Prevents infinite loops
readonly MAX_TASK_REVIEW_ATTEMPTS=3
readonly MAX_QUALITY_ITERATIONS=5
readonly MAX_TEST_ITERATIONS=10
readonly MAX_PR_REVIEW_ITERATIONS=3

# MAX_ITERATIONS stands for whichever limit applies to the current loop
if (( iteration > MAX_ITERATIONS )); then
    log_error "Exceeded max iterations"
    set_final_state "max_iterations_exceeded"
    exit 2
fi
```

**Schema Validation**:
```bash
validate_output() {
    local output="$1"
    local schema="$2"

    if ! jq -e . <<< "$output" > /dev/null 2>&1; then
        log_error "Invalid JSON output"
        return 1
    fi

    # Validate against schema using ajv or similar
    if ! validate_json_schema "$output" "$SCHEMA_DIR/$schema"; then
        log_error "Output doesn't match schema: $schema"
        return 1
    fi
}
```

#### Logging System

**Log Directory Structure**:
```
logs/implement-issue/issue-123-20250115-100000/
├── orchestrator.log           # Main orchestrator log
├── stages/
│   ├── 01-setup.log           # Stage outputs
│   ├── 02-research.log
│   ├── 03-evaluate.log
│   ├── 04-plan.log
│   ├── 05-implement-task-1.log
│   ├── 06-task-review-1.log
│   ├── 07-simplify-1.log
│   └── ...
├── context/
│   ├── setup-output.json      # Parsed stage results
│   ├── research-output.json
│   ├── plan-output.json
│   ├── tasks.json             # Task list
│   └── review-comments.json
└── status.json                # Final status snapshot
```

**Log Synchronization**:
```bash
sync_status_to_log() {
    if [[ -n "$LOG_BASE" ]]; then
        cp "$STATUS_FILE" "$LOG_BASE/status.json"
    fi
}

# Called after every status update
update_stage "setup" "completed"
sync_status_to_log
# Ensures log directory always has latest state
```

#### Monitoring

**Watch Progress**:
```bash
# Simple JSON view
watch -n 5 'jq .
status.json' # Focused view watch -n 5 'jq -c "{ state, stage:.current_stage, task:.current_task, quality:.quality_iterations, test:.test_iterations }" status.json' # Stage completion status jq '.stages | to_entries | map({ stage: .key, status: .value.status, started: .value.started_at })' status.json ``` **Log Tailing**: ```bash # Follow orchestrator log tail -f logs/implement-issue/issue-123-*/orchestrator.log # Follow current stage tail -f logs/implement-issue/issue-123-*/stages/$(ls -t logs/.../stages/ | head -1) ``` #### Integration with Other Components **Skills Used**: - `using-git-worktrees` - Worktree creation and management - `writing-plans` - Implementation plan generation - `subagent-driven-development` - Task execution pattern - `test-driven-development` - Test-first enforcement - `requesting-code-review` - Review prompt templates **Agents Invoked**: - `laravel-backend-developer` - Backend task implementation - `bulletproof-frontend-developer` - Frontend task implementation - `spec-reviewer` - Spec compliance verification - `code-reviewer` - Code quality assessment - `fsa-code-simplifier` - Code simplification (FSA = Feature Spec Adherence) - `php-test-validator` - Test execution and validation - `phpdoc-writer` - Documentation generation **Hooks Triggered**: - `PostToolUse` on file edits - Auto-formatting with Pint - `PostToolUse` on bash - PR simplification check #### Key Design Decisions **Why Bash for Orchestration?** - Native GitHub CLI integration - Easy file system operations (worktrees, logs) - jq for JSON manipulation - Shell portability - Direct command execution without subprocess overhead **Why Per-Task Quality Loops Instead of End-to-End?** - Catch issues early (cheaper to fix) - Smaller context per review (more focused) - Prevent cascading errors - Better progress granularity **Why Separate Spec and Quality Reviews?** - Different concerns: "right thing" vs "right way" - Spec review prevents over/under-building - Quality review ensures maintainability - Two reviews catch different issue types **Why JSON Schemas?** - Fail fast on malformed outputs - Self-documenting stage contracts - Enables reliable automation - Validates before expensive operations #### Replication for Your Stack **1. Define Your Pipeline Stages**: ```bash # Example for a different stack stages=( "setup" # Clone/setup workspace "analysis" # Static analysis "plan" # Implementation plan "implement" # Code generation "test" # Unit tests "integration" # Integration tests "security" # Security scan "docs" # Documentation "pr" # Pull request ) ``` **2. Create Stage Schemas**: ```json // schemas/your-workflow-implement.json { "$schema": "http://json-schema.org/draft-07/schema#", "type": "object", "required": ["status", "summary", "files_changed"], "properties": { "status": {"enum": ["success", "failed"]}, "summary": {"type": "string"}, "files_changed": {"type": "array", "items": {"type": "string"}} } } ``` **3. Build State Machine**: ```bash main() { init_status for stage in "${stages[@]}"; do if is_stage_completed "$stage"; then log "Skipping $stage (already completed)" continue fi run_stage "$stage" || handle_error "$stage" done finalize } ``` **4. Add Quality Loops**: ```bash run_quality_loop() { local max_iterations=5 for (( i=1; i<=max_iterations; i++ )); do result=$(run_quality_check) if [[ "$result" == "passed" ]]; then return 0 fi apply_fixes "$result" done return 1 } ``` **5. 
Implement Resume**: ```bash load_resume_state() { CURRENT_STAGE=$(jq -r '.current_stage' status.json) COMPLETED_STAGES=$(jq -r '.stages | to_entries | map(select(.value.status == "completed")) | map(.key) | .[]' status.json) } ``` #### Real-World Performance **Typical Execution**: - Simple feature (2-3 tasks): 10-15 minutes - Medium feature (5-7 tasks): 25-35 minutes - Complex feature (10+ tasks): 45-60 minutes **Iteration Counts** (from actual usage): - Quality loop iterations: Average 1-2, max 5 - Test loop iterations: Average 1-3, max 10 - PR review iterations: Average 1, max 3 **Resume Scenarios**: - Rate limit hit during task 5 of 8: Resume saves ~20 minutes - Test failures after 7 tasks complete: Resume saves ~30 minutes - API timeout during PR creation: Resume completes in ~2 minutes #### Why This Pattern Matters This orchestrator demonstrates: 1. **Production-Grade Automation** - Not a toy, handles real complexity 2. **State Machine Design** - Clear stages, resumable, monitorable 3. **Quality Gates** - Multiple checkpoints prevent bad code 4. **Error Recovery** - Graceful handling of failures 5. **Integration** - All components (skills, agents, hooks, schemas) working together 6. **Observability** - Real-time state, comprehensive logging, GitHub visibility It's the culmination of all the other patterns in this guide, showing how they compose into a working system. ### 6. Test Validation Methodology **Concept**: Automated test quality validation that goes beyond "tests pass" to ensure tests actually catch bugs. **Core Principle**: Tests that don't catch bugs are worse than no tests—they provide false confidence. This pattern demonstrates how to build AI agents that audit test quality, not just run tests. The methodology applies across languages (shown here with PHP/PHPUnit, but adaptable to pytest, Jest, Go testing, etc.). #### The Problem with "Tests Pass" ```bash # This passes, but catches nothing public function test_user_creation(): void { $this->assertTrue(true); // TODO: implement } # This passes, but is hollow public function test_api_endpoint(): void { $response = $this->get('/api/users'); $response->assertOk(); // What about the data? } # This passes, but mocks the system under test public function test_service(): void { $mock = $this->createMock(UserService::class); $mock->method('create')->willReturn(new User()); $result = $mock->create($data); // Tests nothing! } ``` All three tests pass. None catch bugs. Traditional CI/CD only checks "did tests pass?" not "are tests meaningful?" #### Two-Phase Validation **Phase 1: Execution** (Does it work?) - Run the full test suite - Check for failures, errors, skipped tests - Validate tests complete successfully - Capture runtime metrics **Phase 2: Quality Audit** (Does it catch bugs?) - Scan for TODO/FIXME/incomplete markers - Detect hollow assertions (`assertTrue(true)`) - Check for missing edge cases - Identify mock abuse patterns - Verify negative test cases exist - Validate assertion meaningfulness #### Test Validator Agent **Agent: `php-test-validator`** (uses Opus model for deep reasoning) **Responsibilities**: 1. **Run tests first** (mandatory) - static analysis alone is insufficient 2. **Audit test quality** - check for anti-patterns 3. **Check coverage** - every public method has tests 4. **Validate edge cases** - null, empty, negative, boundary conditions 5. **Detect cheating** - mocks that bypass actual logic 6. 
**Report actionable findings** - specific file:line issues **Output Format**: ```markdown ## Test Validation Report **Verdict:** PASS | FAIL ### Test Suite Execution Tests: 42 passed, 2 failed, 1 incomplete ### Critical Issues (Must Fix) 1. Incomplete test: tests/Unit/UserTest.php:45 - `$this->markTestIncomplete('TODO')` - Fix: Implement the test 2. Hollow assertion: tests/Feature/ApiTest.php:67 - Only checks response code, not data - Fix: Add assertions for returned user data ### Coverage Gaps | Method | Test Coverage | Gap | |--------|---------------|-----| | `UserService::create()` | ✓ Tested | - | | `UserService::delete()` | ✗ Missing | No test exists | | `UserService::validate()` | △ Partial | No edge cases | ``` #### The Seven Deadly Test Sins **1. TODO/FIXME/Incomplete Tests** (Automatic Failure) ```php // FAIL: Deferred testing public function test_feature(): void { $this->markTestIncomplete('TODO: implement'); } // FAIL: Placeholder public function test_something(): void { $this->assertTrue(true); // Will do later } ``` **Detection**: Scan for `markTestIncomplete()`, `markTestSkipped()`, `TODO` comments, `assertTrue(true)` patterns. **2. Hollow Assertions** ```php // FAIL: No assertions public function test_operation(): void { $service->doSomething(); // Passes if no exception } // FAIL: Tautological public function test_calculation(): void { $result = $service->calculate(10, 20); $this->assertNotNull($result); // But is it correct? } ``` **Detection**: Tests with zero assertions, or only existence checks without value validation. **3. Missing Edge Cases** ```php // Code handles edge cases public function process(?int $value): int { if ($value === null) return 0; if ($value < 0) throw new Exception(); return $value * 2; } // FAIL: Only happy path tested public function test_process(): void { $this->assertEquals(20, $service->process(10)); // Missing: null, negative, zero, large numbers } ``` **Detection**: Compare test cases against branches/conditions in implementation. **4. Mock Abuse** ```php // FAIL: Mocking the system under test public function test_user_service(): void { $service = $this->createMock(UserService::class); $service->method('createUser')->willReturn(new User()); $result = $service->createUser($data); // Tests nothing! } // FAIL: Mock returns exactly what test expects public function test_validation(): void { $validator = $this->mock(Validator::class); $validator->shouldReceive('validate')->andReturn(true); // Never tests if validation logic actually works } ``` **Detection**: Mocking the class being tested, or mocking with predetermined results that bypass logic. **5. Missing Negative Tests** ```php // Code has error handling public function create(array $data): User { if (empty($data['email'])) throw new ValidationException(); if (User::where('email', $data['email'])->exists()) { throw new DuplicateException(); } return User::create($data); } // FAIL: Only success case tested public function test_create_user(): void { $user = $service->create(['email' => 'test@test.com']); $this->assertInstanceOf(User::class, $user); // Missing: empty email, duplicate email } ``` **Detection**: Exception/error handling in code without corresponding expectException tests. **6. Empty or Broken Data Providers** ```php // FAIL: Empty provider #[DataProvider('userDataProvider')] public function test_validates_user(array $data): void { } public static function userDataProvider(): array { return []; // No test data! 
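    // With an empty provider the test method never runs against any data,
    // so it contributes zero coverage while still appearing in the suite.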
} ``` **Detection**: DataProvider annotation without method, or provider returning empty array. **7. Brittle or Flaky Patterns** ```php // FAIL: Timing-based tests public function test_async_operation(): void { $service->startAsync(); sleep(2); // Hope it finishes? $this->assertTrue($service->isComplete()); } // FAIL: Order-dependent tests #[Depends('test_creates_user')] public function test_updates_user(): void { // Breaks if test order changes } ``` **Detection**: `sleep()`/`usleep()` calls, `@depends` annotations, missing database refresh traits. #### Validation Process (Five Steps) **Step 1: Run Test Suite** (Mandatory First) ```bash cd project && php artisan test # or for specific files php artisan test --filter=UserServiceTest ``` Capture output: - Total passed/failed/skipped/incomplete - Risky tests (no assertions) flagged by PHPUnit - Execution time (unusually fast = potentially hollow) - Any runtime warnings **Step 2: Identify Test-Implementation Pairs** ``` app/Services/UserService.php → tests/Unit/Services/UserServiceTest.php app/Http/Controllers/UserController.php → tests/Feature/Http/Controllers/UserControllerTest.php ``` **Step 3: Coverage Check** For each public method: - Is there at least one test? - Are edge cases covered? - Are error conditions tested? - Do assertions verify actual behavior? **Step 4: Quality Audit** For each test method: - Has meaningful assertions (not just `assertOk()`) - Tests behavior, not implementation details - Mocks appropriately (dependencies, not system under test) - Would catch a bug if code broke **Step 5: Pattern Detection** Scan test files for: - TODO/FIXME markers - `assertTrue(true)` patterns - `markTestIncomplete()` / `markTestSkipped()` - Missing assertions after operations - Mock abuse (mocking system under test) - Sleep/timing dependencies - Hardcoded IDs or database-dependent values #### Integration with Implement-Issue Workflow The test validator runs in **Test Loop** (Stage 8 of implement-issue): ```bash loop (max 10 iterations): 1. Run tests (php-test-validator) → If failed: laravel-backend-developer fixes → re-test 2. Validate test quality (php-test-validator) → If hollow/incomplete: laravel-backend-developer improves → re-validate 3. 
Both passed: exit loop ``` **Example Iteration**: ``` Iteration 1: - Tests run: 45 passed, 3 failed - Fix: laravel-backend-developer addresses failures - Re-run: 48 passed Iteration 2: - Tests passed - Quality audit: Found 2 TODO tests, 1 hollow assertion - Fix: laravel-backend-developer completes TODOs, adds assertions - Re-validate: All quality checks passed Loop complete: Tests pass AND quality validated ``` #### Decision Framework **PASS when**: - ✓ All tests pass (no failures, no errors) - ✓ Zero incomplete/skipped tests - ✓ Zero TODO/FIXME markers - ✓ All test methods have meaningful assertions - ✓ Edge cases covered - ✓ Error conditions tested - ✓ No mock abuse detected - ✓ No timing dependencies **FAIL when**: - ✗ Any test failures - ✗ Tests marked incomplete/skipped - ✗ TODO/FIXME in test files - ✗ Tests without assertions - ✗ Only happy path tested - ✗ Mocking system under test - ✗ PHPUnit reports "risky" tests - ✗ Tests would pass even with broken code #### Cross-Language Adaptation **Python (pytest)**: ```python # Similar anti-patterns def test_user_creation(): pass # FAIL: Empty test def test_validation(): assert True # FAIL: Hollow assertion def test_api(mocker): service = mocker.Mock(UserService) service.create.return_value = User() # FAIL: Mocking system under test ``` **JavaScript (Jest)**: ```javascript // Similar detection test('creates user', () => { // FAIL: No expectations service.createUser(data); }); test('validates input', () => { expect(result).toBeTruthy(); // FAIL: Vague assertion }); test('service method', () => { const mock = jest.fn().mockReturnValue(user); // FAIL: Mock bypasses logic }); ``` **Go (testing package)**: ```go // Similar patterns func TestUserCreation(t *testing.T) { // FAIL: No assertions service.CreateUser(data) } func TestValidation(t *testing.T) { if result != nil { // FAIL: Only checking existence } } ``` #### Key Insights **Why This Matters**: - Traditional CI only checks "tests pass" - Passing tests ≠ good tests - Bad tests provide false confidence - Bugs slip through to production - Technical debt accumulates **What's Different**: - Two-phase validation (execution + quality) - Automated quality auditing - Agent detects anti-patterns - Actionable, specific feedback - Prevents "checkbox testing" **Benefits**: - Catches hollow tests before merge - Enforces meaningful test coverage - Reduces false confidence - Improves actual test quality - Teaches better testing patterns #### Replication Strategy **1. Define Anti-Patterns for Your Stack**: ```yaml # .claude/agents/test-validator.md Anti-patterns: - TODO markers - Empty test bodies - Hollow assertions - Mock abuse - Missing edge cases - No negative tests ``` **2. Build Test Runner + Auditor**: ```bash # Step 1: Run tests pytest --verbose # Step 2: Static analysis grep -r "TODO\|FIXME" tests/ grep -r "assert True" tests/ # Step 3: Coverage check pytest --cov=src --cov-report=term-missing ``` **3. Create Quality Schemas**: ```json { "verdict": "pass|fail", "test_execution": { "passed": 45, "failed": 0, "skipped": 0 }, "quality_issues": [ { "type": "hollow_assertion", "file": "tests/test_user.py", "line": 67, "fix_required": "Add specific value assertions" } ] } ``` **4. Integrate with Workflow**: ```bash # After implementation run_tests() if tests_fail: fix_and_retest() validate_test_quality() if quality_fail: improve_tests() ``` **5. 
Track Metrics**: - Test quality improvements over time - Common anti-patterns in your codebase - Effectiveness of different agents - Time saved catching issues early ### 7. Project-Specific Skills **Concept**: While many foundational skills come from the community, project-specific skills capture your unique workflow, conventions, and domain knowledge. **Custom Skills in This Project**: **Created from Books/Documentation**: - **bulletproof-frontend/** - CSS architecture from "Handcrafted CSS: More Bulletproof Web Design" - Process: PDF → text → Claude extraction → 5+ refinement rounds - Initial draft: Generic CSS patterns - Iteration 1: Added project design tokens - Iteration 2: Added "No Tailwind" anti-pattern - Iteration 3: Added Blade template specifics - Iteration 4: Added coordination with Laravel agent - Iteration 5+: Real code review feedback incorporated - **ui-design-fundamentals/** - Component patterns from same book - Multiple supporting files (buttons.md, forms.md, colors.md, typography.md) - Each component extracted separately then refined with project examples **Created from Team Experience**: - **handle-issues/** - GitHub issue workflow specific to the team - **implement-issue/** - End-to-end implementation pipeline for this project - **process-pr/** - Pull request review process matching team standards - **review-ui/** - UI review criteria specific to design system - **write-docblocks/** - Documentation standards for this codebase - **brainstorming/** - Structured ideation process for this team **Key Differences from Community Skills**: - Reference project-specific tools and conventions - Include actual file paths and directory structures - Mention specific agents by name for coordination - Capture team-specific anti-patterns learned from real mistakes - Integrate with project automation (hooks, scripts) **Example: From Book to Skill** (`skills/bulletproof-frontend/SKILL.md`): ```yaml --- name: bulletproof-frontend description: Use for CSS architecture, responsive design, Blade templates --- # Created from "Handcrafted CSS: More Bulletproof Web Design" # Refined through multiple rounds with project specifics ## Project Context **Tech Stack**: Laravel Blade, PostCSS, No Tailwind (semantic CSS only) **Design System**: Custom tokens in /resources/css/tokens/ **Browser Support**: Last 2 versions, IE11 graceful degradation **Coordination**: Defer PHP logic to laravel-backend-developer agent ## Anti-Patterns (from actual code reviews) - **NEVER use Tailwind utility classes** - converts to semantic CSS - **Avoid inline styles** - all styling in dedicated CSS files - **No !important** - specificity issues indicate architecture problem ``` **Book Extraction Process**: 1. Convert PDF to text (if needed) 2. Feed to Claude: "Extract key CSS concepts into skill format" 3. Review initial draft (generic patterns) 4. Add project tech stack and tooling 5. Include team conventions (no Tailwind, semantic CSS) 6. Add coordination rules (defer to backend agent) 7. Test with real refactoring tasks 8. Incorporate feedback from code reviews 9. Iterate (this skill had 5+ refinement rounds) **Why Project-Specific Skills Matter**: - Capture institutional knowledge that isn't generic - Enable new team members (or AI) to understand conventions quickly - Coordinate with project's specific agent ecosystem - Reference actual project structure and tools - Evolve with the project through continuous refinement **Replication Strategy**: 1. 
Start with appropriate source: community for methodology, books for domain expertise, experience for workflows 2. Create initial draft through adaptation/extraction/capture 3. Add project context (tech stack, tools, directory structure) 4. Document coordination with your specific agents 5. Capture anti-patterns from actual code reviews 6. Reference real file paths and commands 7. Test with real tasks 8. Refine through multiple iterations 9. Keep refining as project evolves ### 8. Prompt Templates **Concept**: Reusable, parameterized prompts for common tasks. **Key Patterns**: - **Placeholder Syntax** - `{{variable_name}}` for parameter substitution - **Context Sections** - structured information (issue, requirements, constraints) - **Output Format** - explicit structure requirements - **Example Responses** - show expected output format **Example Application** (`prompts/frontend/refactor-blade-thorough.md`): ```markdown # Blade Refactoring Prompt ## Context File: {{file_path}} Issues: {{identified_issues}} ## Requirements - Convert utility classes to semantic CSS - Follow design system patterns - Maintain accessibility ## Output Format - Files changed - CSS added - Classes replaced - Testing performed ``` **Why This Works**: - Consistency across invocations - Easy to maintain and update - Clear expectations for outputs - Parameterization enables reuse **Replication Strategy**: 1. Identify repetitive prompt patterns 2. Extract parameters as placeholders 3. Include context, requirements, and output format 4. Provide examples of expected outputs 5. Store in prompts/ directory by category **File-Specific Applications**: - **prompts/frontend/audit-blade.md**: Systematic Blade template analysis - **prompts/frontend/refactor-blade-basic.md**: Quick refactoring for simple cases - **prompts/frontend/refactor-blade-thorough.md**: Deep refactoring with testing ### 11. Foundational Skills from Multiple Sources **Concept**: Several foundational skills came from different sources and were adapted/refined for the project. **Skills and Their Origins**: **From Community (obra/superpowers)**: - `test-driven-development` - TDD workflow, customized with project test runners - `systematic-debugging` - Root cause analysis, adapted with project debugging tools - `dispatching-parallel-agents` - Parallel execution framework - `subagent-driven-development` - Multi-agent coordination, adapted for Laravel workflow - `writing-skills` / `writing-agents` - Meta-skills for extending the system - `using-skills` - Skill discovery and invocation patterns **From Books/Documentation**: - `ui-design-fundamentals` - Extracted from "Handcrafted CSS: More Bulletproof Web Design" - `bulletproof-frontend` - CSS architecture patterns from same book - Both refined through 5+ rounds of adding project specifics **From Experience**: - `handle-issues` - GitHub workflow captured from team process - `process-pr` - PR review process from actual code reviews - `implement-issue` - End-to-end pipeline evolved over multiple iterations - `brainstorming` - Team ideation process documented **The Refinement Pattern**: All skills, regardless of origin, went through similar evolution: 1. **Initial draft** (adapted/extracted/captured) 2. **Project context** (tech stack, directory structure, tools) 3. **Team patterns** (conventions from code reviews) 4. **Anti-patterns** (mistakes from actual failures) 5. **Coordination** (references to specific agents) 6. **Testing** (validation with real tasks) 7. 
**Iteration** (multiple refinement rounds) **Why Multiple Sources Work**: - Community skills provide proven methodologies - Books provide authoritative domain expertise - Experience captures unique team workflows - All need project-specific customization to be effective ### 9. Git Worktrees **Key Patterns**: - **Branch Isolation** - each worktree on different branch - **Shared Git State** - common .git directory - **Parallel Development** - work on multiple features simultaneously - **Clean Switching** - no stashing required **Example Application** (`skills/using-git-worktrees/SKILL.md`): ```bash # Create worktree for feature branch git worktree add ../project-feature-x feature/x # Work in that directory cd ../project-feature-x # When done, remove worktree git worktree remove ../project-feature-x ``` **Why This Works**: - No branch switching interrupts work - Can test multiple branches simultaneously - Clean separation of concerns - Easier subagent coordination (each in own worktree) **Replication Strategy**: 1. Document worktree commands for your workflow 2. Explain when to use vs regular branching 3. Include cleanup procedures 4. Show integration with orchestration scripts ### 10. Schema Validation **Concept**: JSON schemas define expected structure for stage outputs, enabling validation. **Key Patterns**: - **One Schema Per Stage** - explicit output structure - **Type Safety** - validate data types - **Required Fields** - prevent missing data - **Format Constraints** - URLs, dates, enums **Example Application** (`scripts/schemas/implement-issue-setup.json`): ```json { "$schema": "http://json-schema.org/draft-07/schema#", "type": "object", "required": ["branch_name", "worktree_path", "tasks"], "properties": { "branch_name": { "type": "string", "pattern": "^feature/issue-[0-9]+" }, "worktree_path": { "type": "string" }, "tasks": { "type": "array", "items": { "type": "object", "required": ["description", "agent"], "properties": { "description": {"type": "string"}, "agent": {"type": "string"} } } } } } ``` **Why This Works**: - Fail fast on invalid outputs - Self-documenting stage contracts - Enables reliable orchestration - Catches errors before downstream stages **Replication Strategy**: 1. Define one schema per workflow stage 2. Specify all required fields 3. Add format validations (patterns, enums) 4. Validate in orchestration scripts 5. Use schemas as documentation **File-Specific Applications** (all in `scripts/schemas/`): - **implement-issue-setup.json**: Branch and worktree creation - **implement-issue-plan.json**: Implementation plan structure - **implement-issue-implement.json**: Task completion tracking - **implement-issue-test.json**: Test results and coverage - **implement-issue-pr.json**: Pull request metadata ## Cross-Stack Replication Guide ### For Any Language/Framework **1. Create Directory Structure** ```bash mkdir -p .claude/{agents,hooks,prompts,scripts,skills} ``` **2. Build Your Skill Library (Choose Your Approach)** **Approach A: Adapt from Community** ```bash # Browse and copy foundational skills cp -r superpowers/skills/using-skills .claude/skills/ cp -r superpowers/skills/test-driven-development .claude/skills/ cp -r superpowers/skills/systematic-debugging .claude/skills/ # Customize for your project # - Update test runner commands (pytest, jest, cargo test) # - Add your linting/formatting tools # - Include your debugging tools and workflows ``` **Approach B: Extract from Books/Documentation** ```bash # Example: Extract React patterns from official docs # 1. 
Copy React documentation sections to file # 2. Use Claude to extract patterns: "I have the React documentation on hooks. Please extract: - Key concepts into a skill (skills/react-hooks/SKILL.md) - Common patterns and anti-patterns - Project-specific: We use TypeScript strict mode - Include examples using our design system" # 3. Iterate through 3-5 refinement rounds # 4. Add team-specific patterns from code reviews ``` **Approach C: Capture from Experience** ```bash # Document your unique workflows mkdir .claude/skills/deployment-workflow mkdir .claude/skills/incident-response # Write skills capturing your team's actual process # Include: Tools used, commands, anti-patterns from actual incidents ``` **3. Create Language-Specific Agents** ```yaml # Example: Python Django Agent --- name: django-backend-developer description: Senior Python/Django developer. Use for models, views, serializers, middleware, ORM queries, migrations, and pytest. --- You are a senior Python/Django developer with expertise in Django 5.x, Python 3.12, and PostgreSQL... ## Anti-Patterns to Avoid - **N+1 queries** - always use `select_related()` and `prefetch_related()` - **Never use `.filter().count()`** - use `.count()` directly - **Use `get_object_or_404()`** - not try/except DoesNotExist ``` **4. Configure Hooks** ```json { "hooks": { "PostToolUse": [ { "matcher": "Edit|Write", "hooks": [{ "type": "command", "command": "black $file && isort $file", "timeout": 30 }] } ] } } ``` **5. Build Orchestration Scripts** - Adapt state machine to your workflow stages - Use JSON schemas for validation - Implement resume capability - Add rate limit handling ### Technology-Specific Examples **React/TypeScript Project**: ``` .claude/ ├── agents/ │ ├── react-component-developer.md │ ├── typescript-type-architect.md │ └── jest-test-specialist.md ├── skills/ │ ├── test-driven-development/ # Adapted from community │ ├── react-patterns/ # Extracted from React docs │ ├── typescript-patterns/ # Extracted from TS handbook │ ├── component-testing/ # Captured from team experience │ └── deployment-workflow/ # Captured from team process └── settings.json (ESLint + Prettier hooks) ``` **Python Data Science Project**: ``` .claude/ ├── agents/ │ ├── data-engineer.md │ ├── ml-model-developer.md │ └── jupyter-notebook-specialist.md ├── skills/ │ ├── test-driven-development/ # Adapted from community │ ├── data-validation/ # Extracted from "Data Quality" book │ ├── model-evaluation/ # Extracted from ML textbooks │ ├── visualization-patterns/ # Captured from team standards │ └── experiment-tracking/ # Captured from workflow └── settings.json (black + mypy hooks) ``` **DevOps/Infrastructure Project**: ``` .claude/ ├── agents/ │ ├── terraform-architect.md │ ├── kubernetes-operator.md │ └── ci-cd-engineer.md ├── skills/ │ ├── systematic-debugging/ # Adapted from community │ ├── infrastructure-as-code/ # Extracted from HashiCorp docs │ ├── deployment-strategies/ # Extracted from "Release It!" 
book │ ├── incident-response/ # Captured from actual incidents │ └── monitoring-observability/ # Captured from team runbooks └── settings.json (terraform fmt hooks) ``` **Skill Source Strategy by Domain**: | Domain | Adapt from Community | Extract from Books/Docs | Capture from Experience | |--------|---------------------|------------------------|------------------------| | **Methodology** | TDD, debugging, git | N/A | Team retrospectives | | **Framework** | General patterns | Official documentation | Project conventions | | **Design** | Basic principles | Design books, style guides | Design system | | **Architecture** | SOLID, patterns | Architecture books | System decisions | | **DevOps** | Git workflows | Tool documentation | Incident runbooks | | **Domain Logic** | N/A | Domain textbooks | Business rules | ## Key Success Factors ### 1. Choose the Right Source for Each Skill **Don't force one approach for everything:** - Methodologies (TDD, debugging) → Adapt from community - Domain expertise (CSS, security, ML) → Extract from books - Team workflows (deployment, PR process) → Capture from experience - Most skills combine multiple sources through iteration ### 2. Expect Multiple Refinement Rounds **Initial drafts are starting points, not final products:** - Round 1: Get the basic structure (adapt/extract/capture) - Round 2: Add project tech stack and tools - Round 3: Include team conventions and patterns - Round 4: Add anti-patterns from real code reviews - Round 5+: Continuous refinement based on usage **Example: UI design skill evolution** - Draft: Generic CSS patterns from book - Round 1: Project design tokens - Round 2: "No Tailwind" from team decision - Round 3: Blade template specifics - Round 4: Coordination with backend agent - Round 5: Real refactoring examples ### 3. Work Through Issues to Completion, Then Update **The Continuous Improvement Loop - Most Important Pattern** When skills, agents, or workflows fail or produce incorrect results, follow this process: **Step 1: Don't Update Yet - Solve the Problem First** ``` ❌ WRONG: Agent fails → immediately edit agent → hope it works ✅ RIGHT: Agent fails → work through to correct solution → update agent ``` **Why this matters**: You need to understand the *correct* solution before you can teach it. Updating before solving often encodes incorrect assumptions or partial solutions. **Step 2: Work Through to the Correct Solution** Use Claude to iteratively debug and reach the right answer: ``` Agent produces incorrect code → Run tests (fail) ↓ Analyze failure → Understand root cause ↓ Try fix attempt 1 → Run tests (still fail, different error) ↓ Analyze new failure → Refine understanding ↓ Try fix attempt 2 → Run tests (pass) ↓ Verify solution is correct, not just passing ↓ NOW you have the correct solution ``` **Step 3: Ask Claude to Update the Skill/Agent** Once you have the correct solution, prompt: ``` "I just encountered this issue: [describe problem] The agent/skill did: [incorrect behavior] The correct solution was: [working solution] Please update [skill/agent name] to prevent this issue. Add: 1. Specific guidance that would have caught this 2. An anti-pattern entry for the incorrect approach 3. An example showing the correct pattern 4. 
A red flag if this is a common rationalization" ``` **Real Example from Project**: **Issue Encountered**: ```php // Agent wrote this (seems to work, but breaks in production) public function getUsers() { return User::all(); // Works in dev (100 users), OOM in prod (1M users) } ``` **Work Through Process**: ``` Iteration 1: Add pagination public function getUsers() { return User::paginate(50); // Better, but breaks API contract } Iteration 2: Add chunking public function getUsers() { return User::chunk(1000, function($users) { // Process batch }); } // Wrong pattern for this use case Iteration 3: Correct solution public function getUsers(int $page = 1, int $perPage = 50) { return User::paginate($perPage, ['*'], 'page', $page); // Returns paginated response, maintains API contract } ``` **Update Agent**: ```markdown ## Anti-Patterns to Avoid - **NEVER use `Model::all()` on large tables** - Problem: Loads entire table into memory (OOM in production) - Symptom: Works in dev, fails in production with large datasets - Solution: Always use pagination: `Model::paginate($perPage)` - Red flag: "It works in my local database" ## Red Flags - STOP and Reconsider - "It works with my test data" → Test with production-scale data - "Model::all() is simpler" → Simplicity that breaks at scale is complexity ``` **Step 4: Test the Update** Run the same scenario with updated skill/agent: - Does it now produce correct code? - Does it catch the anti-pattern? - Does it provide the right guidance? If not, refine the update and retest. ### 4. Build Knowledge from Failures **Failure → Refinement Cycle**: ```dot digraph improvement { rankdir=LR; "Use skill/agent" [shape=box]; "Issue occurs" [shape=diamond]; "Work through to correct solution" [shape=box, style=filled, fillcolor=yellow]; "Understand root cause" [shape=box, style=filled, fillcolor=yellow]; "Update skill/agent" [shape=box, style=filled, fillcolor=lightgreen]; "Add anti-pattern" [shape=box]; "Add red flag" [shape=box]; "Test updated version" [shape=box]; "Use skill/agent" -> "Issue occurs"; "Issue occurs" -> "Continue working" [label="no issue"]; "Issue occurs" -> "Work through to correct solution" [label="issue found"]; "Work through to correct solution" -> "Understand root cause"; "Understand root cause" -> "Update skill/agent"; "Update skill/agent" -> "Add anti-pattern"; "Add anti-pattern" -> "Add red flag"; "Add red flag" -> "Test updated version"; "Test updated version" -> "Use skill/agent" [label="improvement verified"]; } ``` **Track Patterns Across Failures**: Keep a log of common issues: ```markdown ## Common Issues Log ### Issue: Agent uses Model::all() on large tables - Occurred: 3 times (UserService, OrderService, ProductService) - Root cause: Agent doesn't consider production data scale - Solution: Added "NEVER use Model::all()" anti-pattern - Prevention: Added red flag "Works in dev" → "Test at scale" - Result: Zero occurrences after update ### Issue: Tests with sleep() instead of event-based waiting - Occurred: 5 times (async operations, polling, race conditions) - Root cause: Agent defaults to timing instead of conditions - Solution: Added condition-based-waiting skill - Prevention: Red flag "Use sleep() to wait" - Result: All new tests use proper wait patterns ``` ### 5. 
### 5. Examples of Iterative Refinement

**Example 1: Laravel Backend Agent**

**Initial Version** (Generic):

```yaml
description: PHP/Laravel backend developer
Anti-patterns:
- Write clean code
- Follow best practices
```

**After Issue #1** (N+1 queries in UserController):

```yaml
Anti-patterns:
- **N+1 prevention** — Always eager load with `with()`
- Never use `Model::all()` on large tables
```

**After Issue #2** (Used `env()` in Service class):

```yaml
Anti-patterns:
- **N+1 prevention** — Always eager load with `with()`
- **Never use `env()`** outside config files — Use `config()` helper
- Never use `Model::all()` on large tables
```

**After Issue #3** (Tests lacked RefreshDatabase):

```yaml
Anti-patterns:
- **N+1 prevention** — Always eager load with `with()`
- **Never use `env()`** outside config files
- Never use `Model::all()` on large tables
- **Missing `RefreshDatabase`** in feature tests — Tests contaminate each other

Red Flags:
- "Tests pass locally but fail in CI" → Missing RefreshDatabase
- "Works in dev" → Test with production-scale data
```

**Example 2: Test Validator Agent**

**Initial Version**:

```markdown
Validate tests have assertions
```

**After Hollow Test Issue**:

````markdown
### Hollow Assertions

Tests that pass but don't verify behavior:

```php
// FAIL: Only asserting response code, not content
public function test_api_returns_users(): void
{
    $response = $this->get('/api/users');
    $response->assertOk(); // What about the users?
}
```

Flag: Response checks without data validation
````

**After Mock Abuse Issue**:

````markdown
### Hollow Assertions

[previous content]

### Brittle/Cheating Mocks

Mocks that bypass the actual logic being tested:

```php
// FAIL: Mocking the system under test
public function test_user_service(): void
{
    $service = $this->createMock(UserService::class);
    $service->method('createUser')->willReturn(new User()); // Tests nothing!
}
```

Flag: Mocking the class being tested
````

### 6. When to Update vs When to Discard

**Update When**:

- Issue is fixable with clearer guidance
- Pattern is close but needs refinement
- Anti-pattern can prevent future occurrences
- Skill/agent is fundamentally sound

**Discard/Rewrite When**:

- Fundamental approach is wrong
- Skill fights against better patterns
- Multiple unrelated issues from same skill
- Easier to start fresh than patch

**Example - Discard**:

```markdown
# Original skill: "Always use mocks in unit tests"
# After issues: Actually need real objects for domain logic tests
# Decision: Skill fundamentally wrong, rewrite with nuance
# New skill: "Use mocks for boundaries, real objects for domain"
```

### 7. Pressure Testing After Updates

After updating a skill/agent, test it under pressure:

**Create Scenarios That Previously Failed**:

```
Updated agent to avoid Model::all()
    ↓
Test: "Create a service that fetches all users"
    ↓
Does agent now use pagination?
    ↓
YES: Update verified
NO: Refine update, add more explicit guidance
```

**Test Related Patterns**:

```
Updated: "Never use env() outside config"
    ↓
Test: "Read database connection settings in service"
    ↓
Does agent use config('database.connection')?
    ↓
Verify it doesn't fall back to env()
```

(A sketch of what this scenario should produce appears at the end of this section.)

**Test Under Time Pressure**:

```
Add to the test prompt: "This is urgent, just make it work"
    ↓
Does agent still fall back into the anti-patterns?
    ↓
If yes: Add stronger language, make the rule non-negotiable
```
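To make the "Test Related Patterns" scenario above concrete: after the `env()` update, the agent should reach for the config repository rather than the environment. The class below is only a minimal sketch of the expected output (`ReportService` is an invented name; the `config()`/`env()` behavior noted in the comments is standard Laravel):

```php
<?php

namespace App\Services;

class ReportService
{
    // Anti-pattern the updated agent should now refuse to write:
    //   $connection = env('DB_CONNECTION'); // typically null once config is cached

    public function connectionName(): string
    {
        // Correct pattern: read settings through Laravel's config repository,
        // which keeps working after `php artisan config:cache`.
        return config('database.default');
    }
}
```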
### 8. Track Improvements Over Time

**Maintain a changelog for each skill/agent**:

```markdown
## CHANGELOG

### 2025-01-15: Added N+1 query prevention
- Issue: UserController loaded all orders without eager loading
- Solution: Added "Always use with()" anti-pattern
- Verification: Tested with large datasets, no more N+1s

### 2025-01-20: Added env() restriction
- Issue: Service class called env('API_KEY') directly
- Solution: Added "Never use env() outside config" rule
- Verification: Scanned codebase, all env() calls in config/

### 2025-01-25: Added RefreshDatabase reminder
- Issue: Feature tests contaminating each other
- Solution: Added "Missing RefreshDatabase" anti-pattern
- Verification: All new tests include trait
```

**Benefits**:

- See evolution over time
- Understand why rules exist
- Share learning with team
- Identify patterns in failures

### 9. Test with Real Tasks

**Validate before relying on skills/agents:**

- Skills: Run pressure scenarios with subagents
- Agents: Test on representative domain tasks
- Hooks: Verify in actual workflow
- Keep iterating until they work reliably

### 10. Document Rationale and Sources

**Make origins and reasoning clear:**

- Note source (community/book/experience)
- Explain why patterns exist
- Document what problems they solve
- Include attribution for adapted/extracted content

### 11. Maintain Discoverability

**Keep skills findable:**

- Rich, searchable descriptions
- Clear naming conventions
- Cross-references between components
- Regular pruning of unused skills

### 12. Balance Generic and Specific

**Find the right level of abstraction:**

- Too generic → Not actionable for your project
- Too specific → Breaks when project evolves
- Sweet spot → Project-specific but adaptable

**Example balance:**

```markdown
# Too generic (not useful)
"Write good CSS"

# Too specific (breaks easily)
"Use class .btn-primary-lg-blue from line 47 of app.css"

# Right balance (project-specific but adaptable)
"Use semantic button classes from design system tokens
- .btn--primary for main actions
- .btn--secondary for supporting actions
See /resources/css/components/buttons.css"
```

## Common Patterns Across Files

### Pattern: Flowchart-Driven Decision Making

**Files**: All major skills (TDD, subagent-driven-development, dispatching-parallel-agents)

**Concept**: Visual flowcharts clarify when to use a pattern and how to execute it.

**Implementation**:

```dot
digraph decision {
    "Have plan?" [shape=diamond];
    "Tasks independent?" [shape=diamond];
    "Use subagent workflow" [shape=box];
}
```

**Why**: Reduces cognitive load, provides clear decision criteria, visually communicates process.

**Replicate For**: Any multi-step process with decision points.

### Pattern: Red Flags / Rationalization Tables

**Files**: TDD, writing-skills, subagent-driven-development

**Concept**: Anticipate and counter common justifications for skipping best practices.

**Implementation**:

```markdown
| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
```

**Why**: Pre-emptively addresses resistance to discipline, makes violations obvious.

**Replicate For**: Any prescriptive methodology that might be circumvented under pressure.

### Pattern: Skill-Specific Supporting Files

**Files**: systematic-debugging, ui-design-fundamentals, bulletproof-frontend

**Concept**: Main SKILL.md stays concise, detailed patterns in separate files.
**Implementation**:

```
skills/
  ui-design-fundamentals/
    SKILL.md       # Overview + quick reference
    buttons.md     # Button-specific patterns
    forms.md       # Form-specific patterns
    navigation.md  # Navigation patterns
```

**Why**: Keeps main file scannable while providing depth when needed.

**Replicate For**: Skills with multiple sub-domains or extensive reference material.

### Pattern: Explicit State Tracking

**Files**: implement-issue-orchestrator.sh, subagent-driven-development

**Concept**: Maintain explicit state that persists across subagent invocations.

**Implementation**:

```bash
# Track branch name explicitly
FEATURE_BRANCH="feature/issue-123"

# Include in every subagent dispatch
dispatch_implementer "$FEATURE_BRANCH" "$task_text"
```

**Why**: Subagents have no memory, must receive all context explicitly.

**Replicate For**: Multi-step workflows with fresh subagent per step.

### Pattern: Two-Stage Review

**Files**: subagent-driven-development, code-reviewer

**Concept**: Separate spec compliance from code quality - different concerns, different reviewers.

**Implementation**:

```markdown
1. Implement task
2. Spec reviewer: Does it match requirements?
3. Code quality reviewer: Is it well-built?
```

**Why**: Spec compliance prevents over/under-building, quality review ensures good implementation.

**Replicate For**: Any implementation workflow where "right thing" differs from "right way".

## Conclusion

This project demonstrates a practical AI development system built through:

- **Multiple Skill Sources** - Community adaptation, book extraction, experience capture
- **Project-Specific Automation** - Custom agents, hooks, and orchestration scripts
- **Workflow Integration** - Multi-stage pipelines with state management
- **Domain Specialization** - Agents with clear scope and coordination protocols
- **Continuous Improvement** - Failure-driven refinement loop

**The Three-Path Strategy**:

Skills can be built through different approaches depending on the type:

**Path 1: Adapt from Community**
- Start: Browse obra/superpowers for foundational patterns
- Customize: Update examples to your tech stack
- Example: TDD, debugging, git workflows

**Path 2: Extract from Books/Docs**
- Start: Convert authoritative source to text
- Process: Have Claude extract concepts into skill format
- Refine: Add project context through multiple rounds
- Example: UI design fundamentals from "Handcrafted CSS"

**Path 3: Capture from Experience**
- Start: Identify recurring team patterns
- Document: Write skill with real anti-patterns
- Example: GitHub workflow, PR processes

**The Critical Improvement Loop**:

The most important pattern: **Work through issues to the correct solution BEFORE updating skills/agents.**

```
Issue occurs → Work through to correct solution → Understand root cause
    ↓
Update skill/agent → Add anti-pattern → Add red flag → Test update
    ↓
Verify improvement → Log the learning → Continue
```

This loop is what makes the system self-improving:

- Skills get better with each failure
- Anti-patterns accumulate real experience
- Red flags prevent future rationalizations
- The system learns from actual usage

**What Worked in This Project**:

| Skill Type | Approach | Refinement Rounds | Key Improvements |
|-----------|----------|------------------|------------------|
| TDD methodology | Adapted from community | 2-3 | Added project test runners |
| Debugging patterns | Adapted from community | 3-4 | Added project-specific tools |
| UI design fundamentals | Book extraction | 5+ | Design tokens, no Tailwind, Blade specifics |
| Frontend architecture | Book extraction | 5+ | Project CSS architecture, agent coordination |
| GitHub workflows | Experience capture | 10+ | Real workflow failures → anti-patterns |
| Laravel backend agent | Created + experience | 15+ | N+1 queries, env() usage, RefreshDatabase |
| Test validator agent | Created + experience | 8+ | Hollow assertions, mock abuse, TODOs |
| Orchestration scripts | Created from scratch | 20+ | Resume capability, rate limits, state management |

Notice that the more complex components (agents, orchestrators) had more refinement rounds because they encountered more real-world issues.

**The Real Work**:

Regardless of approach, the value comes from:

1. **Iterative refinement** - Initial draft → project context → team patterns → anti-patterns from failures
2. **Working through issues** - Don't update until you understand the correct solution
3. **Testing with real tasks** - Pressure test skills, validate agents on actual work
4. **Capturing failures** - Each issue becomes an anti-pattern or red flag
5. **Coordination between components** - Agents reference skills, hooks trigger scripts, schemas validate
6. **Building institutional knowledge** - System improves as it encounters and solves problems

**Getting Started**:

1. **Pick 5-10 foundational skills** - TDD, debugging, git workflows (adapt from community)
2. **Create 2-3 domain skills** - Your framework/language expertise (extract from books/docs)
3. **Document 1-2 team workflows** - Your unique processes (capture from experience)
4. **Build 2-3 specialized agents** - With project context and coordination rules
5. **Add automation hooks** - For repetitive quality gates
6. **Use the system** - Let it fail, work through issues, update, repeat
7. **Track improvements** - Log common issues and how skills evolved

**Key Success Factor**:

Don't expect perfection on round 1. The first version of a skill or agent is a hypothesis; real usage reveals its issues. Working through those issues to the correct solution, then encoding that knowledge back into the skill or agent, is what makes the system valuable.

Initial drafts get you started. **The improvement loop is what makes it great.**

After 6 months of use:

- Generic skills become project-specific
- Anti-patterns reflect actual mistakes
- Red flags catch real rationalizations
- Agents coordinate smoothly
- Workflows handle edge cases
- The system embodies team knowledge

This is infrastructure that improves with use, not documentation that rots. The effort compounds.