Systematic Debugging

Overview

Random fixes waste time and create new bugs. Quick patches mask underlying issues.

**Core principle:** ALWAYS find root cause before attempting fixes. Symptom fixes are failure.

**Violating the letter of this process is violating the spirit of debugging.**

The Iron Law

NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST

If you haven't completed Phase 1, you cannot propose fixes.

When to Use

Use for ANY technical issue:

Test failures
Bugs in production
Unexpected behavior
Performance problems
Build failures
Integration issues

**Use this ESPECIALLY when:**

Under time pressure (emergencies make guessing tempting)
"Just one quick fix" seems obvious
You've already tried multiple fixes
Previous fix didn't work
You don't fully understand the issue

**Don't skip when:**

Issue seems simple (simple bugs have root causes too)
You're in a hurry (rushing guarantees rework)
Manager wants it fixed NOW (systematic is faster than thrashing)

The Four Phases

You MUST complete each phase before proceeding to the next.

Phase 1: Root Cause Investigation

**BEFORE attempting ANY fix:**

**Read Error Messages Carefully**
Don't skip past errors or warnings
They often contain the exact solution
Read stack traces completely
Note line numbers, file paths, error codes

**Reproduce Consistently**
Can you trigger it reliably?
What are the exact steps?
Does it happen every time?
If not reproducible → gather more data, don't guess

**Check Recent Changes**
What changed that could cause this?
Git diff, recent commits
New dependencies, config changes
Environmental differences

**Gather Evidence in Multi-Component Systems**

**WHEN system has multiple components (CI → build → signing, API → service → database):**

**BEFORE proposing fixes, add diagnostic instrumentation:**

   For EACH component boundary:
     - Log what data enters component
     - Log what data exits component
     - Verify environment/config propagation
     - Check state at each layer

   Run once to gather evidence showing WHERE it breaks
   THEN analyze evidence to identify failing component
   THEN investigate that specific component

**Example (multi-layer system):**

   # Layer 1: Workflow
   echo "=== Secrets available in workflow: ==="
   echo "IDENTITY: ${IDENTITY:+SET}${IDENTITY:-UNSET}"

   # Layer 2: Build script
   echo "=== Env vars in build script: ==="
   env | grep IDENTITY || echo "IDENTITY not in environment"

   # Layer 3: Signing script
   echo "=== Keychain state: ==="
   security list-keychains
   security find-identity -v

   # Layer 4: Actual signing
   codesign --sign "$IDENTITY" --verbose=4 "$APP"

**This reveals:** Which layer fails (secrets → workflow ✓, workflow → build ✗)

**Trace Data Flow**

**WHEN error is deep in call stack:**

See `root-cause-tracing.md` in this directory for the complete backward tracing technique.

**Quick version:**

Where does bad value originate?
What called this with bad value?
Keep tracing up until you find the source
Fix at source, not at symptom

Phase 2: Pattern Analysis

**Find the pattern before fixing:**

**Find Working Examples**
Locate similar working code in same codebase
What works that's similar to what's broken?

**Compare Against References**
If implementing pattern, read reference implementation COMPLETELY
Don't skim - read every line
Understand the pattern fully before applying

**Identify Differences**
What's different between working and broken?
List every difference, however small
Don't assume "that can't matter"

**Understand Dependencies**
What other components does this need?

MoltAwards - Agent internet for government contracts + jobs.

ScaleYour.email: Fill your calendar with sales calls

Get your AI tool in front of 30k+ AI enthusiasts

Deploy your own AI agent

Systematic Debugging