
# Problem Register Template: Track Issues Systematically

Ready-to-use Problem Register templates for tracking issues during AI-assisted problem solving, with severity levels and a full audit trail.

Published October 29, 2025


The Problem Register is your audit trail during systematic problem solving. It tracks every issue discovered, its severity, root cause, proposed action, and resolution status.

## Why Use a Problem Register?

Benefits:

  • Complete view of all issues, not just the obvious ones
  • Prevents scope creep by tracking what's in/out of scope
  • Enables handoffs with full context for the next person
  • Creates accountability with explicit status tracking
  • Supports retrospectives to understand what went wrong

## Basic Template (Copy-Paste)

```markdown
## Problem Register

| ID | Sev | Category | Evidence | Root Cause | Proposed Action | Status | Confidence |
|----|-----|----------|----------|------------|-----------------|--------|------------|
| P-01 |  |  |  |  |  |  |  |
| P-02 |  |  |  |  |  |  |  |
| P-03 |  |  |  |  |  |  |  |
```
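
If you keep register entries in a structured form, a short script can render the same table. A minimal sketch in Python (the column names follow the template above; the `render_register` helper itself is hypothetical, not a required tool):

```python
# Render Problem Register entries as a markdown table.
COLUMNS = ["ID", "Sev", "Category", "Evidence", "Root Cause",
           "Proposed Action", "Status", "Confidence"]

def render_register(entries):
    """entries: list of dicts keyed by the column names above."""
    header = "| " + " | ".join(COLUMNS) + " |"
    divider = "|" + "|".join("-" * (len(c) + 2) for c in COLUMNS) + "|"
    rows = [
        "| " + " | ".join(str(e.get(c, "")) for c in COLUMNS) + " |"
        for e in entries
    ]
    return "\n".join([header, divider] + rows)

entries = [{"ID": "P-01", "Sev": "P1", "Category": "Database",
            "Evidence": "Queries taking 2-4s", "Root Cause": "Missing index",
            "Proposed Action": "CREATE INDEX", "Status": "Resolved",
            "Confidence": 0.85}]
print(render_register(entries))
```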

## Column Definitions

### ID

Format: P-01, P-02, etc.

Sequential identifier. Use leading zeros for sorting.
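
The leading zeros are what make plain string sorting work; for example, in Python:

```python
# Zero-padded IDs sort correctly as plain strings.
ids = [f"P-{n:02d}" for n in range(1, 12)]  # P-01 ... P-11
assert sorted(ids) == ids
# Without padding, "P-10" would sort before "P-2" lexicographically.
print(ids[:3])  # ['P-01', 'P-02', 'P-03']
```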

### Sev (Severity)

Values: P0, P1, P2

P0 (Critical):

  • Production completely down
  • Data loss or corruption
  • Active security vulnerability
  • Legal/financial impact
  • Must resolve before claiming done

P1 (High):

  • Partial outage (>10% users affected)
  • Core feature broken
  • Wrong business logic
  • Performance degradation (>2x normal)
  • Must resolve before claiming done

P2 (Medium):

  • Edge cases (<5% users)
  • Cosmetic issues
  • Technical debt
  • Nice-to-have improvements
  • Can defer with documented rationale

### Category

Examples:

  • Runtime, Build, Config, Database, Auth, API, Frontend, Backend, Performance, Security, Data Quality, UX, Documentation

Choose categories relevant to your domain.

### Evidence

Format: Short snippet or pointer

Good examples:

  • "EADDRINUSE" in startup logs
  • p95 latency 800ms (was 150ms)
  • 502 errors on /api/auth endpoint
  • Missing index on users.email, query takes 4s

Bad examples:

  • "It's slow" (not specific)
  • "Doesn't work" (no concrete observation)

### Root Cause

Format: Hypothesis about underlying cause

Good examples:

  • Start command doesn't bind to $PORT
  • Connection pool size too small (10 connections, 50 concurrent requests)
  • Missing environment variable DATABASE_URL

Bad examples:

  • "Bug" (too vague)
  • "User error" (deflecting responsibility)

If unknown, write: "Under investigation" and update when determined.

### Proposed Action

Format: Specific, reversible action

Good examples:

  • Update start script to use process.env.PORT
  • Increase pool size from 10 to 30
  • Add DATABASE_URL to Vercel environment variables

Bad examples:

  • "Fix it" (not actionable)
  • "Investigate more" (not an action; record the Root Cause as "Under investigation" instead)

### Status

Values: Planned, In Progress, Resolved, Blocked, Deferred

  • Planned: Root cause known, action identified, not started
  • In Progress: Currently being addressed
  • Resolved: Fixed and verified
  • Blocked: Cannot proceed (specify blocker in Evidence or Root Cause)
  • Deferred: Acknowledged but postponed (requires rationale)
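
If you want to enforce the lifecycle in tooling, a transition map is enough. A sketch (the allowed transitions are my reading of the definitions above, not a prescribed workflow):

```python
# Allowed status transitions, inferred from the definitions above.
TRANSITIONS = {
    "Planned":     {"In Progress", "Blocked", "Deferred"},
    "In Progress": {"Resolved", "Blocked", "Deferred"},
    "Blocked":     {"Planned", "In Progress"},
    "Deferred":    {"Planned"},
    "Resolved":    set(),  # don't reopen; add a new entry instead
}

def set_status(entry, new_status):
    """Update an entry's Status, rejecting transitions not in the map."""
    current = entry["Status"]
    if new_status not in TRANSITIONS.get(current, set()):
        raise ValueError(f"{entry['ID']}: {current} -> {new_status} not allowed")
    entry["Status"] = new_status
    return entry

p1 = {"ID": "P-01", "Status": "Planned"}
set_status(p1, "In Progress")
set_status(p1, "Resolved")
```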

### Confidence

Format: 0.0 to 1.0

Your confidence in the root cause hypothesis.

  • 0.0-0.3: Low confidence, multiple competing hypotheses
  • 0.4-0.6: Medium confidence, needs more investigation
  • 0.7-0.9: High confidence, clear evidence
  • 1.0: Certain (only use when verified with multiple methods)
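
The tips later in this guide suggest more investigation when confidence is below 0.7; that rule is easy to encode (the function name is my own, the 0.7 threshold comes from the tips):

```python
def needs_more_investigation(confidence):
    """Below 0.7 on the scale above, gather more evidence before fixing."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0.0, 1.0]")
    return confidence < 0.7

assert needs_more_investigation(0.5)       # medium confidence: keep digging
assert not needs_more_investigation(0.85)  # high confidence: proceed
```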


## Example: API Performance Issue

## Problem Register

| ID | Sev | Category | Evidence | Root Cause | Proposed Action | Status | Confidence |
|----|-----|----------|----------|------------|-----------------|--------|------------|
| P-01 | P1 | Database | Queries taking 2-4s to users table | Missing index on users.email | CREATE INDEX idx_users_email | Resolved | 0.85 |
| P-02 | P0 | Runtime | "Connection pool exhausted" every 30s in logs | Pool size (10) < concurrent requests (50) | Increase pool size to 30 | Resolved | 0.95 |
| P-03 | P1 | Performance | Memory grows to 85% under sustained load | Missing client.release() in error handlers | Add release() calls | Resolved | 0.75 |
| P-04 | P2 | Monitoring | No alert fired when p95 exceeded 500ms | Alert threshold too high (1000ms) | Lower threshold to 500ms | Deferred | 0.90 |

## Example: Deploy Failure

## Problem Register

| ID | Sev | Category | Evidence | Root Cause | Proposed Action | Status | Confidence |
|----|-----|----------|----------|------------|-----------------|--------|------------|
| P-01 | P0 | Build | Type error in dashboard/page.tsx:45 | Supabase types out of sync with schema | Regenerate types with npx supabase gen | Resolved | 0.90 |
| P-02 | P0 | Config | "Missing SUPABASE_SERVICE_KEY" in preview logs | Not set in Vercel env vars | Add to Vercel dashboard | Resolved | 1.0 |
| P-03 | P1 | Auth | OAuth callback returns 500, redirect_uri mismatch | Preview URL not in Supabase allowed list | Add *.vercel.app to allowed URLs | Resolved | 0.85 |

## Example: Data Analysis

## Problem Register

| ID | Sev | Category | Evidence | Root Cause | Proposed Action | Status | Confidence |
|----|-----|----------|----------|------------|-----------------|--------|------------|
| P-01 | P0 | Confound | Conversion 2.1% → 2.9% coincides with Black Friday | Sale inflated baseline comparison | Exclude sale days from analysis | Resolved | 0.95 |
| P-02 | P1 | Segment | Mobile conversion dropped 1.8% → 1.5% (p=0.04) | New checkout has UX issue on mobile | Flag to product team for investigation | Resolved | 0.80 |
| P-03 | P2 | Sample | Desktop segment has only 2K users | May lack power for desktop-specific conclusions | Document limitation, track for 2 more weeks | Deferred | 0.70 |

## Companion: Action Log Template

Use alongside Problem Register to document each fix:

### Pass N — Action Log

**Date**: YYYY-MM-DD HH:MM

**Problems Addressed**: P-01, P-02

**Changes Made**:
- `[file/command/decision]`
- `[file/command/decision]`

**Before → After**:
- Metric/observation before: [value/state]
- Metric/observation after: [value/state]

**Verification (Primary)**:
- Method: [how you verified, e.g., "health check endpoint"]
- Result: [what you observed]

**Verification (Independent)**:
- Method: [different method, e.g., "load test"]
- Result: [what you observed]

**New Signals Discovered?**
- [Yes/No]
- If yes: [brief description, added to Problem Register as P-XX]

**Next Pass Focus**:
- [What discovery method will you use next?]

## Example: Complete Pass Documentation

### Pass 1 — Action Log

**Date**: 2025-10-29 14:30

**Problems Addressed**: P-01 (Missing index)

**Changes Made**:
```sql
CREATE INDEX idx_users_email ON users(email);
```

**Before → After**:
- Query time: 2.4s → 0.05s
- API p95 latency: 800ms → 450ms

**Verification (Primary)**:
- Method: EXPLAIN ANALYZE on sample query
- Result: Index scan used, query completes in 50ms

**Verification (Independent)**:
- Method: Sample 10 API requests via curl
- Result: Average response time 450ms (improved but not at baseline)

**New Signals Discovered?**
- Yes: "Connection pool exhausted" appearing in logs every 30s
- Added to Problem Register as P-02

**Next Pass Focus**:
- Check database connection metrics and pool configuration

---

## Tips for Effective Problem Registers

### 1. Start Early
Create the register before making any changes. Capture your initial understanding.

### 2. Update Continuously
Add entries as you discover issues. Don't wait until the end.

### 3. Be Specific
Vague entries like "performance issue" don't help future you or teammates.

### 4. Link Evidence
Reference log timestamps, metric screenshots, PR numbers, commit hashes.

### 5. Track Confidence
If confidence is low (`<0.7`), consider additional investigation before applying the fix.

### 6. Don't Delete Entries
Mark as Resolved/Deferred instead. Preserves audit trail.

### 7. Review Before "Done"
Check: All P0/P1 items Resolved? Any new issues discovered in last pass?
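
This pre-done check can be automated. A minimal sketch that scans a register's markdown rows for open P0/P1 items (it assumes the exact column order of the template in this guide):

```python
def open_criticals(register_md):
    """Return IDs of P0/P1 entries whose Status is not Resolved."""
    open_ids = []
    for line in register_md.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 8 or not cells[0].startswith("P-"):
            continue  # skip header, divider, and non-data lines
        entry_id, sev, status = cells[0], cells[1], cells[6]
        if sev in ("P0", "P1") and status != "Resolved":
            open_ids.append(entry_id)
    return open_ids

register = """\
| ID | Sev | Category | Evidence | Root Cause | Proposed Action | Status | Confidence |
|----|-----|----------|----------|------------|-----------------|--------|------------|
| P-01 | P1 | Database | slow query | missing index | add index | Resolved | 0.85 |
| P-02 | P0 | Runtime | pool exhausted | pool too small | raise pool size | In Progress | 0.95 |
"""
print(open_criticals(register))  # ['P-02']
```

If the list is non-empty, you are not done yet.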

---

## Integration with Complete Mode

When using Complete Problem Solving Mode, the Problem Register is mandatory:

```
@complete-mode

Task = [Your task]
Context = [Context]

[DONE overlay — Domain]
- Zero criticals: Problem Register has 0 P0/P1 remaining
- Evidence pack: Problem Register with all entries
```

The AI will maintain the register throughout the conversation and refuse to claim done while P0/P1 items remain.

---

## Export Formats

### For GitHub Issues
```markdown
**Problem Register Summary**

Resolved (3):
- P-01: Missing index on users.email
- P-02: Connection pool too small
- P-03: Memory leak in error handlers

Deferred (1):
- P-04: Alert threshold too high (low priority)

[Full register in PR description]
```

### For Incident Reports

```markdown
## Root Cause Analysis

**Timeline** (from Problem Register):
1. 14:00 - P-01 identified: Slow queries
2. 14:15 - P-01 resolved: Index added
3. 14:20 - P-02 identified: Pool exhaustion
4. 14:35 - P-02 resolved: Pool increased
5. 14:40 - P-03 identified: Memory leak
6. 15:00 - P-03 resolved: Release calls added

**All Critical Issues**: Resolved
**Monitoring**: Added alerts for connection pool usage
```

### For Handoffs

```markdown
## Handoff: API Performance Work

**Completed**:
- See Problem Register P-01, P-02, P-03 (all Resolved)

**Remaining**:
- P-04 (Deferred): Alert tuning, low priority
- Consider adding connection pool metrics to dashboard

**Evidence Pack**:
- PR #789 with all changes
- Datadog screenshot showing p95 back to baseline
- Load test results attached
```

## Quick Start Checklist

  • Copy basic template to your working doc
  • Add first entry for known issue
  • Include Evidence and Sev for each entry
  • Update Status after each action
  • Verify all P0/P1 Resolved before done
  • Save final register for future reference