Intermediate · 10 min read · Guides

Complete Mode in Action: Real Conversation Examples

See how Complete Problem Solving Mode catches issues that standard prompting misses through multi-pass discovery and independent verification.

Published October 29, 2025



Example 1: API Performance Degradation

The Problem

Production API response times increased from 150ms (p95) to 800ms over 24 hours. No obvious errors in logs.

Standard Approach (Single Pass)

User: "API is slow, check the logs?"

AI: "Found slow database queries. Add an index on users.email."

Result: Performance improves to 600ms but is still 4x baseline. Two root causes are missed: connection pool exhaustion and a memory leak.


Complete Mode Approach (Multi-Pass)

@complete-mode

Task = Fix API response time degradation
Context = REST API, p95 went 150ms → 800ms in 24h, have logs + Datadog + DB access

[DONE overlay — Software/DevOps]
- Performance: p95 back to baseline (≤200ms) for ≥30min
- Logs clean: no errors last 2min
- Zero criticals
- Evidence: Datadog dashboard, load test results

scope:runtime depth:deep strict:on

Pass 1 — Application Logs:

  • Found: Slow queries (2-4s on users table)
  • Action: Added index on users.email
  • Result: Queries now 50ms, API improves to 450ms
  • Re-scan trigger: Still not at baseline
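A minimal sketch of the Pass 1 fix, assuming a Postgres database accessed through node-postgres; the `users.email` column comes from the example, while the connection setup and index name are illustrative:

```typescript
import { Pool } from "pg";

// Assumes DATABASE_URL is set in the environment (illustrative).
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// CONCURRENTLY builds the index without blocking writes on a live table.
// It must run outside an explicit transaction, which a plain pool.query does.
async function addEmailIndex(): Promise<void> {
  await pool.query(
    "CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_users_email ON users (email)"
  );
}
```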

Pass 2 — Database Connection Metrics:

  • Found: "Connection pool exhausted" every 30s
  • Action: Increased pool size 10 → 30
  • Result: p95 improves to 180ms
  • Re-scan trigger: Verify under sustained load

Pass 3 — Load Testing:

  • Found: Memory grows to 85% under sustained 100 req/s
  • Action: Fixed missing client.release() in error handlers
  • Result: Memory stable at 45%, p95=165ms
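Passes 2 and 3 together, sketched with node-postgres: raise the pool ceiling and release clients on every code path. The pool sizes match the example; the query, column names, and everything else around them are assumptions:

```typescript
import { Pool } from "pg";

// Pass 2: raise the pool ceiling so traffic bursts don't exhaust connections.
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 30, // was 10 in the example
});

// Pass 3: release the client even when the query throws.
// The leak came from error handlers that skipped client.release().
export async function getUserByEmail(email: string) {
  const client = await pool.connect();
  try {
    const { rows } = await client.query(
      "SELECT id, email FROM users WHERE email = $1",
      [email]
    );
    return rows[0] ?? null;
  } finally {
    client.release(); // returns the connection to the pool on every path
  }
}
```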

Final Evidence:

  • Datadog: p95=165ms for 30min (see screenshot)
  • Load test: 1000 reqs, p95=170ms, 0 errors
  • PR #789 with all changes + tests

What Was Caught: The standard approach found only 1 of 3 issues. Complete Mode found all 3 by using a different discovery method on each pass.


Example 2: Data Analysis — False Signal

The Problem

Product team claims new checkout flow increased conversion by 8%.

Standard Approach

User: "Did new checkout increase conversion?"

AI: "Yes! Conversion went from 2.1% to 2.9%. That's 38% relative improvement."

Result: The team announces success. It later turns out the increase was driven by a Black Friday sale that started the same day; actual conversion slightly decreased.


Complete Mode Approach

@complete-mode

Task = Validate conversion increase claim
Context = Launched 2 weeks ago, 50K sessions, have Amplitude + DB

[DONE overlay — Data/Analytics]
- Objective met: increase is real with `p<0.05`
- Statistical validity: controlled for confounds
- Reproducibility: SQL + Amplitude provided
- Zero criticals
- Evidence: plots, stats, query, dashboard

scope:experiment-design depth:deep strict:on

Pass 1 — Basic Metrics:

  • Found: 2.1% → 2.9% (p=0.03, appears significant)
  • Re-scan trigger: Check for external events
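The Pass 1 check is a standard two-proportion z-test. A minimal sketch, assuming you can pull raw session and conversion counts for each period; the function and the placeholder counts at the bottom are illustrative, not the example's actual data:

```typescript
// Two-proportion z-test for conversion rates (normal approximation).
function twoProportionZTest(conv1: number, n1: number, conv2: number, n2: number) {
  const p1 = conv1 / n1;
  const p2 = conv2 / n2;
  const pooled = (conv1 + conv2) / (n1 + n2);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2));
  const z = (p2 - p1) / se;
  return { p1, p2, z, pValue: 2 * (1 - normalCdf(Math.abs(z))) };
}

// Abramowitz–Stegun polynomial approximation of the standard normal CDF (x >= 0).
function normalCdf(x: number): number {
  const t = 1 / (1 + 0.2316419 * x);
  const d = Math.exp((-x * x) / 2) / Math.sqrt(2 * Math.PI);
  const poly =
    t *
    (0.31938153 +
      t * (-0.356563782 + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
  return 1 - d * poly;
}

// Arbitrary placeholder counts, not the article's data.
console.log(twoProportionZTest(120, 4800, 150, 5200));
```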

Pass 2 — External Events:

  • Found: Black Friday sale launched same day
  • Action: Exclude sale days from analysis
  • Result: 2.1% → 2.0% (p=0.68, NOT significant)
  • Re-scan trigger: Segment by device/demographic

Pass 3 — Segmentation:

  • Found: Mobile conversion dropped 1.8% → 1.5% (p=0.04)
  • Desktop: No change (2.4% → 2.6%, p=0.15)
  • Critical finding: Mobile UX regression
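A sketch of the kind of query behind Passes 2 and 3: exclude the sale window, then break conversion out by device. The table and column names (`sessions`, `converted`, `device_type`) are assumed, not the team's actual schema:

```typescript
// Confound-adjusted, segmented conversion query (hypothetical schema).
// $1/$2 bound the analysis window; $3/$4 bound the Black Friday sale window.
const conversionByDevice = `
  SELECT
    device_type,
    COUNT(*)                          AS sessions,
    COUNT(*) FILTER (WHERE converted) AS conversions,
    ROUND(100.0 * COUNT(*) FILTER (WHERE converted) / COUNT(*), 2) AS conversion_pct
  FROM sessions
  WHERE started_at >= $1
    AND started_at < $2
    AND started_at NOT BETWEEN $3 AND $4
  GROUP BY device_type
  ORDER BY device_type;
`;
```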

Final Evidence:

  • SQL queries showing confound-adjusted analysis
  • Time series plot highlighting sale spike
  • Segment breakdown revealing mobile issue
  • Statistical tests: overall p=0.68, mobile p=0.04

What Was Caught: The standard approach missed both the confound and the mobile regression. Complete Mode prevented a false announcement and identified an actionable UX fix.


Example 3: Deploy Failure (Vercel)

Standard Approach

User: "Vercel deploy failing, help?"

AI: "Looks like a build error. Check your TypeScript config."

Result: One issue is fixed, but the deploy still fails with a different error. It takes 5 iterations to resolve.


Complete Mode Approach

@complete-mode

Task = Fix Vercel production deploy
Context = Next.js 14 + TypeScript + Supabase, last success 3 days ago

[DONE overlay — Software/DevOps]
- Cloud health: /api/health 200 for ≥30min
- Logs clean: last 2min no errors
- Local gates: build + test + lint pass
- Zero criticals
- Evidence: health URL, logs, test output, deploy URL

scope:build depth:deep strict:on

Pass 1 — Build Logs:

| ID | Sev | Category | Evidence | Root Cause | Action | Status | Conf |
|----|-----|----------|----------|------------|--------|--------|------|
| P-01 | P0 | Build | Type error dashboard/page.tsx:45 | Supabase types out of sync | Regen types | Resolved | 0.9 |
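What "regenerate types" looks like on the consuming side, assuming the project uses the Supabase CLI's generated `Database` type; the import path and env var names follow common Next.js + Supabase conventions rather than anything confirmed by the example:

```typescript
import { createClient } from "@supabase/supabase-js";
// Regenerated with the Supabase CLI (`supabase gen types typescript`);
// the output path is a project convention, not a fixed location.
import type { Database } from "@/types/supabase";

// Typing the client with Database keeps table and column types in sync with
// the live schema, so drift surfaces as a compile-time error like P-01.
export const supabase = createClient<Database>(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!
);
```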

Pass 2 — Runtime Logs (Preview Deploy):

| ID | Sev | Category | Evidence | Root Cause | Action | Status | Conf |
|----|-----|----------|----------|------------|--------|--------|------|
| P-02 | P0 | Config | "Missing SUPABASE_SERVICE_KEY" | Not in Vercel env | Add to dashboard | Resolved | 1.0 |
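For P-02, failing fast when a required secret is missing turns a confusing runtime 500 into an obvious startup error. A minimal sketch; the variable name comes from the example, the rest is assumed:

```typescript
// Server-only module: read the service key once and fail loudly if it's absent.
const serviceKey = process.env.SUPABASE_SERVICE_KEY;

if (!serviceKey) {
  // Without this guard the deploy "succeeds" and only fails on the first request.
  throw new Error("SUPABASE_SERVICE_KEY is not set in this environment");
}

export const SUPABASE_SERVICE_KEY: string = serviceKey;
```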

Pass 3 — Manual Testing:

| ID | Sev | Category | Evidence | Root Cause | Action | Status | Conf |
|----|-----|----------|----------|------------|--------|--------|------|
| P-03 | P1 | Auth | OAuth callback 500, redirect_uri | Not in Supabase allowed list | Add preview URL pattern | Resolved | 0.85 |

What Was Caught: All 3 issues were found in sequence using a different discovery method each time (build logs → runtime logs → manual test). The standard approach would have required 5+ back-and-forth iterations.


Key Patterns Across Examples

Pattern 1: Different Discovery Methods Reveal Different Issues

| Method | Catches |
|--------|---------|
| Application logs | Runtime errors, exceptions |
| System metrics | Resource exhaustion, performance |
| Load testing | Behavior under stress |
| Manual testing | Integration issues |
| External data | Confounds, timing |
| Code review | Logic errors, security |

Lesson: Single method = single class of issues found. Multi-method = comprehensive discovery.

Pattern 2: Independent Verification Catches False Positives

  • Primary: Tests pass → Independent: Load test reveals a memory leak

  • Primary: Metric improved → Independent: Confound analysis shows the improvement is spurious

Lesson: Two verification methods prevent false confidence.

Pattern 3: Re-scan Triggers

Know when to keep going:

  • ✅ Baseline not restored
  • ✅ Only 1 issue found
  • ✅ Fix seems "too easy"
  • ✅ High-impact context
  • ✅ Cannot explain full symptom

Before/After Comparison

Typical Standard Prompting Session

User: Problem X
AI: Try solution A
User: *applies A*
User: Still broken
AI: Try solution B
User: *applies B*
User: Still broken
AI: Try solution C
...

Characteristics:

  • 5-10 iterations
  • Reactive fixing
  • No audit trail
  • Partial solutions
  • No verification strategy

Complete Mode Session

User: @complete-mode [Task + Context + Overlay]
AI: Pass 1 discovery → Fix P-01 → Verify → Re-scan
    Pass 2 discovery → Fix P-02 → Verify → Re-scan  
    Pass 3 discovery → Fix P-03 → Verify → DONE
    Evidence pack + Problem Register + What to Watch

Characteristics:

  • 1 structured request
  • Proactive discovery
  • Full audit trail
  • Comprehensive solution
  • Multi-method verification

When Complete Mode Saves Time

Seems slower: 3 passes vs 1 quick fix

Actually faster when:

  • Standard approach would need 5+ iterations to find all issues
  • Problem recurs within days/weeks (rework cost)
  • Handoff required (documentation time)
  • High stakes (mistake cost)

ROI calculation:

  • Complete Mode: roughly 2x the initial time of a quick fix
  • Avoided rework: roughly 10x that time saved
  • Net: about a 5x efficiency gain (10x saved ÷ 2x spent)

Try It Yourself: Practice Prompts

Exercise 1: Debug a "Working" Feature

@complete-mode

Task = Verify password reset flow is production-ready
Context = Feature branch merged, manual test passed once

[DONE overlay — Software/DevOps]
[Add appropriate gates]

depth:deep strict:on

Expected discoveries:

  • Email deliverability issues
  • Token expiration edge cases
  • Rate limiting gaps
  • Security header misconfigurations
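One of the "token expiration edge cases" this kind of run tends to surface, sketched as a guard you could add or test; the names and the one-hour window are assumptions, not part of the exercise:

```typescript
// Reject password-reset tokens that are expired, missing a timestamp, or dated in the future.
const RESET_TOKEN_MAX_AGE_MS = 60 * 60 * 1000; // 1 hour, assumed policy

function isResetTokenValid(issuedAt: Date | null, now: Date = new Date()): boolean {
  if (!issuedAt) return false; // edge case: token record with no issue time
  const age = now.getTime() - issuedAt.getTime();
  if (age < 0) return false;   // edge case: clock skew or tampered timestamp
  return age <= RESET_TOKEN_MAX_AGE_MS;
}
```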

Exercise 2: Validate an A/B Test

@complete-mode

Task = Confirm blue CTA button outperformed red
Context = 1 week test, 5K users each variant, 12% vs 10% conversion

[DONE overlay — Data/Analytics]
[Add appropriate gates]

scope:experiment-design depth:deep

Expected discoveries:

  • Sample size adequacy
  • Multiple testing correction
  • Temporal confounds (day of week, time of day)
  • Segment heterogeneity
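For "sample size adequacy", a quick power calculation shows whether 5K users per variant could plausibly detect a 10% → 12% lift. This is one standard approximation (two-sided 5% significance, 80% power), not the only way to check:

```typescript
// Approximate per-variant sample size for a two-proportion test.
// zAlpha = 1.96 (two-sided 5% significance), zBeta = 0.84 (80% power).
function requiredSampleSizePerVariant(
  p1: number,
  p2: number,
  zAlpha = 1.96,
  zBeta = 0.84
): number {
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  const delta = p1 - p2;
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / (delta * delta));
}

// Exercise 2's rates: 10% vs 12% conversion → roughly 3,800+ users per variant.
console.log(requiredSampleSizePerVariant(0.10, 0.12));
```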

Common Mistakes

Mistake 1: Using the same discovery method multiple times. Fix: Explicitly vary methods each pass (logs → metrics → tests).

Mistake 2: Accepting "looks good" without independent verification. Fix: Always require a second validation approach.

Mistake 3: Stopping after the first issue is resolved. Fix: Add require_passes:3 to force deeper exploration.

Mistake 4: Not documenting the Problem Register. Fix: Maintain the register from the start; it's your audit trail.


Next Steps

  1. Pick a recent "fixed" issue — Rerun Complete Mode on it and see what was missed
  2. Practice with low-stakes problems — Build comfort with the pattern
  3. Compare outcomes — Track recurrence rates before/after adoption
  4. Customize gates — Adapt domain overlays to your standards