Technical Guide: AI Agents with Defined Outputs

When building production AI agents, controlling output format is critical. This guide covers techniques for ensuring AI agents produce consistent, structured, machine-readable outputs.

Why Defined Outputs Matter

Unstructured outputs are problematic for:

Automation: Can't reliably parse responses
Integration: Hard to connect with other systems
Validation: Can't verify correctness programmatically
Testing: Can't write automated tests
Scaling: Manual intervention required

Defined outputs solve these problems by enforcing structure.

Output Format Options

1. JSON Output

Best for: API responses, data processing, system integration

Template:

Return your response in this exact JSON format:
{
  "status": "success" | "error",
  "data": {
    // Your structured data here
  },
  "metadata": {
    "timestamp": "ISO 8601 timestamp",
    "confidence": 0.0-1.0
  }
}

Do not include any text outside the JSON object.

Example Agent:

You are a data extraction agent.

Task: Extract key information from customer emails.

Output format (strict JSON):
{
  "intent": "question" | "complaint" | "request" | "feedback",
  "sentiment": "positive" | "neutral" | "negative",
  "urgency": "low" | "medium" | "high",
  "category": "billing" | "technical" | "general",
  "key_points": ["point 1", "point 2"],
  "requires_human": boolean,
  "confidence": 0.0-1.0
}

Example:
Input: "I'm very frustrated. My payment failed 3 times today!"
Output:
{
  "intent": "complaint",
  "sentiment": "negative",
  "urgency": "high",
  "category": "billing",
  "key_points": ["payment failures", "multiple attempts"],
  "requires_human": true,
  "confidence": 0.95
}

2. XML Output

Best for: Complex hierarchical data, legacy system integration

Template:

Return response in XML format:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <status>success</status>
  <data>
    <!-- Structured data -->
  </data>
</response>

Example:

You are a content analyzer.

Output XML format:
<?xml version="1.0" encoding="UTF-8"?>
<analysis>
  <document>
    <title></title>
    <summary></summary>
    <topics>
      <topic confidence="0.0-1.0"></topic>
    </topics>
    <sentiment score="-1.0 to 1.0"></sentiment>
    <entities>
      <entity type="person|org|location"></entity>
    </entities>
  </document>
</analysis>

3. CSV Output

Best for: Tabular data, spreadsheet import, data analysis

Template:

Return data as CSV with headers:
Column1,Column2,Column3
Value1,Value2,Value3

Rules:
- Include header row
- Use commas as delimiters
- Escape commas in values with quotes
- No extra whitespace

4. Markdown Tables

Best for: Human-readable structured data, documentation

Template:

Return as markdown table:
| Column 1 | Column 2 | Column 3 |
|----------|----------|----------|
| Value 1  | Value 2  | Value 3  |

Rules:
- Align columns properly
- Use pipes for separators
- Include header separator row

5. YAML Output

Best for: Configuration files, hierarchical settings

Template:

Return as YAML:
key: value
nested:
  subkey: value
list:
  - item1
  - item2

Implementation Strategies

Strategy 1: Strict Schema Enforcement

Define schema upfront:

// Define TypeScript interface
interface ExtractedData {
  name: string;
  email: string;
  phone?: string;
  address: {
    street: string;
    city: string;
    zip: string;
  };
  preferences: string[];
}

// System prompt
const systemPrompt = `
Extract information and return as JSON matching this TypeScript interface:

interface ExtractedData {
  name: string;
  email: string;
  phone?: string;  // optional
  address: {
    street: string;
    city: string;
    zip: string;
  };
  preferences: string[];
}

Rules:
- All required fields must be present
- Use null for missing optional fields
- Validate email format
- Ensure arrays even if empty
- Return ONLY valid JSON
`;

Validation:

import { z } from 'zod';

const schema = z.object({
  name: z.string().min(1),
  email: z.string().email(),
  phone: z.string().optional(),
  address: z.object({
    street: z.string(),
    city: z.string(),
    zip: z.string().regex(/^\d{5}$/),
  }),
  preferences: z.array(z.string()),
});

async function extractData(text: string): Promise<ExtractedData> {
  const response = await callAI(systemPrompt, text);
  const parsed = JSON.parse(response);
  
  // Validate against schema
  const validated = schema.parse(parsed);
  
  return validated;
}

Strategy 2: Example-Driven Formatting

Provide concrete examples:

You extract product information from descriptions.

Output format (JSON):
{
  "name": string,
  "price": number,
  "currency": "USD" | "EUR" | "GBP",
  "category": string,
  "features": string[],
  "in_stock": boolean
}

Examples:

Input: "iPhone 15 Pro - $999 - Latest flagship with titanium design"
Output:
{
  "name": "iPhone 15 Pro",
  "price": 999,
  "currency": "USD",
  "category": "Smartphones",
  "features": ["titanium design", "flagship"],
  "in_stock": true
}

Input: "MacBook Air M2, €1299, lightweight laptop, currently unavailable"
Output:
{
  "name": "MacBook Air M2",
  "price": 1299,
  "currency": "EUR",
  "category": "Laptops",
  "features": ["lightweight", "M2 chip"],
  "in_stock": false
}

Now extract from: {USER_INPUT}

Strategy 3: Multi-Step Validation

Step 1: Generate

Extract information from the text.

Step 2: Structure

Format the extracted information as JSON following this schema:
{SCHEMA}

Step 3: Validate

Verify the JSON is valid and all required fields are present.
If any field is missing or invalid, mark it clearly.

Step 4: Retry if needed

async function extractWithRetry(text: string, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      const result = await extract(text);
      const validated = validateSchema(result);
      return validated;
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      // Retry with more specific prompt
    }
  }
}

Real-World Examples

Example 1: Invoice Processing Agent

Goal: Extract structured data from invoices

System Prompt:

You are an invoice processing agent.

Extract information and return as JSON:
{
  "invoice_number": string,
  "date": "YYYY-MM-DD",
  "vendor": {
    "name": string,
    "address": string,
    "tax_id": string
  },
  "customer": {
    "name": string,
    "address": string
  },
  "items": [
    {
      "description": string,
      "quantity": number,
      "unit_price": number,
      "total": number
    }
  ],
  "subtotal": number,
  "tax": number,
  "total": number,
  "currency": "USD" | "EUR" | "GBP",
  "payment_terms": string
}

Validation rules:
- All monetary values as numbers (not strings)
- Dates in ISO format
- Verify total = subtotal + tax
- Verify items totals sum to subtotal
- Return null for missing fields

Usage:

const invoice = await processInvoice(invoiceText);

// Automated checks
if (invoice.total !== invoice.subtotal + invoice.tax) {
  throw new Error('Invoice totals do not match');
}

// Store in database
await db.invoices.create(invoice);

// Trigger payment workflow
if (invoice.total > 10000) {
  await approvalWorkflow.start(invoice);
}

Example 2: Customer Feedback Analyzer

System Prompt:

Analyze customer feedback and return structured insights.

Output JSON format:
{
  "feedback_id": string,
  "timestamp": "ISO 8601",
  "analysis": {
    "sentiment": {
      "overall": "positive" | "neutral" | "negative",
      "score": -1.0 to 1.0,
      "confidence": 0.0 to 1.0
    },
    "topics": [
      {
        "name": string,
        "sentiment": "positive" | "neutral" | "negative",
        "mentions": number
      }
    ],
    "actionable_items": [
      {
        "category": "bug" | "feature_request" | "improvement" | "question",
        "priority": "low" | "medium" | "high",
        "description": string
      }
    ],
    "customer_effort": 1-5,
    "likely_to_churn": boolean,
    "requires_followup": boolean
  },
  "metadata": {
    "processing_time_ms": number,
    "model_version": string
  }
}

Dashboard Integration:

const analyses = await Promise.all(
  feedbacks.map(fb => analyzeFeedback(fb))
);

// Aggregate metrics
const metrics = {
  averageSentiment: avg(analyses.map(a => a.analysis.sentiment.score)),
  topTopics: groupBy(analyses.flatMap(a => a.analysis.topics), 'name'),
  highPriorityIssues: analyses
    .flatMap(a => a.analysis.actionable_items)
    .filter(item => item.priority === 'high'),
  churnRisk: analyses.filter(a => a.analysis.likely_to_churn).length,
};

// Update dashboard
await dashboard.update(metrics);

Example 3: Content Classification Agent

System Prompt:

Classify content into predefined categories.

Output JSON (strict schema):
{
  "classification": {
    "primary_category": string,
    "secondary_categories": string[],
    "confidence_scores": {
      [category: string]: number  // 0.0 to 1.0
    }
  },
  "attributes": {
    "content_type": "article" | "video" | "image" | "audio",
    "language": "en" | "es" | "fr" | etc.,
    "reading_level": 1-12,
    "word_count": number
  },
  "flags": {
    "is_promotional": boolean,
    "is_time_sensitive": boolean,
    "requires_fact_check": boolean,
    "contains_pii": boolean
  },
  "suggested_tags": string[],
  "seo": {
    "meta_title": string,
    "meta_description": string,
    "keywords": string[]
  }
}

Categories: {CATEGORY_LIST}

Always include all fields. Use null for unavailable data.

Advanced Techniques

Technique 1: Nested Validation

const addressSchema = z.object({
  street: z.string(),
  city: z.string(),
  state: z.string().length(2),
  zip: z.string().regex(/^\d{5}(-\d{4})?$/),
});

const personSchema = z.object({
  name: z.string(),
  email: z.string().email(),
  addresses: z.array(addressSchema),
});

const orderSchema = z.object({
  id: z.string().uuid(),
  customer: personSchema,
  items: z.array(z.object({
    sku: z.string(),
    quantity: z.number().positive(),
    price: z.number().positive(),
  })),
  total: z.number().positive(),
}).refine(
  (order) => {
    const calculatedTotal = order.items.reduce(
      (sum, item) => sum + item.quantity * item.price,
      0
    );
    return Math.abs(calculatedTotal - order.total) < 0.01;
  },
  { message: "Total doesn't match items" }
);

Technique 2: Conditional Fields

Output JSON with conditional fields:
{
  "type": "person" | "company",
  // If type === "person":
  "first_name": string,
  "last_name": string,
  "date_of_birth": "YYYY-MM-DD",
  // If type === "company":
  "company_name": string,
  "founded_date": "YYYY-MM-DD",
  "employee_count": number,
  // Common fields:
  "email": string,
  "phone": string
}

Rules:
- Include type-specific fields based on type value
- Omit fields that don't apply
- All common fields always present

Technique 3: Versioned Schemas

Output JSON (schema v2.1):
{
  "schema_version": "2.1",
  "data": {
    // v2.1 structure
  }
}

Schema changelog:
- v2.1: Added "metadata.processed_by" field
- v2.0: Changed "status" from string to enum
- v1.0: Initial schema

Always include schema_version for backward compatibility.

Error Handling

Graceful Degradation

async function extractWithFallback<T>(
  text: string,
  schema: z.Schema<T>
): Promise<T | PartialT> {
  try {
    const result = await extract(text);
    return schema.parse(result);
  } catch (error) {
    // Try partial parsing
    const partial = schema.partial().parse(result);
    logger.warn('Partial extraction', { partial, error });
    return partial;
  }
}

Validation Errors

try {
  const data = schema.parse(aiOutput);
} catch (error) {
  if (error instanceof z.ZodError) {
    const formatted = error.errors.map(err => ({
      field: err.path.join('.'),
      message: err.message,
      received: err.received,
    }));
    
    // Log for improvement
    logger.error('Schema validation failed', { formatted });
    
    // Retry with more specific prompt
    return await retryWithClarification(formatted);
  }
}

Testing Defined Outputs

Unit Tests

describe('Invoice Extraction Agent', () => {
  it('should extract valid invoice data', async () => {
    const sample = readFile('sample-invoice.txt');
    const result = await extractInvoice(sample);
    
    // Schema validation
    expect(() => invoiceSchema.parse(result)).not.toThrow();
    
    // Business logic validation
    expect(result.total).toBe(result.subtotal + result.tax);
    expect(result.items.length).toBeGreaterThan(0);
  });
  
  it('should handle missing optional fields', async () => {
    const minimal = 'Invoice #123, Total: $100';
    const result = await extractInvoice(minimal);
    
    expect(result.invoice_number).toBe('123');
    expect(result.total).toBe(100);
    expect(result.vendor).toBeNull();
  });
});

Integration Tests

describe('End-to-end extraction', () => {
  it('should process and store invoice', async () => {
    const invoice = await extractInvoice(sampleText);
    const stored = await db.invoices.create(invoice);
    
    expect(stored.id).toBeDefined();
    expect(stored.total).toBe(invoice.total);
  });
});

Best Practices

Define schemas strictly: Be explicit about types, formats, required vs. optional
Provide examples: Show 3-5 examples of perfect outputs
Validate always: Never trust AI output without validation
Handle errors gracefully: Have fallback strategies
Version your schemas: Plan for evolution
Log failures: Learn from validation errors
Test extensively: Unit and integration tests
Monitor in production: Track validation failure rates

Conclusion

Defined outputs transform AI agents from experimental to production-ready. By enforcing structure through schemas, validation, and error handling, you create reliable, scalable systems.

Next Steps:

Choose your output format (JSON recommended)
Define strict schemas
Implement validation
Create example prompts
Test thoroughly
Monitor and iterate

Structured outputs are the foundation of production AI systems.