Retry & Error Handling

Building resilient workflows with retries and error strategies

Worlds Engine provides comprehensive retry mechanisms and error handling strategies.

Retry Strategies

Exponential Backoff

Doubles delay between retries:

const apiCall = activity('api-call', handler, {
  retry: {
    maxAttempts: 5,
    backoff: 'exponential',
    initialInterval: 1000,    // 1s
    maxInterval: 30000,       // 30s cap
    multiplier: 2
  }
});

Retry delays: 1s → 2s → 4s → 8s → 16s

Linear Backoff

Increases delay linearly:

{
  retry: {
    maxAttempts: 3,
    backoff: 'linear',
    initialInterval: 2000,    // 2s
    maxInterval: 10000,       // 10s cap
    multiplier: 1
  }
}

Retry delays: 2s → 4s → 6s

Constant Backoff

Fixed delay between retries:

{
  retry: {
    maxAttempts: 5,
    backoff: 'constant',
    initialInterval: 5000     // Always 5s
  }
}

Predefined Retry Patterns

import { retryPatterns } from 'worlds-engine';

// API calls
const apiActivity = activity('api', handler, {
  retry: retryPatterns.api
});
// { maxAttempts: 5, backoff: 'exponential', initialInterval: 1000, maxInterval: 30000, multiplier: 2 }

// Database operations
const dbActivity = activity('db', handler, {
  retry: retryPatterns.database
});
// { maxAttempts: 3, backoff: 'exponential', initialInterval: 500, maxInterval: 10000, multiplier: 2 }

// Network requests
const networkActivity = activity('network', handler, {
  retry: retryPatterns.network
});
// { maxAttempts: 5, backoff: 'exponential', initialInterval: 2000, maxInterval: 60000, multiplier: 3 }

withRetry Helper

Wrap any async function with retry logic:

import { withRetry, retryPatterns } from 'worlds-engine';

const result = await withRetry(
  async () => {
    const response = await fetch('https://api.example.com/data');
    if (!response.ok) throw new Error('API error');
    return response.json();
  },
  retryPatterns.api
);

Retryable Functions

Create retryable versions of functions:

import { retryable, retryPatterns } from 'worlds-engine';

const fetchData = retryable(
  async (url: string) => {
    const response = await fetch(url);
    return response.json();
  },
  retryPatterns.network
);

// Use with automatic retries
const data = await fetchData('https://api.example.com');

Error Classes

FatalError

Marks error as non-retryable:

import { FatalError } from 'worlds-engine';

const validateInput = activity('validate', async (ctx, input) => {
  if (!input.email) {
    throw new FatalError('Email is required'); // Won't retry
  }
  return { valid: true };
});

RetryableError

Marks error as retryable with optional delay:

import { RetryableError } from 'worlds-engine';

const fetchAPI = activity('fetch-api', async (ctx, input) => {
  const response = await fetch(input.url);
  
  if (response.status === 429) {
    // Rate limited - retry after 10s
    throw new RetryableError('Rate limited', 10000);
  }
  
  if (response.status >= 500) {
    // Server error - use default retry
    throw new RetryableError('Server error');
  }
  
  return response.json();
});

Conditional Retry

Decide whether to retry based on error type:

import { shouldRetryError } from 'worlds-engine';

const smartRetry = activity('smart-retry', async (ctx, input) => {
  try {
    return await someOperation();
  } catch (error) {
    if (shouldRetryError.network(error)) {
      throw error; // Will retry
    }
    throw new FatalError('Non-network error'); // Won't retry
  }
});

Available predicates (combined in the sketch after this list):

  • shouldRetryError.network(error) - Network errors
  • shouldRetryError.rateLimit(error) - Rate limit errors
  • shouldRetryError.server(error) - 5xx server errors
  • shouldRetryError.transient(error) - Any transient error
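
These predicates can be combined in a single catch block to route each error class differently. A minimal sketch, reusing someOperation and the error classes shown above (the exact matching rules of each predicate are engine-defined):

import { shouldRetryError, FatalError, RetryableError } from 'worlds-engine';

const resilientOp = activity('resilient-op', async (ctx, input) => {
  try {
    return await someOperation();
  } catch (error) {
    if (shouldRetryError.rateLimit(error)) {
      // Back off longer when the upstream service is rate limiting
      throw new RetryableError('Rate limited', 15000);
    }
    if (shouldRetryError.network(error) || shouldRetryError.server(error)) {
      throw error; // Transient - let the activity's retry policy handle it
    }
    throw new FatalError('Permanent failure - not retrying');
  }
});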

Failure Strategies

Retry Strategy

Restart workflow from beginning:

const retryWorkflow = workflow('retry', handler, {
  failureStrategy: 'retry',
  retry: {
    maxAttempts: 3,
    backoff: 'exponential',
    initialInterval: 5000
  }
});

Status progression: pending → running → failed → pending → running

Compensate Strategy

Execute saga compensations in reverse:

const compensateWorkflow = workflow('compensate', handler, {
  failureStrategy: 'compensate'
});

Status progression: pending → running → failed → compensating → compensated

Cascade Strategy

Propagate failure to parent and children:

const cascadeWorkflow = workflow('cascade', handler, {
  failureStrategy: 'cascade'
});

Cancels parent workflow and all child workflows.

Ignore Strategy

Mark failed without action:

const ignoreWorkflow = workflow('ignore', handler, {
  failureStrategy: 'ignore'
});

Final status: failed (no retries or compensations)

Quarantine Strategy

Isolate for debugging:

const quarantineWorkflow = workflow('quarantine', handler, {
  failureStrategy: 'quarantine'
});

Preserves full state, no automatic retry or cleanup.

Timeout Handling

Activity Timeouts

const slowOperation = activity('slow-operation', handler, {
  timeout: '5m'  // 5 minutes
});

Supported formats (mixed in the sketch after this list):

  • '1000' or 1000 - Milliseconds
  • '5s' - Seconds
  • '10m' - Minutes
  • '2h' - Hours
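
Both the numeric and string forms can be mixed freely; a brief sketch (the activity names and handler here are placeholders):

// Numeric values are treated as milliseconds
const quickCheck = activity('quick-check', handler, { timeout: 1500 });

// String shorthand is easier to read for longer windows
const nightlyExport = activity('nightly-export', handler, { timeout: '2h' });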

Workflow Timeouts

const timedWorkflow = workflow('timed', handler, {
  timeout: '1h'
});

Throws timeout error if execution exceeds limit.

Heartbeat Timeouts

For long-running activities:

const longTask = activity('long-task', async (ctx, input) => {
  for (let i = 0; i < 1000; i++) {
    await processItem(i);
    ctx.heartbeat(`Processing item ${i}`); // Keep alive
  }
}, {
  heartbeatTimeout: '30s'
});

Error Propagation

Errors flow through the workflow hierarchy:

Activity Error
    ↓
Activity Retry Logic
    ↓ (if exhausted)
Workflow Error Handler
    ↓
Failure Strategy
    ↓
Compensation / Retry / Cascade
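
Putting the chain together: an activity's own retry policy absorbs transient failures first; only when it is exhausted (or a FatalError is thrown) does the error reach the workflow, whose failure strategy decides what happens next. A minimal sketch, assuming a hypothetical paymentGateway client and checkoutHandler (how the workflow handler invokes the activity is omitted here):

import { RetryableError, FatalError, retryPatterns } from 'worlds-engine';

// The activity retries transient gateway errors on its own
const chargeCard = activity('charge-card', async (ctx, input) => {
  const res = await paymentGateway.charge(input);     // hypothetical external client
  if (res.status === 503) {
    throw new RetryableError('Gateway unavailable');  // retried per retryPatterns.api
  }
  if (res.status === 402) {
    throw new FatalError('Card declined');            // skips retries, propagates immediately
  }
  return res;
}, { retry: retryPatterns.api });

// Once the activity's retries are exhausted, the workflow's failure strategy takes over
const checkout = workflow('checkout', checkoutHandler, {
  failureStrategy: 'compensate'  // run saga compensations in reverse, as described above
});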

Best Practices

1. Use Appropriate Backoff

// Fast operations - linear or constant
const cacheRead = activity('cache', handler, {
  retry: { backoff: 'constant', maxAttempts: 3, initialInterval: 100 }
});

// External APIs - exponential
const apiCall = activity('api', handler, {
  retry: { backoff: 'exponential', maxAttempts: 5, initialInterval: 1000 }
});

2. Set Reasonable Timeouts

// Quick operations
const validate = activity('validate', handler, { timeout: '5s' });

// External API calls
const fetchData = activity('fetch', handler, { timeout: '30s' });

// Data processing
const processLarge = activity('process', handler, { timeout: '10m' });

3. Use FatalError for Business Logic

const checkCredit = activity('credit-check', async (ctx, input) => {
  const credit = await getCreditScore(input.userId);
  
  if (credit < 600) {
    // Business rule violation - don't retry
    throw new FatalError('Insufficient credit score');
  }
  
  return { approved: true };
});

4. Heartbeat Long Operations

const batchProcess = activity('batch', async (ctx, items) => {
  for (const [index, item] of items.entries()) {
    await process(item);
    
    if (index % 10 === 0) {
      ctx.heartbeat(`Processed ${index}/${items.length}`);
    }
  }
}, {
  heartbeatTimeout: '1m'
});

5. Monitor Retry Metrics

const monitored = activity('monitored', handler, {
  retry: retryPatterns.api,
  onRetry: (attempt, error) => {
    console.log(`Retry ${attempt}: ${error.message}`);
    metrics.increment('activity.retry', { activity: 'monitored' });
  }
});

Examples

API with Circuit Breaker

let failureCount = 0;
let lastFailure = 0;
const CIRCUIT_THRESHOLD = 5;
const CIRCUIT_TIMEOUT = 60000;

const apiCall = activity('api-with-circuit', async (ctx, input) => {
  // Check circuit breaker
  if (failureCount >= CIRCUIT_THRESHOLD) {
    if (Date.now() - lastFailure < CIRCUIT_TIMEOUT) {
      throw new FatalError('Circuit breaker open');
    }
    failureCount = 0; // Reset after timeout
  }
  
  try {
    const response = await fetch(input.url);
    failureCount = 0; // Reset on success
    return response.json();
  } catch (error) {
    failureCount++;
    lastFailure = Date.now();
    throw new RetryableError('API call failed');
  }
}, {
  retry: retryPatterns.api
});

Retry with Fallback

const dataFetch = activity('fetch-with-fallback', async (ctx, input) => {
  try {
    return await fetchFromPrimary(input);
  } catch (error) {
    if (ctx.attempt < 3) {
      throw error; // Retry primary
    }
    // Use fallback after retries exhausted
    return await fetchFromSecondary(input);
  }
}, {
  retry: { maxAttempts: 3, backoff: 'exponential', initialInterval: 1000 }
});

Next Steps