Risk Simulation

Risk Simulation lets you model failure scenarios and test how your application handles degraded services, errors, and edge cases. Run scenarios that simulate API outages, latency spikes, rate limits, and data corruption to validate resilience.

Why Use Risk Simulation

  • Test error handling - Verify your app gracefully handles API failures
  • Validate retries - Ensure retry logic works correctly
  • Simulate rate limits - Test how your app responds to 429 errors
  • Model latency - Identify timeout issues before production
  • Chaos engineering - Introduce controlled failures to find weaknesses

How It Works

  1. Create a scenario defining failure conditions (status codes, latency, error rates)
  2. Attach the scenario to a Sandbox
  3. Run the scenario to activate failure conditions
  4. Monitor results in real time via Server-Sent Events (SSE)
  5. Analyze impact on your application
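
The first steps above can be sketched in Python. This is a minimal illustration rather than an official client: the helper names are hypothetical, and the base URL and paths are taken from the curl examples later on this page. The functions only build the requests and do not send them.

```python
import json
import urllib.request

# Base URL copied from the curl examples on this page.
BASE_URL = "https://api.surestage.com/v1/risk-simulation"


def create_scenario_request(token: str, scenario: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a scenario."""
    return urllib.request.Request(
        f"{BASE_URL}/scenarios",
        data=json.dumps(scenario).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_scenario_request(token: str, scenario_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a scenario run."""
    return urllib.request.Request(
        f"{BASE_URL}/scenarios/{scenario_id}/start",
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )
```

Passing either request to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the payload easy to inspect in tests.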

Creating Scenarios

API Request

curl -X POST https://api.surestage.com/v1/risk-simulation/scenarios \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
  "name": "Payment Gateway Outage",
  "description": "Simulate complete payment service failure",
  "sandboxId": "sandbox_abc123",
  "conditions": [
    {
      "type": "error_rate",
      "target": "/payments/*",
      "errorRate": 1.0,
      "errorCode": 503,
      "errorMessage": "Service Temporarily Unavailable"
    }
  ],
  "duration": 300
}'

Response 201 Created

{
  "id": "scenario_xyz789",
  "name": "Payment Gateway Outage",
  "description": "Simulate complete payment service failure",
  "sandboxId": "sandbox_abc123",
  "conditions": [ /* ... */ ],
  "duration": 300,
  "status": "ready",
  "createdAt": "2026-03-21T10:00:00Z"
}

Parameters

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Scenario name |
| description | string | No | Scenario description |
| sandboxId | string | Yes | Target Sandbox ID |
| conditions | array | Yes | Array of failure conditions (see below) |
| duration | number | No | Duration in seconds (0 = manual stop) |
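
A client-side check against the parameters above can catch malformed payloads before they reach the API. The sketch below is illustrative only: `validate_scenario` is a hypothetical helper, and the rule that `duration` must not be negative is an assumption, not documented behavior.

```python
from typing import List

# Field names and types mirror the Parameters table above.
REQUIRED_FIELDS = {"name": str, "sandboxId": str, "conditions": list}
OPTIONAL_FIELDS = {"description": str, "duration": (int, float)}


def validate_scenario(payload: dict) -> List[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in payload:
            problems.append(f"missing required field: {field}")
        elif not isinstance(payload[field], expected):
            problems.append(f"wrong type for field: {field}")
    for field, expected in OPTIONAL_FIELDS.items():
        if field in payload and not isinstance(payload[field], expected):
            problems.append(f"wrong type for field: {field}")
    duration = payload.get("duration")
    # Assumption: negative durations are rejected (0 means manual stop).
    if isinstance(duration, (int, float)) and duration < 0:
        problems.append("duration must be 0 or greater")
    return problems
```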

Condition Types

Error Rate

Inject errors at a specific rate:

{
  "type": "error_rate",
  "target": "/users/*",
  "errorRate": 0.3,
  "errorCode": 500,
  "errorMessage": "Internal Server Error"
}

| Field | Description |
|---|---|
| errorRate | Fraction of requests that fail, from 0.0 (none) to 1.0 (all) |
| errorCode | HTTP status code to return |
| errorMessage | Error message in response |
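
To build intuition for what an `errorRate` of 0.3 means, the per-request decision can be modeled locally. This is a sketch of the semantics, assuming each request is decided independently; it is not the service's implementation, and `should_inject_error` is a hypothetical name.

```python
import random


def should_inject_error(error_rate: float, rng: random.Random) -> bool:
    """Decide whether one request fails, given a rate between 0.0 and 1.0."""
    # rng.random() is in [0.0, 1.0), so 0.0 never fails and 1.0 always fails.
    return rng.random() < error_rate


rng = random.Random(0)  # seeded so repeated runs give the same sequence
failures = sum(should_inject_error(0.3, rng) for _ in range(10_000))
print(failures / 10_000)  # observed failure share, close to 0.3
```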

Latency

Add artificial delay to responses:

{
  "type": "latency",
  "target": "/orders/*",
  "minLatency": 2000,
  "maxLatency": 5000
}

| Field | Description |
|---|---|
| minLatency | Minimum delay in milliseconds |
| maxLatency | Maximum delay in milliseconds |

Latency is randomized between min and max for each request.
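
Before running a latency scenario, it helps to know whether the configured range will trip your client timeout. The sketch below assumes delays are drawn uniformly between the two bounds (the text above only says "randomized") and ignores the backend's own response time. Both function names are hypothetical.

```python
import random


def sample_latency_ms(min_latency: int, max_latency: int, rng: random.Random) -> float:
    """Draw one injected delay in milliseconds (assumes a uniform distribution)."""
    if min_latency > max_latency:
        raise ValueError("minLatency must not exceed maxLatency")
    return rng.uniform(min_latency, max_latency)


def timeout_exposure(client_timeout_ms: int, min_latency: int, max_latency: int) -> str:
    """Classify how often the injected delay alone would exceed a client timeout."""
    if client_timeout_ms <= min_latency:
        return "always"     # even the shortest delay reaches the timeout
    if client_timeout_ms > max_latency:
        return "never"      # even the longest delay stays under the timeout
    return "sometimes"      # the timeout falls inside the delay range
```

With the condition above (2000 to 5000 ms), a client timeout of 3000 ms falls inside the range, so only some requests would time out.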

Rate Limit

Simulate rate limiting:

{
  "type": "rate_limit",
  "target": "/api/*",
  "requestsPerMinute": 10,
  "errorCode": 429,
  "errorMessage": "Too Many Requests"
}

| Field | Description |
|---|---|
| requestsPerMinute | Maximum requests allowed per minute |
| errorCode | Status code returned when the limit is exceeded |
| errorMessage | Error message |
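
A local model of the limit is useful for predicting when your test traffic should start seeing 429 responses. This page does not say which algorithm the Sandbox uses, so the fixed 60-second window below is an assumption, and `FixedWindowLimiter` is a hypothetical class.

```python
from typing import Optional


class FixedWindowLimiter:
    """Allow at most `requests_per_minute` requests in each 60-second window."""

    def __init__(self, requests_per_minute: int) -> None:
        self.limit = requests_per_minute
        self.window_start: Optional[float] = None
        self.count = 0

    def allow(self, now: float) -> bool:
        """Return True if a request at time `now` (in seconds) is within the limit."""
        # Open a new window on the first request or once 60 seconds have passed.
        if self.window_start is None or now - self.window_start >= 60.0:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # the real service would answer with errorCode here
```

Passing the clock in as `now` keeps the model deterministic, so it can be driven with synthetic timestamps instead of real waiting.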

Data Corruption

Return malformed or invalid data:

{
  "type": "data_corruption",
  "target": "/users/:id",
  "corruptionRate": 0.2,
  "corruptionType": "missing_fields"
}

| Field | Description |
|---|---|
| corruptionRate | Fraction of responses corrupted, from 0.0 (none) to 1.0 (all) |
| corruptionType | One of missing_fields, invalid_types, malformed_json |
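
On the client side, each corruption type calls for a different defensive check. The sketch below shows one way to detect all three; the `classify_response` helper and the shape of the expected-types mapping are hypothetical, and the exact form of corrupted payloads is not specified on this page.

```python
import json


def classify_response(body: str, expected_types: dict) -> str:
    """Return "ok" or the corruption type that the body appears to exhibit."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return "malformed_json"
    if not isinstance(data, dict):
        return "invalid_types"  # expected a JSON object at the top level
    if any(field not in data for field in expected_types):
        return "missing_fields"
    if any(not isinstance(data[field], kind) for field, kind in expected_types.items()):
        return "invalid_types"
    return "ok"


# Example expectation for a user resource (field names are illustrative).
USER_FIELDS = {"id": int, "email": str}
print(classify_response('{"id": 1}', USER_FIELDS))  # prints "missing_fields"
```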

Timeout

Force requests to time out:

{
  "type": "timeout",
  "target": "/external/*",
  "timeoutRate": 0.1
}

Matching requests selected by timeoutRate (here, 10% of requests to /external/*) hang indefinitely, forcing the client to time out.
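
A timeout condition is a good way to exercise retry logic. The sketch below retries a call with exponential backoff when it raises `TimeoutError`. It is an illustration under stated assumptions: `call_with_retries` is a hypothetical helper, and real HTTP clients often raise their own timeout exception types, so the `except` clause would need adapting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on TimeoutError with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return call()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the timeout to the caller
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter exists so tests can substitute a recorder instead of actually waiting.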

Running Scenarios

Start Scenario

curl -X POST https://api.surestage.com/v1/risk-simulation/scenarios/scenario_xyz789/start \
-H "Authorization: Bearer $TOKEN"

Response 202 Accepted

{
  "runId": "run_abc123",
  "scenarioId": "scenario_xyz789",
  "status": "running",
  "startedAt": "2026-03-21T11:00:00Z",
  "endsAt": "2026-03-21T11:05:00Z"
}

The scenario is immediately active. All requests to the Sandbox matching condition targets are affected.
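
In the sample response, `endsAt` is `startedAt` plus the scenario's 300-second duration. A test harness can compute the same value to know how long to keep sending traffic. This sketch assumes that relationship always holds and that a duration of 0 (manual stop) means there is no scheduled end; the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2026-03-21T11:00:00Z."""
    # Replace the trailing Z so older Python versions can parse the offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def expected_end(started_at: str, duration_seconds: int) -> Optional[datetime]:
    """Return the expected end time, or None for manually stopped scenarios."""
    if duration_seconds == 0:
        return None
    return parse_timestamp(started_at) + timedelta(seconds=duration_seconds)
```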

Stop Scenario

curl -X POST https://api.surestage.com/v1/risk-simulation/scenarios/scenario_xyz789/stop \
-H "Authorization: Bearer $TOKEN"

Response 200 OK

{
  "runId": "run_abc123",
  "status": "stopped",
  "stoppedAt": "2026-03-21T11:03:00Z"
}

The Sandbox immediately returns to normal behavior.

Real-Time Monitoring with SSE

Subscribe to scenario runs via Server-Sent Events to receive real-time updates.

Connect to SSE Stream

curl -N https://api.surestage.com/v1/risk-simulation/scenarios/scenario_xyz789/runs/run_abc123/stream \
-H "Authorization: Bearer $TOKEN"

Event Stream

event: run_started
data: {"runId":"run_abc123","startedAt":"2026-03-21T11:00:00Z"}

event: request_affected
data: {"method":"POST","path":"/payments","condition":"error_rate","result":"503"}

event: request_affected
data: {"method":"GET","path":"/payments/123","condition":"error_rate","result":"503"}

event: metrics_update
data: {"totalRequests":42,"affectedRequests":14,"errorRate":0.33}

event: run_completed
data: {"runId":"run_abc123","endedAt":"2026-03-21T11:05:00Z","totalRequests":156,"affectedRequests":52}

Event Types

| Event | Description |
|---|---|
| run_started | Scenario run began |
| request_affected | A request was impacted by a condition |
| metrics_update | Real-time metrics (sent every 5 seconds) |
| run_completed | Scenario run finished |
| run_stopped | Scenario manually stopped |
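
The event stream shown above can be consumed with a small parser. The sketch below handles the `event:` and `data:` fields used in the sample and assumes every `data` payload is JSON, as in the examples. `parse_sse` is a hypothetical helper, not part of any SDK.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def _field_value(line: str, prefix: str) -> str:
    """Return the text after `prefix`, dropping one optional leading space."""
    value = line[len(prefix):]
    return value[1:] if value.startswith(" ") else value


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event name, decoded JSON data) for each event in the stream."""
    event_name = "message"  # default event name when no event: field is sent
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line terminates the current event
            if data_lines:
                yield event_name, json.loads("\n".join(data_lines))
            event_name, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, commonly used as a keep-alive
        elif line.startswith("event:"):
            event_name = _field_value(line, "event:")
        elif line.startswith("data:"):
            data_lines.append(_field_value(line, "data:"))
    # Convenience for finite input: flush an event that lacks a trailing blank line.
    if data_lines:
        yield event_name, json.loads("\n".join(data_lines))
```

Feeding it the decoded lines of a streaming HTTP response would yield pairs such as `("metrics_update", {...})`, which a test can assert on as the run progresses.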

Analyzing Results

Get Run Summary

curl https://api.surestage.com/v1/risk-simulation/scenarios/scenario_xyz789/runs/run_abc123 \
-H "Authorization: Bearer $TOKEN"

Response 200 OK

{
  "runId": "run_abc123",
  "scenarioId": "scenario_xyz789",
  "status": "completed",
  "startedAt": "2026-03-21T11:00:00Z",
  "endedAt": "2026-03-21T11:05:00Z",
  "duration": 300,
  "metrics": {
    "totalRequests": 156,
    "affectedRequests": 52,
    "byCondition": {
      "error_rate": {
        "requests": 52,
        "errorRate": 1.0,
        "averageLatency": 12
      }
    },
    "byEndpoint": {
      "/payments": {
        "requests": 30,
        "errors": 30
      },
      "/payments/:id": {
        "requests": 22,
        "errors": 22
      }
    }
  }
}
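
A few derived numbers make a run summary easier to compare across runs. The sketch below computes the share of affected requests and the per-endpoint error rate from a response shaped like the one above. `summarize_run` and its output keys are hypothetical.

```python
def summarize_run(run: dict) -> dict:
    """Derive the affected share and per-endpoint error rates from a run summary."""
    metrics = run["metrics"]
    total = metrics["totalRequests"]
    affected_share = metrics["affectedRequests"] / total if total else 0.0
    endpoint_error_rates = {
        path: (stats["errors"] / stats["requests"] if stats["requests"] else 0.0)
        for path, stats in metrics["byEndpoint"].items()
    }
    return {
        "affectedShare": affected_share,
        "endpointErrorRates": endpoint_error_rates,
    }
```

For the sample response, 52 of 156 requests were affected, which is one third of all traffic, and both payment endpoints failed on every request.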

List Scenario Runs

curl https://api.surestage.com/v1/risk-simulation/scenarios/scenario_xyz789/runs \
-H "Authorization: Bearer $TOKEN"

Response 200 OK

{
  "runs": [
    {
      "runId": "run_abc123",
      "status": "completed",
      "startedAt": "2026-03-21T11:00:00Z",
      "duration": 300,
      "totalRequests": 156,
      "affectedRequests": 52
    }
  ]
}

Managing Scenarios

List Scenarios

curl "https://api.surestage.com/v1/risk-simulation/scenarios?sandboxId=sandbox_abc123" \
-H "Authorization: Bearer $TOKEN"

Update Scenario

curl -X PATCH https://api.surestage.com/v1/risk-simulation/scenarios/scenario_xyz789 \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
  "name": "Partial Payment Outage",
  "conditions": [
    {
      "type": "error_rate",
      "target": "/payments/*",
      "errorRate": 0.5,
      "errorCode": 503
    }
  ]
}'

You cannot update a running scenario. Stop it first, then update.

Delete Scenario

curl -X DELETE https://api.surestage.com/v1/risk-simulation/scenarios/scenario_xyz789 \
-H "Authorization: Bearer $TOKEN"

Deleting a scenario stops any active runs.

Example Scenarios

Intermittent Failures

{
  "name": "Intermittent Database Errors",
  "conditions": [
    {
      "type": "error_rate",
      "target": "/users/*",
      "errorRate": 0.2,
      "errorCode": 500,
      "errorMessage": "Database connection failed"
    }
  ]
}

Slow External API

{
  "name": "Slow Payment Processing",
  "conditions": [
    {
      "type": "latency",
      "target": "/payments/*",
      "minLatency": 3000,
      "maxLatency": 8000
    }
  ]
}

Rate Limit Hit

{
  "name": "Rate Limit Exceeded",
  "conditions": [
    {
      "type": "rate_limit",
      "target": "/api/*",
      "requestsPerMinute": 20,
      "errorCode": 429
    }
  ]
}

Combined Conditions

{
  "name": "Degraded Service",
  "conditions": [
    {
      "type": "latency",
      "target": "/orders/*",
      "minLatency": 1000,
      "maxLatency": 3000
    },
    {
      "type": "error_rate",
      "target": "/orders/*",
      "errorRate": 0.1,
      "errorCode": 502
    }
  ]
}

Security

  • All scenario operations are protected by JwtAuthGuard and TenantGuard
  • Scenarios are scoped to Sandboxes, so you cannot affect other Tenants
  • Running scenarios are logged for audit purposes
  • SSE streams require valid authentication tokens

Common Issues

Problem: Scenario not affecting requests

Solution: Verify the target pattern matches your route paths. Use wildcards (*) to match multiple routes.
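
To check a pattern offline before attaching it to a scenario, a small matcher can help. The exact matching rules are not documented here, so this sketch makes assumptions: a trailing `/*` matches the prefix itself and anything beneath it (inferred from the sample event stream, where `/payments` is affected by `/payments/*`), and a `:name` segment matches exactly one path segment. The function names are hypothetical.

```python
import re


def compile_target(pattern: str) -> re.Pattern:
    """Translate a target pattern into an anchored regular expression."""
    if pattern.endswith("/*"):
        # Assumption: "/payments/*" matches "/payments" and everything under it.
        prefix, tail = pattern[:-2], r"(?:/.*)?"
    else:
        prefix, tail = pattern, ""
    parts = []
    for segment in prefix.split("/"):
        if segment.startswith(":") and len(segment) > 1:
            parts.append(r"[^/]+")  # path parameter: exactly one segment
        elif segment == "*":
            parts.append(r"[^/]*")  # wildcard within a single segment
        else:
            parts.append(re.escape(segment))
    return re.compile("^" + "/".join(parts) + tail + "$")


def target_matches(pattern: str, path: str) -> bool:
    """Return True if `path` would be selected by `pattern` under these assumptions."""
    return compile_target(pattern).match(path) is not None
```

If a route you expected to be affected returns False here, the pattern is a likely culprit, though the Sandbox's actual matching may differ from this model.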

Problem: SSE stream disconnects

Solution: SSE streams time out after 60 seconds of inactivity. Reconnect if the scenario is still running.

Problem: Cannot start scenario

Solution: Check that no other scenario is running on the same Sandbox. Only one scenario can run at a time per Sandbox.