Risk Simulation
Risk Simulation lets you model failure scenarios and test how your application handles degraded services, errors, and edge cases. Run scenarios that simulate API outages, latency spikes, rate limits, and data corruption to validate resilience.
Why Use Risk Simulation
- Test error handling - Verify your app gracefully handles API failures
- Validate retries - Ensure retry logic works correctly
- Simulate rate limits - Test how your app responds to 429 errors
- Model latency - Identify timeout issues before production
- Chaos engineering - Introduce controlled failures to find weaknesses
How It Works
- Create a scenario defining failure conditions (status codes, latency, error rates)
- Attach the scenario to a Sandbox
- Run the scenario to activate failure conditions
- Monitor results in real time via Server-Sent Events (SSE)
- Analyze impact on your application
Creating Scenarios
API Request
curl -X POST https://api.surestage.com/v1/risk-simulation/scenarios \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "Payment Gateway Outage",
"description": "Simulate complete payment service failure",
"sandboxId": "sandbox_abc123",
"conditions": [
{
"type": "error_rate",
"target": "/payments/*",
"errorRate": 1.0,
"errorCode": 503,
"errorMessage": "Service Temporarily Unavailable"
}
],
"duration": 300
}'
Response 201 Created
{
"id": "scenario_xyz789",
"name": "Payment Gateway Outage",
"description": "Simulate complete payment service failure",
"sandboxId": "sandbox_abc123",
"conditions": [ /* ... */ ],
"duration": 300,
"status": "ready",
"createdAt": "2026-03-21T10:00:00Z"
}
Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Scenario name |
| description | string | No | Scenario description |
| sandboxId | string | Yes | Target Sandbox ID |
| conditions | array | Yes | Array of failure conditions (see below) |
| duration | number | No | Duration in seconds (0 = manual stop) |
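A request body can be assembled and checked locally before it is sent. The helper below is a minimal sketch based only on the Parameters table; `build_scenario` is a hypothetical name, and rejecting an empty `conditions` array or a negative `duration` is an assumption about sensible input, not documented API behavior.

```python
def build_scenario(name, sandbox_id, conditions, description=None, duration=None):
    """Assemble a scenario body from the fields in the Parameters table."""
    if not name:
        raise ValueError("name is required")
    if not sandbox_id:
        raise ValueError("sandboxId is required")
    if not conditions:
        # Assumption: an empty conditions array is treated as invalid here.
        raise ValueError("conditions must contain at least one condition")

    body = {"name": name, "sandboxId": sandbox_id, "conditions": list(conditions)}
    if description is not None:
        body["description"] = description
    if duration is not None:
        if duration < 0:
            # Assumption: only 0 (manual stop) or a positive duration makes sense.
            raise ValueError("duration must be 0 or a positive number of seconds")
        body["duration"] = duration
    return body
```

Passing the result to `json.dumps` yields a string suitable for the `-d` argument of the curl request shown earlier.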
Condition Types
Error Rate
Inject errors at a specific rate:
{
"type": "error_rate",
"target": "/users/*",
"errorRate": 0.3,
"errorCode": 500,
"errorMessage": "Internal Server Error"
}
| Field | Description |
|---|---|
| errorRate | Fraction of requests that fail (0.0-1.0) |
| errorCode | HTTP status code to return |
| errorMessage | Error message in response |
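An error-rate condition is a natural way to exercise client retry logic. The following is a minimal sketch of retrying with exponential backoff; `TransientError` and `call_with_retries` are hypothetical names, and the caller is assumed to raise `TransientError` for retryable statuses such as 500 or 503. If failures are independent per request (an assumption), an `errorRate` of 0.3 leaves a 0.3^4 = 0.0081 chance that all four attempts fail.

```python
import time


class TransientError(Exception):
    """Raised by the caller's request function for retryable statuses."""


def call_with_retries(request_fn, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call request_fn, retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request_fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delays grow as base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is injectable so that tests can record the delays instead of actually waiting.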
Latency
Add artificial delay to responses:
{
"type": "latency",
"target": "/orders/*",
"minLatency": 2000,
"maxLatency": 5000
}
| Field | Description |
|---|---|
| minLatency | Minimum delay in milliseconds |
| maxLatency | Maximum delay in milliseconds |
Latency is randomized between min and max for each request.
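To pick values that actually trip a client timeout, it helps to estimate how many requests will exceed it. The sketch below assumes the delay is drawn uniformly between `minLatency` and `maxLatency` (the distribution is not specified here) and ignores the Sandbox's normal response time; `fraction_exceeding_timeout` is a hypothetical helper.

```python
def fraction_exceeding_timeout(min_latency_ms, max_latency_ms, client_timeout_ms):
    """Estimate the share of delayed requests that outlast the client timeout.

    Assumes a uniform delay between min and max, which is not guaranteed.
    """
    if max_latency_ms < min_latency_ms:
        raise ValueError("maxLatency must be >= minLatency")
    if client_timeout_ms >= max_latency_ms:
        return 0.0  # every injected delay fits inside the timeout
    if client_timeout_ms <= min_latency_ms:
        return 1.0  # every injected delay exceeds the timeout
    return (max_latency_ms - client_timeout_ms) / (max_latency_ms - min_latency_ms)
```

With the condition above (2000 to 5000 ms) and a 3000 ms client timeout, the estimate is (5000 - 3000) / (5000 - 2000), or about two thirds of matching requests.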
Rate Limit
Simulate rate limiting:
{
"type": "rate_limit",
"target": "/api/*",
"requestsPerMinute": 10,
"errorCode": 429,
"errorMessage": "Too Many Requests"
}
| Field | Description |
|---|---|
| requestsPerMinute | Maximum requests allowed |
| errorCode | Status code when limit exceeded |
| errorMessage | Error message |
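As a mental model for what a client might observe, the sketch below simulates a fixed one-minute window. This is an assumption for illustration only: the windowing algorithm the Sandbox uses (fixed window, sliding window, token bucket) is not stated here, and `FixedWindowLimiter` is a hypothetical class.

```python
class FixedWindowLimiter:
    """Illustrative fixed-window counter; not the Sandbox's actual algorithm."""

    def __init__(self, requests_per_minute):
        self.limit = requests_per_minute
        self.window = None  # index of the current one-minute window
        self.count = 0

    def status_for(self, now_seconds):
        """Return the status a request arriving at now_seconds would get."""
        window = int(now_seconds // 60)
        if window != self.window:
            # A new minute started: reset the counter.
            self.window = window
            self.count = 0
        self.count += 1
        return 200 if self.count <= self.limit else 429
```

Under this model, with `requestsPerMinute` set to 10, the eleventh request inside the same minute receives 429 and requests succeed again once the next window begins.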
Data Corruption
Return malformed or invalid data:
{
"type": "data_corruption",
"target": "/users/:id",
"corruptionRate": 0.2,
"corruptionType": "missing_fields"
}
| Field | Description |
|---|---|
| corruptionRate | Fraction of responses corrupted (0.0-1.0) |
| corruptionType | missing_fields, invalid_types, malformed_json |
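Each corruption type maps to a distinct defensive check on the client. The following is a minimal sketch of a parser that reports which kind of problem it found; `parse_user` and the expected fields (`id` and `email` as strings) are hypothetical and stand in for whatever your real response schema requires.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
EXPECTED_FIELDS = {"id": str, "email": str}


def parse_user(raw_body):
    """Return (user, problem). Exactly one of the two is None."""
    try:
        payload = json.loads(raw_body)
    except ValueError:  # covers json.JSONDecodeError and bad encodings
        return None, "malformed_json"
    if not isinstance(payload, dict):
        return None, "invalid_types"
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in payload:
            return None, "missing_fields"
        if not isinstance(payload[field], expected_type):
            return None, "invalid_types"
    return payload, None
```

Running a `data_corruption` scenario against code like this should produce a handled problem label for each corrupted response rather than an unhandled exception.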
Timeout
Force requests to time out:
{
"type": "timeout",
"target": "/external/*",
"timeoutRate": 0.1
}
The share of matching requests set by timeoutRate (0.1 in this example, i.e. 10%) hangs indefinitely, forcing client timeouts.
Running Scenarios
Start Scenario
curl -X POST https://api.surestage.com/v1/risk-simulation/scenarios/scenario_xyz789/start \
-H "Authorization: Bearer $TOKEN"
Response 202 Accepted
{
"runId": "run_abc123",
"scenarioId": "scenario_xyz789",
"status": "running",
"startedAt": "2026-03-21T11:00:00Z",
"endsAt": "2026-03-21T11:05:00Z"
}
The scenario is active immediately. Every request to the Sandbox that matches a condition's target is subject to that condition.
Stop Scenario
curl -X POST https://api.surestage.com/v1/risk-simulation/scenarios/scenario_xyz789/stop \
-H "Authorization: Bearer $TOKEN"
Response 200 OK
{
"runId": "run_abc123",
"status": "stopped",
"stoppedAt": "2026-03-21T11:03:00Z"
}
The Sandbox immediately returns to normal behavior.
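In a test harness, the start and stop calls can be paired so the Sandbox is restored even if the test body raises. The sketch below uses only the two endpoints shown above; `scenario_action_request` and `run_with_scenario` are hypothetical helpers, and how the stop endpoint responds when a timed run has already ended is not covered here, so treat that case as an open assumption.

```python
import urllib.request

BASE_URL = "https://api.surestage.com/v1/risk-simulation"


def scenario_action_request(scenario_id, action, token):
    """Build the POST request for the start or stop endpoint."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    return urllib.request.Request(
        f"{BASE_URL}/scenarios/{scenario_id}/{action}",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def _send(request):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status


def run_with_scenario(scenario_id, token, exercise, send=_send):
    """Start the scenario, run exercise(), and always stop the scenario."""
    send(scenario_action_request(scenario_id, "start", token))
    try:
        return exercise()
    finally:
        send(scenario_action_request(scenario_id, "stop", token))
```

Here `exercise` is whatever callable drives your application against the Sandbox; `send` is injectable so the sequence can be verified without network access.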
Real-Time Monitoring with SSE
Subscribe to scenario runs via Server-Sent Events to receive real-time updates.
Connect to SSE Stream
curl -N https://api.surestage.com/v1/risk-simulation/scenarios/scenario_xyz789/runs/run_abc123/stream \
-H "Authorization: Bearer $TOKEN"
Event Stream
event: run_started
data: {"runId":"run_abc123","startedAt":"2026-03-21T11:00:00Z"}
event: request_affected
data: {"method":"POST","path":"/payments","condition":"error_rate","result":"503"}
event: request_affected
data: {"method":"GET","path":"/payments/123","condition":"error_rate","result":"503"}
event: metrics_update
data: {"totalRequests":42,"affectedRequests":14,"errorRate":0.33}
event: run_completed
data: {"runId":"run_abc123","endedAt":"2026-03-21T11:05:00Z","totalRequests":156,"affectedRequests":52}
Event Types
| Event | Description |
|---|---|
| run_started | Scenario run began |
| request_affected | A request was impacted by a condition |
| metrics_update | Real-time metrics (sent every 5 seconds) |
| run_completed | Scenario run finished |
| run_stopped | Scenario manually stopped |
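The stream can also be consumed programmatically. The parser below is a minimal sketch that follows the standard SSE wire format, in which each event ends with a blank line and lines starting with a colon are comments; it assumes every `data` payload is JSON, as in the examples above. `iter_sse_events` is a hypothetical helper, not part of any SDK.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, data) pairs from an iterable of decoded SSE lines."""
    event_name, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield event_name, json.loads("\n".join(data_lines))
            event_name, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops one leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
    # An event without a closing blank line is incomplete and is discarded.
```

Any source of text lines works as input, for example the decoded lines of a streaming HTTP response.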
Analyzing Results
Get Run Summary
curl https://api.surestage.com/v1/risk-simulation/scenarios/scenario_xyz789/runs/run_abc123 \
-H "Authorization: Bearer $TOKEN"
Response 200 OK
{
"runId": "run_abc123",
"scenarioId": "scenario_xyz789",
"status": "completed",
"startedAt": "2026-03-21T11:00:00Z",
"endedAt": "2026-03-21T11:05:00Z",
"duration": 300,
"metrics": {
"totalRequests": 156,
"affectedRequests": 52,
"byCondition": {
"error_rate": {
"requests": 52,
"errorRate": 1.0,
"averageLatency": 12
}
},
"byEndpoint": {
"/payments": {
"requests": 30,
"errors": 30
},
"/payments/:id": {
"requests": 22,
"errors": 22
}
}
}
}
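The summary is easy to reduce to a few ratios for assertions in a test suite. The sketch below relies only on the fields shown in the response above; `summarize_run` is a hypothetical helper.

```python
def summarize_run(run):
    """Return (affected_share, per_endpoint_error_rate) for a run summary."""
    metrics = run["metrics"]
    total = metrics["totalRequests"]
    affected_share = metrics["affectedRequests"] / total if total else 0.0
    per_endpoint = {
        path: (stats["errors"] / stats["requests"] if stats["requests"] else 0.0)
        for path, stats in metrics["byEndpoint"].items()
    }
    return affected_share, per_endpoint
```

For the example response, 52 of 156 requests were affected (one third), and both payment endpoints failed on every request (30 of 30 and 22 of 22).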
List Scenario Runs
curl https://api.surestage.com/v1/risk-simulation/scenarios/scenario_xyz789/runs \
-H "Authorization: Bearer $TOKEN"
Response 200 OK
{
"runs": [
{
"runId": "run_abc123",
"status": "completed",
"startedAt": "2026-03-21T11:00:00Z",
"duration": 300,
"totalRequests": 156,
"affectedRequests": 52
}
]
}
Managing Scenarios
List Scenarios
curl "https://api.surestage.com/v1/risk-simulation/scenarios?sandboxId=sandbox_abc123" \
-H "Authorization: Bearer $TOKEN"
Update Scenario
curl -X PATCH https://api.surestage.com/v1/risk-simulation/scenarios/scenario_xyz789 \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "Partial Payment Outage",
"conditions": [
{
"type": "error_rate",
"target": "/payments/*",
"errorRate": 0.5,
"errorCode": 503
}
]
}'
You cannot update a running scenario. Stop it first, then update.
Delete Scenario
curl -X DELETE https://api.surestage.com/v1/risk-simulation/scenarios/scenario_xyz789 \
-H "Authorization: Bearer $TOKEN"
Deleting a scenario stops any active runs.
Example Scenarios
Intermittent Failures
{
"name": "Intermittent Database Errors",
"conditions": [
{
"type": "error_rate",
"target": "/users/*",
"errorRate": 0.2,
"errorCode": 500,
"errorMessage": "Database connection failed"
}
]
}
Slow External API
{
"name": "Slow Payment Processing",
"conditions": [
{
"type": "latency",
"target": "/payments/*",
"minLatency": 3000,
"maxLatency": 8000
}
]
}
Rate Limit Hit
{
"name": "Rate Limit Exceeded",
"conditions": [
{
"type": "rate_limit",
"target": "/api/*",
"requestsPerMinute": 20,
"errorCode": 429
}
]
}
Combined Conditions
{
"name": "Degraded Service",
"conditions": [
{
"type": "latency",
"target": "/orders/*",
"minLatency": 1000,
"maxLatency": 3000
},
{
"type": "error_rate",
"target": "/orders/*",
"errorRate": 0.1,
"errorCode": 502
}
]
}
Security
- All scenario operations are protected by JwtAuthGuard and TenantGuard
- Scenarios are scoped to Sandboxes; you cannot affect other Tenants
- Running scenarios are logged for audit purposes
- SSE streams require valid authentication tokens
Common Issues
Problem: Scenario not affecting requests
Solution: Verify the target pattern matches your route paths. Use wildcards (*) to match multiple routes.
Problem: SSE stream disconnects
Solution: SSE streams time out after 60 seconds of inactivity. Reconnect if the scenario is still running.
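A reconnect loop can be sketched as follows. It assumes a dropped stream shows up as the event iterator simply ending, and that `run_completed` and `run_stopped` are the only terminal events; whether earlier events are replayed after a reconnect is not specified here. `follow_run` and `open_stream` are hypothetical names, with `open_stream` being any callable that opens the stream and returns an iterable of (event name, data) pairs.

```python
TERMINAL_EVENTS = {"run_completed", "run_stopped"}


def follow_run(open_stream, handle_event, max_reconnects=5):
    """Consume events until the run finishes, reopening the stream if it drops."""
    reconnects = 0
    while True:
        for event_name, data in open_stream():
            handle_event(event_name, data)
            if event_name in TERMINAL_EVENTS:
                return event_name
        # The stream ended without a terminal event: treat it as a disconnect.
        if reconnects >= max_reconnects:
            raise ConnectionError("stream closed repeatedly before the run finished")
        reconnects += 1
```

Capping the number of reconnects keeps a test from looping forever if the run never reports a terminal event.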
Problem: Cannot start scenario
Solution: Check that no other scenario is running on the same Sandbox. Only one scenario can run at a time per Sandbox.
Related
- Routes & Responses - Understand route matching
- Compliance & Governance - Enforce testing requirements
- API Contract Testing - Validate error responses match contracts