LLM API Mocking
Mock the OpenAI and Anthropic chat completions APIs with configurable responses, streaming simulation, passthrough forwarding, and Model Context Protocol (MCP) server mocking. This guide covers setting up an LLM mock simulation and configuring realistic response behaviors.
Prerequisites
- An existing mock simulation or ability to create a new one
- Basic understanding of LLM API concepts (providers, models, chat completions)
- Optional: API keys if using passthrough forwarding to real providers
Creating an LLM Mock Simulation
- Navigate to Simulations and click New Simulation
- Fill in simulation details (name, description, base path)
- Under Simulation Type, select LLM Mock (if available in your version)
- Click Create
Your simulation is now ready to mock LLM API requests.
Configuring LLM Provider Settings
Access LLM Configuration
- Open your LLM mock simulation
- Click the gear icon and select LLM Settings (or navigate to the LLM Config tab)
Set Provider and Model Defaults
Configure the default LLM provider and model:
{
"provider": "openai",
"defaultModel": "gpt-4o",
"authMode": "any",
"defaultResponse": "I am a mock LLM.",
"defaultLatencyMs": 500,
"defaultTokensPerWord": 1.3,
"simulateUsage": true,
"promptTokenMultiplier": 1.0,
"completionTokenMultiplier": 1.0
}
Configuration Options:
| Option | Description | Values | Example |
|---|---|---|---|
| provider | LLM provider to mock | openai, anthropic | openai |
| defaultModel | Default model identifier | Model name | gpt-4o, claude-3-opus |
| authMode | API key validation | any, specific, none | any |
| validApiKeys | Valid keys when authMode is "specific" | Array of strings | ["sk-test-123"] |
| defaultResponse | Fallback response text | String | "I am a mock LLM." |
| defaultLatencyMs | Simulated latency | 0-30000 ms | 500 |
| defaultTokensPerWord | Token calculation multiplier | Number | 1.3 |
| simulateUsage | Include token usage in responses | Boolean | true |
| promptTokenMultiplier | Prompt token count multiplier | Number | 1.0 |
| completionTokenMultiplier | Completion token count multiplier | Number | 1.0 |
Enable Passthrough Forwarding (Optional)
To forward requests to real LLM providers when no mock response matches:
{
"passthroughEnabled": true,
"passthroughProviderUrl": "https://api.openai.com",
"passthroughApiKey": "sk-real-key-here",
"passthroughRecord": true,
"rateLimitRpm": 60,
"rateLimitTpm": 100000
}
Passthrough Options:
- passthroughEnabled - Enable real provider forwarding
- passthroughProviderUrl - URL of the real provider
- passthroughApiKey - Credentials for the real provider (stored securely in Secrets Manager)
- passthroughRecord - Record real provider responses for later replay
- rateLimitRpm - Rate limit in requests per minute (null = unlimited)
- rateLimitTpm - Rate limit in tokens per minute (null = unlimited)
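The two rate-limit options behave like a rolling window over recent traffic. A minimal sketch of how rateLimitRpm could be enforced (illustrative only, not the product's actual implementation):

```python
import time
from collections import deque


class RpmLimiter:
    """Sliding-window requests-per-minute limiter (illustrative sketch)."""

    def __init__(self, rate_limit_rpm=None):
        self.rpm = rate_limit_rpm  # None = unlimited
        self.timestamps = deque()

    def allow(self, now=None):
        if self.rpm is None:
            return True
        now = time.monotonic() if now is None else now
        # Drop requests that fell out of the 60-second window.
        while self.timestamps and now - self.timestamps[0] >= 60:
            self.timestamps.popleft()
        if len(self.timestamps) < self.rpm:
            self.timestamps.append(now)
            return True
        return False
```

A rateLimitTpm limiter would work the same way but accumulate token counts per window instead of request counts.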
Adding Mock Responses
Create a Basic Text Response
- In your LLM simulation, click Add LLM Response
- Fill in response details:
{
"name": "Weather Query",
"description": "Responds to weather-related prompts",
"matchStrategy": "contains",
"matchValue": "weather",
"priority": 10,
"responseType": "text",
"responseText": "The weather in New York is currently sunny with a high of 72°F.",
"latencyMs": 200,
"isEnabled": true
}
Match Strategies:
| Strategy | Description | Example |
|---|---|---|
| exact | Exact prompt match | "What is the weather?" |
| contains | Prompt contains substring | "weather" |
| regex | Regular expression pattern | /weather.*today/i |
| semantic | Semantic similarity (requires embeddings) | Threshold-based |
| any | Matches any prompt (catch-all) | No value needed |
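Putting the strategies and priority values together, response selection can be pictured as a priority-ordered scan. The helper below is an illustrative sketch only: the real matcher's case sensitivity is an assumption, and the semantic strategy is omitted because it needs an embedding model.

```python
import re


def matches(response, prompt):
    """Return True if a mock response's match rule applies to the prompt."""
    strategy = response["matchStrategy"]
    value = response.get("matchValue", "")
    if strategy == "exact":
        return prompt == value
    if strategy == "contains":
        return value.lower() in prompt.lower()  # case-insensitivity assumed
    if strategy == "regex":
        return re.search(value, prompt, re.IGNORECASE) is not None
    if strategy == "any":
        return True
    return False  # "semantic" needs embeddings; not sketched here


def pick_response(responses, prompt):
    """Check enabled responses in descending priority; first match wins."""
    ordered = sorted(responses, key=lambda r: r.get("priority", 0), reverse=True)
    for r in ordered:
        if r.get("isEnabled", True) and matches(r, prompt):
            return r
    return None
```

This is why a catch-all "any" response should carry a low priority: otherwise it shadows every more specific rule.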
Configure Streaming Responses
Enable Server-Sent Events (SSE) streaming with token simulation:
{
"responseType": "text",
"responseText": "This is a streamed response.",
"streamingEnabled": true,
"tokensPerSecond": 50,
"chunkSize": 5,
"streamAbortAt": 0.8
}
Streaming Options:
- streamingEnabled - Enable SSE streaming
- tokensPerSecond - Simulated token generation rate (1-1000)
- chunkSize - Tokens per SSE chunk (1-50)
- streamAbortAt - Abort the stream at this fraction (0-1) to simulate disconnections
Example: With tokensPerSecond: 50 and chunkSize: 5, the mock will send a chunk every 100ms.
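That arithmetic generalizes. A small sketch of the chunk count and inter-chunk delay the streaming options imply; converting words to tokens via defaultTokensPerWord is an assumption about how the mock sizes responses:

```python
import math


def chunk_schedule(text, tokens_per_word=1.3, tokens_per_second=50,
                   chunk_size=5, stream_abort_at=None):
    """Estimate SSE chunk count and inter-chunk delay (ms) for a mock stream."""
    total_tokens = math.ceil(len(text.split()) * tokens_per_word)
    chunks = math.ceil(total_tokens / chunk_size)
    if stream_abort_at is not None:
        # Simulate a mid-stream disconnect at the given fraction.
        chunks = math.floor(chunks * stream_abort_at)
    delay_ms = chunk_size / tokens_per_second * 1000
    return chunks, delay_ms
```

With the values above (tokensPerSecond: 50, chunkSize: 5), delay_ms comes out to 100, matching the example.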
Add Tool Calls
Create a response that returns tool/function calls:
{
"responseType": "tool_calls",
"toolCalls": [
{
"id": "call_123",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"location\": \"New York\", \"unit\": \"celsius\"}"
}
}
],
"toolCallWithText": "I'll get the weather for you.",
"finishReason": "tool_calls"
}
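On the client side, a mocked tool-call response is typically parsed and dispatched to local handlers. A sketch of that loop; the TOOLS registry and its get_weather handler are hypothetical stand-ins for your application's functions:

```python
import json

# Hypothetical local handlers keyed by function name.
TOOLS = {
    "get_weather": lambda location, unit="celsius": f"Sunny in {location}",
}


def run_tool_calls(response):
    """Dispatch each mocked tool call to a local handler."""
    results = []
    for call in response.get("toolCalls", []):
        fn = call["function"]
        args = json.loads(fn["arguments"])  # arguments arrive JSON-encoded
        results.append((call["id"], TOOLS[fn["name"]](**args)))
    return results
```

Note that the arguments field is a JSON string, not an object, so it must be decoded before the call.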
Add Refusals and Errors
Mock refusals or errors:
{
"responseType": "refusal",
"refusalText": "I can't help with that request.",
"finishReason": "content_filter"
}
Or error responses:
{
"responseType": "error",
"errorConfig": {
"errorCode": "rate_limit_exceeded",
"errorMessage": "Rate limit exceeded. Try again later."
}
}
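Clients exercising these error mocks usually need retry logic. A sketch assuming the mock echoes an errorConfig-style body as shown above; the set of retryable codes and the backoff constants are assumptions for illustration:

```python
import time

RETRYABLE_CODES = {"rate_limit_exceeded", "server_error"}  # assumed set


def call_with_retry(send, max_attempts=3, base_delay=0.5):
    """Retry a request when the mock returns a retryable error.

    `send` is any callable returning the parsed response body.
    """
    body = {}
    for attempt in range(max_attempts):
        body = send()
        error = body.get("errorConfig") or {}
        if error.get("errorCode") not in RETRYABLE_CODES:
            return body
        time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    return body
```

Pairing this with the rate_limit_exceeded mock above lets you verify your client backs off rather than hammering the endpoint.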
Set Priority and Order
Responses are matched in priority order (higher = checked first):
- Click Reorder LLM Responses
- Drag responses to reorder or set priority values
- Save ordering
Configuring Token Usage Simulation
Simulate token counting, or override the simulated values with fixed counts:
{
"simulateUsage": true,
"customUsage": {
"prompt_tokens": 25,
"completion_tokens": 45,
"total_tokens": 70
}
}
The mock will include this in the response:
{
"usage": {
"prompt_tokens": 25,
"completion_tokens": 45,
"total_tokens": 70
}
}
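When customUsage is not set, usage is presumably derived from word counts via defaultTokensPerWord and the two multipliers configured earlier. A sketch of that arithmetic (not the product's exact algorithm):

```python
import math


def estimate_usage(prompt, completion, tokens_per_word=1.3,
                   prompt_multiplier=1.0, completion_multiplier=1.0):
    """Approximate simulated token usage from word counts (illustrative)."""
    prompt_tokens = math.ceil(
        len(prompt.split()) * tokens_per_word * prompt_multiplier)
    completion_tokens = math.ceil(
        len(completion.split()) * tokens_per_word * completion_multiplier)
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }
```

Raising promptTokenMultiplier or completionTokenMultiplier scales the respective count, which is handy for testing cost-tracking code against inflated usage.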
MCP Server Mocking
Mock Model Context Protocol servers with tools, resources, and prompts.
Create MCP Configuration
- In your simulation, click MCP Settings
- Configure the MCP server:
{
"enabled": true,
"serverName": "mock_tools",
"protocolVersion": "2024-11-05"
}
Add MCP Tools
Tools expose callable functions to AI clients:
{
"name": "calculator",
"description": "Performs basic arithmetic operations",
"inputSchema": {
"type": "object",
"properties": {
"operation": {
"type": "string",
"enum": ["add", "subtract", "multiply", "divide"]
},
"a": { "type": "number" },
"b": { "type": "number" }
},
"required": ["operation", "a", "b"]
}
}
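A client can sanity-check tool arguments against this inputSchema before calling the mock. A minimal validator sketch covering only required keys, enums, and basic types; a real client would use a full JSON Schema library:

```python
def validate_input(schema, args):
    """Return a list of problems with tool arguments (empty = valid)."""
    problems = []
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key not in args:
            continue
        if "enum" in spec and args[key] not in spec["enum"]:
            problems.append(f"{key} must be one of {spec['enum']}")
        if spec.get("type") == "number" and not isinstance(args[key], (int, float)):
            problems.append(f"{key} must be a number")
        if spec.get("type") == "string" and not isinstance(args[key], str):
            problems.append(f"{key} must be a string")
    return problems
```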
Add MCP Resources
Resources expose static or dynamic data:
{
"name": "readme",
"uri": "file:///mnt/documents/README.md",
"mimeType": "text/plain",
"contents": "# Project Documentation\n\nThis is the main README file."
}
Add MCP Prompts
Prompts provide templates for common tasks:
{
"name": "code_review",
"description": "Analyzes code and provides feedback",
"arguments": [
{
"name": "language",
"description": "Programming language",
"required": true
},
{
"name": "code",
"description": "Code to review",
"required": true
}
]
}
Testing LLM Mock Responses
Using curl
curl -X POST https://your-instance.surestage.io/v1/chat/completions \
-H "Authorization: Bearer sk-test-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "What is the weather?"}
],
"stream": false
}'
Using Streaming
curl -X POST https://your-instance.surestage.io/v1/chat/completions \
-H "Authorization: Bearer sk-test-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Tell me a story"}
],
"stream": true
}'
Using Python
import requests
response = requests.post(
"https://your-instance.surestage.io/v1/chat/completions",
headers={
"Authorization": "Bearer sk-test-key",
"Content-Type": "application/json"
},
json={
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "What is the weather?"}
]
}
)
print(response.json())
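When stream: true is set, the body arrives as Server-Sent Events rather than a single JSON document. A small parser sketch that reassembles the streamed text, assuming one data: line per event and the OpenAI chat.completion.chunk delta shape:

```python
import json


def parse_sse_chunks(raw):
    """Reassemble the streamed completion text from an SSE response body."""
    pieces = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and comments between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # OpenAI-style end-of-stream sentinel
        delta = json.loads(payload)["choices"][0]["delta"]
        pieces.append(delta.get("content", ""))
    return "".join(pieces)
```

In a live client you would feed this from response.iter_lines() with stream=True on the requests call instead of a complete string.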
Common Use Cases
Mock GPT-4 Completions
Create multiple responses for different scenarios:
[
{
"name": "Success Case",
"matchStrategy": "contains",
"matchValue": "summarize",
"responseType": "text",
"responseText": "Here's a concise summary: ..."
},
{
"name": "Error Case",
"matchStrategy": "contains",
"matchValue": "invalid",
"responseType": "error",
"errorConfig": {"errorCode": "invalid_request_error"}
}
]
Simulate Streaming Responses
Test client-side streaming handling:
{
"streamingEnabled": true,
"tokensPerSecond": 30,
"chunkSize": 3,
"responseText": "Streaming response with multiple chunks"
}
Mock Function Calling
Test tool/function call workflows:
{
"responseType": "tool_calls",
"toolCalls": [
{
"id": "call_abc123",
"type": "function",
"function": {
"name": "search",
"arguments": "{\"query\": \"latest news\"}"
}
}
]
}
Fallback to Real Provider
Enable passthrough for unmocked requests:
{
"passthroughEnabled": true,
"passthroughProviderUrl": "https://api.openai.com",
"passthroughApiKey": "sk-real-key"
}
Troubleshooting
Responses Not Matching
- Verify match strategy is correct for your prompt
- Check that the match value is a substring (for contains) or a valid regex (for regex)
- Ensure the response's priority is higher than other catch-all responses
- Test match logic with the preview feature if available
Streaming Not Working
- Verify streamingEnabled is true on the response
- Confirm the request includes stream: true
- Check the browser DevTools Network tab for SSE events
- Verify tokensPerSecond and chunkSize values are reasonable
Token Count Issues
- Ensure simulateUsage is enabled
- Verify promptTokenMultiplier and completionTokenMultiplier are set correctly
- Check the defaultTokensPerWord multiplier for word-based calculations
Passthrough Not Working
- Verify API key is correct and stored securely
- Check provider URL matches the real provider endpoint
- Ensure rate limits are not exceeded
- Review simulation logs for passthrough errors
API Reference
LLM Config Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /instances/:id/llm-config | Get LLM configuration |
| PATCH | /instances/:id/llm-config | Create or update LLM config |
| DELETE | /instances/:id/llm-config | Delete LLM configuration |
LLM Responses Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /instances/:id/llm-responses | List all LLM responses |
| POST | /instances/:id/llm-responses | Create a new LLM response |
| GET | /instances/:id/llm-responses/:responseId | Get response details |
| PATCH | /instances/:id/llm-responses/:responseId | Update an LLM response |
| DELETE | /instances/:id/llm-responses/:responseId | Delete an LLM response |
| POST | /instances/:id/llm-responses/reorder | Reorder responses by priority |
Next Steps
- Response Versioning - Track response changes over time
- Request Validation - Validate incoming LLM API requests
- Analytics - Monitor LLM mock traffic and usage
- Mock Snapshots - Share LLM mock scenarios