LLM API Mocking

Mock OpenAI and Anthropic chat completions APIs with configurable responses, streaming simulation, passthrough forwarding, and Model Context Protocol (MCP) server mocking. This guide covers setting up LLM mock simulations and configuring realistic response behaviors.

Prerequisites

  • An existing mock simulation or ability to create a new one
  • Basic understanding of LLM API concepts (providers, models, chat completions)
  • Optional: API keys if using passthrough forwarding to real providers

Creating an LLM Mock Simulation

  1. Navigate to Simulations and click New Simulation
  2. Fill in simulation details (name, description, base path)
  3. Under Simulation Type, select LLM Mock (if available in your version)
  4. Click Create

Your simulation is now ready to mock LLM API requests.

Configuring LLM Provider Settings

Access LLM Configuration

  1. Open your LLM mock simulation
  2. Click the gear icon and select LLM Settings (or navigate to the LLM Config tab)

Set Provider and Model Defaults

Configure the default LLM provider and model:

```json
{
  "provider": "openai",
  "defaultModel": "gpt-4o",
  "authMode": "any",
  "defaultResponse": "I am a mock LLM.",
  "defaultLatencyMs": 500,
  "defaultTokensPerWord": 1.3,
  "simulateUsage": true,
  "promptTokenMultiplier": 1.0,
  "completionTokenMultiplier": 1.0
}
```

Configuration Options:

| Option | Description | Values | Example |
| --- | --- | --- | --- |
| provider | LLM provider to mock | openai, anthropic | openai |
| defaultModel | Default model identifier | Model name | gpt-4o, claude-3-opus |
| authMode | API key validation | any, specific, none | any |
| validApiKeys | Valid keys when authMode is "specific" | Array of strings | ["sk-test-123"] |
| defaultResponse | Fallback response text | String | "I am a mock LLM." |
| defaultLatencyMs | Simulated latency | 0-30000 ms | 500 |
| defaultTokensPerWord | Token calculation multiplier | Number | 1.3 |
| simulateUsage | Include token usage in responses | Boolean | true |
| promptTokenMultiplier | Prompt token count multiplier | Number | 1.0 |
| completionTokenMultiplier | Completion token count multiplier | Number | 1.0 |

Enable Passthrough Forwarding (Optional)

To forward requests to real LLM providers when no mock response matches:

```json
{
  "passthroughEnabled": true,
  "passthroughProviderUrl": "https://api.openai.com",
  "passthroughApiKey": "sk-real-key-here",
  "passthroughRecord": true,
  "rateLimitRpm": 60,
  "rateLimitTpm": 100000
}
```

Passthrough Options:

  • passthroughEnabled - Enable real provider forwarding
  • passthroughProviderUrl - URL of the real provider
  • passthroughApiKey - Credentials for the real provider (stored securely in Secrets Manager)
  • passthroughRecord - Record real provider responses for later replay
  • rateLimitRpm - Rate limit requests per minute (null = unlimited)
  • rateLimitTpm - Rate limit tokens per minute (null = unlimited)

Adding Mock Responses

Create a Basic Text Response

  1. In your LLM simulation, click Add LLM Response
  2. Fill in response details:
```json
{
  "name": "Weather Query",
  "description": "Responds to weather-related prompts",
  "matchStrategy": "contains",
  "matchValue": "weather",
  "priority": 10,
  "responseType": "text",
  "responseText": "The weather in New York is currently sunny with a high of 72°F.",
  "latencyMs": 200,
  "isEnabled": true
}
```

Match Strategies:

| Strategy | Description | Example |
| --- | --- | --- |
| exact | Exact prompt match | "What is the weather?" |
| contains | Prompt contains substring | "weather" |
| regex | Regular expression pattern | /weather.*today/i |
| semantic | Semantic similarity (requires embeddings) | Threshold-based |
| any | Matches any prompt (catch-all) | No value needed |

Configure Streaming Responses

Enable Server-Sent Events (SSE) streaming with token simulation:

```json
{
  "responseType": "text",
  "responseText": "This is a streamed response.",
  "streamingEnabled": true,
  "tokensPerSecond": 50,
  "chunkSize": 5,
  "streamAbortAt": 0.8
}
```

Streaming Options:

  • streamingEnabled - Enable SSE streaming
  • tokensPerSecond - Simulated token generation rate (1-1000)
  • chunkSize - Tokens per SSE chunk (1-50)
  • streamAbortAt - Abort stream at this fraction (0-1) to simulate disconnections

Example: With tokensPerSecond: 50 and chunkSize: 5, the mock will send a chunk every 100ms.

Add Tool Calls

Create a response that returns tool/function calls:

```json
{
  "responseType": "tool_calls",
  "toolCalls": [
    {
      "id": "call_123",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"location\": \"New York\", \"unit\": \"celsius\"}"
      }
    }
  ],
  "toolCallWithText": "I'll get the weather for you.",
  "finishReason": "tool_calls"
}
```

Add Refusals and Errors

Mock refusals or errors:

```json
{
  "responseType": "refusal",
  "refusalText": "I can't help with that request.",
  "finishReason": "content_filter"
}
```

Or error responses:

```json
{
  "responseType": "error",
  "errorConfig": {
    "errorCode": "rate_limit_exceeded",
    "errorMessage": "Rate limit exceeded. Try again later."
  }
}
```

Set Priority and Order

Responses are matched in priority order (higher = checked first):

  1. Click Reorder LLM Responses
  2. Drag responses to reorder or set priority values
  3. Save ordering

Configuring Token Usage Simulation

Mock token counting without streaming:

```json
{
  "simulateUsage": true,
  "customUsage": {
    "prompt_tokens": 25,
    "completion_tokens": 45,
    "total_tokens": 70
  }
}
```

The mock will include this in the response:

```json
{
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 45,
    "total_tokens": 70
  }
}
```
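When customUsage is not set, the word-based settings (defaultTokensPerWord plus the two multipliers) presumably drive the counts. The sketch below shows one plausible formula; the exact rounding the mock uses is an assumption.

```python
def estimate_usage(prompt: str, completion: str,
                   tokens_per_word: float = 1.3,
                   prompt_mult: float = 1.0,
                   completion_mult: float = 1.0) -> dict:
    """Word-count-based token usage estimate (rounding behavior assumed)."""
    prompt_tokens = round(len(prompt.split()) * tokens_per_word * prompt_mult)
    completion_tokens = round(len(completion.split()) * tokens_per_word * completion_mult)
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }
```

For example, a 4-word prompt and a 3-word completion at the default 1.3 tokens per word would report roughly 5 prompt tokens and 4 completion tokens.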

MCP Server Mocking

Mock Model Context Protocol servers with tools, resources, and prompts.

Create MCP Configuration

  1. In your simulation, click MCP Settings
  2. Configure the MCP server:
```json
{
  "enabled": true,
  "serverName": "mock_tools",
  "protocolVersion": "2024-11-05"
}
```

Add MCP Tools

Tools expose callable functions to AI clients:

```json
{
  "name": "calculator",
  "description": "Performs basic arithmetic operations",
  "inputSchema": {
    "type": "object",
    "properties": {
      "operation": {
        "type": "string",
        "enum": ["add", "subtract", "multiply", "divide"]
      },
      "a": { "type": "number" },
      "b": { "type": "number" }
    },
    "required": ["operation", "a", "b"]
  }
}
```

Add MCP Resources

Resources expose static or dynamic data:

```json
{
  "name": "readme",
  "uri": "file:///mnt/documents/README.md",
  "mimeType": "text/plain",
  "contents": "# Project Documentation\n\nThis is the main README file."
}
```

Add MCP Prompts

Prompts provide templates for common tasks:

```json
{
  "name": "code_review",
  "description": "Analyzes code and provides feedback",
  "arguments": [
    {
      "name": "language",
      "description": "Programming language",
      "required": true
    },
    {
      "name": "code",
      "description": "Code to review",
      "required": true
    }
  ]
}
```

Testing LLM Mock Responses

Using curl

```shell
curl -X POST https://your-instance.surestage.io/v1/chat/completions \
  -H "Authorization: Bearer sk-test-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "What is the weather?"}
    ],
    "stream": false
  }'
```

Using Streaming

```shell
curl -X POST https://your-instance.surestage.io/v1/chat/completions \
  -H "Authorization: Bearer sk-test-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Tell me a story"}
    ],
    "stream": true
  }'
```

Using Python

```python
import requests

response = requests.post(
    "https://your-instance.surestage.io/v1/chat/completions",
    headers={
        "Authorization": "Bearer sk-test-key",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-4o",
        "messages": [
            {"role": "user", "content": "What is the weather?"}
        ],
    },
)

print(response.json())
```
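For streamed responses (stream: true), the body arrives as OpenAI-style SSE lines rather than a single JSON object. The helper below sketches how a client might reassemble the text from those lines (for example, fed from requests' iter_lines); the exact chunk shape depends on which provider your simulation mocks.

```python
import json

def collect_stream(sse_lines: list[str]) -> str:
    """Reassemble completion text from OpenAI-style SSE 'data:' lines."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue                      # skip keep-alives and blank lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break                         # end-of-stream sentinel
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)
```

Pairing this with streamAbortAt lets you verify that your client handles a stream that ends before [DONE] arrives.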

Common Use Cases

Mock GPT-4 Completions

Create multiple responses for different scenarios:

```json
[
  {
    "name": "Success Case",
    "matchStrategy": "contains",
    "matchValue": "summarize",
    "responseType": "text",
    "responseText": "Here's a concise summary: ..."
  },
  {
    "name": "Error Case",
    "matchStrategy": "contains",
    "matchValue": "invalid",
    "responseType": "error",
    "errorConfig": {"errorCode": "invalid_request_error"}
  }
]
```

Simulate Streaming Responses

Test client-side streaming handling:

```json
{
  "streamingEnabled": true,
  "tokensPerSecond": 30,
  "chunkSize": 3,
  "responseText": "Streaming response with multiple chunks"
}
```

Mock Function Calling

Test tool/function call workflows:

```json
{
  "responseType": "tool_calls",
  "toolCalls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "search",
        "arguments": "{\"query\": \"latest news\"}"
      }
    }
  ]
}
```

Fallback to Real Provider

Enable passthrough for unmocked requests:

```json
{
  "passthroughEnabled": true,
  "passthroughProviderUrl": "https://api.openai.com",
  "passthroughApiKey": "sk-real-key"
}
```

Troubleshooting

Responses Not Matching

  • Verify match strategy is correct for your prompt
  • Check match value is a substring (for contains) or valid regex (for regex)
  • Ensure response priority is higher than other catch-all responses
  • Test match logic with the preview feature if available

Streaming Not Working

  • Verify streamingEnabled is true on the response
  • Confirm client accepts stream: true in request
  • Check browser DevTools Network tab for SSE events
  • Verify tokensPerSecond and chunkSize values are reasonable

Token Count Issues

  • Ensure simulateUsage is enabled
  • Verify promptTokenMultiplier and completionTokenMultiplier are set correctly
  • Check defaultTokensPerWord multiplier for word-based calculations

Passthrough Not Working

  • Verify API key is correct and stored securely
  • Check provider URL matches the real provider endpoint
  • Ensure rate limits are not exceeded
  • Review simulation logs for passthrough errors

API Reference

LLM Config Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /instances/:id/llm-config | Get LLM configuration |
| PATCH | /instances/:id/llm-config | Create or update LLM config |
| DELETE | /instances/:id/llm-config | Delete LLM configuration |

LLM Responses Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /instances/:id/llm-responses | List all LLM responses |
| POST | /instances/:id/llm-responses | Create new LLM response |
| GET | /instances/:id/llm-responses/:responseId | Get response details |
| PATCH | /instances/:id/llm-responses/:responseId | Update LLM response |
| DELETE | /instances/:id/llm-responses/:responseId | Delete LLM response |
| POST | /instances/:id/llm-responses/reorder | Reorder responses by priority |

Next Steps