LLM API Mocking

Mock OpenAI and Anthropic chat completions APIs with configurable responses, streaming simulation, passthrough forwarding, and Model Context Protocol (MCP) server mocking. This guide covers setting up LLM mock simulations and configuring realistic response behaviors.

Prerequisites

  • An existing mock simulation or ability to create a new one
  • Basic understanding of LLM API concepts (providers, models, chat completions)
  • Optional: API keys if using passthrough forwarding to real providers

Creating an LLM Mock Simulation

  1. Navigate to Simulations and click New Simulation
  2. Fill in simulation details (name, description, base path)
  3. Under Simulation Type, select LLM Mock (if available in your version)
  4. Click Create

Your simulation is now ready to mock LLM API requests.

Configuring LLM Provider Settings

Access LLM Configuration

  1. Open your LLM mock simulation
  2. Click the gear icon and select LLM Settings (or navigate to the LLM Config tab)

Set Provider and Model Defaults

Configure the default LLM provider and model:

```json
{
  "provider": "openai",
  "defaultModel": "gpt-4o",
  "authMode": "any",
  "defaultResponse": "I am a mock LLM.",
  "defaultLatencyMs": 500,
  "defaultTokensPerWord": 1.3,
  "simulateUsage": true,
  "promptTokenMultiplier": 1.0,
  "completionTokenMultiplier": 1.0
}
```

Configuration Options:

| Option | Description | Values | Example |
| --- | --- | --- | --- |
| provider | LLM provider to mock | openai, anthropic | openai |
| defaultModel | Default model identifier | Model name | gpt-4o, claude-3-opus |
| authMode | API key validation | any, specific, none | any |
| validApiKeys | Valid keys when authMode is "specific" | Array of strings | ["sk-test-123"] |
| defaultResponse | Fallback response text | String | "I am a mock LLM." |
| defaultLatencyMs | Simulated latency | 0-30000 ms | 500 |
| defaultTokensPerWord | Token calculation multiplier | Number | 1.3 |
| simulateUsage | Include token usage in responses | Boolean | true |
| promptTokenMultiplier | Prompt token count multiplier | Number | 1.0 |
| completionTokenMultiplier | Completion token count multiplier | Number | 1.0 |

Enable Passthrough Forwarding (Optional)

To forward requests to real LLM providers when no mock response matches:

```json
{
  "passthroughEnabled": true,
  "passthroughProviderUrl": "https://api.openai.com",
  "passthroughApiKey": "sk-real-key-here",
  "passthroughRecord": true,
  "rateLimitRpm": 60,
  "rateLimitTpm": 100000
}
```

Passthrough Options:

  • passthroughEnabled - Enable real provider forwarding
  • passthroughProviderUrl - URL of the real provider
  • passthroughApiKey - Credentials for the real provider (stored securely in Secrets Manager)
  • passthroughRecord - Record real provider responses for later replay
  • rateLimitRpm - Rate limit requests per minute (null = unlimited)
  • rateLimitTpm - Rate limit tokens per minute (null = unlimited)

Adding Mock Responses

Create a Basic Text Response

  1. In your LLM simulation, click Add LLM Response
  2. Fill in response details:
```json
{
  "name": "Weather Query",
  "description": "Responds to weather-related prompts",
  "matchStrategy": "contains",
  "matchValue": "weather",
  "priority": 10,
  "responseType": "text",
  "responseText": "The weather in New York is currently sunny with a high of 72°F.",
  "latencyMs": 200,
  "isEnabled": true
}
```

Match Strategies:

| Strategy | Description | Example |
| --- | --- | --- |
| exact | Exact prompt match | "What is the weather?" |
| contains | Prompt contains substring | "weather" |
| regex | Regular expression pattern | /weather.*today/i |
| semantic | Semantic similarity (requires embeddings) | Threshold-based |
| any | Matches any prompt (catch-all) | No value needed |

Configure Streaming Responses

Enable Server-Sent Events (SSE) streaming with token simulation:

```json
{
  "responseType": "text",
  "responseText": "This is a streamed response.",
  "streamingEnabled": true,
  "tokensPerSecond": 50,
  "chunkSize": 5,
  "streamAbortAt": 0.8
}
```

Streaming Options:

  • streamingEnabled - Enable SSE streaming
  • tokensPerSecond - Simulated token generation rate (1-1000)
  • chunkSize - Tokens per SSE chunk (1-50)
  • streamAbortAt - Abort stream at this fraction (0-1) to simulate disconnections

Example: With tokensPerSecond: 50 and chunkSize: 5, the mock will send a chunk every 100ms.

Add Tool Calls

Create a response that returns tool/function calls:

```json
{
  "responseType": "tool_calls",
  "toolCalls": [
    {
      "id": "call_123",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"location\": \"New York\", \"unit\": \"celsius\"}"
      }
    }
  ],
  "toolCallWithText": "I'll get the weather for you.",
  "finishReason": "tool_calls"
}
```

Add Refusals and Errors

Mock refusals or errors:

```json
{
  "responseType": "refusal",
  "refusalText": "I can't help with that request.",
  "finishReason": "content_filter"
}
```

Or error responses:

```json
{
  "responseType": "error",
  "errorConfig": {
    "errorCode": "rate_limit_exceeded",
    "errorMessage": "Rate limit exceeded. Try again later."
  }
}
```

Set Priority and Order

Responses are matched in priority order (higher = checked first):

  1. Click Reorder LLM Responses
  2. Drag responses to reorder or set priority values
  3. Save ordering

Configuring Token Usage Simulation

Mock token counting without streaming:

```json
{
  "simulateUsage": true,
  "customUsage": {
    "prompt_tokens": 25,
    "completion_tokens": 45,
    "total_tokens": 70
  }
}
```

The mock will include this in the response:

```json
{
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 45,
    "total_tokens": 70
  }
}
```
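When customUsage is not set, the word-based settings (defaultTokensPerWord plus the two multipliers) presumably drive the counts. The sketch below shows one plausible formula; the exact rounding the mock uses is an assumption.

```python
def estimate_usage(prompt: str, completion: str,
                   tokens_per_word: float = 1.3,
                   prompt_mult: float = 1.0,
                   completion_mult: float = 1.0) -> dict:
    """Word-count-based token usage estimate (rounding behavior assumed)."""
    prompt_tokens = round(len(prompt.split()) * tokens_per_word * prompt_mult)
    completion_tokens = round(len(completion.split()) * tokens_per_word * completion_mult)
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }
```

For example, a 4-word prompt and a 3-word completion at the default 1.3 tokens per word would report roughly 5 prompt tokens and 4 completion tokens.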

MCP Server Mocking

Mock Model Context Protocol servers with tools, resources, and prompts.

Create MCP Configuration

  1. In your simulation, click MCP Settings
  2. Configure the MCP server:
```json
{
  "enabled": true,
  "serverName": "mock_tools",
  "protocolVersion": "2024-11-05"
}
```

Add MCP Tools

Tools expose callable functions to AI clients:

```json
{
  "name": "calculator",
  "description": "Performs basic arithmetic operations",
  "inputSchema": {
    "type": "object",
    "properties": {
      "operation": {
        "type": "string",
        "enum": ["add", "subtract", "multiply", "divide"]
      },
      "a": { "type": "number" },
      "b": { "type": "number" }
    },
    "required": ["operation", "a", "b"]
  }
}
```

Add MCP Resources

Resources expose static or dynamic data:

```json
{
  "name": "readme",
  "uri": "file:///mnt/documents/README.md",
  "mimeType": "text/plain",
  "contents": "# Project Documentation\n\nThis is the main README file."
}
```

Add MCP Prompts

Prompts provide templates for common tasks:

```json
{
  "name": "code_review",
  "description": "Analyzes code and provides feedback",
  "arguments": [
    {
      "name": "language",
      "description": "Programming language",
      "required": true
    },
    {
      "name": "code",
      "description": "Code to review",
      "required": true
    }
  ]
}
```

Testing LLM Mock Responses

Using curl

```shell
curl -X POST https://your-instance.surestage.io/v1/chat/completions \
  -H "Authorization: Bearer sk-test-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "What is the weather?"}
    ],
    "stream": false
  }'
```

Using Streaming

```shell
curl -X POST https://your-instance.surestage.io/v1/chat/completions \
  -H "Authorization: Bearer sk-test-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Tell me a story"}
    ],
    "stream": true
  }'
```

Using Python

```python
import requests

response = requests.post(
    "https://your-instance.surestage.io/v1/chat/completions",
    headers={
        "Authorization": "Bearer sk-test-key",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-4o",
        "messages": [
            {"role": "user", "content": "What is the weather?"}
        ],
    },
)

print(response.json())
```
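For streamed responses (stream: true), the body arrives as OpenAI-style SSE lines rather than a single JSON object. The helper below sketches how a client might reassemble the text from those lines (for example, fed from requests' iter_lines); the exact chunk shape depends on which provider your simulation mocks.

```python
import json

def collect_stream(sse_lines: list[str]) -> str:
    """Reassemble completion text from OpenAI-style SSE 'data:' lines."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue                      # skip keep-alives and blank lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break                         # end-of-stream sentinel
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)
```

Pairing this with streamAbortAt lets you verify that your client handles a stream that ends before [DONE] arrives.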

Common Use Cases

Mock GPT-4 Completions

Create multiple responses for different scenarios:

```json
[
  {
    "name": "Success Case",
    "matchStrategy": "contains",
    "matchValue": "summarize",
    "responseType": "text",
    "responseText": "Here's a concise summary: ..."
  },
  {
    "name": "Error Case",
    "matchStrategy": "contains",
    "matchValue": "invalid",
    "responseType": "error",
    "errorConfig": {"errorCode": "invalid_request_error"}
  }
]
```

Simulate Streaming Responses

Test client-side streaming handling:

```json
{
  "streamingEnabled": true,
  "tokensPerSecond": 30,
  "chunkSize": 3,
  "responseText": "Streaming response with multiple chunks"
}
```

Mock Function Calling

Test tool/function call workflows:

```json
{
  "responseType": "tool_calls",
  "toolCalls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "search",
        "arguments": "{\"query\": \"latest news\"}"
      }
    }
  ]
}
```

Fallback to Real Provider

Enable passthrough for unmocked requests:

```json
{
  "passthroughEnabled": true,
  "passthroughProviderUrl": "https://api.openai.com",
  "passthroughApiKey": "sk-real-key"
}
```

Troubleshooting

Responses Not Matching

  • Verify match strategy is correct for your prompt
  • Check match value is a substring (for contains) or valid regex (for regex)
  • Ensure response priority is higher than other catch-all responses
  • Test match logic with the preview feature if available

Streaming Not Working

  • Verify streamingEnabled is true on the response
  • Confirm client accepts stream: true in request
  • Check browser DevTools Network tab for SSE events
  • Verify tokensPerSecond and chunkSize values are reasonable

Token Count Issues

  • Ensure simulateUsage is enabled
  • Verify promptTokenMultiplier and completionTokenMultiplier are set correctly
  • Check defaultTokensPerWord multiplier for word-based calculations

Passthrough Not Working

  • Verify API key is correct and stored securely
  • Check provider URL matches the real provider endpoint
  • Ensure rate limits are not exceeded
  • Review simulation logs for passthrough errors

API Reference

LLM Config Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /instances/:id/llm-config | Get LLM configuration |
| PATCH | /instances/:id/llm-config | Create or update LLM config |
| DELETE | /instances/:id/llm-config | Delete LLM configuration |

LLM Responses Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /instances/:id/llm-responses | List all LLM responses |
| POST | /instances/:id/llm-responses | Create new LLM response |
| GET | /instances/:id/llm-responses/:responseId | Get response details |
| PATCH | /instances/:id/llm-responses/:responseId | Update LLM response |
| DELETE | /instances/:id/llm-responses/:responseId | Delete LLM response |
| POST | /instances/:id/llm-responses/reorder | Reorder responses by priority |

Next Steps