Generating Test Data

Data Studio uses AI to generate realistic test data based on your schema. This guide covers the generation workflow, prompting strategies, locale support, and version management.

How Generation Works

When you generate data, Data Studio:

1. Reads your schema and field descriptions
2. Sends a prompt to the AI with context about the data shape
3. Generates the requested number of rows
4. Validates each row against the schema
5. Stores the result as a new immutable version

Each generation creates a new version. Previous versions remain accessible in the Versions tab.
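The workflow above can be sketched in Python. This is a minimal illustration, not Data Studio's implementation: the schema shape, `build_prompt`, `validate_row`, `generate`, `Version`, and the `fake_model` stand-in for the AI call are all hypothetical names. The sketch drops rows that fail validation, which is also an assumption; the product may handle invalid rows differently.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical schema shape: field name -> (Python type, field description).
Schema = dict[str, tuple[type, str]]
Row = dict[str, object]
Model = Callable[[str, int], list[Row]]


@dataclass(frozen=True)  # frozen mirrors the "immutable version" idea
class Version:
    number: int
    rows: tuple[Row, ...]
    prompt: str


def build_prompt(schema: Schema, user_prompt: str, row_count: int) -> str:
    """Steps 1-2: describe the data shape and the user's context."""
    fields = "\n".join(
        f"- {name} ({ftype.__name__}): {desc}" for name, (ftype, desc) in schema.items()
    )
    context = user_prompt or "(none)"
    return f"Generate {row_count} rows.\nFields:\n{fields}\nContext: {context}"


def validate_row(schema: Schema, row: Row) -> bool:
    """Step 4: every schema field is present and has the declared type."""
    return row.keys() == schema.keys() and all(
        isinstance(row[name], ftype) for name, (ftype, _) in schema.items()
    )


def generate(schema: Schema, row_count: int, model: Model,
             history: list[Version], user_prompt: str = "") -> Version:
    prompt = build_prompt(schema, user_prompt, row_count)
    rows = model(prompt, row_count)                          # step 3
    valid = [r for r in rows if validate_row(schema, r)]     # step 4
    version = Version(len(history) + 1, tuple(valid), user_prompt)
    history.append(version)                                  # step 5
    return version


def fake_model(prompt: str, row_count: int) -> list[Row]:
    """Stand-in for the AI call; the last row is deliberately invalid."""
    rows: list[Row] = [
        {"name": f"Customer {i}", "seats": 10 * (i + 1)} for i in range(row_count - 1)
    ]
    rows.append({"name": "Broken", "seats": "many"})  # wrong type for "seats"
    return rows


schema: Schema = {"name": (str, "Company name"), "seats": (int, "Licensed seats")}
history: list[Version] = []
first = generate(schema, 3, fake_model, history, "B2B SaaS customers")
print(first.number, len(first.rows))  # prints: 1 2
```

Because `Version` is frozen and each call appends rather than overwrites, earlier versions stay available in `history`, which matches the behavior described for the Versions tab.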

Generating Data

1. Open a dataset and click the Generate button in the toolbar.
2. Configure the generation in the dialog:
   - Row count: number of records to generate, from 1 up to your tier's maximum (see Rate Limits)
   - Prompt (optional): natural-language context for the AI (e.g., "Generate B2B SaaS customers with enterprise-tier pricing")
   - Locale (optional): language/region for internationalized data (defaults to `en_US`)
3. Click Generate.
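To make the three dialog options concrete, here is a small Python model of a generation request. `GenerationRequest` and its `validate` method are hypothetical and only mirror the rules stated in this guide: the row count must fall between 1 and the tier maximum, the prompt is optional, and the locale defaults to `en_US`. The locale set is taken from the Locale Support table.

```python
from dataclasses import dataclass

# Locales listed in the Locale Support table of this guide.
SUPPORTED_LOCALES = {"en_US", "en_GB", "fr_FR", "de_DE", "es_ES", "ja_JP"}


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical model of the Generate dialog's fields."""
    row_count: int
    prompt: str = ""        # optional
    locale: str = "en_US"   # optional, defaults to en_US

    def validate(self, tier_max_rows: int) -> None:
        """Raise ValueError if the request breaks the dialog's rules."""
        if not 1 <= self.row_count <= tier_max_rows:
            raise ValueError(
                f"row_count must be between 1 and {tier_max_rows}, got {self.row_count}"
            )
        if self.locale not in SUPPORTED_LOCALES:
            raise ValueError(f"unsupported locale: {self.locale}")


request = GenerationRequest(row_count=200, prompt="Generate B2B SaaS customers")
request.validate(tier_max_rows=1_000)  # passes silently
```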

Small Datasets (Under 500 Rows)

Generation completes immediately. The Data tab refreshes to show the new records, and a new entry appears in the Versions tab.

Large Datasets (500+ Rows)

A progress bar appears in the toolbar. You can navigate away; the generation continues in the background. When it completes:

- A notification confirms that the generation finished
- The Data tab updates with the new records
- The Versions tab shows the new version with row count, duration, and tokens used

Writing Effective Prompts

Prompts provide semantic context beyond field descriptions. The same schema produces very different output with and without a prompt:

Prompt	AI Output
(empty)	Generic customers with random attributes
"Generate Fortune 500 enterprise customers with multi-year contracts and annual revenue over $1M"	Enterprise-focused customers with realistic high-value attributes

Prompt Tips

- Be specific about the domain: "B2B SaaS" vs "B2C retail" produces very different data
- Describe relationships: "customers who recently churned" adds temporal context
- Specify value ranges: "revenue between $50K and $500K" constrains numeric output
- Include temporal context: "signed up in the last 30 days" produces recent timestamps
- Name the persona: "healthcare providers in California" narrows the output
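If you build prompts programmatically, for example to keep them consistent across a test suite, the tips above can be combined with a small helper. `compose_prompt` is a hypothetical function written for this guide, not part of Data Studio; it only joins a domain, a subject, and any extra clauses into one sentence.

```python
from typing import Sequence


def compose_prompt(subject: str, domain: str = "", details: Sequence[str] = ()) -> str:
    """Join domain, subject, and optional detail clauses into a single prompt."""
    head = " ".join(part for part in ("Generate", domain.strip(), subject.strip()) if part)
    clauses = [d.strip() for d in details if d.strip()]  # skip empty clauses
    if not clauses:
        return head
    return f"{head} {', '.join(clauses)}"


prompt = compose_prompt(
    "customers",
    domain="B2B SaaS",
    details=["who recently churned", "with revenue between $50K and $500K"],
)
print(prompt)
# prints: Generate B2B SaaS customers who recently churned, with revenue between $50K and $500K
```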

Example Prompts by Use Case

Use Case	Prompt
Integration testing	"Generate active users with verified emails and complete profiles"
Load testing	"Generate diverse customer demographics across all US regions"
Error handling	"Generate users with missing fields, invalid emails, and expired subscriptions"
Demo data	"Generate polished, realistic Fortune 100 companies with recognizable industry names"
Edge cases	"Generate records with Unicode names, maximum-length strings, and boundary numeric values"

Locale Support

Select a locale from the Locale dropdown in the generation dialog to produce internationalized data.

Supported locales:

Locale	Region
`en_US`	United States English (default)
`en_GB`	British English
`fr_FR`	French (France)
`de_DE`	German (Germany)
`es_ES`	Spanish (Spain)
`ja_JP`	Japanese (Japan)

Locales affect name generation (culture-appropriate names), address formats, phone number formats, and date/time conventions.

Tip: Combine locale with a prompt for the best results. Select `fr_FR` and add "Generate French customers with Paris addresses" to get fully localized output.

Version History

Every generation creates an immutable version. Open the Versions tab to see the full history.

Each version row shows:

- Version number and timestamp
- Row count: records in this version
- Tokens used: AI tokens consumed
- Duration: how long generation took
- Prompt: the prompt used (if any)

Comparing Versions

1. Open the Versions tab.
2. Click a version row to preview its data.
3. Repeat for other versions to compare data across generations, or to find an earlier dataset to roll back to.

Switching the Active Version

The Data tab shows the latest version by default. To view an older version:

1. Open the Versions tab.
2. Click the version you want to inspect.
3. The Data tab updates to show that version's records.
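The selection rule described here (latest version unless you pick another) can be expressed in a few lines of Python. `VersionInfo` and `active_version` are hypothetical names; the fields mirror what each version row displays, and the sample values are made up for illustration, not real measurements.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


@dataclass(frozen=True)
class VersionInfo:
    """Hypothetical record holding what a version row shows."""
    number: int
    created_at: datetime
    row_count: int
    tokens_used: int
    duration: timedelta
    prompt: Optional[str] = None


def active_version(history: Sequence[VersionInfo],
                   selected: Optional[int] = None) -> VersionInfo:
    """Return the selected version, or the latest one when nothing is selected."""
    if not history:
        raise LookupError("no versions have been generated yet")
    if selected is None:
        return max(history, key=lambda v: v.number)
    for version in history:
        if version.number == selected:
            return version
    raise LookupError(f"version {selected} not found")


# Illustrative values only.
history = [
    VersionInfo(1, datetime(2024, 1, 1, tzinfo=timezone.utc), 100, 5_000,
                timedelta(seconds=4)),
    VersionInfo(2, datetime(2024, 1, 2, tzinfo=timezone.utc), 250, 12_000,
                timedelta(seconds=9), "B2B SaaS customers"),
]
print(active_version(history).number)              # prints: 2
print(active_version(history, selected=1).number)  # prints: 1
```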

Token Usage

Each generation consumes AI tokens. Token usage is displayed:

- In the generation completion notification
- On each version row in the Versions tab
- In the organization-level usage dashboard under Settings > Usage

Factors affecting token usage:

- Number of rows generated
- Schema complexity (field count, nested objects)
- Prompt length
- Locale variants

Optimizing usage:

- Generate larger batches instead of many small generations
- Use concise, targeted field descriptions
- Reuse existing versions when the data hasn't changed

Rate Limits

Tier	Generations per Hour	Max Rows per Generation
Free	10	1,000
Pro	100	10,000
Enterprise	Unlimited	Unlimited

When you reach the hourly generation limit, the Generate button is disabled and a banner shows when the limit resets. Limits reset hourly.
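The table above can be read as a simple lookup. The following Python sketch checks a planned generation against it; `TIER_LIMITS` copies the table's values, while `check_limits` and its messages are hypothetical and do not represent a Data Studio API.

```python
from typing import Optional

# (generations per hour, max rows per generation); None means unlimited.
TIER_LIMITS: dict[str, tuple[Optional[int], Optional[int]]] = {
    "Free": (10, 1_000),
    "Pro": (100, 10_000),
    "Enterprise": (None, None),
}


def check_limits(tier: str, row_count: int, generations_this_hour: int) -> Optional[str]:
    """Return a reason the generation would be blocked, or None if it is allowed."""
    per_hour, max_rows = TIER_LIMITS[tier]
    if max_rows is not None and row_count > max_rows:
        return f"{tier} tier allows at most {max_rows} rows per generation"
    if per_hour is not None and generations_this_hour >= per_hour:
        return f"{tier} tier allows {per_hour} generations per hour"
    return None


print(check_limits("Free", 1_000, 9))  # prints: None
print(check_limits("Free", 1_001, 0))  # prints: Free tier allows at most 1000 rows per generation
```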

Troubleshooting

Generated data is too generic

Add more specific prompts and field descriptions. A field named `status` with the description "Order fulfillment status: pending, processing, shipped, delivered, cancelled" produces far better results than the name `status` alone.

Data doesn't match the schema

Check for type mismatches in your schema. Ensure field descriptions are consistent with the selected types.
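One way to narrow down a mismatch, assuming you have the generated rows available as Python dictionaries, is to list exactly which fields disagree with the expected types. The schema shape and `find_mismatches` below are hypothetical and exist only to illustrate the check.

```python
from typing import Any

# Hypothetical schema shape: field name -> expected Python type.
Schema = dict[str, type]


def find_mismatches(schema: Schema,
                    rows: list[dict[str, Any]]) -> list[tuple[int, str, str, str]]:
    """Return (row index, field, expected type, actual type) for each bad value."""
    problems = []
    for index, row in enumerate(rows):
        for field, expected in schema.items():
            if field not in row:
                problems.append((index, field, expected.__name__, "missing"))
            elif not isinstance(row[field], expected):
                problems.append((index, field, expected.__name__, type(row[field]).__name__))
    return problems


schema: Schema = {"status": str, "quantity": int}
rows = [
    {"status": "shipped", "quantity": 3},
    {"status": "pending", "quantity": "three"},  # text where a number is expected
    {"status": "delivered"},                      # field missing entirely
]
for problem in find_mismatches(schema, rows):
    print(problem)
# prints: (1, 'quantity', 'int', 'str')
# prints: (2, 'quantity', 'int', 'missing')
```

A pattern such as text appearing in a numeric field often points to a description that lists values of a different type than the one selected for the field.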

Large generation seems stuck

Generations of 10,000+ rows can take several minutes. The progress bar updates periodically. You can navigate away and return — the generation continues in the background.

Next Steps

- Scenarios: create test case variants with different prompts
- Sharing: share generated datasets via public links
- Datasets: manage dataset metadata and versions
