LaunchDarkly Experiment Setup
You're using a skill that guides you through setting up and running experiments in LaunchDarkly. Your job is to design the experiment, create it with the right metrics, treatments, and flag config, start data collection, evolve the design between iterations when needed, and stop with a winner.
Prerequisites
This skill requires the remotely hosted LaunchDarkly MCP server to be configured in your environment.
Required MCP tools:
create-experiment— create a new experiment with its initial iteration (hypothesis, metrics, treatments, flag config).start-experiment-iteration— begin collecting data for an experiment's current draft iteration.get-experiment— check experiment status, treatments, metrics, and current iteration.
Optional MCP tools:
list-experiments— browse existing experiments in the project.update-experiment— update fields on the experiment or its current iteration. HonoursmutableFieldsByStatus, so what's editable depends on whether the iteration isnot_started,running, orstopped. Returns rejected inputs underskipped.save-and-start-experiment-iteration— the API-recommended way to change locked fields on a running experiment. Stops the current iteration, creates a new draft with the supplied field updates, and starts it in one call.stop-experiment-iteration— stop the running iteration. You must declare a winner: pass thewinningTreatmentId(and awinningReason). If no variation outperformed, pick the baseline/control as the winner.list-metrics,create-metric,list-metric-events— manage metrics referenced by the experiment.
Core Concepts
What Are Experiments?
Experiments in LaunchDarkly measure the impact of feature flag variations on key metrics. An experiment consists of:
- Treatments: the flag variations being compared (control vs. test). Each treatment has an
allocationPercent; the values across treatments should sum to 100. - Metrics: what you're measuring (conversion rate, latency, revenue, etc.). One must be the primary metric.
- Flag config: the
flagKey,ruleId, andflagConfigVersionof the targeting rule that drives the experiment. - Iteration: a single data-collection window. Created in
not_startedstatus, becomesrunningwhen started, transitions tostoppedwhen ended. - Holdout (optional): a project-level group of users excluded from the experiment for baseline measurement (
holdoutId).
Experiment Lifecycle
- Create the experiment with its first iteration (
create-experiment). - Start the iteration to begin data collection (
start-experiment-iteration). - Monitor results as data accumulates (
get-experiment). - Evolve the design mid-experiment if needed — change locked fields like
treatments,metrics, ormethodologyby callingsave-and-start-experiment-iteration, which stops the current iteration, creates a new draft with your changes, and starts it. - Stop the iteration when you have a winner or a clear call (
stop-experiment-iteration). - Ship the winning variation.
Core Principles
- Metrics first: ensure the metrics you'll reference exist before creating the experiment.
- Clear hypothesis: every iteration requires a
hypothesisstring; state what you expect to improve and by how much. - Proper controls: exactly one treatment must have
baseline: true. - Sufficient sample size: let iterations run long enough for statistical significance.
- One change at a time: test one variable per experiment for clear attribution.
Workflow
Step 1: Prepare Metrics
- Use
list-metricsto find existing metrics. - If you need a new one, use
create-metricand note the key. - Decide which is the primary metric (a single metric or a funnel group). You'll pass its key as
primarySingleMetricKeyorprimaryFunnelKeyon the iteration.
| Goal | Metric type | Example key |
|---|---|---|
| Conversion | Custom conversion | checkout-completed |
| Performance | Custom numeric | page-load-time-ms |
| Engagement | Custom conversion | feature-clicked |
| Revenue | Custom numeric | order-value |
Step 2: Identify the Targeting Rule
You need the ruleId and current flagConfigVersion of the flag rule that will drive the experiment. Use get-flag on the flag (or its environment-scoped status) to find them. The fallthrough rule's id is the string "fallthrough".
Step 3: Create the Experiment
Call create-experiment. The top-level fields describe the experiment; the nested iteration object describes the first data-collection window.
{
"projectKey": "my-project",
"environmentKey": "production",
"key": "checkout-flow-v2-experiment",
"name": "Checkout Flow v2 Experiment",
"description": "Compare the redesigned checkout against the current flow.",
"tags": ["growth", "checkout"],
"methodology": "bayesian",
"iteration": {
"hypothesis": "The redesigned checkout will lift completion rate by 3%.",
"primarySingleMetricKey": "checkout-completed",
"metrics": [
{ "key": "checkout-completed" },
{ "key": "checkout-time-seconds" }
],
"treatments": [
{
"name": "Control",
"baseline": true,
"allocationPercent": 50,
"parameters": [
{ "flagKey": "checkout-flow-v2", "variationId": "variation-a-id" }
]
},
{
"name": "New Checkout",
"baseline": false,
"allocationPercent": 50,
"parameters": [
{ "flagKey": "checkout-flow-v2", "variationId": "variation-b-id" }
]
}
],
"flags": {
"checkout-flow-v2": {
"ruleId": "fallthrough",
"flagConfigVersion": 7
}
},
"randomizationUnit": "user"
}
}
Useful optional top-level fields:
holdoutId— attach an existing holdout.dataSource—"launchdarkly"(default),"snowflake", or"databricks".methodology—"bayesian"(default),"frequentist", or"export_only".analysisConfig— set thresholds, multiple-comparison correction, or sequential testing.
Useful optional iteration fields:
attributes— array of context attribute keys to slice results by (e.g.["country", "device"]).covariateId— covariate CSV id for stratified sampling.canReshuffleTraffic— defaults totrue; setfalseto lock users to their initial variation when allocations change.
Step 4: Start Data Collection
{
"projectKey": "my-project",
"environmentKey": "production",
"experimentKey": "checkout-flow-v2-experiment"
}
Before starting, the API requires that:
- the flag is toggled on,
- the iteration has a
randomizationUnit, and - at least one treatment has a non-zero
allocationPercent.
Pass changeJustification if you're restarting after a prior iteration was stopped.
Step 5: Verify
- Call
get-experimentand confirmcurrentIteration.status === "running". - Check that treatments are present with the expected allocations.
- Check the metric list and the primary metric.
Step 6: Evolve the Design Mid-Experiment (when needed)
Most structural fields (treatments, metrics, methodology, hypothesis, …) are locked while an iteration is running. Two ways to change them:
- Light edits while running —
update-experimentwill let through anythingmutableFieldsByStatuspermits in therunningstate (typically just metadata likename,description,maintainerId,tags, plus appendingmetrics/attributes). It surfaces rejected fields underskippedwith a reason. - Real design changes — call
save-and-start-experiment-iteration. It stops the current iteration, creates a new draft with the supplied field updates applied, and starts it in one call. Inputs matchupdate-experiment, pluschangeJustification. Mutability is checked againstnot_startedsince updates land on the new draft.
Example: swap the treatment allocation and add a metric in a single call.
{
"projectKey": "my-project",
"environmentKey": "production",
"experimentKey": "checkout-flow-v2-experiment",
"changeJustification": "Lowering control allocation now that variant looks safe.",
"treatments": [
{
"name": "Control",
"baseline": true,
"allocationPercent": 30,
"parameters": [{ "flagKey": "checkout-flow-v2", "variationId": "variation-a-id" }]
},
{
"name": "New Checkout",
"baseline": false,
"allocationPercent": 70,
"parameters": [{ "flagKey": "checkout-flow-v2", "variationId": "variation-b-id" }]
}
],
"metrics": [
{ "key": "checkout-completed" },
{ "key": "checkout-time-seconds" },
{ "key": "checkout-error-rate" }
]
}
Step 7: Stop the Iteration
When you've reached significance or made a call, stop the iteration. A winning treatment is required to stop — LaunchDarkly does not let you end an iteration without declaring a winner. Pass the winning treatment's id (returned in get-experiment as _id on each treatment) plus a winningReason.
If the experiment was inconclusive or no variation beat the control, declare the baseline/control treatment as the winner and say so in winningReason (e.g. "Inconclusive — no significant lift, keeping control"). There is no "stop without a winner" path.
{
"projectKey": "my-project",
"environmentKey": "production",
"experimentKey": "checkout-flow-v2-experiment",
"winningTreatmentId": "treat-002",
"winningReason": "Two weeks of data, +4.1% lift on the primary metric with PBBL > 95%."
}
Report results:
- Iteration stopped with the declared
winningTreatmentId(the control/baseline if inconclusive). - Lift / significance summary on the primary metric.
- Next steps (ship the winner, roll back, or start a follow-up iteration).
Edge Cases
| Situation | Action |
|---|---|
| Metric doesn't exist | Create it first with create-metric. |
| Flag has no variations to compare | Create flag variations before designing treatments. |
You don't know the flag's ruleId / flagConfigVersion | Use get-flag or get-flag-status-across-envs. The fallthrough rule's id is the string "fallthrough". |
| Experiment already exists | Use list-experiments to find it; get-experiment for details. |
| Need to change locked fields mid-experiment | Use save-and-start-experiment-iteration (single call) rather than stopping and recreating by hand. |
update-experiment returns skipped for a field | Inspect the currentStatus and allowedFields in the response — that field isn't mutable in the current iteration status. Either stop the iteration first or use save-and-start-experiment-iteration. |
What NOT to Do
- Don't omit
iterationoncreate-experiment— it's required. - Don't set
baseline: trueon more than one treatment. - Don't let
allocationPercentvalues fail to sum to 100 across treatments. - Don't try to change locked iteration fields with
update-experimentwhile the iteration isrunning— reach forsave-and-start-experiment-iterationinstead. - Don't stop iterations early — wait for statistical significance.
- Don't run multiple experiments on the same flag at the same time without a careful holdout design.

