June 27, 2026
How to Debug Playwright Tests That Fail Only in Headless CI but Pass in Local Interactive Runs
A practical step-by-step guide to debugging Playwright tests that fail only in headless CI but pass locally, with checks for timing, viewport, auth, environment, and trace-based diagnosis.
When a Playwright test passes in local interactive mode but fails only in headless CI, the problem is usually not “CI being weird.” It is a signal that the test depends on something your local run provides implicitly, while the pipeline removes, delays, or changes it. The browser is different, the rendering environment is different, the timing is different, and sometimes the app itself behaves differently under production-like conditions.
This is one of the most common forms of modern browser automation failure, and it is also one of the easiest to misunderstand. The key is to debug it as a systems problem, not just as a flaky assertion. You are trying to reconcile three environments at once: your local interactive session, the headless browser runtime, and the CI worker that hosts it.
For the official Playwright basics, the Playwright documentation is the right starting point. This guide focuses on the failure pattern itself, how to isolate it quickly, and how to make the test stable without hiding real regressions.
What “local pass, CI fail” usually means
If a Playwright test fails only in headless CI but passes locally in headed mode, one of these is usually true:
- The test relies on a timing assumption that is true on your laptop but false on CI.
- The browser viewport, device scale factor, locale, time zone, or font set differs.
- The app behaves differently when request latency, CPU contention, or cold caches appear.
- The test uses a locator or assertion that is too broad, too early, or too coupled to layout.
- Authentication, storage, or test data is not identical across environments.
- The test hides a genuine product bug that only appears under production-like constraints.
If a test is stable only when watched by a human, it is often coupled to a human-paced feedback loop, not to the app state.
That distinction matters. A good debugging workflow first determines whether the issue is in the test, the environment, or the application.
Step 1: Reproduce the CI conditions locally
Do not start by changing the test. Start by making your local run look like CI.
Run headless locally
If you normally use npx playwright test --ui or headed mode, switch to headless. Take a simple test like this one:
```typescript
import { test, expect } from '@playwright/test';

test('checkout works', async ({ page }) => {
  await page.goto('http://localhost:3000');
  await page.getByRole('button', { name: 'Checkout' }).click();
  await expect(page.getByText('Payment details')).toBeVisible();
});
```
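Then run it with plain npx playwright test, which is headless by default, and pass the same --project (and therefore the same browser and channel) that CI uses.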
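A sketch of one way to keep the two environments aligned, assuming a single config file is shared by local and CI runs. The retry and worker counts, the single chromium project, and the BASE_URL environment variable are illustrative assumptions, not values your pipeline necessarily uses:

```typescript
import { defineConfig, devices } from '@playwright/test';

// Most CI providers set CI=true; running `CI=1 npx playwright test` locally
// opts into the same CI-only settings below.
const isCI = !!process.env.CI;

export default defineConfig({
  forbidOnly: isCI, // fail fast if a stray test.only was committed
  retries: isCI ? 2 : 0,
  workers: isCI ? 1 : undefined, // CI runners may have fewer cores than a laptop
  use: {
    headless: true, // already the default; stated so the intent is explicit
    // BASE_URL is a hypothetical variable: point it at the same build CI tests.
    baseURL: process.env.BASE_URL ?? 'http://localhost:3000',
  },
  projects: [
    // Keep the project name, device, and browser channel identical to CI.
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
  ],
});
```

With a config like this, "reproduce CI locally" becomes a single environment variable rather than a list of flags to remember.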
Match browser version and OS as closely as possible
CI-only failures often disappear on a developer laptop simply because the local browser, GPU acceleration, fonts, and system libraries are different. If your pipeline uses Linux containers, test inside a Linux environment locally too. If the CI image is pinned, use the same Playwright version and browser revision.
Match viewport and device settings
A local interactive run often has a larger viewport than CI. Responsive breakpoints can hide or reveal elements, move buttons, or trigger mobile menus. Set the same viewport explicitly in your Playwright config.
```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // Use the same values everywhere so layout and formatting match CI.
    viewport: { width: 1280, height: 720 },
    locale: 'en-US',
    timezoneId: 'UTC',
  },
});
```
If the test still passes locally after you match CI conditions, you have likely ruled out the most obvious environment drift.
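To check parity rather than assume it, it can help to record the same facts in both environments and diff them. The helper below is a hypothetical sketch: the EnvSnapshot fields are an assumption about what is worth comparing, and in a real suite you would fill them from sources such as browser.version(), the browserName fixture, process.platform, and the locale and timezoneId your config sets. The sample values are placeholders, not measurements.

```typescript
// Facts worth comparing between a local run and a CI run (field list is an assumption).
interface EnvSnapshot {
  playwrightVersion: string;
  browserName: string;
  browserVersion: string;
  platform: string;
  viewport: string;
  locale: string;
  timezone: string;
}

// Returns one human-readable line per field whose value differs between environments.
function diffEnvironments(local: EnvSnapshot, ci: EnvSnapshot): string[] {
  const keys = Object.keys(local) as (keyof EnvSnapshot)[];
  return keys
    .filter((key) => local[key] !== ci[key])
    .map((key) => `${key}: local=${local[key]} ci=${ci[key]}`);
}

// Placeholder values for illustration only.
const localEnv: EnvSnapshot = {
  playwrightVersion: 'pinned-version',
  browserName: 'chromium',
  browserVersion: 'pinned-revision',
  platform: 'darwin',
  viewport: '1728x1000',
  locale: 'en-US',
  timezone: 'America/New_York',
};
const ciEnv: EnvSnapshot = { ...localEnv, platform: 'linux', viewport: '1280x720', timezone: 'UTC' };

// Lists the platform, viewport, and timezone mismatches.
console.log(diffEnvironments(localEnv, ciEnv));
```

An empty result is the evidence you want before moving on to traces; a non-empty one tells you which drift to remove first.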
Step 2: Inspect the failure with traces, screenshots, and video
When a test fails only in CI, the fastest path to truth is usually the trace. Playwright’s trace viewer gives you a timeline of actions, DOM snapshots, console logs, network activity, and screenshots.
Enable trace collection on first retry or on failure:
```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  retries: 1, // 'on-first-retry' only records a trace if a retry actually happens
  use: {
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
});
```
Then review:
- Which action failed: a click, fill, navigation, or assertion
- Whether the element existed but was not visible, not stable, or covered
- Whether the page was still loading when the assertion ran
- Whether there were console errors or network failures
- Whether the failure happens before or after a redirect, auth step, or async data load
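If your CI system supports artifacts, make sure the trace archives, screenshots, and videos (written under test-results by default) are downloadable from the job. The point is not just to see the failure, but to compare the failed run with a successful local one: record a local trace with npx playwright test --trace on, then open either archive with npx playwright show-trace.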
Look for these trace patterns
- The test clicked a button before the app finished rendering.
- The selector matched multiple elements in CI because the DOM was slightly different.
- A spinner disappeared later in CI than locally.
- A popup, cookie banner, or sticky header blocked the target element.
- An API call responded more slowly, or with different data, in CI.
A trace often tells you whether you need a better wait condition, a narrower locator, or a corrected environment setup.
Step 3: Check whether the problem is timing, not logic
The most common cause of local pass, CI fail is timing. But “add a wait” is not a debugging strategy. It is only useful if you know what you are waiting for.
Prefer state-based waits over fixed delays
Avoid this:
```typescript
await page.waitForTimeout(2000);
```
That hides the symptom without explaining the state transition. Instead, wait for a specific condition that reflects readiness.
```typescript
await expect(page.getByRole('heading', { name: 'Payment details' })).toBeVisible();
await page.getByRole('button', { name: 'Continue' }).click();
```
Or wait for a network-driven event if that is what your application depends on:
```typescript
await Promise.all([
  page.waitForResponse(resp => resp.url().includes('/api/cart') && resp.ok()),
  page.getByRole('button', { name: 'Refresh cart' }).click(),
]);
```
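To make the difference concrete, here is a framework-free sketch of what a state-based wait does: poll a condition until it holds or a deadline passes. Playwright's web-first assertions such as toBeVisible() already retry in this way, and expect.poll covers custom conditions, so in a real suite prefer those over a hand-rolled loop. The helper name and its default timings are illustrative assumptions.

```typescript
// Poll `check` until it returns true, or throw once `timeoutMs` has elapsed.
// Resolves with the elapsed milliseconds, showing how long readiness actually took.
async function waitForCondition(
  check: () => boolean | Promise<boolean>,
  { timeoutMs = 5000, intervalMs = 50 }: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<number> {
  const start = Date.now();
  while (true) {
    if (await check()) {
      return Date.now() - start;
    }
    if (Date.now() - start >= timeoutMs) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Simulated app state: a spinner that clears after 300 ms here might take far
// longer on a loaded CI worker. A fixed sleep guesses; polling adapts.
let spinnerGone = false;
setTimeout(() => { spinnerGone = true; }, 300);

waitForCondition(() => spinnerGone).then((ms) => {
  console.log(`ready after about ${ms} ms`);
});
```

The timeout still exists, but it is an upper bound on a named condition rather than a guess at how long the app needs.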
Check for races in test steps
A common anti-pattern is clicking and immediately asserting without waiting for the resulting UI state.
```typescript
await page.getByRole('button', { name: 'Save' }).click();
await expect(page.getByText('Saved')).toBeVisible();
```
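This works only if the Saved toast is a reliable signal that the save completed. Because expect(...).toBeVisible() keeps retrying until its timeout (5 seconds by default), a slightly slow toast is usually fine. The pattern breaks when Saved also matches text that was already on the page, when the toast disappears before the assertion sees it, or when the save triggers a route change, modal close, or async request that outlasts the assertion timeout on a slower CI worker. In those cases, wait for that specific transition.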
Watch for animation and transition differences
Headless runs can be faster or slower depending on the environment. CSS transitions, animated overlays, and skeleton loaders can change the moment an element becomes actionable. If a button is covered briefly by an animation, the test may pass locally because your click is slower, but fail in CI because it happens earlier.
In those cases, wait for the overlay to disappear or for the target to become enabled, not for a vague timeout.
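A minimal sketch of "wait for the overlay, then act", assuming the overlay and button are Playwright locators. Playwright's click() already retries while another element intercepts pointer events, so the explicit waits mainly make the intended state transition, and the eventual error message, explicit. LocatorLike is a hypothetical structural slice of Locator (waitFor and click are real Locator methods), declared here so the logic can be exercised without a browser.

```typescript
// Hypothetical structural slice of Playwright's Locator: both methods exist on
// the real Locator with compatible signatures, so real locators can be passed in.
interface LocatorLike {
  waitFor(options?: { state?: 'attached' | 'detached' | 'visible' | 'hidden'; timeout?: number }): Promise<void>;
  click(): Promise<void>;
}

// Wait for the blocking overlay to go away, then for the target to be visible,
// and only then click. Each wait names the state the test depends on.
async function clickAfterOverlay(overlay: LocatorLike, target: LocatorLike, timeout = 10_000): Promise<void> {
  await overlay.waitFor({ state: 'hidden', timeout });
  await target.waitFor({ state: 'visible', timeout });
  await target.click();
}
```

In a spec this might read await clickAfterOverlay(page.locator('.modal-backdrop'), page.getByRole('button', { name: 'Continue' })), where .modal-backdrop is a placeholder for whatever overlay your app renders. If the button is disabled during loading, an expect(...).toBeEnabled() assertion before the click serves the same purpose.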
Step 4: Verify locators, because “it works locally” can still mean “the selector is fragile”
A locator that is good enough in a calm local session may fail when the DOM changes slightly in CI. This happens often with text-based selectors, index-based selectors, or CSS selectors tied to layout.
Use semantic locators where possible
Playwright’s accessible selectors are usually the safest starting point:
```typescript
await page.getByRole('button', { name: 'Submit order' }).click();
```
This is more resilient than targeting a class name that may change when a feature flag or responsive breakpoint alters the DOM.
Make sure your locator is unique in the real rendered state
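A duplicate label, hidden template, or mobile menu clone may cause the same text to appear twice in CI but not locally. Playwright locators are strict, so an action on a locator that matches several elements throws a strict mode violation; asserting the count first makes the duplication explicit and the failure message clearer.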
```typescript
const submit = page.getByRole('button', { name: 'Submit order' });
await expect(submit).toHaveCount(1);
await submit.click();
```
Be careful with nth selectors
If you use locator.nth(0) or :nth-child, you are encoding DOM order into the test. That order can shift because of ads, banners, feature flags, locale differences, or a new A/B experiment. When a test fails only in CI, inspect whether the intended target is actually the first matching element.
Step 5: Compare environment differences systematically
Some failures are not about Playwright at all. They are about the environment around it.
Viewport and responsive layout
If CI uses a smaller viewport, the page may collapse into a menu, move content below the fold, or hide elements behind a hamburger button.
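If the app is responsive, one option is to make the viewport an explicit part of the test matrix instead of an accident of whoever runs the suite. A sketch, assuming both desktop and mobile layouts matter; the project names are illustrative, and devices['Pixel 5'] is one of Playwright's built-in device descriptors.

```typescript
import { defineConfig, devices } from '@playwright/test';

// Run the same tests at a fixed desktop size and at a mobile breakpoint, so a
// locator that only works at one breakpoint also fails on your machine.
export default defineConfig({
  projects: [
    {
      name: 'desktop-chromium',
      use: { ...devices['Desktop Chrome'], viewport: { width: 1280, height: 720 } },
    },
    {
      name: 'mobile-chromium',
      use: { ...devices['Pixel 5'] }, // small viewport, touch, mobile user agent
    },
  ],
});
```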
Time zone and locale
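Date formatting, relative labels, and locale-dependent sorting can differ. A test that expects 12/01/2026 may fail on a runner whose locale or time zone differs from yours, for example a UTC container or a non-US locale. Set locale and time zone explicitly when testing date-sensitive flows, and remember that Playwright's locale and timezoneId options only affect the browser: expected dates computed in the test code itself follow the time zone of the Node process running it.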
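To see why a hard-coded date string is fragile, here is a small sketch using the standard Intl.DateTimeFormat API: the same instant renders differently depending on locale and time zone. The instant is arbitrary, chosen so that it falls on different calendar days in UTC and on the US West Coast.

```typescript
// One fixed instant: 1 December 2026, 03:00 UTC.
const instant = new Date(Date.UTC(2026, 11, 1, 3, 0, 0));

// Format a date the way a browser with the given locale and time zone would.
function formatDate(date: Date, locale: string, timeZone: string): string {
  return new Intl.DateTimeFormat(locale, {
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
    timeZone,
  }).format(date);
}

console.log(formatDate(instant, 'en-US', 'UTC'));                 // 12/01/2026
console.log(formatDate(instant, 'en-GB', 'UTC'));                 // 01/12/2026
console.log(formatDate(instant, 'en-US', 'America/Los_Angeles')); // 11/30/2026
```

Only the first line matches the 12/01/2026 expectation; the other two are the same moment seen from a different locale or zone.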
Fonts and rendering
Different fonts can alter text width, causing labels to wrap or buttons to shift. That can change visibility, overlap, or the precise point at which an element becomes clickable.
Auth and storage state
Interactive local runs often preserve cookies, localStorage, or logged-in sessions longer than CI. In CI, every run may start fresh. If a test depends on auth state, load it intentionally:
```typescript
import { test } from '@playwright/test';

// Start every test in this file from a saved, known-good auth state.
test.use({ storageState: 'storage-state.json' });
```
If that file is created during setup, make sure the setup step itself is deterministic and not tied to an expired session or environment-specific identity provider behavior.
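A cheap guard against stale auth state is to inspect the saved file before the suite relies on it. The sketch below assumes the JSON shape Playwright writes from storageState(): a cookies array whose expires field is a Unix timestamp in seconds, with -1 for session cookies. The function name, the cookie names, and the idea of failing fast on expired cookies are illustrative choices, not a Playwright feature.

```typescript
// Minimal shape of the cookie entries in a Playwright storage-state file.
interface StoredCookie {
  name: string;
  domain: string;
  expires: number; // Unix time in seconds; -1 marks a session cookie
}

interface StorageStateFile {
  cookies: StoredCookie[];
}

// Return the persistent cookies that have already expired at `nowMs`.
function findExpiredCookies(state: StorageStateFile, nowMs: number = Date.now()): string[] {
  const nowSeconds = nowMs / 1000;
  return state.cookies
    .filter((cookie) => cookie.expires !== -1 && cookie.expires <= nowSeconds)
    .map((cookie) => `${cookie.name} (${cookie.domain})`);
}

// Inline sample data with hypothetical cookie names; in practice you would
// JSON.parse the storage-state file your setup step wrote.
const sampleState: StorageStateFile = {
  cookies: [
    { name: 'session_id', domain: 'localhost', expires: -1 },
    { name: 'refresh_token', domain: 'localhost', expires: 1_700_000_000 },
  ],
};
console.log(findExpiredCookies(sampleState, Date.UTC(2026, 5, 27)));
```

If the list is non-empty, regenerating the state in a setup step is usually better than letting the first test fail on a silent login redirect.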
Network and backend dependencies
A CI worker may have slower access to APIs, colder DNS caches, or stricter network policies. A local run against a dev backend is not the same as CI running tests in parallel against a shared staging environment.
If a browser test depends on a real backend, the test is also a distributed systems test.
That means latency, data drift, eventual consistency, and rate limiting can all surface as browser flakiness.
Step 6: Read the browser and page logs, not just the assertion error
An assertion failure often hides the actual root cause. For example, “expected visible” may really mean the page redirected to login, a JavaScript error stopped rendering, or a request returned 500.
Add console and page error logging during debugging:
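```typescript
import { test } from '@playwright/test';

test.beforeEach(async ({ page }) => {
  // Echo every browser console message, prefixed with its level (log, error, ...).
  page.on('console', msg => console.log(`console:${msg.type()}`, msg.text()));
  // Uncaught exceptions thrown inside the page.
  page.on('pageerror', err => console.log(`pageerror:${err.message}`));
});
```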
Also inspect failed network requests in the trace. A 401, 403, 500, or CORS problem may only appear in CI because of secrets, routing, or environment configuration.
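When you want this information without opening a trace, a small collector can gather console errors, uncaught page errors, and HTTP error responses into one list that you print on failure. This is a sketch: PageEventSource is a hypothetical structural slice of Playwright's Page, intended to accept a real page (the console, pageerror, and response events and the payload methods used here are real), and treating every status of 400 or above as noteworthy is an assumption you may want to narrow.

```typescript
// Payload shapes used below; Playwright's ConsoleMessage and Response provide these methods.
interface ConsoleMessageLike { type(): string; text(): string }
interface ResponseLike { status(): number; url(): string }

// Hypothetical structural slice of Playwright's Page event API.
interface PageEventSource {
  on(event: 'console', listener: (msg: ConsoleMessageLike) => void): unknown;
  on(event: 'pageerror', listener: (error: Error) => void): unknown;
  on(event: 'response', listener: (response: ResponseLike) => void): unknown;
}

// Subscribe to the page and return an array that fills up as events arrive.
function collectDiagnostics(page: PageEventSource): string[] {
  const lines: string[] = [];
  page.on('console', (msg) => {
    // Keep only the levels that usually explain a broken render.
    if (msg.type() === 'error' || msg.type() === 'warning') {
      lines.push(`console:${msg.type()} ${msg.text()}`);
    }
  });
  page.on('pageerror', (error) => {
    lines.push(`pageerror: ${error.message}`);
  });
  page.on('response', (response) => {
    // 401/403/500 responses often explain an "expected visible" failure.
    if (response.status() >= 400) {
      lines.push(`http ${response.status()} ${response.url()}`);
    }
  });
  return lines;
}
```

In a spec you would call collectDiagnostics(page) in test.beforeEach and print or attach the returned lines in test.afterEach when the test did not pass.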
Step 7: Reduce the test to the smallest failing scenario
If the test is large, break it down. CI-only failures become much easier to reason about when you isolate the exact line that changes state.
Binary search the steps
Comment out half the flow, rerun, and continue narrowing the scope. The goal is to identify whether failure starts during navigation, data entry, a specific click, or a post-action assertion.
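The narrowing is an ordinary binary search, and it rests on one assumption worth stating: once a prefix of the flow fails, every longer prefix fails too. A sketch, where runPrefix is a hypothetical callback that runs the first k steps in a fresh context and reports whether they passed:

```typescript
// Find the smallest k such that running the first k steps fails.
// Assumes monotonic failure: if steps 1..k fail, steps 1..k+1 also fail.
// Returns null when the full flow (all stepCount steps) passes.
async function findFirstFailingStep(
  stepCount: number,
  runPrefix: (k: number) => Promise<boolean>, // resolves true when steps 1..k pass
): Promise<number | null> {
  if (await runPrefix(stepCount)) {
    return null; // nothing to narrow down
  }
  let lo = 1;         // every prefix shorter than lo is known to pass
  let hi = stepCount; // the prefix of length hi is known to fail
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (await runPrefix(mid)) {
      lo = mid + 1; // the first mid steps pass, so the failure comes later
    } else {
      hi = mid; // the failure is at or before step mid
    }
  }
  return lo; // 1-based index of the step that introduces the failure
}

// Simulated 12-step flow that breaks at step 7.
findFirstFailingStep(12, async (k) => k < 7).then((step) => {
  console.log(`first failing step: ${step}`); // first failing step: 7
});
```

Each probe is a full rerun of a prefix, so a 12-step flow needs about five runs instead of twelve; if a probe flips between runs, the failure is not deterministic yet and the search result cannot be trusted.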
Split setup from behavior
If the failure occurs after a lengthy login, cart setup, or data seeding routine, separate that setup into a fixture or dedicated helper. This makes it easier to verify whether the setup is stable before the actual test begins.
Make the test deterministic
Where possible, seed data, stub unstable endpoints, or create isolated test records. If the test needs a payment intent, email, or third-party callback, avoid depending on a live side effect that can vary run to run.
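For the "stub unstable endpoints" option, Playwright's page.route() combined with route.fulfill() lets a test answer a request with fixed data. The sketch below wraps that in a helper. RouteLike and RoutablePage are hypothetical structural slices of Playwright's Route and Page, declared so the logic runs without a browser; the fulfill json option requires a reasonably recent Playwright release, and the response body is illustrative.

```typescript
// Hypothetical structural slices of Playwright's Route and Page.
interface RouteLike {
  fulfill(response: { status?: number; json?: unknown }): Promise<void>;
}
interface RoutablePage {
  route(url: string | RegExp, handler: (route: RouteLike) => Promise<void> | void): Promise<void>;
}

// Answer every request matching `url` with a fixed JSON body, so the test no
// longer depends on live backend data or latency.
async function stubJson(page: RoutablePage, url: string | RegExp, body: unknown, status = 200): Promise<void> {
  await page.route(url, async (route) => {
    await route.fulfill({ status, json: body });
  });
}
```

In a spec, await stubJson(page, '**/api/cart', { items: [] }) before navigating would pin the cart response used in the earlier refresh example. Keep at least one unstubbed end-to-end path, though, or you lose the signal about real backend regressions.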
Step 8: Decide whether to fix the test or the product
Not every CI-only failure should be “solved” by a better wait.
Here is a practical decision rule:
- If the trace shows the test acting before the UI is ready, fix the test.
- If the test reveals a real rendering or interaction bug under realistic conditions, fix the product.
- If the test depends on unstable shared data or external systems, redesign the test setup.
- If the failure disappears when you remove a brittle locator or fixed timeout, fix the test.
- If the failure remains after environment parity and deterministic setup, treat it like an application defect.
That last point matters. Browser automation is valuable precisely because it can expose issues that only appear under actual browser timing and CI-like conditions. Do not suppress those defects just to make the suite green.
A debugging checklist you can reuse
When a Playwright test fails only in headless CI, work through this order:
- Reproduce headless locally.
- Match browser, viewport, locale, and time zone.
- Open the trace and inspect the exact failed action.
- Check console logs, page errors, and network failures.
- Verify whether the locator is unique and semantic.
- Remove fixed sleeps and replace them with state-based waits.
- Compare auth, storage state, and backend data.
- Reduce the test to the smallest reproducible flow.
- Decide whether the root cause is test fragility or a real bug.
This order saves time because it avoids random code changes and focuses on evidence.
A practical example of a brittle versus stable approach
Suppose a test submits a form and then checks for a confirmation message.
A brittle version might look like this:
```typescript
await page.getByRole('button', { name: 'Submit' }).click();
await page.waitForTimeout(1000);
await expect(page.locator('.toast')).toContainText('Saved');
```
A more robust version waits for the actual outcome and uses a more stable selector:
```typescript
await Promise.all([
  page.waitForResponse(resp => resp.url().includes('/api/profile') && resp.ok()),
  page.getByRole('button', { name: 'Submit' }).click(),
]);
// Assumes the app renders its confirmation in an element with role="status".
await expect(page.getByRole('status')).toHaveText('Saved');
```
The second version expresses intent. It says, “the save request must complete successfully, and the page must announce success.” That is much easier to debug when CI behaves differently than your laptop.
When Playwright is doing its job, not failing it
It is tempting to treat Playwright as the source of instability because the failure appears in the browser test. But Playwright is often just exposing a mismatch between test assumptions and real-world execution.
In comparison with tools like Selenium or Cypress, Playwright tends to make these issues easier to diagnose because it gives you modern tracing, strong auto-waiting, and better browser-context control. But those features do not eliminate root causes such as race conditions, weak selectors, mismatched environments, or backend nondeterminism. They just make the failure more visible.
If you are building a broader testing strategy, it helps to separate concerns:
- Use code-based browser tests for realistic user flows.
- Keep setup deterministic and environment-parity focused.
- Use traces and logs as first-class debugging artifacts.
- Do not rely on interactive observation as proof of stability.
Continuous integration, test automation, and software testing are broad topics, but the debugging process here is concrete: align environments, inspect traces, and eliminate assumptions.
Final takeaway
A Playwright test that passes locally and fails only in headless CI is rarely random. It is usually telling you that the test encodes an assumption about timing, layout, state, or infrastructure that your local run happens to satisfy. The fastest way to fix it is to stop guessing, reproduce CI conditions locally, inspect the trace, and make each assertion reflect a real application state transition.
If you do that consistently, CI-only failures become much less mysterious. They also become much more valuable, because they stop being noisy flake and start being useful signals about the quality of your app and your test suite.