July 5, 2026
How to Debug Playwright Tests That Pass on Chromium but Fail on Firefox or WebKit
A practical guide to debugging Playwright tests that pass on Chromium but fail on Firefox or WebKit, including timing issues, locator drift, focus handling, and browser-specific CSS differences.
Playwright can make cross-browser testing feel straightforward until a test passes reliably in Chromium and then fails in Firefox or WebKit with no obvious code change. That gap usually means the test is leaning on an assumption that Chromium happens to tolerate, while another engine exposes the real problem.
The good news is that these failures are usually diagnosable. The bad news is that they often hide in places that feel mundane, like click timing, focus behavior, CSS layout, text normalization, scrolling, or the way a selector matches a slightly different DOM state. If you have ever stared at a test that passes in headless Chromium and fails in Firefox only on CI, this guide is for you.
This article focuses on the practical side of cross-browser drift debugging: how to isolate browser-specific failures, what to inspect first, and how to harden tests so they stop depending on accidental Chromium behavior.
Why a test can pass in Chromium and fail elsewhere
A browser-specific failure is not automatically a Playwright problem. In many cases, Playwright is doing exactly what it should, and the application or the test is making assumptions that do not hold across engines.
The most common sources of drift are:
- Timing differences, especially around animation, network idle, hydration, and re-rendering
- Focus handling, including keyboard navigation, input activation, and shadow DOM interactions
- CSS and layout differences, particularly around visibility, scrolling, and hit targets
- Selector assumptions, such as exact text, DOM order, or use of brittle CSS paths
- Assertion assumptions, like expecting an element to be visible, enabled, or focused at a moment when that is only true in one engine
- Platform and font differences that change text wrapping, overflow, or computed size
Playwright’s browser coverage is one reason it is popular for cross-browser validation, because the same test can run against Chromium, Firefox, and WebKit from the same API surface. See the Playwright docs for the fundamentals of multi-browser test execution.
A test that only passes in one engine is often not “flaky”; it is revealing an assumption your test or UI made about the browser.
Start by identifying whether the failure is in the app, the test, or the browser interaction
Before changing selectors or sprinkling extra waits, classify the failure.
1. Does the app behave differently, or only the test?
If a user-visible workflow also fails in Firefox or WebKit, the bug may be in the application, not the test. For example:
- A button is covered by a fixed header only in Safari because layout differs
- A focus trap fails in Firefox due to a missing tabindex
- A CSS transition delays interactability in one browser but not another
If the UI itself is wrong, fixing the test only masks the product issue.
2. Is the failure tied to an action or an assertion?
A failing action usually means Playwright could not perform the interaction as expected, for example a click, fill, or press. A failing assertion usually means the interaction succeeded, but the resulting state differed.
That distinction matters:
- Action failures often point to visibility, hit testing, scrolling, or overlay problems
- Assertion failures often point to rendering, timing, or DOM state mismatch
3. Is it browser-specific or environment-specific?
A Firefox-only failure on CI but not locally can involve:
- Different font availability
- Slower machine or container performance
- Missing GPU or OS-specific rendering details
- Headless behavior differences
- Different viewport or device scale factor
Do not assume browser engine is the only variable. Compare browser plus environment.
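One way to act on "compare browser plus environment" is to record a small snapshot of each run and diff the two. The sketch below is an illustration, not a Playwright feature: the RunEnvironment shape and the diffEnvironments helper are hypothetical names, and how you collect the values (for example from your CI metadata or the Playwright config) is left as an assumption.

```typescript
// Hypothetical snapshot of the conditions a test ran under.
// The field names are illustrative; record whatever your setup can vary.
interface RunEnvironment {
  browserName: string;
  headless: boolean;
  viewport: { width: number; height: number };
  deviceScaleFactor: number;
  locale: string;
  os: string;
}

// Returns the names of the fields whose values differ between two runs.
function diffEnvironments(a: RunEnvironment, b: RunEnvironment): string[] {
  const keys = Object.keys(a) as (keyof RunEnvironment)[];
  // JSON.stringify is enough here because every field is a primitive
  // or a small object whose keys are written in the same order.
  return keys.filter((key) => JSON.stringify(a[key]) !== JSON.stringify(b[key]));
}

const localRun: RunEnvironment = {
  browserName: 'firefox', headless: false,
  viewport: { width: 1280, height: 720 },
  deviceScaleFactor: 2, locale: 'en-US', os: 'macOS',
};
const ciRun: RunEnvironment = {
  browserName: 'firefox', headless: true,
  viewport: { width: 1280, height: 720 },
  deviceScaleFactor: 1, locale: 'en-US', os: 'Linux',
};

console.log(diffEnvironments(localRun, ciRun)); // [ 'headless', 'deviceScaleFactor', 'os' ]
```

If the diff is non-empty, the failure may belong to the environment rather than the engine, so align those fields before digging into browser behavior.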
Reproduce the failure in the simplest possible setup
The best first step is to reduce noise. Keep the same browser, same viewport, same test, and remove unrelated setup where possible.
Run the test against each browser explicitly
import { test, expect } from '@playwright/test';

test('checkout flow', async ({ page, browserName }) => {
  // Log the engine first so the output is present even if a later step fails.
  console.log(`running in ${browserName}`);
  await page.goto('/checkout');
  await page.getByRole('button', { name: 'Continue' }).click();
  await expect(page.getByText('Payment details')).toBeVisible();
});
When you isolate the browser, the output can reveal whether the failure is engine-specific or a broader state issue.
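To run the same test against each engine explicitly, a common setup defines one project per browser. This is a minimal sketch, assuming the device presets that ship with @playwright/test; the project names are a free choice.

```typescript
// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    // One project per engine; each reuses a built-in desktop device preset.
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
  ],
});
```

With projects like these, running npx playwright test --project=firefox executes only the Firefox project, which keeps the reproduction focused on the failing engine.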
Capture traces and screenshots for the failing browser
Playwright traces are especially useful because they show DOM snapshots, screenshots, actions, and console output. If a test passes in Chromium but fails in Firefox or WebKit, trace the failing browser, not just the passing one.
A useful habit is to collect traces only on retry or failure in CI. That keeps the suite fast but gives you evidence when something breaks.
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    trace: 'retain-on-failure',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
});
Compare the rendered state, not just the DOM
Cross-browser failures are often visual-state mismatches. An element may exist in the DOM but be offscreen, covered, clipped, or not yet actionable. Inspect:
- Bounding box
- Computed styles
- Focus state
- Scroll position
- Whether another element overlays the target
Playwright can help you introspect these conditions directly.
typescript
const button = page.getByRole('button', { name: 'Continue' });
console.log(await button.isVisible());
console.log(await button.isEnabled());
console.log(await button.evaluate(el => {
  const rect = el.getBoundingClientRect();
  return { x: rect.x, y: rect.y, width: rect.width, height: rect.height };
}));
Timing drift is the first thing to suspect
Many tests that pass on Chromium but fail on Firefox are failing because of timing problems.
Chromium may render faster, batch events differently, or settle the DOM in a sequence that makes your test appear stable. Firefox or WebKit might expose a race condition that was always there.
Common timing patterns that break cross-browser tests
Waiting for the wrong thing
A classic mistake is waiting for network idle or a generic timeout instead of waiting for the real UI condition.
Bad pattern:
typescript
await page.waitForTimeout(1000);
await page.getByRole('button', { name: 'Save' }).click();
Better pattern:
typescript
await expect(page.getByRole('button', { name: 'Save' })).toBeEnabled();
await page.getByRole('button', { name: 'Save' }).click();
Racing hydration or client-side rendering
If the page server-renders markup and then hydrates it, Chromium may settle quickly enough that the test appears stable. Firefox might expose a moment when the target exists but is not yet wired up.
Symptoms include:
- Clicks that do nothing
- Text that changes after the assertion runs
- Intermittent “element is detached” failures
The fix is usually to wait for a UI condition tied to the app, not an arbitrary delay.
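The difference between an arbitrary delay and a state-based wait can be sketched without any browser at all. The helper below is illustrative only: pollUntil is a hypothetical name, not a Playwright API, and the "hydration" flag is simulated. In a real suite, Playwright's web-first assertions and expect.poll already provide this kind of retrying.

```typescript
// Hypothetical helper: retry a check until it passes or a deadline expires.
async function pollUntil(
  check: () => boolean | Promise<boolean>,
  { timeoutMs = 5000, intervalMs = 50 } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    // Condition met: stop waiting immediately instead of sleeping a fixed time.
    if (await check()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Simulated app state: "hydration" finishes after a delay the test cannot predict.
let hydrated = false;
setTimeout(() => { hydrated = true; }, 120);

pollUntil(() => hydrated, { timeoutMs: 2000, intervalMs: 20 })
  .then(() => console.log('ready, safe to interact'));
```

A fixed sleep would be either too short on a slow engine or wasted time on a fast one. Polling a real condition finishes as soon as the app is ready and fails loudly when it never is.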
Assuming DOM order is stable
If a test uses a locator that depends on order, such as the first button in a list, browser-driven layout or async rendering can change which element you hit.
Prefer semantic locators that target the actual control, not the incidental position.
Focus handling can diverge more than people expect
Firefox-only test failures frequently involve focus because keyboard and click interactions can differ in subtle ways.
Look for these symptoms
- press() targets the wrong element
- fill() works, but subsequent keyboard actions land elsewhere
- A dialog or menu opens in Chromium but not in Firefox
- A test expects an input to be focused, but the browser places focus on a wrapper or body element
Why focus drift happens
Browsers do not always agree on when focus shifts after:
- Programmatic click events
- Mouse down versus mouse up behavior
- Custom controls built from div elements
- Shadow DOM boundaries
- Disabled or partially disabled controls
For example, a custom button that looks clickable may work in Chromium because the event chain happens to align with your handler, while Firefox requires a more explicit accessible role or a stronger focus management implementation.
Debug focus directly
typescript
const active = await page.evaluate(() => document.activeElement?.outerHTML);
console.log(active);
You can also check a specific element’s focus state:
typescript
await expect(page.getByRole('textbox', { name: 'Email' })).toBeFocused();
If focus is missing in only one browser, inspect the component code. Common fixes include adding proper label associations, using native controls instead of div-based controls, and avoiding brittle custom keyboard logic.
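For keyboard flows, it can help to record where focus lands after every Tab press and compare the result per browser. The sketch below makes several assumptions: FocusProbe and recordTabOrder are hypothetical names, and FocusProbe is a hand-written subset of the page object (keyboard.press and evaluate) that Playwright's Page is intended to satisfy, so the helper can also be exercised with a stub.

```typescript
// Minimal shape of what this helper needs. In a real suite you could type
// the parameter as Playwright's Page instead of this hand-written subset.
interface FocusProbe {
  keyboard: { press(key: string): Promise<void> };
  evaluate<T>(pageFunction: () => T): Promise<T>;
}

// Press Tab `steps` times and record which element holds focus after each press.
async function recordTabOrder(page: FocusProbe, steps: number): Promise<string[]> {
  const trail: string[] = [];
  for (let i = 0; i < steps; i++) {
    await page.keyboard.press('Tab');
    const focused = await page.evaluate(() => {
      // This callback runs in the browser, so it can read the live DOM.
      const el = document.activeElement;
      if (!el) return 'none';
      // Tag name plus id is usually enough to tell elements apart in a log.
      const tag = el.tagName.toLowerCase();
      return el.id ? `${tag}#${el.id}` : tag;
    });
    trail.push(focused);
  }
  return trail;
}
```

Calling recordTabOrder(page, 5) inside a test and logging the result together with browserName gives one trail per engine. The first entry that differs is where focus handling diverges.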
CSS and layout differences are a major source of WebKit failures
WebKit failures often point to layout assumptions. Safari-style rendering can differ from Chromium in ways that affect interactability and text assertions.
What to inspect first
Overlays and sticky elements
A click can fail if the target is technically visible but covered by another element. This can happen when:
- A sticky header overlaps the target at one viewport height
- An animation keeps a modal backdrop on screen longer in one browser
- A tooltip or toast steals pointer events
Use Playwright’s trial action or inspect hit target behavior.
typescript
await page.getByRole('button', { name: 'Continue' }).click({ trial: true });
If the trial fails, you know the interaction is not safe yet.
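If you suspect a sticky header or backdrop, comparing bounding boxes makes the overlap concrete. A minimal sketch: Box mirrors the { x, y, width, height } shape returned by Playwright's locator.boundingBox(), while overlapArea and the sample numbers are hypothetical.

```typescript
// Same shape as the object returned by locator.boundingBox().
interface Box { x: number; y: number; width: number; height: number }

// Area (in CSS pixels squared) shared by two boxes; 0 means no overlap.
function overlapArea(a: Box, b: Box): number {
  const overlapWidth = Math.min(a.x + a.width, b.x + b.width) - Math.max(a.x, b.x);
  const overlapHeight = Math.min(a.y + a.height, b.y + b.height) - Math.max(a.y, b.y);
  return Math.max(0, overlapWidth) * Math.max(0, overlapHeight);
}

// Illustrative values: a 64px sticky header and a button scrolled partly under it.
const stickyHeader: Box = { x: 0, y: 0, width: 1280, height: 64 };
const continueButton: Box = { x: 600, y: 40, width: 120, height: 40 };

console.log(overlapArea(stickyHeader, continueButton)); // 2880
```

Geometric overlap alone does not prove the click is blocked, because stacking order decides which element receives the pointer event. Treat a non-zero result as a hint and confirm it with the trial click or the trace.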
Text wrapping and overflow
Different font metrics and text layout can change line wrapping, making assertions about visibility or size unreliable.
This matters if your test says:
- A label should be visible in one line
- A card should contain a specific substring in view
- A menu should not overflow the viewport
Instead of asserting pixel-perfect text layout, assert the functional result.
Position-based clicking
Avoid coordinates unless you are intentionally testing pointer geometry. If a browser-specific failure only happens when clicking at a point, you may have a layout or overlay issue, not a locator issue.
Selector drift is often the hidden culprit
If a test works in Chromium and fails in Firefox or WebKit, it may be because the selector is too loose or too dependent on DOM shape.
Prefer user-facing locators
Playwright’s role and text locators usually produce more stable cross-browser tests than CSS selectors based on implementation details.
typescript
await page.getByRole('button', { name: 'Submit order' }).click();
This is usually more robust than:
typescript
await page.locator('.checkout-form > div:nth-child(3) > button').click();
Watch out for exact text assumptions
Exact text assertions can fail when browsers normalize whitespace differently, when fonts reflow text, or when hidden text appears in the accessibility tree.
If you are checking copy, consider whether the assertion is too rigid. Often, the real requirement is that the page contains the correct action or status, not that the text node is byte-for-byte identical.
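When a test compares raw strings, for example values read with textContent(), normalizing whitespace before comparing avoids failures caused by line breaks or non-breaking spaces in the markup. A small sketch; normalizeText is a hypothetical helper, not part of Playwright.

```typescript
// Collapse every run of whitespace to a single space and trim the ends.
// JavaScript's \s also matches non-breaking spaces (U+00A0).
function normalizeText(raw: string): string {
  return raw.replace(/\s+/g, ' ').trim();
}

// Illustrative value: text as it might come back from indented markup.
const rawLabel = '\n   Payment\u00a0details \n ';
console.log(normalizeText(rawLabel) === 'Payment details'); // true
```

Playwright's toHaveText documents similar whitespace normalization for string arguments, so a helper like this mostly matters for manual comparisons outside the built-in assertions.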
Make hidden state explicit
If a test selects the first visible item and only Chromium happens to render the right one, the locator is hiding a problem.
Good debugging trick:
typescript
const items = page.getByRole('listitem');
// Read the count once so the loop bound does not change mid-iteration.
const count = await items.count();
console.log(count);
for (let i = 0; i < count; i++) {
  console.log(await items.nth(i).textContent());
}
This helps reveal whether the browser is surfacing a different DOM structure or accessibility tree.
Don’t confuse actionability with readiness
Playwright waits for elements to be actionable, but your app may still not be ready in the way the test assumes.
Example: button is enabled too early
A button can become clickable before the underlying request is finished or before validation has completed. Chromium may happen to schedule things so the click works, but Firefox may expose the gap.
If your app has its own loading state, wait for that state to clear.
typescript
await expect(page.getByTestId('save-spinner')).toBeHidden();
await page.getByRole('button', { name: 'Save' }).click();
Example: menu opens, but animation is still running
The menu exists in the DOM, but the animation leaves it partially transparent or not yet hit-testable. Do not assert only on presence if the user needs it to be interactable.
Use assertions that match the actual user task, for example visibility plus enabled state, or a successful click on the intended control.
Browser-specific failures often expose test data problems
Sometimes the bug is in your fixture data or mock responses.
Check for these data-related pitfalls
- Missing translation keys that only some browsers surface due to different code paths
- Image or font loading differences that affect rendering
- API payloads that rely on object ordering or undefined property handling
- Locale-sensitive parsing, such as dates and numbers
If your test uses mocked JSON, validate that the response is equally complete for all browsers. If the UI depends on locale or timezone, standardize them in the test environment.
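Locale, timezone, and viewport can be pinned in the shared config so every browser and machine starts from the same baseline. A minimal sketch; the specific values are assumptions and should match whatever your fixtures expect.

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    locale: 'en-US',                          // assumed value: drives number and date formatting
    timezoneId: 'UTC',                        // assumed value: keeps rendered dates stable across machines
    viewport: { width: 1280, height: 720 },   // assumed value: the same layout breakpoint everywhere
  },
});
```

These keys sit in the same use block as other shared options, so they apply to every project unless a project overrides them.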
Use browser comparison as a debugging technique, not just a test target
When a test fails only on Firefox or WebKit, it helps to compare the same interaction across browsers and look for the first divergence.
A useful debugging checklist
- Does the page load the same data in each browser?
- Does the DOM tree differ meaningfully?
- Is the target visible in the same place?
- Does focus land on the same element?
- Is the failing assertion too strict?
- Is there a timing dependency on animation or network?
- Is the locator selecting the intended control?
You can print browser name and use it to branch debugging output without changing the test logic.
import { test } from '@playwright/test';

test('debug browser-specific behavior', async ({ page, browserName }) => {
  console.log(`browser = ${browserName}`);
  await page.goto('/settings');
  console.log(await page.locator('body').innerText());
});
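Once each browser has produced a log, finding the first divergence is a mechanical comparison. A sketch under stated assumptions: firstDivergence is a hypothetical helper, and the sample logs are invented to show the shape of the data (one line per step).

```typescript
// Index of the first step where two recorded runs differ, or -1 if they match.
function firstDivergence(runA: string[], runB: string[]): number {
  const longest = Math.max(runA.length, runB.length);
  for (let i = 0; i < longest; i++) {
    // A missing entry (undefined) also counts as a difference.
    if (runA[i] !== runB[i]) return i;
  }
  return -1;
}

// Hypothetical logs: one line per step, such as the focused element after each action.
const chromiumSteps = ['goto /settings', 'focus: input#email', 'focus: button#save'];
const firefoxSteps = ['goto /settings', 'focus: input#email', 'focus: body'];

console.log(firstDivergence(chromiumSteps, firefoxSteps)); // 2
```

Everything before the returned index behaved the same in both engines, which narrows the investigation to a single step.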
Cross-browser drift debugging patterns that actually help
Here are the patterns that usually save time.
1. Replace arbitrary waits with state-based waits
If a failure disappears after adding waitForTimeout, the test is still wrong. The timeout only hides the race.
Instead, wait for a deterministic UI condition, like a button becoming enabled, a status badge changing, or a modal becoming visible.
2. Use semantic selectors
Prefer getByRole, getByLabel, and getByText when appropriate. They survive DOM refactors better than CSS chains and are less likely to differ by browser rendering order.
3. Assert the user outcome, not an implementation detail
Avoid asserting that a specific DOM node exists if the important thing is that the user can complete a task. Assertions tied to internal structure are brittle across engines.
4. Inspect focus and active element state
If keyboard-driven tests behave differently in Firefox, examine focus on each step. Custom components are frequent offenders.
5. Compare computed layout for suspicious elements
A quick inspection of bounding boxes and computed styles can reveal clipping, overflow, or unexpected display changes.
6. Re-run with a narrowed viewport or default viewport
Some failures only appear because the target is hidden by responsive layout. Viewport-sensitive bugs can masquerade as browser-specific failures.
7. Confirm the same browser channel and version in CI and local runs
A mismatch between local stable browsers and CI-installed browsers can create confusing false differences. In continuous integration, keep browser versions and test environment as consistent as possible. For background on CI practices, see continuous integration.
What to change in the test code once you find the issue
When you have identified the root cause, fix the test so it no longer depends on accidental browser behavior.
Good fixes
- Use accessible locators instead of structural selectors
- Wait for functional readiness, not arbitrary time
- Assert visible, enabled, and focused states only when they matter to the user flow
- Reduce reliance on coordinate clicks
- Make test data deterministic
- Add explicit waits around known async transitions in the app, not inside every test step
Bad fixes
- Increasing timeouts everywhere
- Retrying the same flaky step without diagnosing it
- Disabling the browser where the test fails
- Adding force: true to clicks just to make the suite green
- Converting a user-level test into an implementation-level test because it is easier to stabilize
A force click can be useful in rare cases, but if the browser says the element is not actionable, that often means the user would also struggle.
A small example of tightening a fragile test
A brittle test might look like this:
typescript
await page.waitForTimeout(500);
await page.locator('.card button').click();
await expect(page.locator('.toast')).toContainText('Saved');
A more stable version is usually closer to:
typescript
const saveButton = page.getByRole('button', { name: 'Save' });
await expect(saveButton).toBeEnabled();
await saveButton.click();
await expect(page.getByRole('status')).toHaveText(/saved/i);
The second version is better because it communicates intent, waits on a meaningful state, and is less sensitive to layout or browser-specific DOM details.
When the problem is the app, not the test
Sometimes the most important result of debugging is discovering a product bug. A test that fails only in Firefox or WebKit can reveal genuine browser compatibility work.
Common app issues include:
- Custom focus management that assumes Chromium event ordering
- CSS that depends on nonstandard layout behavior
- Event handlers that miss pointer or keyboard edge cases
- JavaScript that reads layout too early after render
- Feature use that is not equally supported across engines
If the same flow fails for a human user in one browser, the right action is to fix the app and keep the test as a regression guard.
A practical triage workflow for cross-browser drift
If you need a repeatable process, use this order:
- Reproduce the failure in the exact browser that fails
- Compare browser state at the failure point, not just final output
- Determine whether the action failed or the assertion failed
- Inspect focus, overlays, and computed layout
- Replace brittle locators and arbitrary waits
- Confirm whether the issue exists in the product itself
- Lock in a regression test once the root cause is fixed
This workflow avoids the common trap of treating cross-browser failures as random flakiness. They are usually deterministic once you inspect the right state.
A few rules of thumb for preventing future browser drift
- Use native HTML controls when possible, because they carry the most consistent browser semantics
- Make loading and saving states explicit in the UI, so tests can wait on them cleanly
- Keep selectors aligned with accessibility, not DOM implementation details
- Standardize test environments in CI, including viewport, locale, and timezone where relevant
- Run important flows in more than one engine before merging browser-sensitive UI changes
If a test depends on Chromium being forgiving, it is already weaker than it looks.
Related reading
If you want to dig deeper into how Playwright structures browser execution and test isolation, the official Playwright documentation is the best starting point. For the broader context of why multi-browser testing exists, the ideas behind software testing and test automation are useful background.
Final takeaway
When Playwright tests pass on Chromium but fail on Firefox or WebKit, the fastest path forward is not guessing. Classify the failure, reproduce it in the failing engine, inspect focus and layout state, and look for hidden assumptions in selectors or assertions. In most cases, the browser difference is not the real bug; it is the signal that your test or UI relied on behavior one engine happened to tolerate.
The best cross-browser tests are not the ones with the most waits or the least strict assertions. They are the ones that encode the user’s real intent and make as few assumptions as possible about how Chromium, Firefox, or WebKit happen to get there.