How to Debug Playwright Tests That Pass on Chromium but Fail on Firefox or WebKit

Playwright can make cross-browser testing feel straightforward until a test passes reliably in Chromium and then fails in Firefox or WebKit with no obvious code change. That gap usually means the test is leaning on an assumption that Chromium happens to tolerate, while another engine exposes the real problem.

The good news is that these failures are usually diagnosable. The bad news is that they often hide in places that feel mundane, like click timing, focus behavior, CSS layout, text normalization, scrolling, or the way a selector matches a slightly different DOM state. If you have ever stared at a test that passes in headless Chromium and fails in Firefox only on CI, this guide is for you.

This article focuses on the practical side of cross-browser drift debugging: how to isolate browser-specific failures, what to inspect first, and how to harden tests so they stop depending on accidental Chromium behavior.

Why a test can pass in Chromium and fail elsewhere

A browser-specific failure is not automatically a Playwright problem. In many cases, Playwright is doing exactly what it should, and the application or the test is making assumptions that do not hold across engines.

The most common sources of drift are:

Timing differences, especially around animation, network idle, hydration, and re-rendering
Focus handling, including keyboard navigation, input activation, and shadow DOM interactions
CSS and layout differences, particularly around visibility, scrolling, and hit targets
Selector assumptions, such as exact text, DOM order, or use of brittle CSS paths
Assertion assumptions, like expecting an element to be visible, enabled, or focused at a moment when that is only true in one engine
Platform and font differences that change text wrapping, overflow, or computed size

Playwright’s browser coverage is one reason it is popular for cross-browser validation, because the same test can run against Chromium, Firefox, and WebKit from the same API surface. See the Playwright docs for the fundamentals of multi-browser test execution.

A test that only passes in one engine is often not “flaky”; it is revealing an assumption your test or UI made about the browser.

Start by identifying whether the failure is in the app, the test, or the browser interaction

Before changing selectors or sprinkling extra waits, classify the failure.

1. Does the app behave differently, or only the test?

If a user-visible workflow also fails in Firefox or WebKit, the bug may be in the application, not the test. For example:

A button is covered by a fixed header only in WebKit (the engine behind Safari) because layout differs
A focus trap fails in Firefox due to a missing tabindex
A CSS transition delays interactability in one browser but not another

If the UI itself is wrong, fixing the test only masks the product issue.

2. Is the failure tied to an action or an assertion?

A failing action usually means Playwright could not perform the interaction as expected, for example a click, fill, or press. A failing assertion usually means the interaction succeeded, but the resulting state differed.

That distinction matters:

Action failures often point to visibility, hit testing, scrolling, or overlay problems
Assertion failures often point to rendering, timing, or DOM state mismatch
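
When triaging a batch of CI failures, a rough way to apply this split is to sort error messages by their shape. The sketch below is a heuristic only: `classifyFailure` is a hypothetical helper, and the patterns are assumptions about how Playwright errors commonly read (an API call such as `locator.click:` at the start for actions, a mention of `expect(` for assertions). Message formats can change between versions, so check the patterns against your own logs.

```typescript
type FailureKind = 'action' | 'assertion' | 'unknown';

// Hypothetical helper: guesses the failure kind from the first line of an error message.
// The patterns are assumptions about typical output, not a stable contract.
function classifyFailure(message: string): FailureKind {
  const firstLine = (message.split('\n')[0] ?? '').trim();
  // Assertion failures usually mention the expect(...) call that failed or timed out.
  if (firstLine.includes('expect(')) return 'assertion';
  // Action failures usually name the API call first, optionally after an "Error:" prefix.
  if (/^(?:\w*Error:\s*)?(?:locator|page|frame|keyboard|mouse)\.\w+:/.test(firstLine)) {
    return 'action';
  }
  return 'unknown';
}
```

Anything that lands in `unknown` still needs a manual look; the point is only to decide which of the two checklists above to start with.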

3. Is it browser-specific or environment-specific?

A Firefox-only failure on CI but not locally can involve:

Different font availability
Slower machine or container performance
Missing GPU or OS-specific rendering details
Headless behavior differences
Different viewport or device scale factor

Do not assume the browser engine is the only variable. Compare the combination of browser and environment.
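
One way to make that comparison concrete is to record a small environment snapshot in each run and diff the two. This is a minimal sketch: the `EnvSnapshot` shape and the `diffEnvironments` name are hypothetical, chosen only to mirror the variables listed above.

```typescript
// Hypothetical snapshot shape; the field names are illustrative.
interface EnvSnapshot {
  browserName: string;
  browserVersion: string;
  headless: boolean;
  viewport: string; // for example "1280x720"
  deviceScaleFactor: number;
  locale: string;
}

// Returns the names of the fields whose values differ between two snapshots.
function diffEnvironments(a: EnvSnapshot, b: EnvSnapshot): (keyof EnvSnapshot)[] {
  const keys = Object.keys(a) as (keyof EnvSnapshot)[];
  return keys.filter((key) => a[key] !== b[key]);
}
```

In a real suite the values could come from the `browserName` fixture, `browser.version()`, and `page.viewportSize()`. If the diff between a local run and a CI run is not empty, the failure may not be about the engine at all.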

Reproduce the failure in the simplest possible setup

The best first step is to reduce noise. Keep the same browser, same viewport, same test, and remove unrelated setup where possible.

Run the test against each browser explicitly

import { test, expect } from '@playwright/test';

test('checkout flow', async ({ page, browserName }) => {
  // Log the engine first so the line is printed even if a later step fails.
  console.log(`running in ${browserName}`);
  await page.goto('/checkout');
  await page.getByRole('button', { name: 'Continue' }).click();
  await expect(page.getByText('Payment details')).toBeVisible();
});

When you isolate the browser, the output can reveal whether the failure is engine-specific or a broader state issue.
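
To run the same test against one engine at a time, the usual approach is to define one project per browser. The fragment below is a sketch of such a configuration; the project names are a common convention rather than a requirement, and it would need to be merged with whatever your existing config already contains.

```typescript
// playwright.config.ts (fragment)
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    // One project per engine, each using Playwright's desktop device preset.
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
  ],
});
```

With projects in place, `npx playwright test --project=firefox` runs only the Firefox project, which keeps the reproduction loop short.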

Capture traces and screenshots for the failing browser

Playwright traces are especially useful because they show DOM snapshots, screenshots, actions, and console output. If a test passes in Chromium but fails in Firefox or WebKit, trace the failing browser, not just the passing one.

A useful habit is to collect traces only on retry or failure in CI. That keeps the suite fast but gives you evidence when something breaks.

// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Keep artifacts only for failing tests so passing runs stay fast.
  use: { trace: 'retain-on-failure', screenshot: 'only-on-failure', video: 'retain-on-failure' },
});

Compare the rendered state, not just the DOM

Cross-browser failures are often visual-state mismatches. An element may exist in the DOM but be offscreen, covered, clipped, or not yet actionable. Inspect:

Bounding box
Computed styles
Focus state
Scroll position
Whether another element overlays the target

Playwright can help you introspect these conditions directly.

const button = page.getByRole('button', { name: 'Continue' });
console.log(await button.isVisible());
console.log(await button.isEnabled());
console.log(await button.evaluate(el => {
  const rect = el.getBoundingClientRect();
  return { x: rect.x, y: rect.y, width: rect.width, height: rect.height };
}));
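
Once you have a bounding box, it helps to turn the raw numbers into a verdict you can compare across browsers. The helper below is a minimal sketch; `placementInViewport` and its types are hypothetical names, with shapes chosen to match what `locator.boundingBox()` and `page.viewportSize()` return.

```typescript
// Shapes mirror the objects returned by locator.boundingBox() and page.viewportSize().
interface Box { x: number; y: number; width: number; height: number; }
interface ViewportSize { width: number; height: number; }

type Placement = 'inside' | 'partially-outside' | 'outside';

// Hypothetical helper: classifies where a box sits relative to the viewport.
function placementInViewport(box: Box, viewport: ViewportSize): Placement {
  const right = box.x + box.width;
  const bottom = box.y + box.height;
  const fullyInside =
    box.x >= 0 && box.y >= 0 && right <= viewport.width && bottom <= viewport.height;
  if (fullyInside) return 'inside';
  const fullyOutside =
    right <= 0 || bottom <= 0 || box.x >= viewport.width || box.y >= viewport.height;
  return fullyOutside ? 'outside' : 'partially-outside';
}
```

Both Playwright calls can return `null` (a hidden element has no box, and a page may have no fixed viewport), so guard for that before calling the helper. A target that is `inside` in Chromium but `partially-outside` in WebKit points at layout rather than timing.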

Timing drift is the first thing to suspect

Many tests that pass on Chromium but fail on Firefox are failing because of timing problems.

Chromium may render faster, batch events differently, or settle the DOM in a sequence that makes your test appear stable. Firefox or WebKit might expose a race condition that was always there.

Common timing patterns that break cross-browser tests

Waiting for the wrong thing

A classic mistake is waiting for network idle or a generic timeout instead of waiting for the real UI condition.

Bad pattern:

await page.waitForTimeout(1000);
await page.getByRole('button', { name: 'Save' }).click();

Better pattern:

await expect(page.getByRole('button', { name: 'Save' })).toBeEnabled();
await page.getByRole('button', { name: 'Save' }).click();

Racing hydration or client-side rendering

If the page server-renders markup and then hydrates it, Chromium may settle quickly enough that the test appears stable. Firefox might expose a moment when the target exists but is not yet wired up.

Symptoms include:

Clicks that do nothing
Text that changes after the assertion runs
Intermittent “element is detached” failures

The fix is usually to wait for a UI condition tied to the app, not an arbitrary delay.

Assuming DOM order is stable

If a test uses a locator that depends on order, such as the first button in a list, browser-driven layout or async rendering can change which element you hit.

Prefer semantic locators that target the actual control, not the incidental position.

Focus handling can diverge more than people expect

Firefox-only test failures frequently involve focus because keyboard and click interactions can differ in subtle ways.

Look for these symptoms

press() targets the wrong element
fill() works, but subsequent keyboard actions land elsewhere
A dialog or menu opens in Chromium but not in Firefox
A test expects an input to be focused, but the browser places focus on a wrapper or body element

Why focus drift happens

Browsers do not always agree on when focus shifts after:

Programmatic click events
Mouse down versus mouse up behavior
Custom controls built from div elements
Shadow DOM boundaries
Disabled or partially disabled controls

For example, a custom button that looks clickable may work in Chromium because the event chain happens to align with your handler, while Firefox requires a more explicit accessible role or a stronger focus management implementation.

Debug focus directly

const active = await page.evaluate(() => document.activeElement?.outerHTML);
console.log(active);

You can also check a specific element’s focus state:

await expect(page.getByRole('textbox', { name: 'Email' })).toBeFocused();

If focus is missing in only one browser, inspect the component code. Common fixes include adding proper label associations, using native controls instead of div-based controls, and avoiding brittle custom keyboard logic.

CSS and layout differences are a major source of WebKit failures

WebKit failures often point to layout assumptions. Safari-style rendering can differ from Chromium in ways that affect interactability and text assertions.

What to inspect first

Overlays and sticky elements

A click can fail if the target is technically visible but covered by another element. This can happen when:

A sticky header overlaps the target at one viewport height
An animation keeps a modal backdrop on screen longer in one browser
A tooltip or toast steals pointer events

Use Playwright’s trial option, which runs the actionability checks without performing the click, or inspect hit target behavior directly.

await page.getByRole('button', { name: 'Continue' }).click({ trial: true });

If the trial fails, you know the interaction is not safe yet.
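
When a trial click fails and you suspect a sticky header or backdrop, comparing the two bounding boxes can confirm it. This is a small geometric sketch, not a Playwright feature: `coveredFraction` is a hypothetical helper, and it only considers rectangles, so it ignores stacking order, transparency, and `pointer-events`.

```typescript
interface Rect { x: number; y: number; width: number; height: number; }

// Returns the fraction (0 to 1) of `target` that lies underneath `overlay`.
function coveredFraction(target: Rect, overlay: Rect): number {
  const overlapWidth =
    Math.min(target.x + target.width, overlay.x + overlay.width) - Math.max(target.x, overlay.x);
  const overlapHeight =
    Math.min(target.y + target.height, overlay.y + overlay.height) - Math.max(target.y, overlay.y);
  // No intersection when either dimension is zero or negative.
  if (overlapWidth <= 0 || overlapHeight <= 0) return 0;
  const targetArea = target.width * target.height;
  return targetArea === 0 ? 0 : (overlapWidth * overlapHeight) / targetArea;
}
```

Feeding it the boxes of the target and the suspected overlay in each browser shows quickly whether the overlap exists only in one engine or only at one viewport height.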

Text wrapping and overflow

Different font metrics and text layout can change line wrapping, making assertions about visibility or size unreliable.

This matters if your test says:

A label should be visible in one line
A card should contain a specific substring in view
A menu should not overflow the viewport

Instead of asserting pixel-perfect text layout, assert the functional result.

Position-based clicking

Avoid coordinates unless you are intentionally testing pointer geometry. If a browser-specific failure only happens when clicking at a point, you may have a layout or overlay issue, not a locator issue.

Selector drift is often the hidden culprit

If a test works in Chromium and fails in Firefox or WebKit, it may be because the selector is too loose or too dependent on DOM shape.

Prefer user-facing locators

Playwright’s role and text locators usually produce more stable cross-browser tests than CSS selectors based on implementation details.

await page.getByRole('button', { name: 'Submit order' }).click();

This is usually more robust than:

await page.locator('.checkout-form > div:nth-child(3) > button').click();

Watch out for exact text assumptions

Exact text assertions can fail when browsers normalize whitespace differently, when fonts reflow text, or when hidden text appears in the accessibility tree.

If you are checking copy, consider whether the assertion is too rigid. Often, the real requirement is that the page contains the correct action or status, not that the text node is byte-for-byte identical.
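
If you do compare raw strings, for example values read with `textContent()`, normalizing whitespace first removes one common source of engine-specific noise. The helpers below are a minimal sketch with hypothetical names; they deliberately leave case and punctuation alone.

```typescript
// Collapses runs of whitespace into single spaces and trims the ends.
// JavaScript's \s class already covers non-breaking spaces and newlines.
function normalizeWhitespace(text: string): string {
  return text.replace(/\s+/g, ' ').trim();
}

// True when two strings match once whitespace differences are ignored.
function sameVisibleText(actual: string, expected: string): boolean {
  return normalizeWhitespace(actual) === normalizeWhitespace(expected);
}
```

This keeps the assertion strict about the words while tolerating differences in line breaks and spacing that the user never sees.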

Make hidden state explicit

If a test selects the first visible item and only Chromium happens to render the right one, the locator is hiding a problem.

Good debugging trick:

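const items = page.getByRole('listitem');
// Read the count once so the loop bound cannot shift mid-iteration.
const count = await items.count();
console.log(`found ${count} list items`);
for (let i = 0; i < count; i++) {
  console.log(i, await items.nth(i).textContent());
}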

This helps reveal whether the browser is surfacing a different DOM structure or accessibility tree.

Don’t confuse actionability with readiness

Playwright waits for elements to be actionable, but your app may still not be ready in the way the test assumes.

Example: button is enabled too early

A button can become clickable before the underlying request has finished or before validation state has been populated. Chromium may happen to schedule things so the click works, but Firefox may expose the gap.

If your app has its own loading state, wait for that state to clear.

await expect(page.getByTestId('save-spinner')).toBeHidden();
await page.getByRole('button', { name: 'Save' }).click();

Example: menu opens, but animation is still running

In this case the menu exists in the DOM, but the animation leaves it partially transparent or not yet hit-testable. Do not assert only on presence if the user needs it to be interactable.

Use assertions that match the actual user task, for example visibility plus enabled state, or a successful click on the intended control.
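
To see why an element that exists is still not usable, it can help to snapshot a few computed style values and list what is off. This is a sketch under assumptions: `StyleSnapshot` and `interactabilityProblems` are hypothetical names, and the four properties checked are a starting set rather than a complete definition of interactability.

```typescript
// Hypothetical snapshot of computed style values, kept as strings
// the way getComputedStyle reports them.
interface StyleSnapshot {
  display: string;
  visibility: string;
  opacity: string;
  pointerEvents: string;
}

// Lists reasons why an element with these styles is probably not ready for a user.
function interactabilityProblems(style: StyleSnapshot): string[] {
  const problems: string[] = [];
  if (style.display === 'none') problems.push('display is none');
  if (style.visibility !== 'visible') problems.push(`visibility is ${style.visibility}`);
  if (Number.parseFloat(style.opacity) < 1) problems.push(`opacity is ${style.opacity}`);
  if (style.pointerEvents === 'none') problems.push('pointer-events is none');
  return problems;
}
```

The snapshot itself could be collected in the page with `locator.evaluate` and `getComputedStyle`. Logging the result right before the failing step in each browser shows whether one engine is still mid-transition.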

Browser-specific failures often expose test data problems

Sometimes the bug is in your fixture data or mock responses.

Check for these data-related pitfalls

Missing translation keys that only some browsers surface due to different code paths
Image or font loading differences that affect rendering
API payloads that rely on object ordering or undefined property handling
Locale-sensitive parsing, such as dates and numbers

If your test uses mocked JSON, validate that the response is equally complete for all browsers. If the UI depends on locale or timezone, standardize them in the test environment.
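
Locale, timezone, and viewport can be pinned in the Playwright config so every browser and every machine starts from the same baseline. The fragment below is a sketch; the specific values are assumptions, so pick the ones your fixtures and expected strings were written for, and merge it with your existing config.

```typescript
// playwright.config.ts (fragment)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    locale: 'en-US', // assumed value: match the locale your expected strings use
    timezoneId: 'UTC', // assumed value: avoids date shifts between laptops and CI
    viewport: { width: 1280, height: 720 }, // assumed size: keep it identical everywhere
  },
});
```

With these fixed, a date or number that still renders differently in one engine is more likely a real formatting difference in the app than an environment artifact.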

Use browser comparison as a debugging technique, not just a test target

When a test fails only on Firefox or WebKit, it helps to compare the same interaction across browsers and look for the first divergence.
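
If each run writes a short log of observations per step (what has focus, what is visible, which request finished), finding the first divergence becomes a simple comparison. The helper below is a minimal sketch; `firstDivergence` and the log format are hypothetical.

```typescript
interface Divergence {
  step: number;
  left: string | undefined;
  right: string | undefined;
}

// Returns the first step at which two observation logs differ, or null if they match.
function firstDivergence(left: string[], right: string[]): Divergence | null {
  const steps = Math.max(left.length, right.length);
  for (let step = 0; step < steps; step++) {
    if (left[step] !== right[step]) {
      return { step, left: left[step], right: right[step] };
    }
  }
  return null;
}
```

Everything after the first differing step is usually a consequence, so that step is the one to inspect in the trace.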

A useful debugging checklist

Does the page load the same data in each browser?
Does the DOM tree differ meaningfully?
Is the target visible in the same place?
Does focus land on the same element?
Is the failing assertion too strict?
Is there a timing dependency on animation or network?
Is the locator selecting the intended control?

You can print the browser name and use it to branch debugging output without changing the test logic.

import { test } from '@playwright/test';

test('debug browser-specific behavior', async ({ page, browserName }) => {
  console.log(`browser = ${browserName}`);
  await page.goto('/settings');
  console.log(await page.locator('body').innerText());
});

Cross-browser drift debugging patterns that actually help

Here are the patterns that usually save time.

1. Replace arbitrary waits with state-based waits

If a failure disappears after adding waitForTimeout, the test is still wrong. The timeout only hides the race.

Instead, wait for a deterministic UI condition, like a button becoming enabled, a status badge changing, or a modal becoming visible.

2. Use semantic selectors

Prefer getByRole, getByLabel, and getByText when appropriate. They survive DOM refactors better than CSS chains and are less likely to differ by browser rendering order.

3. Assert the user outcome, not an implementation detail

Avoid asserting that a specific DOM node exists if the important thing is that the user can complete a task. Assertions tied to internal structure are brittle across engines.

4. Inspect focus and active element state

If keyboard-driven tests behave differently in Firefox, examine focus on each step. Custom components are frequent offenders.

5. Compare computed layout for suspicious elements

A quick inspection of bounding boxes and computed styles can reveal clipping, overflow, or unexpected display changes.

6. Re-run with a narrowed viewport or default viewport

Some failures only appear because the target is hidden by responsive layout. Viewport-sensitive bugs can masquerade as browser-specific failures.

7. Confirm the same browser channel and version in CI and local runs

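A mismatch between locally installed browsers and the browsers installed in CI can create confusing false differences. Each Playwright release is paired with specific browser builds, so pinning the same Playwright version locally and in CI keeps the engines aligned. Also check whether one side runs a branded channel such as Chrome while the other runs the bundled Chromium. In continuous integration, keep browser versions and the test environment as consistent as possible. For background on CI practices, see continuous integration.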

What to change in the test code once you find the issue

When you have identified the root cause, fix the test so it no longer depends on accidental browser behavior.

Good fixes

Use accessible locators instead of structural selectors
Wait for functional readiness, not arbitrary time
Assert visible, enabled, and focused states only when they matter to the user flow
Reduce reliance on coordinate clicks
Make test data deterministic
Add explicit waits around known async transitions in the app, not inside every test step

Bad fixes

Increasing timeouts everywhere
Retrying the same flaky step without diagnosing it
Disabling the browser where the test fails
Adding force: true to clicks just to make the suite green
Converting a user-level test into an implementation-level test because it is easier to stabilize

A force click can be useful in rare cases, but if Playwright reports that the element is not actionable, that often means the user would also struggle.

A small example of tightening a fragile test

A brittle test might look like this:

await page.waitForTimeout(500);
await page.locator('.card button').click();
await expect(page.locator('.toast')).toContainText('Saved');

A more stable version is usually closer to:

const saveButton = page.getByRole('button', { name: 'Save' });
await expect(saveButton).toBeEnabled();
await saveButton.click();
await expect(page.getByRole('status')).toHaveText(/saved/i);

The second version is better because it communicates intent, waits on a meaningful state, and is less sensitive to layout or browser-specific DOM details.

When the problem is the app, not the test

Sometimes the most important result of debugging is discovering a product bug. A test that fails only in Firefox or WebKit can reveal genuine browser compatibility work.

Common app issues include:

Custom focus management that assumes Chromium event ordering
CSS that depends on nonstandard layout behavior
Event handlers that miss pointer or keyboard edge cases
JavaScript that reads layout too early after render
Feature use that is not equally supported across engines

If the same flow fails for a human user in one browser, the right action is to fix the app and keep the test as a regression guard.

A practical triage workflow for cross-browser drift

If you need a repeatable process, use this order:

Reproduce the failure in the exact browser that fails
Compare browser state at the failure point, not just final output
Determine whether the action failed or the assertion failed
Inspect focus, overlays, and computed layout
Replace brittle locators and arbitrary waits
Confirm whether the issue exists in the product itself
Lock in a regression test once the root cause is fixed

This workflow avoids the common trap of treating cross-browser failures as random flakiness. They are usually deterministic once you inspect the right state.

A few rules of thumb for preventing future browser drift

Use native HTML controls when possible, because they carry the most consistent browser semantics
Make loading and saving states explicit in the UI, so tests can wait on them cleanly
Keep selectors aligned with accessibility, not DOM implementation details
Standardize test environments in CI, including viewport, locale, and timezone where relevant
Run important flows in more than one engine before merging browser-sensitive UI changes

If a test depends on Chromium being forgiving, it is already weaker than it looks.

Related reading

If you want to dig deeper into how Playwright structures browser execution and test isolation, the official Playwright documentation is the best starting point. For the broader context of why multi-browser testing exists, the ideas behind software testing and test automation are useful background.

Final takeaway

When Playwright tests pass on Chromium but fail on Firefox or WebKit, the fastest path forward is not guessing. Classify the failure, reproduce it in the failing engine, inspect focus and layout state, and look for hidden assumptions in selectors or assertions. In most cases, the browser difference is not the real bug; it is the signal that your test or UI relied on behavior one engine happened to tolerate.

The best cross-browser tests are not the ones with the most waits or the least strict assertions. They are the ones that encode the user’s real intent and make as few assumptions as possible about how Chromium, Firefox, or WebKit happen to get there.
