Why Playwright Tests Fail Only in CI on Linux but Pass on Mac and Windows

I have seen this pattern enough times to treat it as a category, not a coincidence: a Playwright test passes on a developer laptop, passes on macOS and Windows, then fails only in Linux CI. The failure message often looks ordinary (a timeout, a missing element, a screenshot mismatch, or a browser crash), but the real problem is usually environment drift. The test is not necessarily wrong, and CI is not necessarily broken. The gap is often in assumptions the test made about fonts, rendering, permissions, timing, file paths, or the browser runtime.

If you are trying to understand why Playwright tests fail only in CI on Linux, the fastest path is to stop asking, “Why does CI hate my test?” and start asking, “What is different about Linux CI that my local machines hide?” That shift usually leads to the real root cause.

For reference, Playwright is designed to drive Chromium, Firefox, and WebKit reliably across platforms, but it still runs inside the reality of the host system and container it is given. The official docs are a good baseline for understanding the framework itself: Playwright introduction.

The most common failure shape

The symptom is usually one of these:

a locator times out only in CI
a screenshot or visual assertion differs only on Linux
a click fails because the element is not visible or not stable
a download, file upload, or file write behaves differently
a browser launch fails with missing libraries or sandbox errors
a test passes alone, but fails when the suite runs in parallel

These are not random. They cluster around differences in the Linux runtime environment, especially when CI runs inside a container or a minimal VM.

If a test behaves differently by operating system, assume the test is coupled to environment details until proven otherwise.

Start with the big picture, not the test code

The most tempting debugging mistake is to stare at the test implementation before confirming the execution environment. For cross-platform failures, I usually compare these first:

browser version
Playwright version
Node version
OS image or container base image
installed system packages
available fonts
locale and time zone
viewport and device scale factor
user permissions and sandbox configuration
CPU and memory limits

A local Mac or Windows machine often has richer defaults, more fonts, a GUI stack, and more forgiving timing. A Linux CI job, especially in Docker, is often stripped down to the minimum needed to start a browser.
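One way to make that comparison concrete is to print a small environment fingerprint at the start of the job, both locally and in CI. The sketch below uses only Node built-ins. The helper name collectEnvironmentFingerprint and the chosen fields are assumptions for illustration, not a Playwright API.

```typescript
import * as os from 'node:os'

// Hypothetical helper (not part of Playwright): gathers host details that
// commonly differ between a laptop and a Linux CI container.
interface EnvironmentFingerprint {
  platform: string
  osRelease: string
  nodeVersion: string
  cpuCount: number
  totalMemoryMb: number
  locale: string
  timeZone: string
  isCi: boolean
}

function collectEnvironmentFingerprint(): EnvironmentFingerprint {
  const intl = Intl.DateTimeFormat().resolvedOptions()
  return {
    platform: os.platform(),
    osRelease: os.release(),
    nodeVersion: process.version,
    cpuCount: os.cpus().length,
    totalMemoryMb: Math.round(os.totalmem() / (1024 * 1024)),
    locale: intl.locale,
    timeZone: intl.timeZone,
    // Many CI providers set CI=true; treat this as a convention, not a guarantee.
    isCi: Boolean(process.env['CI']),
  }
}

console.log(JSON.stringify(collectEnvironmentFingerprint(), null, 2))
```

Keep in mind that the CPU and memory figures describe what Node sees and may not reflect container quotas, so treat them as a hint rather than a measurement of the CI limits.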

The root causes I check first

1. Font differences and text rendering drift

One of the most underestimated causes of cross-platform test failure is fonts. Your assertion may not mention text metrics directly, but the browser layout engine absolutely cares about them. If Linux CI does not have the same fonts as your laptop, text wraps differently, buttons change width, and screenshots shift by a few pixels.

Typical symptoms:

screenshot diffs that appear only on Linux
elements shifting under a click target
locator text matching but the layout changing enough to alter visibility
component snapshots failing because line breaks changed

This is especially common when the app depends on system fonts such as Arial, Helvetica, Segoe UI, or San Francisco substitutes. Linux images often use fallback fonts unless you install the needed packages.

A practical fix is to standardize the browser environment in CI and make sure the font stack is explicit. In Docker-based CI, that can mean installing fonts and font rendering packages in the image.

bash

apt-get update && apt-get install -y \
  fonts-liberation \
  fonts-dejavu-core \
  fontconfig

If visual stability matters, also compare screenshots at the same viewport size and device scale factor. Do not assume your local default window size matches the CI browser window.
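To confirm that the CI image really has the font families the app expects, it can help to check the installed list instead of inferring it from screenshot diffs. The sketch below parses the text printed by fc-list : family (from the fontconfig package). The helper name findMissingFontFamilies and the sample data are assumptions for illustration.

```typescript
// Hypothetical helper: given the text printed by `fc-list : family`, report
// which of the required font families are not installed.
function findMissingFontFamilies(fcListOutput: string, required: string[]): string[] {
  const installed = new Set<string>()
  for (const line of fcListOutput.split('\n')) {
    // A line can list several comma-separated names for the same font.
    for (const name of line.split(',')) {
      const family = name.trim().toLowerCase()
      if (family) installed.add(family)
    }
  }
  return required.filter((family) => !installed.has(family.toLowerCase()))
}

// Sample output, shortened and made up for illustration.
const sampleFontList = ['DejaVu Sans', 'DejaVu Sans,DejaVu Sans Condensed', 'Liberation Sans'].join('\n')
console.log(findMissingFontFamilies(sampleFontList, ['Liberation Sans', 'Noto Color Emoji']))
// prints [ 'Noto Color Emoji' ]
```

In a real job the input would come from running the command in the CI container, and a non-empty result is a reason to fail early with a clear message.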

2. Timing differences and slower Linux CI startup

Many tests are flaky not only because of race conditions in the application, but because they implicitly rely on timing that happens to work on a fast local machine. Linux CI may start more slowly, render differently, or schedule browser and test processes with less predictable CPU availability.

Symptoms:

page.waitForTimeout() seems to “fix” the test locally, but not in CI
a click happens before animation or hydration is complete
a locator is present but not interactable yet
the test waits for the wrong condition, such as element count instead of stable visibility

The better fix is to wait for a meaningful state, not just a delay. In Playwright, that usually means waiting for visibility, enabled state, network idle only when appropriate, or a specific application signal.

typescript

await page.getByRole('button', { name: 'Save' }).waitFor({ state: 'visible' })
await page.getByRole('button', { name: 'Save' }).click()

I try to avoid waitForTimeout except when diagnosing a race or dealing with a third-party animation I cannot influence. If the test only works with arbitrary sleeps, the bug is usually in the synchronization strategy.
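The same idea applies to signals that live outside the page, such as a background job finishing. For page state, Playwright's web-first assertions and expect.poll already cover this, so prefer those where they fit. The following is a framework-free sketch of condition-based waiting. The helper name waitForCondition and its default values are assumptions.

```typescript
// Hypothetical helper: poll a condition until it holds or a deadline passes,
// instead of sleeping for a fixed amount of time.
async function waitForCondition(
  check: () => boolean | Promise<boolean>,
  options: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<void> {
  const timeoutMs = options.timeoutMs ?? 5000
  const intervalMs = options.intervalMs ?? 100
  const deadline = Date.now() + timeoutMs
  while (true) {
    if (await check()) return
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`)
    }
    // Pause briefly before checking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs))
  }
}
```

The difference from a fixed sleep is that a fast machine returns as soon as the condition holds, while a slow CI runner gets the full timeout before the test fails with a descriptive error.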

3. Container defaults and missing system dependencies

Linux CI often runs in Docker or another container runtime. That means your browser is not just running on Linux; it is running in a constrained Linux environment with fewer default packages. Playwright browsers need a working set of OS libraries, and minimal images sometimes omit them.

Symptoms:

browser launch failures
segmentation faults or crashes when starting headless Chromium
fonts or emoji render incorrectly
downloads, file dialogs, or PDF generation behave oddly

The first step is to use the Playwright-recommended installation path for your environment, for example npx playwright install --with-deps or the official Playwright Docker image, rather than hand-picking packages by guesswork. The browser automation stack is sensitive to version and library compatibility, especially in CI.

If your CI image is custom, compare it with a known working Playwright base image, then narrow down differences one by one. I prefer this approach to random package installation because it makes the environment reproducible.
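When a browser refuses to launch in a custom image, running ldd against the browser executable and looking for unresolved libraries is a quick way to narrow things down. The sketch below parses that output. The helper name findMissingLibraries and the sample lines are assumptions for illustration, and the path to the executable depends on how the browsers were installed.

```typescript
// Hypothetical helper: extract the shared libraries that `ldd` reports as
// unresolved. Such lines look like "\tlibnss3.so => not found".
function findMissingLibraries(lddOutput: string): string[] {
  const missing: string[] = []
  for (const line of lddOutput.split('\n')) {
    const name = line.match(/^\s*(\S+)\s+=>\s+not found\s*$/)?.[1]
    if (name) missing.push(name)
  }
  return missing
}

// Made-up sample output for illustration.
const sampleLdd = [
  '\tlinux-vdso.so.1 (0x00007ffc1b5f2000)',
  '\tlibnss3.so => not found',
  '\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2a1c000000)',
  '\tlibatk-1.0.so.0 => not found',
].join('\n')
console.log(findMissingLibraries(sampleLdd))
// prints [ 'libnss3.so', 'libatk-1.0.so.0' ]
```

A non-empty list points at packages missing from the image, which is usually a stronger lead than the crash message alone.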

4. Permissions, sandboxing, and filesystem assumptions

Linux CI tends to be stricter about permissions than local machines. Tests that write files, create downloads, read fixtures, or interact with temporary directories can fail when the working directory is read-only, when the user running the process is not what you expect, or when sandbox settings restrict browser behavior.

Common examples:

downloading a file into a location that does not exist or is not writable
using relative paths that work from the repo root locally, but fail in CI because the working directory differs
relying on /tmp behavior without checking cleanup or permissions
assuming a browser can access a mounted volume the same way a local process can

A disciplined way to avoid this is to resolve paths explicitly and use test fixtures for filesystem interactions.

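import path from 'path'

// Note: process.cwd() is the directory the test process was started from,
// so confirm that CI starts it from the same place as local runs.
const downloadPath = path.join(process.cwd(), 'test-artifacts', 'downloads')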

Also check whether your CI runs the browser as root or as a non-root user. Running as root can mask permission mistakes, and the same test then fails in a hardened environment that runs as a non-root user.
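A small guard can turn a confusing mid-test failure into a clear setup error. The sketch below creates an artifact directory under an explicit base and verifies that the current user can write to it. The helper name ensureWritableDir is an assumption, and the temporary base directory is only there to keep the example self-contained.

```typescript
import * as fs from 'node:fs'
import * as os from 'node:os'
import * as path from 'node:path'

// Hypothetical helper: create a directory under an explicit base and fail
// fast, with a readable message, if the current user cannot write to it.
function ensureWritableDir(baseDir: string, ...segments: string[]): string {
  const dir = path.resolve(baseDir, ...segments)
  fs.mkdirSync(dir, { recursive: true })
  try {
    fs.accessSync(dir, fs.constants.W_OK)
  } catch {
    // process.getuid is not available on Windows, hence the optional call.
    const uid = process.getuid?.() ?? 'unknown'
    throw new Error(`Directory is not writable by uid ${uid}: ${dir}`)
  }
  return dir
}

// Usage with a throwaway base directory.
const artifactBase = fs.mkdtempSync(path.join(os.tmpdir(), 'pw-artifacts-'))
console.log(ensureWritableDir(artifactBase, 'test-artifacts', 'downloads'))
```

Calling something like this in global setup surfaces a read-only workspace or an unexpected user before any browser starts.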

5. Locale, time zone, and date formatting surprises

Linux CI images often default to UTC and a minimal locale set. That becomes a problem when the app renders dates, currency, sorting, or localized labels.

Symptoms:

date assertions fail by one day because the time zone differs
month names or number formatting change
tests that sort visible values differ across operating systems

If your app is user-facing and locale-sensitive, make the test environment deterministic. Set the time zone and locale in the browser context or in the CI container, and make assertions against normalized values where possible.

typescript

const context = await browser.newContext({
  locale: 'en-US',
  timezoneId: 'UTC'
})

This is not just a test concern. It is a product concern if your user base spans regions.
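The off-by-one-day symptom is easy to demonstrate without a browser, because the same instant falls on different calendar days depending on the time zone. A minimal sketch using the built-in Intl API, assuming a Node build with full ICU time zone data. The helper name calendarDayIn is hypothetical.

```typescript
// Hypothetical helper: format the calendar day of an instant as YYYY-MM-DD
// in a given IANA time zone.
function calendarDayIn(instant: Date, timeZone: string): string {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
  }).formatToParts(instant)
  const pick = (type: string): string => parts.find((part) => part.type === type)?.value ?? ''
  return `${pick('year')}-${pick('month')}-${pick('day')}`
}

const instant = new Date('2024-03-01T02:30:00Z')
console.log(calendarDayIn(instant, 'UTC')) // prints 2024-03-01
console.log(calendarDayIn(instant, 'America/Los_Angeles')) // prints 2024-02-29
```

A test that asserts on the rendered date passes or fails depending on which of these zones the machine happens to use, which is why pinning the zone matters.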

6. Viewport, scale factor, and headless rendering differences

A test that clicks a button by coordinates or relies on screenshot shape can easily fail when the browser viewport differs. Local Playwright runs may open at one size, while CI uses headless defaults or a smaller window. Linux container environments can also differ in device scale factor and rendering backends.

Symptoms:

elements appear below the fold in CI
sticky headers overlap target elements
screenshot diffs due to a one or two pixel shift
click interception by overlays that are not visible locally

The fix is to set viewport size explicitly and avoid coordinate-based interactions unless there is no alternative.

typescript

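import { chromium } from '@playwright/test'

// Fix the rendering surface so local and CI runs lay out the page identically.
const browser = await chromium.launch()
const context = await browser.newContext({
  viewport: { width: 1440, height: 900 },
  deviceScaleFactor: 1
})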

When a test depends on responsive layout, make that dependency explicit in the test. Do not let it inherit the CI default.

7. Parallelization exposing hidden shared state

Sometimes the Linux failure is not about Linux at all; it is about concurrency. CI often runs a broader matrix, more workers, or more parallel test files than a local laptop session. If tests share accounts, data, files, ports, or cookies, parallel execution can produce failures that only appear under CI load.

Symptoms:

tests pass individually but fail in a suite
data from one test leaks into another
a login state file is overwritten
a temp port is already in use

The first question I ask is whether the tests are hermetic. If they depend on shared state, make that state isolated per worker or per test. Playwright fixtures help here, but the design principle matters more than the tooling.
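One lightweight way to isolate state is to derive resource names from the worker slot. Playwright Test exposes that slot as the TEST_PARALLEL_INDEX environment variable, which stays between 0 and the worker count minus one. The helper names, the base port, and the file name below are assumptions for illustration.

```typescript
// Hypothetical helpers: derive per-worker resource names so parallel workers
// never share an account, a state file, or a port.
function readParallelIndex(env: Record<string, string | undefined> = process.env): number {
  // Falls back to 0 when the variable is absent, e.g. outside Playwright Test.
  const parsed = Number.parseInt(env['TEST_PARALLEL_INDEX'] ?? '0', 10)
  return Number.isNaN(parsed) ? 0 : parsed
}

function workerScoped(name: string, parallelIndex: number): string {
  return `${name}-w${parallelIndex}`
}

function workerPort(basePort: number, parallelIndex: number): number {
  return basePort + parallelIndex
}

const slot = readParallelIndex({ TEST_PARALLEL_INDEX: '2' })
console.log(workerScoped('storage-state', slot)) // prints storage-state-w2
console.log(workerPort(4000, slot)) // prints 4002
```

Inside a fixture the same value is available from the worker info object, so the environment variable is mainly useful in code that runs outside the fixture system.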

A practical debugging workflow

When I debug these failures, I try to reduce the problem systematically instead of guessing.

Step 1: Reproduce the CI environment locally

If possible, run the same container image or base OS locally. This helps separate “Linux behavior” from “CI behavior”. If you cannot reproduce exactly, at least align these variables:

Node version
Playwright version
browser version
locale and timezone
viewport size
environment variables

Step 2: Compare the browser context

Log the runtime details in the failing job. This often reveals a mismatch immediately.

console.log({
  userAgent: await page.evaluate(() => navigator.userAgent),
  viewport: page.viewportSize(),
  locale: await page.evaluate(() => navigator.language),
})

Step 3: Inspect screenshots and traces

Playwright trace artifacts are valuable because they show what the browser saw, not what the test author expected. When a test fails only in CI, capture trace, screenshots, and video, then compare them with local runs. Playwright supports tracing and debugging workflows that are worth using instead of guessing from stack traces alone.
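A configuration along these lines keeps artifacts for failing runs without recording everything. It is a sketch of a playwright.config.ts, and the retry count and the choice of modes are assumptions to adapt to your pipeline.

```typescript
// playwright.config.ts (sketch; the specific values are assumptions to adapt)
import { defineConfig } from '@playwright/test'

export default defineConfig({
  // Retry only in CI so that "on-first-retry" traces are recorded there.
  retries: process.env.CI ? 2 : 0,
  use: {
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
})
```

The CI job still needs to upload the output directory as a build artifact, otherwise the traces disappear with the runner.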

Step 4: Remove brittle assumptions

Look for these patterns in the test:

hard-coded timeouts
exact pixel assertions where semantic assertions would do
nth() locators where role or test ids would be better
assumptions about text wrapping or element position
shared test accounts or global state

Step 5: Binary search the environment difference

If the test only fails in Linux CI, ask what changed between local and CI. Is it the browser? The fonts? The container image? The viewport? The file system? The network? Remove one difference at a time.

The goal is not to make the test pass once. The goal is to identify the one environmental assumption the test was silently making.
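Capturing the same set of facts in both environments and diffing them makes the list of candidates explicit. A minimal sketch, assuming flat snapshots of primitive values. The type name Snapshot, the helper name diffSnapshots, and the sample values are made up for illustration.

```typescript
// Hypothetical helper: list the keys whose values differ between two flat
// environment snapshots, for example one captured locally and one from CI.
type Snapshot = Record<string, string | number | boolean>

function diffSnapshots(local: Snapshot, ci: Snapshot): string[] {
  const keys = new Set<string>([...Object.keys(local), ...Object.keys(ci)])
  return Array.from(keys)
    .filter((key) => local[key] !== ci[key])
    .sort()
}

// Made-up values for illustration only.
const localRun: Snapshot = { platform: 'darwin', timeZone: 'Europe/Berlin', viewport: '1280x720' }
const ciRun: Snapshot = { platform: 'linux', timeZone: 'UTC', viewport: '1280x720' }
console.log(diffSnapshots(localRun, ciRun))
// prints [ 'platform', 'timeZone' ]
```

Each key in the result is one difference to eliminate. Align it, rerun the test, and see whether the failure follows.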

Examples of fixes that actually help

Use stable locators

Prefer semantic locators over CSS structure that changes with layout. This reduces sensitivity to font and rendering differences.

typescript

await page.getByRole('button', { name: 'Submit order' }).click()

Pin the test environment

Use a consistent browser and OS image in CI. The more the environment drifts, the harder the failure becomes to reason about.

Install fonts explicitly

If the app uses custom or common desktop fonts, add them to the CI image. Visual tests are only as stable as the rendering stack.

Make waits reflect application state

Wait for UI readiness, not arbitrary milliseconds.

typescript

await expect(page.getByText('Dashboard')).toBeVisible()

Set locale, time zone, and viewport

This removes a large class of platform-specific behavior.
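With the Playwright test runner, these settings can live in one place so every test inherits them. The following is a sketch of a playwright.config.ts, and the concrete values are examples rather than recommendations.

```typescript
// playwright.config.ts (sketch; values are examples, not recommendations)
import { defineConfig } from '@playwright/test'

export default defineConfig({
  use: {
    // The same rendering surface on every machine.
    viewport: { width: 1440, height: 900 },
    deviceScaleFactor: 1,
    // The same formatting of dates, numbers, and sorting.
    locale: 'en-US',
    timezoneId: 'UTC',
  },
})
```

Tests that need a different locale or viewport can override these values locally, which keeps the dependency visible in the test itself.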

Isolate test data

Give every test its own account, workspace, or seed data when possible. Shared data failures often look like browser issues but are really environment collisions.

A few Linux-specific gotchas I keep seeing

Headless browser sandbox restrictions

Some CI systems use restrictive security profiles. If the browser cannot launch sandboxed, the fix is usually in the container or runner configuration, not the test itself. Be careful here, though, because disabling security mechanisms indiscriminately can hide real problems.

File upload behavior

Upload tests can fail when the path works locally but not in a container, or when a file fixture is missing from the CI checkout. Always verify the artifact is present and the path is resolved from a known base.

Case sensitivity in paths

Windows and macOS can hide case mismatches that Linux exposes. A file named Login.png is not the same as login.png on Linux. This can affect fixtures, snapshots, and imports.
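A check that compares directory entries by exact name catches this on any operating system, including case-insensitive file systems where an ordinary existence check would pass. A minimal sketch, with the helper name existsWithExactCase and the temporary fixture directory as assumptions.

```typescript
import * as fs from 'node:fs'
import * as os from 'node:os'
import * as path from 'node:path'

// Hypothetical helper: returns true only if every segment of relativePath
// exists under baseDir with exactly the given letter case.
function existsWithExactCase(baseDir: string, relativePath: string): boolean {
  let current = baseDir
  for (const segment of relativePath.split(/[\\/]+/).filter(Boolean)) {
    let entries: string[]
    try {
      entries = fs.readdirSync(current)
    } catch {
      return false // Missing, unreadable, or not a directory.
    }
    if (!entries.includes(segment)) return false
    current = path.join(current, segment)
  }
  return true
}

// Throwaway fixture tree to demonstrate the behavior.
const fixtureRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'case-check-'))
fs.mkdirSync(path.join(fixtureRoot, 'fixtures'))
fs.writeFileSync(path.join(fixtureRoot, 'fixtures', 'Login.png'), '')
console.log(existsWithExactCase(fixtureRoot, 'fixtures/Login.png')) // prints true
console.log(existsWithExactCase(fixtureRoot, 'fixtures/login.png')) // prints false
```

Running a check like this over fixture and snapshot paths on a Mac or Windows machine surfaces the mismatch before the Linux job does.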

Different default shell and environment variables

A test or CI step that reads environment variables may behave differently because Linux jobs have different defaults than local shells. Do not rely on ambient variables unless the pipeline sets them intentionally.
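Reading required variables through a small guard makes a missing value fail loudly instead of falling back to whatever the shell provides. The helper name requireEnv and the variable name BASE_URL below are assumptions for illustration.

```typescript
// Hypothetical helper: read a variable the pipeline is expected to set, and
// fail with a clear message instead of silently using a default.
function requireEnv(name: string, env: Record<string, string | undefined> = process.env): string {
  const value = env[name]
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`)
  }
  return value
}

// BASE_URL is a made-up variable name; the second argument stands in for process.env.
console.log(requireEnv('BASE_URL', { BASE_URL: 'http://localhost:3000' }))
// prints http://localhost:3000
```

The error then names the variable that the pipeline forgot to set, rather than showing up later as a navigation timeout.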

How I would triage a failing test today

If a Playwright test passes on Mac and Windows but fails only in CI on Linux, I would check these in order:

Is the browser version identical everywhere?
Does the CI job use the same viewport and device scale factor?
Are fonts installed in the CI image?
Are locale and timezone consistent?
Is the test relying on arbitrary waits or animation timing?
Are there permission or path issues with downloads, uploads, or temp files?
Is the suite running in parallel and sharing state?
Can I reproduce the failure in the same container locally?

That order usually gets me to the real issue faster than editing the test blindly.

When the test is actually correct and the app is not

Sometimes the test is exposing a genuine platform bug. For example, a responsive component might overflow only under Linux because the font fallback is wider, or a modal might render incorrectly when layout timing changes. In that case, the Playwright failure is useful signal. The fix is not to weaken the test, but to fix the application or make the responsive behavior explicit.

This is why I avoid the habit of calling every CI-only failure a flaky test. Flaky tests do exist, but platform-specific failures often reveal real product defects or real environment assumptions. Both matter.

A simple mental model

If you want one rule of thumb, use this:

If the test passes across browsers but fails across operating systems, suspect environment drift before you suspect Playwright.

That means looking at fonts, permissions, timing, locales, and container defaults, not just selectors.

Closing thought

When Playwright tests fail only in CI on Linux, the issue is usually not “Linux being flaky”. It is the test or application relying on something that local desktop systems provide implicitly. The more your test suite depends on visual layout, browser startup behavior, or filesystem assumptions, the more you need to make those dependencies explicit.

The good news is that these failures are often fixable. Once you standardize the browser environment, remove hidden timing assumptions, and treat CI as a distinct runtime rather than a generic place where tests run, the failure rate drops quickly. More importantly, the tests become better documentation of how the app really behaves.

For deeper background on the broader concepts, it can help to revisit continuous integration, test automation, and the broader practice of software testing. The details change, but the core lesson stays the same: reliable tests need reliable environments.