June 14, 2026
Why Playwright Tests Fail Only in CI on Linux but Pass on Mac and Windows
A practical root-cause guide for Playwright tests that pass locally on Mac and Windows but fail in Linux CI, covering fonts, permissions, timing, containers, and browser environment drift.
I have seen this pattern enough times to treat it as a category, not a coincidence: a Playwright test passes on a developer laptop, passes on macOS and Windows, then fails only in Linux CI. The failure message often looks ordinary, a timeout, a missing element, a screenshot mismatch, or a browser crash, but the real problem is usually environment drift. The test is not necessarily wrong, and CI is not necessarily broken. The gap is often in assumptions the test made about fonts, rendering, permissions, timing, file paths, or the browser runtime.
If you are trying to understand why Playwright tests fail only in CI on Linux, the fastest path is to stop asking, “Why does CI hate my test?” and start asking, “What is different about Linux CI that my local machines hide?” That shift usually leads to the real root cause.
For reference, Playwright is designed to drive Chromium, Firefox, and WebKit reliably across platforms, but it still runs inside the reality of the host system and container it is given. The official docs are a good baseline for understanding the framework itself: Playwright introduction.
The most common failure shape
The symptom is usually one of these:
- a locator times out only in CI
- a screenshot or visual assertion differs only on Linux
- a click fails because the element is not visible or not stable
- a download, file upload, or file write behaves differently
- a browser launch fails with missing libraries or sandbox errors
- a test passes alone, but fails when the suite runs in parallel
These are not random. They cluster around differences in the Linux runtime environment, especially when CI runs inside a container or a minimal VM.
If a test behaves differently by operating system, assume the test is coupled to environment details until proven otherwise.
Start with the big picture, not the test code
The easiest debugging mistake to make is to stare at the test implementation before confirming the execution environment. For cross-platform failures, I usually compare these first:
- browser version
- Playwright version
- Node version
- OS image or container base image
- installed system packages
- available fonts
- locale and time zone
- viewport and device scale factor
- user permissions and sandbox configuration
- CPU and memory limits
A local Mac or Windows machine often has richer defaults, more fonts, a GUI stack, and more forgiving timing. A Linux CI job, especially in Docker, is often stripped down to the minimum needed to start a browser.
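To make that comparison concrete, here is a minimal sketch that prints host-level facts you can diff between a laptop run and a CI run. It uses only Node built-ins; the helper name collectEnvironmentSnapshot and the shape of the result are my own, and it does not cover browser versions or installed packages.

```typescript
import * as os from 'node:os';

// Hypothetical helper: host-level facts worth diffing between a laptop and a CI job.
interface EnvironmentSnapshot {
  platform: string;
  osRelease: string;
  arch: string;
  nodeVersion: string;
  cpuCount: number;
  totalMemoryMb: number;
  locale: string;
  timeZone: string;
  isCi: boolean;
}

function collectEnvironmentSnapshot(): EnvironmentSnapshot {
  const intl = Intl.DateTimeFormat().resolvedOptions();
  return {
    platform: process.platform,
    osRelease: os.release(),
    arch: process.arch,
    nodeVersion: process.version,
    // Host-level numbers; container CPU and memory limits may be lower than what is reported here.
    cpuCount: os.cpus().length,
    totalMemoryMb: Math.round(os.totalmem() / (1024 * 1024)),
    locale: intl.locale,
    timeZone: intl.timeZone,
    // Many CI providers set CI=true, but that is a convention, not a guarantee.
    isCi: process.env.CI === 'true',
  };
}

console.log(JSON.stringify(collectEnvironmentSnapshot(), null, 2));
```

Running this once locally and once as the first step of the CI job gives two JSON blobs to compare before touching any test code.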
The root causes I check first
1. Font differences and text rendering drift
One of the most underestimated causes of cross-platform test failure is fonts. Your assertion may not mention text metrics directly, but the browser layout engine absolutely cares about them. If Linux CI does not have the same fonts as your laptop, text wraps differently, buttons change width, and screenshots shift by a few pixels.
Typical symptoms:
- screenshot diffs that appear only on Linux
- elements shifting under a click target
- locator text matching but the layout changing enough to alter visibility
- component snapshots failing because line breaks changed
This is especially common when the app depends on system fonts such as Arial, Helvetica, Segoe UI, or San Francisco substitutes. Linux images often use fallback fonts unless you install the needed packages.
A practical fix is to standardize the browser environment in CI and make sure the font stack is explicit. In Docker-based CI, that can mean installing fonts and font rendering packages in the image.
bash
apt-get update && apt-get install -y \
  fonts-liberation \
  fonts-dejavu-core \
  fontconfig
If visual stability matters, also compare screenshots at the same viewport size and device scale factor. Do not assume your local default window size matches the CI browser window.
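One way to catch a missing font before the visual tests run is a small preflight check. The sketch below assumes fontconfig's fc-list command is available on the CI image and that its family listing prints one font per line with comma-separated names; the function names are hypothetical.

```typescript
import { execFileSync } from 'node:child_process';

// Parses the output of `fc-list : family`, where each line holds one or more
// comma-separated family names for a single installed font.
function parseFontFamilies(fcListOutput: string): Set<string> {
  const families = new Set<string>();
  for (const line of fcListOutput.split('\n')) {
    for (const name of line.split(',')) {
      const trimmed = name.trim().toLowerCase();
      if (trimmed) families.add(trimmed);
    }
  }
  return families;
}

// Returns the required families that are not present in the installed set.
function findMissingFonts(required: string[], installed: Set<string>): string[] {
  return required.filter((family) => !installed.has(family.toLowerCase()));
}

// Assumption: `fc-list` is on PATH, which is typical once fontconfig is installed.
function listInstalledFonts(): Set<string> {
  const output = execFileSync('fc-list', [':', 'family'], { encoding: 'utf8' });
  return parseFontFamilies(output);
}
```

A CI step could call findMissingFonts(['Liberation Sans', 'DejaVu Sans'], listInstalledFonts()) and fail the job early if the result is not empty, which is a clearer signal than a screenshot diff twenty minutes later.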
2. Timing differences and slower Linux CI startup
Many tests are flaky not because of race conditions in the application alone, but because they implicitly rely on timing that happens to work on a fast local machine. Linux CI may start more slowly, render differently, or schedule browser and test processes with less predictable CPU availability.
Symptoms:
- page.waitForTimeout() seems to “fix” the test locally, but not in CI
- a click happens before animation or hydration is complete
- a locator is present but not interactable yet
- the test waits for the wrong condition, such as element count instead of stable visibility
The better fix is to wait for a meaningful state, not just a delay. In Playwright, that usually means waiting for visibility, enabled state, network idle only when appropriate, or a specific application signal.
typescript
await page.getByRole('button', { name: 'Save' }).waitFor({ state: 'visible' })
await page.getByRole('button', { name: 'Save' }).click()
I try to avoid waitForTimeout except when diagnosing a race or dealing with a third-party animation I cannot influence. If the test only works with arbitrary sleeps, the bug is usually in the synchronization strategy.
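For conditions that live outside the page, such as a background job finishing or a file appearing, the same idea can be sketched as a polling helper with a deadline. This is a hand-rolled illustration, not a Playwright API; inside the page, Playwright's own retrying assertions and expect.poll are usually the better tool. The name waitForCondition and its options are assumptions of mine.

```typescript
// Polls `check` until it returns true or the deadline passes.
// The error names the condition, so a CI log says what was never reached.
async function waitForCondition(
  check: () => boolean | Promise<boolean>,
  options: { timeoutMs?: number; intervalMs?: number; description?: string } = {},
): Promise<void> {
  const { timeoutMs = 5000, intervalMs = 100, description = 'condition' } = options;
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms waiting for ${description}`);
    }
    // Sleep briefly between checks instead of sleeping once and hoping.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The difference from a fixed sleep is that a fast machine returns as soon as the state is reached, while a slow CI runner gets the full timeout before the test gives up.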
3. Container defaults and missing system dependencies
Linux CI often runs in Docker or another container runtime. That means your browser is not just running on Linux; it is running in a constrained Linux environment with fewer default packages. Playwright browsers need a working set of OS libraries, and minimal images sometimes omit them.
Symptoms:
- browser launch failures
- segmentation faults or crashes when starting headless Chromium
- fonts or emoji render incorrectly
- downloads, file dialogs, or PDF generation behave oddly
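The first step is to use the Playwright-recommended installation path for your environment, for example running npx playwright install --with-deps so the browsers and their OS dependencies are installed together, rather than hand-picking packages by guesswork. The browser automation stack is sensitive to version and library compatibility, especially in CI.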
If your CI image is custom, compare it with a known working Playwright base image, then narrow down differences one by one. I prefer this approach to random package installation because it makes the environment reproducible.
4. Permissions, sandboxing, and filesystem assumptions
Linux CI tends to be stricter about permissions than local machines. Tests that write files, create downloads, read fixtures, or interact with temporary directories can fail when the working directory is read-only, when the user running the process is not what you expect, or when sandbox settings restrict browser behavior.
Common examples:
- downloading a file into a location that does not exist or is not writable
- using relative paths that work from the repo root locally, but fail in CI because the working directory differs
- relying on /tmp behavior without checking cleanup or permissions
- assuming a browser can access a mounted volume the same way a local process can
A disciplined way to avoid this is to resolve paths explicitly and use test fixtures for filesystem interactions.
typescript
import path from 'path'
// Resolve from a known base instead of relying on a relative path
const downloadPath = path.join(process.cwd(), 'test-artifacts', 'downloads')
Also check whether your CI runs the browser as root or as a non-root user. Running as root can mask permission mistakes, so a test that passes in a root container can fail as soon as the same job runs as an unprivileged user in a hardened environment.
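A small guard can turn these permission problems into an explicit error at setup time instead of a confusing failure mid-test. The sketch below uses only Node built-ins; ensureWritableDir and makeRunScopedDir are hypothetical helper names.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Creates the directory if needed, then verifies the current user can write to it.
function ensureWritableDir(dir: string): string {
  const resolved = path.resolve(dir);
  fs.mkdirSync(resolved, { recursive: true });
  try {
    // Note: when running as root this check passes for almost any directory,
    // which is one way root hides permission mistakes.
    fs.accessSync(resolved, fs.constants.W_OK);
  } catch {
    // process.getuid is only defined on POSIX platforms.
    const uid = typeof process.getuid === 'function' ? process.getuid() : 'n/a';
    throw new Error(`Directory is not writable: ${resolved} (uid: ${uid})`);
  }
  return resolved;
}

// Creates a unique temp directory per run instead of reusing a fixed /tmp path.
function makeRunScopedDir(prefix: string): string {
  return fs.mkdtempSync(path.join(os.tmpdir(), prefix));
}
```

In a global setup step this might be called as ensureWritableDir(path.join(process.cwd(), 'test-artifacts', 'downloads')), so a read-only checkout or an unexpected user shows up with the path and uid in the message.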
5. Locale, time zone, and date formatting surprises
Linux CI images often default to UTC and a minimal locale set. That becomes a problem when the app renders dates, currency, sorting, or localized labels.
Symptoms:
- date assertions fail by one day because the time zone differs
- month names or number formatting change
- tests that sort visible values differ across operating systems
If your app is user-facing and locale-sensitive, make the test environment deterministic. Set the time zone and locale in the browser context or in the CI container, and make assertions against normalized values where possible.
typescript
const context = await browser.newContext({
  locale: 'en-US',
  timezoneId: 'UTC'
})
This is not just a test concern. It is a product concern if your user base spans regions.
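To see how an off-by-one-day failure arises, here is a small sketch using only the built-in Intl API. The helper name calendarDateIn is mine; the instant and the two time zones are chosen purely for illustration.

```typescript
// Formats an instant as YYYY-MM-DD in the given IANA time zone.
function calendarDateIn(instant: Date, timeZone: string): string {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
  }).formatToParts(instant);
  const get = (type: string): string => parts.find((p) => p.type === type)?.value ?? '';
  return `${get('year')}-${get('month')}-${get('day')}`;
}

// The same instant falls on different calendar days depending on the zone.
const instant = new Date('2026-06-14T23:30:00Z');
console.log(calendarDateIn(instant, 'UTC'));        // 2026-06-14
console.log(calendarDateIn(instant, 'Asia/Tokyo')); // 2026-06-15
```

A test that asserts on a rendered date is implicitly asserting on the time zone too, which is why pinning it in the browser context matters.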
6. Viewport, scale factor, and headless rendering differences
A test that clicks a button by coordinates or relies on screenshot shape can easily fail when the browser viewport differs. Local Playwright runs may open at one size, while CI uses headless defaults or a smaller window. Linux container environments can also differ in device scale factor and rendering backends.
Symptoms:
- elements appear below the fold in CI
- sticky headers overlap target elements
- screenshot diffs due to a one or two pixel shift
- click interception by overlays that are not visible locally
The fix is to set viewport size explicitly and avoid coordinate-based interactions unless there is no alternative.
typescript
import { chromium } from '@playwright/test'

const browser = await chromium.launch()
const context = await browser.newContext({
  viewport: { width: 1440, height: 900 },
  deviceScaleFactor: 1
})
When a test depends on responsive layout, make that dependency explicit in the test. Do not let it inherit the CI default.
7. Parallelization exposing hidden shared state
Sometimes the Linux failure is not about Linux at all; it is about concurrency. CI often runs a broader matrix, more workers, or more parallel test files than a local laptop session. If tests share accounts, data, files, ports, or cookies, parallel execution can produce failures that only appear under CI load.
Symptoms:
- tests pass individually but fail in a suite
- data from one test leaks into another
- a login state file is overwritten
- a temp port is already in use
The first question I ask is whether the tests are hermetic. If they depend on shared state, make that state isolated per worker or per test. Playwright fixtures help here, but the design principle matters more than the tooling.
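As a sketch of per-worker isolation, the helper below derives a port, a login state file, and an account name from a worker slot number so that parallel workers never collide. The function name, base port, directory layout, and email pattern are all hypothetical; the only Playwright-specific piece is the TEST_PARALLEL_INDEX environment variable, which, as I understand it, Playwright Test sets for each worker process.

```typescript
import * as path from 'node:path';

interface WorkerResources {
  port: number;
  storageStatePath: string;
  accountEmail: string;
}

// Derives non-overlapping resources from a worker's parallel slot index.
function resourcesForWorker(
  parallelIndex: number,
  basePort = 4000,
  artifactsDir = 'test-artifacts',
): WorkerResources {
  if (!Number.isInteger(parallelIndex) || parallelIndex < 0) {
    throw new Error(`Invalid parallel index: ${parallelIndex}`);
  }
  return {
    port: basePort + parallelIndex,
    storageStatePath: path.join(artifactsDir, 'auth', `worker-${parallelIndex}.json`),
    accountEmail: `e2e-worker-${parallelIndex}@example.test`,
  };
}

// Falls back to slot 0 when the variable is not set, for example outside a test run.
const currentWorker = resourcesForWorker(Number(process.env.TEST_PARALLEL_INDEX ?? '0'));
console.log(currentWorker);
```

The specific resources matter less than the rule: anything two tests could both write to should be keyed by something unique to the worker or the test.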
A practical debugging workflow
When I debug these failures, I try to reduce the problem systematically instead of guessing.
Step 1: Reproduce the CI environment locally
If possible, run the same container image or base OS locally. This helps separate “Linux behavior” from “CI behavior”. If you cannot reproduce exactly, at least align these variables:
- Node version
- Playwright version
- browser version
- locale and timezone
- viewport size
- environment variables
Step 2: Compare the browser context
Log the runtime details in the failing job. This often reveals a mismatch immediately.
typescript
console.log({
  userAgent: await page.evaluate(() => navigator.userAgent),
  viewport: page.viewportSize(),
  locale: await page.evaluate(() => navigator.language),
  timeZone: await page.evaluate(() => Intl.DateTimeFormat().resolvedOptions().timeZone),
})
Step 3: Inspect screenshots and traces
Playwright trace artifacts are valuable because they show what the browser saw, not what the test author expected. When a test fails only in CI, capture trace, screenshots, and video, then compare them with local runs. Playwright supports tracing and debugging workflows that are worth using instead of guessing from stack traces alone.
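A configuration along these lines captures those artifacts only when something goes wrong, which keeps CI storage manageable. It is a sketch of a playwright.config.ts, assuming the Playwright Test runner; the retry count is an arbitrary choice of mine, and these keys would sit alongside whatever else the project already configures.

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry once in CI so that a trace is recorded on the first retry.
  retries: process.env.CI ? 1 : 0,
  use: {
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
});
```

The CI job then needs to upload the test output directory as a build artifact, otherwise the traces are discarded with the runner.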
Step 4: Remove brittle assumptions
Look for these patterns in the test:
- hard-coded timeouts
- exact pixel assertions where semantic assertions would do
- nth() locators where role or test ids would be better
- assumptions about text wrapping or element position
- shared test accounts or global state
Step 5: Binary search the environment difference
If the test only fails in Linux CI, ask what changed between local and CI. Is it the browser? The fonts? The container image? The viewport? The file system? The network? Remove one difference at a time.
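When the list of differences is long, halving it is faster than removing one item at a time. The sketch below is a generic illustration, not something Playwright provides. It assumes the test passes with no differences applied, that exactly one difference is responsible, and that you have some way (the hypothetical failsWith callback) to run the test with a chosen subset applied.

```typescript
// Finds the single culprit by repeatedly testing half of the remaining candidates.
// Assumption: exactly one difference causes the failure, independently of the others.
async function bisectDifferences<T>(
  differences: T[],
  failsWith: (applied: T[]) => Promise<boolean>,
): Promise<T> {
  let candidates = differences;
  while (candidates.length > 1) {
    const mid = Math.floor(candidates.length / 2);
    const firstHalf = candidates.slice(0, mid);
    // If the test fails with only the first half applied, the culprit is in it;
    // otherwise it must be in the second half.
    candidates = (await failsWith(firstHalf)) ? firstHalf : candidates.slice(mid);
  }
  const [culprit] = candidates;
  if (culprit === undefined) throw new Error('No differences to bisect');
  return culprit;
}
```

In practice failsWith might rebuild a container with a subset of changes and run one spec file. If two differences interact, the single-culprit assumption breaks and the result should be treated as a hint rather than an answer.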
The goal is not to make the test pass once. The goal is to identify the one environmental assumption the test was silently making.
Examples of fixes that actually help
Use stable locators
Prefer semantic locators over CSS structure that changes with layout. This reduces sensitivity to font and rendering differences.
typescript
await page.getByRole('button', { name: 'Submit order' }).click()
Pin the test environment
Use a consistent browser and OS image in CI. The more the environment drifts, the harder the failure becomes to reason about.
Install fonts explicitly
If the app uses custom or common desktop fonts, add them to the CI image. Visual tests are only as stable as the rendering stack.
Make waits reflect application state
Wait for UI readiness, not arbitrary milliseconds.
typescript
await expect(page.getByText('Dashboard')).toBeVisible()
Set locale, time zone, and viewport
This removes a large class of platform-specific behavior.
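With the Playwright Test runner, these can be pinned once for the whole suite instead of per context. This is a sketch of a playwright.config.ts using the same values as the earlier snippets; the values themselves are examples, not recommendations.

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // Pin what would otherwise be inherited from the host or container.
    locale: 'en-US',
    timezoneId: 'UTC',
    viewport: { width: 1440, height: 900 },
    deviceScaleFactor: 1,
  },
});
```

Individual tests that need a different locale or viewport can still override these, which makes the dependency visible in the test that has it.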
Isolate test data
Give every test its own account, workspace, or seed data when possible. Shared data failures often look like browser issues but are really environment collisions.
A few Linux-specific gotchas I keep seeing
Headless browser sandbox restrictions
Some CI systems use restrictive security profiles. If the browser cannot launch sandboxed, the fix is usually in the container or runner configuration, not the test itself. Be careful here, though, because disabling security mechanisms indiscriminately can hide real problems.
File upload behavior
Upload tests can fail when the path works locally but not in a container, or when a file fixture is missing from the CI checkout. Always verify the artifact is present and the path is resolved from a known base.
Case sensitivity in paths
Windows and macOS can hide case mismatches that Linux exposes. A file named Login.png is not the same as login.png on Linux. This can affect fixtures, snapshots, and imports.
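A check like the following can run on a Mac or Windows machine and flag casing mismatches before Linux does. It is a minimal sketch with a hypothetical helper name, and it does not handle "." or ".." segments or symlinks.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Checks that every segment of `relativePath` exists under `baseDir` with exactly the same casing.
// Directory listings return the on-disk names, so this works even on case-insensitive filesystems.
function hasExactCase(baseDir: string, relativePath: string): boolean {
  let current = baseDir;
  for (const segment of relativePath.split(/[\\/]+/).filter(Boolean)) {
    let entries: string[];
    try {
      entries = fs.readdirSync(current);
    } catch {
      return false; // parent is missing or not a directory
    }
    if (!entries.includes(segment)) return false;
    current = path.join(current, segment);
  }
  return true;
}
```

Running it over fixture and snapshot paths referenced by the tests gives a quick list of names that only resolve because the local filesystem is forgiving.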
Different default shell and environment variables
A test or CI step that reads environment variables may behave differently because Linux jobs have different defaults than local shells. Do not rely on ambient variables unless the pipeline sets them intentionally.
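One way to stop relying on ambient variables is to declare the ones the suite needs and fail fast when any are absent. The helper below is a sketch; requireEnv and the variable names in the usage note are hypothetical.

```typescript
// Reads required variables from an environment map and fails fast,
// listing everything that is missing in a single error.
function requireEnv<K extends string>(
  names: readonly K[],
  env: Record<string, string | undefined> = process.env,
): Record<K, string> {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  const result = {} as Record<K, string>;
  for (const name of names) {
    result[name] = env[name] as string;
  }
  return result;
}
```

Calling something like requireEnv(['BASE_URL', 'E2E_USER']) at the top of the global setup turns a silent default into an error that names the variable the pipeline forgot to set.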
How I would triage a failing test today
If a Playwright test passes on Mac and Windows but fails only in CI on Linux, I would check these in order:
- Is the browser version identical everywhere?
- Does the CI job use the same viewport and device scale factor?
- Are fonts installed in the CI image?
- Are locale and timezone consistent?
- Is the test relying on arbitrary waits or animation timing?
- Are there permission or path issues with downloads, uploads, or temp files?
- Is the suite running in parallel and sharing state?
- Can I reproduce the failure in the same container locally?
That order usually gets me to the real issue faster than editing the test blindly.
When the test is actually correct and the app is not
Sometimes the test is exposing a genuine platform bug. For example, a responsive component might overflow only under Linux because the font fallback is wider, or a modal might render incorrectly when layout timing changes. In that case, the Playwright failure is useful signal. The fix is not to weaken the test, but to fix the application or make the responsive behavior explicit.
This is why I avoid the habit of calling every CI-only failure a flaky test. Flaky tests do exist, but platform-specific failures often reveal real product defects or real environment assumptions. Both matter.
A simple mental model
If you want one rule of thumb, use this:
If the test passes across browsers but fails across operating systems, suspect environment drift before you suspect Playwright.
That means looking at fonts, permissions, timing, locales, and container defaults, not just selectors.
Closing thought
When Playwright tests fail only in CI on Linux, the issue is usually not “Linux being flaky”. It is the test or application relying on something that local desktop systems provide implicitly. The more your test suite depends on visual layout, browser startup behavior, or filesystem assumptions, the more you need to make those dependencies explicit.
The good news is that these failures are often fixable. Once you standardize the browser environment, remove hidden timing assumptions, and treat CI as a distinct runtime rather than a generic place where tests run, the failure rate drops quickly. More importantly, the tests become better documentation of how the app really behaves.
For deeper background on the underlying concepts, it can help to revisit continuous integration, test automation, and the broader practice of software testing. The details change, but the core lesson stays the same: reliable tests need reliable environments.