When an end-to-end test passes on your laptop and fails in CI, the instinct is often to blame flakiness. Sometimes that is correct. More often, the test is telling you something precise about the environment, timing, or assumptions baked into the flow.

I treat these failures like a debugging tree, not a guessing game. If you can classify the failure into environment drift, timing, secrets, viewport behavior, container limitations, or browser-specific behavior, you usually find the root cause faster and avoid papering over it with retries.

A test that passes locally and fails in CI is not random by default. It is usually deterministic under a different set of constraints.

This checklist is the one I use when I need to answer a practical question: why do e2e tests fail in CI but pass locally? It is written for QA leads, SDETs, and DevOps engineers who need to turn browser automation debugging into a repeatable process instead of a series of hotfixes.

Start with the simplest split: is this an app problem or a test problem?

Before changing selectors or adding waits, separate the failure into one of two buckets.

1. Does the app behave differently in CI?

If the application itself sees different configuration, data, auth, or network conditions in CI, the test may be correct and the environment may be wrong.

Check:

  • Environment variables
  • Feature flags
  • API endpoints
  • Auth callbacks and redirect URLs
  • Test data availability
  • Network access to external services

2. Does the test behave differently in CI?

If the app is fine, the issue is often in the test harness.

Check:

  • Browser version differences
  • Headless mode behavior
  • Viewport size
  • Wait strategy
  • Locator stability
  • Parallel execution order
  • Docker image differences

A useful discipline is to ask, “If I run the same browser against the same app URL, what exactly changes between local and CI?” If you cannot answer that clearly, you have not narrowed the problem enough.
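One way to answer that question concretely is to write both environments down as flat key/value snapshots and diff them. The sketch below is a minimal illustration; the `EnvSnapshot` shape, the `diffSnapshots` name, and the sample values are all assumptions made for the example, not part of any tool.

```typescript
// Hypothetical snapshot shape: any flat key/value description of a run.
type EnvSnapshot = Record<string, string | undefined>;

// Returns every key whose value differs between the two snapshots,
// including keys that are present on only one side.
function diffSnapshots(local: EnvSnapshot, ci: EnvSnapshot): string[] {
  const allKeys = Object.keys(local).concat(Object.keys(ci));
  const uniqueKeys = allKeys.filter((key, index) => allKeys.indexOf(key) === index);
  return uniqueKeys.filter((key) => local[key] !== ci[key]).sort();
}

// Sample values, invented for illustration.
const localRun: EnvSnapshot = {
  browser: 'chromium 126',
  headless: 'false',
  viewport: '1440x900',
  baseUrl: 'http://localhost:3000',
};
const ciRun: EnvSnapshot = {
  browser: 'chromium 126',
  headless: 'true',
  viewport: '1280x720',
  baseUrl: 'http://localhost:3000',
  tz: 'UTC',
};

console.log(diffSnapshots(localRun, ciRun)); // [ 'headless', 'tz', 'viewport' ]
```

If the diff comes back empty and the test still fails, the snapshot is probably missing a dimension, which is useful information in itself.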

Checklist 1: Environment mismatch

Environment mismatch is, in my experience, the most common reason CI-only failures show up. The application starts and the tests run, but some hidden assumption is false in the pipeline.

Check the base URL and target environment

It sounds obvious, but I still see pipelines pointing at a staging app while the local test points at localhost or a dev stack.

Verify:

  • BASE_URL is the same place you think it is
  • CI is not using a stale .env file
  • PR builds and main branch builds are not pointing to different services without you realizing it
  • Browser tests are not mixing local backend with remote frontend, or vice versa

A quick sanity assertion at the beginning of the suite helps:

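import { test, expect } from '@playwright/test';

// Assumes `baseURL` is set in the Playwright config and that the app
// serves a `/health` page whose body contains the text "ok".
test('environment sanity', async ({ page }) => {
  await page.goto('/health');
  await expect(page.locator('body')).toContainText('ok');
});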

Check feature flags and config drift

If a feature is enabled locally but disabled in CI, the test may fail because the expected UI does not exist.

Common drift sources:

  • Feature flags controlled by remote config
  • Different build-time variables
  • Hard-coded fallbacks in test helpers
  • Multiple app variants deployed from the same repo

For browser automation debugging, log the effective runtime configuration in CI. It is often faster than inspecting the pipeline manually.
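A minimal sketch of what logging the effective configuration could look like. The function name, the redaction pattern, and the variable names in the example are assumptions; the point is only that unset values are reported explicitly and secrets never reach the log.

```typescript
// Keys whose values should never appear in CI logs. The pattern is an
// assumption; adjust it to your own naming conventions.
const SECRET_PATTERN = /(SECRET|TOKEN|PASSWORD|KEY)/i;

// Builds a log-safe view of selected settings: secrets are masked, and
// unset or empty values are reported explicitly instead of being omitted.
function effectiveConfig(
  env: Record<string, string | undefined>,
  keys: string[],
): Record<string, string> {
  const view: Record<string, string> = {};
  for (const key of keys) {
    const value = env[key];
    if (value === undefined || value === '') {
      view[key] = '<unset>';
    } else if (SECRET_PATTERN.test(key)) {
      view[key] = '<set, redacted>';
    } else {
      view[key] = value;
    }
  }
  return view;
}

// Example input, invented for illustration.
const sampleEnv = { BASE_URL: 'https://staging.example.test', API_TOKEN: 'abc123' };
console.log(effectiveConfig(sampleEnv, ['BASE_URL', 'API_TOKEN', 'FEATURE_NEW_CHECKOUT']));
```

In a real suite you would pass `process.env` and print the result once in global setup, so every CI job log starts with the configuration it actually ran under.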

Check test data and seed state

Your local database may contain a user, record, or fixture that CI does not.

Questions to ask:

  • Is data created in the test itself, or assumed to exist?
  • Does the test depend on a seed job that can fail silently?
  • Are tests sharing mutable records?
  • Is cleanup racing with another suite?

If a test needs a user with a specific role, create that user inside the test setup or via a deterministic seed. Avoid relying on “someone already made it in the seed script.”
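As a sketch of the "create it yourself" approach, the helper below only builds the user payload; how it gets persisted (an API call, a fixture, a seed) depends on your stack. `buildTestUser`, the `TestUser` shape, and the `example.test` domain are hypothetical, and `runId` is assumed to be unique per CI job and worker.

```typescript
interface TestUser {
  email: string;
  role: string;
  displayName: string;
}

// Counter that distinguishes users created within the same process.
let userSequence = 0;

// Builds a user that is unique per run and per call, so reruns and
// parallel jobs do not collide on the same record.
function buildTestUser(role: string, runId: string): TestUser {
  userSequence += 1;
  const tag = `${runId}-${userSequence}`;
  return {
    email: `e2e-${role}-${tag}@example.test`,
    role,
    displayName: `E2E ${role} ${tag}`,
  };
}

console.log(buildTestUser('admin', 'job-1842'));
```

Because the identifier is embedded in the email, a failing run also tells you exactly which record to look up afterwards.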

Check auth and secrets

CI often has incomplete secret setup, expired tokens, or different OAuth redirect behavior.

Watch for:

  • Missing API keys
  • Wrong service account scopes
  • Redirect URI mismatch
  • Signed cookies failing because of different secret keys
  • SSO providers rejecting headless browser sessions

If your test uses login by UI, make sure the auth provider supports automated flows in CI. If it uses storage state or token injection, verify the secrets rotation process did not invalidate your saved state.
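For saved storage state, one cheap check is whether any cookie in the file has already expired. The sketch below assumes the file follows Playwright's storage state layout, where each cookie has an `expires` field in Unix seconds and `-1` marks a session cookie; the interfaces list only the fields used here, and `expiredCookies` is a name made up for the example.

```typescript
// Subset of the cookie fields found in a storage state file.
interface StoredCookie {
  name: string;
  domain: string;
  expires: number; // Unix time in seconds, or -1 for a session cookie
}

interface StorageState {
  cookies: StoredCookie[];
}

// Returns the names of cookies whose expiry time has already passed.
// Session cookies (expires === -1) are never reported.
function expiredCookies(state: StorageState, nowMs: number): string[] {
  const nowSeconds = nowMs / 1000;
  return state.cookies
    .filter((cookie) => cookie.expires !== -1 && cookie.expires <= nowSeconds)
    .map((cookie) => cookie.name);
}

// Sample state, invented for illustration.
const sampleState: StorageState = {
  cookies: [
    { name: 'session', domain: 'app.example.test', expires: 1700000000 },
    { name: 'prefs', domain: 'app.example.test', expires: -1 },
  ],
};
console.log(expiredCookies(sampleState, Date.now())); // [ 'session' ]
```

This only catches expiry. A rotated signing secret invalidates cookies without changing their `expires` value, so a passing check here does not prove the state is still accepted by the server.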

Checklist 2: Timing issues and synchronization

If the environment is correct, the next suspect is timing. CI machines often run slower, more variably, or under higher contention than your laptop.

Local runs hide timing bugs because your machine is warm, idle, and usually much faster than the average CI worker.

Check for fixed sleeps

A fixed waitForTimeout(2000) may appear to work locally and fail in CI when a page takes 3.5 seconds instead of 1.2.

Prefer event-based waits, state checks, or explicit assertions.

await page.getByRole('button', { name: 'Save' }).click();
await expect(page.getByText('Saved successfully')).toBeVisible();

Check for missing waits on navigation or XHR

A common bug is clicking a button and immediately asserting on the next page without waiting for the navigation or API call to settle.

In Playwright, wait on the action that matters, not an arbitrary timeout:

await Promise.all([
  page.waitForURL('**/checkout/complete'),
  page.getByRole('button', { name: 'Submit order' }).click()
]);

In Selenium, wait on a concrete condition instead of sleeping:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# Assumes `driver` is an already-initialized WebDriver instance.
WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.CSS_SELECTOR, '[data-test="success"]'))
)

Check asynchronous UI updates

Modern frontends often render in multiple passes. A test may find the element before the app finishes updating its text, disabled state, or route.

Watch for:

  • Skeleton screens replaced later
  • Debounced search results
  • Virtualized lists
  • Lazy-loaded widgets
  • Animation-driven state changes

A locator that resolves is not the same thing as a UI state that is ready.

Check for race conditions in your own test setup

Sometimes the test suite creates data and immediately starts another request that depends on the previous operation. Locally, the gap is invisible. In CI, it breaks.

Examples:

  • User creation followed by login before replication is complete
  • Email verification expecting an inbox service that lags
  • Database seeding and UI test running in parallel against the same tenant

If a test depends on setup completion, make the setup step observable and wait for it explicitly.
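A minimal polling helper shows what "wait for it explicitly" can mean outside the browser. `pollUntil` and its default timings are assumptions for illustration; the `check` callback stands in for whatever makes your setup observable, such as an API call that returns the created user.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `check` until it resolves to true or the deadline passes.
// Throws with a descriptive message so the CI log names the step
// that never completed.
async function pollUntil(
  description: string,
  check: () => Promise<boolean>,
  timeoutMs = 15000,
  intervalMs = 250,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms waiting for: ${description}`);
    }
    await sleep(intervalMs);
  }
}
```

The description argument matters more than it looks: "Timed out waiting for: user visible on read replica" in a CI log is a diagnosis, while a bare timeout is not.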

Checklist 3: Browser differences, viewport, and rendering

Browser automation debugging often gets misdiagnosed as “the selector changed.” Sometimes the selector is fine, but the browser state is different in CI.

Check headless versus headed behavior

Headless browsers may expose timing, focus, and layout differences that you never see in a headed local run.

Look for:

  • Hover menus that require real pointer behavior
  • Focus traps in dialogs
  • File upload flows that behave differently without a visible window
  • Scroll-related interactions that fail when the viewport is smaller

If the test fails only in headless mode, compare the browser flags and viewport settings between environments.

Check viewport size and responsive layout

CI often uses the browser default viewport, which may not match your laptop.

A sidebar might collapse, a button might move into a menu, or a text label might disappear behind a mobile layout breakpoint.

Always set the viewport explicitly when the test depends on layout:

import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    viewport: { width: 1440, height: 900 },
  },
});

In Selenium, use a fixed window size in the test bootstrap or container startup.

Check font, animation, and rendering differences

CI containers may not have the same fonts or GPU acceleration as your local machine. That can change text wrapping, button sizes, and screenshot assertions.

You do not need to eliminate all rendering differences, but you should avoid asserting on brittle visual details unless they are the feature under test.

If you use visual regression checks, make sure the environment is standardized, including fonts, browser version, and scaling.

Checklist 4: Container-specific failures

If your CI runs inside Docker, the container itself may be the issue. This is especially common when the same test passes on a dev laptop but fails in a slim image.

Check shared memory and browser crashes

Chrome and Chromium can fail in containers with low /dev/shm space. That can show up as random browser crashes, tab timeouts, or blank pages.

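If the runner is resource constrained, inspect the container logs and check how much shared memory the container actually gets. Docker's default /dev/shm is small, so typical fixes are raising it with --shm-size or launching Chromium with --disable-dev-shm-usage.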

Check CPU and memory pressure

A machine under load may slow event handling enough to expose race conditions.

Symptoms include:

  • Timeouts only on shared runners
  • Intermittent page load delays
  • Browser disconnects
  • Promise or wait timeouts in otherwise stable tests

If a test needs 10 seconds locally and 30 seconds in CI, the fix may be to reduce unnecessary UI work or mock expensive dependencies, not to keep increasing the timeout.

Check filesystem assumptions

Some browser tests write files, read fixtures, or depend on path casing that only a case-insensitive filesystem tolerates.

In containers, common failures are:

  • Missing writable directories
  • Relative path confusion
  • Working directory not what the test expects
  • Differences between Mac, Linux, and Windows path behavior

A brittle file upload test may work locally because the file path exists on your laptop, then fail in Linux CI because the fixture is not copied into the image.
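Path casing is worth a special mention, because the default filesystems on macOS and Windows are usually case-insensitive while Linux is case-sensitive. The sketch below flags a requested file name that matches a directory entry only when case is ignored; `caseOnlyMatch` is a hypothetical helper and the file names are invented.

```typescript
// Returns the entry that matches `requested` only when case is ignored,
// or undefined when there is an exact match or no match at all.
function caseOnlyMatch(requested: string, entries: string[]): string | undefined {
  if (entries.indexOf(requested) !== -1) {
    return undefined; // exact match, nothing to report
  }
  const wanted = requested.toLowerCase();
  for (const entry of entries) {
    if (entry.toLowerCase() === wanted) {
      return entry;
    }
  }
  return undefined;
}

console.log(caseOnlyMatch('Avatar.png', ['avatar.png', 'report.pdf'])); // avatar.png
```

In practice the entries would come from a directory listing of your fixtures folder, and a non-empty result means the test passes on a laptop by accident.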

Check time zone and locale

Date and time assertions are a classic source of CI-only failures.

Questions:

  • Is CI running in UTC while local runs in your local time zone?
  • Are dates rendered with locale-specific formatting?
  • Does a midnight boundary change the tested date?

Use explicit time zones and avoid asserting on ambiguous date strings.
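To illustrate the midnight boundary, the sketch below formats the same instant in two time zones using the standard `Intl.DateTimeFormat` API. The `formatDate` wrapper and the chosen instant are just for the example; the point is that the locale and time zone are passed explicitly rather than inherited from the machine.

```typescript
// Formats an instant as a calendar date in an explicit locale and time
// zone, so the result does not depend on the machine running the test.
function formatDate(instant: Date, timeZone: string): string {
  return new Intl.DateTimeFormat('en-US', {
    timeZone,
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
  }).format(instant);
}

// Thirty minutes past midnight UTC on New Year's Day.
const sampleInstant = new Date('2025-01-01T00:30:00Z');

console.log(formatDate(sampleInstant, 'UTC'));                 // 01/01/2025
console.log(formatDate(sampleInstant, 'America/Los_Angeles')); // 12/31/2024
```

A test that asserts on "today" without pinning the zone passes or fails depending on where and when it runs, which is exactly the CI-versus-laptop split this section describes.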

Checklist 5: Network, services, and third-party dependencies

Many e2e tests are not really isolated browser tests. They are distributed integration tests wearing a browser costume.

Check external service access

The CI environment may block outbound calls, or allow them only intermittently.

Common failures:

  • Payment sandbox unavailable
  • Email service rate limits
  • OAuth provider rejection
  • Maps or analytics calls timing out
  • Webhook callbacks failing in an isolated network

For continuous integration, deterministic tests are easier when external dependencies are mocked or stubbed at the boundary.

Check service readiness

One app service may be ready before another. The UI loads, but API calls return 502 because a downstream service is still starting.

Use health checks or readiness gates in CI, not just “container started” as a signal.
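A readiness gate can be sketched as a loop that probes every dependency and reports which ones never came up. Everything here is an assumption for illustration: the `Probe` type, the `servicesNotReady` name, the default timings, and the convention that a ready service answers its health URL with HTTP 200.

```typescript
// A probe resolves to an HTTP status code. How it is implemented
// (fetch, an HTTP client, a CLI wrapper) is up to the caller.
type Probe = (url: string) => Promise<number>;

const pause = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Probes each URL until it has answered 200 once or the attempts run out.
// Returns the URLs that never became ready, in their original order.
async function servicesNotReady(
  urls: string[],
  probe: Probe,
  attempts = 20,
  delayMs = 500,
): Promise<string[]> {
  let pending = urls.slice();
  for (let attempt = 1; attempt <= attempts && pending.length > 0; attempt++) {
    // A rejected probe (connection refused, DNS failure) counts as not ready.
    const ready = await Promise.all(
      pending.map((url) => probe(url).then((status) => status === 200, () => false)),
    );
    pending = pending.filter((_, index) => !ready[index]);
    if (pending.length > 0 && attempt < attempts) {
      await pause(delayMs);
    }
  }
  return pending;
}
```

Run this before the first browser test and fail the job with the returned list. "api and payments-sandbox never became ready" is a much better failure than twenty UI tests timing out on a spinner.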

Check DNS and service discovery

Inside CI, service names and ports may differ from local Docker Compose defaults.

If a test passes locally against localhost:3000 but fails in CI against web:3000, verify the networking model and the CI job topology.

Checklist 6: Locators, assertions, and test design

Sometimes the local pass is an accident. The test is weak, and CI simply exposes that weakness.

Check if the locator is too brittle

If your selector relies on a full class chain, a generated ID, or text that changes with data, it may pass locally and fail when the app renders slightly differently in CI.

Prefer user-facing, stable hooks such as roles and test IDs.

await page.getByRole('button', { name: 'Create project' }).click();
await expect(page.getByTestId('project-list')).toBeVisible();

Check if the assertion is over-specific

Assertions that compare exact text, exact counts, or exact ordering can fail when CI timing changes the render order.

Ask:

  • Does the test care about exact order, or only presence?
  • Is the assertion sensitive to whitespace or formatting?
  • Is the UI still in a transient state when the assertion runs?
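When the answer is "only presence matters," the comparison itself can say so. The helpers below are a sketch with made-up names: one normalizes whitespace, the other compares two lists of strings regardless of order. They are only appropriate when ordering is genuinely not part of the requirement.

```typescript
// Collapses runs of whitespace and trims the ends, so layout-only
// differences do not fail a text comparison.
function normalizeText(text: string): string {
  return text.replace(/\s+/g, ' ').trim();
}

// True when both lists contain the same normalized items, in any order.
// Duplicates are respected because both sides are sorted, not de-duplicated.
function sameItemsAnyOrder(actual: string[], expected: string[]): boolean {
  const left = actual.map(normalizeText).sort();
  const right = expected.map(normalizeText).sort();
  return left.length === right.length && left.every((item, index) => item === right[index]);
}

console.log(sameItemsAnyOrder(['  Beta ', 'Alpha\n'], ['Alpha', 'Beta'])); // true
console.log(sameItemsAnyOrder(['Alpha', 'Alpha'], ['Alpha', 'Beta']));     // false
```

If the order is part of the feature, for example a sorted table, keep the strict assertion and fix the timing instead.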

Check if the test does too much

Long end-to-end flows are valuable, but a single test that covers login, setup, billing, and checkout can become impossible to diagnose.

If you cannot tell which step failed, split the flow into smaller checks and keep one or two true end-to-end paths for coverage.

A practical decision tree for CI-only failures

When I am debugging a failure, I move through this order:

Step 1, confirm the same commit and same test command

Make sure the local run and CI run are actually comparable.

  • Same branch
  • Same commit
  • Same browser
  • Same environment variables
  • Same test file or pattern

Step 2, compare environment inputs

If the app renders differently, inspect config, secrets, base URL, feature flags, and seeded data.

Step 3, reproduce under CI-like constraints locally

Try to make your machine behave more like CI:

  • Run headless
  • Match browser version
  • Set the same viewport
  • Run inside Docker
  • Limit resources if possible

Step 4, inspect timing and readiness

If the app is correct but the test is early, replace sleeps with explicit waits and wait on state, not time.

Step 5, verify container and browser runtime behavior

If the browser crashes or hangs, inspect logs, memory, /dev/shm, and launch configuration.

Step 6, inspect the test architecture

If the failure still feels random, the test may be too coupled to internal implementation details or external dependencies.

The goal is not to make every test pass at any cost. The goal is to make failures explain themselves.

What to log when the failure happens

Good diagnostics turn a vague CI failure into a concrete bug.

I like to capture:

  • Browser name and version
  • Viewport size
  • Time zone and locale
  • Base URL
  • Environment name
  • Screenshot on failure
  • DOM snapshot or page HTML around the failing step
  • Network errors and console errors
  • Test data identifiers
  • Container resource limits

In Playwright, a simple failure trace setup is often enough to start:

import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // Traces are recorded only on a retry, so this needs `retries` > 0.
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
});

In CI, logs are only useful if they are collected consistently and attached to the job output.

A few anti-patterns I avoid

Re-running the same flaky test three times and calling it fixed

Retries can reduce noise, but they hide the root cause. If the failure is environment drift or timing, retries are a bandage.

Increasing timeouts everywhere

A bigger timeout is not a diagnosis. It may just make the suite slower while keeping the same underlying race condition.

Sharing state between tests

If one test creates a user and another test assumes that user exists, order dependence will eventually bite you in CI.

Using the UI for setup when an API will do

If you need a precondition, create it through a direct API call or test fixture when possible. This keeps the browser focused on what only the browser can validate.

General background on software testing and test automation is useful, but the real work is in controlling the execution environment and reducing hidden assumptions.

A short checklist you can paste into a ticket

When I file or triage a CI-only failure, I try to include answers to these questions:

  • What is the exact local command and the exact CI command?
  • What browser, version, and mode are used in each environment?
  • Are base URL, secrets, and feature flags identical?
  • Does the test depend on seeded data or shared state?
  • Is there any fixed sleep or missing wait around the failure step?
  • Does the failure disappear with a larger viewport or headed mode?
  • Does the test still fail inside a container locally?
  • Are there console errors, network failures, or service readiness problems?
  • Is the assertion too strict for a transient UI state?
  • Can the setup be made deterministic with API calls or fixtures?

If you can answer most of those quickly, the debugging session gets much shorter.

Final thought

The phrase “e2e tests fail in CI but pass locally” usually means the system under test is not identical across environments, or the test depends on behavior that is only stable on a developer machine. Once you start classifying failures by drift, timing, layout, container runtime, and external dependency, the problem becomes much easier to reason about.

In practice, the best fix is rarely a single line of code. It is usually a combination of better test isolation, explicit waits, stricter environment control, and more honest assertions about what the browser flow actually needs to prove.

If you build your checklist around those principles, CI stops being a mysterious source of failures and becomes what it should be: a reliable signal that something changed.