How to Debug Playwright Tests That Pass Locally but Fail After GitHub Actions Cache Changes

If a Playwright suite runs green on your laptop but starts failing right after a GitHub Actions cache change, the problem is usually not Playwright itself. It is almost always some form of environment drift, hidden dependency reuse, or a build artifact that stopped being as deterministic as you thought it was.

I have seen this pattern enough times to treat it as a specific class of CI failure, not just a flaky test. The interesting part is that the cache change is often only the trigger. The real root cause may be stale browser binaries, an updated lockfile, a restored node_modules tree from a different branch, a reused build output, or a test that accidentally depends on generated state from a previous run.

When a test starts failing only after a cache strategy changes, assume the cache is exposing a pre-existing reproducibility problem, not creating one from scratch.

This guide is about debugging that problem systematically. The goal is not to retry your way out of it. The goal is to identify what changed, what got reused, and which layer of the pipeline is no longer deterministic.

What usually changes when the GitHub Actions cache changes

GitHub Actions caching can affect several parts of your pipeline, and failures often come from one of these layers:

node_modules or package manager cache reuse
Playwright browser binaries cache reuse
build artifact reuse, for example dist, .next, or compiled test fixtures
dependency installation behavior changing because the lockfile or package manager version changed
test data or generated files persisting between jobs or steps

The tricky part is that cache changes are not always obvious. You may have edited a cache key, switched from npm to pnpm-style caching, added a path, removed a path, or changed the order of restore keys. That can cause one branch to restore a slightly different environment than another branch, even though the workflow file looks correct at first glance.

For Playwright specifically, the common failure modes are:

browser version mismatch between install time and test time
a missing or partial browser install in the cache
code under test built against one dependency set, but the tests executed against another
an old artifact or generated fixture being reused after source changes
test timing changing because the CI environment is now subtly different

If you use Playwright in CI, the official docs are still the best reference for the expected install and execution flow: Playwright documentation.

Start by proving what actually changed

Before modifying tests, I try to answer three questions:

Did the code under test change, or just the CI environment?
Did the installed dependencies change, or just the cache key?
Did the test run from fresh source, or did it reuse build output from a previous state?

The fastest way to get clarity is to print the exact environment into the job logs.

- name: Print versions
  run: |
    node --version
    npm --version
    npx playwright --version
    git rev-parse --short HEAD
    git status --short

If you are using pnpm or yarn, print those versions too. A lot of CI cache issues are really package manager issues, because package manager behavior changed while the workflow stayed the same.

You should also record the cache hit behavior. The actions/cache action exposes a cache-hit output that is 'true' only when the primary key matched exactly, and the step log shows which key, if any, was restored. Read that output carefully, not just the green checkmark: an exact hit, a restore-keys fallback, and a complete miss all show up as a successful step.
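As a sketch, assuming a workflow that caches node_modules keyed on package-lock.json (the pattern discussed under root causes below), the restore-only variant of the cache action exposes both cache-hit and cache-matched-key, so you can log which entry was actually restored. The step id node-modules and the key prefix are illustrative names, not anything your workflow already has.

```yaml
- name: Restore node_modules (illustrative, restore only)
  id: node-modules
  uses: actions/cache/restore@v4
  with:
    path: node_modules
    key: ${{ runner.os }}-node-modules-${{ hashFiles('package-lock.json') }}
    # A prefix fallback like this can restore a tree built from a different lockfile
    restore-keys: |
      ${{ runner.os }}-node-modules-

- name: Report cache outcome
  env:
    CACHE_HIT: ${{ steps.node-modules.outputs.cache-hit }}
    MATCHED_KEY: ${{ steps.node-modules.outputs.cache-matched-key }}
  run: |
    # cache-hit is 'true' only for an exact primary-key match
    echo "cache-hit: '${CACHE_HIT}'"
    # Typically empty on a miss; otherwise the key that was actually restored
    echo "matched key: '${MATCHED_KEY}'"
```

If cache-hit is not 'true' but a matched key is printed, the job is running on a fallback entry, which is exactly the situation where a restored tree can disagree with the current lockfile.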

GitHub Actions documentation explains the caching model and workflow primitives here: GitHub Actions docs.

Build a reproducibility map

When I debug this class of failure, I divide the pipeline into four zones:

1. Dependency resolution

This is where package versions are installed from the lockfile. If your cache restores node_modules, you may bypass normal resolution behavior and keep stale packages longer than intended.

2. Browser provisioning

Playwright installs browser binaries separately from your npm dependencies. If this cache is stale or incomplete, the test runner may look for, or launch, browser builds that do not match the installed Playwright version.

3. Application build output

If you run tests against a built app, any cached build directory can mask source changes that were never rebuilt, or carry stale environment variables that were baked in at build time.

4. Runtime artifacts

This includes screenshots, traces, videos, generated test data, temp files, and server state. Artifacts should be treated as disposable unless you have explicitly designed them to persist.

Once you know which zone is failing, debugging gets much easier. Do not start by changing waits or increasing retries, because that usually masks the wrong layer.

The most common root causes

1. A restored `node_modules` directory no longer matches the lockfile

This is probably the most common source of CI cache issues. A cache hit can restore a dependency tree that was created under a slightly different lockfile, platform, or package manager version.

Symptoms include:

tests failing at import time in CI but not locally
browser automation APIs behaving differently after a dependency update
package-level transitive changes that only appear after cache invalidation

If the package manager installs from package-lock.json, pnpm-lock.yaml, or yarn.lock, prefer caching the package manager’s download cache instead of the whole node_modules tree unless you have a strong reason to cache installed modules directly.

A safer pattern is:

- uses: actions/setup-node@v4
  with:
    node-version: 20
    cache: npm

- run: npm ci

npm ci is deterministic by design, and it will fail if the lockfile and package.json disagree. That is often a better failure mode than silently reusing an old dependency tree.
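For comparison, here is roughly what caching the download cache by hand can look like with actions/cache. A prefix fallback in restore-keys is much less dangerous here than with node_modules, because npm ci still resolves every package from the lockfile and only uses the cache as a local download store. This sketch assumes npm's default cache directory, ~/.npm, on a Linux runner.

```yaml
- uses: actions/cache@v4
  with:
    # npm's download cache, not the installed dependency tree
    path: ~/.npm
    key: ${{ runner.os }}-npm-${{ hashFiles('package-lock.json') }}
    # Falling back to an older download cache is safe: npm ci still installs exactly what the lockfile says
    restore-keys: |
      ${{ runner.os }}-npm-

- run: npm ci
```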

2. Playwright browser binaries are cached inconsistently

Playwright downloads browser binaries separately, and those binaries must line up with the installed Playwright package. If the cache changes caused browser binaries to be restored from a different run, your tests can start failing in unusual ways, including startup crashes or browser-specific behavior changes.

If you are caching Playwright browsers manually, check these things:

the cache key includes the Playwright version
the cache key includes the OS and architecture
the cache is not being reused across incompatible Node or Playwright versions
the browser install step still runs when needed

A simple and robust pattern is to install browsers after dependency installation, and let the package version determine the browser set.

- run: npx playwright install --with-deps

That may cost more time than a perfect browser cache, but it is often the right tradeoff if your team values correctness over shaving a minute off the pipeline.
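If install time does matter, a cache keyed on the exact Playwright version is one way to keep binaries and package aligned. The sketch below must run after npm ci. It assumes the default Linux browser location ~/.cache/ms-playwright (no PLAYWRIGHT_BROWSERS_PATH override) and that npx playwright --version prints a line of the form "Version X.Y.Z"; the step ids are illustrative.

```yaml
- name: Read installed Playwright version
  id: pw-version
  run: |
    # Assumes output like "Version 1.45.0"; the second field is the version number
    echo "version=$(npx playwright --version | awk '{print $2}')" >> "$GITHUB_OUTPUT"

- name: Cache Playwright browsers
  id: pw-cache
  uses: actions/cache@v4
  with:
    path: ~/.cache/ms-playwright
    # No restore-keys: a different Playwright version, OS, or architecture never reuses these binaries
    key: ${{ runner.os }}-${{ runner.arch }}-playwright-${{ steps.pw-version.outputs.version }}

- name: Install browsers and OS dependencies on a cache miss
  if: steps.pw-cache.outputs.cache-hit != 'true'
  run: npx playwright install --with-deps

- name: Install only OS dependencies on a cache hit
  # System packages live outside the cached directory, so they still need installing
  if: steps.pw-cache.outputs.cache-hit == 'true'
  run: npx playwright install-deps
```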

3. A build artifact is being reused after source changes

This one is easy to miss. Your test may pass locally because your local build is fresh, but fail in CI because the workflow restored an artifact from a previous commit.

Typical examples:

.next in Next.js
dist from a frontend build
compiled CSS or generated TypeScript output
test fixtures generated in an earlier job and reused in a later one

If a cache contains build output, ask whether that output is truly content-addressed. If the answer is no, do not cache it unless the invalidation strategy is airtight.

If a cached artifact changes behavior when source changes, but the cache key does not change with the source, the pipeline is lying to you.
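When a test job needs build output, one alternative to a cross-run cache is to pass it as a workflow artifact, which is scoped to the current run and therefore cannot come from a previous commit. A sketch, assuming an npm build script that writes to dist/; the job names and the artifact name are illustrative.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      # Assumption: "npm run build" writes the app to dist/
      - run: npm run build
      - uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist

  e2e:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      # Downloads the artifact produced by the build job in this same run
      - uses: actions/download-artifact@v4
        with:
          name: dist
          path: dist
      - run: npx playwright install --with-deps
      - run: npx playwright test
```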

4. The test depends on generated state from a previous run

Sometimes the app state is not cached, but the test still assumes some generated file or backend record exists. Once the cache changes, a previous “lucky” order of operations disappears and the suite breaks.

Examples:

a test expects a seeded database record that is created by another test
a fixture file was generated once and then silently reused
a local dev server writes temp files that persist between runs

This is not really a Playwright issue. It is a test isolation issue.

A debugging workflow that works

When I need to isolate a failure like this, I use a binary elimination approach.

Step 1, rerun the exact CI command locally

Do not just run the test file from your IDE. Try to reproduce the actual install and test commands used in GitHub Actions.

If your workflow does this:

- run: npm ci
- run: npx playwright test

then reproduce exactly that sequence locally in a clean environment, ideally inside Docker or a fresh shell with no local caches.

Step 2, disable caches one layer at a time

Do not remove all caching forever. Instead, isolate which cache is correlated with the failure.

Try the following in separate runs:

disable package cache, keep browser cache
disable browser cache, keep package cache
disable build artifact cache entirely
disable restore only, keep save enabled

If the failure disappears when one cache is removed, you now know where to look.
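One way to switch a single cache off without editing the workflow for every experiment is a workflow_dispatch input. The sketch below toggles only the browser cache restore; the input name skip_browser_cache is hypothetical, and you would repeat the pattern for each cache step you control.

```yaml
on:
  workflow_dispatch:
    inputs:
      skip_browser_cache:
        description: Skip restoring the Playwright browser cache
        type: boolean
        default: false

jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci

      # Skipped when the run is dispatched with skip_browser_cache checked
      - name: Restore Playwright browsers
        if: ${{ !inputs.skip_browser_cache }}
        uses: actions/cache/restore@v4
        with:
          path: ~/.cache/ms-playwright
          key: ${{ runner.os }}-playwright-${{ hashFiles('package-lock.json') }}

      # Runs either way; browsers already present for this Playwright version are not downloaded again
      - run: npx playwright install --with-deps
      - run: npx playwright test
```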

Step 3, compare installed versions in passing and failing runs

Log these values in both jobs:

Node version
package manager version
Playwright version
browser version if available
OS image version

A surprising number of “flaky” failures are really version skew. The cache did not break the test, it changed which version was present.
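The print step near the start of this guide covers Node, npm, and Playwright. For the runner image and browser builds, a sketch like this can sit next to it. It assumes a GitHub-hosted Ubuntu runner, where the ImageOS and ImageVersion environment variables are set, and Playwright's default browser directory. The installed browser folders are named by build revision (for example chromium-<revision>), which is enough to spot skew between two runs even though it is not the browser's product version.

```yaml
- name: Record runner image and installed browser builds
  run: |
    # ImageOS and ImageVersion are set on GitHub-hosted runners; "unknown" elsewhere
    echo "runner image: ${ImageOS:-unknown} ${ImageVersion:-unknown}"
    grep PRETTY_NAME /etc/os-release
    # Default Linux browser directory; folder names carry a build revision
    ls ~/.cache/ms-playwright || echo "no browsers found in ~/.cache/ms-playwright"
```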

Step 4, inspect whether test artifacts are reused

Look for directories that should be regenerated but are not, such as:

dist
.cache
.playwright
test-results
playwright-report

Some of these are safe to upload as artifacts after a run, but they should not usually be restored as input to the next test run.
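A cheap way to catch reused output is to check for these directories right after checkout and cache restore, before anything is built. A fresh checkout should not contain them unless they are committed or restored from a cache. The directory list in this sketch is an assumption; trim or extend it to match your project.

```yaml
- uses: actions/checkout@v4

# ...cache restore steps go here...

- name: Warn about build or test output that exists before the build runs
  run: |
    for dir in dist .next test-results playwright-report; do
      if [ -e "$dir" ]; then
        # Emits a warning annotation on the workflow run summary
        echo "::warning::$dir already exists before build and test"
        ls -la "$dir"
      fi
    done
```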

A practical GitHub Actions pattern for Playwright

For many teams, the safest approach is to keep the workflow simple and deterministic.

name: tests
on: [push, pull_request]

jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm

      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test

This is not the fastest possible pipeline, but it gives you a clean baseline. Once that is stable, you can add selective caching back in and validate each cache with a failure-oriented mindset.

If you are caching build output, make the key obviously tied to the source state and toolchain state.

- uses: actions/cache@v4
  with:
    path: |
      .next/cache
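    # Changes with the runner OS, the lockfile, or any source file (assumes sources live under src/)
    key: ${{ runner.os }}-next-${{ hashFiles('package-lock.json', 'src/**') }}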

That example is intentionally conservative. The important point is not the exact key shape, it is that you know what invalidates the cache.

Why retries are not the first fix

Retrying a flaky browser test can be useful when the failure is caused by true timing variance, such as a temporary network delay or a slow animation. But if the failure only appears after cache changes, retries can hide the real defect.

A retry may succeed because it lands on a different cached state, not because the underlying issue disappeared.

That is why I treat retries as a diagnostic tool, not a primary solution. If a retry fixes the run, ask whether the second attempt used a different environment, a different server state, or a different artifact set. If so, the flake is environmental, not random.
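In practice, that can mean a diagnostic run that turns retries off from the command line, so the first failure is the one you see regardless of what your Playwright config sets. --retries is a standard Playwright test CLI option; the step name is illustrative.

```yaml
- name: Run Playwright without retries (diagnostic)
  # Overrides the retries value from the config file for this run only
  run: npx playwright test --retries=0
```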

Specific things to check in Playwright projects

Check your fixture and project setup

If you use Playwright fixtures to set up auth state, API stubs, or test data, verify that those fixtures are created per run and not restored from disk unexpectedly.

Check local storage and auth state files

Auth files are often generated once and then reused. If they are cached incorrectly, tests may fail after a dependency or browser update because the persisted state no longer matches the app.
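If auth state is produced by a setup project, as in Playwright's authentication guide, one cheap safeguard is to delete any persisted state before the run so it can only come from this run. The playwright/.auth directory is an assumption taken from the convention in that guide; substitute your own path.

```yaml
# Hypothetical path: match whatever file your auth setup writes with storageState
- name: Delete persisted auth state so setup recreates it
  run: rm -rf playwright/.auth

- name: Run tests (assumes a setup project logs in again and writes fresh state)
  run: npx playwright test
```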

Check screenshots and trace output handling

Artifacts like screenshots and traces are great for debugging, but they should be output, not input. If you accidentally restore them as part of a cached folder, you can end up mixing old and new debugging data.
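To keep reports and traces strictly on the output side, upload them as run artifacts after the tests instead of placing them under a cached path. This follows the general shape of the upload step in Playwright's CI guide, but the artifact name, the retention period, and the choice of directories are assumptions.

```yaml
- run: npx playwright test

- name: Upload Playwright report and traces
  # Uploads after passing and failing runs, but not when the run is cancelled
  if: ${{ !cancelled() }}
  uses: actions/upload-artifact@v4
  with:
    name: playwright-report-${{ github.run_attempt }}
    path: |
      playwright-report/
      test-results/
    retention-days: 7
```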

Check headed versus headless assumptions

If the cache change coincided with a browser binary update, headless behavior can shift enough to expose timing assumptions in your locators or waits.

Playwright’s locator and auto-waiting model is strong, but it cannot fix a test that relies on stale DOM assumptions. If a selector only works because an element is present during a previous cached state, the test is still fragile.

How to tell cache problems from real product bugs

A real product bug usually reproduces across environments when the application version is the same. A cache problem usually has one or more of these traits:

the failure follows the workflow, not the code
a clean install makes the issue disappear
changing one cache key restores green builds
logs show different dependency or browser versions between runs
the failure is isolated to CI, while local runs stay stable

If you can make the issue vanish by deleting caches and build output, that is a strong signal that the test suite is relying on something mutable.

A decision tree for fixing the problem

Use this order:

Reproduce the failure with all caches disabled.
Confirm the exact Node, package manager, and Playwright versions.
Verify whether browser binaries are version-aligned.
Remove cached build artifacts from the test input path.
Replace node_modules caching with deterministic installs where possible.
Tighten cache keys to include all relevant inputs.
Only then consider test-level retries or additional waits.

That sequence matters because it avoids spending time stabilizing a symptom while the underlying environment stays unstable.

Cache strategy guidelines I trust

Here is the short version of how I approach CI cache design for browser tests:

cache downloads, not installed dependency trees, unless you have a strong reason not to
keep browser binaries tied to the exact Playwright version
do not cache build output unless the invalidation strategy is explicit and reviewed
do not let test artifacts become input artifacts
treat any CI cache hit as a potential source of hidden state, not a guarantee of correctness

The tradeoff is simple. More aggressive caching can improve speed, but it increases the surface area for dependency drift and stale artifacts. For browser automation, I usually optimize for reproducibility first, then add caching back only where I can explain the invalidation rules in one sentence.

Final debugging checklist

If your Playwright tests fail after GitHub Actions cache changes, check these in order:

Does the lockfile match the installed dependencies?
Are Playwright browsers installed fresh or restored from a compatible cache?
Is any build output being reused across commits?
Are test fixtures or auth files being persisted when they should be regenerated?
Do local and CI runs use the same Node, package manager, and Playwright versions?
Can the failure be reproduced with all caches disabled?

If the answer to the last question is no, your problem is almost certainly cache-related, even if the visible failure is a timeout, a selector miss, or a browser crash.

Closing thought

The phrase “flaky browser tests” is often too vague to be useful. When a suite passes locally but starts failing after a GitHub Actions cache change, I try to name the actual bug: dependency drift, stale browser binaries, reused build artifacts, or bad test isolation.

That naming step matters, because the fix follows the cause. If you get the cache boundaries right, Playwright becomes much more predictable in CI, and your tests stop depending on hidden state you never meant to keep.

For broader context on the testing discipline behind this, the general concepts of software testing, test automation, and continuous integration are still useful references when you need to explain these failure modes to the rest of the team.