June 29, 2026
How I Use Playwright Traces, Console Logs, and Network Timing to Reproduce CI-Only Browser Failures
A practical Playwright debugging workflow for reproducing CI-only browser failures using traces, console logs, request timing, and environment diffs.
When a browser test passes locally and fails in CI, the first instinct is often to rerun it and hope the failure disappears. That usually buys time, not answers. The better move is to treat the failure as a forensic problem: collect the right artifacts, line them up, and narrow the gap between local execution and CI execution until the bug becomes reproducible.
My default workflow for these cases is built around three signals from Playwright: traces, console logs, and network timing. Each one tells a different part of the story. The trace shows what the browser saw and did, console logs show what the app and browser complained about, and network timing shows whether the page was waiting on something slower or different in CI.
If you use Playwright for browser automation, this workflow is one of the most practical ways to reproduce CI-only browser failures without guessing at the root cause.
Why CI-only failures are different
A failure that appears only in CI is often not a true “works locally, fails remotely” contradiction. More often, CI changes the shape of execution in small but important ways:
- slower CPU or less memory
- different viewport size or browser version
- stricter network conditions
- different test ordering or shared state
- missing fonts, locale data, or OS dependencies
- parallel execution causing race conditions
- auth, cache, or service startup timing differences
In other words, the bug may already exist locally, but your laptop is hiding it by being faster, warmer, more stable, or simply less loaded.
The goal is not to make CI look like your laptop; it is to make the failure explain itself.
That means you want artifacts that capture behavior, not just a pass/fail status. Playwright is good at this because it can record traces, screenshots, videos, and structured logs, and it integrates naturally into CI pipelines.
My debugging order of operations
When I hit a CI-only browser failure, I use this order:
1. Reproduce the failure in CI with maximum useful artifacts enabled.
2. Open the Playwright trace and identify the exact step where the browser diverged.
3. Read console logs around that step for runtime errors, warnings, and framework noise.
4. Correlate network timing with the failure window to see whether the page or API was slow, blocked, or inconsistent.
5. Compare the CI environment to local execution, then reduce the gap until the issue is reproducible on demand.
This is much more effective than immediately rewriting waits or sprinkling retries everywhere. Retries can hide symptoms, but they do not explain why a step failed.
Start by collecting better artifacts in CI
If your CI run only stores a JUnit report, you are flying blind. At minimum, I want the following when a failure happens:
- Playwright trace for the failed test
- browser console output
- request and response metadata, especially around the failing action
- screenshots on failure
- video if the issue looks visual or timing-related
A common setup in Playwright Test looks like this:
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    trace: 'retain-on-failure',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
});
That is usually enough to preserve useful evidence without generating huge artifact volumes for every passing test.
For CI pipelines, I also like to keep the browser output attached to the job. If the test runner prints a useful stack trace, network error, or uncaught exception, I want to see it in the job logs without downloading artifacts first.
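One way to get both the inline job output and the downloadable artifacts is to pick reporters based on a CI flag. This is a sketch, not a recommended setup: it assumes the pipeline exports a `CI` environment variable (many hosted CI systems do, but check yours), and the reporter combination is only one reasonable choice.

```typescript
import { defineConfig } from '@playwright/test';

// Assumption: the CI system sets CI to a non-empty value.
const isCI = !!process.env.CI;

export default defineConfig({
  // 'list' prints each result and its error to stdout, so it lands in the job log.
  // 'html' writes a report directory that the pipeline can upload as an artifact.
  reporter: isCI ? [['list'], ['html', { open: 'never' }]] : 'list',
  use: {
    trace: 'retain-on-failure',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
});
```

Uploading the report and the test output directory is still a separate pipeline step, which depends on the CI system.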
Use the trace viewer as the primary timeline
The Playwright trace viewer is the fastest way I know to move from “the test failed somewhere” to “it failed on this step for this reason.” I use it like a timeline, not like a screenshot gallery.
What I look for first:
- the exact action before the failure, for example click, fill, navigation, or wait
- whether the page was still loading resources
- whether the locator resolved to the element I expected
- whether the DOM changed between action and assertion
- whether a navigation happened implicitly after a click
A lot of CI-only failures happen because the app is slightly slower, so the test reaches an assertion before the UI is ready. The trace will show whether the browser was still on the old page, whether an overlay blocked the click, or whether a SPA route transition was still pending.
What trace evidence usually means
Some patterns show up again and again:
- Action timeout on click: often overlay, animation, or element not yet actionable
- Assertion timeout on text or visibility: UI rendered later in CI, or request dependency was slower
- Navigation mismatch: the click triggered navigation locally, but not in CI, usually because the app state differed
- Detached element: the DOM rerendered between actions and the element Playwright had resolved was replaced
The trace viewer is especially useful when a test fails in an assertion that looks simple, such as toHaveText, but the underlying problem is that the page never reached the intended state.
Add browser console logs to catch the hidden runtime signal
The browser console often contains the first clue that the app is unhealthy before the test actually fails. I always wire console capture into the test runner when debugging CI-only issues.
Here is a compact Playwright example:
import { test } from '@playwright/test';

test.beforeEach(async ({ page }) => {
  // Mirror browser console output into the test runner's stdout.
  page.on('console', msg => {
    console.log(`[browser:${msg.type()}] ${msg.text()}`);
  });
  // Report uncaught exceptions thrown inside the page.
  page.on('pageerror', error => {
    console.log(`[pageerror] ${error.message}`);
  });
});
This gives me three useful categories:
- console.error messages from the application
- pageerror exceptions that may not bubble into the test assertion
- browser warnings that indicate polyfill, CSP, or network issues
What I pay attention to in console output
Not every warning matters, so I filter mentally:
- ReferenceError, TypeError, or uncaught exceptions usually matter immediately
- Failed to load resource messages may point to missing assets or backend routes
- CSP violations can explain why something works locally but not behind a stricter CI host or preview environment
- Hydration warnings can indicate framework mismatch that causes flakiness during initial render
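The mental filter in the list above can be roughed out as a small classifier. This is a sketch only: the patterns are my guesses at typical message wording, which varies by browser and framework, and the `Severity` labels and function name are made up for illustration.

```typescript
type Severity = 'critical' | 'investigate' | 'noise';

// Illustrative patterns; real message wording depends on the browser and framework.
const CRITICAL = [/ReferenceError/, /TypeError/, /Uncaught/i];
const INVESTIGATE = [/Failed to load resource/i, /Content Security Policy/i, /hydrat/i];

function classifyConsoleMessage(type: string, text: string): Severity {
  if (CRITICAL.some(pattern => pattern.test(text))) return 'critical';
  if (INVESTIGATE.some(pattern => pattern.test(text))) return 'investigate';
  // Unmatched console.error output is still worth a look.
  return type === 'error' ? 'investigate' : 'noise';
}

// Possible wiring inside a test:
// page.on('console', msg => {
//   const severity = classifyConsoleMessage(msg.type(), msg.text());
//   if (severity !== 'noise') console.log(`[${severity}] ${msg.text()}`);
// });
```

I would treat the output as a sorting aid for reading logs, not as a gate that fails tests.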
A useful habit is to correlate console timestamps with trace steps. If an error appears right before a locator timeout, the application may have silently entered a bad state and the UI failure is just a downstream symptom.
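Correlating with the trace is easier when each logged line carries an elapsed time. A minimal sketch, assuming the clock starts in `beforeEach`, so the offsets line up only approximately with the trace timeline; the function name is hypothetical.

```typescript
// Formats one log line with the time elapsed since the test started.
function formatConsoleLine(type: string, text: string, startedAtMs: number, nowMs: number): string {
  const elapsed = Math.round(nowMs - startedAtMs);
  return `[+${elapsed}ms] [browser:${type}] ${text}`;
}

// Possible wiring:
// test.beforeEach(async ({ page }) => {
//   const startedAt = Date.now();
//   page.on('console', msg =>
//     console.log(formatConsoleLine(msg.type(), msg.text(), startedAt, Date.now())));
// });
```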
Network timing is where a lot of CI-only failures hide
If the app depends on API calls, third-party assets, authentication redirects, or feature flags, network timing can easily explain why CI behaves differently from your laptop. CI environments frequently have colder DNS caches, slower external connectivity, or more constrained throughput.
Playwright gives you enough hooks to inspect request and response timing without writing a lot of infrastructure. I often start by logging slow or failing requests:
import { test } from '@playwright/test';

test.beforeEach(async ({ page }) => {
  // Requests that never received a response (DNS failure, aborted, connection refused).
  page.on('requestfailed', request => {
    console.log(`[requestfailed] ${request.method()} ${request.url()} ${request.failure()?.errorText}`);
  });
  // Responses that arrived but carry a client or server error status.
  page.on('response', response => {
    if (response.status() >= 400) {
      console.log(`[response] ${response.status()} ${response.request().method()} ${response.url()}`);
    }
  });
});
For deeper timing questions, I inspect whether the request is simply slow, whether the backend returned an error, or whether the browser was waiting on chained requests before the UI could update.
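For the "simply slow" case, Playwright exposes per-request timing through `request.timing()`. The sketch below breaks that object into phases. The interface mirrors the documented fields (all but `startTime` are milliseconds relative to `startTime`, or -1 when unavailable); the helper names and the 1000 ms threshold in the wiring comment are my own assumptions.

```typescript
// Shape of the object returned by Playwright's request.timing().
interface ResourceTiming {
  startTime: number;
  domainLookupStart: number;
  domainLookupEnd: number;
  connectStart: number;
  secureConnectionStart: number;
  connectEnd: number;
  requestStart: number;
  responseStart: number;
  responseEnd: number;
}

// Returns end - start, or null when either value was not recorded (-1).
function span(start: number, end: number): number | null {
  return start >= 0 && end >= 0 ? end - start : null;
}

function summarizeTiming(t: ResourceTiming) {
  return {
    dnsMs: span(t.domainLookupStart, t.domainLookupEnd),
    connectMs: span(t.connectStart, t.connectEnd),
    waitingMs: span(t.requestStart, t.responseStart), // time to first byte
    totalMs: t.responseEnd >= 0 ? t.responseEnd : null,
  };
}

// Possible wiring (the threshold is arbitrary):
// page.on('requestfinished', request => {
//   const summary = summarizeTiming(request.timing());
//   if ((summary.totalMs ?? 0) > 1000) console.log('[slow]', request.url(), summary);
// });
```

A large `waitingMs` points at the backend, while large `dnsMs` or `connectMs` values point at the CI network itself.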
The timing questions I ask
- Did the critical request start at all?
- Did it succeed, but later than the test expected?
- Did it return a different status code in CI?
- Was a redirect involved?
- Did one failed call cause a retry, timeout, or fallback path in the app?
That last one matters more than people think. A CI-only failure might not be the root network request itself, but a fallback path that only executes after latency crosses a threshold.
Reconstructing the failure with a local replay mindset
Once I have trace data, console output, and timing logs, I try to replay the failure locally in a controlled way. The objective is to remove uncertainty, not to blindly rerun the entire suite.
A good reconstruction includes:
- the same browser channel, if possible
- the same viewport size
- the same base URL or environment
- the same auth state or test data shape
- the same test order, if ordering is relevant
- the same headless mode as CI
This matters because tiny differences often change timing enough to hide a race.
For example, if CI runs in headless Chromium with a narrow viewport and your local run uses headed mode on a large monitor, the layout and scroll behavior may differ. A locator that is visible locally might be offscreen, covered, or delayed in CI.
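One way to pin these conditions is a dedicated project in the Playwright config that mirrors CI. Every value below is a placeholder assumption; copy the real viewport, locale, timezone, and worker count from your CI configuration. The project name is hypothetical.

```typescript
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  // Match CI's worker count; 1 is an assumption for a serial CI job.
  workers: 1,
  projects: [
    {
      name: 'ci-like-chromium',
      use: {
        ...devices['Desktop Chrome'],
        headless: true,                          // same headless mode as CI
        viewport: { width: 1280, height: 720 },  // placeholder: use CI's viewport
        locale: 'en-US',                         // placeholder
        timezoneId: 'UTC',                       // placeholder
      },
    },
  ],
});
```

Selecting it with `--project=ci-like-chromium` keeps the CI-like settings separate from everyday local runs.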
A practical triage checklist
When I open a failing trace, I usually ask these questions in order:
1. Was the locator correct?
If the locator points to the wrong element, the failure is not about timing; it is about selector quality. The trace viewer can show exactly what Playwright matched. This is one reason I prefer locators that express intent, such as role-based selectors, when the app supports them.
2. Did the page reach the expected state?
If a click happened but the UI did not change, I check whether the app was waiting on data, blocked by an overlay, or still mounting. A test that assumes synchronous rendering will often fail only when CI is slower.
3. Did the browser log a runtime error?
A silent frontend exception can leave the UI half-rendered without an obvious test failure until a later assertion.
4. Did any request fail, stall, or redirect unexpectedly?
This often points to missing test data, expired auth, environment config drift, or backend instability.
5. Is the failure deterministic under the same conditions?
If I can reproduce it locally by matching browser, viewport, and network conditions, I know the failure is real rather than incidental, and I have a reliable path to the root cause.
When I turn a CI-only failure into a local failure
The fastest way to solve a CI-only browser failure is often to make local execution behave more like CI.
Here are the most common adjustments I make:
- run headless locally
- use the same browser engine and version where possible
- reduce viewport to CI defaults
- clear storage and cookies before the test
- disable local mock shortcuts that CI does not use
- throttle network or add artificial latency to expose races
- run the single failing test, not the whole suite
That last point is important. Suite-level noise can mask the problem. I want a minimal reproduction that isolates the failing flow.
If the app is sensitive to timing, I sometimes use request interception or backend test doubles to simulate slow dependencies. The point is not to fake success; it is to prove that the failure is tied to a specific timing threshold.
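A minimal sketch of the interception approach, built on `page.route` and `route.continue()`. The local interfaces exist only so the snippet stands alone; I am assuming Playwright's `Page` and `Route` satisfy them structurally. The helper name, endpoint, and delay are hypothetical.

```typescript
// Minimal structural types; Playwright's Page and Route are assumed to match.
interface RouteLike {
  continue(): Promise<void>;
}
interface PageLike {
  route(url: string, handler: (route: RouteLike) => Promise<void>): Promise<unknown>;
}

// Holds every request matching urlPattern for delayMs, then lets it proceed unmodified.
async function addLatency(page: PageLike, urlPattern: string, delayMs: number): Promise<void> {
  await page.route(urlPattern, async route => {
    await new Promise<void>(resolve => setTimeout(resolve, delayMs));
    await route.continue();
  });
}

// Possible usage inside a test:
// await addLatency(page, '**/api/orders', 3000);
// await page.goto('/orders');
```

Raising the delay step by step until the test fails gives a rough idea of the threshold the app or the test depends on.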
Avoid the trap of overusing waits and retries
When a CI-only test is flaky, the easy fix is to add a longer wait, or wrap the assertion in a retry. Sometimes that is justified, but only after you know what you are waiting for.
A better order is:
- determine which request or DOM state the test depends on
- wait for that condition explicitly
- verify the app actually reached the intended state
- only then consider reducing false positives with retries
For example, a vague fixed wait is less helpful than an assertion that proves the data has loaded:
await page.getByRole('heading', { name: 'Orders' }).waitFor();
await expect(page.getByTestId('orders-table')).toBeVisible();
If the table depends on a network call, I may wait for the specific response or a visible loading state to disappear. The key is to synchronize with behavior, not time.
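Waiting for a specific response can be done with `page.waitForResponse` and a predicate. The sketch below builds such a predicate; the `/api/orders` path and the link name are hypothetical, and the local `ResponseLike` interface is an assumption that Playwright's `Response` satisfies.

```typescript
// Minimal structural type; Playwright's Response is assumed to match.
interface ResponseLike {
  url(): string;
  status(): number;
}

// Builds a predicate that matches a 200 response for exactly one URL path.
function okResponseFor(pathname: string) {
  return (response: ResponseLike): boolean =>
    new URL(response.url()).pathname === pathname && response.status() === 200;
}

// Possible usage. Start waiting before the action that triggers the request,
// so a fast response cannot be missed:
// const ordersLoaded = page.waitForResponse(okResponseFor('/api/orders'));
// await page.getByRole('link', { name: 'Orders' }).click();
// await ordersLoaded;
// await expect(page.getByTestId('orders-table')).toBeVisible();
```

Matching on the path rather than the full URL keeps the predicate stable across base URLs and query strings.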
Compare local and CI environment signals systematically
A useful debugging pattern is to write down exactly what differs between local and CI. I usually compare:
- OS and container image
- browser channel and version
- CPU and memory limits
- timezone and locale
- viewport and device emulation settings
- environment variables
- network access and proxy settings
- seeded data and account state
You do not need a perfect match to find the bug. You need enough equivalence to make the failure reproducible.
Most “mysterious” CI failures stop being mysterious once you compare the runtime inputs, not just the code.
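Writing the differences down can be as simple as capturing a flat snapshot in each environment and diffing the two. A sketch, with hypothetical names and snapshot keys; what goes into the snapshot is up to you.

```typescript
type EnvSnapshot = Record<string, string | number | boolean>;

// Returns only the keys whose values differ, including keys present on one side only.
function diffSnapshots(local: EnvSnapshot, ci: EnvSnapshot) {
  const keys = Array.from(new Set(Object.keys(local).concat(Object.keys(ci))));
  const diff: Record<string, { local: unknown; ci: unknown }> = {};
  for (const key of keys) {
    if (local[key] !== ci[key]) {
      diff[key] = { local: local[key], ci: ci[key] };
    }
  }
  return diff;
}

// A snapshot might be collected like this and printed at the start of a run:
// const snapshot = {
//   platform: process.platform,
//   node: process.version,
//   timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
//   browserVersion: browser.version(),
// };
```

Printing the snapshot in CI and diffing it against a local one turns the comparison into a short list of concrete suspects.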
A small example of a debugging fixture
This pattern is simple, but it helps me gather evidence without cluttering test logic:
import { test } from '@playwright/test';

test.beforeEach(async ({ page }) => {
  page.on('console', msg => console.log(`[console:${msg.type()}] ${msg.text()}`));
  page.on('pageerror', err => console.log(`[pageerror] ${err.message}`));
  page.on('requestfailed', req => console.log(`[requestfailed] ${req.url()} ${req.failure()?.errorText}`));
});
I keep this in a shared debugging helper and enable it only for failing specs or CI reruns. That keeps normal runs quiet while preserving enough observability when things break.
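The "enable it only when needed" part can be a small gate in front of the listeners. This is one possible shape: `PW_DEBUG_LOGS` is a hypothetical variable name, and the automatic branch assumes the pipeline sets `CI` and that retries are configured so `testInfo.retry` can be greater than zero.

```typescript
// Decides whether to attach the verbose listeners.
// `retry` corresponds to testInfo.retry in Playwright Test.
function shouldAttachDebugListeners(env: Record<string, string | undefined>, retry: number): boolean {
  if (env.PW_DEBUG_LOGS === '1') return true; // explicit opt-in
  return Boolean(env.CI) && retry > 0;        // automatic on CI retries
}

// Possible wiring:
// test.beforeEach(async ({ page }, testInfo) => {
//   if (!shouldAttachDebugListeners(process.env, testInfo.retry)) return;
//   page.on('console', msg => console.log(`[console:${msg.type()}] ${msg.text()}`));
//   page.on('pageerror', err => console.log(`[pageerror] ${err.message}`));
//   page.on('requestfailed', req => console.log(`[requestfailed] ${req.url()}`));
// });
```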
Knowing when the bug is in the app, not the test
One of the hardest parts of browser test debugging is admitting that the test may be revealing a product bug, not a test bug. The clues are usually in the combination of artifacts:
- trace shows the interaction happened correctly
- console shows a frontend exception or hydration problem
- request timing shows the data arrived late or failed
- assertion fails because the UI never completed the intended state
At that point, rewriting the test is the wrong fix. The test is doing its job by catching a real issue.
This is why I prefer to document the failure path clearly before changing anything. It helps separate test fragility from application instability.
What I change after I understand the failure
Once I know the root cause, I choose the smallest reliable fix.
Possible outcomes:
- improve locator quality
- wait on the correct state instead of a guessed delay
- fix test data seeding or cleanup
- isolate shared state between tests
- adjust CI browser configuration or viewport to match intended coverage
- fix an app race condition or rendering issue
- replace a brittle assertion with a more meaningful one
If the issue is a genuine race in the application, I fix the application. If the issue is a test that assumes an instant transition, I fix the test. If the issue is environment drift, I fix the environment and lock it down.
A simple mental model for future failures
I have found it useful to think about CI-only browser failures in three layers:
- What the browser did, captured by trace
- What the app said, captured by console logs
- What the network did, captured by request timing
If all three agree, the root cause is usually obvious. If they disagree, that disagreement is often the clue.
For example:
- trace says the click happened, console says an exception occurred, and network says the API returned 500: the app bug is likely real
- trace says the element never became actionable, console is clean, and network is slow: the problem is probably timing or loading state
- trace says the navigation happened, console shows an auth error, and network shows redirect loops: the issue is likely environment or session setup
That is the practical value of combining these signals: they give you a reproducible explanation instead of a vague flake.
Final thoughts
The best way I know to reproduce CI-only browser failures with Playwright is to stop treating them like random flakes and start treating them like evidence collection problems. Traces show the sequence, console logs show runtime health, and network timing shows whether the app was actually ready when the test moved forward.
If you build a habit of preserving those artifacts on failure, you will spend less time guessing, less time adding brittle waits, and more time fixing the real issue, whether that issue lives in the test, the app, or the CI environment.
For teams doing serious browser test debugging, this workflow also improves code review quality. It makes failure discussions concrete. Instead of debating whether a test is flaky, you can point to the exact request, console error, or DOM transition that caused the test to fail.
That is the difference between reacting to CI and actually understanding it.