June 30, 2026
How to Store Playwright Traces, Videos, and Screenshots in CI So Triage Stops Being Slow
Learn how to store Playwright traces in CI, capture videos and screenshots, and organize CI artifacts so failure triage is faster and more reliable.
If you have ever opened a failed CI run and found only a stack trace, a timeout, and a bunch of logs with no clear clue, you already know why artifact storage matters. A Playwright failure is often not really a code failure; it is an evidence problem. The test saw something unexpected, but without a trace, screenshot, or video, triage turns into reruns and guesswork.
I like Playwright because it gives us the right debugging primitives out of the box, but those artifacts only help if they are captured consistently and stored where the team can actually use them. This is where many pipelines fall apart. The tests are already running in CI, the failures are already expensive, and then the evidence gets lost because the job container is ephemeral, the artifact retention is too short, or the upload rule is too broad.
This article is a practical setup guide for people who want to store Playwright traces in CI, keep videos and screenshots with the right failure context, and make triage faster without drowning the pipeline in noise.
What you are trying to capture, and why each artifact matters
Playwright gives you three main types of debugging evidence that are worth storing in CI:
- Traces, which include the page timeline, actions, DOM snapshots, console messages, network information, and screenshots along the way
- Videos, which show the test run visually, especially useful for timing issues or UI motion
- Screenshots, which give a fast visual checkpoint at the moment of failure or expectation mismatch
Playwright’s official docs, starting with the introduction, are the right place to go for the complete feature set and current syntax.
The goal is not to save everything all the time. The goal is to save enough evidence that one failure does not require three reruns.
Here is how I think about the tradeoff:
- Trace is the highest value artifact for most UI failures
- Video is useful when timing, animation, or state transitions matter
- Screenshot is lightweight and good for quick visual confirmation, but it rarely explains the full story alone
If your team only stores screenshots, you will eventually get stuck on issues like selectors that matched the wrong element, hidden overlays, delayed API responses, or a modal that opened and closed before the screenshot was taken. The trace makes those problems visible.
The CI problem, in plain terms
A local run and a CI run are not the same environment. CI adds several sources of variability:
- fresh containers or VMs
- slower or more contended CPUs
- different fonts and browser dependencies
- parallel execution
- network latency to the app under test or dependencies
- cleanup between jobs
That means the artifact strategy should assume failure will be intermittent. If a test fails once in fifty runs, the value of the evidence is highest at the exact failure moment. If you only capture artifacts on rerun, you may never reproduce the same state.
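To put rough numbers on the one-in-fifty case, here is a small sketch. It assumes each rerun fails independently with the same probability, which real flaky tests only approximate, and the function names are made up for illustration.

```typescript
// Chance of seeing at least one failure across `reruns` runs,
// assuming every run fails independently with probability `failureRate`.
function chanceOfReproducing(failureRate: number, reruns: number): number {
  return 1 - Math.pow(1 - failureRate, reruns);
}

// Smallest number of reruns needed to reach `targetChance` of reproduction,
// under the same independence assumption.
function rerunsNeeded(failureRate: number, targetChance: number): number {
  return Math.ceil(Math.log(1 - targetChance) / Math.log(1 - failureRate));
}

const oneInFifty = 1 / 50;
console.log(chanceOfReproducing(oneInFifty, 3).toFixed(3)); // "0.059"
console.log(rerunsNeeded(oneInFifty, 0.95)); // 149
```

Under these assumptions, three reruns reproduce the failure about 6% of the time, which is why capturing evidence on the original failure matters more than capturing it on a rerun.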
Continuous integration, as a practice, is about integrating changes frequently and validating them automatically, but test evidence is part of that system too. A CI pipeline without useful artifacts is really only half an observability system.
A sane default policy for Playwright artifacts
If you want a starting point, I recommend this default policy:
- Traces: capture on first retry or failure, depending on how noisy the suite is
- Videos: capture on failure only, or on first retry for unstable suites
- Screenshots: capture on failure, and optionally on assertion-heavy tests
For most teams, the sweet spot is to store traces only when something goes wrong, rather than for every passing run. That keeps storage costs and upload time under control.
For teams with heavy flakiness, I sometimes start with retain-on-failure for both traces and videos, then tighten later once the suite is more stable.
Configure Playwright to keep the right evidence
Playwright Test supports artifact configuration in the test config. A common pattern is to keep traces and videos only when a test fails, and to capture screenshots on failure.
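import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // Record for every test, but keep the files only when the test fails.
    trace: 'retain-on-failure',
    video: 'retain-on-failure',
    // Take a screenshot after each test failure.
    screenshot: 'only-on-failure',
  },
});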
This is usually enough to get started. If you need even more control, you can tune behavior per project, per test type, or per environment.
For example, I often make the artifact policy stricter in CI than locally, because local runs are interactive and CI failures are the ones that need durable evidence.
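import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // With retries at 0 locally, 'on-first-retry' never triggers,
    // so local runs record no trace at all.
    trace: process.env.CI ? 'retain-on-failure' : 'on-first-retry',
    video: process.env.CI ? 'retain-on-failure' : 'off',
    screenshot: 'only-on-failure',
  },
  // One retry in CI, none locally.
  retries: process.env.CI ? 1 : 0,
});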
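The per-project tuning mentioned earlier could look something like the sketch below. The choice of WebKit as the project that needs video is an assumption made purely for illustration; swap in whichever project is noisiest in your suite.

```typescript
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  // Suite-wide defaults: traces and screenshots on failure, no video.
  use: {
    trace: 'retain-on-failure',
    screenshot: 'only-on-failure',
    video: 'off',
  },
  projects: [
    {
      name: 'chromium',
      use: { ...devices['Desktop Chrome'] },
    },
    {
      // Assumption for this example: WebKit is the flakier target,
      // so this project overrides the default and also keeps video.
      name: 'webkit',
      use: { ...devices['Desktop Safari'], video: 'retain-on-failure' },
    },
  ],
});
```

Project-level `use` values override the top-level ones, so the shared default stays cheap and only the project that needs extra evidence pays for it.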
Why retries matter here
Retries are not just about smoothing over transient issues. They also give you a useful boundary for when to keep artifacts. If a test fails once and passes on retry, the first failure is often the one you want to inspect, because it may contain the original broken state.
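That is why I prefer on-first-retry or retain-on-failure over blanket always-on trace collection. The two modes differ in what they preserve: on-first-retry records the retry attempt rather than the original failure, while retain-on-failure records every attempt and keeps the trace of any attempt that fails, so it is the one that holds on to the original broken state. Always-on traces are great for small suites, but they create more storage and upload overhead than many teams actually need.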
Store artifacts where the job cannot lose them
The most common mistake I see is assuming the CI job workspace is enough. It is not. The workspace usually disappears when the job ends, the runner is recycled, or the container gets destroyed.
You want artifact storage to be explicit:
- Save the Playwright output to a stable directory
- Upload that directory as a CI artifact
- Retain it long enough for triage and investigation
- Make it easy to download from the job summary or test report
A simple directory layout helps a lot:
playwright-report/
artifacts/
  traces/
  videos/
  screenshots/
You can also keep the default Playwright output directory and upload it directly, but separating failure artifacts from the HTML report often makes CI jobs easier to scan.
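As one possible way to get a stable, separated layout, the config sketch below points Playwright's output and the HTML report at different folders. The `artifacts/test-results` path is an assumption for this example, not a Playwright default.

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Per-test failure artifacts (traces, videos, screenshots) are written here.
  // The folder name is a hypothetical choice for this example.
  outputDir: 'artifacts/test-results',
  // The HTML report goes to its own folder; 'never' avoids trying to open it in CI.
  reporter: [['html', { outputFolder: 'playwright-report', open: 'never' }]],
  use: {
    trace: 'retain-on-failure',
    video: 'retain-on-failure',
    screenshot: 'only-on-failure',
  },
});
```

Playwright groups output into one folder per test rather than one folder per artifact type, so a strict traces/videos/screenshots split would need a small post-processing step before upload.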
A GitHub Actions example that actually preserves evidence
Here is a realistic GitHub Actions setup for storing Playwright traces, videos, screenshots, and the HTML report.
name: e2e
on:
  push:
  pull_request:
jobs:
  playwright:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test
        env:
          CI: true
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: playwright-artifacts
          path: |
            playwright-report/
            test-results/
          retention-days: 7
A few details matter here:
- if: always() ensures you upload artifacts even if the test step fails
- retention-days should match your triage window, not just a default
- test-results/ is where Playwright stores many failure artifacts by default
- playwright-report/ keeps the HTML report available for later inspection
If you use matrix builds, include the OS and browser in the artifact name so the output does not overwrite itself.
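# Assumes the job's strategy.matrix defines `os` and `browser` keys.
- uses: actions/upload-artifact@v4
  if: always()
  with:
    name: playwright-artifacts-${{ matrix.os }}-${{ matrix.browser }}
    path: |
      playwright-report/
      test-results/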
Make the failure output easy to navigate
The raw files are useful, but triage gets faster when the artifacts are organized in a predictable way. I like the following conventions:
- one test run, one artifact bundle
- include the CI run number or commit SHA in the artifact name
- keep the browser name in the path or artifact name
- preserve the test title or file name in the artifact structure
For example, a failure bundle might look like this:
playwright-artifacts-chromium-1042/
  playwright-report/
  test-results/
    checkout.spec.ts-payment-flow-fails/
      trace.zip
      screenshot.png
      video.webm
That structure means a developer can move from a failed job to the exact test evidence without scrolling through unrelated files.
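If bundle names are generated by a script rather than typed by hand, a helper along these lines keeps them predictable. The `BundleInfo` shape, the `playwright-artifacts` prefix, and the slug rules are illustrative assumptions, not anything Playwright or a CI provider requires.

```typescript
// Hypothetical description of one artifact bundle.
interface BundleInfo {
  browser: string;
  os?: string;
  runId: string | number;
}

// Lowercase the part, replace anything outside [a-z0-9._-] with '-',
// and trim leading or trailing dashes.
function slug(part: string): string {
  return part
    .toLowerCase()
    .replace(/[^a-z0-9._-]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// Join prefix, browser, optional OS, and run ID into one artifact name.
function artifactBundleName({ browser, os, runId }: BundleInfo): string {
  const parts = ['playwright-artifacts', browser, os, String(runId)].filter(
    (p): p is string => Boolean(p),
  );
  return parts.map(slug).join('-');
}

console.log(artifactBundleName({ browser: 'chromium', runId: 1042 }));
// playwright-artifacts-chromium-1042
```

Normalizing the parts matters mostly for project names with spaces, such as a mobile device profile, which would otherwise produce awkward artifact names.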
When to store traces on every run
There are cases where storing traces on every run makes sense:
- a tiny suite where artifact volume is trivial
- a critical flow with a high business cost of failure
- short-lived debugging during a release stabilization period
- compliance or audit requirements that demand a stronger audit trail
But be careful, because always-on traces can become expensive in three ways:
- Storage growth, especially for large suites or long-running jobs
- Upload latency, which slows the pipeline
- Signal dilution, because passing artifacts can hide the failures that matter
If you do decide to store traces always, I would still separate long-term retention from immediate triage. For example, keep full artifacts for 7 days, but keep a compact report or summary for longer if you need trend analysis.
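A back-of-the-envelope estimate makes the storage difference concrete. Every number below is invented for illustration (15 MB per test, 40 runs a day, a 400-test suite), so substitute measurements from your own pipeline.

```typescript
// Inputs for a rough steady-state storage estimate.
interface StorageInputs {
  artifactSetsPerRun: number; // tests that leave artifacts behind in one run
  mbPerArtifactSet: number;   // trace + video + screenshot for one test
  runsPerDay: number;
  retentionDays: number;
}

// Storage held once the retention window is full, in megabytes.
function steadyStateStorageMb(i: StorageInputs): number {
  return i.artifactSetsPerRun * i.mbPerArtifactSet * i.runsPerDay * i.retentionDays;
}

// Failure-only capture: assume about 3 failing tests per run.
const failureOnly = steadyStateStorageMb({
  artifactSetsPerRun: 3,
  mbPerArtifactSet: 15,
  runsPerDay: 40,
  retentionDays: 7,
});

// Always-on capture: every test in an assumed 400-test suite leaves artifacts.
const alwaysOn = steadyStateStorageMb({
  artifactSetsPerRun: 400,
  mbPerArtifactSet: 15,
  runsPerDay: 40,
  retentionDays: 7,
});

console.log(failureOnly); // 12600
console.log(alwaysOn);    // 1680000
```

With these made-up inputs, always-on capture holds roughly 1.7 TB against about 12.6 GB for failure-only capture, which is the gap the retention split is meant to manage.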
Screenshots are useful, but they are not enough by themselves
Screenshots are fast, cheap, and often good enough for a visual assertion failure. They are also easy to attach to a CI report. But they have a narrow field of view.
A screenshot tells you what was on the page at a single instant. It does not tell you:
- whether the page loaded slowly
- whether the wrong request returned data
- whether an overlay was present briefly
- whether the click happened before the page became actionable
- whether an assertion failed because the DOM changed milliseconds earlier
That is why I treat screenshots as a companion artifact, not the primary debugging record.
In Playwright, this still works well for specific checks:
import { test, expect } from '@playwright/test';

test('checkout summary stays visible', async ({ page }) => {
  await page.goto('/checkout');
  await expect(page.getByRole('heading', { name: 'Order summary' })).toBeVisible();
  await expect(page).toHaveScreenshot('checkout-summary.png');
});
When the screenshot fails, the trace is usually what explains why.
Videos help when timing is the bug
I like video artifacts when a failure seems related to animation, drag and drop, modal timing, focus management, or delayed rendering. A video is often the quickest way to answer questions like these:
- Did the button exist when the click happened?
- Did the spinner disappear too early?
- Did the page scroll unexpectedly?
- Did a toast cover a control?
Videos are especially useful when a trace tells you what happened, but not quite how the user perceived it. That visual layer matters for UI problems with motion or state transitions.
The downside is size. Videos can become the heaviest artifact in the pipeline, so I usually keep them only on failure. If your suite is large, storing every video for every passing run is often not worth it.
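One way to reduce video weight, sketched below, is to record at a smaller frame size while still keeping videos only for failures. The 640x480 size is an arbitrary choice for this example, and whether it stays legible enough for your UI is something to check before adopting it.

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    video: {
      // Keep the recording only when the test fails.
      mode: 'retain-on-failure',
      // Smaller frames mean smaller files; this size is an assumption.
      size: { width: 640, height: 480 },
    },
  },
});
```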
A better triage loop starts in the test itself
The fastest way to make artifact storage useful is to label tests well. Good test names and scoped steps are a triage multiplier.
If the test output says only “should work”, the artifact may still be useful, but the person investigating has extra work. If the test name says exactly what user journey failed, the screenshot or trace can be interpreted much faster.
I also like to use test.step for critical interactions, because those step labels appear in the trace and make it easier to identify where the failure occurred.
import { test, expect } from '@playwright/test';

test('user can place an order', async ({ page }) => {
  await test.step('open checkout page', async () => {
    await page.goto('/checkout');
  });

  await test.step('submit payment details', async () => {
    await page.getByLabel('Card number').fill('4242424242424242');
    await page.getByRole('button', { name: 'Pay now' }).click();
  });

  await expect(page.getByText('Order confirmed')).toBeVisible();
});
That kind of structure turns the trace into a readable story instead of a pile of actions.
Keep the artifacts close to the failure, but not too close to the code
One mistake I see in teams is attaching files only to the test code repository, while the actual CI run data lives somewhere else and expires quickly. Another mistake is the opposite, keeping everything in generic blob storage with no easy link back to the build that generated it.
A good setup usually has three layers:
- CI job summary, which gives immediate access to the current run’s artifacts
- longer-term storage, which holds the last few runs or failure bundles
- test reporting system, which indexes failed tests, reruns, and evidence links
That way, if a developer sees a flaky test in the morning, they can inspect the latest failure without reconstructing the pipeline by hand.
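For the first layer, a job summary can be as simple as a Markdown table of failed tests with links to their evidence. The sketch below only renders the text; the `FailedTest` shape and the example URL are hypothetical. In GitHub Actions, the result could be appended to the file named by the `GITHUB_STEP_SUMMARY` environment variable.

```typescript
// Hypothetical record for one failed test and where its evidence lives.
interface FailedTest {
  title: string;
  project: string;
  retry: number;
  artifactUrl: string;
}

// Escape pipes so a test title cannot break the Markdown table.
function cell(text: string): string {
  return text.replace(/\|/g, '\\|');
}

// Render a Markdown summary: one table row per failed test.
function renderSummary(failures: FailedTest[]): string {
  if (failures.length === 0) {
    return 'All Playwright tests passed.\n';
  }
  const rows = failures.map(
    (f) => `| ${cell(f.title)} | ${cell(f.project)} | ${f.retry} | [bundle](${f.artifactUrl}) |`,
  );
  return ['| Test | Project | Retry | Evidence |', '| --- | --- | --- | --- |', ...rows].join('\n') + '\n';
}

console.log(
  renderSummary([
    {
      title: 'user can place an order',
      project: 'chromium',
      retry: 0,
      artifactUrl: 'https://example.com/runs/1042', // placeholder URL
    },
  ]),
);
```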
How much retention do you really need?
Artifact retention is a policy decision, not just a storage setting. Ask these questions:
- How long does it usually take someone to investigate a failure?
- Do failures get discussed in the same day, or do they sit until the next sprint?
- Do you need artifact history for compliance or postmortems?
- Does the team rerun failures often, making older evidence less useful?
For many teams, 7 to 14 days is enough for CI failure artifacts. For regulated systems or release-heavy teams, you may need longer. What matters is being deliberate. If you keep everything forever, people stop trusting the artifact system because it becomes too noisy and too expensive.
Common pitfalls that slow triage down
Here are the mistakes I see most often when teams try to store Playwright artifacts in CI:
1. Uploading artifacts only on success
This is the fastest way to make evidence disappear exactly when you need it.
2. Forgetting if: always() or equivalent logic
By default, a failed test step causes the remaining steps to be skipped, so the job ends before the evidence is uploaded.
3. Capturing too much
Always-on videos and traces for huge suites can make the pipeline slower and more expensive than necessary.
4. Capturing too little
A single screenshot without trace or test step context often forces a rerun.
5. Using unclear artifact names
If the artifact is called results.zip, nobody wants to open it.
6. Not matching artifact policy to suite behavior
A stable smoke suite and a flaky cross-browser suite should not necessarily share the same artifact retention rules.
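The last pitfall can be handled by making the policy an explicit lookup per suite. This is a sketch: the two profile names and the retention numbers are assumptions, while the mode strings are a subset of the values Playwright accepts for `trace`, `video`, and `screenshot`.

```typescript
type TraceMode = 'off' | 'on' | 'retain-on-failure' | 'on-first-retry';
type VideoMode = 'off' | 'on' | 'retain-on-failure' | 'on-first-retry';
type ScreenshotMode = 'off' | 'on' | 'only-on-failure';

interface ArtifactPolicy {
  trace: TraceMode;
  video: VideoMode;
  screenshot: ScreenshotMode;
  retentionDays: number;
}

// Hypothetical suite profiles for this example.
type SuiteProfile = 'stable-smoke' | 'flaky-cross-browser';

const POLICIES: Record<SuiteProfile, ArtifactPolicy> = {
  // Stable suite: cheap defaults and a short retention window.
  'stable-smoke': {
    trace: 'on-first-retry',
    video: 'off',
    screenshot: 'only-on-failure',
    retentionDays: 7,
  },
  // Flaky suite: keep more evidence for longer.
  'flaky-cross-browser': {
    trace: 'retain-on-failure',
    video: 'retain-on-failure',
    screenshot: 'only-on-failure',
    retentionDays: 14,
  },
};

function policyFor(profile: SuiteProfile): ArtifactPolicy {
  return POLICIES[profile];
}

console.log(policyFor('flaky-cross-browser').video); // retain-on-failure
```

The first three fields can be spread into the `use` block of a Playwright config, and `retentionDays` can feed whatever sets the retention on the upload step.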
When artifact storage and flake analysis work together
Flaky tests are often treated like a binary problem, pass or fail, but the evidence tells a richer story. If a test fails sporadically in CI, the trace can help answer whether the issue is:
- a bad locator
- a timing problem
- an environment-specific rendering issue
- a backend dependency that responded slowly
- a genuine product regression
That is why I like pairing artifact capture with a small amount of metadata, such as browser, branch, commit SHA, and retry count. Once you can correlate failures across runs, debugging becomes pattern recognition instead of detective work.
The more deterministic the artifact naming and retention policy, the less time people spend arguing about whether a failure is reproducible.
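A minimal sketch of that metadata and a simple correlation over it follows. `GITHUB_SHA` and `GITHUB_REF_NAME` are environment variables that GitHub Actions sets; the record shape and helper names are assumptions, and other CI providers expose the same information under different names.

```typescript
// Hypothetical metadata stored next to each failure bundle.
interface RunMetadata {
  browser: string;
  branch: string;
  commitSha: string;
  retry: number;
}

// Build the record from an environment map (pass process.env in CI).
function buildRunMetadata(
  env: Record<string, string | undefined>,
  browser: string,
  retry: number,
): RunMetadata {
  return {
    browser,
    branch: env.GITHUB_REF_NAME ?? 'unknown',
    commitSha: env.GITHUB_SHA ?? 'unknown',
    retry,
  };
}

// Count failure records by one metadata field, for example by browser.
function countBy(records: RunMetadata[], key: keyof RunMetadata): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const record of records) {
    const value = String(record[key]);
    counts[value] = (counts[value] ?? 0) + 1;
  }
  return counts;
}

const fakeEnv = { GITHUB_REF_NAME: 'main', GITHUB_SHA: 'abc123' };
const failureRecords = [
  buildRunMetadata(fakeEnv, 'webkit', 0),
  buildRunMetadata(fakeEnv, 'webkit', 1),
  buildRunMetadata(fakeEnv, 'chromium', 0),
];
console.log(countBy(failureRecords, 'browser')); // { webkit: 2, chromium: 1 }
```

Inside a Playwright test, a record like this could be serialized to JSON and attached with `testInfo.attach`, so it travels with the trace and screenshots for that test.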
A practical checklist for your pipeline
If you want a quick implementation checklist, this is the one I use most often:
- enable trace capture on failure or first retry
- enable screenshots on failure
- enable videos on failure for the tests that need them
- upload playwright-report/ and test-results/ as CI artifacts
- make the upload step run even when the test job fails
- include browser, OS, and run identifier in artifact names
- set a retention window that matches your triage process
- keep artifact paths predictable so developers do not hunt for files
If your team uses multiple CI providers, the same principles still apply. The syntax changes, but the core idea is stable: preserve evidence automatically and make the failure bundle easy to find.
My default recommendation
If I were setting this up for a team today, I would start with this policy:
- traces: retain-on-failure
- videos: retain-on-failure
- screenshots: only-on-failure
- upload artifacts on every failed job
- keep a 7-day retention window initially
- name bundles by browser and run ID
That setup gives you enough debugging evidence to stop rerunning failures blindly, without turning the pipeline into an artifact warehouse.
Once the team starts using the traces consistently, you can tune the policy based on real behavior. If most failures are visual, keep the screenshots and videos. If most failures are network or selector problems, the trace will do most of the work. If storage is growing too fast, tighten retention or reduce artifact capture on stable suites.
Closing thought
The real value of Playwright artifacts is not just that they exist; it is that they shorten the path from failure to explanation. When you store Playwright traces in CI, keep videos and screenshots alongside them, and upload them in a predictable way, triage stops feeling like archaeology.
You do not need a huge observability stack to get there. You need a consistent policy, a reliable upload step, and enough discipline to keep the evidence attached to the failure that produced it.
That is usually the difference between a flaky test that sits unresolved for days and a test that gets fixed before lunch.