How I Debug Flaky Playwright Tests Caused by Animation, Transitions, and Layout Shift

I have lost more time than I care to admit to a test that only failed when I was trying to prove it was fixed. The pattern was always the same: the test passed locally a few times, failed in CI at random, and then passed again after I added a blind wait or reran the job. That is usually how flaky Playwright tests caused by animation start to announce themselves. The failure looks like a selector problem, a timing issue, or some mysterious CI-only instability, but the real culprit is often motion in the UI itself, CSS transitions, or a layout shift that changes the target before Playwright finishes interacting with it.

I want to walk through how I debug these failures in practice. This is not a theory piece; it is the sequence I use when a Playwright test is unstable and the DOM seems “fine” on paper. The short version is that I stop thinking about the test first and start thinking about the page as a moving system. If the UI is animated, fading, sliding, expanding, or reflowing, then the test is not acting on a static target, even if the test code makes it look that way.

What flaky motion-related failures usually look like

The symptoms vary, but the underlying theme is consistent. The test is trying to click, type, or assert on an element while the element is still moving or its layout position is still changing.

Common failures include:

A click that intermittently times out because the element is not considered stable yet
A click that lands on the wrong element after a reflow
An assertion that reads the DOM before the final content size settles
A screenshot diff that changes because a fade-in is still in progress
A locator that matches the right node, but the node is visually obscured by an overlay animation

Playwright is generally strong at waiting for actionability, which helps a lot, but actionability is not magic. An element can still be “visible” while a transition is moving it, and a list can be rendered while its height is still changing. For the official framing, the Playwright documentation on auto-waiting and actionability checks describes what is verified before each action.

The important detail is not whether the element exists but whether the UI has reached a state where your intended interaction is actually meaningful.

First, prove the problem is motion, not the locator

When a test fails, my first instinct is not to rewrite the test. I want evidence. I usually ask three questions:

Is the element present at the time of failure?
Is it present but moving, resizing, or obscured?
Is the failure tied to a specific transition, animation, or responsive breakpoint?

A good quick check is to run the test in headed mode and slow it down just enough to see movement. If I can see a menu slide in, a card expand, or a toast push content downward, I already have a likely cause.
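
For that headed, slowed-down run, a temporary config override is one option. This is a minimal sketch under a few assumptions: the file name and the 250 ms delay are arbitrary, and the plain-object form is used only to keep the sketch dependency-free (wrapping it in defineConfig from '@playwright/test' is the more common shape). The headless and launchOptions.slowMo options themselves are standard Playwright settings.

```typescript
// playwright.debug.config.ts (hypothetical file name, for local debugging only)
// A plain object is accepted as a Playwright config; defineConfig() from
// '@playwright/test' is the usual wrapper and adds type checking.
const debugConfig = {
  use: {
    headless: false, // show the browser window
    launchOptions: { slowMo: 250 }, // slow each Playwright operation by about 250 ms (arbitrary value)
  },
};

export default debugConfig;
```

It could then be selected for a single run with `npx playwright test --config=playwright.debug.config.ts`, leaving the CI config untouched.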

Here is a minimal Playwright example that often exposes the issue quickly:

import { test, expect } from '@playwright/test';

test('opens settings and saves changes', async ({ page }) => {
  await page.goto('/settings');
  await page.getByRole('button', { name: 'Open settings' }).click();
  await page.getByRole('button', { name: 'Save' }).click();
  await expect(page.getByText('Saved')).toBeVisible();
});

That looks innocent, but if the settings panel animates in from the side and the Save button is inside a container that is still sliding, the second click can become flaky. Sometimes the button is clickable, sometimes the click happens before the panel is fully stable, and sometimes an overlay intercepts the action.

To confirm this, I often inspect the trace, screenshot, and video artifacts in CI. If the test failed on a click, I look at the state right before the action. If the target shifts between frames, the bug is almost certainly motion-related.
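
When the video is ambiguous, a number can help more than a visual impression. The sketch below is not part of Playwright: maxDrift is a hypothetical helper that takes a series of bounding boxes, such as the values returned by locator.boundingBox(), and reports the largest jump between consecutive samples. The sample count and interval in the usage comment are arbitrary assumptions.

```typescript
// Shape returned by Playwright's locator.boundingBox() (null when the element is not visible).
interface Box { x: number; y: number; width: number; height: number }

// Largest change in position or size between consecutive samples, in CSS pixels.
// Null samples (element not rendered yet) are skipped.
function maxDrift(samples: Array<Box | null>): number {
  const boxes = samples.filter((b): b is Box => b !== null);
  let drift = 0;
  let prev: Box | undefined;
  for (const cur of boxes) {
    if (prev) {
      drift = Math.max(
        drift,
        Math.abs(cur.x - prev.x),
        Math.abs(cur.y - prev.y),
        Math.abs(cur.width - prev.width),
        Math.abs(cur.height - prev.height),
      );
    }
    prev = cur;
  }
  return drift;
}

// Hypothetical usage inside a test, right after the panel is opened:
//   const save = page.getByRole('button', { name: 'Save' });
//   const samples: Array<Box | null> = [];
//   for (let i = 0; i < 10; i++) {
//     samples.push(await save.boundingBox());
//     await page.waitForTimeout(50); // debugging only, not a fix
//   }
//   console.log('max drift in px:', maxDrift(samples));
```

A drift of zero across the samples points away from motion and back toward the locator or the data; a large drift supports the motion hypothesis.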

Why CSS transitions create unstable tests

CSS transitions are a common source of frontend test timing problems because they can make a valid state look ready before the UI is actually settled.

Examples include:

Buttons that grow on hover
Side panels that slide in from offscreen
Accordions that animate height
Modals that fade in and scale up
Toasts that push content down after appearing

These are good UX patterns in production, but they can be awkward for tests because the DOM structure and visual state are not always aligned at the instant the test checks them.

A typical example is a modal with a fade and scale transition. The modal exists immediately after the click, but the close button may not be in its final position yet. If the test tries to click the close button too early, Playwright waits for the button to stop moving and to receive pointer events, and the click can still time out if the transform or a lingering overlay outlasts the action timeout.

I have found that the most fragile tests are the ones that make implicit assumptions like these:

“If the element is visible, it is safe to click”
“If the network request finished, the UI is ready”
“If the text exists, the layout is stable”

Those assumptions are often wrong when motion is involved.

Layout shift is more dangerous than it sounds

Layout shift matters because it does not always look dramatic. Sometimes a subtle font load, image dimension change, or late-rendered banner moves the target just enough to break the test. The page may look okay to a human, but an automated click at the wrong millisecond can miss or hit the wrong element.

I think about layout shift in three buckets:

1. Late content injection

An API response adds content above or near the target element, changing the vertical flow of the page.

2. Asset-driven reflow

Images without reserved dimensions, web fonts, or icon packs cause the layout to adjust after initial render.

3. Animation-driven repositioning

Elements move because of transitions, expanding panels, sticky headers, or collapsible regions.

When a test becomes flaky, I try to identify which bucket I am dealing with. The fix depends on whether the UI is moving because of data, assets, or animation.
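
Before sorting a failure into a bucket, it can help to confirm that the layout moved at all and roughly when. The following is a sketch under stated assumptions: it expects entries in the shape reported by the Layout Instability API ('layout-shift' performance entries, available in Chromium-based browsers only), gathered in the page by a PerformanceObserver and passed back to the test. The helper names are hypothetical, and the plain sum here is deliberately simpler than the windowed CLS metric.

```typescript
// Subset of the fields on a 'layout-shift' performance entry
// (Layout Instability API, Chromium-based browsers only).
interface ShiftEntry {
  value: number; // shift score for this entry
  hadRecentInput: boolean; // true when the shift closely followed user input
  startTime: number; // ms since the page's time origin
}

// Total score of shifts that were not triggered by recent user input.
// This is a plain sum, not the windowed CLS metric.
function unexpectedShift(entries: ShiftEntry[]): number {
  return entries
    .filter((entry) => !entry.hadRecentInput)
    .reduce((total, entry) => total + entry.value, 0);
}

// Shifts that started at or after a given timestamp, for example the moment
// the test clicked the control that opens a panel.
function shiftsAfter(entries: ShiftEntry[], sinceMs: number): ShiftEntry[] {
  return entries.filter((entry) => entry.startTime >= sinceMs);
}
```

Shifts clustered right after page load tend to suggest assets or late content, while shifts that only appear after an interaction are more consistent with animation-driven repositioning. That reading is a heuristic, not a rule.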

The debugging workflow I actually use

When I suspect that animation is behind a flaky Playwright test, I work through the problem in this order.

1. Reproduce it with trace and video

I enable tracing in the test run if it is not already enabled. This is the fastest way to see what the browser saw.

import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: { trace: 'on-first-retry', video: 'retain-on-failure' },
});

Then I look at the moments before the failure. I am trying to answer:

Was the element visible yet?
Was an overlay still fading out?
Did the page jump because something loaded above the target?
Did the element move after hover or focus?

Trace artifacts are especially useful for hover-driven menus and toast-heavy screens because the visual state often explains what the DOM alone does not.

2. Check whether Playwright is waiting for stability or fighting the UI

Playwright does a lot of waiting for you, but if the application keeps changing, the action can still fail. I try to narrow the problem by temporarily splitting the action into smaller steps.

const button = page.getByRole('button', { name: 'Save' });
await expect(button).toBeVisible();
await expect(button).toBeEnabled();
await button.click();

If the click still fails intermittently, that tells me the target is not stable enough at the moment of interaction. If the assertion fails, the UI may not have reached the state I assumed.
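
Another way to separate "not actionable" from "clicked the wrong thing" is a trial click, which runs Playwright's actionability checks without performing the click. Below is a minimal sketch: probeClick and the TrialClickable interface are hypothetical names, the interface is only a structural subset of Playwright's Locator so the helper stays self-contained, and the 2000 ms default is an arbitrary choice.

```typescript
// Structural subset of Playwright's Locator. click({ trial: true }) performs the
// actionability checks and skips the click itself.
interface TrialClickable {
  click(options?: { trial?: boolean; timeout?: number }): Promise<void>;
}

type ProbeResult = { actionable: true } | { actionable: false; reason: string };

// Ask whether the target would be clickable within timeoutMs, without clicking it.
async function probeClick(target: TrialClickable, timeoutMs = 2000): Promise<ProbeResult> {
  try {
    await target.click({ trial: true, timeout: timeoutMs });
    return { actionable: true };
  } catch (error) {
    // On timeout the error message typically carries Playwright's call log,
    // which says what the action was still waiting for.
    const reason = error instanceof Error ? error.message : String(error);
    return { actionable: false, reason };
  }
}

// Hypothetical usage:
//   const result = await probeClick(page.getByRole('button', { name: 'Save' }));
//   if (!result.actionable) console.log(result.reason);
```

If the probe reports that the element never became stable or that another element intercepts pointer events, the problem is in the page's motion rather than in the locator.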

3. Look for transforms and transitions in the component

I open the browser devtools and inspect the computed styles of the moving element. I want to know if the component uses:

transition
transform
opacity
height, max-height, or width animation
sticky or fixed positioned siblings that may overlap

A common culprit is transform: translateY(...) or scale(...) used to animate modals or dropdowns. Even when the element appears on screen, it may still be in motion, so its bounding box at the moment of the click may not be the one the test assumes.
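
The same inspection can be scripted when clicking through devtools gets tedious. This is a sketch, assuming the computed style values have already been read in the page (for example with locator.evaluate and getComputedStyle) and passed back as plain strings. MotionStyle, parseTimes, longestTransitionMs, and describeMotion are hypothetical names.

```typescript
// Subset of computed style values, as read via getComputedStyle() in the page.
interface MotionStyle {
  transitionDuration: string; // e.g. "0.3s, 0s"
  transitionDelay: string; // e.g. "0s, 0.15s"
  transform: string; // "none" or e.g. "matrix(1, 0, 0, 1, 0, 24)"
}

// Convert a CSS time list such as "0.3s, 150ms" to whole milliseconds.
function parseTimes(list: string): number[] {
  return list.split(',').map((raw) => {
    const token = raw.trim();
    const value = parseFloat(token);
    if (Number.isNaN(value)) return 0;
    return Math.round(token.endsWith('ms') ? value : value * 1000);
  });
}

// Longest duration plus delay across the listed transitions, in milliseconds.
// CSS repeats a shorter delay list cyclically, which the modulo mirrors.
function longestTransitionMs(style: MotionStyle): number {
  const durations = parseTimes(style.transitionDuration);
  const delays = parseTimes(style.transitionDelay);
  return durations.reduce((longest, duration, i) => {
    const delay = delays[i % delays.length] ?? 0;
    return Math.max(longest, duration + delay);
  }, 0);
}

// One-line summary suitable for a debug log.
function describeMotion(style: MotionStyle): string {
  const transformed = style.transform !== 'none';
  return `transition up to ${longestTransitionMs(style)}ms, transform ${transformed ? 'active' : 'none'}`;
}

// Hypothetical usage in a test:
//   const style = await panel.evaluate((el) => {
//     const s = getComputedStyle(el);
//     return { transitionDuration: s.transitionDuration, transitionDelay: s.transitionDelay, transform: s.transform };
//   });
//   console.log(describeMotion(style));
```

Logging this for the target and a couple of its ancestors is usually enough to see which wrapper owns the motion.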

4. Try to disable motion in test mode

If the application supports a reduced-motion or test-only mode, I use it. That is usually the cleanest fix, because tests should validate behavior, not animation timing.

A simple approach is to inject a stylesheet during tests:

await page.addStyleTag({
  content: `
    *, *::before, *::after {
      transition-duration: 0s !important;
      animation-duration: 0s !important;
      animation-delay: 0s !important;
      scroll-behavior: auto !important;
    }
  `
});

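This does not solve every problem, and because the style tag lives in the current document it has to be injected again after each full navigation, but it is very effective for test environments where motion is not part of what you need to verify.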

5. Verify whether the issue is a waiting problem or a design problem

Sometimes the test is bad. Sometimes the app is bad for testability. The distinction matters.

If the application relies on a long entrance animation before the target is usable, that is a product design concern as much as a test concern. I do not want my test suite to depend on a 300ms transition ending at exactly the right time. If the UI can expose a stable ready state, it should.

Prefer stable states over arbitrary sleeps

I almost never want to fix these failures with waitForTimeout. It can make a flaky test look better without making it reliable.

await page.waitForTimeout(500); // usually a smell

That line is easy to write and hard to justify. If the app needs time, I want to wait on something meaningful, such as:

A network response
A DOM attribute that signals readiness
The end of a specific overlay dismissal
A visible state change that the app explicitly exposes

For example:

await Promise.all([
  page.waitForResponse(resp => resp.url().includes('/api/settings') && resp.ok()),
  page.getByRole('button', { name: 'Save' }).click()
]);

This is still not enough if the UI animates after the response, but it is better than sleeping blindly. The key is to wait for the thing that actually matters to the user or the component.
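
For the case where the UI keeps animating after the response, one option is to wait on the animations themselves. The sketch below relies on the Web Animations API, where getAnimations({ subtree: true }) returns the CSS transitions and animations currently attached to an element and its descendants, and each one exposes a finished promise. settleAnimations and the two interfaces are hypothetical names, declared locally so the example stands on its own.

```typescript
// Minimal shapes of the Web Animations API members used here.
interface AnimationLike { finished: Promise<unknown> }
interface AnimatableLike {
  getAnimations(options?: { subtree?: boolean }): AnimationLike[];
}

// Resolves once every animation or transition currently attached to the element
// (and its descendants) has finished or been cancelled; returns how many were awaited.
// Caveat: an infinitely repeating animation never finishes, so the caller still
// needs an overall timeout.
async function settleAnimations(element: AnimatableLike): Promise<number> {
  const running = element.getAnimations({ subtree: true });
  // A cancelled animation rejects its `finished` promise; treat that as settled too.
  await Promise.all(running.map((animation) => animation.finished.catch(() => undefined)));
  return running.length;
}

// Hypothetical usage in a test. The logic is inlined because functions passed to
// evaluate() run in the browser and cannot call helpers defined in the test file:
//   await page.getByRole('dialog').evaluate((el) =>
//     Promise.all(
//       el.getAnimations({ subtree: true }).map((a) => a.finished.catch(() => undefined)),
//     ),
//   );
```

This waits on what the browser is actually doing rather than on a guessed duration, which is the same principle as waiting on the response instead of sleeping.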

How I test a page that animates by design

Some motion is unavoidable or even useful. In those cases, I do not fight the animation directly; I adapt the test strategy.

Use assertions that match the final state, not the intermediate state

If a toast fades in, I assert that it becomes visible and carries the right text. I do not assert on intermediate opacity values or transient positions unless the animation itself is part of what I am testing.

await expect(page.getByText('Profile updated')).toBeVisible();

That is enough for most functional checks.

Use explicit readiness markers in the app

If the app has a component that loads data and then animates into view, I like a deterministic signal such as data-ready="true" or a test id on the final interactive state.

await expect(page.locator('[data-ready="true"]')).toBeVisible();
await page.getByTestId('save-settings').click();

This is not about making tests dependent on implementation details for their own sake. It is about creating a stable seam between dynamic UI behavior and test logic.
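
On the application side, such a marker can be small. Here is a sketch, assuming a panel with a single transitioned property: markReadyAfterTransition, PanelLike, and the motionEnabled flag are hypothetical, and a real component would more likely set the attribute from its framework's state. One caveat worth stating up front is that transitionend fires once per transitioned property and does not fire at all when the duration is 0s or the transition is cancelled, so the marker must not depend on the event when motion is disabled.

```typescript
// Minimal subset of the DOM element API used by the sketch.
interface PanelLike {
  setAttribute(name: string, value: string): void;
  addEventListener(type: string, listener: () => void, options?: { once?: boolean }): void;
}

// App-side sketch: expose data-ready="true" once the entrance transition has ended.
function markReadyAfterTransition(panel: PanelLike, motionEnabled: boolean): void {
  if (!motionEnabled) {
    // With motion disabled there is no transition, so no transitionend will fire.
    panel.setAttribute('data-ready', 'true');
    return;
  }
  panel.setAttribute('data-ready', 'false');
  panel.addEventListener(
    'transitionend',
    () => panel.setAttribute('data-ready', 'true'),
    { once: true },
  );
}
```

The test then waits on the attribute, as in the example above, and never needs to know how long the transition takes.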

Reserve visual checks for visual behavior

If I am testing animation itself, then I use visual assertions intentionally. In that case, I want the animation timing, so I treat it as a visual contract. But most functional tests should not depend on exact frame timing.

Fixing the application often beats fixing the test

There are times when I patch the test, but the deeper fix is in the frontend code. A few changes can dramatically reduce instability:

Reserve space for content

If images, banners, or async widgets appear above the main content, reserve their height so the page does not jump after render.

Use consistent dimensions for media

Avoid loading images or embeds without known size. Layout shift often comes from content that arrives without a reserved box.

Keep animation separate from interactivity

If a dropdown becomes clickable only after a motion-heavy entry animation completes, consider making the interactive state ready sooner. The animation can still play, but the control should not depend on the animation for usability.

Reduce overlap during transitions

Overlay elements that linger during fades can intercept clicks even after they visually look gone. I have seen this especially with modals and drawers that animate out but remain in the DOM for a short period.
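
To check whether a lingering overlay is the element actually receiving the click, a hit test at the target's center is a quick diagnostic. A minimal sketch follows: receivesPointerAtCenter and the small interfaces are hypothetical, declared locally so the example does not depend on DOM typings, and they mirror getBoundingClientRect, elementFromPoint, and Node.contains from the browser.

```typescript
// Minimal subsets of the DOM APIs used by the sketch.
interface RectLike { left: number; top: number; width: number; height: number }
interface NodeLike { contains(other: NodeLike | null): boolean }
interface TargetLike extends NodeLike { getBoundingClientRect(): RectLike }
interface DocLike { elementFromPoint(x: number, y: number): NodeLike | null }

// True when the topmost element at the target's center is the target itself
// or one of its descendants, i.e. nothing is covering that point.
function receivesPointerAtCenter(target: TargetLike, doc: DocLike): boolean {
  const rect = target.getBoundingClientRect();
  const hit = doc.elementFromPoint(rect.left + rect.width / 2, rect.top + rect.height / 2);
  return hit !== null && target.contains(hit);
}

// Hypothetical usage in a test (inlined, since evaluate() runs in the browser):
//   const uncovered = await saveButton.evaluate((el) => {
//     const r = el.getBoundingClientRect();
//     return el.contains(document.elementFromPoint(r.left + r.width / 2, r.top + r.height / 2));
//   });
```

When the diagnostic shows the overlay on top, waiting for the overlay to be hidden before the next click is usually a more honest fix than forcing the click through.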

A test that reveals a bad transition is not a nuisance; it is feedback that the UI is harder to use than it should be.

A practical checklist I use during triage

When I get a flaky failure report, I run through this list:

Does the failure reproduce in headed mode?
Is there a trace or video that shows movement near the failure?
Does the locator point to a moving, resizing, or overlaid element?
Are CSS transitions or animations enabled on the target or its parent?
Is content being injected above the target after initial render?
Is there a web font, image, or async widget causing reflow?
Can I replace a timeout with a wait on a meaningful app state?
Can I disable motion in test mode without changing production behavior?
Is the component testable without depending on animation timing?

That list usually gets me to the cause faster than staring at the failing assertion line.

When I intentionally disable animations in CI

For many teams, the simplest path to Playwright stability is to reduce motion in the CI environment. I am comfortable with this when the test’s purpose is functional coverage, not animation verification.

There are a few ways to do it:

Inject a global style sheet during tests
Provide a test-specific theme or CSS flag
Honor prefers-reduced-motion in the app and set it in the test browser context

const context = await browser.newContext({
  reducedMotion: 'reduce'
});

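This is a nice compromise because it keeps the app behavior realistic while making motion less likely to interfere with testing. It only has an effect if the application actually responds to the prefers-reduced-motion media query in its CSS or JavaScript.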
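
To apply the same setting to every test instead of creating contexts by hand, the option can go into the config through contextOptions, which Playwright passes on to browser.newContext(). This is a sketch with the same caveats as before: the plain-object form keeps it dependency-free (defineConfig from '@playwright/test' is the usual wrapper), and it assumes the app's styles include a prefers-reduced-motion rule to react to.

```typescript
// playwright.config.ts (sketch)
const config = {
  use: {
    // Forwarded to browser.newContext() for every test in the suite.
    contextOptions: { reducedMotion: 'reduce' as const },
  },
};

export default config;
```

For a single test, page.emulateMedia({ reducedMotion: 'reduce' }) achieves the same emulation without touching the config.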

I do not recommend disabling animation everywhere by default unless the team agrees it will not hide meaningful regressions. Some motion-related bugs are real product issues. The point is to be deliberate.

The edge cases that still trip me up

Even after years of working with frontend flakiness, a few edge cases still deserve special attention.

Sticky headers and scroll-driven shifts

A click target can be in view, but a sticky header may cover it after scroll. The locator is correct, the element is visible, and the click still fails. This is not always obvious from the DOM.

Hover menus that disappear when focus changes

A menu that opens on hover can close when the pointer moves a fraction during the test. Playwright’s pointer behavior is good, but hover-based UI can still be fragile if it is overly sensitive.

Virtualized lists

A row may exist only after scrolling, then get recycled. If the list also animates row insertion, the test can interact with the wrong index or stale node.

Font loading and text wrapping

A font swap can change line height and wrapping, which shifts buttons or truncates labels. This is a classic cause of layout-shift failures that people blame on selectors.

What I change in my test style after I find the cause

Once I confirm motion is the problem, I usually make one of these changes:

Prefer locators that target the final interactive element, not an intermediate wrapper
Wait for a stable state or explicit readiness signal
Disable motion in test runs when animation is not part of the assertion
Remove arbitrary sleeps and replace them with app-driven conditions
Add test-friendly attributes or statuses in the UI when the product can support them

This is how I keep the suite useful without turning it into a pile of timeout hacks.

A small pattern that helps a lot

If I know a component has an animation, I often create a helper that expresses the intended readiness more clearly than raw clicks sprinkled everywhere.

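import { expect, type Page } from '@playwright/test';

// Opens the settings panel and resolves once the dialog is visible,
// so callers do not need their own waits around the entrance animation.
async function openSettings(page: Page): Promise<void> {
  const button = page.getByRole('button', { name: 'Open settings' });
  await expect(button).toBeVisible();
  await button.click();
  await expect(page.getByRole('dialog', { name: 'Settings' })).toBeVisible();
}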

This keeps the test readable and localizes the stability logic. If the opening motion changes, I update one helper instead of twenty tests.

The bigger lesson

Flaky Playwright tests caused by animation are not really about Playwright. They are about the mismatch between a human-friendly interface and a machine that needs stable, observable states. A UI can look responsive and polished while still being hard to test if the motion is not accounted for.

I have learned to treat layout shift, CSS transitions, and animation flakiness as design constraints, not just test annoyances. If I can make the app easier to settle, the tests get better, the CI pipeline gets quieter, and I spend less time chasing false alarms.

That is the real payoff. Stable tests are not only faster to trust, they are better documentation of how the app is supposed to behave when the screen is still moving.

Final takeaway

If a Playwright test fails intermittently, do not immediately assume the locator is wrong or that CI is “just flaky.” Check for animation, transitions, overlays, and layout shift first. Then decide whether the right fix is to wait on a meaningful state, reduce motion in test mode, or improve the frontend so the interactive state is more deterministic.

That one habit has saved me from a lot of unnecessary test rewrites, and it is usually the fastest path from mystery failure to a stable suite.
