How I Decide Whether to Mock, Stub, or Hit Real Services in Playwright E2E Tests

When I write Playwright tests, I am not trying to make every dependency disappear. I am trying to prove the app behaves correctly at the boundary that matters for the scenario I am testing. Sometimes that means mocking a network call, sometimes stubbing a narrow response, and sometimes letting the browser talk to real services because that is the only way to catch the bug I care about.

That choice sounds simple until you are maintaining a suite with hundreds of tests, a few flaky specs, and a team that keeps asking why one checkout flow is mocked in one test and real in another. My rule is not “mock everything” or “never mock.” My rule is: choose the smallest boundary that still exercises the risk you need coverage for.

This is the framework I use when deciding whether to mock, stub, or hit real services in Playwright tests. It is opinionated, but it is not dogmatic. I want fast tests, but I also want failures that mean something.

The question I ask before writing any Playwright test

Before I touch the test code, I ask one practical question:

What bug would I miss if I replaced this dependency with a fake?

If the answer is, “I would miss an integration bug that has happened before or is likely to happen,” I lean toward using the real service, or a very thin fake at a lower layer.

If the answer is, “I only need to verify that my UI reacts to a specific response shape,” I usually stub the network response.

If the answer is, “I need to isolate the UI from an unstable or expensive dependency, but I still want the same contract,” I reach for a mock or a controlled stub, depending on how much assertion I need on the dependency itself.

That sounds abstract, so I keep a simple mental model:

Mock when I care about interaction, call count, arguments, or behavior at the boundary.
Stub when I care about returning a specific response and do not want the dependency behavior to matter.
Real service when the integration itself is the thing under test.

Playwright gives me enough control over routing, request interception, and browser context isolation to implement all three approaches without turning the suite into a pile of brittle hacks. The official docs are a good baseline if you want the supported primitives, especially around Playwright’s test framework and browser automation model.

My decision framework, from highest value to lowest isolation

I think about test boundaries as a spectrum.

1. Use the real service when the contract is the risk

I keep real services in tests when the scenario depends on actual behavior that would be painful to recreate accurately. A few examples:

authentication redirects
payment or pricing calculations
search indexing behavior
feature flags affecting server-side rendering
email delivery triggers, if the app shows different UI states based on send success or failure
eventual consistency after a write

If I mock these too aggressively, I can ship a test suite that stays green while production breaks in ways the tests could have caught.

A classic example is an onboarding flow that depends on a backend creating a user profile, then reading it back. If I stub the read call and hardcode the profile response, I am not verifying persistence, serialization, or server-side defaults. That may be fine for a UI-only smoke test, but it is not enough for confidence in the flow.

Here is how I usually structure a real-service E2E in Playwright:

import { test, expect } from '@playwright/test';

test('user can complete onboarding', async ({ page }) => {
  await page.goto('https://app.example.com');
  await page.getByLabel('Email').fill(process.env.TEST_USER_EMAIL!);
  await page.getByRole('button', { name: 'Start onboarding' }).click();
  await expect(page.getByText('Welcome to your dashboard')).toBeVisible();
});

That test is intentionally sparse. The value is in the real boundary, not in excessive assertions.
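
A real-service test depends on environment configuration in a way a stubbed test does not. The following is a minimal sketch, assuming a hypothetical helper named requireEnv (it is not part of Playwright), that fails with a readable message when a variable such as TEST_USER_EMAIL is missing, rather than relying on the non-null assertion:

```typescript
// Hypothetical helper, not part of Playwright: read a required setting from an
// environment-like object and fail fast with a readable error if it is absent.
type EnvLike = Record<string, string | undefined>;

function requireEnv(env: EnvLike, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

In the test above, requireEnv(process.env, 'TEST_USER_EMAIL') could stand in for process.env.TEST_USER_EMAIL!, so a misconfigured environment shows up as a configuration error rather than as a confusing UI failure.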

2. Stub when the data matters, not the dependency behavior

Stubbing is my default for most Playwright UI tests that sit above the component level but below true end-to-end coverage.

I stub when I want to control a specific response, usually through page.route(), and I do not care about the internals of the API or backend service. For example:

rendering an empty state
showing validation errors from an API
testing pagination and filtering UI states
verifying a loading spinner disappears when data arrives
simulating a 500 response or timeout

This gives me deterministic tests without making the app dependent on live data or an external environment.

A simple route stub looks like this:

import { test, expect } from '@playwright/test';

test('shows empty state when no results are returned', async ({ page }) => {
  await page.route('**/api/search?q=*', async route => {
    await route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ results: [] }),
    });
  });

  await page.goto('http://localhost:3000/search');
  await page.getByLabel('Search').fill('unlikely-term');
  await page.getByRole('button', { name: 'Search' }).click();

  await expect(page.getByText('No results found')).toBeVisible();
});

This is not a mock in the strict interaction-testing sense. It is a stubbed response. That distinction matters because I am not asserting that the app called the endpoint exactly once, only that the UI handles the returned data correctly.

I prefer stubs over complex mocks when the only thing I need is a stable input to drive the UI.
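
When several tests need the same kind of stable input, the response options can be built in one place. This is a sketch under assumptions: jsonStub, emptySearch, and serverError are hypothetical names, and the payload shapes are illustrative rather than taken from a real API.

```typescript
// Shape of the options object passed to route.fulfill() in the example above.
interface StubResponse {
  status: number;
  contentType: string;
  body: string;
}

// Hypothetical helper: build a JSON stub so every test serializes payloads the same way.
function jsonStub(payload: unknown, status = 200): StubResponse {
  return {
    status,
    contentType: 'application/json',
    body: JSON.stringify(payload),
  };
}

// Canned responses for two common UI states; the payload shapes are assumptions.
const emptySearch = jsonStub({ results: [] });
const serverError = jsonStub({ error: 'Internal Server Error' }, 500);
```

Inside a route handler this would be used as await route.fulfill(emptySearch). A timeout is a different case, since no response arrives at all; route.abort('timedout') is one way to simulate it.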

3. Mock when the interaction itself is part of the contract

I mock when the behavior I care about includes how the app talks to the dependency. In browser-based tests, that often means asserting things like:

the right endpoint was called
the request included the expected payload
a third-party call only happens after a user action
retry logic was invoked
a telemetry event was emitted at the right moment

In Playwright, I rarely use “mock” in the same way I would in a unit test framework. Instead, I inspect network requests, intercept them, and assert on the observable effect.

For example, if a form submission should send a specific payload, I might do this:

import { test, expect } from '@playwright/test';

test('submits the correct payload', async ({ page }) => {
  let body: any;

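  // Intercept the save call, capture its JSON body, and return a canned success response.
  await page.route('**/api/profile', async route => {
    body = route.request().postDataJSON();
    await route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ ok: true }),
    });
  });

  await page.goto('http://localhost:3000/profile');
  await page.getByLabel('Display name').fill('Ada');

  // Start waiting before the click so the assertion cannot run before the request is captured.
  const saved = page.waitForResponse('**/api/profile');
  await page.getByRole('button', { name: 'Save' }).click();
  await saved;

  expect(body).toMatchObject({ displayName: 'Ada' });
});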

I use this sparingly. The more I assert on call mechanics, the more my test becomes sensitive to implementation details. That is fine if the interaction is the point. It is a bad tradeoff if I am only trying to verify user-visible behavior.
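
When the interaction really is the point, for example a call count or retry behavior, a small recorder keeps the bookkeeping out of the test body. This is a sketch, not a Playwright feature: createRequestRecorder and the RequestLike interface are hypothetical, although method(), url(), and postDataJSON() are methods that Playwright's Request object provides.

```typescript
// Minimal structural view of the request methods used here. Playwright's Request
// exposes method(), url() and postDataJSON(); the interface itself is an assumption.
interface RequestLike {
  method(): string;
  url(): string;
  postDataJSON(): unknown;
}

interface RecordedCall {
  method: string;
  url: string;
  body: unknown;
}

// Hypothetical helper: collects calls so a test can assert on count and payload.
function createRequestRecorder() {
  const calls: RecordedCall[] = [];
  return {
    calls,
    record(request: RequestLike): void {
      calls.push({
        method: request.method(),
        url: request.url(),
        body: request.postDataJSON(),
      });
    },
  };
}
```

In a route handler this would be recorder.record(route.request()) before the call to route.fulfill(), followed by an assertion such as expect(recorder.calls).toHaveLength(1) once the response has been observed.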

The biggest mistake I see, over-mocking the boundary

The most common failure mode in E2E test strategy is not under-mocking. It is over-mocking.

Over-mocking happens when the suite no longer exercises the thing that actually breaks in production. I see it when teams do all of the following:

stub every API endpoint
replace every third-party integration
bypass authentication with a fake cookie
hardcode deterministic dates, currency formats, and permissions
ignore caching, retries, and network errors entirely

The tests become very fast, but they stop protecting the product from integration bugs.

For example, I once saw a checkout test that always returned a successful payment response, regardless of card type, coupon state, or tax jurisdiction. The test passed for months. Meanwhile, the app had a bug in the transition from payment success to order confirmation, and the mocked response never exercised the real post-payment redirect or the server-side state update. The suite was green while the user path was broken.

The lesson I take from that is simple: if the dependency changes user-visible state across multiple requests, I want at least one test that uses the real path.

My practical boundary rules

I use a few rules to keep myself honest.

Rule 1, mock the unstable thing, not the important thing

If a service is flaky, slow, rate limited, or expensive, I am more willing to stub it. But I ask whether it is also the important thing.

A payment provider may be unstable in test environments, but it is also high risk. So I usually do not fully mock it away. Instead, I split coverage:

a mocked or stubbed UI test for form behavior
a service-level test or sandbox test for the payment integration
one or two true end-to-end tests that cover the real checkout path in a controlled environment

This layered approach treats software testing as a strategy spread across levels, rather than following a single-layer “E2E solves everything” mindset. Deciding which level owns which risk is exactly the kind of boundary thinking the broader definition of software testing implies.

Rule 2, keep one test close to reality for every critical journey

For login, signup, checkout, password reset, file upload, and any journey that crosses multiple systems, I want at least one real integration test in the suite.

Not 20. One or a few are usually enough, as long as they are reliable and representative.

These tests tend to be slower and more brittle than stubbed tests, so I keep them focused:

do not assert every UI element
avoid unnecessary waits
seed known data
use stable test accounts
isolate the environment as much as possible

Rule 3, stub the side effects you do not want to pay for repeatedly

If a flow triggers analytics, notifications, or non-essential side effects, I usually stub those out. I do not need a real analytics vendor to tell me the button click happened if my goal is to test the form submit path.

This is one place where test doubles in browser tests help a lot. The browser test should cover the user experience, not become a vendor integration test by accident.
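
One way to express this is a single predicate that decides which traffic counts as a non-essential side effect. The sketch below is an assumption-heavy illustration: the host names and the function name isNonEssentialSideEffect are hypothetical.

```typescript
// Hypothetical list of vendor hosts whose traffic is a side effect, not the subject of the test.
const NON_ESSENTIAL_HOSTS = ['analytics.example.com', 'notifications.example.com'];

// Accepts anything with a hostname, so a standard URL object works as the argument.
function isNonEssentialSideEffect(url: { hostname: string }): boolean {
  return NON_ESSENTIAL_HOSTS.some(host => {
    const suffix = '.' + host;
    // Match the host itself or any subdomain of it, but not look-alike names.
    return url.hostname === host || url.hostname.slice(-suffix.length) === suffix;
  });
}
```

Because page.route() accepts a URL predicate, this could be wired up as await page.route(isNonEssentialSideEffect, route => route.fulfill({ status: 204 })). The 204 status is an assumption about what the app tolerates from those vendors.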

Rule 4, do not let CI topology decide test strategy for you

A lot of teams pick mocks because their CI environment is weak, not because the test boundary is correct. I get it, CI can be painful. But if the pipeline is forcing you to mock every backend dependency, the real fix may be improving test environments, not narrowing your tests forever.

CI is part of the testing architecture, not a separate concern. Continuous integration, by definition, is about integrating changes frequently so problems surface early. If you want a basic reference, any general introduction to continuous integration is a useful starting point.

A simple framework I use for each dependency

When I am unsure, I ask five questions.

1. What is the risk if this dependency is wrong?

If the risk is high, I move toward real service coverage.

2. Do I need the dependency’s behavior, or only its output?

If I only need output, stub it. If I need behavior, mock or integrate.

3. Is the dependency deterministic enough for CI?

If it introduces noise, but the integration is important, I look for a dedicated environment or a fake backend that is closer to reality than a static stub.

4. Will a fake hide contract drift?

This matters a lot when the frontend and backend evolve independently. A stub can keep the test green while the real API changes shape.
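
A cheap guard against drift is to compare the structure of a stub fixture with a sample of the real response, for instance one captured in a lower-level API test. The helper below is a hypothetical sketch, not a library function; it compares only keys and value types, and it assumes arrays are homogeneous.

```typescript
// Hypothetical helper: list the paths where a stub fixture and a sample of the real
// response disagree about structure (missing keys or different value types).
function findShapeDrift(fixture: unknown, real: unknown, path = '$'): string[] {
  const kind = (v: unknown): string =>
    v === null ? 'null' : Array.isArray(v) ? 'array' : typeof v;

  if (kind(fixture) !== kind(real)) {
    return [`${path}: fixture is ${kind(fixture)}, real is ${kind(real)}`];
  }
  if (kind(fixture) === 'array') {
    const a = fixture as unknown[];
    const b = real as unknown[];
    // Compare only the first element; assumes every element has the same shape.
    return a.length > 0 && b.length > 0 ? findShapeDrift(a[0], b[0], `${path}[0]`) : [];
  }
  if (kind(fixture) === 'object') {
    const a = fixture as Record<string, unknown>;
    const b = real as Record<string, unknown>;
    const drift: string[] = [];
    // Fixture keys first, then keys that exist only in the real response.
    const keys = Object.keys(a).concat(Object.keys(b).filter(k => !(k in a)));
    for (const key of keys) {
      if (!(key in a)) drift.push(`${path}.${key}: missing from fixture`);
      else if (!(key in b)) drift.push(`${path}.${key}: missing from real response`);
      else drift.push(...findShapeDrift(a[key], b[key], `${path}.${key}`));
    }
    return drift;
  }
  // Primitives of the same type count as matching; values are not compared.
  return [];
}
```

An empty result means the fixture still has the same shape as the sample. Any entry is a sign that the stub has fallen behind the API.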

5. Can I verify the same risk at a lower level more cheaply?

Sometimes the browser test does not need to carry all the weight. I may cover one branch in Playwright and the rest in API or contract tests.

That last point is important. I do not use Playwright as a substitute for all testing. I use it to validate user journeys, not to retest every function in the stack.
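
The five questions can be written down as a small function. Treat this as one possible ordering of the questions, simplified for illustration, not as a fixed procedure; the type names, field names, and suggestBoundary itself are all hypothetical.

```typescript
type Boundary = 'real' | 'mock' | 'stub';

// Answers to the five questions for one dependency in one scenario.
interface DependencyAssessment {
  highRiskIfWrong: boolean;     // question 1
  needsBehavior: boolean;       // question 2: the interaction matters, not only the output
  deterministicInCi: boolean;   // question 3
  fakeCouldHideDrift: boolean;  // question 4
  coveredAtLowerLevel: boolean; // question 5
}

interface Suggestion {
  boundary: Boundary;
  // True when the real path is suggested but the dependency is noisy in CI.
  stabilizeEnvironmentFirst: boolean;
}

function suggestBoundary(a: DependencyAssessment): Suggestion {
  let boundary: Boundary;
  if (a.coveredAtLowerLevel) {
    // The risk is verified more cheaply elsewhere, so the browser test can use a fake.
    boundary = a.needsBehavior ? 'mock' : 'stub';
  } else if (a.highRiskIfWrong || a.fakeCouldHideDrift) {
    // High risk, or a fake that would mask contract drift, argues for the real path.
    boundary = 'real';
  } else {
    boundary = a.needsBehavior ? 'mock' : 'stub';
  }
  return {
    boundary,
    stabilizeEnvironmentFirst: boundary === 'real' && !a.deterministicInCi,
  };
}
```

The function does not capture the rule that every critical journey keeps at least one real test. That rule still applies on top of whatever this suggests for an individual dependency.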

What I usually mock, what I usually stub, and what I usually keep real

Here is my default bias.

I usually stub:

read-only API responses for list pages
error states and validation responses
feature-flag variations that do not need real flag infrastructure
time-sensitive UI states, when the date itself is not the risk
third-party widgets that are not central to the user journey

I usually mock:

analytics events
background jobs kicked off by UI actions, when I only need to know the trigger happened
email or webhook calls, when the email delivery itself is not under test
internal calls where I want to assert request shape

I usually keep real:

auth flows
core CRUD flows that depend on server state
checkout and billing paths
file uploads and processing
permission checks
anything where serialization, caching, or persistence bugs are likely

Real-world scenarios and how I choose

Scenario: a search page with filters

For search results, I usually stub the API. I care that filters update the query string, the loading state appears, and the empty state renders correctly.

I do not need the real search engine in every test. I do need a separate test somewhere that the backend search API returns the expected schema, but that belongs closer to the service layer.

Scenario: authentication

I keep this close to real, because auth is one of the easiest places for browser tests to lie to you. Cookies, redirects, CSRF, session expiry, and cross-origin behavior are all areas where stubs can become dangerously fake.

If I must bypass auth in some tests, I do it deliberately, keep the bypass in a clearly labeled setup helper, and never treat those tests as a substitute for true login coverage.

Scenario: payments

I rarely fully mock payments in the most important E2E tests. I might stub the payment gateway for edge cases, but I want at least one path that proves the app talks to the provider correctly in the environment I trust.

Scenario: uploads

Uploads are tricky because they involve browser APIs, network transfer, server validation, and often asynchronous processing. I may stub the processing callback in a UI test, but I keep one path real so I can catch failures in multipart handling, MIME detection, or storage permissions.

How I keep Playwright tests maintainable when I do mock or stub

The more I intercept, the easier it is to create a mess. So I use a few habits.

Use route helpers, not inline interception everywhere

If I find myself repeating page.route() logic in many tests, I extract a helper.

import { Page } from '@playwright/test';

// Stub the current-user endpoint with a minimal profile for tests that only need a name.
export async function mockUserProfile(page: Page, name = 'Ada') {
  await page.route('**/api/me', route =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ name }),
    })
  );
}

This keeps the test readable and makes the fake easier to update when the API contract changes.

Keep fakes realistic

A fake response should look like something the real service could return. If the backend returns nested objects, pagination metadata, and error codes, my stub should preserve those shapes unless the point of the test is explicitly to cover malformed input.
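
One way to keep fakes realistic is a fixture factory that starts from a full, plausible response and lets each test override only the part it cares about. This is a sketch; the SearchResponse shape, its field names, and makeSearchResponse are assumptions, not a real API.

```typescript
// Hypothetical response shape; the field names are assumptions, not a real API.
interface SearchResponse {
  results: Array<{ id: string; title: string }>;
  page: { number: number; size: number; total: number };
  error: { code: string; message: string } | null;
}

// Start from a realistic default and override only what the test is about.
function makeSearchResponse(overrides: Partial<SearchResponse> = {}): SearchResponse {
  const base: SearchResponse = {
    results: [{ id: 'r-1', title: 'First result' }],
    page: { number: 1, size: 20, total: 1 },
    error: null,
  };
  return { ...base, ...overrides };
}

// An empty page still carries pagination metadata and the error field.
const emptyPage = makeSearchResponse({
  results: [],
  page: { number: 1, size: 20, total: 0 },
});
```

Because every fixture comes from the same base, a contract change means updating one function rather than hunting through inline JSON in dozens of tests.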

Make the test name reflect the boundary

I like names that tell future me what is real and what is not.

user can submit profile form with mocked save response
user can complete checkout against sandbox payment service
search results page renders empty state from stubbed API

That small bit of specificity prevents confusion during debugging.

How I think about flaky tests in this decision

A lot of flaky tests are not flaky because Playwright is bad. They are flaky because the boundary is wrong.

If a test depends on a real service that has variable latency, that test is a candidate for isolation. But I do not instantly stub everything just because one run failed. I inspect the failure pattern.

I ask:

Was this a timing issue in the UI?
Was the API actually slow?
Did the backend return a response shape I did not account for?
Did the test wait on a DOM state that never meant the operation was complete?

Sometimes the correct fix is better waiting logic or a stronger assertion. Sometimes the correct fix is to stub the dependency. Sometimes the correct fix is to move the test boundary down or up.

This is why I do not treat mocking strategy as separate from flakiness strategy. They are connected. If a dependency keeps causing unstable test outcomes and the dependency is not the point of the test, I will isolate it. If the dependency is the point, I will stabilize the environment instead.

My default Playwright strategy in practice

If I had to summarize my approach in one sentence, it would be this:

I stub by default for UI state coverage, I mock when I need to assert interaction, and I hit real services for high-risk journeys where a fake would hide bugs.

In practice, that means I build a layered suite:

Fast, isolated browser tests with stubs for most UI behavior.
A smaller set of real E2E tests for critical paths and contract-sensitive flows.
Lower-level API or contract tests for service behavior that is too expensive to validate in the browser every time.

This mix usually gives me the best signal-to-noise ratio. It keeps the suite fast enough for CI, but still close enough to reality to catch integration bugs.

A short checklist I use before committing a Playwright test

Before I merge a new test, I run through this checklist:

Am I testing a user journey or an implementation detail?
If I fake this service, what bug could slip through?
Does this dependency need a stub, a mock, or the real thing?
Will the fake stay close to the production contract?
Is there already coverage at another layer for the thing I am skipping here?
Will this test still be useful if the backend implementation changes?

If I cannot answer those questions confidently, I usually do less, not more. A smaller test with the right boundary is better than a detailed test that proves nothing meaningful.

Closing thought

Choosing whether to mock, stub, or hit real services in Playwright is not about finding one universal rule. It is about choosing the boundary that gives you the most confidence for the least fragility.

If you remember only one thing from this article, make it this: do not let convenience turn every E2E test into a fake. Real bugs live at integration boundaries, and your test strategy should reflect that.

When I get this right, my Playwright suite does three things at once: it stays fast enough for CI, it fails for real reasons, and it catches bugs that would otherwise survive until production. That is the balance I am aiming for every time I decide whether to mock, stub, or hit real services.