How to Generate Playwright Tests with Cursor

Cursor can speed up test authoring in a very practical way. If you already have a Playwright setup, it can generate boilerplate, convert a manual test idea into code, and help you refactor selectors or structure. But if you use it as a replacement for test design, locator strategy, and reliability work, you usually end up with a bigger pile of AI-generated Playwright code that still flakes in CI.

That is the real story behind how to generate Playwright tests with Cursor. It is a good accelerator, not a finished automation strategy. In this article, I will show how I use Cursor in test automation workflows, where it helps, where it fails, and what still needs a human SDET, developer, or QA engineer to own.

What Cursor is good at in Playwright work

Cursor is strongest when you already know what test you want and you need to turn that idea into code faster. It can help with:

creating a test file from a plain-English scenario,
filling in standard Playwright scaffolding,
turning a messy test into a page-object or helper-based structure,
suggesting better assertions,
adjusting locators after a UI change,
converting repetitive actions into reusable functions.

The value is not magic. It is leverage. If your team already understands Playwright, Cursor can reduce the time spent on syntax and boilerplate so you can focus on behavior and failure modes.

Cursor is useful when the problem is, “write this faster,” not when the problem is, “design a maintainable test system.”

That distinction matters. Playwright is still a code framework, which means you own the test architecture, runtime configuration, browser strategy, CI integration, and maintenance model. Cursor can assist with those tasks, but it does not remove them.

Before you let Cursor generate anything, set up the basics

If you start with a vague prompt and no project structure, Cursor will often produce plausible code that does not match your repo conventions. Before generating tests, I recommend making sure these basics are already in place:

Playwright is installed and runs locally,
the test folder structure is clear,
base URL and environment configuration are defined,
you know your selector strategy,
CI can run the current suite without manual steps,
you have a decision on whether to use page objects, fixtures, or a lightweight helper model.

A simple Playwright setup might look like this:

import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    baseURL: 'https://example.com',
    trace: 'on-first-retry',
  },
});

That configuration is not flashy, but it makes Cursor-generated tests much easier to validate. AI-generated Playwright code works better when the project already tells it what good looks like.

A practical prompt for generating a test

The best prompts are specific and bounded. Do not ask Cursor to “write e2e tests for my app”. Ask for one user flow, one test file, one set of assertions, and one known structure.

For example:

Generate a Playwright test for the login flow. Use TypeScript, keep selectors resilient, and assert that the dashboard heading is visible after sign-in. If possible, keep the test readable and avoid long chained selectors.

From there, Cursor will usually generate a workable first draft. A typical result might resemble this:

import { test, expect } from '@playwright/test';

test('user can log in', async ({ page }) => {
  await page.goto('/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('Password123!');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});

This is the kind of output that makes teams think the hard part is solved. It is not. The code is only as good as the assumptions embedded in it.

What to review immediately in AI-generated Playwright code

When Cursor writes a Playwright test, I review it in this order:

1. Selector quality

The most important question is whether the locators reflect the product, or whether they reflect incidental HTML structure.

Good signs:

getByRole, getByLabel, getByPlaceholder, getByText with context,
test ids used intentionally and consistently,
selectors tied to user-visible semantics.

Bad signs:

long CSS chains,
brittle nth-child usage,
XPath copied from DOM inspection,
selectors based on classes that change during redesigns.

If Cursor gives you this:

await page.locator('div.container > div:nth-child(2) > button').click();

rewrite it. Do not let generated code normalize brittle patterns.
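
One way to keep these patterns from slipping through review is a rough text check over generated test files. The sketch below is a hypothetical helper, not a Playwright feature: the function name, rule names, and regular expressions are my own assumptions, and they only approximate the "bad signs" listed above.

```typescript
// Hypothetical review helper: flags locator patterns this article calls brittle.
// The rules are illustrative regexes, not an exhaustive or official lint.
interface Finding {
  line: number; // 1-based line number in the scanned source
  rule: string; // short name of the rule that matched
  text: string; // the offending line, trimmed
}

const RULES: { rule: string; pattern: RegExp }[] = [
  // Positional CSS such as div:nth-child(2)
  { rule: 'nth-child', pattern: /:nth-child\(/ },
  // XPath passed to locator(), e.g. locator('//div') or locator('xpath=...')
  { rule: 'xpath', pattern: /locator\(\s*['"`](xpath=|\/\/)/ },
  // CSS chains with two or more child combinators inside locator()
  { rule: 'css-chain', pattern: /locator\(\s*['"`][^'"`]*>[^'"`]*>/ },
];

export function findBrittleSelectors(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split('\n').forEach((text, index) => {
    for (const { rule, pattern } of RULES) {
      if (pattern.test(text)) {
        findings.push({ line: index + 1, rule, text: text.trim() });
      }
    }
  });
  return findings;
}
```

Run against the locator above, this reports both the nth-child rule and the css-chain rule. A check like this is only a tripwire; a reviewer still has to decide what the resilient replacement should be.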

2. Assertion quality

Many generated tests only assert the final page loaded. That is not enough for many workflows.

A good test checks that the user intent completed, not just that navigation happened. For a checkout flow, maybe the order confirmation number is visible. For an account settings flow, maybe the change is reflected after reload.

3. Data dependence

Cursor may invent data values, login credentials, or preconditions. Replace those with environment variables, seeded test accounts, or API setup steps.
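
As a minimal sketch of the environment-variable option, the helper below reads a seeded account from the environment and fails loudly when it is missing. The variable names E2E_USER_EMAIL and E2E_USER_PASSWORD and the function name are assumptions for illustration; use whatever your team already has.

```typescript
// Hypothetical helper: the variable names below are assumptions, not a convention
// that Playwright or Cursor defines.
export interface TestCredentials {
  email: string;
  password: string;
}

type Env = Record<string, string | undefined>;

export function readTestCredentials(env: Env): TestCredentials {
  const email = env.E2E_USER_EMAIL;
  const password = env.E2E_USER_PASSWORD;

  if (!email || !password) {
    // Name every missing variable so a CI failure is easy to diagnose.
    const missing = [
      email ? null : 'E2E_USER_EMAIL',
      password ? null : 'E2E_USER_PASSWORD',
    ].filter((name): name is string => name !== null);
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }

  return { email, password };
}
```

In a test you would call readTestCredentials(process.env) and pass the result to the fill calls, so that no generated literal such as 'Password123!' ends up committed to the repo.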

4. Wait strategy

Generated tests often lean on Playwright's default auto-waiting alone. Auto-waiting handles a lot, but AI still sometimes adds unnecessary waitForTimeout calls or assumes timing that does not hold in your app.

Avoid this unless you truly need it:

await page.waitForTimeout(3000);

Prefer waiting on a state that matters:

await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();

5. Test boundaries

One generated test can accidentally try to cover too much. Split flows if needed. A login test should not also validate profile editing, billing, and logout in one giant scenario.

A better workflow for Cursor Playwright tests

The most effective way to generate Playwright tests with Cursor is to use it in a loop, not as a one-shot writer.

Step 1: Capture the scenario clearly

Write the scenario in plain English with the exact user intent, preconditions, and expected outcome.

Example:

user has a valid account,
user opens the login page,
user logs in with valid credentials,
dashboard appears,
logout button becomes visible.

Step 2: Let Cursor draft the test

Ask it to generate the test with your preferred patterns, such as TypeScript, getByRole, or fixture-based setup.
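
If you write scenarios often, it can help to keep them in a fixed shape and render the prompt from that, so every request to Cursor stays bounded in the same way. This is only a sketch: the Scenario type, its field names, and the prompt wording are my own assumptions, not something Cursor requires.

```typescript
// Hypothetical structure for a single, bounded test scenario.
interface Scenario {
  title: string;
  preconditions: string[];
  steps: string[];
  expected: string[];
  conventions: string[]; // e.g. 'TypeScript', 'prefer getByRole'
}

// Renders the scenario as a plain-text prompt to paste into Cursor.
export function buildPrompt(scenario: Scenario): string {
  const section = (heading: string, items: string[]): string =>
    [`${heading}:`, ...items.map((item) => `- ${item}`)].join('\n');

  return [
    `Generate one Playwright test file for: ${scenario.title}.`,
    section('Preconditions', scenario.preconditions),
    section('Steps', scenario.steps),
    section('Expected outcome', scenario.expected),
    section('Conventions to follow', scenario.conventions),
    'Cover only this flow. Do not add extra scenarios or new abstractions.',
  ].join('\n\n');
}
```

For the login example, the preconditions would hold "user has a valid account", the steps the two login actions, and the expected outcome the dashboard and logout button checks.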

Step 3: Refactor for maintainability

Move repeated actions into helpers or page objects if your codebase uses them. If your team prefers simpler tests, keep it flat but consistent.

Step 4: Run locally and inspect failures

The first execution is the real review. Cursor can be surprisingly decent at code shape, but execution exposes reality quickly.

Step 5: Teach Cursor with context

Point it at existing files. Ask it to follow the patterns used in the repo. If your suite uses fixtures, custom test data builders, or auth state reuse, include those examples.

Cursor improves dramatically when it can imitate your existing structure instead of inventing one.

How to turn a manual test into Playwright faster

This is one of the best use cases. If a QA engineer has a manual test case, Cursor can convert it into code if you give it the right ingredients.

For example, a manual case like this:

open product page,
add item to cart,
open cart,
verify item and quantity,
proceed to checkout.

You can prompt Cursor to generate a Playwright version using stable locators and clear assertions.

A reasonable test might look like this:

import { test, expect } from '@playwright/test';

test('customer can add product to cart', async ({ page }) => {
  await page.goto('/products/widget');
  await page.getByRole('button', { name: 'Add to cart' }).click();
  await page.getByRole('link', { name: 'Cart' }).click();
  await expect(page.getByText('Widget')).toBeVisible();
  await expect(page.getByText('Quantity 1')).toBeVisible();
});

Then you need to ask a harder question: is this test worth keeping as an E2E test, or should part of it move to an API or integration layer?

That is where human judgment still matters more than the code generator.

Where Cursor tends to make mistakes

Cursor is helpful, but there are predictable failure modes.

It overfits to the visible DOM

Generated selectors often track the page structure you happen to have today. That is a future maintenance problem.

It assumes happy paths

It will often generate only successful scenarios. In real suites, you also need:

invalid password handling,
disabled states,
duplicate submission prevention,
auth expiration,
empty state behavior,
permission-based UI differences.

It ignores test data cleanup

If a test creates records, AI may not add cleanup or isolation. That can cause hidden coupling between tests.
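
The shape of the fix is simple: whatever creates a record also owns removing it, and the removal runs even when the test body fails. The sketch below shows that shape in plain TypeScript. The helper name withTempRecord and its parameters are hypothetical, and the create and remove callbacks stand in for whatever API or database calls your suite actually uses.

```typescript
// Runs `body` against a freshly created record and always removes the record
// afterwards, whether `body` resolves or throws.
export async function withTempRecord<T, R>(
  create: () => Promise<T>,
  remove: (record: T) => Promise<void>,
  body: (record: T) => Promise<R>,
): Promise<R> {
  const record = await create();
  try {
    return await body(record);
  } finally {
    // Cleanup runs on success and on failure, so later tests
    // do not inherit leftover data from this one.
    await remove(record);
  }
}
```

In a Playwright suite, a custom fixture built with test.extend gives the same guarantee: the code placed after the await use(...) call runs as teardown. Either way, ask Cursor to follow the pattern you already have rather than letting each generated test invent its own cleanup.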

It misses environment differences

Local, staging, preview, and CI often behave differently. Cursor will not automatically know which app flags, seeded accounts, or API mock layers your organization uses.

It can produce over-engineered abstractions

Sometimes Cursor aggressively introduces page objects or helper functions even for a tiny test suite. That can slow teams down more than it helps.

Manual engineering work that still cannot be skipped

This is the part people underestimate. Generated tests are code, and code needs design.

1. Test architecture

You still need to decide:

page objects or not,
feature-based folders or page-based folders,
shared auth state or isolated logins,
when to mock and when to hit the UI.

2. Reliability standards

Your team must define what a passing test means, which failures are acceptable, and how retries work in CI.

3. Selector governance

Someone needs to own locator conventions. If every engineer and every AI prompt invents selectors differently, the suite gets brittle fast.

4. CI integration

Playwright test generation is not the same as a maintainable pipeline. You still need browser install strategy, artifact collection, trace upload, parallelization, and failure triage.

5. Review culture

If Cursor is writing a lot of tests, you need code review standards for test code. Otherwise the suite silently accumulates low-value checks.

A simple pattern that works well with Cursor

For many teams, the best compromise is a lightweight structure with one helper module and one test file per flow.

// helpers/auth.ts
import { Page } from '@playwright/test';

export async function login(page: Page, email: string, password: string) {
  await page.goto('/login');
  await page.getByLabel('Email').fill(email);
  await page.getByLabel('Password').fill(password);
  await page.getByRole('button', { name: 'Sign in' }).click();
}

Then a test stays readable:

import { test, expect } from '@playwright/test';
import { login } from '../helpers/auth';

test('user sees dashboard after login', async ({ page }) => {
  await login(page, 'user@example.com', 'Password123!');
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});

Cursor is pretty good at generating this kind of structure once you show it the pattern. The key is to keep abstractions simple and obvious.

CI/CD concerns you should not ignore

Playwright is often introduced in local development, but its real value is in continuous integration. Continuous integration, in the classic sense, is about keeping changes verifiable as code moves through the pipeline, and test automation is part of that discipline.

A basic GitHub Actions example might look like this:

name: playwright

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test

Cursor can generate this too, but the real engineering questions are:

how do you cache browsers,
do you shard tests,
how do you store traces,
what is the retry policy,
which failures should block merges,
who investigates flakes.

If you do not answer those questions, AI-generated Playwright code just gives you more tests to maintain, not more confidence.
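
Some of those answers can be written down in the Playwright config itself, which also gives Cursor something concrete to follow. The fragment below is a sketch under assumptions: the retry count, reporter choice, and the reliance on a CI environment variable are illustrative values, not recommendations for every team.

```typescript
import { defineConfig } from '@playwright/test';

// Assumes the CI provider sets the CI environment variable, as GitHub Actions does.
const isCI = !!process.env.CI;

export default defineConfig({
  // Fail the run if a stray test.only was committed.
  forbidOnly: isCI,
  // Retry policy: retry in CI only, so local failures stay visible.
  retries: isCI ? 2 : 0,
  // Annotate failures in GitHub Actions, keep a readable list locally.
  reporter: isCI ? 'github' : 'list',
  use: {
    baseURL: 'https://example.com',
    // Record a trace only when a test is retried, to keep artifacts small.
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
  },
});
```

Sharding is handled on the command line rather than in this file, for example npx playwright test --shard=1/4 on each of four CI jobs. The config cannot decide which failures block merges or who investigates flakes; those stay team decisions.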

When Cursor is a good fit, and when it is not

Cursor works well when:

the team already knows Playwright,
the app has stable semantics and test IDs,
the scope is small to medium,
you want faster implementation of known test cases,
you plan to review and refactor generated code.

Cursor is a weaker fit when:

the team wants low maintenance rather than code ownership,
QA does not want to manage a TypeScript or Python framework,
the test suite is growing faster than the team can support it,
browser infrastructure and CI are already causing operational drag.

That last category is important. If your problem is not writing tests, but operating them, a code-first approach may not be the best answer.

Where Endtest fits if you want less code to own

If your real goal is test automation results without expanding a Playwright codebase, Endtest is worth a look. It takes a different approach, using an agentic AI loop and a managed platform so teams can create and maintain tests without owning the full framework stack.

That matters for teams where the bottleneck is not authoring one more test, but keeping the whole system healthy. In contrast to Playwright, which is a library you still have to wire into a runner, reporters, CI, browser management, and maintenance routines, Endtest is built as a managed platform with editable platform-native steps. Endtest also positions itself as a Playwright alternative for teams that want broader collaboration, less infrastructure to own, and a lower-code workflow.

If you are a CTO or engineering manager deciding between “generate more Playwright tests with Cursor” and “reduce the code we have to maintain,” that is the real tradeoff.

A decision framework I use

Here is the practical filter I use when deciding whether to generate Playwright tests with Cursor or choose a different approach.

Use Cursor if:

you already have a Playwright stack,
your developers are comfortable reviewing test code,
you want to accelerate authoring, not replace ownership,
your suite is still small enough to keep tidy,
you can enforce locator and structure standards.

Prefer a more managed approach if:

you want QA, product, or design to contribute directly,
you do not want to maintain a growing codebase,
browser automation infra is wasting engineering time,
your team values fast creation plus low maintenance,
you would rather spend effort on coverage strategy than framework upkeep.

Common mistakes I see with Cursor test automation

A few patterns show up again and again:

Generating too many tests at once. Start with one flow and validate the pattern.
Accepting the first draft. First draft code is usually a starting point, not production quality.
Using flaky selectors because they are easy. Easy now means expensive later.
Skipping test design. AI cannot decide what your product guarantees should be.
Treating generated code as a no-maintenance asset. Every test has a lifecycle.

Final thoughts

Generating Playwright tests with Cursor is genuinely useful, especially if you already think like an SDET. It can shorten the time from scenario to runnable code, speed up refactors, and help developers produce better test coverage with less typing.

But the value stops where engineering discipline starts. Cursor does not own your test architecture, your locators, your CI stability, or your long-term maintenance cost. It can help write the code, but it cannot decide whether the code is the right strategy for your team.

If you want to move fast and you are prepared to own Playwright like any other codebase, Cursor is a strong assistant. If you want dependable testing outcomes with less framework ownership, a managed platform such as Endtest may be the better path.

For teams comparing options, I would start with one question: do we want to generate more Playwright code, or do we want to reduce how much test code we have to carry forward?