If you have ever pasted a user flow into ChatGPT and asked it to generate Playwright tests, you already know the appeal. You get a runnable-looking test file in seconds, often with selectors, assertions, and even a basic project structure. For a quick prototype, that can be genuinely useful.

The catch is that generated tests are not the same thing as maintainable test automation. ChatGPT can help you move faster, but it does not understand your app’s stability constraints, your CI environment, or the cost of brittle locators in a real test suite. I use generated code as a starting point, then I review it like any other code artifact, because that is exactly what it is.

In this article, I will show a practical way to generate Playwright tests with ChatGPT, the prompts that work best, the mistakes to look for, and the situations where AI-generated test code becomes more work than it saves. I will also explain why, when the real goal is reliable automation rather than more generated code to debug, a managed workflow like Endtest can be the better choice.

What ChatGPT is good at, and what it is not

ChatGPT is useful for four things in test automation:

  1. Turning a user journey into test steps.
  2. Drafting repetitive boilerplate.
  3. Suggesting locator strategies.
  4. Providing a first pass at assertion logic.

It is not good at knowing whether those steps are stable in your application. It cannot inspect your DOM live. It does not know which selectors are dynamic, which elements are behind feature flags, or which parts of the app need special waits because of animation, network activity, or client-side hydration.

That distinction matters. Playwright itself is excellent, and its locator and auto-waiting model is one reason many teams adopt it. The official docs are worth reading if you are new to the framework: Playwright introduction. But even a good framework can produce bad tests if the generated code is careless.

AI can draft the test. It cannot validate the test strategy.

The best prompt structure for generating Playwright tests

If you ask ChatGPT, “Write me a Playwright test,” you usually get a generic example with made-up selectors. That is not very useful. The model needs context. The more concrete your prompt, the more usable the output.

A good prompt should include:

  • The application type and critical user flow
  • The framework and language, usually Playwright with TypeScript
  • The browser targets
  • The expected test data
  • The relevant selectors or page structure
  • Any known asynchronous behavior
  • The style of assertions you want

Here is a prompt pattern I actually recommend:

Generate a Playwright test in TypeScript for this flow:
1. Open the login page.
2. Log in with valid credentials.
3. Verify the dashboard loads.
4. Confirm the user name appears in the header.

Use Playwright test runner syntax. Use stable locators first, prefer role and label selectors. Assume the app uses data-testid only for the main form fields. Do not invent APIs or helper functions that are not shown. Include only the test file, no explanation.

That prompt works because it constrains the model. It tells ChatGPT to avoid creative invention, and creativity is usually the enemy of reliable tests.
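If you write prompts like this often, the checklist above can be captured in a small template so that no constraint gets forgotten. The following is only a sketch: `PromptContext` and `buildTestPrompt` are hypothetical names, not part of Playwright or any ChatGPT tooling, and the browser list is an assumed example.

```typescript
// Hypothetical helper: these names are illustrative, not a real library API.
interface PromptContext {
  flow: string[];          // user journey, one step per entry
  language: 'TypeScript' | 'JavaScript';
  browsers: string[];      // browser targets for the suite
  locatorHints: string[];  // labels, roles, or test ids that really exist
  asyncNotes: string[];    // known redirects, loading states, and so on
  forbidden: string[];     // things the model must not do
}

function buildTestPrompt(ctx: PromptContext): string {
  // Number the flow steps the same way the hand-written prompt does.
  const steps = ctx.flow.map((step, i) => `${i + 1}. ${step}`).join('\n');
  const bullets = (items: string[]): string =>
    items.map((item) => `- ${item}`).join('\n');

  return [
    `Generate a Playwright test in ${ctx.language} for this flow:`,
    steps,
    '',
    `Target browsers: ${ctx.browsers.join(', ')}.`,
    'Known page structure:',
    bullets(ctx.locatorHints),
    'Known asynchronous behavior:',
    bullets(ctx.asyncNotes),
    'Constraints:',
    bullets(ctx.forbidden),
    'Include only the test file, no explanation.',
  ].join('\n');
}

// Example input based on the login flow above; the browsers are assumed.
const loginPrompt = buildTestPrompt({
  flow: [
    'Open the login page.',
    'Log in with valid credentials.',
    'Verify the dashboard loads.',
    'Confirm the user name appears in the header.',
  ],
  language: 'TypeScript',
  browsers: ['Chromium', 'Firefox'],
  locatorHints: ['The main form fields expose data-testid attributes.'],
  asyncNotes: ['The app may redirect after login.'],
  forbidden: [
    'Do not invent APIs or helper functions that are not shown.',
    'Do not use waitForTimeout.',
  ],
});

console.log(loginPrompt);
```

The point of the template is less the string formatting and more the required fields: if you cannot fill in `locatorHints`, the model will not be able to either, and it will guess.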

A simple example of ChatGPT-generated Playwright code

Suppose I ask ChatGPT to generate a login test. The result might look like this:

import { test, expect } from '@playwright/test';

test('user can log in', async ({ page }) => {
  await page.goto('https://example-app.com/login');

  await page.fill('#email', 'user@example.com');
  await page.fill('#password', 'secret123');
  await page.click('button[type="submit"]');

  await expect(page.locator('h1')).toHaveText('Dashboard');
});

This is a decent starting point, but it is not production-ready yet.

Why not?

  • #email and #password might not be stable selectors.
  • button[type="submit"] can break if the form changes.
  • h1 is too generic if the page has multiple headings.
  • There is no assertion that the user is actually authenticated beyond a single heading.
  • There is no handling for redirects, loading states, or error cases.

The code is syntactically plausible, which is exactly why teams sometimes trust it too quickly.

How I review AI-generated Playwright code

When I review AI-generated Playwright code, I look at it in the same way I review human-written automation. I ask four questions:

1. Are the selectors stable?

Prefer locators that reflect user intent, not implementation detail.

Good candidates:

  • getByRole('button', { name: 'Sign in' })
  • getByLabel('Email')
  • getByTestId('login-submit')

Riskier candidates:

  • .page > div:nth-child(2) > button
  • #root > main > section > div > div:nth-of-type(3)
  • generic CSS classes that come from a CSS framework and may change

A better version of the earlier test might be:

import { test, expect } from '@playwright/test';

test('user can log in', async ({ page }) => {
  await page.goto('/login');

  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('secret123');
  await page.getByRole('button', { name: 'Sign in' }).click();

  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
  await expect(page.getByText('user@example.com')).toBeVisible();
});

2. Does it wait for the right thing?

One of the most common mistakes in generated tests is unnecessary sleep-based waiting. If ChatGPT suggests waitForTimeout, I usually delete it unless it is part of a debugging step.

Avoid this:

await page.waitForTimeout(3000);

Prefer assertions and state-based waits:

await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();

Playwright’s auto-waiting is strong, but it still depends on good locators and realistic assertions. If the app uses XHR-heavy navigation, you may need to wait for specific UI state or a network response, not arbitrary time.
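Waiting for a network response, rather than for a fixed time, might look like the sketch below. Playwright's `page.waitForResponse` accepts a predicate, and the important detail is to start waiting before triggering the action so a fast response is not missed. The helper name `actAndWaitForResponse`, the `PageLike` and `ResponseLike` interfaces, and the `/api/session` endpoint are assumptions for illustration. The interfaces are minimal structural stand-ins so the sketch runs without Playwright installed.

```typescript
// Minimal stand-ins for the parts of Playwright's Page and Response used here.
// In a real suite, pass Playwright's own `page` object instead.
interface ResponseLike {
  url(): string;
  ok(): boolean;
}

interface PageLike {
  waitForResponse(
    predicate: (response: ResponseLike) => boolean,
  ): Promise<ResponseLike>;
}

// Hypothetical helper: runs an action and resolves once a successful
// response whose URL contains `urlPart` has been observed.
async function actAndWaitForResponse(
  page: PageLike,
  urlPart: string,
  action: () => Promise<void>,
): Promise<ResponseLike> {
  // Register the wait first, then act, then await the response.
  const responsePromise = page.waitForResponse(
    (response) => response.url().includes(urlPart) && response.ok(),
  );
  await action();
  return responsePromise;
}
```

In a test, a call could look like `await actAndWaitForResponse(page, '/api/session', () => page.getByRole('button', { name: 'Sign in' }).click());`, followed by the usual visibility assertion on the dashboard heading. The endpoint is hypothetical, so substitute whatever request your app actually makes after login.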

3. Is the test scoped correctly?

ChatGPT sometimes turns one user journey into one giant test. That is fine for a quick example, but not ideal for a suite.

A test should usually verify one business behavior. If the generated code logs in, updates a profile, adds an item to the cart, and checks out all in one file, split it.

4. Does it assume too much?

Models love to invent helper functions, page objects, fixtures, or selectors that were never provided. If you have not already standardized on a test architecture, you can end up with generated code that is internally inconsistent.

That is why I like to give ChatGPT a known pattern, then ask it to fit inside it.

A better workflow: prompt, generate, then constrain

The most effective way to generate Playwright tests with ChatGPT is not to ask for a perfect full suite. It is to ask for one narrow slice of the test, then refine it.

I use a three-step workflow:

Step 1, generate the skeleton

Start with the user flow and a page structure.

Write a Playwright test for the checkout flow. The app has these accessible labels:

  • Email
  • Password
  • Cart
  • Checkout
  • Place order

After successful checkout, the confirmation page shows a heading called Order confirmed. Use TypeScript and Playwright test runner syntax.

Step 2, enforce your team’s conventions

Ask ChatGPT to adapt the result to your standards.

Revise the test to use only getByRole, getByLabel, and getByTestId locators. Remove any hard-coded sleep or arbitrary timeout. Keep the test under 40 lines if possible.

Step 3, add reliability checks

Ask for error handling, setup, or preconditions.

Add assertions that verify the user is logged in before checkout starts. Assume the app may redirect after login. Wait for a cart badge count to appear before clicking Checkout.

This workflow works better than trying to get ChatGPT to generate everything in one shot, because each pass can remove a class of mistakes.

Common failure modes in ChatGPT Playwright tests

There are a few recurring problems I see in ChatGPT Playwright tests.

Invented selectors

The model may generate selectors for elements that do not exist. This happens more often when the prompt is vague.

Fragile CSS paths

AI often falls back to long CSS chains, especially if you do not tell it to prioritize accessible locators.

Overuse of page.locator

page.locator('div') is not a strategy. It is a placeholder.

Bad assumptions about auth

The model may assume a static username/password flow, even when your app uses SSO, MFA, magic links, or an API-backed test login.

Missing test data setup

Generated tests often start by navigating to the UI without considering fixtures, database state, or API seeding.

Ignoring CI realities

A test that passes locally can fail in CI because of viewport size, slow network, authentication state, or browser differences. ChatGPT does not know your pipeline.

If your team runs tests in continuous integration, remember that CI is not just a machine that runs code. It is part of the test environment, so when the environment changes, test behavior can change with it.
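One way to reduce the gap between local and CI runs is to pin the environment in the Playwright configuration rather than in each generated test. The fragment below is a sketch under assumptions: the `./tests` directory, the `E2E_BASE_URL` variable name, and the localhost fallback are placeholders, not values from any real project.

```typescript
// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './tests', // assumed location of the spec files

  // Retry only in CI, so flaky tests stay visible during local runs.
  retries: process.env.CI ? 2 : 0,

  use: {
    // Lets tests call page.goto('/login') instead of hard-coding a host.
    // E2E_BASE_URL is a hypothetical variable name.
    baseURL: process.env.E2E_BASE_URL ?? 'http://localhost:3000',

    // Record a trace when a test is retried, which helps debug CI-only failures.
    trace: 'on-first-retry',
  },

  // Device descriptors pin viewport and user agent per browser,
  // so local and CI runs use the same settings.
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
  ],
});
```

With a `baseURL` in place, relative paths such as `page.goto('/login')` resolve against it, which keeps environment details out of the generated test files.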

How to make prompts more specific

Here are a few prompt upgrades that make a real difference.

Give the DOM hints, not just the journey

Instead of this:

Write a test for the signup flow.

Try this:

Write a Playwright test for the signup flow. The form has labels for Full name, Email, and Password. The submit button is named Create account. After success, the page shows a heading Welcome aboard. Use accessible locators only.

Tell it what not to do

Do not use waitForTimeout. Do not use xpath. Do not invent page object classes. Do not introduce helpers that are not needed.
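Models do not always obey negative instructions, so it can help to check the output mechanically before a human reviews it. The sketch below is a deliberately naive, line-based scan. `findRedFlags` and the rule list are hypothetical, and the regular expressions are rough heuristics rather than a real linter, so expect both misses and false positives.

```typescript
// A flagged line in generated test source.
interface RedFlag {
  rule: string;
  line: number; // 1-based line number
  text: string; // trimmed source line
}

// Heuristic patterns for the "do not" rules in the prompt above.
const RULES: { rule: string; pattern: RegExp }[] = [
  { rule: 'hard-coded sleep (waitForTimeout)', pattern: /\.waitForTimeout\s*\(/ },
  { rule: 'XPath selector', pattern: /xpath=|['"`]\/\// },
  { rule: 'positional CSS selector', pattern: /:nth-(child|of-type)\(/ },
  { rule: 'bare tag locator', pattern: /\.locator\(\s*['"`](div|span)['"`]\s*\)/ },
];

// Scan the source line by line and collect every rule that matches.
function findRedFlags(source: string): RedFlag[] {
  const flags: RedFlag[] = [];
  source.split('\n').forEach((text, index) => {
    for (const { rule, pattern } of RULES) {
      if (pattern.test(text)) {
        flags.push({ rule, line: index + 1, text: text.trim() });
      }
    }
  });
  return flags;
}

// Example: two of these four lines should be flagged.
const sampleSource = [
  "await page.goto('/login');",
  'await page.waitForTimeout(3000);',
  "await page.locator('#root > div:nth-child(2) > button').click();",
  "await page.getByRole('button', { name: 'Sign in' }).click();",
].join('\n');

for (const flag of findRedFlags(sampleSource)) {
  console.log(`line ${flag.line}: ${flag.rule}`);
}
```

A check like this does not replace review. It only catches the most obvious violations early, so the reviewer can spend time on locator stability and assertions instead.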

Ask for edge cases separately

Do not ask for happy path and validation errors in the same prompt if you want clean output. Generate separate tests:

  • happy path signup
  • invalid email validation
  • duplicate email error
  • password policy error

That leads to smaller, more readable files.

Where ChatGPT helps a lot in test automation

I do not want to make it sound like ChatGPT is not useful. It is useful, especially for first drafts.

It can help you:

  • Translate manual test cases into executable steps
  • Draft fixture setup and teardown
  • Suggest assertion patterns
  • Convert an old Selenium idea into Playwright syntax
  • Generate data-driven test cases
  • Refactor repetitive steps into helper functions

For example, if you already have a Selenium test in Python, ChatGPT can often help you map the idea into Playwright TypeScript. But you still need to review the result, because framework translation is not the same as system design.

If you want a broader framework comparison, I also recommend reading Playwright vs Selenium in 2026.

Why generated code often becomes maintenance debt

The biggest problem with prompt-generated test code is not that it is wrong on day one. It is that it ages poorly.

Here is what maintenance debt looks like in practice:

  • Locators tied to layout instead of behavior
  • Repeated login steps in every file
  • Inconsistent naming across tests
  • Overly long tests with multiple responsibilities
  • Fake waits that mask timing issues until CI starts failing
  • Generated helpers that nobody on the team understands

Once that happens, you are no longer maintaining only your application. You are also maintaining the prompts, the generated code, and the assumptions that produced it.

That is why I often say AI-generated tests are cheap to create and expensive to own.

If a test suite is hard to explain, it is usually hard to trust.

When ChatGPT is the right tool

Use ChatGPT for Playwright when you need speed, a prototype, or a learning aid.

Good use cases:

  • You are exploring Playwright for the first time.
  • You need a quick example for a new page flow.
  • You already have a test architecture and want to fill in repetitive code.
  • You are converting a manual scenario into a draft automated test.
  • You want help rewriting a selector or assertion pattern.

In those cases, the model can save time.

When you should stop generating code and change the workflow

If your team spends more time repairing generated tests than getting value from them, the problem is probably not the prompt. The problem is the workflow.

You should reconsider AI-generated test code when:

  • Non-developers need to contribute tests
  • The suite needs to be maintained by QA, product, and engineering together
  • You want less framework ownership, not more
  • Your tests keep breaking because of selector churn
  • You are spending CI time debugging the test harness instead of product behavior

This is where a managed platform can be a better fit than a code-first approach. If your real objective is reliable test automation, not a larger pile of generated Playwright files, a platform like Endtest’s agentic AI workflow can be the more practical path. Instead of producing more code to babysit, Endtest creates editable platform-native steps inside the product, which can be easier for the whole team to maintain.

That difference matters. Playwright gives you a powerful library and full control. Endtest is a useful comparison point against Playwright when you want automation that depends less on writing and owning framework code.

A practical decision matrix

Here is how I think about it.

Choose ChatGPT plus Playwright when:

  • Your team is comfortable reviewing code
  • You already have a stable Playwright setup
  • You need custom logic or deep integration with app internals
  • You are okay owning framework maintenance

Choose a platform-first workflow when:

  • You want faster authoring with less code ownership
  • You need broader team participation
  • You care more about resilience than framework flexibility
  • You want to reduce the amount of generated code that must be debugged later

This is why I often compare prompt-generated Playwright code to a maintenance shortcut. It solves the immediate problem, but it can create a second-order problem if nobody owns the output carefully. I wrote more about that tradeoff in "AI Playwright testing, shortcut or maintenance trap?"

A final example, from prompt to improved test

Let’s say your first prompt produces this:

import { test, expect } from '@playwright/test';
test('checkout', async ({ page }) => {
  await page.goto('https://shop.example.com');
  await page.click('text=Login');
  await page.fill('#email', 'user@example.com');
  await page.fill('#password', 'secret');
  await page.click('button[type="submit"]');
  await page.click('text=Cart');
  await page.click('text=Checkout');
  await expect(page.locator('h1')).toContainText('Thank you');
});

A more maintainable version might be:

import { test, expect } from '@playwright/test';

test('logged-in user can complete checkout', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('link', { name: 'Login' }).click();
  await page.getByLabel('Email').fill(process.env.E2E_EMAIL ?? 'user@example.com');
  await page.getByLabel('Password').fill(process.env.E2E_PASSWORD ?? 'secret');
  await page.getByRole('button', { name: 'Sign in' }).click();

  await expect(page.getByRole('navigation')).toContainText('Cart');
  await page.getByRole('link', { name: 'Cart' }).click();
  await page.getByRole('button', { name: 'Checkout' }).click();

  await expect(page.getByRole('heading', { name: 'Order confirmed' })).toBeVisible();
});

The second version is still not perfect, but it is much closer to something I would be willing to keep in a shared suite.

My rule of thumb

If ChatGPT helps you get from zero to a good test faster, use it.

If ChatGPT helps your team understand the flow, use it.

If ChatGPT becomes the source of brittle selectors, confusing helpers, and constant rewrites, stop using it as a code generator and reconsider the workflow.

For teams that want test automation to be reliable and collaborative, not just code-heavy, platform-based approaches can be easier to live with. That is especially true when the broader team, not only developers, needs to create and maintain tests.

Conclusion

You can absolutely generate Playwright tests with ChatGPT, and in the right context it is a useful productivity boost. The trick is to treat the result as a draft, not a finished asset. Give the model strong constraints, use accessible locators, avoid arbitrary waits, and review the output with the same skepticism you would apply to any unfamiliar test code.

But if your core problem is not writing code faster, and instead building dependable automation that the team can maintain without constantly debugging the framework, then prompt-generated code may be the wrong abstraction. In that case, a managed, agentic AI testing workflow such as Endtest can be a better fit than generating more Playwright files to own.