How to Generate Playwright Tests with GitHub Copilot

If you have ever stared at a blank test file and wondered whether GitHub Copilot could save you ten minutes, the short answer is yes. The longer answer is more useful: Copilot can help you generate Playwright tests much faster than writing every line by hand, but the output will only be as reliable as your prompts, your understanding of the app, and your review process.

I use Copilot as a drafting assistant, not as an authority. That distinction matters. In test automation, a test that looks plausible is not the same as a test that is stable, deterministic, and valuable. AI can generate the first version quickly, but it cannot infer your product risks, your flaky selectors, or the business behavior that should actually be asserted.

In this tutorial, I will show you how I approach Copilot Playwright tests in practice, what good prompts look like, where the generated code tends to go wrong, and how to decide when a managed platform like Endtest is the simpler choice because you get platform-native editable tests instead of maintaining AI-generated Playwright code.

What Copilot is good at in Playwright work

GitHub Copilot is useful for repetitive scaffolding. For Playwright, that usually means:

generating a basic test file structure
suggesting locator patterns
filling in common assertions
writing boilerplate around navigation, forms, and waits
converting a rough natural-language flow into executable code

This makes it especially handy when you already know the test you want, but you do not want to type the same setup code over and over. For example, if you are building several smoke tests with similar login steps, Copilot can draft the routine parts quickly.

Playwright itself is a strong fit for this because its API is expressive and discoverable. The official docs already encourage writing tests in a concise style, so Copilot has a lot of public examples to imitate.

Copilot is best at pattern completion, not product understanding.

That is why it works well for common flows and much less well for ambiguous flows. If your app has dynamic forms, inconsistent accessibility labels, or branching business rules, AI-generated code will often need manual correction.

The kind of Playwright test Copilot can generate well

A good Copilot target is a test with a clear path, stable UI, and obvious success criteria. For example, login, search, form submission, and checkout smoke tests are all reasonable starting points.

Here is a simple Playwright test structure Copilot might help draft:

import { test, expect } from '@playwright/test';

test('user can log in', async ({ page }) => {
  await page.goto('https://example.com/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('secret123');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});

That example is simple, but that is the point. Copilot does best when the requirement is clean and the application exposes accessible labels and roles. If your product is already built with accessibility in mind, the AI has better material to work with, and the tests are easier to read later.

How I prompt Copilot for Playwright tests

If you want better output, do not ask Copilot to just “write a Playwright test.” That prompt is too vague. Instead, give it the same details you would give a human teammate:

the user journey
the page or route
the selectors you prefer
the expected assertions
the data setup assumptions
the test type, such as smoke, regression, or accessibility

A good prompt in a comment or chat window often looks like this:

// Create a Playwright test for the following flow:
// 1. Open the login page
// 2. Sign in with a valid test user
// 3. Verify that the dashboard loads
// 4. Assert that the account menu is visible
// Use role-based locators where possible and keep the test readable.

That kind of prompt gives Copilot enough context to draft useful code. If your team has preferred conventions, include them too. For example, if you always use getByRole before CSS selectors, say so explicitly.

Prompting for selectors matters

One of the easiest ways to get flaky AI Playwright code is to let Copilot invent selectors from thin air. It might choose brittle CSS paths or selectors tied to layout structure instead of user-facing semantics.

Prefer prompts like these:

use getByRole for buttons, links, headings, and dialogs
use getByLabel for form inputs
use getByTestId only when semantic selectors are not possible
avoid long CSS chains unless the component truly needs them

That matters because Playwright tests age better when they target intent, not implementation details.

A practical workflow for generating a test

I usually think of the workflow in four steps.

1. Write the flow in plain English

Start with a short, structured description of the user journey. This helps you define the test before code enters the picture.

Example:

open the pricing page
click start trial
complete the signup form
confirm the confirmation screen appears
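If you write flows like this often, a small helper can turn the list into a consistent prompt comment. The sketch below is only an illustration: `buildCopilotPrompt`, its option names, and the two convention strings are hypothetical, not part of Copilot or Playwright.

```typescript
// Hypothetical helper: turns a plain-English flow into a prompt comment for Copilot.
interface PromptOptions {
  title: string;         // short name of the journey
  steps: string[];       // ordered user actions
  conventions: string[]; // team rules the generated code should follow
}

function buildCopilotPrompt({ title, steps, conventions }: PromptOptions): string {
  const lines = [`// Create a Playwright test for the following flow: ${title}`];
  steps.forEach((step, index) => lines.push(`// ${index + 1}. ${step}`));
  conventions.forEach((rule) => lines.push(`// ${rule}`));
  return lines.join('\n');
}

const signupPrompt = buildCopilotPrompt({
  title: 'trial signup',
  steps: [
    'Open the pricing page',
    'Click "Start trial"',
    'Complete the signup form',
    'Confirm the confirmation screen appears',
  ],
  conventions: ['Use role-based locators where possible.', 'Avoid fixed timeouts.'],
});

console.log(signupPrompt);
```

The point is not the helper itself but the habit: every prompt carries the same numbered steps and the same conventions, so the drafts you get back are easier to compare and review.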

2. Ask Copilot to draft the code

Let Copilot generate the first version. If you are using VS Code, keep the context small and relevant. The more unrelated code it sees, the more likely it is to copy patterns that do not fit your test.

3. Refine locators and assertions

This is where human review matters. Check each selector, every wait, and every assertion. Make sure the test validates user behavior, not just the absence of an error.

4. Run, inspect, and stabilize

A test that passes once is not finished. Run it locally, then in CI, then against realistic data. If it is flaky, find out whether the issue is timing, unstable selectors, or a genuine product defect.

What good Copilot-generated Playwright code looks like

A stronger test usually includes these traits:

clear naming
one main user journey per test
semantic locators
explicit assertions on visible behavior
minimal unnecessary waits
test data that is easy to reset

Here is a slightly more realistic example:

import { test, expect } from '@playwright/test';

test('customer can submit the contact form', async ({ page }) => {
  await page.goto('/contact');

  await page.getByLabel('Name').fill('Jordan Lee');
  await page.getByLabel('Email').fill('jordan@example.com');
  await page.getByLabel('Message').fill('I need help with billing.');
  await page.getByRole('button', { name: 'Send message' }).click();

  await expect(page.getByRole('alert')).toHaveText(/thanks for reaching out/i);
});

This is the kind of test Copilot can generate reasonably well if you give it a clear prompt and a page with accessible controls. But it still needs review for selectors, messaging, and error handling.

Where AI-generated test code tends to fail

This is the part people skip when they first experiment with GitHub Copilot test automation.

Brittle selectors

Copilot often reaches for whatever selector it sees most often in examples. That may be page.locator('.btn-primary:nth-child(2)') or a similar pattern that works today and breaks after a UI refactor.

Incorrect waits

AI often overuses static timeouts or assumes waitForTimeout is a valid synchronization strategy. It is usually not. In Playwright, you want state-based waiting and assertions that wait for conditions.

Bad example:

await page.waitForTimeout(5000);

Better approach:

await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
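To make the difference concrete, here is a conceptual sketch of condition-based waiting. It is not Playwright's implementation; `waitUntil`, its default timings, and the simulated `dashboardVisible` flag are all made up for illustration. The idea is that the wait ends as soon as the condition holds, and fails loudly if it never does.

```typescript
// Conceptual sketch only: this is NOT how Playwright is implemented internally.
async function waitUntil(
  condition: () => boolean,
  timeoutMs = 2000,
  intervalMs = 20,
): Promise<number> {
  const start = Date.now();
  while (!condition()) {
    if (Date.now() - start >= timeoutMs) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    // Yield briefly, then check the condition again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  return Date.now() - start; // elapsed milliseconds
}

// Simulated UI state that becomes ready after roughly 100 ms.
let dashboardVisible = false;
setTimeout(() => {
  dashboardVisible = true;
}, 100);

waitUntil(() => dashboardVisible).then((elapsed) => {
  console.log(`ready after about ${elapsed} ms instead of a fixed 5000 ms`);
});
```

A fixed delay is either too long (slow suite) or too short (flaky suite). A condition-based wait is neither, which is why assertions that wait on visible state are usually the better default.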

Missing test data strategy

Generated tests often assume the environment already contains the exact user, order, or record needed for the scenario. That may be fine for a demo, but it is not a strategy.
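One lightweight way to avoid that assumption is to have each test build its own data. The sketch below assumes a hypothetical `makeTestUser` factory and an `example.com` mailbox; how the user actually gets created (API call, seed script, fixture) depends on your application and is not shown.

```typescript
// Hypothetical test-data factory: each test gets its own user instead of assuming one exists.
interface TestUser {
  email: string;
  password: string;
  displayName: string;
}

function makeTestUser(scenario: string, uniqueId: string): TestUser {
  // Normalize the scenario name so it is safe inside an email address.
  const slug = scenario
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
  return {
    email: `${slug}-${uniqueId}@example.com`,
    password: `Pw-${uniqueId}`,
    displayName: `Test ${scenario}`,
  };
}

// In a real suite the unique part might come from a timestamp or a UUID;
// it is passed in here so the function stays deterministic.
const checkoutUser = makeTestUser('Checkout smoke', 'run42');
console.log(checkoutUser.email); // checkout-smoke-run42@example.com
```

Because the email encodes the scenario and the run, leftover records are easy to spot and delete, and two parallel runs do not fight over the same account.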

Overlapping responsibilities

Copilot may bundle too many behaviors into one test. If the flow becomes too long, failure diagnosis gets harder. A good end-to-end test should still be focused.

Weak assertions

Sometimes AI-generated code only checks that a page loaded, not that the correct behavior happened. For example, asserting that a modal appeared is weaker than asserting that the modal contains the right user-specific message.

A passing test that does not verify the business rule is just expensive documentation.

Manual review checklist before you commit

Whenever I review Copilot Playwright tests, I ask the same questions.

Does the test match a real user journey?
Are the locators semantic and maintainable?
Does the test wait on visible UI state instead of arbitrary delays?
Is there one main reason for failure, or many?
Would I trust this test in CI?
Will a future teammate understand it in six months?

If you cannot answer yes to most of these, the AI helped you draft code, but you still need to engineer the test.
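Some of these checks can be partly automated before a human ever reads the draft. The following is a rough sketch, not a real linter: `reviewGeneratedTest` and its three regex rules are hypothetical heuristics that will miss cases and occasionally flag legitimate code.

```typescript
// Hypothetical review helper: flags patterns in generated test source that deserve a second look.
interface ReviewFinding {
  line: number; // 1-based line number in the source
  rule: string;
  snippet: string;
}

// Illustrative heuristics only; tune them to your own conventions.
const reviewRules: { rule: string; pattern: RegExp }[] = [
  { rule: 'fixed timeout', pattern: /waitForTimeout\s*\(/ },
  { rule: 'positional CSS selector', pattern: /:nth-child\(|:nth-of-type\(/ },
  { rule: 'XPath selector', pattern: /locator\(\s*['"`](?:xpath=|\/\/)/ },
];

function reviewGeneratedTest(source: string): ReviewFinding[] {
  const findings: ReviewFinding[] = [];
  source.split('\n').forEach((text, index) => {
    for (const { rule, pattern } of reviewRules) {
      if (pattern.test(text)) {
        findings.push({ line: index + 1, rule, snippet: text.trim() });
      }
    }
  });
  return findings;
}

const draft = [
  "await page.goto('/login');",
  "await page.locator('.btn-primary:nth-child(2)').click();",
  'await page.waitForTimeout(5000);',
  "await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();",
].join('\n');

for (const finding of reviewGeneratedTest(draft)) {
  console.log(`line ${finding.line}: ${finding.rule} -> ${finding.snippet}`);
}
```

For the sample draft, this flags line 2 as a positional CSS selector and line 3 as a fixed timeout. A script like this cannot judge whether a test matches a real user journey; it only clears the mechanical problems out of the way so the review can focus on the questions that need judgment.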

Making Copilot work better with your Playwright conventions

Copilot becomes more reliable when your codebase is consistent. If your team already has patterns, make them easy for the model to follow.

Keep test helpers discoverable

If you have a login helper, page object, or fixture, keep it simple and colocated. Copilot can often reuse patterns that are close to the file it is editing.
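As a rough illustration, a login helper can be as small as the sketch below. To keep it dependency-free, it declares minimal stand-in interfaces for the few page methods it calls; those interfaces, the `signIn` name, and the `/login` route (which assumes a configured base URL) are assumptions. In a real suite you would import the `Page` type from `@playwright/test` instead.

```typescript
// Minimal structural stand-ins for the parts of a Playwright page used here.
// They are assumptions made so this sketch has no dependencies.
interface FillableLocator {
  fill(value: string): Promise<void>;
}
interface ClickableLocator {
  click(): Promise<void>;
}
interface LoginPageLike {
  goto(url: string): Promise<unknown>;
  getByLabel(text: string): FillableLocator;
  getByRole(role: 'button', options: { name: string }): ClickableLocator;
}

interface LoginCredentials {
  email: string;
  password: string;
}

// Hypothetical shared helper: one place that knows how to sign in.
async function signIn(page: LoginPageLike, credentials: LoginCredentials): Promise<void> {
  await page.goto('/login');
  await page.getByLabel('Email').fill(credentials.email);
  await page.getByLabel('Password').fill(credentials.password);
  await page.getByRole('button', { name: 'Sign in' }).click();
}

// Inside a test this would be called as:
//   await signIn(page, { email: 'user@example.com', password: 'secret123' });
```

When a helper like this sits next to the tests, Copilot tends to suggest calling it rather than re-typing the login steps, and a change to the login form only has to be fixed in one place.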

Standardize locator strategy

For example, if your team prefers accessibility-first locators, document it and use it consistently. This improves both generated code and human-written code.

Give examples in the repository

Copilot draws its context from the surrounding file and nearby code. A few well-written tests are more useful than a large pile of inconsistent ones.

Make test IDs intentional

If your app cannot expose good roles or labels everywhere, add test IDs with a naming convention. This gives Copilot a fallback that is still maintainable.
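A naming convention is easier to keep when something enforces it. The sketch below assumes one possible convention, `<area>-<element>-<kind>` in lowercase kebab-case; the `buildTestId` helper and the list of kinds are hypothetical.

```typescript
// Hypothetical naming convention: <area>-<element>-<kind>, all lowercase kebab-case.
type ElementKind = 'button' | 'input' | 'link' | 'dialog';

function buildTestId(area: string, element: string, kind: ElementKind): string {
  const toKebab = (value: string): string =>
    value
      .trim()
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, '-')
      .replace(/^-+|-+$/g, '');
  const parts = [toKebab(area), toKebab(element)];
  if (parts.some((part) => part.length === 0)) {
    throw new Error('area and element must contain at least one letter or digit');
  }
  return [...parts, kind].join('-');
}

console.log(buildTestId('Checkout', 'Apply coupon', 'button')); // checkout-apply-coupon-button
```

The same string would then appear in the markup's test ID attribute and in the test as `page.getByTestId('checkout-apply-coupon-button')`, so a predictable pattern gives both humans and Copilot something consistent to complete.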

A good CI/CD setup for AI-assisted Playwright tests

Generating tests is only one part of the story. You still need to run them in CI/CD, triage failures, and keep the suite stable.

For a simple GitHub Actions pipeline, you might use something like this:

name: Playwright tests

on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test

This setup is straightforward, but it still implies maintenance. You own the test runner, the browser installs, the execution environment, and the failure analysis. Copilot does not remove that operational cost.

That matters when you compare AI-assisted code generation with a managed platform.

When Endtest is the simpler path

If your team wants end-to-end coverage without owning a codebase of AI-generated Playwright tests, Endtest is worth a serious look. It is an agentic AI test automation platform with low-code and no-code workflows, and its AI Test Creation Agent creates standard editable Endtest steps inside the platform rather than generating source code that your team then has to maintain.

That difference is important.

With Playwright plus Copilot, you still end up with code, a framework to own, CI setup, browser management, and test maintenance. With Endtest, the tests live as platform-native editable steps, which is simpler for teams that do not want to manage TypeScript or Python test debt.

This is especially useful when:

QA, product, or design team members need to author tests
your organization does not want to own framework infrastructure
you want a managed workflow instead of generated code review
you care more about shared usability than framework flexibility

In other words, Copilot can accelerate Playwright test writing, but it does not change the fact that Playwright is still a library your team has to operate. Endtest is positioned as a Playwright alternative for teams that want the coverage without the code ownership burden.

Choosing between Copilot plus Playwright and a managed platform

I would use Copilot with Playwright when:

the team is comfortable with code
you want deep customization
you already have a test engineering practice
you need precise control over fixtures, APIs, and browser behavior

I would consider Endtest when:

the goal is faster team-wide test authoring
you want editable tests without framework upkeep
you prefer managed execution over homegrown CI plumbing
you want a simpler path for non-developers to participate

A practical way to think about it is this: Copilot helps you write faster, while Endtest reduces the amount you need to own.

Common mistakes to avoid

Here are the mistakes I see most often when teams first start generating Playwright tests with GitHub Copilot.

Treating the first draft as production ready

The first output is a draft. Always verify selectors, waits, and assertions.

Generating one giant end-to-end test

Long tests are harder to debug and more likely to fail for unrelated reasons.

Ignoring accessibility selectors

If the app has good roles and labels, use them. It makes tests more robust and readable.

Skipping cleanup

If a test creates data, make sure you can reset or isolate that state. Otherwise, CI becomes noisy and unreliable.

Assuming AI understands your release risk

Copilot does not know which flows are high risk in your domain. It cannot prioritize tests the way a human SDET can.

My recommendation for teams

If you are a developer or SDET who wants to move faster, use Copilot to draft Playwright tests, but keep the human review loop tight. Start with clean, high-value flows, use semantic locators, and reject anything that depends on brittle timing or vague assertions.

If your team is spending too much time maintaining AI-generated Playwright code, or if the people who need to author tests do not want to live in TypeScript, a managed platform can be a better investment. That is where Endtest stands out, because it gives teams editable, platform-native tests without turning test creation into another code maintenance problem.

For a broader discussion of the tradeoffs between AI-generated Playwright code and maintainability, I also recommend reading "AI Playwright Testing: Useful Shortcut or Maintenance Trap?"

Final takeaway

To generate Playwright tests with GitHub Copilot effectively, give it a precise flow, review the locators and assertions, and run the result like any other production-quality test. Copilot can save time, but it is not a substitute for test design.

The real value comes when you combine AI assistance with sound SDET judgment. That is what keeps your suite useful instead of just large.