Selenium Page Object Model Tutorial

If you have maintained a Selenium suite for more than a few months, you already know the usual pain points: duplicated locators, brittle selectors, and tests that break because a button moved to a different component. The Page Object Model is the pattern most teams reach for to tame that complexity. It is not magic, but when used well, it gives you a cleaner boundary between test intent and UI mechanics.

This Selenium Page Object Model tutorial walks through the pattern from a practical SDET point of view. I will show why it helps, what it does not solve, and how to implement it in both Python and Java. I will also cover common mistakes, where to place waits, and when a different approach can be a better fit for some teams, including a no-code alternative like Endtest, an agentic AI [test automation](https://en.wikipedia.org/wiki/Test_automation) platform.

What the Page Object Model actually is

Page Object Model, often shortened to POM, is a design pattern for test automation where each page, screen, or meaningful UI surface is represented by a class. That class stores locators and exposes behaviors, while the test itself focuses on business flow.

At a high level:

Tests say what the user is trying to do.
Page objects know how to do it in the UI.
Locators and wait logic live in one place instead of being scattered across many tests.

The real goal of Selenium POM is not abstraction for its own sake; it is reducing the cost of UI change.

A good page object should answer questions like:

How do I enter credentials on the login page?
How do I submit the form?
How do I detect that the page is ready?

A bad page object becomes a dumping ground for every utility method in the codebase. That is when the pattern starts to look heavier than the problem it solves.

Why Selenium suites become hard to maintain without POM

Directly scripting Selenium actions in tests is fine when you are exploring a product or writing a tiny proof of concept. It becomes painful when the suite grows.

Consider this style of test:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Chrome()
browser.get("https://example.com/login")

browser.find_element(By.ID, "email").send_keys("qa@example.com")
browser.find_element(By.ID, "password").send_keys("secret")
browser.find_element(By.CSS_SELECTOR, "button[type='submit']").click()

WebDriverWait(browser, 10).until(
    EC.visibility_of_element_located((By.ID, "dashboard"))
)

This is readable enough for one test. Now imagine 80 tests with the same login sequence, and the login form changes. You will update the same locators everywhere, and if a wait is slightly off, debugging becomes tedious.

POM helps because the login flow becomes a reusable object, and the test files stay focused on behavior.

What belongs in a page object, and what does not

A page object should usually contain:

Locators for elements on that page or component
Interaction methods, such as login(), search(), add_to_cart()
Assertions that are tightly tied to page state, sometimes
A page-ready check, such as waiting for a key element

A page object should usually not contain:

Test data setup
Cross-page business flows that belong in the test or a workflow layer
Assertions for unrelated application behavior
Environment-specific configuration
Low-level driver creation

One practical rule I use is this:

If a method reads like a user action on a specific page, it likely belongs in the page object. If it reads like a business scenario spanning multiple screens, it probably belongs in the test or a flow class.
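
To make that rule concrete, here is a minimal sketch of a flow class sitting above the page objects. `CheckoutFlow`, the catalog and cart pages, and their method names are all hypothetical; the only assumption is that each page object exposes single-page actions such as `login()`, `search()`, and `add_to_cart()`.

```python
class CheckoutFlow:
    """Hypothetical workflow layer: a business scenario spanning several pages."""

    def __init__(self, login_page, catalog_page, cart_page):
        # Page objects are injected, so the flow never touches locators
        # or the driver directly.
        self.login_page = login_page
        self.catalog_page = catalog_page
        self.cart_page = cart_page

    def buy_as(self, email, password, product_name):
        # Each call below is a single-page user action owned by a page object.
        self.login_page.login(email, password)
        self.catalog_page.search(product_name)
        self.catalog_page.add_to_cart(product_name)
        # item_names() is an assumed state query on the cart page.
        return self.cart_page.item_names()
```

A test can then call `flow.buy_as(...)` and assert on the returned cart contents, while the individual page classes stay free of cross-page logic.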

Selenium POM example in Python

Here is a small but realistic Python implementation using Selenium. I am using a base page for shared wait helpers, a login page object, and a test that reads like a user flow.

Base page

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

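class BasePage:
    def __init__(self, driver):
        self.driver = driver
        # One shared explicit wait with a 10-second timeout.
        self.wait = WebDriverWait(driver, 10)

    def click(self, locator):
        # Wait until the element is clickable, then click it.
        self.wait.until(EC.element_to_be_clickable(locator)).click()

    def type(self, locator, text):
        # Wait for visibility, then replace any existing value.
        element = self.wait.until(EC.visibility_of_element_located(locator))
        element.clear()
        element.send_keys(text)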

Login page

from selenium.webdriver.common.by import By
from base_page import BasePage


class LoginPage(BasePage):
    EMAIL = (By.ID, "email")
    PASSWORD = (By.ID, "password")
    SUBMIT = (By.CSS_SELECTOR, "button[type='submit']")
    DASHBOARD = (By.ID, "dashboard")

    def open(self, url):
        self.driver.get(url)
        return self

    def login(self, email, password):
        self.type(self.EMAIL, email)
        self.type(self.PASSWORD, password)
        self.click(self.SUBMIT)
        return self

    def is_logged_in(self):
        # Raises TimeoutException if the dashboard never becomes visible.
        return self.wait.until(lambda d: d.find_element(*self.DASHBOARD).is_displayed())

Test

from selenium import webdriver
from login_page import LoginPage

def test_user_can_log_in():
    driver = webdriver.Chrome()
    try:
        page = LoginPage(driver)
        page.open("https://example.com/login").login("qa@example.com", "secret")
        assert page.is_logged_in()
    finally:
        driver.quit()

This is already a meaningful improvement over repeating raw Selenium calls in every test. The test itself now tells a story: open the login page, log in, assert success.

Selenium Page Object Model in Java

Java teams often use the same pattern, sometimes with PageFactory, sometimes with explicit By locators. I prefer explicit locators for most modern suites because they are easier to debug and less magical.

Base page

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

import java.time.Duration;

public class BasePage {
    protected WebDriver driver;
    protected WebDriverWait wait;

    public BasePage(WebDriver driver) {
        this.driver = driver;
        this.wait = new WebDriverWait(driver, Duration.ofSeconds(10));
    }

    // Wait until the element is clickable, then click it.
    protected void click(By locator) {
        wait.until(ExpectedConditions.elementToBeClickable(locator)).click();
    }

    // Wait for visibility, then replace any existing value.
    protected void type(By locator, String text) {
        WebElement element = wait.until(ExpectedConditions.visibilityOfElementLocated(locator));
        element.clear();
        element.sendKeys(text);
    }
}

Login page

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

public class LoginPage extends BasePage {
    private final By email = By.id("email");
    private final By password = By.id("password");
    private final By submit = By.cssSelector("button[type='submit']");
    private final By dashboard = By.id("dashboard");

    public LoginPage(WebDriver driver) {
        super(driver);
    }

    public LoginPage open(String url) {
        driver.get(url);
        return this;
    }

    public LoginPage login(String userEmail, String userPassword) {
        type(email, userEmail);
        type(password, userPassword);
        click(submit);
        return this;
    }

    public boolean isLoggedIn() {
        // Throws TimeoutException if the dashboard never becomes visible.
        return wait.until(d -> d.findElement(dashboard).isDisplayed());
    }
}

Test

import org.junit.jupiter.api.Test;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

import static org.junit.jupiter.api.Assertions.assertTrue;

public class LoginTest {
    @Test
    void userCanLogIn() {
        WebDriver driver = new ChromeDriver();
        try {
            LoginPage page = new LoginPage(driver);
            page.open("https://example.com/login").login("qa@example.com", "secret");
            assertTrue(page.isLoggedIn());
        } finally {
            driver.quit();
        }
    }
}

The locator strategy matters more than the pattern name

A lot of teams adopt Selenium POM and still end up with flaky tests because the locators are poor. The pattern does not fix bad selector strategy.

Prefer stable locators in this order when possible:

data-testid or similar test-only attributes
Unique IDs
Accessible roles and labels when supported by your framework and app structure
Well-scoped CSS selectors
XPath only when necessary

If your page object relies on long absolute XPath expressions, you are not buying much maintainability. You are just centralizing fragility.

A small example of a better locator choice:

LOGIN_BUTTON = (By.CSS_SELECTOR, "[data-testid='login-submit']")

This usually ages better than a selector tied to layout structure or a translated label.
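
If most of your locators follow this convention, a tiny helper can keep them consistent. The sketch below is an illustration, not part of Selenium: `by_test_id` is a hypothetical function name, and `login-email` is a made-up test id. It relies on the fact that `By.CSS_SELECTOR` is the string `"css selector"`, so the literal is used here to keep the snippet importable without a browser stack.

```python
def by_test_id(value, attribute="data-testid"):
    """Build a (strategy, selector) locator tuple for a test-only attribute."""
    # "css selector" is the string behind selenium's By.CSS_SELECTOR.
    # Escape backslashes and double quotes so the value stays a valid
    # quoted CSS attribute string.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return ("css selector", f'[{attribute}="{escaped}"]')


LOGIN_BUTTON = by_test_id("login-submit")
EMAIL_FIELD = by_test_id("login-email")  # hypothetical test id
```

The resulting tuples can be unpacked into `driver.find_element(*LOGIN_BUTTON)` or passed to an expected condition exactly like a hand-written locator.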

Where waits should live in a POM design

Waits are one of the most important topics in Selenium architecture. If you scatter sleep() calls through tests, you are building flakiness into the suite. If you hide all waits deep inside utilities with no clear contract, debugging becomes difficult.

My default approach is:

Use explicit waits in page objects for element readiness
Keep tests free of raw timing logic
Avoid implicit waits unless your team has a very specific reason, because mixing them with explicit waits can produce unpredictable wait times

The page object should know how long it takes for its important state to appear. For example, a dashboard page object might wait for a key widget or heading before returning control to the test.

This is especially helpful in CI/CD pipelines, where test timing can change due to environment load. If you want a refresher on the broader concept behind this, the Selenium documentation is the right place to check framework-specific wait behavior.
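
One way to express that readiness contract is a `wait_until_loaded()` method on the page object. The sketch below is an assumption-laden illustration: `DashboardPage`, the `dashboard-heading` id, and the method names are hypothetical, and the wait object is injected so the class only depends on something with an `until()` method. In a real suite that object would be `WebDriverWait(driver, timeout)`.

```python
class DashboardPage:
    """Hypothetical page object that owns its own readiness check."""

    # ("id", ...) is the tuple form of (By.ID, ...); the element id is made up.
    HEADING = ("id", "dashboard-heading")

    def __init__(self, driver, wait):
        self.driver = driver
        # In a real suite: wait = WebDriverWait(driver, 10)
        self.wait = wait

    def wait_until_loaded(self):
        # until() keeps calling the function with the driver until it
        # returns something truthy or the timeout expires.
        self.wait.until(lambda d: d.find_element(*self.HEADING).is_displayed())
        return self

    def heading_text(self):
        return self.driver.find_element(*self.HEADING).text
```

A test would call `DashboardPage(driver, wait).wait_until_loaded()` once and then interact with the page, with no timing logic of its own.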

Modeling pages versus components

For older applications with simple page structures, a one-class-per-page approach works fine. For modern apps with repeated UI patterns, component objects can be more useful than page-only classes.

Examples of reusable components:

Navigation bars
Modal dialogs
Date pickers
Tables and filters
Toast notifications

A page can compose components instead of owning everything itself.

For example, a checkout page may own the form fields, but delegate the address selector to an AddressModal component. This keeps page objects smaller and avoids copy-paste when the same component appears in multiple places.
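
A minimal sketch of that composition might look like the following. The `data-testid` values and method names are hypothetical, and the locator strategy is written as the plain string `"css selector"` (the value of `By.CSS_SELECTOR`) so the snippet has no import requirements. The component receives its root element and searches only inside it.

```python
class AddressModal:
    """Hypothetical component object scoped to the modal's root element."""

    STREET = ("css selector", "[data-testid='address-street']")
    SAVE = ("css selector", "[data-testid='address-save']")

    def __init__(self, root):
        # root is the modal's container element; lookups stay inside it.
        self.root = root

    def save_address(self, street):
        field = self.root.find_element(*self.STREET)
        field.clear()
        field.send_keys(street)
        self.root.find_element(*self.SAVE).click()


class CheckoutPage:
    ADDRESS_MODAL = ("css selector", "[data-testid='address-modal']")

    def __init__(self, driver):
        self.driver = driver

    def address_modal(self):
        # Composition: the page hands out a component instead of
        # re-implementing the modal's locators and actions.
        return AddressModal(self.driver.find_element(*self.ADDRESS_MODAL))
```

Any other page that shows the same modal can return an `AddressModal` built from its own root element, so the modal's locators exist in exactly one class.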

Common Page Object Model mistakes

Here are mistakes I see often when reviewing Selenium codebases.

1. Putting assertions everywhere

A page object can expose state checks, but if it becomes an assertion-heavy mini test suite, you lose clarity. Let tests own the scenario-level assertions.

2. Returning raw WebElements

If the test code starts manipulating WebElement objects directly, the abstraction is leaking. Prefer methods such as submit() or select_country("US") instead of returning element handles all the time.
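
As a rough illustration of intent-level methods, here is a sketch of a page object that never hands an element back to the test. `SignupPage` is hypothetical, and the code assumes a native `<select id="country">` whose options carry ISO country codes as values; `"css selector"` is the string value of `By.CSS_SELECTOR`.

```python
class SignupPage:
    """Hypothetical page object that exposes intent, not element handles."""

    SUBMIT = ("css selector", "button[type='submit']")

    def __init__(self, driver):
        self.driver = driver

    def select_country(self, code):
        # Assumes a native <select id="country"> with ISO codes as values.
        option = ("css selector", f"#country option[value='{code}']")
        self.driver.find_element(*option).click()
        return self

    def submit(self):
        self.driver.find_element(*self.SUBMIT).click()
        return self
```

The test then reads `SignupPage(driver).select_country("US").submit()`, and a markup change to the country field touches only this class.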

3. Creating a page object for every HTML file

A page object should match a meaningful unit of interaction. One class per route is not always useful, especially in single-page applications where the visible state changes without a full navigation.

4. Sharing too much state through inheritance

A small base page is fine. A giant inheritance tree usually becomes hard to reason about. Prefer composition when a helper is not truly shared behavior.

5. Mixing framework setup with page logic

Browser creation, grid configuration, screenshots, and test data setup belong elsewhere. Keep page classes focused on UI behavior.
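
One possible way to keep browser lifecycle out of page classes is a small context manager that owns creation and cleanup. This is a sketch under assumptions: `managed_driver` is a hypothetical helper, and `create_driver` is any zero-argument callable that returns a driver, for example `webdriver.Chrome` in a real suite.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(create_driver):
    """Own the browser lifecycle outside the page classes.

    create_driver is any zero-argument callable returning a driver.
    """
    driver = create_driver()
    try:
        yield driver
    finally:
        # Always release the browser, even when the test body raises.
        driver.quit()
```

A test can then write `with managed_driver(webdriver.Chrome) as driver:` and pass `driver` into its page objects. The same body also fits naturally inside a pytest fixture if your team prefers fixtures.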

How POM helps with flaky tests

POM does not eliminate flakiness by itself, but it makes flakiness easier to diagnose and reduce.

Why?

Locator changes are centralized
Wait policies are centralized
Page readiness checks are reusable
Tests have fewer moving parts

This means if a login page starts failing, you can inspect one LoginPage class instead of thirty tests. That alone saves a lot of time.

In practice, a maintainable Selenium suite is less about clever abstractions and more about reducing the number of places where timing and selectors can go wrong.

POM in CI/CD pipelines

Once your Selenium suite runs in CI, page object quality becomes even more important. Headless browser differences, slower environments, and parallel execution all expose weak abstractions.

A simple GitHub Actions example might look like this:

name: ui-tests

on:
  pull_request:
  push:
    branches: [main]

jobs:
  selenium-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: pytest tests/ui

This is not a full production setup, but it highlights the practical point: if your page objects are brittle, CI will expose it quickly. Good POM structure makes failures more readable and updates less painful.

Page Object Model Python versus Page Object Model Java

Both languages support the pattern well, but they tend to feel different in practice.

Python strengths

Less boilerplate
Fast to prototype
Easier for mixed QA and developer teams to read
Works well with pytest fixtures

Java strengths

Stronger typing
Large enterprise ecosystem
Clear fit for teams already standardized on JUnit, Maven, or Gradle
Useful when you want stricter compile-time structure

The tradeoff is not about which language is objectively better. It is about what your team can maintain consistently. A small, disciplined suite in either language is easier to live with than a large, inconsistent one in the other.

When Selenium POM is a good fit

Selenium POM is a strong choice when:

You need browser automation across multiple browsers
Your team is already invested in Selenium
You have many stable UI workflows to cover
You want code-level control over waits, retries, and integrations
Developers and SDETs are comfortable maintaining test code

It is less compelling when:

The team is small and nobody wants to own framework code
Non-technical stakeholders need to review or update tests
You spend more time maintaining the framework than adding coverage
The app changes often and page classes constantly churn

In those cases, a simpler workflow may be better. For example, Endtest’s no-code testing approach lets teams build tests without maintaining page object classes, drivers, or framework setup. That can be a practical alternative when the bottleneck is framework maintenance rather than test design. Endtest also provides a migration path from Selenium for teams looking to reduce that code burden.

A practical decision framework

When deciding whether to use Selenium POM, ask these questions:

Do we have enough automation skill on the team to own the framework?
Are our UI tests stable enough to justify code-based maintenance?
Do we need custom logic, complex assertions, or integrations that low-code tools may not cover as cleanly?
Are we optimizing for developer control, or for team-wide accessibility?

If the answer is mostly about control and flexibility, Selenium POM is a good fit. If the answer is mostly about speed of authoring and reducing maintenance overhead, a no-code platform can make more sense.

A few implementation tips that save time later

Here are the habits that make the biggest difference in real projects.

Keep page object methods short and intention-revealing
Group locators near the methods that use them
Name classes after user-facing surfaces, not internal components unless they are reusable
Use a shared base class only for true cross-cutting helpers
Return the next page object after navigation when it improves flow readability
Add explicit wait helpers for known asynchronous states

A small example of page chaining in Python:

class LoginPage(BasePage):
    def login(self, email, password):
        self.type(self.EMAIL, email)
        self.type(self.PASSWORD, password)
        self.click(self.SUBMIT)
        # Hand back the page the user lands on after a successful login.
        return DashboardPage(self.driver)

That style works well when navigation is deterministic, but do not overuse it. Returning the next page object is useful only when the next state is truly expected and stable.

Final thoughts

The Page Object Model is one of the most useful ideas in Selenium because it gives structure to a problem that quickly becomes messy: UI test maintainability. It works best when you use it to centralize locators, waits, and page-specific actions, not when you turn it into a giant abstraction layer.

If you are building a Selenium suite in Python or Java, start small, keep the boundary clear, and optimize for readable tests. If your team later decides the framework overhead is too high, it is also reasonable to evaluate a simpler path, including no-code tools that remove page object maintenance altogether.

Selenium POM is not the only way to build sustainable UI automation, but it is still a solid default when your team wants code-level control and is ready to own it.
