June 18, 2026
How to Test Downloaded PDFs, Invoices, and CSV Exports in Playwright Without Flaky File Checks
A practical Playwright tutorial for testing downloaded PDFs, invoices, and CSV exports by verifying real file content, avoiding flaky filename-only assertions, and handling CI file download pitfalls.
When a product exports a PDF invoice or a CSV report, the real question is not, “Did the browser fire a download event?” It is, “Did the user get the right file, with the right content, in the right format?” That distinction matters because download flows are one of the easiest places to build tests that look stable but do not actually prove much.
I have seen plenty of test suites that assert a filename, wait for a download event, and call it done. Those checks are better than nothing, but they miss real defects: empty files, broken CSV delimiters, malformed PDFs, incorrect locale formatting, truncated exports, and backend regressions that still produce a download with the expected name. If your app exports invoices, statements, reports, or receipts, you need to verify the artifact, not just the browser behavior.
In this article, I will walk through a practical way to test downloaded PDFs and CSV exports in Playwright without relying on flaky file checks. The focus is on validating the actual file contents, keeping the tests deterministic, and making them useful in CI.
What makes file export tests flaky
File download tests usually become flaky for one of these reasons:
- The test depends on timing, like waiting for a file to appear in a shared downloads folder.
- The download path changes between local runs and CI.
- The test assumes the file is fully written when the download event resolves.
- The test checks only the filename, not the content.
- The exported data is generated asynchronously, so the file may vary slightly if the test setup is not controlled.
- The suite runs tests in parallel and multiple workers collide on the same download directory.
The best way to reduce these failures is to avoid filesystem polling unless you absolutely need it and to use Playwright’s built-in download handling wherever possible. Playwright emits a Download object when a download starts. Calling download.saveAs() waits for the download to finish and then copies the file to a known path you can inspect. The official Playwright docs are a good reference for the core API, especially page.waitForEvent('download') and download.saveAs().
A download test should prove that the exported artifact is correct, not just that a file-like event happened.
What to verify for each file type
Different export formats need different assertions.
PDFs
For PDF exports, you usually want to verify:
- The file is not empty.
- The file begins with a valid PDF header.
- The file contains expected visible text, such as an invoice number, customer name, or total amount.
- Optionally, metadata or page count.
A pure filename assertion does none of that. For most teams, text extraction is enough to prove that the generated invoice or statement is correct.
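The first two checks do not need a parsing library. The helper below is a minimal sketch that assumes a valid export starts with the `%PDF-` header and ends with an `%%EOF` marker, possibly followed by trailing whitespace. Checking for that trailer is a cheap way to catch truncated downloads. The function name is illustrative, not part of any library.

```typescript
// Lightweight structural sanity checks for a PDF buffer, with no parser.
// Assumption: valid exports begin with "%PDF-" and end with "%%EOF",
// possibly followed by trailing whitespace or newlines.
function checkPdfStructure(buffer: Buffer): string[] {
  const problems: string[] = [];
  if (buffer.length === 0) {
    problems.push('file is empty');
    return problems;
  }
  // The header is "%PDF-" followed by a version such as "1.7".
  if (buffer.subarray(0, 5).toString('latin1') !== '%PDF-') {
    problems.push('missing %PDF- header');
  }
  // Look for %%EOF in the last 1 KB; a truncated file usually lacks it.
  const tail = buffer.subarray(Math.max(0, buffer.length - 1024)).toString('latin1');
  if (!tail.includes('%%EOF')) {
    problems.push('missing %%EOF marker near end of file');
  }
  return problems;
}
```

In a test, `expect(checkPdfStructure(buffer)).toEqual([])` fails with a readable list of what is wrong. It is still not a substitute for checking the text content.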
CSV exports
For CSV files, verify:
- The file exists and is non-empty.
- The headers are correct and in the expected order, if order matters.
- The number of rows matches the data you created in the test.
- Specific rows contain expected values.
- Delimiters, quoting, and encoding are handled correctly.
CSV checks can catch subtle bugs, especially when locale settings affect decimal separators, date formatting, or escaping of commas and quotes.
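The encoding and header checks in the list above can run before any CSV parsing. The sketch below rests on three assumptions:

- The export is UTF-8, optionally with a byte order mark.
- It uses a comma delimiter.
- Header names contain no quoted commas.

Decoding with `TextDecoder` in fatal mode makes invalid bytes throw instead of silently becoming replacement characters. The helper name is hypothetical.

```typescript
// Decode an exported CSV strictly as UTF-8 and return its header row.
// Assumptions: UTF-8 (BOM optional), "," delimiter, no quoted commas in headers.
function readCsvHeader(buffer: Uint8Array): { text: string; headers: string[] } {
  // fatal: true throws a TypeError on invalid byte sequences.
  // The default ignoreBOM: false strips a leading BOM from the output.
  const decoder = new TextDecoder('utf-8', { fatal: true });
  const text = decoder.decode(buffer);
  const firstLine = text.split(/\r?\n/, 1)[0] ?? '';
  const headers = firstLine.split(',').map((h) => h.trim());
  return { text, headers };
}
```

A Latin-1 encoded export containing "Café" fails here immediately, which is easier to diagnose than a garbled value deep inside a row assertion.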
Other exports, like XLSX or JSON
Even though this article focuses on PDFs and CSVs, the same principle applies to other exports. The browser event is only the first half of the test. The artifact itself is the real product.
A reliable Playwright pattern for downloads
The simplest robust pattern is:
- Trigger the export action.
- Wait for the download event.
- Save the file to a test-specific temporary path.
- Parse the file contents.
- Assert on the parsed content.
Here is a concise Playwright example for a PDF download:
```typescript
import { test, expect } from '@playwright/test';
import fs from 'fs/promises';

test('downloads an invoice PDF', async ({ page }, testInfo) => {
  // Start waiting before clicking so the download event cannot be missed.
  const downloadPromise = page.waitForEvent('download');
  await page.getByRole('button', { name: 'Download invoice' }).click();
  const download = await downloadPromise;
  // saveAs() waits for the download to finish, then copies it into this test's output dir.
  const filePath = testInfo.outputPath(download.suggestedFilename());
  await download.saveAs(filePath);

  const buffer = await fs.readFile(filePath);
  expect(buffer.subarray(0, 4).toString()).toBe('%PDF');
  expect(buffer.length).toBeGreaterThan(1000);
});
```
This is already better than checking the filename. It confirms that the file exists and looks like a PDF, but it still does not verify the visible text inside the document. For that, you need PDF parsing.
Testing PDF exports by reading actual text
PDFs are binary documents, so you cannot treat them like plain text. The file may contain compressed streams, embedded fonts, and layout instructions. If you want to assert on invoice content, use a PDF parsing library that extracts text from the document.
A practical approach is to use a Node library such as pdf-parse or another equivalent parser in your test environment. The exact library matters less than the principle, which is to inspect the content, not the download metadata.
Example:
```typescript
import { test, expect } from '@playwright/test';
import fs from 'fs/promises';
// pdf-parse 1.x exports a single function; newer major versions may expose a different API.
import pdfParse from 'pdf-parse';

test('invoice PDF includes the order total', async ({ page }, testInfo) => {
  const downloadPromise = page.waitForEvent('download');
  await page.getByText('Export invoice').click();
  const download = await downloadPromise;
  const filePath = testInfo.outputPath('invoice.pdf');
  await download.saveAs(filePath);

  // Extract the visible text and assert on stable business values.
  const pdfBuffer = await fs.readFile(filePath);
  const parsed = await pdfParse(pdfBuffer);
  expect(parsed.text).toContain('Invoice #INV-1042');
  expect(parsed.text).toContain('$149.00');
});
```
This test is much more meaningful. It verifies that the invoice is not just a valid file, but one that contains the expected business data.
PDF caveats that matter in real suites
PDF text extraction is not always perfect. Depending on how the PDF is generated, text may appear in odd order or split across lines. That means your assertions should be resilient.
Good patterns:
- Check for stable fragments, like invoice numbers or totals.
- Avoid asserting on large blocks of layout-sensitive text.
- If your parser inserts extra spaces or line breaks, normalize whitespace before asserting.
- Keep the fixture data deterministic, so the same order always produces the same visible values.
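Whitespace normalization and fragment checks can be combined in a small helper. This is a minimal sketch that assumes line breaks, repeated spaces, and non-breaking spaces in the extracted text carry no meaning for the assertion. The helper names are illustrative.

```typescript
// Collapse every run of whitespace into a single space.
// In JavaScript regexes, \s also matches U+00A0 and other Unicode spaces.
function normalizePdfText(text: string): string {
  return text.replace(/\s+/g, ' ').trim();
}

// Return the expected fragments that do not appear in the normalized text.
function missingFragments(text: string, fragments: string[]): string[] {
  const normalized = normalizePdfText(text);
  return fragments.filter((f) => !normalized.includes(normalizePdfText(f)));
}
```

With pdf-parse output, `expect(missingFragments(parsed.text, ['Invoice #INV-1042', '$149.00'])).toEqual([])` reports exactly which values are absent, rather than dumping the whole document into a failed `toContain`.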
If the generated PDF is image-based or scanned, text extraction may not work well. In that case, you may need OCR or a different verification strategy, but that should be the exception, not the default.
For PDFs, the goal is usually content verification, not pixel-perfect document validation.
Testing CSV exports with structured parsing
CSV files are simpler than PDFs, but they can still fail in subtle ways. The safest strategy is to parse the file into rows and compare against expected data.
Here is a Playwright example for a CSV export:
```typescript
import { test, expect } from '@playwright/test';
import fs from 'fs/promises';
import { parse } from 'csv-parse/sync';

test('exports the customer report as CSV', async ({ page }, testInfo) => {
  const downloadPromise = page.waitForEvent('download');
  await page.getByRole('button', { name: 'Export CSV' }).click();
  const download = await downloadPromise;
  const filePath = testInfo.outputPath(download.suggestedFilename());
  await download.saveAs(filePath);

  // columns: true turns each row into an object keyed by the header row.
  const csv = await fs.readFile(filePath, 'utf8');
  const rows = parse(csv, { columns: true, skip_empty_lines: true });

  expect(rows).toHaveLength(2);
  expect(rows[0]).toMatchObject({ name: 'Alice Smith', status: 'active' });
  expect(rows[1].email).toBe('bob@example.com');
});
```
This version checks real data rather than the download shell. That is what you want when you test CSV export functionality.
CSV details to watch
CSV bugs often hide in areas that are easy to overlook:
- Column order changes unexpectedly.
- Quotes are missing around fields containing commas.
- Encodings break non-ASCII characters.
- Empty rows appear at the end of the file.
- Date or decimal formatting changes based on locale.
If your application supports international users, add fixtures with accents, commas, quotes, and non-English characters. The export should survive real-world data, not only clean sample rows.
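Parsed rows show whether values survive the round trip. It can also help to check the raw escaping directly. The sketch below encodes each fixture value the way RFC 4180 describes:

- Wrap the field in double quotes if it contains a comma, double quote, CR, or LF.
- Double any embedded quotes.

It then reports which values are missing from the raw file in that form. The fixture names and helper names are hypothetical. An exporter that quotes every field still passes, because the quoted form contains the bare value.

```typescript
// Hypothetical fixture values that exercise quoting and non-ASCII encoding.
const trickyNames = ['Smith, Jr.', 'Anna "Annie" Lee', 'José Müller', '山田太郎'];

// Encode one field following RFC 4180 quoting rules.
function toRfc4180Field(value: string): string {
  if (/[",\r\n]/.test(value)) {
    return `"${value.replace(/"/g, '""')}"`;
  }
  return value;
}

// Return fixture values whose correctly escaped form is absent from the raw CSV.
function missingQuotedFields(rawCsv: string, values: string[]): string[] {
  return values.filter((v) => !rawCsv.includes(toRfc4180Field(v)));
}
```

This is a substring check, so treat it as a supplement to structured parsing rather than a replacement.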
Avoiding filesystem race conditions
A common source of flakiness is the test polling the filesystem before the file is fully written. The download event fires when the download starts, not when it finishes. However, download.saveAs() waits for the download to complete before copying the file, which eliminates much of that problem. Still, you should avoid shared paths and global cleanup logic that can interfere across tests.
Use testInfo.outputPath() for every download. That gives each test its own isolated output directory and prevents collisions in parallel runs.
Do not use a hard-coded path like /tmp/download.csv across every test worker. That is a reliable way to create unpredictable failures when parallel execution is enabled.
If you need to inspect the download directory itself, keep the scope narrow and avoid assumptions about file creation order. But in most cases, download.saveAs() is the cleaner option.
Handling browsers and download configuration
Playwright accepts downloads by default, because the acceptDownloads context option defaults to true. Setting it explicitly is still worthwhile when a shared configuration or a custom context might turn it off.
Example context setup:
```typescript
import { test } from '@playwright/test';

// Make download handling explicit for every test in this file.
test.use({ acceptDownloads: true });
```
If your suite creates custom browser contexts, keep download handling consistent across them. A context created with acceptDownloads set to false can produce confusing failures that look like app bugs but are really test setup issues.
Also note that some apps generate the file only after a backend call finishes. In those cases, you may want to wait for the response that powers the download, especially if the UI can trigger multiple network requests. That is not about waiting on the download itself; it is about making the test setup deterministic.
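The following is a minimal sketch of that pattern as a reusable helper. It assumes the app prepares the export with a separate API call, such as a fetch whose URL contains a hypothetical `/api/exports/` fragment, before the browser starts the download. Both waits are registered before the click so neither event can be missed. A failed backend call then produces a clear error instead of a download timeout.

```typescript
import type { Locator, Page } from '@playwright/test';

// Hypothetical helper: click an export control, wait for the backend response
// that prepares the file, then save the resulting download to targetPath.
async function clickAndCaptureExport(page: Page, trigger: Locator, urlPart: string, targetPath: string) {
  const responsePromise = page.waitForResponse((res) => res.url().includes(urlPart));
  const downloadPromise = page.waitForEvent('download');
  // Avoid an unhandled rejection if we throw before awaiting the download.
  downloadPromise.catch(() => {});

  await trigger.click();

  const response = await responsePromise;
  if (!response.ok()) {
    throw new Error(`Export request failed with HTTP ${response.status()}`);
  }
  const download = await downloadPromise;
  // saveAs() waits for completion; failure() returns an error string or null.
  await download.saveAs(targetPath);
  const failure = await download.failure();
  if (failure) throw new Error(`Download failed: ${failure}`);
  return { response, download };
}
```

A call might look like `clickAndCaptureExport(page, page.getByRole('button', { name: 'Export CSV' }), '/api/exports/', testInfo.outputPath('report.csv'))`.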
Choosing between UI-triggered downloads and direct file retrieval
You have two main strategies for testing exports.
UI-triggered download tests
Use the browser and click the export button when you want to validate the real user flow. This is the best choice if you care about:
- Permission checks
- Button state
- Toast notifications
- Download initiation from the UI
- Integration across frontend and backend
This is the flow most teams should cover at least once per export type.
API-driven file retrieval tests
Sometimes it is more efficient to call the export endpoint directly, then inspect the returned file body. This can be useful when the frontend is not the thing you are trying to test, or when the export endpoint has many data combinations and you want a faster, more focused check.
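Playwright's built-in `request` fixture, an `APIRequestContext`, can fetch the export body directly. The following is a minimal sketch under these assumptions:

- `exportUrl` is a hypothetical endpoint such as `/api/reports/customers.csv`, resolved against the configured `baseURL`.
- The request context is already authenticated.
- Comparing the content type by substring is good enough for your exports.

```typescript
import type { APIRequestContext } from '@playwright/test';

// Hypothetical helper: download an export over HTTP and return its bytes.
async function fetchExport(request: APIRequestContext, exportUrl: string, expectedType: string): Promise<Buffer> {
  const response = await request.get(exportUrl);
  if (!response.ok()) {
    throw new Error(`GET ${exportUrl} returned HTTP ${response.status()}`);
  }
  // Look the header up case-insensitively to avoid depending on name casing.
  const contentType =
    response.headersArray().find((h) => h.name.toLowerCase() === 'content-type')?.value ?? '';
  if (!contentType.includes(expectedType)) {
    throw new Error(`Unexpected content-type "${contentType}" for ${exportUrl}`);
  }
  return response.body();
}
```

The returned buffer can go through the same pdf-parse or csv-parse assertions as the UI tests, so the content checks are shared between both layers.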
However, endpoint-level tests are not a replacement for a true UI download test. They tell you the backend works, but not that the browser flow is wired correctly.
A good testing strategy usually combines both: one end-to-end UI test to prove the user journey, and a smaller set of API or integration tests to cover content edge cases.
What not to test
It is tempting to over-assert on export files, especially PDFs. Resist that impulse.
Avoid tests that:
- Assert exact byte-for-byte equality on generated PDFs unless you have a very controlled generator and a strong reason to do so.
- Depend on timestamp fields that are supposed to change.
- Verify only the filename and ignore content.
- Fail if the order of unrelated CSV rows changes when the product does not promise ordering.
- Parse the entire visual layout of a PDF when only the text content matters.
The best file export tests are usually narrow, content-driven, and business-focused.
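Two of the items above, volatile timestamps and unpromised row order, can be handled by comparing rows as an unordered collection and ignoring named columns. This sketch assumes rows are already parsed into objects, for example by csv-parse with `columns: true`. The `exported_at` column in the usage note is hypothetical.

```typescript
type Row = Record<string, string>;

// Compare parsed CSV rows as a multiset: row order is ignored,
// and any column listed in `ignore` is left out of the comparison.
function sameRowsIgnoringOrder(actual: Row[], expected: Row[], ignore: string[] = []): boolean {
  // Serialize each row with sorted keys so key order does not matter either.
  const canonical = (row: Row) =>
    JSON.stringify(
      Object.keys(row)
        .filter((key) => !ignore.includes(key))
        .sort()
        .map((key) => [key, row[key]]),
    );
  const a = actual.map(canonical).sort();
  const b = expected.map(canonical).sort();
  return a.length === b.length && a.every((value, i) => value === b[i]);
}
```

Usage might look like `expect(sameRowsIgnoringOrder(rows, expectedRows, ['exported_at'])).toBe(true)`. Duplicate rows still count, so a doubled row is caught.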
Making file export tests stable in CI
CI adds another layer of failure modes, especially around file paths, worker isolation, and slower infrastructure. A few simple practices help a lot.
Use isolated test output paths
Always save downloads into the per-test output directory.
Keep fixtures deterministic
If the invoice total depends on tax rules, currency conversion, or current dates, freeze the relevant inputs. The fewer moving parts in the generated export, the easier it is to make reliable assertions.
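Locale and time zone are easy inputs to miss. A minimal sketch, assuming the exporter formats amounts and dates with a known locale and time zone, builds the expected display strings from the same fixed inputs instead of relying on the CI machine's defaults. Playwright's `locale` and `timezoneId` options pin only the browser side. A server-side generator needs its own fixed configuration, and its formatting library may not produce exactly the same strings as `Intl`. The function name and fixture values are hypothetical.

```typescript
// Build expected invoice strings from fixed inputs with an explicit locale and time zone.
function expectedInvoiceStrings(
  amount: number,
  currency: string,
  issuedAt: Date,
  locale: string,
  timeZone: string,
) {
  const total = new Intl.NumberFormat(locale, { style: 'currency', currency }).format(amount);
  const date = new Intl.DateTimeFormat(locale, {
    timeZone,
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
  }).format(issuedAt);
  return { total, date };
}
```

Some locales put a non-breaking space between the amount and the currency symbol, which is another reason to normalize whitespace before comparing against extracted PDF text.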
Prefer explicit waits only when needed
Use Playwright’s waitForEvent('download') for the file itself. If the export button triggers asynchronous preparation, you may also need to wait for a backend request or a UI state change before clicking. Do not add arbitrary sleep calls.
Include export tests in the right test layer
Not every export scenario belongs in a full browser test. If you have dozens of CSV variations, you may get better coverage by combining a few browser tests with lower-level tests that validate the export generator directly.
That balance is part of good test automation. The browser proves the user experience, while lower-level tests make edge cases cheaper to cover.
A practical testing matrix for exports
If you are deciding what to automate first, this matrix works well:
For PDFs
- One happy-path invoice export test
- One test for a canceled or zero-value invoice, if supported
- One test for locale-sensitive formatting, if your product is internationalized
For CSVs
- One happy-path export test
- One test with special characters, commas, and quotes
- One test with empty result sets or filtered data returning no rows
For both
- Assert the file is not empty
- Assert the file type or content structure is correct
- Assert a few business-critical fields
- Avoid unrelated layout assertions
This gives you meaningful coverage without turning the suite into a maintenance burden.
Debugging failed download tests
When a file export test fails, the first step is not to guess. Inspect the saved artifact.
Useful debugging steps:
- Save the file to the test output directory.
- Print the suggested filename and file size.
- Log a small excerpt of parsed text or the first few CSV rows.
- Attach the file to your CI artifacts so you can inspect it later.
Example logging snippet:
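```typescript
import fs from 'fs/promises';

// filePath is the path passed to download.saveAs() earlier in the test.
const stats = await fs.stat(filePath);
console.log('Downloaded file size:', stats.size);
```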
For CSVs, printing parsed rows is often enough to spot wrong headers or missing data. For PDFs, the extracted text is usually the fastest signal that tells you whether the generator or the test is wrong.
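For the CI artifact step in the debugging list, `testInfo.attach()` can add the saved file to the Playwright report. This is a minimal sketch that assumes the file was already written with `download.saveAs()`. Whether the report reaches your CI artifacts depends on your reporter and pipeline configuration. The helper name is hypothetical.

```typescript
import type { TestInfo } from '@playwright/test';
import { stat } from 'fs/promises';

// Hypothetical helper: attach a downloaded export to the test report,
// recording its size in the attachment name as a quick clue.
async function attachExport(testInfo: TestInfo, filePath: string, contentType: string): Promise<void> {
  const { size } = await stat(filePath);
  await testInfo.attach(`export (${size} bytes)`, { path: filePath, contentType });
}
```

Calling it with `'application/pdf'` or `'text/csv'` right after `saveAs()` means a failed run leaves the exact artifact behind for inspection.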
A few implementation tradeoffs worth remembering
There is no single perfect way to test exports. The right choice depends on what risk you are trying to catch.
If your concern is broken download wiring, a UI download test is enough.
If your concern is corrupted business data in the document, parse the file and assert on content.
If your concern is PDF rendering fidelity, you may need more specialized checks, but that should be reserved for truly critical documents.
If your concern is performance or large report generation, a browser test alone may be too slow, and you should test the generator at a lower level too.
The key is to align the test with the failure mode you actually care about.
A concise checklist you can use today
Before you ship a download test for PDFs or CSVs, ask:
- Does the test verify the real file content, not just the download event?
- Is the download stored in a unique path per test run?
- Are assertions based on stable business data?
- Have I avoided timing-based filesystem polling?
- Does the test still work in CI with parallel workers?
- Have I covered special characters, formatting, and edge cases?
If the answer is yes, you are probably testing the right thing.
Final thoughts
If you want to test downloaded PDFs and CSV exports in Playwright, the most important shift is to stop treating the download event as the finish line. The event only tells you that the browser started a download. It does not tell you whether the invoice is valid, whether the CSV has the right rows, or whether the exported file is actually usable.
Use Playwright’s download API, save files into per-test output paths, parse the artifact, and assert on the content that matters to the business. That approach gives you stronger signal, fewer flaky checks, and tests that fail for reasons developers can act on.
For most teams, that is the difference between a test that looks impressive in a demo and a test that keeps regressions out of production.
If you want to go deeper into the browser automation side, the Playwright documentation is the best place to start, and understanding the basics of continuous integration will help you make these tests reliable outside your local machine.