Dev.to
Playwright For Full-Stack Testing: Auth, Fixtures, Mocking, Snapshots, And Parallel Runs Without The Flake
Here's a Playwright test that looks completely reasonable and silently lies to you:
tests/dashboard.spec.ts
```ts
import { test, expect } from "@playwright/test";

test.use({ storageState: "playwright/.auth/user.json" });

test("dashboard shows the user's name", async ({ page }) => {
  await page.goto("/dashboard");
  await expect(page.getByTestId("user-name")).toHaveText("Nazar");
});
```
It logs in once, saves the auth state, and reuses it across every test. Textbook. Except storageState saves cookies and localStorage. It does not save sessionStorage. If your app stores its JWT in sessionStorage (which a lot of SPAs do, because it dies on tab close and product wants that), every test in your suite is silently running as an unauthenticated user that lands on /dashboard and follows the redirect to /login. Any assertion loose enough to also pass on the login page doesn't fail loudly. It just matches the wrong page. The fix is documented in one sentence on the Playwright auth page. Almost nobody reads it.
This is the shape of full-stack testing with Playwright: the surface API is delightful, and the failure modes hide one level below it. Let's walk through what actually keeps tests green in CI (authentication, fixtures, API mocking, visual checks, and parallel runs) and the gotchas that quietly take suites down.
Set Up Authentication Once, Not On Every Test
The naive approach is a beforeEach that fills in the login form. Don't. A 60-test suite at 800ms of login per test is 48 seconds of pure setup that you pay on every CI run, for nothing. Playwright's storageState lets you log in once, dump cookies and localStorage to a JSON file on disk, and load that file into every test as a starting context.
The recommended shape uses project dependencies. You declare a setup project that runs a single auth.setup.ts file before everything else, and your real test projects depend on it:
playwright.config.ts
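```ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  use: {
    // Assumed: relative calls like page.goto("/login") resolve against this.
    baseURL: process.env.BASE_URL ?? "http://localhost:3000",
  },
  projects: [
    // Runs tests/auth.setup.ts (and any other *.setup.ts) before everything else.
    { name: "setup", testMatch: /.*\.setup\.ts/ },
    {
      name: "chromium",
      use: { storageState: "playwright/.auth/user.json" },
      dependencies: ["setup"],
    },
  ],
});
```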
The setup file does the actual sign-in once:
tests/auth.setup.ts
```ts
import { test as setup, expect } from "@playwright/test";

const authFile = "playwright/.auth/user.json";

setup("authenticate", async ({ page }) => {
  await page.goto("/login");
  await page.getByLabel("Email").fill("e2e@example.com");
  await page.getByLabel("Password").fill(process.env.E2E_PASSWORD!);
  await page.getByRole("button", { name: "Sign in" }).click();
  // Verify we actually got in before saving: this catches CAPTCHA, MFA, broken envs.
  await expect(page.getByTestId("user-menu")).toBeVisible();
  await page.context().storageState({ path: authFile });
});
```
The verification line is not optional. If your login flow ever fails (wrong env, expired test account, a new "verify it's you" challenge) and you skip the assertion, you save an unauthenticated state to disk and ship 200 tests that all hit the login page. The whole suite reports green if your assertions happen to also pass on /login. Trust me, this is how zero-coverage suites get born.
The sessionStorage / IndexedDB Trap
Back to the opener. storageState captures cookies and localStorage by design. If your auth lives anywhere else, you have to do extra work:
sessionStorage: never persisted. There's no flag for it. Apps that store tokens here have to script the storage write themselves after loading the saved state, or move the token to localStorage (with the security tradeoff that implies).
IndexedDB: added in Playwright 1.51 with storageState({ indexedDB: true }). If your app is built on top of a client database like RxDB, Dexie, or Firebase's offline cache, you want this flag on or your saved state is missing huge chunks of the app's actual state.
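If your auth lives in IndexedDB, the only change to the setup file is the flag on the save call. Here's a minimal sketch. It assumes Playwright 1.51 or later and reuses the login form and test IDs from the setup file above. Projects that load this file through storageState should restore the IndexedDB contents along with cookies and localStorage:

```ts
import { test as setup, expect } from "@playwright/test";

const authFile = "playwright/.auth/user.json";

setup("authenticate", async ({ page }) => {
  await page.goto("/login");
  await page.getByLabel("Email").fill("e2e@example.com");
  await page.getByLabel("Password").fill(process.env.E2E_PASSWORD!);
  await page.getByRole("button", { name: "Sign in" }).click();
  await expect(page.getByTestId("user-menu")).toBeVisible();
  // indexedDB: true (Playwright 1.51+) adds IndexedDB contents to the saved file,
  // on top of the cookies and localStorage that are always included.
  await page.context().storageState({ path: authFile, indexedDB: true });
});
```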
The fix for sessionStorage looks like this:
tests/auth.setup.ts (sessionStorage variant)
```ts
import { test as setup, expect } from "@playwright/test";
import { writeFile } from "node:fs/promises";

setup("authenticate", async ({ page }) => {
  await page.goto("/login");
  // ... sign in flow ...
  await expect(page.getByTestId("user-menu")).toBeVisible();

  // Pull the token out so we can replay it later.
  const token = await page.evaluate(() => sessionStorage.getItem("jwt"));
  if (!token) throw new Error("Logged in, but no jwt found in sessionStorage");

  await page.context().storageState({ path: "playwright/.auth/user.json" });
  // Stash the token separately: storageState won't save it.
  await writeFile("playwright/.auth/token.json", JSON.stringify({ token }));
});
```
Then a fixture re-injects it on every test (we'll get to fixtures in a moment). It's ugly, but the alternative is an entire test suite hallucinating signed-in behavior.
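Here's a minimal sketch of that re-injection fixture. It assumes the token.json file and the jwt key from the setup above. It overrides Playwright's built-in context fixture and calls context.addInitScript, so the token is already in sessionStorage before any of the app's own scripts run:

```ts
import { test as base } from "@playwright/test";
import { readFile } from "node:fs/promises";

// Written by the sessionStorage variant of auth.setup.ts.
const TOKEN_FILE = "playwright/.auth/token.json";

export const test = base.extend({
  // Overriding the built-in `context` means every page in every test gets the token.
  context: async ({ context }, use) => {
    const { token } = JSON.parse(await readFile(TOKEN_FILE, "utf-8"));
    // addInitScript runs in every new document, before the page's own scripts.
    await context.addInitScript((value: string) => {
      window.sessionStorage.setItem("jwt", value);
    }, token);
    await use(context);
  },
});
```

To use it, import test from this file instead of @playwright/test. As written, the script writes the token into every origin the context opens, third-party iframes included. A stricter version would check window.location.hostname before calling setItem.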
Multiple Roles Without The Setup Tax
Real apps have admin / editor / viewer / billing-only / whatever. The temptation is to chain them all in one setup project. Don't. Every test run pays for every role, even if your shard only touches the admin tests.
A cleaner pattern is one storage file per role, each generated lazily by its own fixture, only when a test actually asks for it. That's the topic of the next section, but here's the spoiler: a worker-scoped fixture per role lets each shard pay only for the auth it uses.
Use Fixtures To Move The Repetition Out Of Your Tests
@playwright/test ships its own fixture system that has almost nothing in common with Jest's beforeEach style. Instead of setup hooks scattered across files, you define a fixture as a function, declare it once, and Playwright wires it into any test that names it.
A minimal fixture that gives every test a logged-in API context:
tests/fixtures.ts
```ts
import { test as base, request, type APIRequestContext } from "@playwright/test";

type Fixtures = {
  api: APIRequestContext;
};

export const test = base.extend<Fixtures>({
  api: async ({}, use) => {
    const ctx = await request.newContext({
      baseURL: process.env.API_URL,
      extraHTTPHeaders: { Authorization: `Bearer ${process.env.E2E_TOKEN}` },
    });
    await use(ctx); // tests run here
    await ctx.dispose(); // teardown after every test
  },
});
```
Now every test that imports test from ./fixtures.ts instead of @playwright/test can write async ({ page, api }) => ... and call api.post("/seed/orders", { data: ... }) to set up backend state before driving the browser. There's no beforeEach, no module-level globals, and no leaks between tests, because the fixture's own teardown disposes the context after every test.
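For example, here's a test that seeds an order through the api fixture and then checks it in the UI. The /seed/orders endpoint comes from the paragraph above. The payload shape, the /orders page, and the TEST-SKU text are hypothetical stand-ins for your own app:

```ts
import { expect } from "@playwright/test";
import { test } from "./fixtures";

test("a seeded order shows up on the orders page", async ({ page, api }) => {
  // Backend state first, over plain HTTP (no browser involved in this step).
  const res = await api.post("/seed/orders", { data: { sku: "TEST-SKU", quantity: 1 } });
  expect(res.ok()).toBeTruthy();

  // Then drive the UI against that state.
  await page.goto("/orders");
  await expect(page.getByText("TEST-SKU")).toBeVisible();
});
```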
Test-Scoped vs Worker-Scoped: The Performance Knob
By default fixtures are test-scoped: they set up before and tear down after every individual test. That's the right default for anything that holds mutable state (an API context, a seeded database row, a temp file). It's the wrong default for expensive setup that every test in a worker can share, like "spin up a fresh Postgres schema and run migrations".
For those, mark the fixture as worker-scoped:
tests/fixtures.ts (worker-scoped DB)
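```ts
import { test as base } from "@playwright/test";
// Hypothetical helpers: run raw SQL, and apply migrations to a given schema.
import { execSql, runMigrations } from "./db";

type WorkerFixtures = { dbSchema: string };

export const test = base.extend<{}, WorkerFixtures>({
  dbSchema: [
    async ({}, use, workerInfo) => {
      // One schema per parallel slot, created once per worker process.
      const schema = `e2e_${workerInfo.parallelIndex}`;
      // A crashed worker may have skipped teardown and left its schema behind.
      await execSql(`DROP SCHEMA IF EXISTS ${schema} CASCADE`);
      await execSql(`CREATE SCHEMA ${schema}`);
      await runMigrations(schema);
      await use(schema);
      await execSql(`DROP SCHEMA ${schema} CASCADE`);
    },
    { scope: "worker" },
  ],
});
```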
workerInfo.parallelIndex is a small integer between 0 and workers - 1. No two workers running at the same time share one, but when a worker is replaced (after a test failure, for example), its replacement inherits the same value. Most "isolate per worker" patterns key off it: schema names, mailbox addresses, port numbers, fake-user emails. If you need a key that is never reused within a run, use workerInfo.workerIndex instead. It increments every time a new worker starts, while parallelIndex stays bounded.
A Per-Worker Auth Fixture For State-Mutating Tests
Tests that mutate data (change a user's profile, place an order, archive a workspace) need their own user account, or they race each other. The pattern is one user per worker, authenticated once per worker:
tests/fixtures.ts (per-worker auth)
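```ts
import { test as base, expect } from "@playwright/test";
import { existsSync } from "node:fs";

export const test = base.extend<{}, { workerStorageState: string }>({
  // storageState is a built-in test-scoped option and can't be redeclared as
  // worker-scoped, so a worker fixture computes the file and this forwards it.
  storageState: ({ workerStorageState }, use) => use(workerStorageState),
  workerStorageState: [
    async ({ browser }, use, workerInfo) => {
      const id = workerInfo.parallelIndex;
      const file = `playwright/.auth/user-${id}.json`;
      if (!existsSync(file)) {
        // Explicit baseURL and no saved state: log in from a clean context.
        const ctx = await browser.newContext({
          baseURL: workerInfo.project.use.baseURL,
          storageState: undefined,
        });
        const page = await ctx.newPage();
        await page.goto("/login");
        await page.getByLabel("Email").fill(`e2e+${id}@example.com`);
        await page.getByLabel("Password").fill(process.env.E2E_PASSWORD!);
        await page.getByRole("button", { name: "Sign in" }).click();
        await expect(page.getByTestId("user-menu")).toBeVisible();
        await ctx.storageState({ path: file });
        await ctx.close();
      }
      await use(file);
    },
    { scope: "worker" },
  ],
});
```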
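Now each worker logs in exactly once and never collides with another worker's data. A 5-worker run pays for 5 logins, however many tests it executes. Roles work the same way, with one worker-scoped fixture per role. Fixtures are lazy, so with admin, viewer, and member roles spread across tests, a worker only logs in as a role when one of its tests asks for it. A 5-worker run pays for at most 15 logins, and usually far fewer.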
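Here's one way to get that per-role behavior: a worker-scoped state fixture per role, plus a test-scoped page fixture that opens a fresh context from that state. This is a sketch. It assumes the same login form as above and hypothetical per-role accounts named like e2e+admin-0@example.com. The names roleStateFile, ensureRoleState, adminPage, and viewerPage are made up for this example:

```ts
import { test as base, expect, type Browser, type Page } from "@playwright/test";
import { existsSync } from "node:fs";

type Role = "admin" | "viewer";

// Hypothetical naming scheme: one saved state per role per parallel worker.
export function roleStateFile(role: Role, parallelIndex: number): string {
  return `playwright/.auth/${role}-${parallelIndex}.json`;
}

// Logs in as `role` the first time it's needed and saves the state to disk.
async function ensureRoleState(
  browser: Browser,
  baseURL: string | undefined,
  role: Role,
  index: number,
): Promise<string> {
  const file = roleStateFile(role, index);
  if (existsSync(file)) return file;
  const ctx = await browser.newContext({ baseURL, storageState: undefined });
  const page = await ctx.newPage();
  await page.goto("/login");
  // Hypothetical accounts: one per role per worker.
  await page.getByLabel("Email").fill(`e2e+${role}-${index}@example.com`);
  await page.getByLabel("Password").fill(process.env.E2E_PASSWORD!);
  await page.getByRole("button", { name: "Sign in" }).click();
  await expect(page.getByTestId("user-menu")).toBeVisible();
  await ctx.storageState({ path: file });
  await ctx.close();
  return file;
}

type WorkerFixtures = { adminState: string; viewerState: string };
type TestFixtures = { adminPage: Page; viewerPage: Page };

export const test = base.extend<TestFixtures, WorkerFixtures>({
  // Worker fixtures are lazy: nothing logs in as a role until a test needs it.
  adminState: [
    async ({ browser }, use, workerInfo) =>
      use(await ensureRoleState(browser, workerInfo.project.use.baseURL, "admin", workerInfo.parallelIndex)),
    { scope: "worker" },
  ],
  viewerState: [
    async ({ browser }, use, workerInfo) =>
      use(await ensureRoleState(browser, workerInfo.project.use.baseURL, "viewer", workerInfo.parallelIndex)),
    { scope: "worker" },
  ],
  // Each test gets a fresh context built from the role's saved state.
  adminPage: async ({ browser, baseURL, adminState }, use) => {
    const ctx = await browser.newContext({ baseURL, storageState: adminState });
    await use(await ctx.newPage());
    await ctx.close();
  },
  viewerPage: async ({ browser, baseURL, viewerState }, use) => {
    const ctx = await browser.newContext({ baseURL, storageState: viewerState });
    await use(await ctx.newPage());
    await ctx.close();
  },
});
```

A test asks for the roles it needs by name, for example async ({ adminPage, viewerPage }) => .... A worker that never runs a viewer test never logs in as a viewer. Adding a role means adding one more state fixture and one more page fixture.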
Mock The API Layer When It Matters, And Don't When It Doesn't
This is where opinions get loud. The orthodox e2e position is "mock nothing, hit the real stack". The CI-cost position is "mock everything, hope your contracts hold". The honest answer is that a full-stack suite needs both, in different tests, deliberately chosen.
Playwright's mocking primitive is page.route(pattern, handler). It hooks the browser's network layer and lets you intercept anything before it leaves:
tests/checkout-error.spec.ts
```ts
import { test, expect } from "@playwright/test";

test("shows a friendly error when payment is declined", async ({ page }) => {
  // Intercept the payments call so it never reaches the real backend.
  await page.route("**/api/payments", (route) =>
    route.fulfill({
      status: 402,
      contentType: "application/json",
      body: JSON.stringify({ error: "card_declined" }),
    })
  );
  await page.goto("/checkout");
  await page.getByRole("button", { name: "Pay" }).click();
  await expect(page.getByRole("alert")).toHaveText(/card was declined/i);
});
```
That's the move for error-path tests. Stripe's test-mode decline cards exist, but getting your own backend to return that exact 402 on demand is still fiddly, and you don't want your CI suite making test-mode charges anyway. Mock the route, drive the UI, assert the user-visible behavior.
The same primitive lets you do partial mocking, where the real backend handles most of a response and you patch one field:
tests/feature-flag.spec.ts
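```ts
// Inside a test, registered before page.goto(), so the first /api/me call is patched.
await page.route("**/api/me", async (route) => {
  const response = await route.fetch(); // the real backend response
  const body = await response.json();
  body.flags = { ...body.flags, new_dashboard: true }; // force one flag on
  await route.fulfill({ response, body: JSON.stringify(body) });
});
```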
This pattern is gold for testing feature-flagged UI without actually flipping a flag in your config service. Real auth, real user, real DB, one tiny patch on the response.
HAR Files: Record Once, Replay Forever
For pages that pull from a dozen endpoints, hand-writing mocks is miserable. Run Playwright's routeFromHAR with { update: true } and it records every matching network request into an HTTP Archive (HAR) file. Run it without that option and it replays those responses from disk:
tests/landing.spec.ts
```ts
import { test, expect } from "@playwright/test";

test("landing page", async ({ page }) => {
  // First run: pass { update: true } to record.
  // After that: omit it, and requests are served from disk.
  await page.routeFromHAR("hars/landing.har", { url: "**/api/**" });
  await page.goto("/");
  await expect(page.getByRole("heading", { name: "Welcome" })).toBeVisible();
});
```
Run it once with { update: true }, commit the HAR file, and the test is now hermetic. No backend dependency, no flake from a slow upstream, no API quota burn.
The trap: HAR matching is strict on URL and HTTP method, and for POST requests it also matches the request payload strictly. If your test sends a POST whose body contains a timestamp, a UUID, or anything else that changes between runs, the replay misses. By default (notFound: 'abort'), Playwright aborts the unmatched request, so your test dies on a confusing network error. With notFound: 'fallback', misses fall through to your other route handlers and from there to the real network. That's arguably worse, because now the failure is silent. There are long-standing GitHub issues about exactly this failure mode for state-mutating requests. The pragmatic answer: use HAR for GET-heavy read paths, and write explicit page.route mocks for anything that POSTs.
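Concretely, both can live in one test: HAR replay for the reads and an explicit handler for the writes. This sketch assumes the landing.har recording from above, plus a hypothetical POST endpoint under /api that should answer 201. It relies on Playwright running route handlers in reverse registration order. The later page.route therefore sees each request first and hands non-POSTs back to the HAR with route.fallback():

```ts
import { test, expect } from "@playwright/test";

test("landing page, HAR for reads and a hand-written mock for writes", async ({ page }) => {
  // GETs replay from the recording. notFound stays at its default ("abort").
  await page.routeFromHAR("hars/landing.har", { url: "**/api/**" });

  // Registered later, so it runs first for every /api request.
  await page.route("**/api/**", async (route) => {
    if (route.request().method() !== "POST") return route.fallback(); // hand off to the HAR
    // Any POST the page makes gets a fixed answer, whatever its payload.
    await route.fulfill({ status: 201, json: { ok: true } });
  });

  await page.goto("/");
  await expect(page.getByRole("heading", { name: "Welcome" })).toBeVisible();
});
```

Leaving notFound at 'abort' keeps the loud failure for any GET the recording doesn't cover, instead of quietly hitting the real network.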
When To Reach For Each Tool
A working heuristic:
No mocking: happy-path smoke tests that prove the whole stack actually integrates. Keep a handful of these. They're slow, they're flaky, they're worth it.
page.route with fulfill: error states, edge cases, anything you can't reliably trigger live.
page.route with fetch + patch: feature flags, A/B variants, anything where the response shape is mostly real but one field needs forcing.
routeFromHAR: read-heavy pages with lots of upstream calls and stable responses.
APIRequestContext: backend-only assertions, or seeding state before a UI test. Doesn't drive a browser, doesn't pay the browser cost.
The mistake is going all-in on any one of them. A pure no-mock suite is brittle and slow; a pure mock suite drifts from reality the day your API changes. Pick per-test based on what you're actually trying to verify.
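For the APIRequestContext case, Playwright's built-in request fixture is enough. This sketch uses a hypothetical /api/orders endpoint that answers 422 for an empty cart. The request fixture picks up baseURL and extraHTTPHeaders from your config, and because the test never asks for page, it never drives a browser:

```ts
import { test, expect } from "@playwright/test";

test("orders API rejects an empty cart", async ({ request }) => {
  // `request` is an APIRequestContext built from the project's `use` options.
  const res = await request.post("/api/orders", { data: { items: [] } });
  expect(res.status()).toBe(422);
  expect(await res.json()).toMatchObject({ error: expect.any(String) });
});
```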
Visual Checks Without The Flake
toHaveScreenshot is the assertion that tempts you with "just snapshot the page", and then teaches you over the next month why visual diffing is a discipline, not a one-liner.
The baseline call is short:
tests/visual.spec.ts
```ts
import { test, expect } from "@playwright/test";

test("pricing page matches baseline", async ({ page }) => {
  await page.goto("/pricing");
  await expect(page).toHaveScreenshot("pricing.png", { fullPage: true });
});
```
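On the first run there is no baseline yet. The assertion fails, and Playwright writes the captured image as pricing-chromium-linux.png in a visual.spec.ts-snapshots folder next to the test. Every subsequent run diffs the live screenshot against that baseline. The file name carries the browser and platform because Linux Chromium and macOS Chromium render differently at the subpixel level, mostly due to font rendering. A baseline generated on your Mac is a different file from the one Linux CI looks for. Generate the baselines you commit in the same environment CI uses, such as the official Playwright Docker image.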
The Three Tolerance Knobs
The defaults are not generous, and tightening or loosening them without understanding the difference is the most common mistake:
threshold (default 0.2): the per-pixel color-difference tolerance, measured in the YIQ color space on a 0-to-1 scale. 0 is strict (colors must match exactly); 1 is lax (anything goes). This controls how different a pixel has to be before it counts as a diff. Anti-aliasing and font hinting move pixels by tiny amounts, so a strict 0 will fail on benign rendering differences.
maxDiffPixels: an absolute integer. "Allow up to 500 pixels to differ before failing." Useful when you know your page has a small dynamic region.
maxDiffPixelRatio: a fraction of total pixels (0 to 1). "Allow up to 0.1% of pixels to differ." Scales with image size.
Setting threshold higher hides real visual bugs because it lets every pixel drift a little. Setting maxDiffPixels higher is usually safer: it caps the area of allowed difference rather than weakening the per-pixel comparison. The two combine: a diff fails only if more than maxDiffPixels pixels each exceed the threshold color delta.
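The knobs can also be set per assertion, which keeps a loose budget from leaking into the rest of the suite. Here's a sketch of the pricing test from above, rewritten with explicit options. The 0.1% ratio is an illustrative number, not a recommendation:

```ts
import { test, expect } from "@playwright/test";

test("pricing page matches baseline", async ({ page }) => {
  await page.goto("/pricing");
  await expect(page).toHaveScreenshot("pricing.png", {
    fullPage: true,
    threshold: 0.2, // per-pixel color tolerance (the default)
    maxDiffPixelRatio: 0.001, // fail if more than 0.1% of pixels exceed it
  });
});
```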
Killing The Three Causes Of Flake
Visual tests fail for three reasons that have nothing to do with your code:
Animations still running: pause them. await page.addStyleTag({ content: "*{animation: none !important; transition: none !important;}" }) is the brutal but effective version.
Fonts not loaded: wait for them. await page.evaluate(() => document.fonts.ready) blocks until web fonts have actually rendered. Without it, the first run captures the system fallback font and every subsequent run that loads the web font fails.
Dynamic content: timestamps, randomized testimonials, ad slots, the user's own avatar. Mask them with { mask: [page.getByTestId("clock"), page.getByTestId("hero-ad")] }. Playwright paints a solid color over the masked regions on both baseline and live, so they're identical by definition.
toHaveScreenshot already auto-retries until the page stabilizes: it takes a screenshot, waits, takes another, and stops when two consecutive captures match. That handles small layout shifts on load. It does not handle any of the three reasons above, because those are deterministic-but-different, not transient.
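All three fixes fit in one test. This sketch applies them to the pricing test. The clock and hero-ad test IDs are the same illustrative ones used in the list above. Since toHaveScreenshot already defaults to animations: "disabled", the style tag is belt-and-braces. It mainly matters if you also take plain page.screenshot() captures:

```ts
import { test, expect } from "@playwright/test";

test("pricing page matches baseline", async ({ page }) => {
  await page.goto("/pricing");

  // 1. Animations: stop CSS animations and transitions outright.
  await page.addStyleTag({
    content: "*{animation: none !important; transition: none !important;}",
  });

  // 2. Fonts: wait until web fonts have loaded. Returning undefined means
  //    nothing DOM-shaped has to be sent back to Node.
  await page.evaluate(() => document.fonts.ready.then(() => undefined));

  // 3. Dynamic content: mask regions that legitimately change between runs.
  await expect(page).toHaveScreenshot("pricing.png", {
    fullPage: true,
    mask: [page.getByTestId("clock"), page.getByTestId("hero-ad")],
  });
});
```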
A Sane Visual-Test Default
After enough self-inflicted CI fires, the configuration that holds up across teams looks like this:
playwright.config.ts
```ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  expect: {
    toHaveScreenshot: {
      threshold: 0.2, // the default; don't lower it without a reason
      maxDiffPixels: 100, // tiny budget for anti-aliasing and hinting noise
      animations: "disabled", // stop CSS animations before the snapshot
    },
  },
});
```
animations: "disabled" is a Playwright option, not a CSS hack: it freezes CSS animations and transitions before each screenshot. It's also already the default for toHaveScreenshot (plain page.screenshot() defaults to "allow"), so the config line is less about flipping a switch and more about pinning behavior your suite relies on. Either way, it's the cleanest answer to reason #1, no style injection of your own needed.
Parallel Runs And Sharding Without Stepping On Yourself
Playwright runs tests in parallel by default. Each worker is a separate OS process with its own browser instance: total isolation, no shared variables, no leaked cookies. The defaults are:
Test files run in parallel. Different files go to different workers.
Tests within a file run serially. Inside one file, tests share a worker process.
That second rule trips people up. A file with 20 tests all hitting the same worker means slow workers and underused parallelism. The fix is one config line:
playwright.config.ts
```ts
import { defineConfig } from "@playwright/test";

export default defineConfig({ fullyParallel: true });
```
With fullyParallel: true, Playwright distributes individual tests across workers regardless of file. The scheduling unit drops from "file" to "test". On a 4-worker box with 20 tests in one file, you finish in roughly a quarter of the time.
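If flipping fullyParallel for the whole suite is too big a step, the same switch exists per file through test.describe.configure. Here's a sketch, with two placeholder tests standing in for your real ones:

```ts
import { test, expect } from "@playwright/test";

// Tests in this file may now run on different workers, as with fullyParallel.
test.describe.configure({ mode: "parallel" });

test("pricing page loads", async ({ page }) => {
  await page.goto("/pricing");
  await expect(page).toHaveURL(/\/pricing/);
});

test("checkout page loads", async ({ page }) => {
  await page.goto("/checkout");
  await expect(page).toHaveURL(/\/checkout/);
});
```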
Isolating State Per Worker
If your tests mutate shared resources (a database, a message queue, a third-party sandbox account), parallelism turns into a race condition factory. The standard pattern is keying per-worker resources off process.env.TEST_WORKER_INDEX (or testInfo.workerIndex inside tests):
tests/fixtures.ts
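```ts
import { test as base } from "@playwright/test";
// Hypothetical helpers that create and delete users through your backend's API.
import { createUser, deleteUser, type User } from "./test-api";

export const test = base.extend<{ user: User }>({
  user: async ({}, use, testInfo) => {
    // workerIndex separates parallel workers; Date.now() separates tests within one worker.
    const email = `e2e-${testInfo.workerIndex}-${Date.now()}@example.com`;
    const u = await createUser({ email });
    await use(u);
    await deleteUser(u.id);
  },
});
```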
workerIndex starts at 0 and keeps increasing for the whole run (0, 1, 2, ...). Whenever a worker is replaced, say after a test failure, the new one gets a fresh number, so a retry lands in a new worker with a new index. parallelIndex always stays within 0..workers-1. Use it when you want a stable index that can be reused, like the per-worker auth storage files above.
Sharding For CI: Split The Suite Across Machines
Workers parallelize on one machine. Sharding splits the suite across machines. CLI:
```bash
npx playwright test --shard=1/4
npx playwright test --shard=2/4
npx playwright test --shard=3/4
npx playwright test --shard=4/4
```
Four CI jobs, each runs roughly a quarter of the suite. Playwright distributes tests deterministically based on the shard index, so you don't have to coordinate. The official docs explicitly recommend pairing sharding with fullyParallel: true: at the file level, shards risk being uneven because one file with 50 tests counts as one unit. At the test level, work splits much more evenly.
The mental model is two-dimensional: shards split tests across machines, workers split tests across CPU cores on each machine. A 4-shard / 4-worker setup gives you 16-way parallelism. The bottleneck flips from CPU to your backend's ability to handle 16 concurrent test users, which is its own conversation.
The One CI Setting That Actually Matters: Traces
If you change exactly one Playwright config when you wire it into CI, change this:
playwright.config.ts
```ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  retries: process.env.CI ? 2 : 0,
  use: {
    trace: "on-first-retry",
  },
});
```
trace: 'on-first-retry' tells Playwright to record a full trace only on the first retry of a failed test. The trace includes DOM snapshots at every action, network requests, console logs, and screenshots before and after each step. The first attempt runs lean. The retry records everything, and that trace is kept whether the retry passes or fails. Even a flaky test that goes green on retry leaves a trace.zip attached to the test report, and that run is exactly the one you want to inspect.
Open it with npx playwright show-trace trace.zip. You get a timeline of every action, with a DOM snapshot at each step. You can hover the timeline and see the page change. You can click any locator call and see exactly what was on the page at that moment. The Network tab shows every request, including the 401s your auth token didn't survive into CI. The Console tab shows the JS error that fired on a slower machine.
This is what turns "the test failed in CI but I can't reproduce it locally" from a half-day investigation into a five-minute one. If you don't have retries enabled at all, swap in trace: 'retain-on-failure'. It records every test and keeps the trace only for the ones that fail. You pay the recording cost on every run, but you get the trace from the very first failure.
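In config terms, the no-retries variant is a one-line change. A sketch:

```ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  retries: 0,
  use: {
    // Record every test; keep the trace only when the test fails.
    trace: "retain-on-failure",
  },
});
```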
Tip
The trace file lives in your artifacts. Wire it into your CI job to be uploaded on failure, and the Playwright HTML reporter will surface a "View trace" link in the failure report. The wiring is two lines in most CI systems; the payoff is permanent.
What Stays With You
Full-stack tests with Playwright work the way furniture works: every piece looks simple in the catalog, and the project succeeds or fails on how the pieces fit. Save authentication once with storageState, mind the sessionStorage blind spot, and prefer project dependencies for the setup step. Push everything you'd otherwise put in beforeEach into a fixture, and pick test scope vs worker scope based on whether the fixture is per-test state or per-process state. Mock the API at the layer that hurts least: page.route for error paths, HAR for read-heavy pages, the real backend for the small set of tests that prove the integration. Treat visual checks as a discipline: kill animations, wait for fonts, mask the volatile bits, leave threshold alone. Lean on fullyParallel and sharding for speed, and key every shared resource off workerIndex so parallelism never silently corrupts your data. Turn on trace: 'on-first-retry' before you ship anything to CI.
Do those seven things and the suite stops being a chore you maintain. It starts being the thing that catches the bug you would otherwise have shipped.
Originally published at nazarboyko.com.