@radekmie
By Radosław Miernik
I was setting up end-to-end tests in a new project recently. While I have like five projects with the exact same setup around to learn from [1], I wanted a summary of why it looks like that as well (you know, for the team). As I couldn’t find one within a couple of minutes, I decided to write down my own.
The requirements are simple and short: we want to use Playwright and run the tests both locally (we’ll need a handy npm script) as well as on GitHub Actions. While this text is rather tightly focused on the latter, it should be easy to adjust it to work with any other CI.
The result should run all tests in multiple browsers in parallel; bonus points for parallelizing the tests within one browser. It’d be nice to have a readable report in case any test fails too (because that’s what tests do, right?).
Let’s start with installing all of the dependencies and making sure everything works locally. We’ll use Playwright for Node.js, but as I checked the installation guide, it should be as easy in other environments.
First, we’ll create a new project based on the official template [1]:
npm init playwright@latest
You’ll be asked a couple of questions – whether you want to use TypeScript or JavaScript (I strongly recommend the former), where you want to put the tests (it’s up to you), whether you’d like to add a GitHub Actions workflow (yes, but we’ll modify it anyway), and finally, if you want to install the browsers (you do). If a new Playwright version asks you more questions, the default answers (capital letters) should be a good start.
This will create some basic scaffolding – package.json with the dependencies, package-lock.json (commit it!), playwright.config.ts with some defaults, and example tests. It’s enough to run the example tests and see the nice report:
npx playwright test
npx playwright show-report
If anything went south in this process, checking the official documentation, StackOverflow, GitHub issues, or any search engine was enough for me every time. The community is there and is really helpful!
The default configuration is well-commented and reasonable for most of the projects out there. You may need to change things, or – like me – you want to see all the options; no worries, the documentation has it all.
Let’s focus on the projects array now. Every element defines a TestProject; for the sake of this text, think of them like different browser configurations. It’s not only the browser itself, but also the viewport (window size), geolocation, or touch capabilities. Start with one of the predefined devices and configure as needed later – again, these are reasonable defaults.
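The default configuration (formatted, other options omitted) looks like this:
import type { PlaywrightTestConfig } from '@playwright/test';
import { devices } from '@playwright/test';

const config: PlaywrightTestConfig = {
  // ...other options...
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
    // { name: 'Mobile Chrome', use: { ...devices['Pixel 5'] } },
    // { name: 'Mobile Safari', use: { ...devices['iPhone 12'] } },
    // { name: 'Microsoft Edge', use: { channel: 'msedge' } },
    // { name: 'Google Chrome', use: { channel: 'chrome' } },
  ],
};

export default config;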
We have three projects here, one for each major browser, running on desktop (large resolution, no touch capabilities). There are also two commented-out ones for mobile browsers and two more for branded variants of Chromium.
What’s important here is that the name is our internal thing. Playwright uses it only for configuration and reporting. That makes it easy to have multiple instances of the same browsers but different capabilities (e.g., using a different viewport). You can check that by renaming them and running the tests again.
Every CI has a different way of configuring the jobs and steps, but usually, it’s as simple as a text file stored in the repository or some interface to describe it. GitHub Actions does the former, and the configuration (called “workflow”) is a YAML file. Of course, it’s well-documented too.
Let’s see what the default generated by Playwright does, line by line. First, we give the workflow a name. It’s completely optional, but it will appear in GitHub Actions UI, so it’d be nice for it to make sense.
name: Playwright Tests
Next, we say when it’s supposed to run. Here it’s on every push to either the main or master branch and on every push in a pull request against these branches. That means the tests will check the merged as well as the “pending” code. If the tests fail, it’ll be displayed in GitHub (in the pull request, on the commit, etc.).
on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]
Finally, we configure the job – the heart of our workflow. Its name is, once again, not that important (it matters if you have more jobs in one workflow). At first, we call it test, set a timeout of 60 minutes (default is 360 minutes!), and choose Ubuntu (Linux) as our operating system [2].
jobs:
  test:
    timeout-minutes: 60
    runs-on: ubuntu-latest
The core of the job is its steps – single units of work. These can execute shell instructions directly or use external packages called actions (yep, that’s where the name comes from).
In our case, it’s fairly simple: check out the code, install Node.js (latest of v16), dependencies, and browsers. After that, run the tests and once they are done, upload the report and store it for 30 days (default is 90 days).
Note that the if in the last step says always(). That’s because the steps will stop executing once a previous one fails. We don’t want that – the report should be uploaded for both the failed and passed tests.
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: 16
      - name: Install dependencies
        run: npm ci
      - name: Install Playwright Browsers
        run: npx playwright install --with-deps
      - name: Run Playwright tests
        run: npx playwright test
      - uses: actions/upload-artifact@v3
        if: always()
        with:
          name: playwright-report
          path: playwright-report/
          retention-days: 30
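To see why the always() condition matters, here’s a minimal sketch of how a job walks through its steps. It’s a simplified, hypothetical model (the Step shape and the executedSteps helper are made up for illustration), not how GitHub Actions is actually implemented:

```typescript
// A simplified, hypothetical model of a job's steps.
// `always` stands in for `if: always()`; real workflows support more conditions.
interface Step {
  label: string;
  succeeds: boolean;
  always?: boolean;
}

// Returns the labels of the steps that actually get executed.
function executedSteps(steps: Step[]): string[] {
  const executed: string[] = [];
  let previousFailed = false;
  for (const step of steps) {
    // After a failure, only steps marked with `always` still run.
    if (previousFailed && !step.always) {
      continue;
    }
    executed.push(step.label);
    if (!step.succeeds) {
      previousFailed = true;
    }
  }
  return executed;
}

const failingRun: Step[] = [
  { label: 'Install dependencies', succeeds: true },
  { label: 'Run Playwright tests', succeeds: false },
  { label: 'Upload report', succeeds: true, always: true },
];

// With `always`, the report is uploaded even though the tests failed.
const executedOnFailure = executedSteps(failingRun);
```

Dropping always: true from the last step in this model would skip the upload whenever the tests fail, which is exactly when the report is needed the most.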
With such a config, the CI will run all of our tests 3 times – once for every project (browser). Locally, these are run in parallel, as all modern computers have plenty of CPU cores to use. However, on CI, we usually have a rather limited number of these (only 2 on GitHub Actions on Ubuntu by default).
As you add more and more tests, they can easily take a couple or even tens of minutes to complete. To get the test results faster, we could run the tests in different projects on different machines in parallel. The entire test run will then take as long as the slowest project instead of the sum of all of them.
In GitHub Actions, we can easily do that by specifying a matrix property on our job. It’s a set of fields where every one of them has a list of allowed values. In our case, we’d like to configure the project there, with the three possible values, as defined in Playwright’s config.
Let’s add it to our test job:
name: 'Test on ${{ matrix.project }}'
strategy:
  fail-fast: false
  matrix:
    project: [chromium, firefox, webkit]
When run, the CI will execute three jobs in parallel, one for each project. Please note that we also added fail-fast: false – it means that the CI won’t cancel the other jobs if one of them fails (by default, it would). The name displays the project it’s running, so we can see that easily in the UI.
But we’re not done yet! Sure, we run the job multiple times, but every one of them still runs all of the projects. To change that, we’ll tell Playwright to run only the one from this job:
-run: npx playwright test
+run: npx playwright test --project=${{ matrix.project }}
This works as expected, but has one major flaw – the report no longer contains all of the tests, but only the last one. There’s a pending feature request and even a pull request for that already, but as of today, we have to work around it. Sure, we could use a tool that does that, but a “simple enough” solution is to store all of the reports separately by reconfiguring the report name:
-name: playwright-report
+name: playwright-report-${{ matrix.project }}
As I said in the previous section, given enough tests, the complete run of all tests can take a while. As we go further, running even a single project may take long enough to impact our productivity. To address that, Playwright introduced “shards” (other test runners did that too, including Jest). A single shard is simply a part of our tests, i.e., the n-th part of m parts total is the n/m shard.
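To make the n/m idea concrete, here’s a minimal sketch of how a list of tests could be split into shards. The selectShard helper is hypothetical, and Playwright’s real splitting logic may group tests differently; the point is only that every test lands in exactly one shard:

```typescript
// Hypothetical helper: picks the tests that belong to shard `index` of `total`.
// It uses contiguous chunks purely for illustration.
function selectShard<T>(tests: T[], index: number, total: number): T[] {
  const isWhole = (value: number) => Math.floor(value) === value;
  if (!isWhole(index) || !isWhole(total) || index < 1 || index > total) {
    throw new RangeError(`Invalid shard ${index}/${total}`);
  }
  // Boundaries are rounded down so the chunks never overlap or leave gaps.
  const start = Math.floor(((index - 1) * tests.length) / total);
  const end = Math.floor((index * tests.length) / total);
  return tests.slice(start, end);
}

// Made-up file names, just to have something to split.
const exampleTests = ['a.spec.ts', 'b.spec.ts', 'c.spec.ts', 'd.spec.ts', 'e.spec.ts'];
const firstShard = selectShard(exampleTests, 1, 4);
const lastShard = selectShard(exampleTests, 4, 4);
```

Shards don’t have to be equally sized: with five tests and four shards, one of them simply gets two tests.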
To use that, we’ll once again use the matrix. Let’s split the tests into 4 parts:
shard: [1, 2, 3, 4]
Make sure the name reflects that too:
-name: 'Test on ${{ matrix.project }}'
+name: 'Test on ${{ matrix.project }} (${{ matrix.shard }}/4)'
Configure Playwright to split the tests:
-run: npx playwright test --project=${{ matrix.project }}
+run: npx playwright test --project=${{ matrix.project }} --shard=${{ matrix.shard }}/4
And once again, update the report name:
-name: playwright-report-${{ matrix.project }}
+name: playwright-report-${{ matrix.project }}-${{ matrix.shard }}
As you can see, it wasn’t that hard! With this setup, the CI will run 12 parallel jobs (3 projects, 4 shards each). Sure, it also generates 12 reports, but we can live with that for the time being.
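If you’d like to double-check the number of jobs, here’s a minimal sketch of how a matrix expands into combinations. The expandMatrix helper and its types are hypothetical and only mirror the behavior described above; GitHub Actions does this for us:

```typescript
// Hypothetical types: a matrix maps each key to its list of allowed values,
// and a job is one concrete combination of them.
type JobMatrix = Record<string, Array<string | number>>;
type MatrixJob = Record<string, string | number>;

// Builds the cartesian product of all matrix keys, one entry per job.
function expandMatrix(matrix: JobMatrix): MatrixJob[] {
  let combinations: MatrixJob[] = [{}];
  for (const key of Object.keys(matrix)) {
    const next: MatrixJob[] = [];
    for (const combination of combinations) {
      for (const value of matrix[key]) {
        next.push({ ...combination, [key]: value });
      }
    }
    combinations = next;
  }
  return combinations;
}

const matrixJobs = expandMatrix({
  project: ['chromium', 'firefox', 'webkit'],
  shard: [1, 2, 3, 4],
});

// The command each job would end up running.
const matrixCommands = matrixJobs.map(
  job => `npx playwright test --project=${job.project} --shard=${job.shard}/4`,
);
```

Here matrixJobs has 12 entries, one per project and shard pair, which matches the 12 parallel jobs in the CI.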
There’s one thing that itches me, though – the shard is configured in the matrix, but the /4 part is not. There’s strategy.job-total, but it’s the total number of jobs, i.e., 12. If we used it anyway, we’d run 1/12, 2/12, 3/12, and 4/12, so only one third of our test suite [3].
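The arithmetic behind that “one third” can be sketched as follows. The coveredFraction helper is hypothetical and assumes equally sized shards, which is a simplification:

```typescript
// Fraction of the suite covered when the given shard indexes are run
// against `total` shards (assumes all shards are equally sized).
function coveredFraction(shardIndexes: number[], total: number): number {
  const valid = shardIndexes.filter(
    (index, position) =>
      index >= 1 && index <= total && shardIndexes.indexOf(index) === position,
  );
  return valid.length / total;
}

// Per project, the matrix only ever produces the indexes 1 to 4.
const indexesPerProject = [1, 2, 3, 4];

// Using strategy.job-total (12) as the denominator: 4 of 12 shards.
const coverageWithJobTotal = coveredFraction(indexesPerProject, 12);

// Using the real number of shards (4) as the denominator: all of them.
const coverageWithShardTotal = coveredFraction(indexesPerProject, 4);
```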
However, we can work around it by creating a new matrix parameter called shardTotal with exactly one value, and use it instead of all the 4s in the workflow (${{ matrix.shardTotal }}). The overall number of jobs stays the same, as it got multiplied by 1.
shardIndex: [1, 2, 3, 4]
shardTotal: [4]
(I also renamed shard to shardIndex because I really like when the names have the same length… There’s no hope for me in this matter.)
Our jobs are now highly parallelized, which leads to reduced overall waiting time. That was our goal, and we achieved that – awesome. But the thing is that every single job is entirely independent, including the setup. Even if the setup takes two minutes, if we multiply it by 12 jobs, we have 24 minutes of additional charges. (It’s not a lot, but it’s something.)
We can optimize two steps in our current workflow: installing dependencies and installing browsers. The former uses the official setup-node action, and all we’d like to do here is to enable caching of the dependencies:
cache: npm
For the latter, we can use an official Playwright Docker image. It has all of the browsers installed already, so we no longer have to do (and pay for) that. To use it, first, add the container property to our job:
container:
  image: mcr.microsoft.com/playwright:v1.30.0
It’s important to keep the version of this image in sync with the version of @playwright/test in your package.json; otherwise, the tests may fail due to version mismatch (or worse – pass when they shouldn’t). With that done, we can get rid of the entire “Install Playwright Browsers” step, as the browsers are guaranteed to be already there.
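Keeping the two versions in sync by hand is easy to forget, so a small check can help. Here’s a minimal sketch with hypothetical helpers (imageVersion, packageVersion, and versionsInSync are not part of Playwright). Note that it compares the version declared in package.json, not the one actually installed from the lockfile, so a range like ^1.30.0 can still resolve to something newer:

```typescript
// Extracts "1.30.0" from an image reference such as
// "mcr.microsoft.com/playwright:v1.30.0" (null if there is no version tag).
function imageVersion(image: string): string | null {
  const match = /:v?(\d+\.\d+\.\d+)/.exec(image);
  return match ? match[1] : null;
}

// Extracts "1.30.0" from a package.json entry such as "^1.30.0" or "1.30.0".
function packageVersion(declared: string): string | null {
  const match = /(\d+\.\d+\.\d+)/.exec(declared);
  return match ? match[1] : null;
}

// True only if both versions could be read and are identical.
function versionsInSync(image: string, declared: string): boolean {
  const fromImage = imageVersion(image);
  return fromImage !== null && fromImage === packageVersion(declared);
}

const inSync = versionsInSync('mcr.microsoft.com/playwright:v1.30.0', '^1.30.0');
const outOfSync = versionsInSync('mcr.microsoft.com/playwright:v1.30.0', '^1.31.0');
```

In a real project, the two strings would be read from the workflow file and package.json, e.g., in a tiny script run before the tests.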
In my brief tests, it cut roughly 30 seconds off every job. Your mileage may vary, but it’s something. Remember, GitHub Actions charges you for minutes, not seconds! (There was a discussion about that, but it seems it got shut down [4].)
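Whether these 30 seconds show up on the bill depends on where each job sits relative to a full minute. Here’s a minimal sketch, assuming every job is rounded up to a whole minute on its own; the helpers and all the durations are made up for illustration:

```typescript
// Billed minutes for a set of jobs, assuming each job is rounded up
// to a whole minute separately (an assumption, not an official formula).
function billedMinutes(jobDurationsInSeconds: number[]): number {
  let total = 0;
  for (const seconds of jobDurationsInSeconds) {
    total += Math.ceil(seconds / 60);
  }
  return total;
}

// Hypothetical helper: the same duration repeated for every job.
function sameDurationForAllJobs(jobs: number, seconds: number): number[] {
  const durations: number[] = [];
  for (let i = 0; i < jobs; i++) {
    durations.push(seconds);
  }
  return durations;
}

// 12 jobs going from 4m10s to 3m40s cross a minute boundary...
const billedBefore = billedMinutes(sameDurationForAllJobs(12, 250));
const billedAfter = billedMinutes(sameDurationForAllJobs(12, 220));

// ...while going from 3m50s to 3m20s does not change the bill at all.
const billedUnluckyBefore = billedMinutes(sameDurationForAllJobs(12, 230));
const billedUnluckyAfter = billedMinutes(sameDurationForAllJobs(12, 200));
```

In the first case, the bill drops from 60 to 48 minutes; in the second, it stays at 48 even though every job got faster.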
Here’s the complete GitHub Actions workflow you can just copy and paste in your projects. As usual, getting started is quick and easy but knowing why it has to work like that takes much more time.
Having said that, this workflow is a great place to start. I had to adjust it for my needs too, e.g., pin action versions (for security reasons), add a simple Slack notification (I used this action), and configure both repository_dispatch and workflow_dispatch events to trigger the workflow via the API and manually.
name: Playwright Tests
on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]
jobs:
  test:
    name: 'Test on ${{ matrix.project }} (${{ matrix.shardIndex }}/${{ matrix.shardTotal }})'
    runs-on: ubuntu-latest
    container:
      image: mcr.microsoft.com/playwright:v1.30.0
    strategy:
      fail-fast: false
      matrix:
        project: [chromium, firefox, webkit]
        shardIndex: [1, 2, 3, 4]
        shardTotal: [4]
    timeout-minutes: 60
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: 16
          cache: npm
      - name: Install dependencies
        run: npm ci
      - name: Run Playwright tests
        run: npx playwright test --project=${{ matrix.project }} --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }}
      - uses: actions/upload-artifact@v3
        if: always()
        with:
          name: playwright-report-${{ matrix.project }}-${{ matrix.shardIndex }}
          path: playwright-report/
          retention-days: 30
Playwright is a great tool! It comes with a ton of options, decent defaults, complete documentation, and a vibrant community. Similarly, GitHub Actions is my go-to CI for new projects.
There are a few things my ideal CI would do better, but that’s a separate rant post to make. Similarly, I’d like to see a few improvements in Playwright, but luckily for me, most of them are either already worked on or are filed as feature requests.
Now all I have to do is wait.
Having tens of projects and teams working on them available almost immediately is an immense benefit of working for a software house. Sure, companies “large enough” may have the same, but it’s definitely not the case for small and medium product ones.
If you’d like to install it in an existing project, the best would be to create a new one out of the template and just copy the dependencies and configuration. I recommend having it in a separate one, though (in a monorepo, if you will), to keep the dependencies independent. It may require more work to reuse the app’s code, depending on your setup, but I think it’s worth it in the long run.
You can choose Windows or macOS too, but for the vast majority of projects, it doesn’t make any sense. Playwright uses real browsers, and the operating system shouldn’t be a big factor here. There’s also the cost – all services I know of charge far more for Windows and macOS machines. For GitHub Actions, it’s 2x and 10x more respectively.
The strategy.job-total parameter would be completely fine with sharding without parallelizing the projects, though.
There’s only one instance of it in the Wayback Machine and it has an answer suggesting that it’s the expected behavior. Who knows, maybe it’ll change at some point. (I hope so.)