A journey through the Testing Pyramid: why your tests are bad
The path toward a successful software release has become complicated. For most customer-facing applications, it is no longer feasible to manually build an RPM file and then copy it to a folder on Production. Or make sure some DLLs are copied into the correct places on all Production servers, all at 4am on a Monday.
Testing is difficult
Customers expect a fully functional app and regular bug fixes. Web applications are vast, and testing them manually is a complex task. The only way is to automate. But it would be a good idea to automate your application's orchestration, scaling, and deployment before those tests. Just don't forget the automated tests.
But you forgot the tests, so you added monitoring to your Production environment instead. But something doesn’t feel right. Uneasy. Like that feeling when you know one of your shoes is untied without looking to see if it is, in fact, untied. So you go back to adding tests. But the code is done, and no one feels like adding some unit tests for the code they wrote last week. So why not just add in some E2E Public API Automated tests?
The three-sided shape
If this story above sounds familiar, then it’s because it’s a project you’ve worked on. It’s a project I’ve worked on. For a QA Engineer creating automated tests, there are ways to deal with the pain of this software development nightmare.
The Testing Pyramid is one such approach to balancing Automated Testing requirements, whatever the project methodology.
Everything sounds good in theory. But in real life, what does the Testing Pyramid mean? First, here's a reminder of what the Testing Pyramid is.
And before getting into the weeds, some commonly misunderstood terms need clarifying. The list below shows the four areas of automated testing. Sometimes this is represented using the heavily overused Testing Pyramid. So overused that it could be a meme. It's a wonderfully simple example, so it's hard to beat. I have also seen a Venn diagram of this, hexagons, ice-cream cones, different names for the stages, a B-52 bomber shape, and even a Trophy. Below I've gone for a good old-fashioned list.
1. Static Analysis
- Gives the least confidence that some code works
- Alongside Unit Tests, it provides a strong base
2. Unit Testing
- Focus is on a unit of work
- How one function handles some input (see the sketch after this list)
3. Integration Testing
- How something interacts with something else in the same system
- How a unit of code interacts with other units
- How the code connects to other services
4. End-to-End Testing
- Creates confidence that an AUT (Application Under Test) is working for an End User
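To make those levels concrete (as promised in the list above), here's a minimal Jest-style sketch in TypeScript. The `applyDiscount` function and `PriceService` class are invented purely for illustration:

```typescript
// A unit of work: a pure function that is easy to test in isolation.
export function applyDiscount(price: number, percent: number): number {
  if (percent < 0 || percent > 100) throw new Error("Invalid discount");
  return price * (1 - percent / 100);
}

// Unit Test: how one function handles some input.
test("applyDiscount handles a 25% discount", () => {
  expect(applyDiscount(200, 25)).toBe(150);
});

// An interaction point: this class depends on another unit (a tax client).
export class PriceService {
  constructor(
    private taxClient: { getRate(region: string): Promise<number> }
  ) {}

  async totalFor(price: number, region: string): Promise<number> {
    const rate = await this.taxClient.getRate(region);
    return price * (1 + rate);
  }
}

// Integration Test: how a unit of code interacts with another unit.
test("PriceService combines the price with the tax client's rate", async () => {
  const fakeTaxClient = { getRate: async () => 0.2 }; // stand-in for the real client
  const service = new PriceService(fakeTaxClient);
  expect(await service.totalFor(100, "UK")).toBe(120);
});
```

An E2E Test would sit above both of these, driving the deployed application through its public API or UI and asserting on what an End User would actually see.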
But in real life, what does the Testing Pyramid actually mean?
Automated End-to-End (E2E) Tests are at the top of the pyramid for a few reasons. They're expensive to create and maintain. Feedback is slow. They're generally not useful apart from regression testing and sanity checks. So there shouldn't be too many. But really, is this true? I found out the hard way. What follows below is my experience. One more thing before we move on: in the organisation where I work, all teams have their own QA experts, a Manual QA as well as an SDET-type QA.
Lesson 1: The E2E Automated Tests were only valuable to a handful of people.
Only a few developers find them useful or remember to check the status of an Automated Test run. A decision was made early on to make the tests non-blocking, which was OK at the start: the QA Team could “babysit” the tests and monitor for failures. However, this approach became increasingly painful as the application grew in size and complexity.
Lesson 2: long-running tests suck
They typically have longer execution times than unit tests or integration tests. Running a large suite of E2E tests can slow down the feedback loop and increase the time it takes to validate changes.
Our tests are mainly E2E tests that focus on Customer Journeys using the API. They can take a long time; some suites would take more than 20 minutes.
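To give a flavour of why, here's a hypothetical Customer Journey test sketched in TypeScript. Every step is a real network call against a fully deployed environment, and each step depends on the one before it, so the whole journey runs serially (the URL and endpoints are invented):

```typescript
// Hypothetical E2E Customer Journey: real HTTP calls, run strictly in order.
test("a customer can register, fill a basket, and check out", async () => {
  const api = "https://staging.example.com"; // invented environment URL

  // Step 1: register a user and grab an auth token.
  const register = await fetch(`${api}/users`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ email: "journey-test@example.com" }),
  });
  const { token } = (await register.json()) as { token: string };

  // Step 2: add an item to the basket (depends on step 1's token).
  const basket = await fetch(`${api}/basket/items`, {
    method: "POST",
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    body: JSON.stringify({ sku: "ABC-123", quantity: 1 }),
  });
  expect(basket.status).toBe(201);

  // Step 3: check out (depends on steps 1 and 2).
  const checkout = await fetch(`${api}/checkout`, {
    method: "POST",
    headers: { Authorization: `Bearer ${token}` },
  });
  expect(checkout.status).toBe(200);
});
```

Multiply a journey like this by a few hundred tests, each waiting on real services, and 20-minute suites stop being surprising.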
Lesson 3: Don’t undervalue the speed of feedback
This follows on from the previous lesson: feedback from the tests was slow (more than 12 minutes on average). No one likes feedback on code quality that takes longer than a minute. Two minutes at the most.
Lesson 4: Identifying problems quickly
A service had to be deployed to an environment along with all the other services, which meant it wasn't possible to reliably test a service in isolation. This was mainly due to the complex infrastructure of the application being tested, and it made it hard, sometimes impossible, to know why a failure had happened.
We have a gigantic suite of E2E Tests that fail intermittently for a range of reasons (network issues, an upstream system going down, test data that couldn't be created), and rarely the same reason twice. Fixing these tests and reporting the issues turned into a huge time-sink. These tests have seen better days. They have provided value, but now it's time to move on.
Lesson 5: There needs to be clear ownership
Most software developers like the idea of Automated Testing. But a combination of apathy and Feature Work means that they don't contribute to the tests or take the time to understand them.
Because of the way the teams are set up, there are no Automation Testers who sit outside a team. So, through what is probably a mixture of pride in their work (they helped create the E2E Test suite) and a strong belief in the benefits of Automated Testing, the testers choose to maintain the E2E Test suite themselves.
The E2E tests that we have can't be distributed between teams because of how all the services interact. Each team owns a set of services, and one Customer Journey could hit a lot of different services, owned by different teams. If there is a failure in any one of those services, the test will fail. Finding the reason for the failure involves a tedious triage process: debugging, rerunning the test, debugging some more, looking through logs, and finding the root cause. Only then can it be assigned to the team whose service failed, and only then can that team begin to fix the bug. Worse, this whole triage process is done by an Automation Engineer who could be doing something far more valuable with their time.
A team having ownership means they are responsible for maintaining the tests whenever they change the code base. The tests could even be part of the CI/CD workflow for their collection of services.
The Aftermath and the Solution
TL;DR: We have decided to fully embrace Integration Testing. We're working to replace nearly all of our E2E API Tests with Integration Tests. There'll be a few E2E Tests for critical customer journeys, such as “is a user authenticated, and can they perform these actions?”. Of course, there'll also be static analysis and plenty of unit tests. The Pyramid is looking balanced again.
Any interaction point is a great candidate for an Integration Test: anywhere an external system is called, or an upstream service is required to do something.
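As a sketch of what that can look like (the service URL and payload are invented, and I'm assuming a Jest-style runner with nock and axios), an Integration Test can exercise one interaction point without the whole system being deployed:

```typescript
import axios from "axios";
import nock from "nock";

// The code under test: calls an upstream tax service over HTTP.
async function fetchTaxRate(region: string): Promise<number> {
  const res = await axios.get<{ rate: number }>(
    `https://tax.internal.example.com/rate/${region}`
  );
  return res.data.rate;
}

// Integration Test: does our code speak the upstream contract correctly?
test("fetchTaxRate parses the upstream service's response", async () => {
  // nock intercepts the HTTP call, standing in for the real upstream service.
  nock("https://tax.internal.example.com")
    .get("/rate/UK")
    .reply(200, { rate: 0.2 });

  expect(await fetchTaxRate("UK")).toBe(0.2);
});
```

If the upstream team changes their contract, a test like this fails in seconds on the owning team's build, rather than 20 minutes into a shared E2E run.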
An ELI5 way to think about it is that software is a jigsaw puzzle. For a piece to sit nice and flush with its neighbours, we need to make sure that the piece's innie fits the other piece's outie. This “making sure” bit is the Integration Test. The Unit Test would be “does this piece have innies and outies”, and the E2E Test would be “do all the pieces fit together and look good”.
There can be some overlap between Integration, Unit, and E2E Testing; that discussion deserves its own article. But the basic message is that as long as you have a clear idea of what each test was created for, and there is a clear boundary between what is being tested and why, there won't be a lot of overlap.
Final thoughts
To finish up this article, I want to quickly go over why Integration Tests were the answer to all of the Lessons/complaints listed above.
Lesson 1: The E2E Automated Tests were only valuable to a handful of people
Integration Tests bring value to everyone. All of the answers below are reasons why Integration Tests bring more value than our existing E2E Test suite: the tests can run when a Pull Request is created, and the results are available within minutes.
Lesson 2: long-running tests suck
The E2E Tests could take as long as 20 minutes. Integration Tests can be a bit long-running too, but if the run time is pushing 5 minutes, take that as a warning. A long-running test suite is better suited to Regression Testing than to quick feature development.
Lesson 3: Don’t undervalue the speed of feedback
Everyone loves feedback on their work. Any code creator loves to see the green tick on a GitHub Action, or a message in Slack saying Integration Tests PASSED on their PR. Once a developer submits a Pull Request, there's a small window before they mentally move on to the next task. Feedback that arrives within that window is the most valuable.
Lesson 4: Identifying problems quickly
Similar to Lesson 3. Tests run whenever a single service is built and deployed, live data is used, and quality can be measured more quickly. All of this adds to the feeling of developer safety when code is being changed, which is one of the aims of automated testing.
Lesson 5: There needs to be clear ownership
Integration Tests mean a team can completely own a service. The horrible buzzword Shift-Left Testing means that testing happens earlier in the life of some software. Automated E2E Tests usually happen late in the software development life cycle, as regression testing activities. There could be suites of E2E Tests running constantly, but that approach has a few drawbacks that are out of scope for this article.
On the other hand, Integration Tests can be run early and often. We run them on merge and deploy actions, but they can also be run locally against a real environment through an npm script, or whenever there is a commit.
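For example (the variable name, URL, and script here are assumptions about how this could be wired up, not our exact setup), the same suite can pick its target from an environment variable:

```typescript
// The suite targets whatever environment the caller points it at, e.g. via an
// npm script like: "test:integration": "TEST_BASE_URL=https://staging.example.com jest"
const BASE_URL = process.env.TEST_BASE_URL ?? "http://localhost:3000";

test("the service's health endpoint responds", async () => {
  const res = await fetch(`${BASE_URL}/health`);
  expect(res.status).toBe(200);
});
```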
The elephants in the IDE are the challenges that integration testing brings. And there are many challenges. But in my experience so far, the benefits outweigh them. A few of these challenges:
- creating test data (i.e. should real data be used in a request and validated in a response?)
- remembering to clean up data created during testing (see the sketch after this list)
- finding creative ways to make sure that testing one service doesn't affect other connected services
- the learning curve to creating good Integration Tests
- the added time it takes to create an Integration Test
- the need for a solid framework of activities, as described by the Testing Pyramid. There should even be a small amount of manual QA work: after all, we're building software for humans, so it's only fitting that a human is involved in testing at some point during the software development process.
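As promised above, here's a minimal sketch of the clean-up challenge, using Jest-style hooks. The endpoints and helper functions are hypothetical:

```typescript
const BASE_URL = process.env.TEST_BASE_URL ?? "http://localhost:3000";

// Hypothetical helpers that create and remove test data through the service's API.
async function createTestUser(): Promise<{ id: string }> {
  const res = await fetch(`${BASE_URL}/users`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ name: "integration-test-user" }),
  });
  return res.json() as Promise<{ id: string }>;
}

async function deleteUser(id: string): Promise<void> {
  await fetch(`${BASE_URL}/users/${id}`, { method: "DELETE" });
}

let userId: string;

beforeEach(async () => {
  // Fresh data for every test, so tests never depend on each other.
  userId = (await createTestUser()).id;
});

afterEach(async () => {
  // Clean up even when a test fails, so data doesn't leak between runs
  // or spill into other connected services.
  await deleteUser(userId);
});

test("a newly created user can be fetched", async () => {
  const res = await fetch(`${BASE_URL}/users/${userId}`);
  expect(res.status).toBe(200);
});
```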