Integration, digital art by DALL-E. I think I will frame this.

A journey through the Testing Pyramid: why your tests are bad

The path toward a successful software release has become complicated. For most customer-facing applications, it is no longer feasible to manually build an RPM file and then copy it to a folder on Production, or to make sure some DLLs are copied into the correct places on all Production servers, all at 4am on a Monday.

Kris Raven
8 min read · Mar 14, 2024


Testing is difficult

Customers expect a fully functional app and regular bug fixes. Web applications are vast, and testing them manually is a complex task. The only way is to automate. But before writing those tests, it would be a good idea to automate your application's orchestration, scaling, and deployment. Just don't forget the automated tests.

But you forgot the tests, so you added monitoring to your Production environment instead. But something doesn't feel right. Uneasy. Like that feeling when you know one of your shoes is untied without looking to see if it is, in fact, untied. So you go back to adding tests. But the code is done, and no one feels like adding unit tests for the code they wrote last week. So why not just add some E2E Public API Automated Tests?

The three-sided shape

If the story above sounds familiar, it's because it's a project you've worked on. It's a project I've worked on. For a QA Engineer creating automated tests, there are ways to deal with the pain of this software development nightmare.

The Testing Pyramid is one such approach to balancing Automated Testing requirements, whatever the project methodology.

Everything sounds good in theory. But in real life, what does the Testing Pyramid mean? First, here's a reminder of what the Testing Pyramid is.

Crudely drawn Testing Pyramid

And before getting into the weeds, some commonly misunderstood terms need clarifying. The list below shows the 4 areas of automated testing. Sometimes this is represented using the heavily overused Testing Pyramid. So overused that it could be a meme. It's a wonderfully simple example, so it's hard to beat. I have also seen a Venn Diagram of this, hexagons, ice-cream cones, different names for the stages, a B-52 Bomber shape, and even a Trophy. Below I've gone for a good old-fashioned list, with a short code sketch after it.

1. Static Analysis

  • Gives the least confidence that some code works
  • Alongside Unit Tests, it provides a strong base

2. Unit Testing

  • Focus is on a single unit of work
  • How one function handles some input

3. Integration Testing

  • How something interacts with something else in the same system
  • How a unit of code interacts with other units
  • How the code connects to other services

4. End-to-End Testing

  • Creates confidence that an AUT (Application Under Test) is working for an End User
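
To make the middle two levels concrete, here is a minimal sketch of a Unit Test next to an Integration Test. It assumes a Node/TypeScript codebase with a Vitest-style test runner; all the names are invented for illustration.

```typescript
import { test, expect } from "vitest";

// A unit of work: one pure function.
export function applyDiscount(price: number, percent: number): number {
  return Math.round(price * (1 - percent / 100) * 100) / 100;
}

// A tiny "system": a checkout that depends on a discount store.
interface DiscountStore {
  discountFor(customerId: string): Promise<number>;
}

export class Checkout {
  constructor(private store: DiscountStore) {}

  async total(customerId: string, price: number): Promise<number> {
    return applyDiscount(price, await this.store.discountFor(customerId));
  }
}

// Unit Test: how one function handles some input.
test("applyDiscount applies 20% off", () => {
  expect(applyDiscount(50, 20)).toBe(40);
});

// Integration Test: how one unit (Checkout) interacts with
// another unit (a DiscountStore implementation) in the same system.
test("checkout total uses the customer's stored discount", async () => {
  const store: DiscountStore = { discountFor: async () => 20 };
  const checkout = new Checkout(store);
  expect(await checkout.total("customer-1", 50)).toBe(40);
});
```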

But in real life, what does the Testing Pyramid actually mean?

Automated End-to-End (E2E) Tests are at the top of the pyramid for a few reasons. They're expensive to create and maintain. Feedback is slow. They're generally not useful apart from regression testing or sanity checks. So there shouldn't be too many. But really, is this true? I found out the hard way. What follows below is my experience. One more thing before we move on: in the organisation where I work, all teams have their own QA experts, a Manual QA as well as an SDET-type QA.

Lesson 1: The E2E Automated Tests were only valuable to a handful of people

Only a few developers found them useful or remembered to check the status of an Automated Test run. A decision was made early on to make the tests non-blocking, which was OK at the start: the QA Team could “babysit” the tests and monitor for failures. However, this approach became increasingly painful as the application grew in size and complexity.

Lesson 2: Long-running tests suck

E2E tests typically have longer execution times than unit tests or integration tests. Running a large suite of E2E tests can slow down the feedback loop and increase the time it takes to validate changes.

Our tests are mainly E2E tests that focus on Customer Journeys through the API. They can take a long time; some suites would take more than 20 minutes.

Lesson 3: Don’t undervalue the speed of feedback

As in the previous lesson, feedback from the tests was slow (more than 12 minutes on average). No one likes feedback on code quality that takes longer than 1 minute, 2 minutes at the most.

Lesson 4: Identifying problems quickly

A service had to be deployed to an environment along with all the other services, which meant it wasn't possible to reliably test anything in isolation, before or after deployment. This was mainly due to the complex infrastructure of the application under test, and it made it hard, sometimes impossible, to know why a failure had happened.

We have a gigantic suite of E2E Tests that fail intermittently for a range of reasons: network issues, an upstream system going down, test data that couldn't be created. And each run can fail for a different one. Fixing these tests and reporting the issues turned into a huge time-sink. These tests have seen better days. They have provided value, but now it's time to move on.

Lesson 5: There needs to be clear ownership

Most software developers like the idea of Automated Testing. But a combination of apathy and Feature Work means that they don't contribute to the tests or take the time to understand them.

Because of the way the teams are set up, every Automation Tester is part of a team. Probably through a mixture of pride in their work (they helped create the E2E Test suite) and a strong belief in the benefits of Automated Testing, they choose to maintain the E2E Test suite.

The E2E tests that we have can't be distributed between teams because of how all the services interact. Each team owns a set of services, and one Customer Journey could hit a lot of different services owned by different teams. If there is a failure in any one of those services, the test will fail. Finding the reason for the failure involves a tedious triage process: debugging, rerunning the test, debugging some more, looking through logs, and finding the root cause. Only then can the failure be assigned to the team that owns the broken service, and only then can that team begin to fix the bug. Moreover, this debugging is done by an Automation Engineer who could be doing something much better with their time.

A team having ownership means they are responsible for maintaining the tests whenever they change their code base. The tests could even be part of the CI/CD workflow for their collection of services.

The Aftermath and the Solution

TL;DR: We have decided to fully embrace Integration Testing. We're working to replace nearly all of our E2E API Tests with Integration Tests. There'll be a few E2E Tests for critical customer journeys, such as “is a user authenticated, and can they perform these actions?”. Of course, there'll also be static analysis and plenty of unit tests. The Pyramid is looking balanced again.
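
As a sketch of what one of those few remaining critical-journey E2E Tests might look like, here is a browser-driven test assuming Playwright; the URLs, selectors, and environment variable are hypothetical.

```typescript
import { test, expect } from "@playwright/test";

// One critical Customer Journey: can a user authenticate and then
// perform an action? Everything here is real: UI, services, data.
test("an authenticated user can reach their dashboard", async ({ page }) => {
  await page.goto("https://app.example.com/login");
  await page.fill("#email", "test-user@example.com");
  await page.fill("#password", process.env.TEST_USER_PASSWORD ?? "");
  await page.click("button[type=submit]");

  await expect(page).toHaveURL(/\/dashboard/);
  await expect(page.getByRole("heading", { name: "Dashboard" })).toBeVisible();
});
```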

Any interaction point is a great candidate for an Integration Test: for example, wherever an external system is called, or wherever an upstream service is required to do something.
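
For instance, if a service calls an upstream system over HTTP, that interaction point can be tested without the real upstream being available. A minimal sketch, assuming Vitest, axios, and nock, with an invented rates service:

```typescript
import axios from "axios";
import nock from "nock";
import { test, expect, afterEach } from "vitest";

// The code under test: calls an (invented) upstream rates service.
async function fetchExchangeRate(currency: string): Promise<number> {
  const res = await axios.get<{ rate: number }>(
    `https://rates.example.com/v1/rates/${currency}`
  );
  return res.data.rate;
}

afterEach(() => nock.cleanAll());

// Integration Tests: verify the contract at the interaction point,
// i.e. the URL we call and how we handle what comes back.
test("parses the rate from the upstream response", async () => {
  nock("https://rates.example.com").get("/v1/rates/GBP").reply(200, { rate: 1.27 });
  expect(await fetchExchangeRate("GBP")).toBe(1.27);
});

test("fails loudly when the upstream service is down", async () => {
  nock("https://rates.example.com").get("/v1/rates/GBP").reply(503);
  await expect(fetchExchangeRate("GBP")).rejects.toThrow();
});
```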

An ELI5 way to think about it is that software is a jigsaw puzzle. For the pieces to fit nice and flush with the other pieces, we need to make sure that each piece's innie fits with the other piece's outie. This “making sure” bit is the Integration Test. The Unit Test would be “are there innies and outies?”, and the E2E Test would be “do all the pieces fit together and look good?”.

DALL·E’s attempt at creating a jigsaw puzzle. Looks pretty good. Fingers check out. But then you start to notice that the innies and outies of the pieces are not quite right….

There can be some overlap between Integration Testing, Unit Testing, and E2E Testing; that discussion deserves another article. But the basic message is that as long as you have a clear idea of what the tests are created for, and there is a clear boundary between what is being tested and why, there won't be a lot of overlap.

Final thoughts

To finish up this article, I want to quickly go over why Integration Tests were the answer to all of the Lessons/complaints listed above.

Lesson 1: The E2E Automated Tests were only valuable to a handful of people

Integration Tests bring value to everyone. All of the answers below are reasons why Integration Tests bring more value than our existing E2E Test suite. The tests can run when a Pull Request is created, and the results are available within minutes.

Lesson 2: Long-running tests have limited uses

The E2E Tests could take as long as 20 minutes. Integration Tests can be a bit long-running too, but if the run time is pushing 5 minutes, take that as a warning. A long-running test suite is more suitable for Regression Testing than for quick feature development.

Lesson 3: Don’t undervalue the speed of feedback

Everyone loves feedback on their work. Any code creator loves to see that green tick in the GitHub Action or a message in Slack saying Integration Tests PASSED on their PR. Once a developer submits a Pull Request, there’s a small window before they mentally move on to the next task. Feedback arriving during that small window is best.

Lesson 4: Identifying problems quickly

This is similar to Lesson 3. Tests run whenever a single service is built and deployed, live data is used, and quality can be measured more quickly. All of this adds to the feeling of developer safety when code is being changed, which is one of the aims of automated testing.

Lesson 5: There needs to be clear ownership

Integration Tests mean that a team can completely own a service. The horrible buzzword Shift Left Testing means that testing happens earlier in the life of the software. Automated E2E Tests usually occur late in the software development life cycle, as regression testing activities. There could be suites of E2E tests running constantly, but that approach has a few drawbacks that are not in the scope of this article.
On the other hand, Integration Tests can be run early and often. As mentioned above, we run them on merge and deploy actions, but the tests can also be run locally against a real environment, through an npm script, or whenever there is a commit.
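
As a sketch of what running against a real environment can look like, here is a Vitest suite pointed at a base URL through an environment variable (the variable name and endpoint are invented; global fetch assumes Node 18+):

```typescript
import { test, expect } from "vitest";

// Point the same suite at any real environment, e.g.:
//   BASE_URL=https://staging.example.com npm run test:integration
const BASE_URL = process.env.BASE_URL ?? "http://localhost:3000";

test("the service under test reports itself healthy", async () => {
  const res = await fetch(`${BASE_URL}/health`);
  expect(res.status).toBe(200);
});
```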

The elephants in the IDE are the challenges that Integration Testing brings. And there are many challenges. But in my experience so far, the benefits outweigh them. A few of these challenges:

  • creating test data (i.e., should real data be used in a request and validated in a response?)
  • always needing to remember to clean up data created during testing (see the teardown sketch after this list)
  • needing to find more creative ways to make sure testing one service doesn't affect other connected services
  • the learning curve to creating good Integration Tests
  • the added time it takes to create an Integration Test
  • the need for a solid framework of activities, as described by the Testing Pyramid. There should even be a small amount of manual QA work. After all, we're building software for humans, so it's only fitting that a human is used in testing at some point during the software development process.
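
On the clean-up point, the usual pattern is for each test to create its own data and delete it again in a teardown step that runs whether the test passed or failed. A sketch with invented endpoints, again assuming Vitest:

```typescript
import { test, expect, beforeEach, afterEach } from "vitest";

const BASE_URL = process.env.BASE_URL ?? "http://localhost:3000";
let customerId: string;

// Create fresh test data before each test...
beforeEach(async () => {
  const res = await fetch(`${BASE_URL}/customers`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ name: "integration-test-customer" }),
  });
  customerId = ((await res.json()) as { id: string }).id;
});

// ...and always delete it afterwards, even when the test fails.
afterEach(async () => {
  await fetch(`${BASE_URL}/customers/${customerId}`, { method: "DELETE" });
});

test("a newly created customer can be fetched", async () => {
  const res = await fetch(`${BASE_URL}/customers/${customerId}`);
  expect(res.status).toBe(200);
});
```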


Kris Raven

Quality Engineering Manager | A wholesome mix of QA, Automated Testing, music and philosophy | Enjoys unit tests | Favours integration tests