Ptolemaic View of the Universe by Andreas Cellarius. Image sourced from the Public Domain Image Archive / Stanford University Libraries

Cucumber and Testcontainers: A Natural Fit

Intro

Span of tests, regressions, verifying behaviour, and drawbacks

A well-tested system is rare: one that catches regressions, is easy to extend, inspires confidence in refactors, and whose tests are never flaky. It might seem so far from reality that it is not even worth trying, or one might think that maintaining such tests is more work than it is worth.

Writing unit tests seems trivial, but unit tests alone miss the big picture. Sometimes they seem to test no more than "given the data exists, then I get the data". We would want to pair our unit tests with broader tests, such as end-to-end tests.
End-to-end tests can be difficult to maintain, they are prone to flakiness, and they sometimes break for no good reason. They also take quite a long time to run, making any effort at TDD-like development feel like a real chore.
However, if we narrow our focus just a little bit, disregard testing the UI and user workflows, and instead focus on the actions our users perform when interacting with our system and their consequent effects, we can achieve many of the desired qualities of end-to-end tests while avoiding almost all of the drawbacks.

Testing in Middle and Cucumber

I have found that we often want to position our testing in the middle of our system, quite a distance from a unit test and closer to, but not quite at, an end-to-end test. This preserves the benefits of both: fast execution, fast development, a wide enough span, and easy maintenance. A simple way of doing this is API-level testing: starting our application and sending actual HTTP requests to our server. The problem is that these tests can be quite cumbersome to write, and, even worse, more cumbersome still to read and verify.

Tests in Plain Language

Cucumber is a tool for running BDD-style scenarios written in a structured language called Gherkin, which in many cases reads like plain text. Its main selling point is communication between developers, testers, and other stakeholders.


Scenario: User E-mail conflict
  When I register a new user with 
    | email              | password |
    | exists@example.com | hunter2  | 
  And I register a new user with 
    | email              | password |
    | exists@example.com | hunter3  | 
  Then I am made aware that the user is already registered

Example scenario in Gherkin

Plain-language testing is what Cucumber sells as its main benefit. I would, however, argue that communication is not necessarily the biggest benefit, at least not to the average developer.

Rather, the benefit is that regressions in behaviour are easily spotted and remedied; instead of reading through a test case to find where it might have broken, we focus on whether the behaviour has changed.

The tests are still relatively fast to execute; a single scenario runs fast enough that you can iterate on it as you would with unit tests.1
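When iterating, you don't have to run the whole suite. With cucumber-js a single scenario can be selected by file-and-line or by name (the paths below come from a hypothetical project layout):

```shell
# Run only the scenario that starts at line 7 of the feature file
npx cucumber-js features/login.feature:7

# Or select a scenario by its name
npx cucumber-js --name "Successful login"
```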

They model actual user flows. Compared to carefully setting up a state which you then execute against, steps are so composable, and map so directly onto actions on the system, that it is easier to re-use them than to write a new complex setup for each new test.

Doing "test first"2 development at the behaviour level is easier than at the unit-test level. We often go into a problem with a set of behaviours in mind; we might not even know how to implement a behaviour as we start writing code, but we know what the behaviour should be.

Of course, Cucumber is not the only way to achieve this. You could instead do API-level testing with libraries like jest and similar, but to me there is nothing like reading a clean five-line diff of good old plain language rather than a hundred-line diff of a specific setup that needs to be re-used by other tests.

A challenge when working with Cucumber is that it offers no solution for dealing with persistence or with the dependencies of the system under test, which for most applications proves to be a real challenge.

Handling state

When writing tests that cover user flows we often need to handle stateful changes in our application: a user logging in, entities being created in the database, you name it.

Since we are running multiple tests we should be able to reset the state between them, get a clean slate, and make sure that each scenario can be run independently, and hopefully also in parallel.

A naive approach is to spin up containers for the services your application connects to before running your tests, and then, in the teardown after each scenario, simply wipe them back to a clean state. This has drawbacks. The obvious one is that we have a step to perform before we can even run the scenarios; less obvious ones are that we can never be sure we have cleaned the state properly, and that we are limited to running exactly one scenario at a time.
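The naive reset could be sketched roughly like this, under the assumption that each started dependency registers a "wipe" callback that the teardown runs after every scenario. A forgotten or failed wipe is exactly the "can we be sure the state is clean?" problem described above.

```typescript
// Hedged sketch: registerWipe/wipeAll are hypothetical helpers,
// not part of Cucumber or Testcontainers.
type Wipe = () => Promise<void>;

const wipes: Wipe[] = [];

// Each dependency (database, cache, ...) registers how to reset itself.
export function registerWipe(wipe: Wipe): void {
  wipes.push(wipe);
}

// Called from an After hook; sequential, because the scenarios are too.
export async function wipeAll(): Promise<void> {
  for (const wipe of wipes) await wipe();
}
```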

Another approach is to substitute your services with in-memory versions: Mongo collections can be substituted with simple arrays, Redis can be a map, and so on. This works well and solves the problems associated with running the containers, but unsurprisingly it brings other challenges. The main one is maintaining parity between the in-memory versions of your services and those you actually use in production. I would argue that with this approach we can never be sure our testing environment is one-to-one with our production environment, even with an incredibly limited feature set.
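As an illustration, an in-memory stand-in for the tiny subset of Redis an app might use could look like the sketch below. Keeping this in parity with the real server (TTL semantics, eviction, data types) is the hard part the text warns about; this class is an assumption for illustration, not a real library.

```typescript
// Hedged sketch of an in-memory Redis substitute covering only set/get.
class InMemoryRedis {
  private store = new Map<string, { value: string; expiresAt?: number }>();

  async set(key: string, value: string, ttlMs?: number): Promise<void> {
    this.store.set(key, {
      value,
      expiresAt: ttlMs !== undefined ? Date.now() + ttlMs : undefined,
    });
  }

  async get(key: string): Promise<string | null> {
    const entry = this.store.get(key);
    if (!entry) return null;
    if (entry.expiresAt !== undefined && entry.expiresAt <= Date.now()) {
      this.store.delete(key); // emulate lazy expiry
      return null;
    }
    return entry.value;
  }
}
```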

Ideally we would want the best of both worlds; real, disposable, and isolated dependencies.

Enter Testcontainers

Testcontainers is a relatively new technology, started in 2015 and going mainstream in the 2020s with official support from Docker themselves. At the time of writing, it is supported in twelve different languages.

Testcontainers simply and elegantly handles starting small containers, allocating ports to avoid conflicts between the containers themselves and other services that might be running on your machine, making sure the containers are ready before running your tests, and automatically cleaning up your containers (even if your tests exited abnormally!). Starting a container can be as simple as describing which image to use, which port to expose, and a wait strategy to determine when the container is ready to interact with.

Below is an example to start a Mongo container in TypeScript:


  import { GenericContainer, Wait } from 'testcontainers';

  async function setupMongoContainer() {
    const container = await new GenericContainer('mongo:8.2.1')
      .withExposedPorts(27017)
      .withWaitStrategy(Wait.forLogMessage('Waiting for connections'))
      .start();

    const host = container.getHost();
    const port = container.getMappedPort(27017);

    return { container, host, port };
  }

How you pass the host, port, and other configuration extracted from the running containers to your application is up to you, be it by writing to process.env or by passing a configuration object; just make sure you have some way of handing the information from the started containers over to your application.
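For the process.env route, a small helper could write the resolved coordinates before the application reads its configuration. The variable names MONGO_URL and REDIS_URL below are assumptions for illustration, not something Testcontainers defines.

```typescript
// Hedged sketch: hand container host/port pairs to the app via env vars.
// MONGO_URL and REDIS_URL are hypothetical names chosen for this example.
export function exportContainerConfig(
  mongo: { host: string; port: number },
  redis: { host: string; port: number },
): void {
  process.env.MONGO_URL = `mongodb://${mongo.host}:${mongo.port}`;
  process.env.REDIS_URL = `redis://${redis.host}:${redis.port}`;
}
```

Call this after the containers have started but before the application boots, so the app's config loader sees the dynamic ports.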

Example in TypeScript

I have put together a small project to act as a working example, running ExpressJs and connecting to Mongo and Redis instances dynamically. You can find the full source code here, but for brevity I have included just the interesting parts below. The project handles a simple registration and login feature for users. It is by no means meant to represent a production-ready system, but rather to show how to set your app up to be configurable with the ports dynamically resolved by Testcontainers, how to pass data (in this case the port our server is running on) back to the acceptance world, and how to shut the application down when a scenario has finished.

The TestWorld is also responsible for keeping a reference to the running application and the started containers.


import { After, Before, setWorldConstructor } from '@cucumber/cucumber';
import { TestWorld } from './world';

setWorldConstructor(TestWorld);

// Leave some time here if the runner doesn't have the images downloaded yet.
// This will only matter the first time per machine.
Before({ timeout: 30_000 }, async function (this: TestWorld) {
  await this.start();
});

After(async function (this: TestWorld) {
  await this.stop();
});

  import { IWorldOptions, World as CucumberWorld } from '@cucumber/cucumber';
  import { GenericContainer, StartedTestContainer, Wait } from 'testcontainers';
  import { startServer, stopServer, StartedServer } from '../../src/server';
  import { Config, createConfig } from '../../src/config';

  export class TestWorld extends CucumberWorld {
    private started?: StartedServer;
    private mongoContainer?: StartedTestContainer;
    private redisContainer?: StartedTestContainer;

    public config?: Config;
    public lastResponse?: { status: number, body: unknown, headers: Record<string, string> };

    constructor(options: IWorldOptions) {
      super(options);
    }

    async start() {
      const [ mongoContainer, redisContainer, ] = 
        await Promise.all([
          new GenericContainer('mongo:7')
            .withExposedPorts(27017)
            .withWaitStrategy(Wait.forLogMessage(/Waiting for connections/i))
            .start(),
          new GenericContainer('redis:7-alpine') 
            .withWaitStrategy(Wait.forLogMessage(/Ready to accept connections/i))
            .withExposedPorts(6379)
            .start(),
      ]);

      this.mongoContainer = mongoContainer;
      this.redisContainer = redisContainer;

      const mongoPort = mongoContainer.getMappedPort(27017);
      const mongoHost = mongoContainer.getHost();
      const redisPort = redisContainer.getMappedPort(6379);
      const redisHost = redisContainer.getHost();

      this.config = createConfig({
        port: 0, // NOTE: Let OS assign us a port
        mongoUrl: `mongodb://${mongoHost}:${mongoPort}`,
        redisUrl: `redis://${redisHost}:${redisPort}`,
        nodeEnv: 'test'
      });
      this.started = await startServer(this.config);
    }

    get appPort(): number {
      const address = this.started?.server.address();
      if (typeof address === 'string' || !address) {
        throw new Error('Expected server to be started and have a port!');
      }
      return address.port;
    }

    async stop() {
      if (this.started) await stopServer(this.started);
      if (this.mongoContainer) await this.mongoContainer.stop();
      if (this.redisContainer) await this.redisContainer.stop();
    } 
  } 

There are a few things I want to highlight here. Firstly, we are creating a configuration object and passing it to our application as we start it; this lets us supply the correct hosts and ports for reaching our services. Secondly, we are letting the operating system choose the port our application runs on by passing 0. These two details allow for complete isolation of our test worlds!

Conclusion

Using Testcontainers and Cucumber together might be exactly what you are looking for: a level of testing that has a wider span than unit tests, yet is faster to develop and execute than browser-automated end-to-end tests. Testcontainers solves the common issue of dealing with infrastructure and persistence, letting us confidently verify the behaviour of our system against real dependencies rather than mocks, while still maintaining fast iteration and non-flaky tests.


1. Obviously this depends on your system and language. My experience comes mostly from using it with JS/TS, which doesn't necessarily need to compile multiple files.
2. I refrain from calling this Test-Driven Development. The reason is two-fold: firstly, some people do not call these scenarios "tests", but insist on them being something else. And secondly, I fear that TDD, much like Agile, has become a concept that means something different to each person who hears it.

Appendix

Example scenarios

 
  # File: login.feature 
  Feature: Authentication 
    Background:
      Given I register a new user with 
        | email              | password |
        | exists@example.com | hunter2  | 

    Scenario: Login with unregistered user 
      When I login with email "test@example.com" and password "supersecret" 
      Then I am denied 

    Scenario: Successful login 
      When I login with email "exists@example.com" and password "hunter2" 
      Then I am logged in 

  # File: register.feature 
  Feature: Register
    Scenario: Successfully register a new user
      When I register a new user with 
        | email              | password |
        | exists@example.com | hunter2  | 
      Then I have successfully registered

    Scenario: User E-mail conflict
      When I register a new user with 
        | email              | password |
        | exists@example.com | hunter2  | 
      And I register a new user with 
        | email              | password |
        | exists@example.com | hunter3  | 
      Then I am made aware that the user is already registered

The Feature Files and Step-Definitions

As an example we have split the feature into two: registration and logging in. This is good practice, as it helps isolate regressions much more easily than coupling the features together. As you can see, and probably already assumed, registration is a prerequisite for logging in; but if we have a regression where only the login scenario fails, we get a strong hint that what broke is the "logging in" part rather than the registration.

Note the very non-technical description in each step: there is no concept of Mongo, Redis, or even HTTP!

We instead leave it to the step-definitions to define what counts as a "successful registration".

 
import { Given, When, Then, DataTable } from '@cucumber/cucumber';
import { expect } from 'chai';
import { TestWorld } from '../support/world';

Given('I register a new user with', async function (this: TestWorld, dataTable: DataTable) { 
  if (!this.config) throw new Error('Config not initialized');
  const [ { email, password } ] = dataTable.hashes();
  const res = await fetch(
    `http://localhost:${this.appPort}/api/auth/register`, 
    { 
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ email, password }) 
    }
  );
  const body = await res.json();
  this.lastResponse = { status: res.status, body, headers: Object.fromEntries(res.headers.entries()) };
}); 

When('I login with email {string} and password {string}', async function (this: TestWorld, email: string, password: string) {
  if (!this.config) throw new Error('Config not initialized'); 
  const res = await fetch(`http://localhost:${this.appPort}/api/auth/login`, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ email, password }) });
  const body = await res.json();
  this.lastResponse = { status: res.status, body, headers: Object.fromEntries(res.headers.entries()) };
});

Then('I am made aware that the user is already registered', function (this: TestWorld) {
  expect(this.lastResponse).to.exist;
  expect(this.lastResponse?.status).to.equal(409);
});

Then('I have successfully registered', function (this: TestWorld) {
  expect(this.lastResponse).to.exist;
  expect(this.lastResponse?.status).to.equal(201);
}); 
                    
Then('I am logged in', function (this: TestWorld) {
  expect(this.lastResponse).to.exist;
  const setCookie = this.lastResponse?.headers['set-cookie'];
  expect(setCookie).to.be.a('string').that.includes('connect.sid');
}); 
                    
Then('I am denied', function (this: TestWorld) {
  expect(this.lastResponse).to.exist;
  expect(this.lastResponse?.status).to.equal(401);
}); 

As you can see, each step definition types this as TestWorld via a this-parameter annotation. this is an instance of our World, which is created for each scenario and is what allows us to store data between steps, so that we can make a request in one step and assert on it in another.