Local Integration Testing With Pytest
The traditional testing pyramid encourages us to focus on writing unit tests that are fast and cheap, while avoiding slow and complex integration testing. Real-world experience also shows that poor design and complexity often go hand in hand; I've seen my share of messy and flaky integration test setups. No wonder that people have learned to avoid anything but unit testing.
There is obvious value in verifying how the units of our software integrate to serve use cases. You can't simply ignore integration testing, unless you absolutely love solving production incidents under high pressure. Fortunately, there are ways to test business logic and service integrations in a contained and controlled fashion. If we put some time and care into the kinds of tests we write, we can achieve robust integration testing without the typical overhead and flakiness associated with higher-level tests.
A painful hike up the test pyramid
The testing pyramid is a good starting point for thinking about different types of tests: unit tests at the bottom, service or integration tests in the middle, and UI or end-to-end (E2E) tests at the top. Unit tests are fast and cheap, while moving upwards in the pyramid makes everything slower and harder to set up and maintain. It makes sense to heed the advice that the low-level unit tests should outnumber the higher-level tests by an order of magnitude.
I once saw an integration test suite fail when a local username exceeded 10 characters and a dynamically deployed message queue ran out of characters to give. Such experiences don't exactly instill confidence in running, let alone setting up and maintaining, tests that depend on anything beyond your immediate code.
Yes, tests involving multiple components, external services, and complex logic can be flaky and non-deterministic. However, you don't need to rely on E2E tests alone to verify business logic and service integrations. Many integrations can be contained and verified while avoiding complex setup and maintenance overhead.
A taxonomy of integration tests
In part, we suffer from a lack of shared terminology. Integration tests can mean anything from checking a single component's interaction with a database to complex tests involving multiple services and shared external dependencies.
Let's use a loose taxonomy for discussing the different types of integration tests. We'll start by looking at unit tests, which don't really "integrate" anything; move on to component integration tests, which let components handle their own responsibilities and implementation details; stop to review API layer tests, which validate contracts and data models at the border; and end with a discussion of system integrations, which are likely to involve some fuzziness and non-determinism.
Unit testing
A somewhat circular definition of unit tests goes: they test a "unit" of code. This raises the question of what a unit of code actually is.
Software is made of modules: functions, classes, programming language modules, and so on. Modules contain related functionality, reducing the context that a developer needs to reason about when working to solve a cohesive piece of a problem. Unit testing happens within the boundaries of one module, where you can focus on testing the programming logic instead of interactions with other modules.
The following unit test from my example Celery project verifies that the core logic for ciphering and deciphering text works as intended.
from celery_decipher.decipher.cipher import (
    ROT13_CIPHER,
    cipher,
    decipher,
)


def test_decipher():
    text = "cat"
    ciphered = cipher(text, ROT13_CIPHER)
    assert ciphered == "png"
    deciphered = decipher(ciphered, ROT13_CIPHER)
    assert deciphered == "cat"
The test uses a data fixture and functions imported from a single Python module, which is a strong indicator that the test targets tightly related functionality. Well, either that, or your modules are organized poorly. We use the public interface of the module, in this case functions and their signatures, and avoid testing implementation details. If your unit tests break whenever the module's internal logic changes, you should rethink the modularization of your code, e.g., ensure that the interface is narrow and doesn't leak abstractions.
Component integration tests
Working within one module is nice and comfortable, but usually doesn't cover many interesting use cases. Unless, that is, you inherited the one project I did, where everything was contained within a single C function, an experience that cost me some of my faith in humanity. Barring that, you most likely have some sort of software architecture where you need to integrate different components to achieve things like sending and receiving messages, uploading files, and querying and persisting data.
Persisting data, which can then be transformed and queried, is basic functionality in most applications. Luckily, integrating a real database into a test setup doesn't take much effort, as most databases can be run in containers. I personally try my best to enable most services to run locally, a sentiment that many experienced developers share. This allows the tests to cover more ground instead of relying on test doubles for their dependencies.
The example project has a Docker Compose setup for running a local PostgreSQL server with a separate test database. This allows running component integration tests, with a connection to the test database explicitly injected into each test.
from celery_decipher.decipher.db import (
    get_candidates,
    get_source_text,
    insert_source_text,
)
from celery_decipher.decipher.solver import (
    POPULATION_SIZE,
    initial_guess,
)


def test_initial_guess(testdb_cursor):
    text = "Smoky smoke test"
    source_text_id = insert_source_text(testdb_cursor, text)
    initial_guess(testdb_cursor, source_text_id)
    candidates = get_candidates(testdb_cursor, source_text_id)
    assert candidates is not None
    assert len(candidates) == POPULATION_SIZE
This example test uses database layer functions to insert test data and to fetch the results after calling the higher-level function under test. When modules are deep and their interfaces are narrow, our tests stay clean: we don't need to care how the higher-level function initial_guess does its magic, even though it internally uses the same database module as we do. In fact, we have no business looking inside initial_guess, for example by poking at state changes within the solver module. We should test the public interface instead and avoid making assumptions about implementation details. In general, you shouldn't create tight couplings in tests, so that you can keep benefiting from loose couplings between modules.

Granted, we do have to know that initial_guess reads the source text from the database and stores its results, which we then query using the get_candidates function. This doesn't violate the principle above per se; it is part of the public interface, that is, the contract between the function and its caller. We should aim to keep this interface as narrow as possible and the contracts between modules clear. In the majority of cases, the interface should cover just the function signature, i.e., the arguments to the function and what it returns. In other cases, letting the contract cover effects in databases and other side effects doesn't add too much complexity. In our example, the affected database is provided as a parameter to the function call, so we're still being explicit about it.
In our walk up the pyramid, we are moving to higher and higher levels of abstraction. Whereas unit tests could validate internal logic within a module, component integration tests avoid looking into individual modules, test higher-level interfaces, and validate that components work together as intended. When we move to the next level in our hierarchy, we no longer care about the individual components at all, but treat the overall system as a black box.
API layer tests
The next layer of tests shifts our focus from the internal logic of a service to its external interface. In web services, this usually means testing the HTTP API layer. Martin Fowler has called such tests "subcutaneous", as the API layer sits just beneath the UI surface in a web application.
The API layer tests validate that the service adheres to its contract, i.e., it handles requests appropriately and gives valid responses. In my example project, requests and responses are validated using Pydantic models.
from uuid import UUID

from celery_decipher.decipher.models import DecipherStatusResponse


def test_ingest(http_client):
    text = "Smoky smoke test"
    response = http_client.post("/decipher", json={"text": text})
    assert response.status_code == 200
    source_text_id = UUID(response.json()["source_text_id"])

    status_response = http_client.get(f"/decipher/{source_text_id}")
    assert status_response.status_code == 200
    status = DecipherStatusResponse.model_validate(status_response.json())
    assert status.source_text_id == source_text_id
    assert status.source_text == text
    assert status.status == "PENDING"
In the example above, the flow of the test is simple: our service endpoint accepts a text to be deciphered, we receive a unique ID for the request, and we can query the status of the request using that ID. We don't know how the service manages these tasks internally, nor what dependencies, if any, it has for performing the actual deciphering. The API encapsulates the internal logic, providing a shared contract between the service and its clients while allowing us to treat the service as a black box.
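The http_client fixture is not shown above either. If the web layer is built with FastAPI (an assumption on my part; the post only shows that Pydantic models are used), a minimal version could wrap the framework's in-process test client:

import pytest
from fastapi.testclient import TestClient

# Assumed import path for the application instance; the real project may
# structure this differently.
from celery_decipher.api import app


@pytest.fixture
def http_client():
    # TestClient drives the ASGI app in-process, so no separate server is needed.
    with TestClient(app) as client:
        yield client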
Often the features tested in the API layer require the service to call other services, external dependencies, databases, and so on. Sometimes, you may be able to set up a test environment for all the dependencies, and verify the full flow of the feature. Other times, it's not practical for reasons like legacy services without test environments, third-party services with restrictive licensing, complex and time-consuming operations, and so on. In such cases, you can use test doubles for the dependencies, utilizing fakes, stubs and mocks as appropriate. Just be careful! The more you mock, the more complex your test setup becomes, until you end up verifying the mocks instead of the service.
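As a hypothetical illustration of the test double approach (the notifications module and the behaviour below are invented for the example, not part of the project), pytest's monkeypatch fixture can swap an outbound call for a stub that merely records what it was asked to do:

import pytest

# Hypothetical module that the service would use to call an external system.
from celery_decipher import notifications


@pytest.fixture
def sent_notifications(monkeypatch):
    sent = []

    def fake_send(message):
        # Record the call instead of contacting the real external service.
        sent.append(message)

    # Assumes the service calls notifications.send(...) through the module,
    # rather than importing the function directly into its own namespace.
    monkeypatch.setattr(notifications, "send", fake_send)
    return sent


def test_ingest_notifies(http_client, sent_notifications):
    response = http_client.post("/decipher", json={"text": "Smoky smoke test"})
    assert response.status_code == 200
    # We verify the interaction with the dependency, not its implementation.
    assert len(sent_notifications) == 1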
End-to-end tests
Finally, we reach the top of the pyramid, where we test complete features and use cases. In web applications, this usually means driving the UI with tools like Selenium, Playwright, or Cypress. In other applications, it may involve prompting the system from upstream services, triggering processing by sending events to message queues, and so on. The difference from API layer tests is the coverage of entire use cases, which often involve multiple services and external systems.
The upstream services may provide a natural way of receiving the results for some use cases, for example by sending an asynchronous task to an event queue and reading the final result from another queue, or by calling a REST API endpoint and receiving an HTTP response with synchronous results. Other use cases may produce results in an external system, for example by creating files in shared storage, sending emails, or updating records in a third-party system. Getting and verifying the results from an external system, and then cleaning them up afterwards, can be tricky. The required effort is part of why E2E tests are slow and expensive to maintain.
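In the example project, an end-to-end style check could submit a text and then poll the status endpoint until the asynchronous work settles, instead of asserting on the immediate "PENDING" state as the API test above does. The sketch below reuses the http_client fixture; the 30-second deadline and the assumption that the status eventually moves past "PENDING" are mine, and this kind of test also needs the Celery worker and its broker running, which is exactly the extra weight E2E tests carry:

import time
from uuid import UUID


def test_decipher_completes(http_client):
    response = http_client.post("/decipher", json={"text": "Smoky smoke test"})
    assert response.status_code == 200
    source_text_id = UUID(response.json()["source_text_id"])

    # Poll until the asynchronous task has progressed past PENDING,
    # or give up after a deadline so the test cannot hang forever.
    deadline = time.monotonic() + 30
    status = "PENDING"
    while status == "PENDING" and time.monotonic() < deadline:
        status_response = http_client.get(f"/decipher/{source_text_id}")
        assert status_response.status_code == 200
        status = status_response.json()["status"]
        time.sleep(1)

    assert status != "PENDING"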
The example project does not have a frontend or a UI of any kind. The use cases are fully covered by the API layer tests, as the HTTP endpoints implement complete features. The web application drives all the use cases, functioning as the upstream and interface layer. If your application is structured this way, testing it is straightforward and simple. However, most real-world systems I've worked with have had multiple entry points for different use cases, for example a web UI for interactive use, REST API endpoints for programmatic access, and batch data processing jobs triggered when data is received from integrations. Covering all these diverse use cases requires significant effort, since we can't apply a single template to all of them.
Final thoughts
We need tests. They allow me to sleep at night, knowing that my code works as intended. They document how functions behave, how use cases are implemented, and what the contracts between the components of my system are. They allow me to refactor and continuously improve my code while maintaining quality and mitigating the risk of breaking things.
We need layered tests. Unit tests are great for verifying the internal logic of individual components, while component integration tests ensure that the components work together. Subcutaneous API layer tests validate the interface contracts while maintaining an appropriate level of abstraction that hides implementation details. Finally, E2E tests cover entire use cases, ultimately ensuring that our software provides value to its users.
We need to learn to love testing. It is a skill that requires practice: identifying patterns and matching them to appropriate test strategies. It should be a natural part of development, whether you're doing test-driven development or not.