June 7, 2007

Unit Testing the Database Access Layer

Updated February 7, 2022

The Problem

Writing unit tests for code that accesses a database is a difficult problem, and it’s one I’ve struggled with for several years. The problems come down to a balancing act between several (often competing) requirements:

The unit test must not not break due to changes outside the class under test.
The unit test must not break due to code refactoring that doesn’t break functionality.
The unit test must run very fast.
The unit test must remain valid through database refactorings.
The unit test must fail if the code doesn’t function correctly.

I’ve tried several approaches in the past and was unsatisfied with them:

Mock the Database Connection

This is the most “pure” unit testing approach because it completely isolates the class under test from all external dependencies. These examples are Java pseudo-code, but the pattern applies to any language.

Connection conn = createMock(Connection.class);
Statement statement = createMock(Statement.class);
expect(conn.createStatement()).andReturn(statement);
expect(statement.execute("SELECT ID FROM PERSON;")).andStubReturn(true);
statement.close();
expectLastCall();
replay(conn);
replay(statement);

ClassUnderTest classUnderTest = new ClassUnderTest ();
classUnderTest .findPersonIds();

verify(conn);
verify(statement);

The advantage to mocking up the database connection is that the tests run fast and don’t change due to the creation of other unit tests.

In practice, however, I’ve found these tests to be more or less worthless. They (like most mock tests) really just test the implementation of the method — not the behavior of the method — and therefore often fail due to code refactoring. The biggest problem, however, is that access to the database is really the whole point of the method and there are so many ways that can fail.

I would argue that the method logic is actually implemented in SQL with a Java wrapper around it. In the mock unit test above, only the Java code is really tested–the SQL being “tested” is often just cut and pasted from what is in the method.

Throw in changes to the database over the life of the project (the “person” table name changes to “employee”) and you end up with tests that pass, but don’t really tell you if your code will work.

Test against an Actual Database Connection

To actually test that the SQL works against a real database, the first thing you need is an actual database to test against. Tools like Docker, Test containers, and Liquibase Data work well for simplifying and standardizing the installation of databases to test.

Another option to consider for your test database is to use a lightweight in-memory database such as SQLite or H2. These are going to be different from your production database, but depending on your SQL needs they may be compatible enough to work for a selection of your testing while being significantly easier to start and faster to run than your “real” database. If you go this route, you will still want to have a mix between lightweight-database testing and actual-database testing, but if 80% or 90% of your tests can be run against the lightweight database it can have advantages.

Pre-Configuring the Database

Once you have the database up and running, you need to make sure the schema within it is set up correctly. Luckily, you have Liquibase which is all about that! All you have to do is run liquibase update against your newly created test database and you are ready to go.

That means your setup process is like this:

Start the database using a mechanism that works well for you
Run liquibase update

These steps are ideally coded into your test framework “setup” logic so it happens automatically, but they can be run separately prior to running your tests as well.

If you use something like Testcontainers or Liquibase Data, you are able to set up your database with a state already existing in it. Test containers allow you to specify an initialization SQL script while Liquibase Data starts with a version of your schema already installed and running. Similar tools may have similar options. Even if you have a way to start your database with a schema in place, you will still want to perform a liquibase update as part of your test setup process so you know you have the latest version of the schema, and also testing that the newest changesets are running as expected.

Managing Test Data

Setting up your database schema before running tests is usually not all that is needed, however. Oftentimes your tests need to ensure that the behavior is correct with different types of data.

For example:

Does your function correctly return a single page of results if there are many pages of data?
Does it return the correct page of data?
Does it work when there is exactly one page worth of row?
What about less than a page worth of rows?

To handle this, you need to manage data in the database as well. One common approach is to have a step in your tests that loads data into the database from CSV files or even hard-coded insert statements as part of the tests.

The main problem with this approach, however, is that the way the test data is stored is often very dependent on the version of the schema when it was created. So, when database structure changes your test data can no longer be inserted and your tests are now worthless. For example, if your test is created with test data that inserts rows into a “person” table, but later that table is renamed to “employee”, your insert statements will no longer execute. Depending on the schema changes, you may be able to recover your test data with a search and replace, but often the changes are too much and it has been long enough since you wrote the test that you don’t remember exactly what was supposed to be in the data.

Liquibase can solve this problem by allowing you to have your test data built up along with your schema. By integrating test data management into your changelog file, the test data is always inserted into the schema it expects, and then as the schema evolves in later changesets, the test data evolves along with it.

For example, imagine a changelog like this:

When Liquibase hits changeset “2”, it will always be good with inserting rows into a “person” table because that table isn’t renamed until the following changeset.

Notice that there is a context=”test” on changeset 2. Contexts are a way to control when certain changesets execute and when they do not.

In this case, we are defining a “test” context, so if you run “liquibase update –context=test” it will run changeset 2 and insert the test data. But, if you run with liquibase update –context=prod, it will NOT run changeset 2. Remember, that not specifying a context in your command means “run all changesets, regardless of what they have for a context attribute” so you will need to make sure to specify a context anytime you don’t want your test data inserted.