April 17, 2009

Using Liquibase to Move Test Datasets Across Database Types

Updated February 9, 2022

While the main goal of Liquibase is database change tracking, I found a very different and handy use for it: moving test datasets. In this blog, I’ll cover how I’m using Liquibase to manage and migrate test datasets.

I have an application that has been traditionally developed on MySQL, but am in the process of adding MS SQL support as well. The application uses Hibernate so it should work in theory, but does it really work on SQLServer?

Since I have unit tests already set up that actually execute the data access layer against the database, I can run them against MS SQL instead — but the tests require data and often specific data.

I wasn’t looking forward to duplicating my test data management effort, so I started looking for a way to move the data from MySQL to SqlServer. I tried mysqldump (even with the “mssql compatibility mode”) but it was still too MySQL-specific. There are MS SQL migration tools, but I wanted it to be automated and be flexible enough to move the app to Oracle, DB2, and others in the future.

The first step, regardless of how I’d be managing the test data, is to ensure my changelog file supports both my old MySQL database as well as MS SQL. While Liquibase and SQL can work cross-database, there are sometimes incompatibilities you don’t notice until you actually try to run it. Writing a cross-database changelog file is out of the scope of this post, but in my case, it mainly consisted of fixing some MySQL-specific datatypes to more generic versions. As I changed existing changesets, I had to use the <validChecksum> tag to let Liquibase know that the different versions of the changesets were expected. It took a few trial and error iterations until it ran cleanly against both, but it wasn’t too bad.

Once I was able to create the correct database structure in my new database type, I needed to be able to extract and then load the test data from MySQL.

What I ended up doing was using Liquibase’s generateChangeLog functionality to output a changelog with only <loadData> changesets which reference CSV files containing the data. Here’s the CLI command for this:

liquibase generate-changelog –changelog-file=testdata.xml –diff-types=data –data-output-directory=testdata

The advantage of this setup is that the loadData change works across all my database types. When you first run the generated changelog against the different database type, it may fail due to some database-specific datatypes being included in the loadData changes. If so, you will need to write a simple “search and replace” script to clean up any problems you hit.

Now, I can continue to manage my dataset in MySQL like I always have but now magically have my test database and dataset available for testing against using MS SQL as well. When there is a new version of my data I want to “snapshot”, it’s simply a matter of running my “generate-changelog plus search/replace” script and then saving the resulting XML and CSV files. While I would not recommend using this for a production backup/restore scenario, it has been working great for managing and migrating test datasets!

Nathan Voxland
Nathan Voxland
Share on: