June 18, 2014

Adding Liquibase to an Existing Project

Once teams start testing Liquibase out to understand how it works, they immediately want to add Liquibase to help manage database changes for their existing databases. Since every project is different with various processes and requirements, there are a couple of different approaches you can take. In this blog, we'll take a look at two of these approaches so you can use the one that works best for your team and project.

Using Liquibase for an existing database project

When adding Liquibase to an existing project there are basically two approaches:

Approach 1: Create a changelog that matches the current state of your database so it feels like you’ve always been using Liquibase. (Our recommendation because it’s the best long term option.)
Approach 2: Use a snapshot from your database backup tool to start and use Liquibase from that point forward. (Best if you're in a big hurry.)

Before you choose an approach

Liquibase helps you bring order to your database chaos. To effectively bring that order, you must first understand the chaos. What is the current state of your database? How much variation is there within your pipeline? Usually, the best place to start looking for answers is your production database. Then you can compare your production database with your test and development environments to understand how much or how little variation you will need to deal with as you introduce Liquibase. You can use Liquibase (via the liquibase diff command) or other vendor-specific tools to compare your databases.

Recommended option: Make it look like you’ve always been using Liquibase

In this approach, you’ll create a changelog file that matches the current state of your database. You can run this changelog against a blank database and the final result will be indistinguishable from your existing databases — it will feel as if you’ve used Liquibase from the beginning. This approach is the best long-term solution. There’s just a little bit more work you’ll need to do to get started, but it’s worth it!

Create a changelog

Create the changelog to match your database automatically using the generateChangeLog command. After using the generateChangeLog command, check through the generated changesets to ensure that everything looks correct. Liquibase does not always detect really complex or database-specific structures (like index storage and clustering configurations, or custom object types). Treat the generated changelog as a starting point and go through the generated changesets to ensure they are creating everything correctly. Make sure you grab a Liquibase Pro license key so you can capture stored procedures, triggers, and functions.

An alternative to using generateChangelog that works for some database types is to export the DDL for a schema to a file. The file can be used by Liquibase with an initial <sqlFile> changeset. As an example, MySQL has mysqldump that you run against your database, creating a sql file. You’d run mysqldump yourdb > bootstrap.sql and then copy that bootstrap.sql file to your version control and add <changeSet ..><sqlFile path="bootstrap.sql"/></changeSet> as your first changeset. Mysqldump has various options to control what kind of data to get and how to format it, and what options you’ll use will depend on your specific needs.

Populate the tracking table

Once you have your changelog file, you need a way to ensure that the pre-Liquibase changesets are only run on new, empty databases. Liquibase usese a tracking table (DATABASECHANGELOG) to understand which changes have already been applied. The easiest way to create a tracking table with the right information is to use the changeLogSync or changeLogSyncSQL command. This command marks all the changesets in your starter changelog file as already run. Without this step, liquibase update will try to recreate objects that already exist in your database.

As an alternative to the changeLogSync command, you can add contexts to all on the pre-Liquibase changesets. For example, <changeSet … context=”legacy”>. Then, on empty databases you run Liquibase When you run Liquibase with liquibase --context=legacy update. Then, on existing databases you run with liquibase --context=non-legacy.

Finally, you can add <precondition onFail="MARK_RAN"> tags to the generated changesets. This is the part that will be the most work upfront; It requires going into each changeset and defining a way to know if it had been run before or not. However, once that is done the deployment to all your databases is much simpler because there are no contexts or changelogSync to worry about.

For example, if you have a <createTable tableName="person"> changeset, you would add the following tags into the changeset:

Learn more about using Liquibase contexts and preconditions.

The technique you choose to populate the tracking table (DatabaseChangeLog table) depends on how divergent your schemas are. It often makes sense to use multiple approaches. For example, you may want to generate your base changelogs from the production database and use changeLogSyncSQL to be able to mark them on everything from production down. Then you can add your non-released changesets to the changelog file with a precondition checking if it has already run. This allows Liquibase to automatically figure out the correct state for all your databases from development through production.

We are going to use Liquibase starting…NOW!

Instead of building up a changelog to match your existing database, you can instead just declare “from this point forward, we are using Liquibase”. The set up is much easier because you don’t have to figure out how to get your pre-existing tables into your changelog file and you don’t have to figure out how to have those changesets marked as run. Since Liquibase only looks at the tracking table (DatabaseChangeLog table) to determine what needs to run, it doesn’t care what else might be in your database and it leaves all your existing tables alone. It just runs the new changesets.

The biggest disadvantage to this approach is that you cannot bootstrap an empty database with Liquibase alone. Instead, any new databases will need to first have a pre-Liquibase backup restored to it and then have Liquibase run from there.

Depending on how much variation you have between your schemas, you may need to rely on preconditions or changelogsync to standardize and handle those variations. For example, if your existing production database was your pre-Liquibase snapshot, changesets that correspond to changes that have already been applied to test environment will need to be marked as already run. Coordinating the start of Liquibase with a release boundary can sometimes minimize differences between test and production databases.

Summing it up: Remember people & processes

Remember that starting to use Liquibase — especially on an existing project — isn’t just about how you bootstrap your changelog file. It is also a question of how you introduce Liquibase into your existing processes and culture.

For many companies and projects, everyone realizes the problems that need fixing and are on board with the advantages of change. For others, however, there can be entrenched interests and strong resistance similar to any other process change. Liquibase provides many tools and approaches that can be used to ease it into an existing process such as SQL output, SQL formatted changelogs, diffChangeLog and more that can be combined in ways that works best for your team.

If you know that introducing Liquibase is going to be complex, either from a technical or processes standpoint, it is usually best to introduce it slowly. Start with it on a new project as a trial run and once you have a good grasp of how it works and available options, apply it to other existing projects. Our team of database experts has helped countless teams get started with database version control and incorporating database change into their CI/CD pipeline. Contact us and we'll be happy to walk you through a process that is right for your team.