Trimming changelog files

Large changelog files may become unwieldy over time. This guide offers best practices on how to address this common occurrence.

You add individual change sets to your changelog file to make changes to your database. Over time, those changes can build up to thousands of entries, many of which are now redundant or inefficient. For example, a table was created and then deleted, or a table was created and then columns were added individually instead of creating a table with all the columns.

Warning: Before continuing, verify whether simplification is necessary. Your changelog was built, run, and tested over a long period of time. Manual edits to the changelog file risk database stability. Consider whether the performance or file size concerns you have outweigh the risk of altering a functional script.

Scenario 1: changelog file size

Often the changelog file is so large that it causes performance issues in your editor, or there are too many merge conflicts. The best way to handle this is to divide your changelog file into multiple files. Instead of having a single changelog.xml file with everything in it, create a main.changelog.xml file which uses the include tag to reference other changelog files.

<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ext="http://www.liquibase.org/xml/ns/dbchangelog-ext"
    xmlns:pro="http://www.liquibase.org/xml/ns/pro"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd
        http://www.liquibase.org/xml/ns/dbchangelog-ext http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-ext.xsd
        http://www.liquibase.org/xml/ns/pro http://www.liquibase.org/xml/ns/pro/liquibase-pro-latest.xsd">
    <include  file="com/example/news/news.changelog.xml"/>  
    <include  file="com/example/directory/directory.changelog.xml"/>  
</databaseChangeLog>

When you run liquibase update against the main.changelog.xml file, changesets in com/example/news/news.changelog.xml will run and then the changesets in com/example/directory/directory.changelog.xml will run. You can break up changesets by feature, release, or other ways.

Scenario 2: liquibase update time efficiency

Other times, the problem is that liquibase update takes too long. Liquibase tries to be as efficient as possible when comparing the contents of the DATABASECHANGELOG table with the current changelog file and even if there are thousands of already ran changesets, an update command should take just seconds to run. If you find that update takes longer than it should, watch the Liquibase log to determine why.

Perhaps there is an old runAlways="true" changeset that no longer needs to run or there are preconditions which are no longer needed. Running Liquibase with --log-level=INFO or even --log-level=DEBUG can give additional output which can help you determine which changesets are slow. Once you know what is slowing down your update, try to alter just those changesets rather than throwing out the whole changelog and starting from scratch. You will still want to retest your changelog in-depth, but it is a far less risky change.

Scenario 3: surgically alter existing changesets

Some users find that liquibase update works well for incremental updates, but creating a database from scratch takes far too long. Are you re-creating databases often enough that the risk of a change to the creation script makes sense? If you are, look for problem changesets as described above. Indexes often cause performance bottlenecks during database creation, so start with those. If you are creating and updating indexes frequently in your creation process, you may be able to combine those changesets into something more efficient.

Note: When you need to surgically alter your existing changesets, remember how Liquibase works: each changeset has an id, an author, and a file path which together uniquely identifies it. If the DATABASECHANGELOG table has an entry for that changeset it will not run it. If it has an entry, it throws an error if the checksum for the changeset in the file doesn't match what was stored on the last run.

Modify

How you modify your existing changesets will also depend on your environment and where in the changelog the problem changesets are. If you are modifying changesets that have been applied to all of your environments and are now only used on fresh database builds you can treat them differently than if they have been applied to some databases but not yet to others.

Merge

To merge or modify existing changesets you will be doing a combination of editing existing changesets, removing old changesets, and creating new ones.

Remove

Removing unneeded changesets is easy because Liquibase doesn't care about DATABASECHANGELOG rows with no corresponding changesets. Just delete out of date changesets and you are done. For example, if you have a changeset that creates the table cart and then another that drops it, just remove both changesets from the file. You must make sure, however, that there are no changesets between the create and the delete that make use of that table or they will fail on a fresh database build. That is an example of how you are introducing risk when changing your changelog file.

Example scenario

Suppose instead you have a cart table that is created in one changeset, then a promo_code column is created in another and an abandoned flag is created in another.

<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ext="http://www.liquibase.org/xml/ns/dbchangelog-ext"
    xmlns:pro="http://www.liquibase.org/xml/ns/pro"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd
        http://www.liquibase.org/xml/ns/dbchangelog-ext http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-ext.xsd
        http://www.liquibase.org/xml/ns/pro http://www.liquibase.org/xml/ns/pro/liquibase-pro-latest.xsd">
    <changeSet  author="nvoxland"  id="1">  
        <createTable  tableName="cart">  
            <column  name="id"  type="int"/>  
        </createTable>  
    </changeSet>  

    <changeSet  author="nvoxland"  id="2">  
        <addColumn  tableName="cart">  
            <column  name="promo_code"  type="varchar(10)"/>  
        </addColumn>  
    </changeSet>  

    <changeSet  author="nvoxland"  id="3">  
        <addColumn  tableName="cart">  
            <column  name="abandoned"  type="boolean"/>  
        </addColumn>  
    </changeSet>  

</databaseChangeLog>

Option 1

This will work well if all existing databases have the cart table with the promo_code and abandoned columns already added.

  • Combine everything into a new changeset using the existing id="1" .
  • Delete the other changesets.
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ext="http://www.liquibase.org/xml/ns/dbchangelog-ext"
    xmlns:pro="http://www.liquibase.org/xml/ns/pro"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd
        http://www.liquibase.org/xml/ns/dbchangelog-ext http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-ext.xsd
        http://www.liquibase.org/xml/ns/pro http://www.liquibase.org/xml/ns/pro/liquibase-pro-latest.xsd">
    <changeSet  author="nvoxland"  id="1">  
        <validCheckSum>7:f24b25ba0fea451728ffbade634f791d</validCheckSum>  
        <createTable  tableName="cart">  
            <column  name="id"  type="int"/>  
            <column  name="promo_code"  type="varchar(10)"/>  
            <column  name="abandoned"  type="boolean"/>  
        </createTable>  
    </changeSet>  
</databaseChangeLog>

Note: Running Liquibase against existing databases just sees that id="1" already ran and does not commit any new script.

Option 2

Running Liquibase against a blank database will create the cart table with all the columns right away.

You must apply the validCheckSum tag in the error message to notify Liquibase that you know it changed and the new value is valid. If you omit the tag, existing databases will throw an error stating that id="1" has changed since it was run.

Option 3

If you have some databases where the promo_code and/or abandoned columns have not yet been added, follow the below instructions:

  • Update the original createTable as before.
  • Use preconditions with onFail="MARK_RAN" to handle cases where the old changesets ran.
  • Do not add the columns again if the new changesets ran.
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ext="http://www.liquibase.org/xml/ns/dbchangelog-ext"
    xmlns:pro="http://www.liquibase.org/xml/ns/pro"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd
        http://www.liquibase.org/xml/ns/dbchangelog-ext http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-ext.xsd
        http://www.liquibase.org/xml/ns/pro http://www.liquibase.org/xml/ns/pro/liquibase-pro-latest.xsd">				  
    <changeSet  author="nvoxland"  id="1">  
        <validCheckSum>7:f24b25ba0fea451728ffbade634f791d</validCheckSum>  
        <createTable  tableName="cart">  
            <column  name="id"  type="int"/>  
            <column  name="promo_code"  type="varchar(10)"/>  
            <column  name="abandoned"  type="boolean"/>  
        </createTable>  
    </changeSet>  

    <changeSet  author="nvoxland"  id="2">  
        <preConditions  onFail="MARK_RAN">  
            <not><columnExists  tableName="cart"  columnName="promo_code"/></not>  
        </preConditions>  
        <addColumn  tableName="cart">  
            <column  name="promo_code"  type="varchar(10)"/>  
        </addColumn>  
    </changeSet>  

    <changeSet  author="nvoxland"  id="3">  
        <preConditions  onFail="MARK_RAN">  
            <not><columnExists  tableName="cart"  columnName="abandoned"/></not>  
        </preConditions>  
        <addColumn  tableName="cart">  
            <column  name="abandoned"  type="boolean"/>  
        </addColumn>  
    </changeSet>  

</databaseChangeLog>

Option 4

On existing databases that have all three changesets already ran, Liquibase will just continue on as before. For existing databases that have the old cart definition, it will see that the columns don't exist for id="2" and id="3" and execute as usual. For blank databases, it will create the table with the promo_code and abandoned columns and then in id="2" and id="3" it will see that they are already there and mark that they have ran without re-adding the columns.

Preconditions

Note: Preconditions add a performance overhead to your update executions and are ignored in update-sql mode because Liquibase cannot know how applicable they are when changesets have not actually executed. It is best to use them only when needed.

Preconditions also add complexity to your changelog which will require additional testing so keep that in mind when deciding whether to modify your changelog logic. Sometimes it is easiest and safest to wait until all your databases have the columns and then modify the changesets to avoid the preconditions.

The cart/promo_code/abandoned example shows some basic patterns you can use when modifying existing changesets. Similar patters can be used to optimize different types of bottleneck issues. Just remember when one changeset is changed, it can affect other changesets below which may need to be modified as well. This can easily spiral out of control so be mindful of what you are doing.

If you end up finding that it will work best to completely restart your changelog, see How to set up Liquibase with an Existing Project and Multiple Environments, which describes how to add Liquibase to an existing project (even if that project was previously managed by Liquibase).