Tuesday, February 28, 2012

database schema evolution versus schema versioning

Why we have choosen to use dbdeploy for our database evolution over flyway and some other candidates.
 
Flyway as most other tools, assumes that you have a schema version, that is strictly monotonically increasing. This means, you can only evolve from a lower version to a higher version. As we are not yet doing continuous delivery, we still have a staging environment, where we test a release candidate. And from time to time it happens, that this RC needs a database change. And that is where trouble starts.
Let us assume, that the latest staging version uses db version 22.13, the trunk is currently on 23.5. If you now need a db change in staging, it will increas the schema version to 22.14. But your dev databases are allready at 23.5. So flyway will not allow to add the missing script to advance from 22.13 to 22.14 on your dev databases, as these are allready on 23.5. The only way to add the required change would be to recreate the database from scratch, which gets a little bit complicated and time consuming, if you are working an a more than 10 years old application, as we are.

The main reason I can come up with, for this behaviour is, that this guarantees a consistent order of changes. And thus solves eventually occuring dependencies between database changes. For example db version 23.5 may change a table that was introduces with 22.12. Thus 23.5 will fail if 22.12 is not yet applied.

However, with 100 people working on one big web application -which needs changing, but this is another story- most changes will not depend on each other as they affect different functional parts of the application. And often changes are developed in parallel which makes sctrictly monotonically increasing nmbering in svn difficult.

To allow an efficient development, the dbdeploy way of doing things looked more appropriate for us. Dbdeploy also uses a unique numbering scheme for naming the changes, but it does not enforce the order as strictly als flyway. If you allready applied 100, 110 and 120, you can still create a 115, and get it deployed. Dbdeploy basically removes the allready applied scripts from the set of available scripts and applies the remaining scripts in the order given by their numbers. Dependencies between scripts are at your risk.  

So the basic difference is versioning a schema versus evolving a schema by applying (hopefully small and independent) changes. The only thing we missed in dbdeploy was the capability to warn about changed scripts allready applied to the database. Thus we added a sha256 checksum to the dbdeploy changelog table, and added checksum comparison for scripts allready applied. If a allready applied script was changed, we will set up the database to a previous production version by importing an anonymized dump and apply the missing features. As this is currently our normal way of database deployment, we know how to do that. But my strong hope is that we will only have to do that in one out of a hundred cases, as this takes 15 minutes. Applying the missing changes takes less than a minute as far as we have experienced up to now.

4 comments:

  1. I do not quite get the thing with the version number. You could use the svn revision or build version of the database scripts... aren't they always incrementing?

    ReplyDelete
  2. When using SQL based migrations, Flyway uses a file naming scheme like:

    v1__myFirstTable.sql
    v1_1_addAColumnToMyFirstTable.sql
    v2__addASecondTable.sql

    and so forth. So there is a dedicated file for each version update. And files are to be applied following strict ordering. If your DB is allready at v2 and you than add a v1_2_whatever.sql script (because you have to change something on a staging version) you get into trouble. You will not be able to apply the change to your dev db, without resetting it to a pre v_2 stage. Or you apply the script as v1_1 on your staging environment and put it als v2_2 into your dev db. But than the change is allready applied to staging and possibly production, thus it may fail.
    And I want to get rid of all that db resetting that we are currently doing by default by importing dumps. As this takes approximately 15 minutes each time.
    Dbdeploy follows a very similar scheme, as each script file has to start with a number, it just applies all missing scripts, without caring about the maximum script number allready applied to your target DB. So yes, it may lead to problems, because scripts may be applied in different order, depending of the current state of your target database, but I am willing to accept that risk in exchange of a much faster db setup in most cases.
    In case of a script failure, I can still setup the database to a defined state by importing our dump and start over. One minute instead of 15 minutes seems worth ths little risk.

    ReplyDelete
  3. Hi Oliver,

    just wanted to let you know that Flyway 2.0 is out now and it does support out of order migrations. Feel free to give it a try :-)

    Cheers
    Axel

    ReplyDelete
  4. Flyway introduced this:

    # Allows migrations to be run "out of order" (default: false).
    # If you already have versions 1 and 3 applied, and now a version 2 is found,
    # it will be applied too instead of being ignored.
    # flyway.outOfOrder=

    ReplyDelete