Friday, March 2, 2012

database schema evolution followup

Having thought about pvblivs's comment on my previous post, a substantial difference between version control of software and version control of database changes became obvious to me. It is not a big deal, I just never thought about it before.

For software, the version control system (VCS) represents a (hopefully) consistent state of your software for any revision. This state is delivered to your dev/staging/production system. Maybe the software is built into an executable format first, but basically it is taken as a whole. Maybe, as an optimization, you only push the actual differences to your systems, but the final state of the target system is still represented by the revision in your VCS. If you want to know what has changed between two revisions, you ask your VCS for the differences between them, and it will show you.

With database changes, you actually write scripts that describe the transition from the previous state to the required state. So the db developer manually does the job that the VCS does for software. The VCS is basically used only for storing these transitions and keeping their history.
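To make the idea of hand-written transitions concrete, here is a minimal sketch in Python with SQLite. Everything in it is an assumption made for illustration: the `MIGRATIONS` list, the `schema_version` table and the `customer` table are hypothetical names, and real migration tools track applied changes in their own ways.

```python
import sqlite3

# Hypothetical, ordered transition scripts. In practice each one would be
# a file checked into the VCS; here they are inlined to keep the sketch short.
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER, name TEXT)"),
    (2, "ALTER TABLE customer ADD COLUMN email TEXT"),
]


def current_version(conn):
    """Return the highest applied migration number, or 0 for a fresh database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def migrate(conn):
    """Apply every transition newer than the recorded version, in order."""
    applied = []
    version = current_version(conn)
    for number, statement in MIGRATIONS:
        if number > version:
            conn.execute(statement)
            conn.execute(
                "INSERT INTO schema_version (version) VALUES (?)", (number,)
            )
            applied.append(number)
    conn.commit()
    return applied


conn = sqlite3.connect(":memory:")
first_run = migrate(conn)   # applies both transitions
second_run = migrate(conn)  # nothing left to apply
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
```

The database itself remembers which transitions it has seen, so the same list of scripts can be run against dev, staging and production no matter which state each of them is currently in.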

In contrast to software, databases typically have two streams of evolution. Structural changes and application data are pushed bottom-up, like software, from development to production. The other stream flows top-down: data added to the production database by its users. I think this is the reason why we define changes rather than the required target state: we have to find a way to merge the two streams.

A question I have now is, should there be a tool that automatically determines the required changes between two database states?

I remember listening to a presentation about a db migration tool, I think it was Liquibase. You could point it at two different databases and instruct it to migrate one to the state of the other. And I remember the bad feeling I had about that idea, mainly because I did not like the thought of moving changes from staging to production this way: you would have to make damned sure not to accidentally delete the production data. You would need to define very precisely which application data to move and which user data to leave as it is. But maybe I should rethink that.
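Just to play with the idea, here is a rough sketch of what such a comparison could look like, again in Python with SQLite. It is not how Liquibase or any other real tool works; `schema` and `additive_diff` are names I made up. To address my worry about production data, this sketch only ever generates additive statements (new tables, new columns) and never drops anything. It also ignores constraints, indexes and changed column types.

```python
import sqlite3


def schema(conn):
    """Return {table: {column: declared_type}} for all user tables."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    return {
        table: {
            row[1]: row[2]
            for row in conn.execute(f'PRAGMA table_info("{table}")')
        }
        for table in tables
    }


def additive_diff(source, target):
    """List statements that add to target what source has and target lacks.

    Tables or columns that exist only in target are left alone, so no
    statement produced here can delete data.
    """
    statements = []
    src, tgt = schema(source), schema(target)
    for table, columns in sorted(src.items()):
        if table not in tgt:
            cols = ", ".join(f"{name} {kind}" for name, kind in columns.items())
            statements.append(f"CREATE TABLE {table} ({cols})")
            continue
        for name, kind in columns.items():
            if name not in tgt[table]:
                statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {kind}")
    return statements


# Hypothetical staging database: carries the new structure.
staging = sqlite3.connect(":memory:")
staging.execute("CREATE TABLE customer (id INTEGER, name TEXT, email TEXT)")
staging.execute("CREATE TABLE tag (id INTEGER, label TEXT)")

# Hypothetical production database: old structure plus user data.
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
production.execute("CREATE TABLE audit_log (entry TEXT)")
production.execute("INSERT INTO customer VALUES (1, 'Ada')")

statements = additive_diff(staging, production)
for statement in statements:
    production.execute(statement)
production.commit()

rows = production.execute("SELECT id, name, email FROM customer").fetchall()
tables_after = sorted(schema(production))
```

The user row in `customer` and the production-only `audit_log` table both survive, because the diff never proposes a removal. The hard cases I was worried about (renames, dropped columns, application data that has to be moved along) are exactly the ones this sketch leaves out.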

What do you think?
