Saturday, March 3, 2012
We just published a modified version of Graham Tackley's dbdeploy tool on GitHub. Our dbdeploy is enhanced with checksum validation of already applied scripts: it will throw an exception if an already applied database script has been changed afterwards.
Thanks to Michael Gruber for implementing this change as part of our Delivery Chain Taskforce.
Friday, March 2, 2012
database schema evolution followup
Thinking about pvblivs's comment on my previous post made a substantial difference between version control of software and version control of database changes obvious to me. It is not a big deal; I just never thought about it before.
For software, the version control system (VCS) represents a (hopefully consistent) state of your software at any revision. This state is what gets delivered to your dev/staging/production systems. Maybe the software is built into an executable format, but basically it is taken as a whole. Maybe for optimization you only push the actual differences to your systems, but the final state of the target system is still represented by the revision in your VCS. If you want to know what changed between two revisions, you ask your VCS for the differences, and it will show you.
With database changes, you actually write scripts that describe the transition between the previous state and the required state. So the DB developer manually does the job that the VCS does for software; the VCS is basically used only for storage and historization of these transitions.
In contrast to software, databases typically have two streams of evolution. Structural changes and application data are pushed, like software, bottom-up from development to production. The other stream is data added to the database top-down by its users. I think this is why we describe changes rather than the required target state: we have to find a way to merge the two streams.
A question I have now is: should there be a tool that automatically determines the required changes between two database states?
I remember listening to a presentation about a DB migration tool, I think it was Liquibase, which you could point at two different databases and instruct to migrate one to the state of the other. And I remember the bad feeling I had about that idea, mainly because I did not like moving changes from staging to production this way: you would have to make damned sure not to accidentally delete production data, and you would need to define very precisely which application data to move and which user data to leave as it is. But maybe I should rethink that.
What do you think?
Tuesday, February 28, 2012
database schema evolution versus schema versioning
Why we have chosen dbdeploy for our database evolution over Flyway and some other candidates.
Flyway, like most other tools, assumes that you have a schema version that is strictly monotonically increasing. This means you can only evolve from a lower version to a higher one. As we are not yet doing continuous delivery, we still have a staging environment where we test a release candidate. And from time to time this RC needs a database change. That is where the trouble starts.
Let us assume that the latest staging version uses DB version 22.13 while the trunk is currently at 23.5. If you now need a DB change in staging, it will increase the schema version to 22.14. But your dev databases are already at 23.5, so Flyway will not allow you to add the missing script to advance from 22.13 to 22.14, as these databases are already past that version. The only way to add the required change would be to recreate the database from scratch, which gets a little complicated and time consuming if you are working on a more than ten-year-old application, as we are.
The main reason I can come up with for this behaviour is that it guarantees a consistent order of changes and thus resolves any dependencies between database changes. For example, DB version 23.5 may change a table that was introduced with 22.12; thus 23.5 will fail if 22.12 is not yet applied.
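To make that rule concrete, here is a minimal sketch of a strictly monotonic version check. It only illustrates the principle described above; it is in no way Flyway's actual implementation, and all names are made up:

import java.util.Arrays;

class MonotonicSchemaVersion {
    // the highest version already applied to this database, e.g. 23.5
    private int[] current = {23, 5};

    void apply(int[] scriptVersion, Runnable script) {
        if (compare(scriptVersion, current) <= 0) {
            // a late 22.14 arriving while the schema is at 23.5 is refused
            throw new IllegalStateException(
                "script version " + Arrays.toString(scriptVersion)
                + " is not above current schema version " + Arrays.toString(current));
        }
        script.run();
        current = scriptVersion;
    }

    private static int compare(int[] a, int[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            if (a[i] != b[i]) return Integer.compare(a[i], b[i]);
        }
        return Integer.compare(a.length, b.length);
    }
}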
However, with 100 people working on one big web application (which needs changing, but that is another story), most changes will not depend on each other, as they affect different functional parts of the application. And changes are often developed in parallel, which makes strictly monotonically increasing numbering in svn difficult.
To allow efficient development, the dbdeploy way of doing things looked more appropriate for us. Dbdeploy also uses a unique numbering scheme for naming the changes, but it does not enforce the order as strictly as Flyway. If you have already applied 100, 110 and 120, you can still create a 115 and get it deployed. Dbdeploy basically removes the already applied scripts from the set of available scripts and applies the remaining scripts in the order given by their numbers; dependencies between scripts are at your own risk.
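In contrast to the sketch above, the dbdeploy-style selection boils down to a set difference: subtract the already applied change numbers from the available ones and run the remainder in ascending numeric order (again, names are ours, not dbdeploy's):

import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

class PendingChanges {
    // everything available but not yet applied, in ascending numeric order
    static SortedSet<Integer> pending(Set<Integer> available, Set<Integer> applied) {
        SortedSet<Integer> result = new TreeSet<>(available);
        result.removeAll(applied);
        return result;
    }
}

With applied = {100, 110, 120} and available = {100, 110, 115, 120}, pending() yields {115}: the late-created change still gets deployed.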
So the basic difference is versioning a schema versus evolving a schema by applying (hopefully small and independent) changes. The only thing we missed in dbdeploy was the capability to warn about already applied scripts that have changed. Thus we added a sha256 checksum to the dbdeploy changelog table and a checksum comparison for already applied scripts. If an already applied script was changed, we will reset the database to a previous production version by importing an anonymized dump and then apply the missing changes. As this is currently our normal way of database deployment, we know how to do that. But my strong hope is that we will only have to do it in one out of a hundred cases, as it takes 15 minutes; applying the missing changes takes less than a minute, as far as we have experienced up to now.
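The validation itself amounts to comparing a freshly computed digest of each script against the one recorded in the changelog table. A minimal sketch, assuming the checksum is stored as a hex string; method names are illustrative, not our actual patch:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ScriptChecksum {
    // fail loudly if an already applied script no longer matches its recorded checksum
    static void validate(String scriptContent, String recordedChecksum) throws NoSuchAlgorithmException {
        String actual = sha256Hex(scriptContent);
        if (!actual.equals(recordedChecksum)) {
            throw new IllegalStateException("already applied script has been changed afterwards");
        }
    }

    static String sha256Hex(String content) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest(content.getBytes(StandardCharsets.UTF_8))) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}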
Monday, February 27, 2012
Oracle database shutdown and startup via jdbc
While working on the automation of our database setup procedures, we learned that you can shut down and start up an Oracle DB instance remotely via JDBC. It took some trials, and finally reading this blog post, to understand how to achieve that.
From the Oracle Java API documentation it is not obvious that you have to call several methods to achieve the same effect as a shutdown transactional from within sqlplus. You actually have to issue four commands, as shown in the following code example:
import static oracle.jdbc.OracleConnection.DatabaseShutdownMode.TRANSACTIONAL;
...
OracleConnection connection = connectionHolder.getConnection();
try {
    // initiate a transactional shutdown: wait for active transactions to finish
    connection.shutdown(TRANSACTIONAL);
    // close and dismount the database, as sqlplus does during shutdown
    Statement stmt = connection.createStatement();
    stmt.execute("alter database close normal");
    stmt.execute("alter database dismount");
    stmt.close();
    // finally terminate the instance
    connection.shutdown(OracleConnection.DatabaseShutdownMode.FINAL);
} finally {
    connection.close();
}
For startup it is:
import static oracle.jdbc.OracleConnection.DatabaseStartupMode.NO_RESTRICTION;
...
// start the instance via a preliminary auth connection, then mount the database
connectionHolder.getPrelimAuthConnection().startup(NO_RESTRICTION);
Statement statement = connectionHolder.getConnection().createStatement();
statement.execute("alter database mount");
statement.close();
Both calls require a connection with the sysdba or sysoper role. Startup requires a preliminary auth connection; to get one, use:
private OracleConnection getPrelimAuthConnection() throws SQLException {
    // the *_KEY constants hold the corresponding Oracle connection property names
    Properties props = new Properties();
    props.put(USER_KEY, username);
    props.put(PASSWORD_KEY, password);
    props.put(ROLE_KEY, "sysdba");
    // preliminary auth allows connecting while the instance is down
    props.put(PRELIM_AUTH_KEY, "true");
    OracleConnection newConnection =
        (OracleConnection) DriverManager.getConnection(connectionString, props);
    return newConnection;
}