
Database Continuous Integration Part 1.

26. August 2008 21:20 by David

The most fundamental tool for any team doing agile development is Continuous Integration. Unfortunately, while unit testing frameworks such as NUnit and friends and CI servers such as CruiseControl and TeamCity enjoy near universal adoption, the tooling for managing the build process of the database is nowhere near as advanced.

This walk-through (I have made the mistake of implementing each of the anti-patterns below) describes how I have successfully implemented a build/versioning process for the database(s) on a number of projects over the last three years. At its core is a simple version table that records which changes have been applied to the database. A build task iterates over a directory of sequentially numbered scripts, applies every script that has not already been run against the database, and updates the version number.
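
To make the idea concrete, here is a minimal sketch of such a build task. All the names (the MyApp_Dev catalog, the Scripts folder, the NNNN_Description.sql naming scheme) are my own illustrations rather than the actual task's API, and it assumes each script is a single batch with no GO separators:

    using System;
    using System.Data.SqlClient;
    using System.IO;

    class DatabaseUpgrader
    {
        // Scripts are named NNNN_Description.sql (e.g. 0007_AddCustomerTable.sql)
        // so an alphabetic sort is also a numeric sort.
        static int ScriptVersion(string path)
        {
            return int.Parse(Path.GetFileNameWithoutExtension(path).Split('_')[0]);
        }

        static void Main()
        {
            using (var conn = new SqlConnection(
                "Data Source=(local);Initial Catalog=MyApp_Dev;Integrated Security=true"))
            {
                conn.Open();

                // Highest script version already applied; -1 for a virgin database.
                int current = (int)new SqlCommand(
                    "SELECT COALESCE(MAX(Version), -1) FROM DatabaseScriptRunner",
                    conn).ExecuteScalar();

                string[] scripts = Directory.GetFiles("Scripts", "*.sql");
                Array.Sort(scripts);

                foreach (string script in scripts)
                {
                    int version = ScriptVersion(script);
                    if (version <= current)
                        continue; // already applied during a previous build

                    // Apply the script and bump the version in one transaction so
                    // a failed script never leaves the version table out of step.
                    using (SqlTransaction tx = conn.BeginTransaction())
                    {
                        new SqlCommand(File.ReadAllText(script), conn, tx)
                            .ExecuteNonQuery();
                        new SqlCommand(
                            "INSERT INTO DatabaseScriptRunner (Version, AppliedOn) " +
                            "VALUES (" + version + ", GETDATE())", conn, tx)
                            .ExecuteNonQuery();
                        tx.Commit();
                    }
                }
            }
        }
    }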

The code is open source (BSD license) and is implemented as NAnt/MSBuild tasks so that it can be plugged into your current build process. There is also an Installer class so that you can embed the versioning process into an MSI. Currently the code works with SQL Server and PostgreSQL. Feel free to email me with enhancement requests or patches that enable integration with other DBMSs such as MySQL and Oracle. The code will be made available in a future posting...


Database Deployment Patterns

The following patterns (and anti-patterns) describe the development process that I have built up over the last couple of years.

Anti-Pattern: Shared development database

Developers should NEVER connect to a shared database. Full stop, end of story. Each developer and build machine should have its own database.

Sharing a database is fraught with problems:

  • Any schema or data change affects EVERYONE

  • It is impossible to write integration tests that need to commit transactions, because committed data becomes visible to every other database user. For example, any tool or library that manages its own transactions, such as SQL Server Integration Services, will commit during its processing.

  • Eventually the database gets cloned anyway for performance testing, big schema changes, etc., so you may as well have individual databases from day one.

Anti-Pattern: “Big Bang” database upgrade script

Database modifications are by their very nature unidirectional. Once a script has deleted rows from a database, the only way back is a restore.

Hence, as soon as a database enters production it is imperative that all database changes are managed through a rigorously tested deployment process. The ONLY reliable method is to regularly (weekly? fortnightly?) restore a copy of production into a test environment and simulate the deployment process that is going to take place in production. Tools such as Red Gate's SQL Compare can be used to diff the production and development databases. The differences between the environments are then bundled up into an update script that will be run during “Go Live”.

Once the database has been upgraded, the testing begins. Unit tests are run to ensure that the business rules they validate still hold. Testers hammer the front end, racing through their regression test scripts, verifying and validating until the green light is given.

The problems arise when environment explosion takes place. As well as development and production there may be training, integration test, system test, user acceptance test and production hot fix environments. Let's assume that the data in each database is different (at the very least authorisation and authentication data is different – testers are “Gods” in test but not in prod). The release cycle will also differ, with many releases into the lower test environments and far fewer into production. This state of affairs quickly leads to the one large upgrade script being broken into smaller logical chunks, where the scripts' granularity maps to the needs of the environment that is deployed to most regularly.

Anti-Pattern: Scripts update many databases

The “use <database>” statement should NEVER appear in an update script. Scripts should be database agnostic: they should modify the schema and data of a single database only (see the sketch after this list). This has many benefits:

  • It is easy to have a single database server running several test environments.

  • It is easy to manage in source control: place each database's scripts in a different folder.
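
Here is a minimal sketch of the idea, with illustrative names (MyApp_Dev, MyApp_SystemTest, 0004_AddCustomerEmail.sql): the environment is chosen by the connection string's Initial Catalog, never by the script itself.

    using System.Data.SqlClient;
    using System.IO;

    class RunScriptAgainstCatalog
    {
        static void Main()
        {
            // The same script file runs unchanged against every environment.
            // (Assumes the script is a single batch with no GO separators.)
            string sql = File.ReadAllText(@"Scripts\0004_AddCustomerEmail.sql");

            foreach (string catalog in new[] { "MyApp_Dev", "MyApp_SystemTest" })
            {
                string cs = "Data Source=(local);Initial Catalog=" + catalog +
                            ";Integrated Security=true";
                using (var conn = new SqlConnection(cs))
                {
                    conn.Open();
                    new SqlCommand(sql, conn).ExecuteNonQuery();
                }
            }
        }
    }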

Pattern: Database has a version number/Scripts are versioned

If we have a folder of 20 database scripts, how can we determine which have and which have not been executed against the database? For schema changes we can check whether a column exists, but it is impossible to tell whether an update or delete statement has been run. For this reason, tagging the database with a version number is a far simpler solution.

Table: DatabaseScriptRunner

  • Version: int

  • AppliedOn: datetime
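
The DDL behind the table might look something like the sketch below (the exact column types are my guess); a build task would create it on its first run against a new database:

    static class VersionTable
    {
        // Created once, on the first run against a brand new database.
        public const string Ddl = @"
            CREATE TABLE DatabaseScriptRunner (
                Version   int      NOT NULL PRIMARY KEY,
                AppliedOn datetime NOT NULL
            )";
    }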

Pattern: Version key includes Release Id

When I first started using versioning tables I was repeatedly burnt when production systems needed to be hot-fixed. For example, assume the production database was at version 50 and, after a couple of weeks of development, the trunk had reached version 60. In order to patch production we needed to create a new script 51 for the production branch. At this point there were two version 51 scripts, in different branches, each with different content. To get the releases aligned, the production branch's script 51 would have to be prepended to the trunk's script 51. It got very messy once production needed a second and third emergency patch: the trunk 51 script would grow longer and longer.

A simpler approach is to store each release's scripts in a different folder and have the first script in each release numbered 0. A hot fix is then just another numbered script in the production release's folder, while trunk development continues independently in the next release's folder.

Table: DatabaseScriptRunner

  • Release: int

  • Version: int

  • AppliedOn: datetime
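
Assuming the columns above (types again my guess), the key change is that the primary key now spans both columns, which is what allows the numbering to restart at 0 in every release folder:

    static class ReleaseVersionTable
    {
        public const string Ddl = @"
            CREATE TABLE DatabaseScriptRunner (
                Release   int      NOT NULL,
                Version   int      NOT NULL,
                AppliedOn datetime NOT NULL,
                PRIMARY KEY (Release, Version)  -- numbering restarts per release
            )";
    }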

Pattern: Version table includes Assembly version numbers

Depending upon your build environment it can be extremely useful to record the version of the components that were built against the database. During debugging this can be used to confirm that the code actually running is the version the database is expecting.

Table: DatabaseScriptRunner

  • Release: int

  • Version: int

  • RepositoryAssemblyVersion: string

  • AppliedOn: datetime
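
A small helper, assuming the table above, that a debugging session might use to ask the database which build it expects (ordering by Release then Version is my assumption):

    using System.Data.SqlClient;

    static class DbCodeVersion
    {
        // Which assembly version was recorded by the most recent script?
        public static string ExpectedAssemblyVersion(SqlConnection conn)
        {
            return (string)new SqlCommand(
                @"SELECT TOP 1 RepositoryAssemblyVersion
                  FROM DatabaseScriptRunner
                  ORDER BY Release DESC, Version DESC", conn).ExecuteScalar();
        }
    }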

Team Specific: Don't version stored procedures

If you work in an organisation with a low number of stored procedures (for example because you use an ORM) then stored procedure changes should be migrated through the same database continuous integration process. On the other hand, if the team uses stored procedures as the solution's data access layer, it can be very important for each procedure to be easily tracked in the source control repository. For this reason it is possible to use the versioned build process for all schema and data changes, and place all the stored procedures in a separate unversioned folder whose scripts are rerun every time the build is executed, as sketched below.
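
Here is a sketch of that unversioned folder being replayed on every build. It assumes each script is rerunnable (e.g. IF OBJECT_ID('dbo.GetCustomer') IS NOT NULL DROP PROCEDURE dbo.GetCustomer, then GO, then the CREATE PROCEDURE); the folder name and the naive GO splitter are my own:

    using System.Data.SqlClient;
    using System.IO;
    using System.Text.RegularExpressions;

    static class StoredProcRunner
    {
        // Unlike the numbered scripts, these run on EVERY build, so the
        // version table is never consulted or updated here.
        public static void RunAll(SqlConnection conn, string procsFolder)
        {
            foreach (string file in Directory.GetFiles(procsFolder, "*.sql"))
            {
                // CREATE PROCEDURE must be alone in its batch, so split on GO.
                foreach (string batch in Regex.Split(
                    File.ReadAllText(file), @"^\s*GO\s*$",
                    RegexOptions.Multiline | RegexOptions.IgnoreCase))
                {
                    if (batch.Trim().Length == 0) continue;
                    new SqlCommand(batch, conn).ExecuteNonQuery();
                }
            }
        }
    }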