The most fundamental tool for any team
doing agile development is Continuous Integration.
Unfortunately, while unit testing frameworks such as NUnit and friends
and CI servers such as CruiseControl and TeamCity seem to have
universal adoption, the tools for managing the build process of the
database are nowhere near as advanced.
This walkthrough (I have made the
mistake of implementing each of the anti-patterns) describes how I
have successfully implemented a build/versioning process for the
database(s) on a number of projects over the last three years. At the
core is a simple version table used to record the changes that have been
applied to the database. A build task iterates over a directory of
sequentially numbered scripts, applies every script that has
not already been applied to the database, and updates the version
number.
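The core loop could be sketched as follows. This is a minimal illustration, not the actual NAnt/MSBuild task code: the file naming convention, function names and the SQLite-style connection are all assumptions.

```python
import re
from pathlib import Path

def pending_scripts(script_dir, current_version):
    """Return sequentially numbered scripts newer than the database's
    version, in the order they must be applied.

    Assumes file names like '0051_add_customer_table.sql', where the
    numeric prefix is the script's version."""
    scripts = []
    for path in Path(script_dir).glob("*.sql"):
        match = re.match(r"(\d+)", path.name)
        if match and int(match.group(1)) > current_version:
            scripts.append((int(match.group(1)), path))
    return sorted(scripts)

def upgrade(connection, script_dir, current_version):
    """Apply each pending script, recording each one in the version table."""
    for version, path in pending_scripts(script_dir, current_version):
        connection.executescript(path.read_text())
        connection.execute(
            "INSERT INTO DatabaseScriptRunner (Version, AppliedOn) "
            "VALUES (?, CURRENT_TIMESTAMP)", (version,))
        connection.commit()
```

Because the version table records what has run, re-running the task is a no-op for scripts already applied.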
The code is open source (BSD license)
and is implemented as NAnt/MSBuild tasks so that it can be plugged
into your current build process. There is also an Installer class so
that you can embed the versioning process into an MSI. Currently the
code works with SQL Server and Postgres. Feel free to send me an
email with requests for future enhancements, or patches to enable integration with
other DBMSs such as MySQL and Oracle. The code will be made available in a future posting...
Database Deployment Patterns
The following is a list of development processes that I have built
up over the last couple of years.
Anti-Pattern: Shared development database
Developers should NEVER connect to a shared database. Full stop,
end of story. Each developer and build machine should have its own
database.
Sharing a database is fraught with problems:
Any schema or data change affects EVERYONE
It is impossible to write integration tests that need to commit
transactions: once committed, the data becomes visible to the
other database users. For example, any tool or library that manages
its own transactions, such as SQL Server Integration Services, will commit
during its processing.
Eventually the database gets cloned anyway for performance
testing, big schema changes and so on, so you may as well have individual
databases from day one.
Anti-Pattern: “Big Bang” database upgrade script
Database modifications are by their
very nature unidirectional. Once a script has deleted all the rows
out of a table, the only way back is a database restore.
Hence, as soon as a database enters
production it is imperative that all database changes are managed
through a rigorously tested deployment process. The ONLY reliable
method is to regularly (weekly? fortnightly?) restore a copy of
production in a test environment and simulate the deployment process
that is going to take place in production. Tools such as Redgate's SQL Compare can
be used to diff the production and
development databases. The differences between environments are then
bundled up into an update script that will be run during “Go Live”.
Once the database has been upgraded, the
testing begins. Unit tests are run to ensure that the business
rules they validate are still correct. Testers hammer the front
end, racing through their regression test scripts to verify and
validate until the green light is given.
The problems arise when environment
explosion takes place. As well as development and production there
may be training, integration test, system test, user acceptance
test and production hot-fix environments. Let's assume that the data in each
database is different (at the very least the authorisation and
authentication data is different – testers are “Gods” in test
but not in prod). The release cycle will also be different, with many
releases into the lower test environments and less frequent releases
into production. This state of affairs quickly leads to the one
large upgrade script being broken into smaller logical chunks, where the scripts' granularity maps to the needs of the environment that is deployed to most regularly.
Anti-Pattern: Scripts update many databases
The “use <database>” statement should NEVER exist in an
update script. Scripts should be database agnostic: they should only
change and modify the data of a single database. This has many
benefits, the most obvious being that the same script can run
unchanged against any environment's database.
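As an illustrative guard (an assumed helper, not part of the published tasks), a build step could reject any script that tries to switch databases:

```python
import re

def check_script(sql_text):
    """Reject scripts containing a 'USE <database>' statement.

    The target database should come from the deployment tool's
    connection settings (for example sqlcmd's -d switch), never
    from the script itself. This lint is illustrative only."""
    for line_no, line in enumerate(sql_text.splitlines(), start=1):
        if re.match(r"\s*use\s+\S+", line, re.IGNORECASE):
            raise ValueError(
                f"line {line_no}: scripts must be database agnostic, "
                f"remove the USE statement")
```

Running such a check in CI catches the mistake before a script ever reaches an environment with differently named databases.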
Pattern: Database has a version number/Scripts are versioned
If we have a folder of 20 database scripts, how can we determine
which have and which have not been executed against the database?
For schema changes we can check whether a column exists, but
it is impossible to tell whether an update or delete statement has been
run. For this reason, tagging the database with a version number is
a far simpler solution.
Table: DatabaseScriptRunner
Version: int
AppliedOn: DateTime
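As a sketch, the table and the "what version is this database at?" query could look like the following. In-memory SQLite stands in for SQL Server/Postgres here, and the exact DDL is an assumption:

```python
import sqlite3

# SQLite stands in for SQL Server/Postgres in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DatabaseScriptRunner (
        Version   INTEGER   NOT NULL,
        AppliedOn TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
    )""")

def current_version(connection):
    """Highest script version applied so far; 0 if none have run."""
    row = connection.execute(
        "SELECT MAX(Version) FROM DatabaseScriptRunner").fetchone()
    return row[0] or 0
```

The build task only needs this one query to decide which scripts in the folder still have to run.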
Pattern: Version key includes Release Id
When I first started using versioning tables I was repeatedly burnt when production systems needed to be hot-fixed. For example, assume that the production
database was at version 50 and, after a couple of weeks of development, the trunk had reached version 60.
In order to patch production we needed to create a new script 51 on the production branch. At this point there were two version 51 scripts, in different branches, each with different content. To get the releases aligned, the production branch's script 51 would need to be prepended to the trunk's script 51. It started getting very messy once production needed a second and third emergency patch: the trunk 51 script would get longer and longer.
A simpler approach is to store each release's scripts in a different folder and have the first script version number set to 0.
Table: DatabaseScriptRunner
Release: int
Version: int
AppliedOn: DateTime
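With the (Release, Version) key, deciding what still needs to run in a given release's folder might look like this sketch (names are illustrative, not the tasks' real API):

```python
def scripts_to_apply(applied, release, folder_versions):
    """Which scripts from a release's folder still need to run?

    `applied` is the set of (Release, Version) pairs already recorded
    in the DatabaseScriptRunner table."""
    done = {version for rel, version in applied if rel == release}
    return [v for v in sorted(folder_versions) if v not in done]
```

A hot fix simply appends a new script to the production release's folder, while the next release's folder starts again at version 0, so the branches never fight over the same version number.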
Pattern: Version table includes Assembly version numbers
Depending upon your build environment it can be extremely useful
to record the version of the components that have been built against
the database. This can be used during debugging to ensure that the
same version of the code is being used as the one that the database
is expecting.
Table: DatabaseScriptRunner
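One way of recording the component version might look like the following sketch. The AssemblyVersion column name and types are assumptions (the original table layout for this pattern is not shown in full), and SQLite again stands in for the real DBMS:

```python
def record_build(connection, release, version, assembly_version):
    """Record which component build was applied alongside the script,
    so a debugger can later confirm that code and schema match.
    Column names here are assumptions, not the published schema."""
    connection.execute(
        "INSERT INTO DatabaseScriptRunner "
        "(Release, Version, AppliedOn, AssemblyVersion) "
        "VALUES (?, ?, CURRENT_TIMESTAMP, ?)",
        (release, version, assembly_version))
```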
Team Specific: Don't version stored procedures
If you work in an organisation with a low number of stored
procedures (for example, because you use an ORM) then stored procedure changes can simply be migrated through the database continuous integration process like any other script. On the other hand, if the team is using stored procedures as the solution's data access layer, then it can be very important for each procedure to be easily tracked in the source control repository. For this reason it is possible to use the versioned build process for all schema and data changes, and to place all the stored procedures into a separate unversioned folder whose scripts are rerun each time the build is executed.
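Re-running the unversioned folder on every build could be sketched like this (function and folder names are illustrative; SQLite stands in for the real DBMS):

```python
from pathlib import Path

def reapply_stored_procedures(connection, sproc_dir):
    """Re-run every script in the unversioned stored-procedure folder.

    Because these scripts run on every build they must be idempotent,
    e.g. DROP ... IF EXISTS followed by CREATE, or CREATE OR REPLACE
    on Postgres."""
    applied = []
    for path in sorted(Path(sproc_dir).glob("*.sql")):
        connection.executescript(path.read_text())
        applied.append(path.name)
    return applied
```

Each procedure lives in its own file, so source control history shows exactly who changed what, while the versioned script folder stays reserved for one-way schema and data changes.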