Wednesday, May 30, 2012

Major DB changes? Migrate, Duplicate or Deprecate

So you have a shiny new database technology, or it's time to make a major upgrade, or you just need to refactor. For whatever reason you want to move a large chunk of data into db_new_table. How do you cope with this? On a live system??

I see 3 options: Migrate, Duplicate or Deprecate.

Migrate

Migration is the act of coding up a script to dump any data from your legacy database table(s) into your new one(s) Migration, done properly, will give you the least grief in the long term as there will be no legacy tables to maintain in the future.

It may even present an interesting programming challenge.

This method is loved by developers due to the fun scripting problem and removal of legacy chod. It's also hated by the bean counters because of the  large risk of the migration script coder not knowing about some fringe case or other. What happens if it works on your Beta system but all goes bad when released to the wild? Migration is difficult to back out once you start using the new tables.

Duplicate

Duplication is the act of running the old and new databases side by side. Both are written to, and when you come to read, you check the old one first, then the new one. At some point, when you're happy that you are storing the same data in both places, you stop writing to the old one and you start to read from the new location first.

Duplication is a relatively safe way of introducing a major database change with just one slight concern: that you may never be able to retire the old storage method. This technique works well for "temporary" data (who really cares about a deleted email from 2 years ago?)

And if you are storing "temporary" data, this may lead on to deprecation.

Deprecate

To deprecate is to cleanse. Switching off the old system entirely and rolling with just the new one.

If you have a table that is cleaned out on a daily, weekly or monthly basis (such as one used in fast profit and loss calculations) deprecation is as easy as disabling the old interface and enabling the new one. A single config bit flip.

Most often you'll need a combination of these techniques though, such as before deprecation you may migrate some of yesterday's data into your new profit and loss table to check the numbers still add up correctly. Or you may want to migrate your data and then write to both systems - duplicate - until you have confidence. This will ultimately lead to deprecation.

Whichever method you chose will depend on the risk of it all going wrong, how much time you have to do it, and the nature of the data stored.