Any database application should be able to deal with data growth. As an application is used, data is created and evolves. With a corporate performance management tool, data can nicely sliced by a (Scenario, Year) combination. For example (Actual, 2004) would be the actual transactions that happened in 2004. (Budget, 2006) would be the approved budget for 2006). If a system has been in use for 5 years and each year is focused on building only a 1-year plan, 80% of the data is static. Many administrators do not know that, in most cases, all active data is reprocessed each night. Even though the last 4 years of data is not changing, those rows need to be loaded, checked and recalculated. The archiving process moves this data into separate tables that do not get reprocessed each night. That data is still available in the cube, but it does not eat up CPU cycles each night. It also does not need to be backed up each night. By keeping it in a separate database, daily backup file sizes can remain constant for as long as the application is in use. Rather than having backups of 60GB or more, backups should rarely be larger than 20GB.
High Level Steps for Archiving
- Figure out which (Scenario, Year) combinations have data in the system right now
- Decide which (Scenario, Year) combinations should be deleted, archived or stay active
- Create a new database for archiving
- Setup archive copies of only the necessary fact tables in this new archive database
- Write a process to move data to or from the active and archive tables
- Run the process to archive out each (Scenario, Year) combination that should be archived
- Add a new cube partition for the archive table