Surprisingly, Books Online (BOL), which I've always struggled using, has not only a good explanation of CDC but a great tutorial on setting it up. Plus the kind Microsoft people even take you through the implementation of an SSIS based incremental load. This was a great find, as one of the ways in which CDC really shines is when used as part of an ETL solution. There are other very useful applications, such as auditing data changes (not to be confused with SQL Server 2008's Audit feature), but I'll focus on the ETL application here.
For more in-depth information about what Change Data Capture is and how it works see the BOL entry 'Basics of Change Data Capture'.
What I'd like to do over the next few posts is share a few scripts I've come up with to streamline the implementation of CDC. While the basics are very easy to implement, it's the SSIS package(s) that could get rather tedious to build without some generic scripts.
So, in this first post I'll explain how to set CDC up for the database that has the source data for the ETL procedures.
There are several steps required:
- Create a table and populate it with the names of the tables to track via CDC
- As is recommended create a filegroup for the CDC tables
- Enable the database for CDC
- Enable the desired tables for CDC
--1. Create table with list of tables to track. As an example:SQL 2008+
--2. Create a filegroup for the CDC tables
--3. Enable the database for CDC
--4. Enable the desired tables for CDC. Run the resulting scripts on the source database and all data changes will be tracked.
Download the SQL file with the above queries.
Once you've run the above, you're done! Pretty easy. In the next post I'll delve into setting up the both the datawarehouse ad the SSIS package for incremental loads.
Part 2 - Peparing for SSIS