Tuesday, 14 October 2014

Automating Sliding Partition Windows - Part 4

Recurring Tasks

In the previous posts of this series I both explained the concept of the sliding window partition as well as the way I have configured my system for an automated sliding window. This post will delve into the automated part of the process, the tasks that recur on a regular, in my case monthly, basis.

I have decided to use an SSIS package to execute the various steps of the sliding window, but it is just as valid to wrap the steps into cursors or loops (obviously, you only need a loop if you have more than one table, partition scheme and/or function in your database). Depending on how much you plan on automating there are some challenges in using SSIS. I will be explaining how I've constructed my SSIS package, but it's outside the scope of this post series to explain how to use SSIS. There are many resources on the web that can help you if needed.

Switching Partitions

The first step involved is switching partitions between tables. If you remember from my previous posts I created a staging table for each partitioned table. So, at the end of each month (my partitions are all based on dates and partitioned monthly) partitions that are over a month old are switched to the staging table(s). In my scenario, the data in the staging tables are then moved to an archive database and the staging tables truncated.

You may also remember that I referred to a great query that provides me with all the partition info in my database. A cut-down version of that will provide me with the meta-data needed to perform the switch tasks. The query below finds all the tables and related partition schemes that match my particular naming convention (scheme names start with 'ps' function name start with ‘pf’ and the staging table names end with '_stg_Archive'. This is all up to you, but it certainly helps to have a set naming convention to stick to.

In my SSIS package, the above query results are stored in a system.object variable (essentially a table variable). I then use a ForEach Loop container and loop through the result set to dynamically create and execute the split partition SQL statements.

For the sql statement I've defined the following variables that are populated from the query above:

You'll notice that there a a few extra parameters defined, specifically the @MinPartNumber and @MaxPartNumber. As it is not worth switching partitions that are empty, the script is written to find the minimum partition number that has more than 0 rows and switch that one to the staging table. Currently, this procedure switches only one partition per table for each run. It is good practice to leave at least one empty partition at the beginning of your partition scheme due to some quirks in SQL Server - Kendra Little of Brent Ozar Unlimited explains it well in her post Best Practices + Table Partitioning: Merging Boundary Points

At this point, it will be a good idea to know what you want to do with the data that has just been switched. The reason for this is that merging two partitions that each have data in them is very time and resource intensive and will block any queries attempting to access the tables using the partitions in question.

Cleaning Up the Staging Tables

In the process that I have been outlining here I switch the data into staging tables and then transfer the data to an archive database. Once that has completed, I truncate the staging tables. This ensures that the partition is empty and any merge operation will be quick. If you are not archiving the data and do not need it, then you can go straight to the truncate statement. Since I'm doing this all in ans SSIS ForEach Loop container, my truncate looks like this:

Because I am transferring data between servers, I put additional checks in to make sure all the data has been moved. I will describe that in a later post.

One last thing to understand about partition merging is the direction they are merged in. Kendra Little explains it well in the link above. But, in short, when you merge Partitions 1 and 2, SQL Server moves partition 2 into 1 - backwards, essentially. This is important for two reasons: 1. you need to know which partition needs to be empty; 2. the filegroup that remains used is the one associated with the partition that is merged into, i.e. partition 1.

For this reason I like to build my partition scheme using PRIMARY as the first filegroup. So, as the sliding partition progresses, I can remove the custom filegroups I create along the way and keep things tidy. I'll go into file and filegroup removal in a later post.

Splitting Partitions

The next step in the sliding window is the partition split. As a reminder, the sliding partition allows you to create future partitions on the fly. The reason for this is multi-fold, not least that there are scenarios where you would exceed the number of allowed partitions, or, as in my case, space limitations and performance.

Similarly to the first query in this post, the query below finds the partition schemes and functions, along with their min and max partition numbers and values. The results are then used by the subsequent query within a ForEach loop container in my SSIS package. Once again, depending on how many partition functions and/or schemes you have yo can either run this on its own or, if you don't want to use SSIS, in a tsql loop.

Once again, in the SSIS package, the above script's results are stored in system.object variable which is then used in a ForEach Loop container with the query below.
There are a couple of things that are required for splitting a partition:
  • Setting the Next Used filegroup for the partition scheme
  • Making sure that filegroup exists

The above is listed in reverse execution order, as SQL Server won't allow you to set a filegroup as the next used filegroup if it doesn't exist. Now, if you've decided that your partitions will use the PRIMARY filegroup, you only need to set the next used to PRIMARY, and you can remove the 'Add filegroup' section of the query below. Otherwise, the query will determine what the next filegroup should be for each partition scheme by referencing the filegroup mapping table (see previous post) and then check for its existence. It will then add it to the database if needed.

Merging Partitions

Keeping in mind the idea of leaving at least one empty partition in place, the script below, that identifies the partitions to merge, will choose lowest partition number with more than zero rows but is also not the absolute minimum partition number. (If you've skipped to this section you may want to read the last two paragraphs of the switching partitions section above)

For example, if the database partitions look like:
The query below will select partition 3 as the partition to merge. So partition 3 will be merged into partition 2 and we will be left with four partitions - renumbered as 1 - 4. You will notice that I have included a loop into the query as a safe-guard and also in order to allow introducing this process into an already existing partitioning scheme.

For example, if the database partitions look like:

Partition 1    0 rows
Partition 2    0 rows
Partition 3    1000000 rows
Partition 4    1000000 rows
Partition 5    2000000 rows

The query below will select partition 3 to merge. So, partition 3 will be merged into partition 2 and we will be left with 4 partitions - renumbered as 1 - 4.You will notice that I have included a loop into the query as a safe-guard and also in order to allow introducing this process into an already existing partitioning scheme.

And with that, we have a basic automatic sliding partition window. Set up as a monthly job either as a stored procedure or using an SSIS package, your partitions will be maintained almost on their own. Obviously, testing needs to be performed before deploying to a production environment and I present all these queries "as is" with no warranty.

I thought I would also make a sample SSIS package available on my Google Drive (Sliding Partition Window.dtsx). It is a cut down version of the one I'm using. It performs only the tasks that I have outlined in this post.

Future posts will cover additional tasks that I require including transferring data to an archive database and removing empty files and filegroups.

1 comment:

Gio Mac said...

HiThomas, thanks for the youre posts about the sliding window topic. I read the last part of the tutorial. I wanna ask did you write about archive database and removing empty files from filegroups?

I download your SSIS package and wonder a bit its containg all stuff needed :)