Saturday, 5 October 2013

Automating Sliding Partition Windows - Part 2

Sliding Partition Window Concept

Before we continue, let’s remind ourselves of what the Sliding Partition Window is and what it gets us. The whole point of the sliding window is to create a sort of data conveyor belt. New data comes into the production table and is placed in the empty partitions (and files where relevant) and the old data is “switched” or moved off to a staging table for further processing and, crucially, removal. In this way the table slides forward along its partitions but remains essentially the same size. There will always be a set number of partitions worth of data in your production table keeping it lean and fit. In addition, once you’ve moved the data from the staging tables to an archive or reporting database the physical files can be removed and your production database can be maintained at a manageable, near constant, size.

In the previous post I provided a list of the tasks involved in both setting up and maintaining the sliding partition window. In this post I will delve into the tasks in the first list: the one-off tasks.

One-off tasks

Create partition function

When designing a partition function and scheme three fundamental decisions need to be made: first, the data type and column on which the table will be partitioned; second, the intervals that each partition will cover; and finally what filegroup or filegroups will be used for the partition scheme.
I’ve designed my partitions to use a datetime column: i.e. ‘DateCreated’. Based on the amount of data being input into the table, I’ve decided to include a month’s worth of data per partition. And finally, to fully leverage the performance benefits of partitions, I decided to place each partition on a separate filegroup - and therefore file(s).
The scripts required to set up the above are as follows:

In (not just) my opinion it is easier to deal with RIGHT partitions, especially when dealing with dates. Remember, though, that the date specified for each partition is really the starting date for the next partition. It took me a little time to get my head around the offset. A RIGHT partition is defined in BOL as follows:
CREATE PARTITION FUNCTION myRangePF2 (int)  AS RANGE RIGHT FOR VALUES (1, 100, 1000);The following table shows how a table that uses this partition function on partitioning column col1 would be partitioned.

Partition 1 2 3 4
col1 < 1
col1 >= 1 AND col1 <100
col1 >= 100 AND col1 < 1000
col1 >= 1000

Create Filegroups

Since I defined two partitions in the Partition Function, three filegroups need to be created. This is to allow SQL Server to place any data that falls outside the defined partitions onto another filegroup. The PRIMARY filegroup could be defined as the third, but that would beat the purpose of what I’m trying to achieve here.
Taking into account the offset in the date ranges above, I have named my filegroups to represent the data that will actually be contained within them:
  • fgMonthlyPartition012013 (Rows with a DateCreated value of < ‘2013-02-01’)
  • fgMonthlyPartition022013 (Rows with a DateCreated value of < ‘2013-03-01’)
  • fgMonthlyPartition032013 (Rows with a DateCreated value of < ‘2013-04-01’)

The CREATE FILEGROUP and FILE scripts will look something like:

Create Partition Scheme

The partition scheme needs to be created using the partition function and filegroups created above.

Create partitioned table

Now that we’ve got all the basics down it’s time to create the partitioned table if it doesn’t yet exist. Here’s a very simple example:

If your table exists, then move it to the partition. This is achieved by rebuilding the clustered index on the partition scheme. If the clustered index is the primary key, drop the constraint and recreate it. Remember that the clustered index must include the partitioning column in order to place it on the partition scheme. Below are examples of how to do this:

Drop and recreate the Primary Key constraint on a partition scheme

Rebuild an index on a partition scheme

An important note on indexes: In order to perform partition switching all indexes on the partitioned table must be “aligned” with the partitioned table. The easiest way to align an index with its partitioned table is to create it on the same partition scheme (making sure to include the partitioning column within the index). If you need more in-depth info on partitions and indexes see: Special Guidelines for Partitioned Indexes.

Create a staging table

The staging table is where the data on the partition being switched is placed. Generally, the staging table is meant, as suggested by the name, to be a temporary repository. Remember the main point of partition switching is to enable resource intensive tasks, such as data archiving or similar ETL operations, to be performed with minimal impact on the production system. In my scenario the partition being switched to the staging table will prevent the database from reading and locking the production table. Also, since I have placed each partition on its own filegroup all I/O activity is isolated minimising contention with the rest of the production table.
The staging table should look exactly like the production table in terms of its columns, data types , partitioning column, and clustered index. It need not have exactly the same nonclustered indexes. Finally, the production table’s identity column doesn’t need to be defined as identity on the staging table.
Here’s an example for my scenario:

Filegroup and Partition Scheme Mapping

I decided to create and populate a table mapping the filegroups with their respective partition scheme and partition value. This table allows you to do the following:
-          Plan your partition and filegroups in advance
-          Save on creating all the filegroups and files immediately
Having created the first set of filegroups, the partition function and the partition scheme we should be able to see how the partition will “slide”. The mapping table can now be created with a script like the one below:

The script above populates a table intended to be used within the scope of a single database. If you want to have a centralised mapping table, you would need to include a database_name column and store the table on an Admin database.
You may also need to modify this script if you are creating partitions for a different interval, i.e. days or quarters, or using a different data type. But I think it gives you a good starting point to create your own mapping table.
The final bit of this post involves populating your new table. Here’s a script to get data into the first two partitions:

In my next post I will go through the recurring tasks that will maintain the partitions on an ongoing basis. Don't be confused, it is Part 4. I created a Part 3 to fill the gap in time, but it is not really part of this series.


Brian M. said...

I enjoyed Parts 1 and 2 of this article. Can we expect to see a part 3 where you explain the recurring tasks to maintain the sliding partition window?

Thomas Mucha said...

Hi Brian,

Thanks for your comment. Yes another part is coming. I have been meaning to work on this and just have not had the time to devote to it.