Sliding Partition Window Concept
Before we continue, let’s remind ourselves of what the
Sliding Partition Window is and what it gets us. The whole point of the sliding
window is to create a sort of data conveyor belt. New data comes into the
production table and is placed in the empty partitions (and files where
relevant) and the old data is “switched” or moved off to a staging table for
further processing and, crucially, removal. In this way the table slides
forward along its partitions but remains essentially the same size. There will
always be a set number of partitions worth of data in your production table
keeping it lean and fit. In addition, once you’ve moved the data from the
staging tables to an archive or reporting database the physical files can be
removed and your production database can be maintained at a manageable, near
constant, size.
In the previous post I provided a list of the tasks involved
in both setting up and maintaining the sliding partition window. In this post I
will delve into the tasks in the first list: the one-off tasks.
One-off tasks
Create partition function
When designing a partition function and scheme three
fundamental decisions need to be made: first, the data type and column on which
the table will be partitioned; second, the intervals that each partition will
cover; and finally what filegroup or filegroups will be used for the partition
scheme.
I’ve designed my partitions to use a datetime column: i.e. ‘DateCreated’.
Based on the amount of data being input into the table, I’ve decided to include
a month’s worth of data per partition. And finally, to fully leverage the
performance benefits of partitions, I decided to place each partition on a
separate filegroup - and therefore file(s).
The scripts required to set up the above are as follows:
In (not just) my opinion it is easier to deal with RIGHT
partitions, especially when dealing with dates. Remember, though, that the date
specified for each partition is really the starting date for the next
partition. It took me a little time to get my head around the offset. A RIGHT
partition is defined in BOL as follows:
CREATE PARTITION FUNCTION myRangePF2 (int) AS RANGE RIGHT FOR VALUES (1, 100, 1000);The following table shows how a table that uses this partition function on partitioning column col1 would be partitioned.
Partition |
1 |
2 |
3 |
4 |
Values
|
col1 < 1
|
col1 >= 1 AND col1
<100
|
col1 >= 100 AND col1 < 1000
|
col1 >= 1000
|
Create Filegroups
Since I defined two partitions in the Partition Function,
three filegroups need to be created. This is to allow SQL Server to place any
data that falls outside the defined partitions onto another filegroup. The
PRIMARY filegroup could be defined as the third, but that would beat the
purpose of what I’m trying to achieve here.
Taking into account the offset in the date ranges above, I
have named my filegroups to represent the data that will actually be contained
within them:
- fgMonthlyPartition012013 (Rows with a DateCreated value of < ‘2013-02-01’)
- fgMonthlyPartition022013 (Rows with a DateCreated value of < ‘2013-03-01’)
- fgMonthlyPartition032013 (Rows with a DateCreated value of < ‘2013-04-01’)
The CREATE FILEGROUP and FILE scripts will look something
like:
Create Partition Scheme
The partition scheme needs to be created using the partition
function and filegroups created above.
Create partitioned table
Now that we’ve got all the basics down it’s time to create
the partitioned table if it doesn’t yet exist. Here’s a very simple example:
If your table exists, then move it to the partition. This is
achieved by rebuilding the clustered index on the partition scheme. If the
clustered index is the primary key, drop the constraint and recreate it. Remember
that the clustered index must include the partitioning column in order to place
it on the partition scheme. Below are examples of how to do this:
Drop and recreate the Primary Key constraint on a partition scheme
Rebuild an index on a partition scheme
An important note on indexes: In order to perform partition
switching all indexes on the partitioned table must be “aligned” with the
partitioned table. The easiest way to align an index with its partitioned table
is to create it on the same partition scheme (making sure to include the
partitioning column within the index). If you need more in-depth info on
partitions and indexes see: Special
Guidelines for Partitioned Indexes.
Create a staging table
The staging table is where the data on the partition being
switched is placed. Generally, the staging table is meant, as suggested by the
name, to be a temporary repository. Remember the main point of partition
switching is to enable resource intensive tasks, such as data archiving or
similar ETL operations, to be performed with minimal impact on the production system.
In my scenario the partition being switched to the staging table will prevent
the database from reading and locking the production table. Also, since I have
placed each partition on its own filegroup all I/O activity is isolated minimising
contention with the rest of the production table.
The staging table should look exactly like the production
table in terms of its columns, data types , partitioning column, and clustered
index. It need not have exactly the same nonclustered indexes. Finally, the
production table’s identity column doesn’t need to be defined as identity on
the staging table.
Here’s an example for my scenario:
Filegroup and Partition Scheme Mapping
I decided to create and populate a table mapping the
filegroups with their respective partition scheme and partition value. This
table allows you to do the following:
-
Plan your partition and filegroups in advance
-
Save on creating all the filegroups and files immediately
Having created the first set of filegroups, the partition
function and the partition scheme we should be able to see how the partition
will “slide”. The mapping table can now be created with a script like the one
below:
The script above populates a table intended to be used
within the scope of a single database. If you want to have a centralised
mapping table, you would need to include a database_name column and store the
table on an Admin database.
You may also need to modify this script if you are creating
partitions for a different interval, i.e. days or quarters, or using a
different data type. But I think it gives you a good starting point to create
your own mapping table.
The final bit of this post involves populating your new
table. Here’s a script to get data into the first two partitions:
In my next post I will go through the recurring tasks that will maintain the partitions on an ongoing basis. Don't be confused, it is Part 4. I created a Part 3 to fill the gap in time, but it is not really part of this series.
2 comments:
I enjoyed Parts 1 and 2 of this article. Can we expect to see a part 3 where you explain the recurring tasks to maintain the sliding partition window?
Hi Brian,
Thanks for your comment. Yes another part is coming. I have been meaning to work on this and just have not had the time to devote to it.
Tom
Post a Comment