In my last post I covered the reasons for poor performance and the ways of pinpointing performance bottlenecks.
In this post I will detail some of the ways of resolving those performance bottlenecks.
Resolving Performance ProblemsThis section will give an overview of how to determine which indexes to create and/or modify and how to modify a complex query to improve performance. As well as targeting changes to alleviate critical performance problems I will mention a few things I do to keep a database tidy and where to use best practice for performance reasons.
How indexes workThe key to creating an effective index is knowing a few basic rules about them. For more detailed information on indexes see Understanding Indexes in Books Online.
- Clustered indexes contain all the data of the table ordered by the column(s) that define the index. There can only be one clustered index per table, because the data rows themselves can only be sorted in one order.
- The query optimizer does not generally consider more than the first three columns of an index.
- The column that is used in the WHERE clause in an equal to (=), greater than (>), less than (<), or BETWEEN search condition, or participates in a join, should be placed first. Additional columns should be ordered based on their level of distinctness, that is, from the most distinct to the least distinct.
- The less distinct a column is the less useful it becomes in an index.
- Columns in a select clause can be added to a non-clustered index as ‘Included’ columns. This places the data of the included columns on the index page and prevents the database from performing additional reads of the clustered index or heap to retrieve data not defined as an index key column. USE THIS FEATURE SPARINGLY as it increases database size and impacts performance. See below for how to use.
Creating a new index and choosing the columnsOnce you’ve identified the expensive statement in the query and you’ve determined that an index would help you need to decide how to design that index. As briefly mentioned in 7.1.1 you need to look at both the where and join clauses to find the columns used for the table needing the index. Focus first on columns involved in equality conditions (=, >, <, or BETWEEN). And second on any inequality conditions (!=,<>,NOT BETWEEN). Then order the columns within the index from most distinct to least distinct.
If the table is very large, consider adding columns from the SELECT clause as included columns to the index. As stated earlier, be careful using this feature. Adding all the columns of a table as included columns is not a very good idea. It unnecessarily bloats the database and updating the index may cause performance problems.
The naming convention for indexes is: IX_TableName_ColumnName1_ColumnName2. Each environment is different, but I try to spread the database load over at least two filegroups and add at least two files to each filegroup. The database systems I currently manage use merge replication extensively. I have tried, as much as I can, to move the user defined objects onto a separate filegroup. So the system created merge objects sit on the PRIMARY (or default) filegroup and the application specific objects sit on distinct files. Optimally, these files would sit on a separate array (though in my case that was not possible).
There is quite a debate regarding the benefits of multiple database files that reside on a single disk/array. I know that there are arguments for and against. My experience has shown that it does bring benefits. Care needs to be taken as to how many files you create as too many files will tax the CPU and disk controller. I usually stick to a 4:1 processor to file ratio. A full discussion of file optimisation is beyond the scope of this post.
The application database I manage extensively use explicit index hints. I'm not sure why. There must have been a period of poor performance and someone solved it using an index hint. I can only assume that it was then decreed that all developers use such hints. See the Tuning Queries section to see what to do with those hints.
NB: Renaming or removing existing indexes may prevent stored procedures from running. DO NOT let this keep you from doing what you feel is right. Instead go through the SQL code and ensure the index you are modifying is not referenced anywhere.
Modifying an existing indexThere are several cases where it is advisable to modify an existing index:
- There are already many indexes on the table
- The new index would be very similar (overlap) an existing index
- Adding included columns to existing index
Removing indexesAs discussed in the two previous sections it is a good idea to keep the database tidy by removing indexes that are not being used and are only being written to. Using the queries in those sections run the drop index queries that they return.
Tuning QueriesHere I will discuss a few ways of improving the performance of queries. Some of these will not necessarily boost performance, but over time they should aid in trimming down the database as a whole and gradually have an impact.
Remove index hints***Best practice tip***
In 99.9% of cases SQL Server’s query optimizer can evaluate how best to exploit the resources available to it without the use of explicit hints from humans. And this includes the use of explicit index hints in queries. If any such hints are encountered remove them immediately.
Remove SELECT ****Best practice tip***
This construct has no place in a production system. There are several reasons:
- Causes problems if tables in queries change
- For EXISTS clauses it forces unnecessary data page reads
- Visually makes it difficult to read the code
- Test performance difference between temp tables and table variables.
- Consider adding clustered index on between temp tables and table variables.
- Larger data sets may run faster on temp tables than table variables.
- Larger temp tables may benefit from additional indexes (follow same guidelines as for normal table).
- Table variables only support constraints. So, in terms of indexes, only PRIMARY and UNIQUE keys can be defined.
- Queries joining many tables, especially when using OUTER JOINs, may benefit from the use of temporary objects. See the next section for detailed explanation.
Breaking up large joinsThis is a very soft science. What I generally look for are discreet sections of the join clause that can be easily made into a temp table. For example, look for a derived table. A derived table is essentially a subquery within a join statement. Derived tables can usually be taken out of the query and placed above it as a "SELECT INTO #TempTable" statement. Then replace the derived table section of the join with the new temp table and join on the same columns.
Stepping through the various joins and replacing them with temporary objects can bring significant performance benefits if done correctly. However, it is not guaranteed and you need to test the performance as you make the changes to be sure a performance gain is being made.
Optimise Cursors***Best practice tip***
Cursors are used very widely in the databases I manage, often unnecessarily. Removing them is not a simple matter as they are often nested or involve the execution of stored procedures within them. However, there are cases where they can easily be converted to set based queries, see this example from my "Remove a Cursor From a Stored Procedure" posts.
The problem with cursors is down to coding mentality. Often the developers writing SQL code are not SQL Server developers. If they come from an Oracle background or are application developers a row-by-row approach is how they think. But SQL Server works best using set-based queries. So it is worth trying to encourage developers to write their code in such a manner. If they don't feel comfortable doing it that way take the time to work with them and show them how easy changing from row based to set based is. The improved performance is truly worth the effort.
If you embrace the advice of the top SQL Server set-based evangelist Jeff Moden you'll be well on your way: "[The] first step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
ConclusionI hope this series has helped give you a better understanding of SQL Server performance and how to tune your database to run better.
Remember that as a DBA you need to not only attack the database when it starts performing poorly, but also create a maintenance schedule aimed at continuously improving performance. You should treat your database like a car. At regular time intervals or development cycles you should revisit your code, indexes and statistics.
With the SQL Sever 2005+ editions Microsoft have given the DBA an arsenal of diagnostics tools in the form of the Database Management Views and Functions. Use these to keep your database up to date with the needs of the application and users who access the data. As you have read above, this involves both adding, removing, and modifying indexes. It is also your job, as the performance expert, to develop and enforce SQL coding best practices.
As ever, please feel free to comment. Let me know if you think I'm wrong or if I've missed something. As much as everyone else I am still learning and greatly value other people's knowledge and opinions.