Back in November I kicked off a survey that had you run some code to get some details about your cluster keys, nonclustered indexes, and table size. I got results from more than 500 systems across the world, resulting in 97565 lines of data - thanks!

The purpose of the survey is to highlight one of the side-effects of not adhering to the general guidelines (i.e. there are exceptions to these) for choosing a clustered index key. It should be, if possible:

  1. Narrow
  2. Static
  3. Ever-increasing
  4. Unique

The survey and this post are intended to show how not adhering to Rule #1 can lead to performance problems.

Both Kimberly and I have explained in the past the architecture of nonclustered indexes - where every nonclustered index row has to have a link back to the matching heap or clustered index record. The link must be a unique value as it must definitively match a single record in the heap or clustered index. For nonclustered indexes on a table with a clustered index, this link is the cluster key (or keys) as these are guaranteed to be unique. Ah, you say, but what about when the clustered index is NOT defined as unique? That's where Rule #4 comes in. For a non-unique clustered index, there will be a hidden 4-byte column (called the uniquifier) added when necessary as a tie-breaker when multiple clustered index records have the same key values. This increases the clustered index key size by 4 bytes (the uniquifier is an integer) when needed.

But I digress. The crux of the matter is that every nonclustered index record will include the cluster keys. The wider the cluster key size is (e.g. a few natural keys), the more overhead there is in each nonclustered index record, compared to using, for instance, an integer (4-byte) or bigint (8-byte) surrogate cluster key. This can mean you've got the potential for saving huge amounts of space by moving to smaller clustered index keys - as we'll see from the data I collected.

The survey code I got you to run returned:

  1. Number of nonclustered indexes
  2. Number of cluster keys
  3. Total cluster key size
  4. Number of table rows
  5. Calculation of bytes used in all the nonclustered indexes to store the cluster keys in each row

I did not take into account filtered indexes in 2008, or variable length cluster key columns, as to be honest although these will make a difference, for the purposes of my discussion here (making you aware of the problem), they're irrelevant. It also would have made the survey code much more complex for me to figure out :-)

Now let's look at some of the results I received. To make things a little simpler, I discarded results from tables with less then ten thousand rows, and with clustered index key sizes less than 9. This dropped the number of data points from 97565 down to 22425.

The graphs below show the estimated amount off savings that could be had in GB from moving to an 8-byte bigint, plotted against the first four factors in the list above.

 

 

 

 

And here are the top 20 in terms of potential savings so you can see how the rough table schema:

NCIndexes  ClusterKeys  KeyWidth  TableRows      KeySpaceGB  SavingsGB
---------  -----------  --------  -------------  ----------  ---------
6          4            72        891,751,171    358.8       352.1
6          3            16        3,189,075,035  285.1       261.4
1          5            45        4,479,327,954  187.7       154.4
6          4            72        453,251,463    182.4       179.0
4          3            16        2,766,814,206  164.9       144.3
4          2            89        371,745,035    123.3       120.5
2          4            774       76,337,053     110.1       109.5
2          4            774       76,331,676     110.0       109.5
2          4            774       75,924,837     109.5       108.9
2          4            774       75,533,539     108.9       108.3
5          4            72        318,217,628    106.7       104.3
7          1            60        269,590,810    105.5       103.4
22         3            13        389,203,725    103.7       100.8
22         3            13        329,772,049    87.8        85.4
2          2            509       90,311,271     85.6        85.0
17         1            510       9,334,362      75.4        75.3
22         3            13        267,380,864    71.2        69.2
2          7            172       219,929,560    70.5        68.8
22         3            13        261,967,851    69.8        67.8
6          5            31        395,800,250    68.6        65.6

Wow - that's some pretty amazing stuff - and that doesn't even account for the space taken up by page headers etc.

You might be thinking - why do I care? There are plenty of reasons:

  • If you can save tens or hundreds of GBs by changing the cluster key to something much smaller, that translates directly into a reduction in size of your backups and data file disk space requirements.
  • Smaller databases mean faster backups and restores.
  • Making the nonclustered indexes smaller means that index maintenance (from inserts/updates/deletes) and index fragmentation removal will be much faster and generate less transaction log.
  • Making the nonclustered indexes smaller means that consistency checking will be much faster - nonclustered index checking takes 30% of the CPU usage of DBCC CHECKDB.
  • Reducing the width of nonclustered index records means the density of records (number of records per nonclustered index page) increases dramatically, leading to faster index processing, more efficient buffer pool (i.e. memory) usage, and fewer I/Os as more of the indexes can fit in memory.
  • Anything you can do to reduce the amount of transaction log directly affects the performance of log backups, replication, database mirroring, and log shipping.

As you can see, there are many reasons to keep the cluster key as small as possible - all directly translating into performance improvements. For those of you that think that moving to a bigint may cause you to run out of possible keys, see this blog post where I debunk that - unless you've got 3 million years and 150 thousand petabytes to spare...

One thing I'm not doing in this post is advocating any particular key over any other (although bigint identity does fit all the criteria from the top of the post) - except to try to keep it as small as possible. Choosing a good cluster key entails understanding the data and workload as well as the performance considerations of key size that I've presented here. And in some very narrow cases, not having a cluster key at all is acceptable - which means there's 8 bytes in each nonclustered index record (just to forestall those who may want to post a comment arguing against clustered indexes in general :-)

Changing the cluster key can be tricky - Kimberly blogged a set of steps to follow plus some code to help you on our SQL Server Magazine blog back in April 2010.

Later this week I'll blog some code that will run through your databases and spit out table names that could have significant space savings from changing the cluster key.

I won't be blogging or tweeting much in January as we'll be in Indonesia diving, but I will be posting photos later in the month.

Enjoy!

Continuing with my "index health" series, I've got another piece of code for you to run.

This time I'm interested in the number of columns in your clustered indexes and the consequent amount of nonclustered index space used by the clustered index keys.

Again, you're going to be really interested to see the results on your servers. When I editorialize the results I'll provide another query for you to run which will make the data actionable on your server.

Here are some results from a random customer server (yes, we already knew about these - long story :-):

NCIndexes ClusterKeys KeyWidth TableRows            KeySpaceInBytes
--------- ----------- -------- -------------------- --------------------
7         3           16       129902437            14549072944
1         3           12       29199817             350397804
10        2           12       1612919              193550280
5         2           5        4266671              106666775
2         2           8        5887697              94203152
5         4           20       827975               82797500
3         3           16       1215800              58358400
7         2           5        1497746              52421110
1         3           12       2667765              32013180
1         4           25       1033063              25826575
1         3           12       989320               11871840
2         2           8        278989               4463824
1         3           12       293736               3524832
4         2           5        160696               3213920

Feel free to send the results in any format you want - Excel spreadsheet works best though. Try not to add any columns to the result set - complicates the aggregation process.

The more results the better - thanks!

Here's the code:

IF EXISTS (SELECT * FROM tempdb.sys.objects WHERE [name] = 'SQLskillsIKSpace')
    DROP TABLE tempdb.dbo.SQLskillsIKSpace;
GO
CREATE TABLE tempdb.dbo.SQLskillsIKSpace (
    DatabaseID SMALLINT,
    ObjectID INT,
    IndexCount SMALLINT,
    TableRows  BIGINT,
    KeyCount   SMALLINT,
    KeyWidth   SMALLINT);
GO

EXEC sp_MSforeachdb 
    N'IF EXISTS (SELECT 1 FROM (SELECT DISTINCT [name]
    FROM sys.databases WHERE [state_desc] = ''ONLINE''
        AND [database_id] > 4
        AND [name] != ''pubs''
        AND [name] != ''Northwind''
        AND [name] != ''distribution''
        AND [name] NOT LIKE ''ReportServer%''
        AND [name] NOT LIKE ''Adventure%'') AS names WHERE [name] = ''?'')
BEGIN
USE [?]

INSERT INTO tempdb.dbo.SQLskillsIKSpace
SELECT DB_ID (''?''), o.[object_id], 0, 0, 0, 0
FROM sys.objects o
WHERE o.[type_desc] IN (''USER_TABLE'', ''VIEW'')
    AND o.[is_ms_shipped] = 0
    AND EXISTS (
        SELECT *
        FROM sys.indexes
        WHERE [index_id] = 1
            AND [object_id] = o.[object_id]);

UPDATE tempdb.dbo.SQLskillsIKSpace
SET [TableRows] = (
    SELECT SUM ([rows])
    FROM sys.partitions p
    WHERE p.[object_id] = [ObjectID]
    AND p.[index_id] = 1)
WHERE [DatabaseID] = DB_ID (''?'');
 
UPDATE tempdb.dbo.SQLskillsIKSpace
SET [IndexCount] = (
    SELECT COUNT (*)
    FROM sys.indexes i
    WHERE i.[object_id] = [ObjectID]
    AND i.[is_hypothetical] = 0
    AND i.[is_disabled] = 0
    AND i.[index_id] != 1)
WHERE [DatabaseID] = DB_ID (''?'');

UPDATE tempdb.dbo.SQLskillsIKSpace
SET [KeyCount] = (
    SELECT COUNT (*)
    FROM sys.index_columns ic
    WHERE ic.[object_id] = [ObjectID]
    AND ic.[index_id] = 1)
WHERE [DatabaseID] = DB_ID (''?'');

UPDATE tempdb.dbo.SQLskillsIKSpace
SET [KeyWidth] = (
    SELECT SUM (c.[max_length])
    FROM sys.columns c
    JOIN sys.index_columns ic
    ON c.[object_id] = ic.[object_id]
    AND c.[object_id] = [ObjectID]
    AND ic.[column_id] = c.[column_id]
    AND ic.[index_id] = 1)
WHERE [DatabaseID] = DB_ID (''?'');

DELETE tempdb.dbo.SQLskillsIKSpace
WHERE
    ([KeyCount] = 1 AND [KeyWidth] < 9)
    OR [IndexCount] = 0 OR [TableRows] = 0;

END';
GO

SELECT
    [IndexCount] AS [NCIndexes],
    [KeyCount] AS [ClusterKeys],
    [KeyWidth],
    [TableRows],
    [IndexCount] * [TableRows] * [KeyWidth] AS [KeySpaceInBytes]
FROM tempdb.dbo.SQLskillsIKSpace
ORDER BY [KeySpaceInBytes] DESC;

DROP TABLE tempdb.dbo.SQLskillsIKSpace;
GO

Yesterday I blogged about how having too few or too many nonclustered indexes can be a big problem for performance (see here). Today I'm posting some code you can run which will print out the number of indexes for each table in each database on an instance.

I made it print a result set per database so the table in msdb doesn't get too big having to store schema and object names. You can get the code below and here: NCIndexCounts.zip (870.00 bytes).

I also created a version that *does* store names and prints a single result set. You can get that one here: NCIndexCountsSingleResultSet.zip (1.41 kb).

There's probably a niftier way to do this but I'm not a T-SQL expert :-)

Enjoy!

/*============================================================================
  File:     NCIndexCounts.sql

  Summary:  Nonclustered index counts (multiple result sets)

  SQL Server Versions: 2005 onwards
------------------------------------------------------------------------------
  Written by Paul S. Randal, SQLskills.com

  (c) 2011, SQLskills.com. All rights reserved.

  For more scripts and sample code, check out
   
http://www.SQLskills.com

  You may alter this code for your own *non-commercial* purposes. You may
  republish altered code as long as you include this copyright and give due
  credit, but you must obtain prior permission before blogging this code.
 
  THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF
  ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED
  TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
  PARTICULAR PURPOSE.
============================================================================*/

IF EXISTS (SELECT * FROM msdb.sys.objects WHERE [name] = 'SQLskillsPaulsIndexCounts')
    DROP TABLE msdb.dbo.SQLskillsPaulsIndexCounts;
GO
CREATE TABLE msdb.dbo.SQLskillsPaulsIndexCounts (
    SchemaID INT,
    ObjectID INT,
    BaseType CHAR (10),
    IndexCount SMALLINT);
GO

EXEC sp_MSforeachdb
    N'IF EXISTS (SELECT 1 FROM (SELECT DISTINCT [name]
    FROM sys.databases WHERE [state_desc] = ''ONLINE''
        AND [database_id] > 4
        AND [name] != ''pubs''
        AND [name] != ''Northwind''
        AND [name] != ''distribution''
        AND [name] NOT LIKE ''ReportServer%''
        AND [name] NOT LIKE ''Adventure%'') AS names WHERE [name] = ''?'')
BEGIN
USE [?]
INSERT INTO msdb.dbo.SQLskillsPaulsIndexCounts
SELECT o.[schema_id], o.[object_id], ''Heap'', 0
FROM sys.objects o
WHERE o.[type_desc] IN (''USER_TABLE'', ''VIEW'')
    AND o.[is_ms_shipped] = 0
    AND EXISTS (
        SELECT *
        FROM sys.indexes
        WHERE [index_id] = 0
            AND [object_id] = o.[object_id]);

INSERT INTO msdb.dbo.SQLskillsPaulsIndexCounts
SELECT o.[schema_id], o.[object_id], ''Clustered'', 0
FROM sys.objects o
WHERE o.[type_desc] IN (''USER_TABLE'', ''VIEW'')
    AND o.[is_ms_shipped] = 0
    AND EXISTS (
        SELECT *
        FROM sys.indexes
        WHERE [index_id] = 1
            AND [object_id] = o.[object_id]);
           
UPDATE msdb.dbo.SQLskillsPaulsIndexCounts
SET [IndexCount] = (
    SELECT COUNT (*)
    FROM sys.indexes i
    WHERE i.object_id = [ObjectID]
    AND i.[is_hypothetical] = 0)

IF EXISTS (SELECT * FROM msdb.dbo.SQLskillsPaulsIndexCounts)
SELECT
    ''?'' AS [Database],
    SCHEMA_NAME ([SchemaID]) AS [Schema],
    OBJECT_NAME ([ObjectID]) AS [Table],
    [BaseType],
    (CASE
       WHEN [IndexCount] = 0 THEN 0
       ELSE [IndexCount]-1 END)
    AS [NCIndexes]
FROM msdb.dbo.SQLskillsPaulsIndexCounts
ORDER BY [BaseType] DESC, IndexCount;

TRUNCATE TABLE msdb.dbo.SQLskillsPaulsIndexCounts;
END';
GO

DROP TABLE msdb.dbo.SQLskillsPaulsIndexCounts;
GO

Back at the start of August I kicked off a survey (see here) that gave you some code to run to produce an aggregate list of the number of tables on your server with different numbers of nonclustered indexes. I got back results from more than 1000 servers across the world - a big thank you to everyone who sent me data!

It's taken me a while to get to this post because a) I needed a few hours to set aside to aggregate the comments, txt files, and spreadsheets that people sent and load them into a database; and b) I've been really busy with teaching etc. Finally I've had time this week while at SQL Connections in Las Vegas to put together the results and this post.

The winners:

  • Highest number of nonclustered indexes on a single clustered index: 1032
  • Highest number of nonclustered indexes on a single heap: 148
  • Highest number of clustered indexes with zero nonclustered indexes on a single server: 185237
  • Highest number of heaps with zero nonclustered indexes on a single server: 88042

Wow!

Now to some of the details...

Tables with Zero Nonclustered Indexes

The two graphs below show the number of servers that have a certain number of tables with zero nonclustered indexes.

 

For the clustered indexes, there is one case I can think of where having zero nonclustered indexes is acceptable: if all queries return all columns of the table and the query search predicate for all queries is the cluster key (or a left-based subset of the cluster key).

All queries that have a search predicate that does not match the cluster key (or a left-based subset thereof) will be table scans, which can put pressure on the buffer pool (see my post on Page Life Expectancy) and lead to contention on the ACCESS_METHODS_DATASET_PARENT latch (all manifesting as a high percentage of LATCH_EX or PAGEIOLATCH_SH and maybe CXPACKET waits).

For the heaps, all queries are inefficient table scans. Well, efficient if you're returning all the rows in the table every time, I suppose :-)

Bottom line: tables usually need nonclustered indexes to provide efficient access paths to the data requested by the various queries that your workload performs.

There are two things you can do to help find queries that need nonclustered indexes:

  1. Use the missing index DMVs (cautiously!) to determine which nonclustered indexes to create. I use the script posted by Microsoftie Bart Duncan in his blog post. However don't just go create all the indexes there. I generally look for Bart's "improvement_measure" column to be above 100k before I'll consider recommending the index to a client (and on systems that already have nonclustered indexes on the table, I'll look for index consolidation possibilities). Note also that the missing index DMVs will sometimes add the cluster key as an INCLUDEd column. This is unnecessary but harmless.
  2. Look directly in the plan cache to find query plans that perform scans. I use a variant of a query published by fellow MVP Glenn Berry in this blog post. Using the graphical query plan I can see what columns are being searched for and returned from the scans and then create the correct nonclustered indexes for these.

You can also use the Database Tuning Advisor, but I don't use that, personally.

You'll be amazed at the performance difference by having a good set of nonclustered indexes.

But don't go overboard otherwise you could detrimentally affect performance by having too many nonclustered indexes...

Tables with Nonclustered Indexes

The two graphs below show the number of tables that have a certain number of nonclustered indexes.

 

The data shows that it is most common to have 10 nonclustered indexes or less, but even that may be too many.

Every nonclustered index incurs overheard when a table row is inserted or deleted, or when any of the nonclustered index key columns (or INCLUDEd columns) are updated. Filtered indexes in SQL 2008+ are a special case, obviously. The overheard takes a few forms:

  • Buffer pool (i.e. memory and I/O) overhead of having to search the nonclustered index for the record to update.
  • I/O overhead of having to flush the updated index page to disk during the next checkpoint
  • Log space for the log records generated by the operation on the nonclustered index
  • Resource overheard for those log records in terms of:
    • Time to be read by the replication/CDC log reader Agent job
    • Time to be read by log backups (and data backups, if applicable)
    • Time and bandwidth to send the log records to a database mirroring mirror
    • Disk space to store the log records in a log backup
    • Time to restore the log records on a log shipping secondary or during a disaster recovery
  • Locking overhead
  • Page split overhead
  • Time to consistency check
  • Time to examine for fragmentation
  • Time to update statistics
  • Disk and backup space overhead

As you can see, nonclustered indexes can be a big burden on a system - you have to be careful when creating them so that you don't have too many.

There are three things you can do to reduce the number of nonclustered indexes on your system:

  1. Use the sys.dm_db_index_usage_stats DMV to find indexes that are only being updated. I've blogged about this here. Again, be careful though. Just because an index hasn't been used doesn't mean it should be dropped. It may be used only infrequently, but it's critical when it is used. Ideally you need to look at the output from the DMV after an entire business cycle has passed. Even then, be careful about dropping indexes that are enforcing uniqueness constraints as these can be used by the query optimizer without reflecting any user seeks or scans.
  2. Remove duplicate nonclustered indexes. Kimberly blogged code to find duplicate indexes here. There is no downside to doing this.
  3. Look for consolidation possibilities. Kimberly has code to show you all the key and INCLUDEd columns here. This is harder and is more of an art than a science. You're looking for indexes where you can combine two or more indexes into one without affecting the ability of the optimizer to use them for the various queries that the non-consolidated indexes used to help.
    • For example, an index on c1, c2, c3 INCLUDE c4, c5 can be combined with an index on c1, c2, c3 INCLUDE c4, c6. But only as long as c6 isn't a really wide column that would affect the performance of the queries using the first index.
    • A harder example: would you consolidate an index c1, c2 INCLUDE c3 with an index on c1, c3 INCLUDE c2? Possibly. It would depend on what the indexes are being used for in queries. 

Summary

Nonclustered indexes are essential for the performance of most workloads, but how many should you have? I often get someone in a class that Kimberly's teaching on indexes to ask her "what's the optimum number of indexes a table should have?" because I know it's a nonsensical question. (And she reciprocates by getting someone to ask me "how long will CHECKDB take?" :-)

The answer is a big, fat "it depends" - and hopefully I've given you some pointers to figure it out for yourself.

I'll continue this series of posts with more surveys and code that you can use on your systems to gauge the health of your indexes.

Hope this helps!

The September edition of TechNet Magazine is available on the web now and has the latest installment of my regular SQL Q&A column.

This month's topics are:

  • Online index operations logging changes in SQL Server 2008
  • Is that transaction contained in my full backup?
  • ALTER INDEX ... REBUILD vs. ALTER INDEX ... REGORGANIZE
  • Avoiding regular log file shrinking

Check it out at http://technet.microsoft.com/en-us/magazine/hh395481.aspx.

Here's a survey I've been meaning to run for quite a while - that Kimberly and I are really interested in.

For all the databases on your server, how many nonclustered indexes do you tables have, plus is the table a heap or a clustered index?

The code below will return the following result set (taken from a random client system):

BaseType   NCIndexes TableCount
---------- --------- -----------
Clustered  0         1645
Clustered  1         832
Clustered  2         417
Clustered  3         68
Clustered  4         18
Clustered  5         13
Clustered  6         9
Clustered  7         5
Clustered  8         3
Clustered  9         3
Clustered  10        2
Heap       0         303
Heap       1         19
Heap       2         51
Heap       3         4
Heap       4         1 

I think it'll make some really interesting reading, and you may be really surprised at the results for your system - how many tables with no indexes at all and how many tables with large numbers of nonclustered indexes.

And yes, I'm sure there's a slightly better way to write the code below, but I'm not an expert T-SQL developer (and yes Aaron, I know about your (seriously) fabulous sp_MSforeachdb replacement :-)

If someone wants to bang out a quick post with the PowerShell to run this on multiple servers, I'll link to it. (Update: The PowerShell script can be downloaded from the bottom of this post)

Feel free to send the results in email, in a spreadsheet, in a comment below, or even written on papyrus - anything works.

The more results the better - thanks!

Here's the code:

IF EXISTS (SELECT * FROM msdb.sys.objects WHERE [name] = 'SQLskillsPaulsIndexCounts')
    DROP TABLE msdb.dbo.SQLskillsPaulsIndexCounts;
GO
CREATE TABLE msdb.dbo.SQLskillsPaulsIndexCounts (
    BaseType CHAR (10),
    IndexCount SMALLINT);
GO

EXEC sp_MSforeachdb
    N'IF EXISTS (SELECT 1 FROM (SELECT DISTINCT [name]
    FROM sys.databases WHERE [state_desc] = ''ONLINE''
        AND [database_id] > 4
        AND [name] != ''pubs''
        AND [name] != ''Northwind''
        AND [name] != ''distribution''
        AND [name] NOT LIKE ''ReportServer%''

        AND [name] NOT LIKE ''Adventure%'') AS names WHERE [name] = ''?'')
BEGIN
USE [?]
INSERT INTO msdb.dbo.SQLskillsPaulsIndexCounts
SELECT ''Heap'', COUNT (*)-1
FROM sys.objects o
JOIN sys.indexes i
    ON o.[object_id] = i.[object_id]
WHERE o.[type_desc] IN (''USER_TABLE'', ''VIEW'')
    AND o.[is_ms_shipped] = 0
    AND EXISTS (
        SELECT *
        FROM sys.indexes
        WHERE [index_id] = 0
            AND [object_id] = o.[object_id])
GROUP BY i.[object_id];

INSERT INTO msdb.dbo.SQLskillsPaulsIndexCounts
SELECT ''Clustered'', COUNT (*)-1
FROM sys.objects o
JOIN sys.indexes i
    ON o.[object_id] = i.[object_id]
WHERE o.[type_desc] IN (''USER_TABLE'', ''VIEW'')
    AND o.[is_ms_shipped] = 0
    AND EXISTS (
        SELECT *
        FROM sys.indexes
        WHERE [index_id] = 1
            AND [object_id] = o.[object_id])
GROUP BY i.[object_id];
END';
GO

SELECT DISTINCT [BaseType], [IndexCount] AS [NCIndexes], COUNT (*) AS [TableCount]
FROM msdb.dbo.SQLskillsPaulsIndexCounts
GROUP BY [BaseType], [IndexCount]
ORDER BY [BaseType], [IndexCount];
GO

DROP TABLE msdb.dbo.SQLskillsPaulsIndexCounts;
GO

Pauls Index Survey.ps1 (2.64 kb)

(In this post I'm not going to name-and-shame, as I'm sure the problems will be fixed in time.)

I want to warn you about unthinkingly acting on advice from 3rd-party tools around dropping nonclustered indexes. One of my long-term clients recently bought and installed a new tool and has been asking me about recommendations from it concerning indexes I've added to their system.

Case 1: recommendation to drop an index because its key is a left-based subset of another index key, without considering INCLUDEd columns

The first index has two narrow key columns (A, B) and is scanned as part of a join. The second index has a key of (A), plus 3 wide varchar columns as included columns, and is seeked as part of a join. The tool considered the second index to be redundant and recommended dropping it to save on index maintenance costs. Even combining the two indexes would have a strongly detrimental performance impact on the first join, and the index maintenance has a negligible effect on the system performance.

In this case it turns out that the product does not consider INCLUDEd columns in the nonclustered indexes when making these recommendations - a cardinal sin in my opinion.

Case 2: recommendation to drop an index because it hasn't been used since the system last rebooted 10 days ago, without considering an entire business life-cycle 

In this case the index in question is used to help a once-per-month process not result in a hash join with a huge worktable that spills to tempdb. The recommendation from the tool did not even suggest waiting for an entire business life-cycle before considering dropping the index - standard advice when considering information from sys.dm_db_index_usage_stats. I'd be interested to know whether index usage statistics are being persisted by the tool over system reboots so that recommendations are not given solely on post-reboot usage statistics.

Summary

In each of these cases I knew exactly why the index was required and the recommendation from the tool was incorrect. The client wasn't sure and so asked me.

You need to be careful out there - just because a tool says to drop an index, it doesn't mean that the developers of the heuristic algorithms the tool is using know what they're doing, as has been shown in this case.

Don't get me wrong - most tools are excellent and can save you time and money, but you need to be *extremely* careful any time dropping an index is recommended.

 

There was an interesting discussion on Twitter this morning (started by my good friend Erin Stellato (blog|twitter)) about in-place updates for index records where the key value changes and the record remains on the same page. Various sources including the SQL Server 2008 Internals book (pg 361 - I didn't write or review that bit :-) describe the process, but they describe it incorrectly unfortunately. They say that the record will remain in the exact same location on the same page and only the key storage bytes need to change.

This is not true. True in-place updates of index key values do not happen in 2005 onwards. Let's go down the rabbit hole...

Here's a simple example:

CREATE DATABASE KeyUpdateTest;
GO
USE KeyUpdateTest;
GO
ALTER DATABASE KeyUpdateTest SET RECOVERY SIMPLE;
ALTER DATABASE KeyUpdateTest SET AUTO_CREATE_STATISTICS OFF;
ALTER DATABASE KeyUpdateTest SET AUTO_UPDATE_STATISTICS OFF;
GO
CREATE TABLE test (c1 INT, c2 VARCHAR (2000));
GO
CREATE CLUSTERED INDEX test_cl ON test (c1);
GO
CHECKPOINT;
GO
INSERT INTO test VALUES (1, REPLICATE ('Paul', 500));
GO
CHECKPOINT;
GO

Now we look at the page itself (and you can get the stored-proc I use below from my old blog post Inside The Storage Engine: sp_AllocationMetadata - putting undocumented system catalog views to work):

EXEC sp_allocationMetadata 'test';
GO

Object Name Index ID Alloc Unit ID     Alloc Unit Type First Page Root Page First IAM Page
----------- -------- ----------------- --------------- ---------- --------- --------------
test        1        72057594039762944 IN_ROW_DATA     (1:144)    (1:144)   (1:145)

DBCC TRACEON (3604);
GO
DBCC PAGE (keyupdatetest, 1, 144, 2);
GO

Memory Dump @0x0000000016A6A000

0000000016A6A000:   01010400 00c20001 00000000 00000800 †.....Â..........
0000000016A6A010:   00000000 00000100 1c000000 bd174108 †............½.A.
0000000016A6A020:   90000000 01000000 1d000000 8a000000 †............Š...
0000000016A6A030:   12000000 00000000 00000000 239dbd4a †............#.½J
0000000016A6A040:   00000000 00000000 00000000 00000000 †................
0000000016A6A050:   00000000 00000000 00000000 00000000 †................
0000000016A6A060:   30000800 01000000 03000002 001100e1 †0..............á
0000000016A6A070:   07506175 6c506175 6c506175 6c506175 †.PaulPaulPaulPau
0000000016A6A080:   6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000016A6A090:   6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000016A6A0A0:   6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000016A6A0B0:   6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
<snip>
0000000016A6A820:   6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000016A6A830:   6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000016A6A840:   6c000021 21212121 21212121 21212121 †l..!!!!!!!!!!!!!
0000000016A6A850:   21212121 21212121 21212121 21212121 †!!!!!!!!!!!!!!!!
<snip>
0000000016A6BFE0:   21212121 21212121 21212121 21212121 †!!!!!!!!!!!!!!!!
0000000016A6BFF0:   21212121 21212121 21212121 21216000 †!!!!!!!!!!!!!!`.

OFFSET TABLE:

Row - Offset                        
0 (0x0) - 96 (0x60)

Now I'll update the record:

UPDATE test SET c1 = 2 WHERE c1 =1;
GO

And let's look at the log records generated (in the output below I've removed all the extraneous stuff from the log that occurs before my transaction):

SELECT [Current LSN], [Operation], [Context],
 [Log Record Length], [Page ID], [Slot ID] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation           Context               Log Record Length Page ID        Slot ID
----------------------- ------------------- --------------------- ----------------- -------------- -----------
0000001d:00000096:0011  LOP_BEGIN_XACT      LCX_NULL              112               NULL           NULL
0000001d:00000096:0012  LOP_DELETE_ROWS     LCX_MARK_AS_GHOST     2116              0001:00000090  0
0000001d:00000096:0013  LOP_MODIFY_HEADER   LCX_PFS               76                0001:00000001  0
0000001d:00000096:0014  LOP_SET_BITS        LCX_PFS               56                0001:00000001  0
0000001d:00000096:0015  LOP_INSERT_ROWS     LCX_CLUSTERED         2120              0001:00000090  1
0000001d:00000096:0016  LOP_COMMIT_XACT     LCX_NULL              52                NULL           NULL
0000001d:000000a2:0001  LOP_EXPUNGE_ROWS    LCX_CLUSTERED         64                0001:00000090  0
0000001d:000000a2:0002  LOP_SET_BITS        LCX_PFS               56                0001:00000001  0
0000001d:000000a2:0003  LOP_MODIFY_HEADER   LCX_PFS               76                0001:00000001  0

The row is deleted and then inserted. Look at the log record lengths! And look at the slot numbers. The new row is in slot 1 (the LOP_INSERT_ROWS record) and slot 0 is ghosted (the LOP_DELETE_ROWS record), and then we see ghost cleanup running (the LOP_EXPUNGE_ROWS record), which removes the old slot 0, and the slot array shifts down so the new record is slot 0 again. See my blog post Inside the Storage Engine: Ghost cleanup in depth for info on ghost cleanup internals.

But is it in the same physical place on the page? No - a whole new record was created.

DBCC PAGE (keyupdatetest, 1, 144, 2);
GO

Memory Dump @0x0000000013CAC000

0000000013CAC000:   01010400 00800001 00000000 00000800 †................
0000000013CAC010:   00000000 00000100 1c000000 bd172210 †............½.".
0000000013CAC020:   90000000 01000000 1d000000 a2000000 †............¢...
0000000013CAC030:   01000000 76020000 00000000 239dbd4a †....v.......#.½J
0000000013CAC040:   01000000 00000000 00000000 00000000 †................
0000000013CAC050:   00000000 00000000 00000000 00000000 †................
0000000013CAC060:   3c000800 01000000 03000002 001100e1 †<..............á
0000000013CAC070:   07506175 6c506175 6c506175 6c506175 †.PaulPaulPaulPau
0000000013CAC080:   6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000013CAC090:   6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
<snip>
0000000013CAC820:   6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000013CAC830:   6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000013CAC840:   6c300008 00020000 00030000 02001100 †l0..............
0000000013CAC850:   e1075061 756c5061 756c5061 756c5061 †á.PaulPaulPaulPa
0000000013CAC860:   756c5061 756c5061 756c5061 756c5061 †ulPaulPaulPaulPa
<snip>
0000000013CADFE0:   21212121 21212121 21212121 21212121 †!!!!!!!!!!!!!!!!
0000000013CADFF0:   21212121 21212121 21212121 41084108 †!!!!!!!!!!!!A.A.

OFFSET TABLE:

Row - Offset                        
0 (0x0) - 2113 (0x841)              

Notice that where the original record ended (offset 0x830 into the page). In the first page dump, everything after that is empty space. In the second page dump, the new record starts there. And look at the slot array (called the offset table by DBCC PAGE). In the first page dump the original record started at offset 0x60, and in the second dump it starts at offset 0x841. Even though ghost cleanup ran, the record did NOT migrate to it's original location.

It's very clear that an in-place update did not happen, and that the entire record is recreated in it's entirety.

In this case, the record ended up being physical slot 0 again, but that's a pathalogical case because there's only one slot on the page. If you repeat the experiment with two records (e.g. c1=1 and c1=2) and then update c1 of the first record to 3, you'll see the new record being physical slot 2, then after ghost cleanup becoming physical slot 1. But again, it's deleted and reinserted in it's entirety.

Now I'll show you a case where the row has to move to a new page - definitely not an in-place update!

CREATE DATABASE KeyUpdateTest;
GO
USE KeyUpdateTest;
GO
ALTER DATABASE KeyUpdateTest SET RECOVERY SIMPLE;
ALTER DATABASE KeyUpdateTest SET AUTO_CREATE_STATISTICS OFF;
ALTER DATABASE KeyUpdateTest SET AUTO_UPDATE_STATISTICS OFF;
GO
CREATE TABLE test (c1 INT, c2 VARCHAR (4000));
GO
CREATE CLUSTERED INDEX test_cl ON test (c1);
GO
CHECKPOINT;
GO
INSERT INTO test VALUES (1, REPLICATE ('Paul', 1000));
GO
INSERT INTO test VALUES (2, REPLICATE ('Erin', 1000));
GO
CHECKPOINT;
GO

I've filled up the page with two rows.

DBCC PAGE (keyupdatetest, 1, 144, 2);
GO

<snip>
Metadata: ObjectId = 2105058535      m_prevPage = (0:0)                   m_nextPage = (0:0)
pminlen = 8                          m_slotCnt = 2                        m_freeCnt = 58
m_freeData = 8130                    m_reservedCnt = 0                    m_lsn = (29:150:2)
<snip>
0000000016A6BFA0:   696e4572 696e4572 696e4572 696e4572 †inErinErinErinEr
0000000016A6BFB0:   696e4572 696e4572 696e4572 696e4572 †inErinErinErinEr
0000000016A6BFC0:   696e0000 21212121 21212121 21212121 †in..!!!!!!!!!!!!
0000000016A6BFD0:   21212121 21212121 21212121 21212121 †!!!!!!!!!!!!!!!!
0000000016A6BFE0:   21212121 21212121 21212121 21212121 †!!!!!!!!!!!!!!!!
0000000016A6BFF0:   21212121 21212121 21212121 11106000 †!!!!!!!!!!!!..`.

OFFSET TABLE:

Row - Offset                        
1 (0x1) - 4113 (0x1011)             
0 (0x0) - 96 (0x60)

Now I'll update the first row to have c1 = 3 so my row moves away from Erin's :-)

UPDATE test SET c1 = 3 WHERE c1 =1;
GO

And look at the log:

SELECT [Current LSN], [Operation], [Context],
 [Log Record Length], [Page ID], [Slot ID] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation           Context                Log Record Length Page ID        Slot ID
----------------------- ------------------- ---------------------- ----------------- -------------- -----------
0000001d:000000a3:0015  LOP_BEGIN_XACT      LCX_NULL               136               NULL           NULL
0000001d:000000a3:0016  LOP_MODIFY_ROW      LCX_PFS                80                0001:00000001  0
0000001d:000000a3:0017  LOP_MODIFY_ROW      LCX_IAM                88                0001:00000091  0
0000001d:000000a3:0018  LOP_HOBT_DELTA      LCX_NULL               64                NULL           NULL
0000001d:000000a3:0019  LOP_FORMAT_PAGE     LCX_INDEX_INTERIOR     84                0001:00000092  -1
0000001d:000000a3:001a  LOP_INSERT_ROWS     LCX_INDEX_INTERIOR     84                0001:00000092  0
0000001d:000000a3:001b  LOP_ROOT_CHANGE     LCX_CLUSTERED          96                0001:00000082  52
0000001d:000000a3:001c  LOP_COMMIT_XACT     LCX_NULL               52                NULL           NULL
0000001d:000000a3:001d  LOP_BEGIN_XACT      LCX_NULL               120               NULL           NULL
0000001d:000000a3:001e  LOP_MODIFY_ROW      LCX_PFS                80                0001:00000001  0
0000001d:000000a3:001f  LOP_MODIFY_ROW      LCX_IAM                88                0001:00000091  0
0000001d:000000a3:0020  LOP_HOBT_DELTA      LCX_NULL               64                NULL           NULL
0000001d:000000a3:0021  LOP_FORMAT_PAGE     LCX_HEAP               84                0001:00000093  -1
0000001d:000000a3:0022  LOP_INSERT_ROWS     LCX_CLUSTERED          4092              0001:00000093  0
0000001d:000000a3:0023  LOP_DELETE_SPLIT    LCX_CLUSTERED          60                0001:00000090  1
0000001d:000000a3:0024  LOP_MODIFY_HEADER   LCX_HEAP               84                0001:00000090  0
0000001d:000000a3:0025  LOP_INSERT_ROWS     LCX_INDEX_INTERIOR     88                0001:00000092  1
0000001d:000000a3:0026  LOP_COMMIT_XACT     LCX_NULL               52                NULL           NULL
0000001d:000000a3:0027  LOP_INSERT_ROWS     LCX_CLUSTERED          4120              0001:00000093  1
0000001d:000000a3:0028  LOP_COMMIT_XACT     LCX_NULL               52                NULL           NULL

This is really interesting. Here's what happens (missing a few things for clarity):

  • 0000001d:000000a3:0016: page (1:146) is allocated to be the new root page for the clustered index
  • 0000001d:000000a3:0019: it gets minimally formatted (just the page header - although the page header is 96 bytes, there is 12 bytes of empty space at the end)
  • 0000001d:000000a3:001a: the index record pointing at (1:144) is inserted in the root page
  • 0000001d:000000a3:001b: the index root os changedd from (1:144) to (1:146) in metadata
  • 0000001d:000000a3:001e: page (1:147) is allocated to be the second data page at the leaf of the clustered index
  • 0000001d:000000a3:0022: a page split of page (1:144) occurs, moving the Erin row to slot 0 of page (1:147)
  • 0000001d:000000a3:0023: the Erin row is removed from page (1:144) - not ghosted
  • 0000001d:000000a3:0025: the index record pointing at (1:147) is inserted in the root page
  • 0000001d:000000a3:0027: the updated Paul row is inserted as slot 1 of page (1:147)

As you can see, this is as far as you can get from doing an in-place update. Page (1:144) is left with a single ghost-record on (the Paul record - the Erin record isn't ghosted because it was moved because of a split). The next access to the page will cause ghost cleanup to remove it and deallocate the page.

So no, in-place updates of key values do not happen, even though they appear to at first glance. You can experiment to show the same thing happening in nonclustered indexes too.

Another misconception busted!

PS Some of you may remember when such in-place updates really *did* happen. That was before 2005 when the Storage Engine (specifically my old Access Methods team) did the index maintenance (changing clustered and nonclustered index structures based on key value changes). A whole bunch of features in 2005 meant that we switched it over to the Query Processor to drive the index maintenance, and nifty short-cuts like we used to do aren't possible any more.

Yesterday I presented my lecture Index Fragmentation: The Hidden Menace for the PASS Performance Virtual Chapter. The recording of the session, plus a PDF of my slides, and the demo scripts are now available from the PASS website.

Check it out here (look for January 18, 2011).

Enjoy!

Here's yet another reason to be very careful when using the missing index DMVs...

There's a bug in the missing index DMVs that could end up causing you to knock your head against a brick wall and question your sanity. I know I did.

The bug is this: the missing index code may recommend a nonclustered index that already exists. Over and over again. It might also recommend an index that won't actually help a query.

Yes, I'm surprised by this too - as the missing index code is in the query optimizer too. However, it will continue to recommend you create the already-existing index - which is terribly annoying.

This is a little-known bug (Connect item #416197) which is fixed in SQL11 but won't be fixed in earlier versions.

I experienced this on SQL Server 2008 SP1 this weekend and I wanted to blog about it so you don't spend ages trying to work out what's going on.

Here's a repro for you:

CREATE TABLE t1 (
    c1 INT IDENTITY,
    c2 AS c1 * 2,
    c3 AS c1 + c1,
    c4 CHAR (3000) DEFAULT 'a');
GO
CREATE UNIQUE CLUSTERED INDEX t1_clus ON t1 (c1);
GO

SET NOCOUNT ON;
GO
INSERT INTO t1 DEFAULT VALUES;
GO 100000

This creates a table with a a bunch of rows, with each row pretty large so that the cost of scanning the table is expensive.

Now say I want to run a query:

SELECT COUNT (*) FROM t1
    WHERE c2 BETWEEN 10 AND 1000
    AND c3 > 1000;
GO
 

If I display the estimated execution plan...

 

...it will tell me there's a missing index I should create:

 

So I go ahead and create the index and everything's cool:

CREATE NONCLUSTERED INDEX [_missing_c2_c3] ON [dbo].[t1] ([c2],[c3]);
GO

Now what if I want to do something more complicted? How about a cursor over the table? (Don't start on about not using cursors - they're everywhere in application code we see - this is just an easy example to engineer.)

DECLARE testcursor CURSOR FOR
    SELECT c1 FROM t1
    WHERE
        c2 BETWEEN 10 AND 1000
        AND c3 > 1000;

DECLARE @var BIGINT;

OPEN testcursor;

FETCH NEXT FROM testcursor INTO @var;

WHILE (@@fetch_status <> -1)
BEGIN
    -- empty body
    FETCH NEXT FROM testcursor INTO @var;
END

CLOSE testcursor;
DEALLOCATE testcursor;

If I display the estimated execution plan again, it shows:

 

Hmm. That index is actually exactly the same as the one we created earlier (even though it's asking for c1 to be INCLUDEd, it already is in the existing nonclustered index as c1 is the cluster key and is included automatically). However, just to prove I'm not doing anything dodgy, I'll create the index it wants:

CREATE NONCLUSTERED INDEX [_missing_c2_c3_inc_c1] ON [dbo].[t1] ([c2],[c3]) INCLUDE ([c1]);
GO

And nothing changes. You cannot get the missing index code to stop recommending the index. The index isn't being used for the *Key Lookup* in the query plan above  - but the missing index code thinks the index would be useful and suggests it. Not only would that index not actually help the Key Lookup, it already exists!

If you use a query that aggregates the missing index DMV output (such as Bart Duncan's excellent script) and you have some very common queries on your system that are hitting this bug, you will find that the missing index DMV aggregation will be broken too.

Be careful out there!

(Look in the Misconceptions blog category for the rest of the month's posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together: CommonSQLServerMyths.pdf (732.96 kb))

Another quickie but goodie before the finale tomorrow!

Myth #29: fix heap fragmentation by creating and dropping a clustered index.

Nooooooooooooo!!!

This is just about one of the worst things you could do outside of shrinking a database.

If you run sys.dm_db_index_physical_stats (or my old DBCC SHOWCONTIG) on a heap (a table without a clustered index) and it shows some fragmentation, don't EVER create and drop a clustered index to build a nice, contiguous heap. Do yourself a favor and just create the well-chosen clustered index and leave it there - there's a ton of info out there on choosing a good clustering key - narrow+static+unique+ever-increasing is what you need. Kimberly has a blog post from 2005(!) that sums things up: Ever-increasing clustering key - the Clustered Index Debate..........again! and I've got An example of a nasty cluster key.

Yes, you can use ALTER TABLE ... REBUILD in SQL Server 2008 to remove heap fragmentation, but that is almost as bad as creating and dropping a clustered index!

Why am I having a conniption fit about this? Well, every record in a nonclustered index has to link back to the matching row in the table (either a heap or clustered index - you can't have both - see Kimberly's recent SQL Server Magazine blog post What Happens if I Drop a Clustered Index? for an explanation). The link takes the form of:

  • if the table is a heap, the actual physical location of the table record (data file:page number:record number)
  • if the table has a clustered index, the clustering key(s)

The blog post link at the bottom of this post explains in a lot more detail.

If you create a clustered index, all the linkages to the heap records are no longer valid and so all the nonclustered indexes must be rebuilt automatically to pick up the new clustering key links. If you drop the clustered index again, all the clustering key links are now invalid so all the nonclustered indexes must be rebuilt automatically to pick up the new heap physical location links.

In other words, if you create and then drop a clustered index, all the nonclustered indexes are rebuilt twice. Nasty.

If you think you can use ALTER TABLE ... REBUILD in SQL Server 2008 to fix heap fragmentation, you can, but it causes all the nonclustered indexes to be rebuilt as the heap record locations obviously change.

Now, what about if you *rebuild* a clustered index? Well, that depends on what version you're on and whether you're doing a simple rebuild or changing the definition of the index. One major point of misconception is that moving a clustered index or partitioning it changes the cluster keys - it doesn't. For a full list of when the nonclustered indexes need to be rebuilt, see Indexes From Every Angle: What happens to non-clustered indexes when the table structure is changed?

(Look in the Misconceptions blog category for the rest of the month's posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together: CommonSQLServerMyths.pdf (732.96 kb))

I'm all mythed-out from yesterday, so today's is a quick post addressing some myths around fill factor - which I made sure to stomp on back in SQL Server 2005 in Books Online.

Tomorrow I have a very cool one coming up...

Myth #25: various myths around fill factor.

All are FALSE

25a) fill factor is adhered to at all times

No. From Books Online:

Important: 
The fill-factor setting applies only when the index is created, or rebuilt. The SQL Server Database Engine does not dynamically keep the specified percentage of empty space in the pages. Trying to maintain the extra space on the data pages would defeat the purpose of fill factor because the Database Engine would have to perform page splits to maintain the percentage of free space specified by the fill factor on each page as data is entered.

25b) fill factor of 0 is different from fill factor of 100

No: From Books Online:

Note: 
Fill-factor values 0 and 100 are the same in all respects.

25c) fill factor of 0 leaves some space in the upper levels of the index

No. This one isn't in Books Online and I don't know where this myth came from, but it's completely untrue. You can easily convince yourself of this using a script like the one below:

CREATE DATABASE foo;
GO
USE foo;
GO
CREATE TABLE t1 (c1 INT IDENTITY, c2 CHAR (1000) DEFAULT 'a');
CREATE CLUSTERED INDEX t1c1 ON t1 (c1);
GO
SET NOCOUNT ON;
GO
INSERT INTO t1 DEFAULT VALUES;
GO 10000

Now check the fill factor is 0 and perform an index rebuild.

SELECT [fill_factor] FROM sys.indexes
WHERE NAME = 't1c1' AND [object_id] = OBJECT_ID ('t1');
GO
ALTER INDEX t1c1 ON t1 REBUILD WITH (FILLFACTOR = 100);
GO
 

Then figure out the index pages above the leaf level and look at the m_freeCnt value in the page header, the amount of free space on the page:

EXEC sp_allocationMetadata 't1';
GO
DBCC TRACEON (3604);
DBCC PAGE (foo, 1, 164, 3);    -- the root page, from the SP output
GO
DBCC PAGE (foo, 1, 162, 1);    -- the page ID in the DBCC PAGE output above
GO

I see a value of 10 bytes - clearly no space was left on the page. It's a myth. Btw - you can get the sp_allocationMetadata script from this blog post.

(Look in the Misconceptions blog category for the rest of the month's posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together: CommonSQLServerMyths.pdf (732.96 kb))

Another short one today as I'm still teaching a class. (Ok - I'm actually writing this at lunchtime on 4/7/10 - and it turned out to be a little longer than I thought...)

Myth #8: Online index operations do not acquire locks.

FALSE

Online index operations are not all unicorns and rainbows (for information about unicorns and rainbows - see http://whiteboardunicorns.com/ - safe for work).

Online index operations acquire short-term locks at the beginning and end of the operation, which can cause significant blocking problems.

At the start of the online index operation, a shared (S lock mode) table lock is required. This lock is held while the new, empty index is created; the versioned scan of the old index is started; and the minor version number of the table schema is bumped by 1.

The problem is, this S lock is queued along with all other locks on the table. No locks that are incompatible with an S table lock can be granted while the S lock is queued or granted. This means that updates are blocked until the lock has been granted and the operation started. Similarly, the S lock cannot be granted until all currently executing updates have completed, and their IX or X table locks dropped.

Once all the setup has been done (this is very quick), the lock is dropped, but you can see how blocking of updates can occur. Bumping the minor version number of the schema causes all query plans that update the table to recompile, so they pick up the new query plan operators to maintain the in-build index.

While the long-running portion of the index operation is running, no locks are held (where 'long' is the length of time your index rebuild usually takes).

When then index operation has completed, the new and old indexes are in lock-step as far as updates are concerned. A schema-modification lock (SCH_M lock mode) is required to complete the operation. You can think of this as a super-table-X lock - it's required to bump the major version number of the table - no operations can be running on the table, and no plans can be compiling while the lock is held.

There's a blocking problem here that's similar to when the S lock was acquired at the start of the operation - but this time, no read or write operations can start while the schema-mod lock is queued or granted, and it can't be granted until all currently running read and write activity on the table has finished.

Once the lock is held, the allocation structures for the old index are unhooked and put onto the deferred-drop queue, the allocation structure for the new index are hooked into the metadata for the old index (so the index ID doesn't change), the table's major version number is bumped, and hey presto! You've got a sparkly new index.

As you can see - plenty of potential for blocking at the start and end of the online index operation. So it should really be called 'almost online index operations', but that's not such a good marketing term...

You can read more about online index operations in the whitepaper Online Indexing Operations in SQL Server 2005.

Way back in the mists of time, at the end of the last century, I wrote DBCC SHOWCONTIG for SQL Server 2000, to complement my new invention DBCC INDEXDEFRAG.

I also used to wear shorts all the time, with luminous orange, yellow, or green socks.

Many things change - I now have (some) dress sense, for one. One other thing that changed was that DMVs came onto the scene with SQL Server 2005. DBCC SHOWCONTIG was replaced by sys.dm_db_index_physical_stats. Under the covers though, they both use the same code - and the I/O characteristics haven't changed.

This is a blog post I've been meaning to do for a while now, and I finally had the impetus to do it when I heard about today's T-SQL Tuesday on I/O in general being run by Mike Walsh (Twitter|blog). It's a neat idea so I decided to join in this time. In retrospect, reading this over before hitting 'publish', I got a bit carried away (spending two hours on this) - but it's one of my babies, so I'm entitled to! :-)

This isn't a post about how to use DMVs in general, how to use this DMV in particular, or anything about index fragmentation. This is a blog post about how the DMV works.

DMV is a catch-all phrase that most people (myself included) use to describe all the various utility views in SQL Server 2005 and 2008. DMV = Dynamic Management View. There's a catch with the catch-all though - some of the DMVs aren't views at all, they're functions. A pure DMV gets info from SQL Server's memory (or system tables) and displays it in some form. A DMF, on the other hand, has to go and so some work before it can give you some results. The sys.dm_db_index_physical_stats DMV (which I'm going to call 'the DMV' from now on) is by far the most expensive of these - but only in terms of I/O.

The idea of the DMV is to display physical attributes of indexes (and the special case of a heap) - to do this it has to scan the pages comprising the index, calculating statistics as it goes. Many DMVs support what's called predicate pushdown, which means if you specify a WHERE clause, the DMV takes that into account as it prepares the information. This DMV doesn't. If you ask it for only the indexes in the database that have logical fragmentation > 30%, it will scan all the indexes, and then just tell you about those meeting your criteria. It has to do this because it has no way of knowing which ones meet your criteria until it analyzes them - so can't support predicate pushdown.

This is where understanding what it's doing under the covers comes in - the meat of this post.

LIMITED

The default operating mode of the DMV is called LIMITED. Kimberly always makes fun of the equivalent option for DBCC SHOWCONTIG, which I named as a young and foolish developer - calling it WITH FAST. Hey - it's descriptive!

The LIMITED mode can only return the logical fragmentation of the leaf level plus the page count. It doesn't actually read the leaf level. It makes use of the fact that the next level up in the index contains a key-ordered list of page IDs of the pages at the leaf level - so it's trivial to examine the key-ordered list and see if the page IDs are also in allocation order or not, thus calculating logical fragmentation.

The idea behind this option is to allow you to find the fragmentation of an index by reading the minimum number of pages, i.e. in the smallest amount of time. This option can be magnitudes faster than using the DETAILED mode scan, and it depends on how big the index's fanout is. Without getting too much into the guts of indexes, the fanout is based on the index key size, and determines the number of child-page pointers an index page can hold (e.g. the number of leaf-level pages that a page in the next level up has information about).

Consider an index with a char(800) key. Each entry in a page in the level above the leaf has to include a key value (the lowest key that can possibly appear on the page being referred to), plus a page ID, plus record overhead, plus slot array entry - so 812 bytes. So a page can only hold 8096/812 = 9 such entries. The fanout is at most 9.

Consider an index with a bigint key. Each entry is 13 bytes, so a page can hold 8096/13 = 622 entries. The fanout is at most 622, but will likely be smaller, depending on operations on the index causing fragmentation at the non-leaf levels.

For a table with 1 million pages at the leaf level, the first index will have 1 million/9 = 111112 pages at least at the level above the leaf. The second index will have at least 1608 pages. The savings in I/O from using the LIMITED mode scan will clearly differ based on the fanout.

I've created a 100GB clustered index (on the same hardware as I'm using for the benchmarking series) with 13421760 leaf-level pages and a maximum fanout of 540. In reality, I populated the index using 16 concurrent threads, so there's some fragmentation. The level above the leaf has 63012 pages, an effective fanout of 213. Still, the LIMITED mode scan will read 213x less than a DETAILED scan, but will it be 213x faster?

Here's a perfmon capture of the LIMITED mode scan on my index:

 

There's nothing special going on under the covers in a LIMITED mode scan - the chain of pages at the level above the leaf is read in page-linkage order, with no readahead. The perfmon capture shows:

  • Avg. Disk Read Queue Length (light blue) is a steady 1.
  • Avg. disk sec/Read (pink) is a steady 4ms.
  • Disk Read Bytes/sec (green) is roughly 14.5million.
  • Page reads/sec (dark blue) is roughly 1800.

DETAILED 

The DETAILED mode does two things:

  • Calculate fragmentation by doing a LIMITED mode scan
  • Calculate all other statistics by reading all pages at every level of the index

And so it's obviously the slowest. It has to do the LIMITED mode scan first to be able to calculate the logical fragmentation, because it reads the leaf level pages in the fastest possible way - in allocation order. DBCC has a customized read-ahead mechanism for allocation order scans that it uses for this DMV and for DBCC CHECK* commands. It's *incredibly* aggressive and will hit the disks as hard as it possibly can, especially with DBCC CHECK* running in parallel.

Here's a perfmon capture of the DETAILED mode scan on my index:

 

Not quite as pretty as the LIMITED mode scan, but I like it :-) Here's what it's showing:

  • Avg. Disk Read Queue Length (black) is in the multiple hundreds. Clearly it's appetite for data is outstripping what my RAID array can do. It basically tries to saturate the I/O subsystem to get as much data as possible flowing into SQL Server.
  • Avg. disk sec/Read (pink line at the bottom) is actually measuring in whole seconds, rather than ms. Given the disk queue length, I'd expect that.
  • DBCC Logical Scan Bytes/sec (red) varies substantially as the readahead mechanism throttles up and down, but it's driving anywhere up to 80MB/sec. You can see around 9:49:20 AM when it drops to zero for a few seconds.
  • Readahead pages/sec (green) is tracking the DBCC scan. This is a buffer pool counter, the DBCC one is an Access Methods counter (the dev team I used to run during 2005 development). If I had Disk Read Bytes/sec and Pages reads/sec showing, they'd track the other two perfectly - I turned them off for clarity.

So the DETAILED mode not only reads more data, but it does it a heck of a lot more aggressively so has a much more detrimental effect on the overall I/O capabilities of the system while it's running.

SAMPLED

There is a third mode that was introduced just for the DMV. The idea is that if you have a very large table and you want an idea of some of the leaf level statistics, but you don't want to take the perf hit of running a DETAILED scan, you can use this mode. It does:

  • LIMITED mode scan
  • If the number of leaf level pages is < 10000, read all the pages, otherwise read every 100th pages (i.e. a 1% sample)

Summary

There's no progress reporting from the DMV (or DBCC SHOWCONTIG) but if you look at the reads column in sys.dm_exec_sessions you can see how far through the operation it is. This method works best for DETAILED scans, where can compare that number against the in_row_data_page_count for the index in sys.dm_db_partition_stats (yes, you'll need to mess around a bit if the index is actually partitioned).

In terms of timing, I ran all three scan modes to completion. The results:

  • LIMITED mode: 282 seconds
  • SAMPLED mode: 414 seconds
  • DETAILED mode: 3700 seconds

Although the LIMITED mode scan read roughly 200x less than the DETAILED scan, it was only 13 times faster, because the readahead mechanism for the DETAILED scan is way more efficient than the (necessary) follow-the-page-linkages scan of the LIMITED mode.

Just for kicks, I ran a SELECT COUNT(*) on the index to see how the regular Access Methods readahead mechanism would fare - it completed in 3870 seconds - 5% slower, and it had less processing to do than the DMV. Clearly DBCC rules! :-)

Although the DETAILED mode gives the most comprehensive output, it has to do the most work. For very large indexes, this could mean that your buffer pool is thrashed by the lazy writer making space available for the DMV to read and process the pages (it won't flush out the buffer pool though, as the pages read in for the DMV are the first ones the lazywriter will kick out again). One of the reasons I advise people to only run the DMV on indexes they know they're interested in - and better yet, run it on a restored backup of the database.

Hope this is helpful!

PS Oh, also beware of using the SSMS fragmentation wizard. It uses a SAMPLED mode scan, but I found it impossible to cancel!

I made them up. Yup.

I'm talking about the guidance which is:

  • if an index has less than 1000 pages and is in memory, don't bother removing fragmentation
  • if the index has:
    • less than 5% logical fragmentation, don't do anything
    • between 5% and 30% logical fragmentation, reorganize it (using DBCC INDEXDEFRAG or ALTER INDEX ... REORGANIZE)
    • more than 30% logical fragmentation, rebuild it (using DBCC DBREINDEX or ALTER INDEX ... REBUILD)

These numbers are made up. They can and will vary for you, but they're a good starting point to work from.

There's been some discussion since PASS, when I confessed publicly on Twitter (during Grant Fritchey's session) to making them up, about whether I really said that etc etc. Yes - I really did make them up.

Back in 1999/2000 when I wrote DBCC INDEXDEFRAG and DBCC SHOWCONTIG for SQL Server 2000, customers wanted *some* guidance on what the thresholds should be where they should care about fragmentation or not, and how to remove it. We had to put *something* into Books Online (my favorite "it depends!" wouldn't have been too helpful), so I talked to some customers, inside and outside Microsoft, and chose these numbers as most appropriate at the time.

They're not set in stone - they're a big generalization, and there are a ton of other factors that may affect your choice of threshold and fragmentation removal method (e.g. recovery model, high-availability technologies in use, log backup schedule, query workload, disk space, buffer pool memory, and so on). I wish Microsoft would update the old whitepaper on fragmentation - they keep promising me they'll get around to it.

In the meantime, take those numbers with a pinch of salt and don't treat them as absolute.

The August edition of TechNet Magazine is available on the web now and has the latest installment of my regular SQL Q&A column.

This month's topics are:

  • Proactive page checksum failure detection
  • Why GUIDs make for bad clustered index keys 
  • Possible problems using a log shipping secondary for reporting 
  • Recovery model choice for log size management

Check it out at http://technet.microsoft.com/en-us/magazine/dd776512.aspx.

I'm teaching a class this week on database maintenance, for DBAs inside Microsoft. One of the things we're discussing today is index fragmentation and how poor cluster key choice can lead to page splits, poor performance, index fragmentation, and so on - not just in the clustered index, but also in nonclustered indexes.

One of the students looked in a database underpinning an application and found a unique cluster key, which is the worst I've ever seen (although not the worst that Kimberly's ever seen apparently - the mind boggles!).

The cluster key is defined as a combination of the following column types:

  • 16-byte GUID
  • varbinary (16)
  • nvarchar (512)
  • nvarchar (256)
  • tinyint

Now, the wide cluster key isn't a big deal UNLESS there are nonclustered indexes, but there are in this case - so the cluster key is included in all nonclustered index rows. And the random GUID high-order key is always a bad idea, as it means the clustered index will be heavily fragmented as records are inserted. This is all simplified and generalizations (and I open this can of worms happily) - but you get the idea.

Good design up-front, with an understanding of how key choice affects the behavior of SQL Server and how indexes are stored and indexed, can lead to vastly reduced performance problems and maintenance issues.

Quickie this morning to start the day off. I saw a question on a forum: if I *have* to use a GUID and *must* have a primary key, should I make the primary key clustered or nonclustered?

Now, I'm not getting into the whole GUID vs. bigint identifier, or random GUID vs. GUID generated by NEWSEQUENTIALID(), so please don't comment on those issues, they're not relevant here. I just want to address the question - what kind of index should it be?

From a Storage Engine perspective, my answer is nonclustered. Here are three reasons why:

  • If the index is clustered, then the cluster key is immediately at least 16 bytes (the size of a GUID). This doesn't change the size of the clustered index records (as the GUID column has to be stored in the table anyway, and a clustered index IS the table), but it does change the size of the nonclustered indexes. All nonclustered indexes on the table must include the cluster keys, even of they are not explicitly part of the nonclustered index keys (I'll do a post on this later). This means the GUID is present in every nonclustered index record too. From this perspective, it would be better to use a smaller clustered index key and have the GUID primary ley be nonclustered so it's only present in that one nonclustered index.
  • Random GUIDs used as the high-order key cause index fragmentation. Their random nature means the insertion point into the index is also random. This causes page splits, which cause fragmentation and are *expensive*. (I touched on this a bit a few days ago in my post How expensive are page splits in terms of transaction log?.). With a random key value, it's hard to avoid page splits and fragmentation, although you can delay them somewhat using FILLFACTOR, but at the expense of using extra space. By making the GUID index nonclustered, you can delay page splits even further. The clustered index is the table, so the records are (usually always) larger than nonclustered index records. This means you can get fewer clustered index records on an 8KB page than nonclustered index records. With fewer records per page, you can do fewer random insertions on the page before a page split occurs. So using a nonclustered index for the GUID key means you can do fewer expensive page splits.
  • Given that whatever kind of index you create for the GUID key is going to experience index fragmentation, you're probably going to want to periodically remove the fragmentation as part of your database maintenance plan. It makes sense to try to limit the amount of resources used by the fragmentation removal operation (e.g. cpu, IO, disk space, transaction log space), and so the smaller the fragmented index, the better. A nonclustered index for the GUID key will be smaller than a clustered index, so if you choose a non-fragmentation-causing clustered index key, and confine the fragmentation to the nonclustered index, you can use fewer resources during database maintenance.

And there you have it. I'm sure some of you have seen pathological cases that disprove one of the above points, but my arguments are generalizations. Maybe this is a can of worms I've opened, in which case I look forward to the comments!

PS Brent did a great post about humor when blogging, the cartoon links he includes are great. Check it out here.

Page splits are always thought of as expensive, but just how bad are they? In this post I want to create an example to show how much more transaction log is created when a page in an index has to split. I'm going to use the sys.dm_tran_database_transactions DMV to show how much more transaction log is generated when a page has to split. You can find the list of columns and a small amount of explanation of each column in Books Online here - I was reminded of its existence by someone on Twitter (sorry, don't remember who it was and I couldn't find it in search).

In the example, I'm going to create a table with approximately 1000-byte long rows:

CREATE DATABASE PageSplitTest;
GO
USE pagesplittest;
GO

CREATE TABLE BigRows (c1 INT, c2 CHAR (1000));
CREATE CLUSTERED INDEX BigRows_CL ON BigRows (c1);
GO

INSERT INTO BigRows VALUES (1, 'a');
INSERT INTO BigRows VALUES (2, 'a');
INSERT INTO BigRows VALUES (3, 'a');
INSERT INTO BigRows VALUES (4, 'a');
INSERT INTO BigRows VALUES (6, 'a');
INSERT INTO BigRows VALUES (7, 'a');
GO

I've engineered the case where the clustered index data page has space for one more row, and I've left a 'gap' at c1=5. Let's add it as part of an explicit transaction and see how much transaction log is generated:

BEGIN TRAN
INSERT INTO BigRows VALUES (8, 'a');
GO

SELECT [database_transaction_log_bytes_used] FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID ('PageSplitTest');
GO

database_transaction_log_bytes_used
-----------------------------------
1228

That's about what I'd expect for that row. Now what about when I cause a page split by inserting the 'missing' c1=5 row into the full page?

-- commit previous transaction
COMMIT TRAN
GO

BEGIN TRAN
INSERT INTO BigRows VALUES (5, 'a');
GO

SELECT [database_transaction_log_bytes_used] FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID ('PageSplitTest');
GO

database_transaction_log_bytes_used
-----------------------------------
6724

Wow. 5.5x more bytes are written to the transaction log as part of the system transaction that does the split.

The ratio gets worse as the row size gets smaller. For a row with an approximately 100-byte long row (use the same code as above, but change to a CHAR (100), insert 67 rows with a 'gap' somewhere then insert the 68th to cause the split), the two numbers are 328 and 5924 - the split cause 18 times more log to be generated! For a row with an approximately 10-byte long row, I got numbers of 240 and 10436, because I created skewed data (about 256 rows with the key value 8) and then inserted key value 5 which forced a (rare) non-middle page split. That's a ratio of more than 43 times more log generated! You can try this yourself if you want: I changed the code to have a CHAR (10), inserted values 1, 2, 3, 4, 6, 7, then inserted 256 key values of 8 and then 2 of 5. The resulting page had only 6 rows - it split after the key value 5 - the Storage Engine doesn't always do a 50/50 page split. And that's not even causing nasty cascading page-splits, or splits that have to split a page multiple times to fit a new (variable-sized) row in.

Bottom line: page splits don't just cause extra IOs and index fragmentation, they generate a *lot* more transaction log. And all that log has to be (potentially) backed up, log shipped, mirrored....

Jack Li, one of the Senior Escalation Engineers in Product Support, just posted details of an interesting case over on the CSS blog - his article is here. It talks about index builds and rebuilds, but the issue is the same for both, so I'll just talk about rebuilds.

The jist of the problem is that index rebuilds can parallelize, but sometimes they don't parallelize vey well. Each thread gets a certain range of the index to rebuild, using the existing index statistics to divide the ranges equally between the threads. If there's massive data skew, then one thread can end up doing the majority of the work, leading to a long run-time. The case in Jack's post involved a 250 million row index where 150 million rows had the same (NULL) key value. This range has to be processed by a single thread - a single value can't be divided between two+ threads.

Now, this is understandable behavior by the database engine, but it relies on the statistics being up-to-date. That's a bit of a catch-22 - rebuilding an index updates the statistics, but if the statistics aren't up-to-date then the index rebuild might parallelize badly! I guess the solution is that if you know that you have massive data skew in your large indexes, update statistics BEFORE doing an index rebuild. And given what I've been hearing this week at SQL Connections about how badly statistics keep biting people, I'm leaning towards a different recommendation for those people who have lots of perf trouble caused by statistics and the potential for skewed data - rebuild all your statistics regularly, and only rebuild/reorganize fragmented indexes. Statistics just cause so many problems it seems.

Thanks

PS Kimberly has a lot more info about statistics over on her blog - I'm just starting to venture into that mine-field

Last week’s survey was on what kind of regular index maintenance you perform (see here for the survey) as a way of kicking off a new series I’m writing around index maintenance. Here are the results as of 3/21/09 – I find them very encouraging:

As you can see, about 2/5 of respondents are performing some form of analysis-based fragmentation removal (answers 5+6), which I consider the best way to perform index maintenance, if you’re willing to invest the time involved to set it up. It allows the least amount of work to be performed, for the most targeted performance gains – and so is especially appropriate for 24x7 systems where there’s a minimal or non-existent maintenance window.

The next best option is to do all rebuilds or all defrags based on a fragmentation threshold (answers 3+4), which about 1/5 of respondents do. This also allows work to be limited, but by choosing only a single method of removing fragmentation, there are pros and cons. Now, the survey was limited to a number of questions so I couldn’t explore what the threshold is that people are using (e.g. logical fragmentation, page density, extent fragmentation, or something else). Some measures are good to use and some not so good, and I’ll be exploring the various counters and ways of determining fragmentation as the series progresses.

Either of the options to operate on all indexes regardless of fragmentation (options 1+2) can lead to lots of wasted resources (disk space, transaction log space, I/Os, CPU) by operating on indexes that are not fragmented in the first place, or for which fragmentation removal has no benefit for workload performance. About 1/3 of respondents do this. This isn’t surprising to me as rebuild-all-the-indexes-every-night/week is a very common index maintenance plan for “involuntary DBAs” who know that index maintenance is important, but don’t have the knowledge or training to implement a more sophisticated maintenance plan. This growing size of this group of people is one of the main reasons I’m going to write this series.

Doing absolutely nothing for index maintenance, which about 1/10 do, is usually not a good idea, as indexes in a database that’s not read-only commonly become fragmented over time. However, these people may know they don’t suffer from fragmentation issues, or that removing fragmentation has no effect on workload performance. However, my suspicion (based on what I see in the field) is that some of these respondents don’t realize the benefits of performing index maintenance.

You may be surprised to hear that I don’t consider doing nothing to be the worst choice. That dubious honor goes to answer 7 – doing any kind of index maintenance followed by a database shrink operation – as 3 respondents do. A post-maintenance shrink operation may well undo some of the benefits of the maintenance by introducing massive amounts of index fragmentation – see my blog post Auto-shrink - turn it OFF! for details of how bad this can get.

I’m actually pretty encouraged by these results though. Compared to how things were around 1999 when I wrote DBCC INDEXDEFRAG and DBCC SHOWCONTIG for SQL Server 2000, these results show that knowledge in the field (or at least in the group that reads my blog and responded) has vastly improved. My aim for the forthcoming series about index fragmentation and maintenance is to increase knowledge a lot more broadly.

Next up - this week's survey. Thanks for reading!

I'm about to start a new series of post about index fragmentation and removing it. For this week's survey, I'd like to know what index maintenance you do to address fragmentation (in any of its forms) - I'll report on it in a week.

Thanks

The April edition of TechNet Magazine is available on the web now and has the latest installment of my regular SQL Q&A column.

This month's topics are:

  • Disappearing errors with DBCC CHECKDB
  • Provisioning tempdb when moving from 2000 to 2008
  • Does fillfactor prevent fragmentation and should it be set instance-wide
  • Avoiding FILESTREAM performance problems

Check it out at http://technet.microsoft.com/en-us/magazine/2009.04.sqlqa.aspx

Back in 2005 Kimberly produced two very popular webcast series - an 11-part webcast series for TechNet called SQL Server 2005 for the IT Professional and a 10-part webcast series for MSDN called A Primer for Proper SQL Server Development. The webcast links and blog posts were broken for quite a while but now they're all fixed up and working again. I've created some web pages that link to all the webcasts and blog posts, along with abstracts. I've also included some more recent ones too and will be adding to the list over the next few weeks.

There's over 30 hours of good stuff to watch - check them out at http://www.sqlskills.com/webcasts.asp

This blog post describes the demo "2 - NC Indexes" from my Corruption Survival Techniques conference session from various conferences in 2008. The links to the scripts and databases to use are in this blog post.

The aim of this demo is to show that sometimes its just redundant data (i.e. nonclustered indexes) that get corrupted, and so you don't have to do anything that takes the actual data offline - like restoring from a full backup or running one of the repair options (both of which require the database to be in single-user mode).

Let's look at an example. Extract and restore the DemoNCIndex database, and the NCIndexCorruption.sql script. What do we get from running DBCC CHECKDB on the DemoNCIndex database (lines 39-42)?

DBCC CHECKDB (DemoNCIndex) WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO

Msg 8951, Level 16, State 1, Line 1
Table error: table 'Customers' (ID 453576654). Data row does not have a matching index row in the index 'CustomerName' (ID 2). Possible missing or invalid keys for the index row matching:
Msg 8955, Level 16, State 1, Line 1
Data row (1:45:28) identified by (CustomerID = 29) with index values 'LastName = 'Adams' and CustomerID = 29'.
Msg 8951, Level 16, State 1, Line 1
Table error: table 'Customers' (ID 453576654). Data row does not have a matching index row in the index 'CustomerName' (ID 2). Possible missing or invalid keys for the index row matching:
Msg 8955, Level 16, State 1, Line 1
Data row (1:180:164) identified by (CustomerID = 2118) with index values 'LastName = 'Adams' and CustomerID = 2118'.

<snip - removed for brevity>

Msg 8952, Level 16, State 1, Line 1
Table error: table 'Customers' (ID 453576654). Index row in index 'CustomerName' (ID 2) does not match any data row. Possible extra or invalid keys for:
Msg 8956, Level 16, State 1, Line 1
Index row (1:24482:16) with values (LastName = 'Andersen' and CustomerID = 18718) pointing to the data row identified by (CustomerID = 18718).
Msg 8952, Level 16, State 1, Line 1
Table error: table 'Customers' (ID 453576654). Index row in index 'CustomerName' (ID 2) does not match any data row. Possible extra or invalid keys for:
Msg 8956, Level 16, State 1, Line 1
Index row (1:24482:127) with values (LastName = 'Arthur' and CustomerID = 9758) pointing to the data row identified by (CustomerID = 9758).
CHECKDB found 0 allocation errors and 26 consistency errors in table 'Customers' (object ID 453576654).
CHECKDB found 0 allocation errors and 26 consistency errors in database 'DemoNCIndex'
repair_rebuild is the minimum repair level for the errors found by DBCC CHECKDB (DemoNCIndex).
 

Lots of errors. Now, in this example there are only 26 errors, but in cases where there are hundreds of errors it can be really hard to tell whether all the corruptions are in nonclustered indexes (i.e. indexes with IDs > 1). Luckily, there's an undocumented option to all the DBCC CHECK* commands - WITH TABLERESULTS. The option is undocumented because the output can change from release to release, but basically this converts the DBCC output into tabular form. Try running lines 48-50 in the script and you'll see what I mean. One of the columns in the output is IndexId - so you can easily see whether all the errors are in nonclustered indexes. In this case, they are, and all in one index of the Customers table.

You could run lines 55-57 of the script to prove to yourself that repairs can't be run online, and then realize that we can address the problem without having to run repair or restore. First off we need to figure out the name of the index to fix - index ID 2 of the Customers table. Lines 77-80 run sp_HelpIndex on the table (although I should really be using Kimberly's sp_HelpIndex2):

USE DemoNCIndex
GO
EXEC sp_HelpIndex 'Customers';
GO

index_name     index_description                                   index_keys
-------------- --------------------------------------------------- ------------
CustomerName   nonclustered located on PRIMARY                     LastName
CustomerPK     clustered, unique, primary key located on PRIMARY   CustomerID

The nonclustered index is called CustomerName. Plug in the index name to line 82, then try fixing the index by doing an online index rebuild, and run DBCC CHECKDB afterwards (lines 82-89). The corruption hasn't been fixed! Online index rebuild reads the old index to build the new one so the new index has the same missing rows as the old one. We need to do an offline index rebuild - with lines 110-115. After the last DBCC CHECKDB, the index is fixed up. Now, on SQL Server 2008, you may or may not get a query plan for the index rebuild that doesn't use the old index, because the query optimizer has some more plan choices available to it - so on SQL Server 2008 you may need to do an actual drop and create of the broken index (carefully, if its enforcing a constraint).

So - just because DBCC CHECKDB reports a ton of errors, that doesn't necessarily mean that the database needs to be taken (essentially) offline to repair it - check through the errors to see if it's just nonclustered indexes that are affected.

Here's another corruption bug that people are hitting on 2005 SP2 - something I didn't know about until today.

The situation is this: a table with a non-unique clustered index (i.e. so a hidden uniquifier column is created), and then rebuilt using DBCC DBREINDEX. Sometimes an incorrect query plan is chosen so the rows in the index get sorted incorrectly. A subsequent DBCC CHECKDB (or DBCC CHECKTABLE) will return errors like:

Msg 2511, Level 16, State 1, Line 1
Table error: Object ID 2073058421, index ID 1, partition ID 72057594044678144, alloc unit ID 72057594048348160 (type In-row data). Keys out of order on page (1:16042), slots 47 and 48.

The workaround is to use the new ALTER INDEX command in 2005 to do the rebuild (remember that DBCC DBREINDEX is deprecated). There's also a hotfix available in CU9 (and the upcoming SP3 I guess). Checkout KB 954734 here for details.

Hopefully you won't hit this, but if you do, knowing this may save you a lot of head-scratching.

PS Last post from our 22-day road-trip. Tonight's a date-night in Las Vegas (dinner at Le Cirque, then tickets to Cirque du Soleil's "O") before heading back to our (vacation!) home in Redmond and PASS next week.

It's been a long time since the last Conference Questions Pot-Pourri - in fact it was at the last SQL Connections in Orlando in April. Now we're in Las Vegas doing SQL Connections Fall - Kimberly's lecturing for an hour on partitioning so I can get out a quick post.

This is a question that came up yesterday - can CREATE INDEX ... WITH DROP_EXISTING be used to move indexes that enforce constraints? Let's check it out.

First up I'm going to create a couple of tables. Table t1 has a unique constraint backed by a nonclustered index. Table t1 has a primary key constraint backed by a clustered index.

CREATE DATABASE ConstraintTest;
GO
USE ConstraintTest;
GO

CREATE TABLE UniqueConstraint (c1 INT UNIQUE NONCLUSTERED);
GO
INSERT INTO UniqueConstraint VALUES (1);
GO

CREATE TABLE PrimaryKeyConstraint (c2 INT PRIMARY KEY CLUSTERED);
GO
INSERT INTO PrimaryKeyConstraint VALUES (1);
GO

(Oops - in the original post I c&p'd the wrong code and had the second table as a unique nonclustered constraint too - sorry for the mixup) 

Now I'll add another filegroup that we'll try to move the indexes into.

ALTER DATABASE ConstraintTest ADD FILEGROUP ExtraFilegroup;
GO

ALTER DATABASE ConstraintTest ADD FILE (
   NAME = ExtraFile1,
   FILENAME = 'C:\SQLskills\ExtraFile1.ndf',
   SIZE = 5MB)
TO FILEGROUP ExtraFilegroup;
GO

Now I'll try moving the nonclustered index enforcing the unique constraint.

SELECT [name], [index_id] FROM sys.indexes
WHERE [object_id] = OBJECT_ID ('UniqueConstraint')
GO

name                           index_id
------------------------------ --------
NULL                           0
UQ__UniqueConstraint__7C8480AE 2

The index we want is index ID=2, so we should be able to move it as follows:

CREATE UNIQUE NONCLUSTERED INDEX UQ__UniqueConstraint__7C8480AE
ON UniqueConstraint (c1) WITH (DROP_EXISTING = ON)
ON ExtraFilegroup;
GO

No problem! Now let's try the same thing for the clustered primary key constraint.

SELECT [name], [index_id] FROM sys.indexes
WHERE [object_id] = OBJECT_ID ('PrimaryKeyConstraint')
GO

name                           index_id
------------------------------ --------
PK__PrimaryKeyConstr__7E6CC920 1

We only have one choice, so we should be able to rebuild it using:

CREATE CLUSTERED INDEX PK__PrimaryKeyConstr__7E6CC920
ON PrimaryKeyConstraint (c2) WITH (DROP_EXISTING = ON)
ON ExtraFilegroup;
GO

Msg 1907, Level 16, State 1, Line 1
Cannot recreate index 'PK__PrimaryKeyConstr__7E6CC920'. The new index definition does not match the constraint being enforced by the existing index.

Hmm - what am I missing? I'll check sys.indexes again and not filter the columns this time:

SELECT * FROM sys.indexes
WHERE [object_id] = OBJECT_ID ('PrimaryKeyConstraint')
GO

object_id  name                           index_id type type_desc is_unique
---------- ------------------------------ -------- ---- --------- ---------
2105058535 PK__PrimaryKeyConstr__7E6CC920 1        1    CLUSTERED 1

data_space_id ignore_dup_key is_primary_key is_unique_constraint fill_factor
------------- -------------- -------------- -------------------- -----------
1             0              1              0                    0

is_padded is_disabled is_hypothetical allow_row_locks allow_page_locks
--------- ----------- --------------- --------------- ----------------
0
         0           0               1               1

Ah - I was missing the unique attribute on the index. What I need to do is the following:

CREATE UNIQUE CLUSTERED INDEX PK__PrimaryKeyConstr__7E6CC920
ON PrimaryKeyConstraint (c2) WITH (DROP_EXISTING = ON)
ON ExtraFilegroup;
GO

And that works fine. So - the answer is yes, you can use CREATE INDEX ... WITH DROP_EXISTING to move indexes that enforce constraints. This is really good, as it allows you to move these indexes without having to do DROP INDEX then CREATE INDEX. That method is bad, as it creates a window of opportunity for someone to enter data that violates the constraint while the index is dropped.

For the first time in ages, Kimberly sneaked in an interview without me! I was teaching a class on Database Maintenance for some Microsoft DBAs and Kimberly recorded a RunAs Radio interview on her favorite subject - indexes. I love listening to her talk about indexes - or maybe I just like the sound of her voice :-)

Check it out at http://www.runasradio.com/default.aspx?showNum=76.

Enjoy!

PS She also drops some hints about an upcoming project of ours...

At the last Connections conference in April, the conference organizers tapped a bunch of speakers for interviews, articles, and other content to put into a free "newsletter" called MyDevConnections, and now it's finally available. It covers all the Connections conferences, so isn't just limited to SQL Server. As far as SQL is concerned, Kimberly and I wrote an article about our favorite SQL Server 2008 features (data compression and filtered indexes, respectively), and Ross Mistry wrote an article on hardening a SQL Server 2005 installation. There's also an extract from a Women in Technology interview that Kimberly took part in. Overall there's 84 pages of content in the PDF, with some adverts for the Connections shows (obviously).

Check it out at http://www.devconnections.com/mydevconnections/S08_DevOnlineMag_Web.pdf

Every so often I'll see posts on the various data corruption forums discussing causes of corruption. In this post I want to debunk some of the myths around what can cause corruption. There are really two types of corruption to deal with, physical corruption and logical corruption.

Physical corruption

This is where something has altered the contents of a data or log file sector with no regard for what is being stored there. Possible causes of physical corruption are:

  • Problem with the I/O subsystem (99.8% of all cases I've ever seen - only 3 nines as I'd estimate I've seen around about a thousand corruption cases). Remember the I/O subsystem is everything underneath SQL Server in the I/O stack - including the OS, 3rd-party file system filter drivers, device drivers, RAID controllers, SAN controllers, network hardware, drives themselves, and so on. Millions of lines of code and lots of moving parts spinning very fast, very close to very fragile pieces of metal oxide (I once heard Jim Gray liken a disk drive head to a 747 jumbo jet flying at 500 mph at a height of 1/4 inch from the ground...)
  • Problem with the host machine hardware (0.1% of cases). Most of the time this is a memory error.
  • SQL Server bugs (0.1% of cases). Yes, there have been corruption bugs. Every piece of software has bugs. There are KB articles describing bugs.
  • Deliberate introduction of corruption using a hex editor or other means.

Physical corruption is what DBCC CHECKDB usually reports and the majority of cases are caused by a physical failures of some kind, with the minority caused by humans - software bugs.

Logical corruption

This is where something has altered some data so that a data relationship is broken. Possible causes of logical corruption are:

  • Humans

:-) Okay...

  • Application bug. The application deletes one part of an inherent data relationship but not the other. Or the application designer doesn't implement a constraint properly. Or the application designer doesn't cope with a transaction roll-back properly. You get the idea.
  • Accidental update/delete. Someone deletes or updates some data incorrectly.
  • SQL Server bug. See above.
  • DBCC CHECKDB when using the REPAIR_ALLOW_DATA_LOSS option. As is documented in Books Online, and I've blogged about and mentioned when lecturing, if you run repair, it doesn't take into account any inherent or explicit constraints on the data.

The point here is that a physical failure of a component does not cause logical corruption, it causes physical corruption. Conversely, application errors cause logical corruption, not physical corruption. DBCC CHECKDB errors are about physical corruption (okay, with the inclusion of DBCC CHECKCATALOG code in 2005, it will find cases where the DBA has manually altered the system tables, causing logical corruption) and applications cannot cause physical corruption as they can only manipulate data through SQL Server. If an application hits a SQL Server bug which causes physical corruption, that's still not the application causing physical corruption, it's SQL Server.

So - on to the myths.

  • Can an application cause physical corruption? No.
  • Can stopping a shrink operation cause corruption of any kind? No.
  • Can stopping an index rebuild cause corruption of any kind? No.
  • Can running DBCC CHECKDB without repair cause corruption of any kind? No.
  • Can creating a database snapshot cause corruption of any kind? No.

Hope this helps.

At the user group meeting on Monday I spent some time explaining how GUIDs can cause fragmentation in clustered indexes AND in non-clustered indexes, even if the GUID isn't specifically included in the non-clustered index key. GUIDs are essentially random values (pseudo-random in ranges if generated using NEWSEQUENTIALID) that are also unique. Their uniqueness is what makes them attractive to many developers as a key value, without understanding the havoc they can cause in production in terms of fragmentation and poor query performance.

A GUID key causes fragmentation because of its randomness. The insertion point of a new record in an index is dictated by the value of the index key, so if the key value is random, so is the insertion point. This means that if an index page is full, a random insert that happens to have to go onto that page will cause a page split to make room for the new record. A page-split is where a new page is allocated and (as near as possible to) half the rows from the splitting page are moved to the new page. The new row is then inserted into one of the two pages, determined by the key value. Usually the newly allocated page is not physically contiguous to the splitting page, and so fragmentation has been caused. In this case *two* kinds of fragmentation have been caused - logical fragmentation (where the next logical page as determined by the index order is not the next physical page in the data file) and physical (or internal) fragmentation (where space is being wasted on index pages). These can both affect query performance (topic for a later post), as well as the expense of having to do the page split in the first place.

It's fairly well known that GUIDs can cause fragmentation in the index where the GUID is the key (e.g. a clustered index), but not about the knock-on effects in non-clustered indexes. Here's an example - I'll create two clustered indexes with GUID keys (one generated from NEWID and one from NEWSEQUENTIALID), plus a non-clustered index on each. Let's see what happens when we insert 100000 rows:

-- Create a table with a GUID key
CREATE TABLE BadKeyTable (
   
c1 UNIQUEIDENTIFIER DEFAULT NEWID () ROWGUIDCOL,
   
c2 DATETIME DEFAULT GETDATE (),
   
c3 CHAR (400) DEFAULT 'a');
CREATE CLUSTERED INDEX BadKeyTable_CL ON BadKeyTable (c1);
CREATE NONCLUSTERED INDEX BadKeyTable_NCL ON BadKeyTable (c2);
GO

-- Create another one, but using NEWSEQUENTIALID instead
CREATE TABLE BadKeyTable2 (
   
c1 UNIQUEIDENTIFIER DEFAULT NEWSEQUENTIALID () ROWGUIDCOL,
   
c2 DATETIME DEFAULT GETDATE (),
   
c3 CHAR (400) DEFAULT 'a');
CREATE CLUSTERED INDEX BadKeyTable2_CL ON BadKeyTable2 (c1);
CREATE NONCLUSTERED INDEX BadKeyTable2_NCL ON BadKeyTable2 (c2);
GO

DECLARE @a INT;
SELECT @a = 1;
WHILE (@a < 10000)
BEGIN
   
INSERT INTO BadKeyTable DEFAULT VALUES;
   
INSERT INTO BadKeyTable2 DEFAULT VALUES;
   
SELECT @a = @a + 1;
END;
GO

-- And now check for fragmentation
SELECT
   
OBJECT_NAME (ips.[object_id]) AS 'Object Name',
   
si.name AS 'Index Name',
   ROUND (ips.avg_fragmentation_in_percent, 2) AS 'Fragmentation',
   
ips.page_count AS 'Pages',
   
ROUND (ips.avg_page_space_used_in_percent, 2) AS 'Page Density'
FROM sys.dm_db_index_physical_stats (DB_ID ('DBMaint2008'), NULL, NULL, NULL, 'DETAILED') ips
CROSS APPLY sys.indexes si
WHERE
   
si.object_id = ips.object_id
   
AND si.index_id = ips.index_id
   AND ips.index_level = 0;
GO

Object Name    Index Name        Fragmentation  Pages  Page Density
-------------  ----------------  -------------  -----  ------------
BadKeyTable    BadKeyTable_CL    99.13          8092   66.08
BadKeyTable    BadKeyTable_NCL   30.97          78     64.1
BadKeyTable2   BadKeyTable2_CL   0.83           5556   96.26
BadKeyTable2   BadKeyTable2_NCL  1.88           372    99.61

The BadKeyTable_CL clustered index with the GUID key generated from NEWID is almost perfectly fragmented, with 34% space being wasted on each page. Conversely, the BadKeyTable2_CL clustered index with the GUID key generated from NEWSEQUENTIALID is hardly fragmented and only 4% of free space is wasted (and this is just because of the row size chosen). These numbers are entirely expected given the nature of the cluster keys.

Now look at the non-clustered indexes. BadKeyTable_NCL is 31% fragmented with 36% space wasted on each page! BadKeyTable2_NCL is harldy fragmented with no free space wasted on each page. So what's going on? The non-clustered index key in both cases is a datetime column, which has a minimum granularity of 3 milliseconds. The code above runs in a tight loop inserting records and so can insert more than one record per 3ms time interval - and I allow this because I didn't create make the non-clustered indexes unique. For all the records inserted in one 3ms time interval, there *has* to be something that makes the non-clustered index key unique internally (as even though an index can be defined as non-unique, the Storage Engine requires that each record really is unique in an index and will add whatever it needs to so that happens).

In this case, the clustered index key (which must be present in the non-clustered index anyway) is used to differentiate between all the non-clustered index records with the same datetime value. For BadKeyTable_NCL, the cluster key is a random GUID, so the non-clustered index record insertion points ALSO become random within each 3ms time interval - leading to the fragmentation above. The BadKeyTable2_NCL non-clustered index has the same time interval issue, but it's cluster key is a sequential GUID, so the non-clustered index doesn't get fragmented. If the non-clustered index key was a time datatype with a larger minimum granularity (like smalldatetime, or the new date), the fragmentation of BadKeyTable_NCL would be even worse - try it for yourself and you'll see.

So the answer to the question in the blog post title is really - it depends! Under the right conditions, a GUID cluster key can also seriously fragment a non-clustered index as well.

Fresh off a week of teaching classes on the Microsoft campus, we've finalized some user group dates. Here's what we have coming up:

  • Monday August 18th: user group meeting in Redmond
  • Monday September 1st to 3rd: public class in England
    • In conjunction with our UK partners, SQL Know How at Hatfield, England
    • Topic: Best Practices in Performance and Availability in SQL Server 2005/2008 
  • Thursday September 4th: user group meeting in Ireland
  • Friday September 5th: SQL Server launch event in Ireland
    • In conjunction with Microsoft Ireland at the Dublin Microsoft office
    • Topic: SQL Server 2008 Overview for DBAs
  • Monday September 8th to 9th: public class in Scotland
    • In conjunction with our UK partners, SQL Know How at Edinburgh, Scotland
    • Topic: Indexing for Performance in SQL Server 2000/2005/2008

It's going to be a busy few weeks - hope to see you at one of these events!

Well, almost... Kimberly and I have agreed to co-author the upcoming SQL Server 2008 Internals book with our good friend Kalen Delaney. This is the 2008 evolution of Kalen's Inside SQL Server series of books and will be published by MS Press around February 2009. The other authors are (also our good friends) Adam Machanic and Conor Cunningham.

Kimberly's going to write the chapter on Index Internals and I'm going to write the chapter on DBCC Internals. This is very exciting as these are our respective favorite subjects, as you probably already know - these won't be short chapters :-)

It's going to be a busy rest of the year - phew!

PS Next year we have plans to write a book ourselves - watch this space...

Over the last few weeks I've seen (and helped correct) quite a few myths and misconceptions about index rebuild operations. There's enough now to make it worthwhile doing a blog post (and it's too hot here in Orlando for us to go sit by the pool so we're both sitting here blogging)...

Myth 1:  index rebuild pre-allocates the necessary space

This myth has two variations:

  1. The space for the new copy of the index is pre-allocated
  2. The space for the sort portion of the rebuild is pre-allocated

Neither of these are true. Index rebuild (whether online or offline, and at least as far back as 7.0) will create a new copy of the index before dropping the old copy. The pages and extents required to do this will always be allocated as needed, as with any other operation in SQL Server. The sort phase of an index rebuild, if required (in certain cases it is skipped in 2005), will adhere to the same allocation behavior.

Myth 2: indexes are rebuilt within a single file in a multi-file filegroup

This is a new one that I just heard yesterday - (paraphrasing) "In a two-file filegroup, an index in file 1 will be rebuilt into file 2. The next time it is rebuilt, it will be built in file 1. And so on".

This is untrue. Any time any allocations are done in a multi-file filegroup, the allocations are spread amongst all the files using the allocation system's proportional fill algorithm. In a nutshell, this says that space will be allocated more frequently from larger files with more free space than from smaller files with less free space. There is no concept in SQL Server of limiting allocations to a particular file in a multi-file filegroup.

Myth 3: non-clustered indexes are always rebuilt when a clustered index is rebuilt

This is untrue. The rules are a little complex here but can be summed up as follows:

  • In 2005+, rebuilding a unique or non-unique clustered index (without changing its definition) will NOT rebuild the non-clustered indexes
  • In 2000:
    • Rebuilding a non-unique clustered index WILL rebuild the non-clustered indexes
    • Rebuilding a unique clustered index will NOT rebuild the non-clustered indexes

The first few service packs of 2000 had bugs that changed the behavior of rebuilding unique clustered indexes back and forth - this is the source of much of the confusion around this myth.

For a much more detailed discussion of this, see my blog post from last Fall - Indexes From Every Angle: What happens to non-clustered indexes when the table structure is changed?.

Myth 4: BULK_LOGGED recovery mode decreases the size of the transaction log and log backups for an index rebuild

This myth is partly true.

Switching to the BULK_LOGGED recovery mode while doing an index rebuild operation WILL reduce the amount of transaction log generated, which is very useful for limiting the size of the transaction log file (note I say 'file', not 'files' - you only need one log file).

Switching to the BULK_LOGGED recovery mode while doing an index rebuild will NOT reduce the size of the transaction log BACKUP. Although the operation will be minimally-logged, the next transaction log backup will read all the transaction log since the last backup plus all the extents that were changed by the minimally-logged index rebuild. This will result in a log backup that's almost exactly the same size as for a fully-logged index rebuild. The ONLY time a log backup will contain data extents is when a minimally-logged operation has taken place since the last log backup - see here on MSDN for more info.

If you're considering using the BULK_LOGGED recovery mode, beware that you lose the ability to do point-in-time recovery to ANY point covered by a transaction log backup that contains even a single minimally-logged operation. Make sure that there's nothing else happening in the database that you may need to effectively roll-back with P.I.T. recovery. The operations you should perform if you're going to do this are:

  • In FULL recovery mode, take log backup immediately before switching to BULK_LOGGED
  • Switch to BULK_LOGGED and do the index rebuild
  • Switch back to FULL and immediately take a log backup

This limits the time period in which you can't do P.I.T. recovery.

Myth 5: online index rebuild doesn't take any locks

This myth is untrue. The 'online' in 'online index operations' is a bit of a misnomer.  Online index operations need to take two very short-term table locks. An S (Shared) table lock at the start of the operation to force all write plans that could touch the index to recompile, and a SCH-M (Schema-Modification - think of it as an Exclusive) table lock at the end of operation to force all read and write plans that could touch the index to recompile.

The most recent time this came up on the forums was someone noticing insert queries timing out after an online index rebuild operation had just started. The problem is that the  table lock that online index rebuild needs has to be entered into the grant queue in the lock manager until it can be acquired - and it will stay there until existing transactions that are holding conflicting locks either commit or roll-back. Any transaction that requires a conflicting lock AFTER the index rebuild lock has been queued but not acquired (and then released) will wait behind it in the lock grant queue. If the query timeout is reached before the transaction can get it's lock, it will timeout.

This is still much better than the table lock being held for the entire duration of the index rebuild operation. For more info, checkout this whitepaper on Online Index Operations in SQL Server 2005.

Kimberly and I were presenting at our local (Redmond) .Net Developers Association on Monday and the following question came up while Kimberly was talking about missing and extra indexes (paraphrasing):

What's the best non-clustered index to use for the query with a predicate WHERE lastname = 'Randal' AND firstname = 'Paul' AND middleinitial = 'S'?

Kimberly said that the order of the keys (e.g. lastname, firstname, middleinitial; or middleinitial, lastname, firstname; etc) doesn't matter for this case. I thought about it for a second and then argued, saying that the most selective column should come first. We agreed to discuss with the group at the end, but I thought about it some more and realized (and admitted to the group) that she's right - I should know better than to question Kimberly's knowledge of indexing... :-)

She's right because for a pure equality query using AND for multiple predicates, the Storage Engine will seek straight to the first exactly matching record in the index (and then scan for more matches if it's a non-unique index). It doesn't matter what order the index keys are defined because the Storage Engine is looking for an exact match.

When I started arguing, I was thinking about a phone book, which is ordered by lastname, firstname, middleinitial. You may think that a phone book is ordered that way because lastname is the most selective. Wrong. It's because the lastname is what most people know - it just happens to be the most selective of the three choices. Most SQL geeks should be able to find Kimberly in a phone book by looking for Tripp, Kimberly. But what if it was ordered by middleinital? I'd have no problem finding Kimberly, but how many of you would remember that her middleinitial is L? Probably a few as we both use our middle initials in our public names. What about if it was ordered by middleNAME? Again, no problem for me but who how many other people know her middle name is Lynn?

Then I started thinking about other queries and how they would play into the index choice to answer to the question above. If I also wanted to support a query with the predicate WHERE lastname = 'Randal', then having the left-most index key be anything other than lastname won't work so well. If the key order was firstname, middleinitial, lastname then all the distinct lastname values would be spread through the index rather than being together. The index might still be used to satisfy the query if it's the lowest cost index to use. However, having lastname be the leading key probably wouldn't work very well for a query with a predicate of WHERE firstname = 'Paul' - that argues for having firstname be the left-most index key.

Which should I choose? I probably I can't have both in the same index, so maybe I'd have TWO non-clustered indexes, to support both queries. The answer depends on how often the various queries are used and the trade-off between how much of a performance gain the non-clustered index would provide against the performance drop of having to maintain it during DML operations.

I hear time and again about people adding a non-clustered index for every column in the table, thinking that this will help - and my thinking is that this is wrong because these indexes can only satisfy a query where the only predicate is the column being indexed. I ran this argument past Kimberly and she added that these indexes could also be used if the column is chosen as the most selective in a multi-predicate query, and no other index has a lower cost than that one (a slim chance usually). Even what I though of as a simple case has caveats!

So what's the point of this post? Well, I wanted to show how indexing for one very simple query is pretty straightforward, but as soon as the number of different queries grows, and the query predicates get more complicated, indexing becomes more complex. You really have to know your workload and your data to know which columns are used, in what combinations, and how often - and then it helps to know how indexes are costed and used so that you can make intelligent choices about which indexes to define.

This thought-exercise has really shown me that I didn't know how much I don't know about indexes - I know precisely how they work at the Storage Engine level but not too much about how they're used by the Query Processor. I have new-found respect for Kimberly's indexing expertise. Luckily she's teaching a class at Microsoft called Indexing For Performance next week - I think I'll attend :-)

It seems like all I've been talking about on the forums the last couple of days is the correct order of operations in a maintenance plan. The biggest confusion seems to be about the effect of rebuilding an index on statistics, with some incorrect advice being given out on what to do.

Rebuilding an index will update statistics with the equivalent of a full scan - doesn't matter whether you use DBCC DBREINDEX or ALTER INDEX ... REBUILD, the effect is the same. It can do this because it sees a complete picture of the index while its doing the rebuild.

Reorganizing an index (using the old DBCC INDEXDEFRAG I wrote, or the new ALTER INDEX ... REORGANIZE) will NOT update statistics at all, because it only sees a few pages of the index at a time.

The problem I've been seeing is people rebuilding indexes and then updating statistics. So why is this a problem? Well, it depends :-)

  • If your default for updating statistics is to do a sampled scan, then having rebuild update the statistics with a full scan (as a side effect), and then proceeding to manually update them again with the default sampled scan, means that after both operations you're left with sampled statistics. You've wasted resources doing the sampled scan AND you've lost the 'free' full-scan statistics that the index rebuild did for you.
  • If your default is to do a full scan, then you don't lose out on having the best statistics, but you do waste resources by unnecessarily updating statistics a second time.

So what's the solution?

The simple answer is not to update statistics on indexes that have just been rebuilt.

The more complicated answer is to:

  1. Have a list of indexes (F) that you know will cause workload performance problems if they get fragmented
  2. Have a list of indexes (S) that you know will cause workload performance problems if the statistics don't get regularly updated
  3. Evaluate fragmentation for list F, and choose to reorganize, rebuild, or do nothing
  4. For all indexes in list S that were not rebuilt in step 3, update statistics

Hope this helps.

One of the cool features of SQL Server 2008 for developers is spatial data support. There have been some great posts recently about using spatial (see Simon Sabin's blog here), which is all developer stuff - but what I'm interested in are the implications of spatial support for DBAs, and they are focused on spatial indexes

Note: all images in this post are taken from November CTP Books Online

There are two kinds of spatial data that 2008 supports - planar (i.e. points, lines, polygons on a single 2-D plane) and geodetic (i.e. points, lines, polygons on a geodetic ellipsoid - for example, the Earth). These are presented in SQL as geometry and geography data respectively. A common operation that's performed on spatial data is comparing two spatial values to see if they intersect at all. Now, this is a complicated calculation, which gets more computationally expensive as the complexity of the spatial values increase. Given a problem of 'which spatial values in this table does this spatial value X intersect with', it would be great to have some way of quickly pruning out spatial values in the table that cannot possibly intersect with X, and so avoid doing the expensive calculation for them. Enter spatial indexes.

Here's the basic idea behind a spatial index:

  • A plane is broken up into a grid of cells.
  • Each spatial value is evaluated to see which cells in the grid it intersects with
  • The list of cells is stored along with the primary key of the table row that the spatial value is part of
  • Comparing two spatial values for intersection is a matter of comparing the list of grid cells - if there are no matching cells, the spatial values do not intersect, and there's no need to do the expensive intersection calculation

In practice its a bit more complicated. For planar data (i.e. the geometry data type), you need to define a bounding box (i.e. 4 corner points that define a rectangle of space in which you're interested on the 2-D plane). That bounding box on the plane will be broken down into a grid of cells. The top-level grid can be up to 16x16, giving 256 cells. The next level of granularity breaks each of those top-level grid cells into a further grid, again up to 16x16. So now there could be (16x16) x (16x16) cells in the grid - or 65536 cells. This obviously allows a more exact description of a spatial value in the list of cells. And so on and so on. There are actually 4 levels of grid that the bounding box is broken up into - and each can be 16x16, for a possible total of 168 or 4 billion cells. The picture below illustrates this with a grid size of 4x4 at each level.

The bounding box and the size of the grid at each level are specified when the spatial index is created, as well as the maximum number of matching grid cells to store in the spatial index per spatial value - to a max of 8192. Once the bounding box has been decomposed into the various levels of grid, each value in the spatial column is evaluated against the grid. The value is first decomposed against the first level grid. If the number of cells it matches is less than the max per spatial value, the decomposition then moves to the second level grid. This decomposition continues until the maximum number of matching grid cells is reached. If the max is reached while processing a deeper level for a cell, (e.g. in the middle of processing the 2nd level grid of 4x4 for cell #13 in the 1st level), the deeper level matches are thrown away and only the coarser granularity matching cell is stored (e.g. continuing that example, the 2nd level grid matches are discarded and only cell #13 in the 1st level will be stored). The picture below helps to illustrate this.

So, each geometric spatial value is approximated in the spatial index by a list of matching cells in the defined bounding box. As there is a limit to the number of matching cells that can be stored in the approximation, it is an optimistic representation. This means that if two values are compared using the approximations, there will be no false negatives, as the approximations map a larger space than the actual spatial values. There can, however, be false positives. A false (or real) positive means the spatial values need to then be compared using the complex, computationally expensive intersection algorithm using the actual spatial values. So again, the spatial index serves as a way of pruning out the need to run the expensive algorithm.

The algorithm is very similar for geodetic data (i.e. the geography data type), however there's no bounding box. Instead, the entire geodetic ellipsoid is projected onto a 2-D plane and then the grid decomposition algorithm is applied to that plane in exactly the same way as for planar data. The picture below describes how the projection is done.

You may have already realized that the effectiveness of the spatial index in pruning is directly proportional to how exactly the approximations in the index actually describe the spatial values. In other words, the higher the number of grid cells at each level, and the higher the number of grid cell matches that are stored per spatial value in the index, the better the index is at pruning. More exact approximations require storing more matching grid cells at deeper granularities - i.e. taking MORE SPACE. Creating a more exact spatial index takes more space.

With all that in mind, the interesting thing for DBAs here is that there's a trade-off between CPU use to do the real intersection algorithm and spatial index size to use in pruning calls to the algorithm. It's too early to know what best practices there are - but I'll blog them as I here about them.

While we were in Barcelona we sat down with Richard Campbell and Greg Hughes from RunAs Radio to record a 1/2 hour interview on SQL Server 2008. We touch on a ton of different features (look at the number of Categories I've tagged this with!) and have a bunch of laughs along the way - check it out here.

PS There's been a ton of interest in the slide deck idea I had so we'll be going ahead with that. Look for an announcement sometime in the first few months of next year about how to get them. Thanks to everyone that replied!

I’m in the middle of a flight from Washington D.C. to Zurich on the way to Barcelona for TechEd IT Forum and I can’t sleep – Kimberly’s out like a light so what else is there to do except write another blog post? :-) OK - actually posting this from Barcelona on Tuesday before our first of 12 sessions here!

In the Database Maintenance workshop we did at SQL Connections last week I promised to blog a script I used to show how data file shrink operations cause massive fragmentation of indexes. The reason is that data file shrink starts at the end of the data file and moves a single page at a time to a free space below the shrink threshold. In the process of doing so, it perfectly reverses the physical order of the pages comprising the leaf level of an index – thus perfectly fragmenting it!!

Let’s try out my simple script that demonstrates this. First thing I’m going to do is create a new database and create a 10MB ‘filler’ table, which I’m going to then drop later to create a space that shrink can use.

USE MASTER;

GO

 

IF DATABASEPROPERTYEX ('shrinktest', 'Version') > 0

      DROP DATABASE shrinktest;

 

CREATE DATABASE shrinktest;

GO

USE shrinktest;

GO

 

SET NOCOUNT ON;

GO

 

-- Create and fill the filler table

CREATE TABLE filler (c1 INT IDENTITY, c2 VARCHAR(8000))

GO

DECLARE @a INT;

SELECT @a = 1;

WHILE (@a < 1280) -- insert 10MB

BEGIN

      INSERT INTO filler VALUES (REPLICATE ('a', 5000));

      SELECT @a = @a + 1;

END;

GO

Next I’ll create the ‘production’ table that we’d really like to keep in optimal shape for performance.

-- Create and fill the production table

CREATE TABLE production (c1 INT IDENTITY, c2 VARCHAR (8000));

CREATE CLUSTERED INDEX prod_cl ON production (c1);

GO

DECLARE @a INT;

SELECT @a = 1;

WHILE (@a < 1280) -- insert 10MB

BEGIN

      INSERT INTO production VALUES (REPLICATE ('a', 5000));

      SELECT @a = @a + 1;

END;

GO 

Now I’ll use the sys.dm_db_index_physical_stats DMV to check the fragmentation of the production table’s clustered index – it should be almost zero:

-- check the fragmentation of the production table

SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (

      DB_ID ('shrinktest'), OBJECT_ID ('production'), 1, NULL, 'LIMITED');

GO

avg_fragmentation_in_percent fragment_count

---------------------------- --------------------

0.390930414386239            6

This is what I expected. Now I’m going to drop the filler table, run a shrink operation and then check the fragmentation again:

-- drop the filler table and shrink the database

DROP TABLE filler;

GO

 

-- shrink the database

DBCC SHRINKDATABASE (shrinktest);

GO

 

-- check the index fragmentation again

SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (

      DB_ID ('shrinktest'), OBJECT_ID ('production'), 1, NULL, 'LIMITED');

GO

avg_fragmentation_in_percent fragment_count

---------------------------- --------------------

99.7654417513683             1277

Wow! The index went from almost 0% fragmented to almost 100% fragmented – the shrink operation totally reversed the physical ordering of the leaf level of the clustered index – nasty.

One of the common maintenance operations I see at customer sites is to run a database shrink at some interval, and I always advise against it – now you can see why. Running a regular shrink operation can cause horrible fragmentation problems. The worst problems I see are those customers with maintenance plans that rebuild all indexes and then run a shrink to remove the extra space necessary for the index rebuilds – completely undoing the effects of the index rebuild!

One other common thing I see is to have auto-shrink set on for one or databases. This is bad for several reasons:

  • Shrink causes index fragmentation, as I’ve just demonstrated above.
  • You can't control when it kicks in. Although it doesn't have any effect like long-term blocking, it does take up a lot of resources, both IO and CPU. It also moves a lot of data through the buffer pool and so can cause hot pages to be pushed out to disk, slowing things down further. If the server is already pushing the limits of the IO subsystem, running shrink may push it over, causing long disk queue lengths and possibly IO timeouts.
  • You're likely to get into a death-spiral of auto-grow then auto-shrink then auto-grow then auto-shrink... (in my experience, if someone is using auto-shrink, they're most likely using and relying on auto-grow too). An active database usually requires free space for normal operations - so if you take that free space away then the database just has to grow again. This is bad for several reasons:
    • Repeatedly shrinking and growing the data files will cause file-system level fragmentation, which can slow down performance
    • It wastes a huge amount of resources, basically running the shrink algorithm for no reason
    • Auto-grow itself can be bad, especially if you're using SQL Server 2000 (or don't have Instant File Initialization turned on - see this post from Kimberly's blog) where all allocations to the file being grown are blocked while the new portion of the file is being zero-initialized.

Bottom-line: auto-shrink should *NEVER* be turned on…

A couple more questions from the last couple of classes.

Q1) Why doesn't performing an index rebuild alter the fragmentation?

A1) Here are the possibilities - all of which I've seen happen:

  • There isn't an index - either DBCC DBREINDEX or ALTER INDEX ... REBUILD are being run on a table that only has a heap, and so the (extent) fragmentation of the heap isn't changing because there's no way to rebuild a heap (except by the heavily NOT recommended method of creating and then dropping a clustered index).
  • The index is too small. An index with only a handful of pages may not show any changes in fragmentation because all the pages are single, mixed pages (see my previous post on extent types for more info) and so rebuilding the index does nothing.
  • The workload and schema are such that by the time the rebuild has finished and the fragmentation calculation has been done again, the index is already getting fragmented again.
  • The Extent Scan Fragmentation result from DBCC SHOWCONTIG is being used to gauge fragmentation for an index stored in a filegroup with multiple files. The Extent Scan Fragmentation in DBCC SHOWCONTIG does not cope with multiple files (as is documented in Books Online) and so the value may even go UP in some cases!

Q2) What operations take advantage of minimal-logging when the recovery mode is BULK_LOGGED?

A2) The list is very small - 4 four classes of operations:

  • Index builds, rebuilds, or drop of a clustered index (NOT index defrags with DBCC INDEXDEFRAG or ALTER INDEX ... REORGANIZE - this is a common misconception).
  • Bulk load operations (i.e. BCP, INSERT ... SELECT * FROM OPENROWSET (BULK...), and BULK INSERT).
  • Insert or appends of LOB data (either using WRITETEXT/UPDATETEXT for TEXT/NTEXT/IMAGE data types, or UPDATE with a .WRITE clause).
  • SELECT INTO operations on permanent tables.

For these operations, only the allocations are logged in the transaction log. Any extents that are allocated and changed through a minimally-logged operation are marked in the ML bitmaps (one for every 4GB of each file) and then the next transaction log backup will also read all those extents and include them in the backup.

This is a quick answer to a question I was sent today by someone who'd read Kimberly's partitioning whitepaper - Partitioned Tables and Indexes in SQL Server 2005 - and is implementing a "sliding-window" scenario. (This is a mechanism to allow fast insertion and deletion of significant portions of data into/from a partitioned production table. Insertion is done by taking a table and making it a new partition of the production table - called switching-in. Deletion is done by removing a partition from the production table and making it into a stand-alone table - called switching-out.)

The question is - what indexes are required on the staging table to prevent the ALTER TABLE ... SWITCH PARTITION statement from failing with a message like that below:

Msg 4947, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. There is no identical index in source table 'PartitionTest.dbo.StagingTable' for the index 'NC_Birthday' in target table 'PartitionTest.dbo.ProductionTable'.

The answer is that the staging table has to have the exact same indexes - clustered and non-clustered - as the production table. I asked Kimberly if it has to have the same constraints too - the answer is yes, plus the staging table has to have a trusted constraint on it such that SQL Server can tell (without checking all the data in the staging table) that all the data satisfies the partitioning function for the partition that you're switching-in (i.e. the partition that the staging table will become in the production table). If it doesn't, the switching-in will fail with the following error:

Msg 4982, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. Check constraints of source table 'PartitionTest.dbo.StagingTable' allow values that are not allowed by range defined by partition 4 on target table 'PartitionTest.dbo.ProductionTable'.

One thing that confuses people is that SQL Server does not create the target table for you when doing a switch-out of a partition. The target table has to exist and have the exact same schema as the production table. Also, it has to be completely empty - otherwise you'll get an error like:

Msg 4905, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. The target table 'PartitionTest.dbo.StagingTable' must be empty.

The must-be-empty requirement also holds for switching-in operations - the partition that will be created has to be empty otherwise a similar 4904 error results.

Hope this helps!

Two of the cool features in SQL Server 2005 are CROSS APPLY and DMVs (Dynamic Management Views). Now, far be it for me to get my hands dirty explaining developer stuff like CROSS APPLY :-) but I was having a discussion with Colin Leversuch-Roberts in the UK about the composability limitations of the sys.dm_db_index_physical_stats DMV. (Btw - you should check out Colin's blog post series on Analysing Indexes - lots of useful stuff).

So CROSS APPLY lets you do join-like functionality with table-valued functions that take parameters - which you can't do using JOIN. This works for most of the DMVs, but some of them are written to an older internal implementation that doesn't support CROSS APPLY, and sys.dm_db_index_physical_stats is one of them. If you try it you'll get an error like:

Msg 413, Level 16, State 1, Line 26

Correlated parameters or sub-queries are not supported by the inline function "master.sys.dm_db_index_physical_stats".

Fortunately there's a way around this - wrap the DMV in an artificial TVF of your own, and then CROSS APPLY to that. Here's an example:

 

CREATE FUNCTION my_index_physical_stats (

@database_id INT,

@object_id INT,

@index_id INT,

@partition_number INT,

@mode INT)

RETURNS @result TABLE (

database_id SMALLINT NULL, object_id INT NULL, index_id INT NULL, partition_number INT NULL,

index_type_desc NVARCHAR(60) NULL, alloc_unit_type_desc NVARCHAR (60) NULL, index_depth TINYINT NULL,

index_level TINYINT NULL, avg_fragmentation_in_percent FLOAT NULL, fragment_count BIGINT NULL,

avg_fragment_size_in_pages FLOAT NULL, page_count BIGINT NULL, avg_page_space_used_in_percent FLOAT NULL,

record_count BIGINT NULL, ghost_record_count BIGINT NULL, version_ghost_record_count BIGINT NULL,

min_record_size_in_bytes INT NULL, max_record_size_in_bytes INT NULL, avg_record_size_in_bytes FLOAT NULL,

forwarded_record_count BIGINT NULL)

BEGIN

INSERT INTO @result SELECT * FROM

sys.dm_db_index_physical_stats (@database_id, @object_id, @index_id, @partition_number, @mode)

RETURN

END;

GO

(This is an updated repost from earlier this year on my old blog)

Whenever I’m discussing index maintenance, and specifically fragmentation, I always make a point of saying ‘Make sure the index is being used before doing anything about fragmentation’. If an index isn’t being used very much, but has very low page density (lots of free space in the index pages), then it will be occupying a lot more disk space than it could do and it may be worth compacting (with a rebuild or a defrag) to get that disk space back. However, usually there’s not much point spending resources to remove any kind of fragmentation when an index isn’t being used. This is especially true of those people who rebuild all indexes every night or every week.

 

You could even go so far as to say if a non-clustered index isn’t being used, why is it there at all? Extra non-clustered indexes drag down performance in a number of ways. Consider a non-clustered index called IX_MyNCIndex on the table MyTable:

  • Any time a record is inserted into MyTable, a matching record is inserted into IX_MyNCIndex. This is a bunch of extra IOs, extra log records, plus maybe even a page-split.
  • Any time a record is deleted from MyTable, the matching record in IX_MyNCIndex must be deleted. Extra IOs again and log records again.
  • Any time a record in MyTable is updated:
    • If MyTable has a clustered index, and the clustered index key value changes, then the matching record in IX_MyNCIndex must be updated. Extra IOs and log records again.
    • If any of the non-clustered index key values changes, or any of the INCLUDEd column values changes, then the matching record in IX_MyNCIndex must be updated. Extra IOs and log records again.
  • If a clustered index is created on MyTable, then IX_MyNCIndex has to be rebuilt to include the logical RIDs rather than the physical heap RIDs (see this post for an explanation). Lot of extra IOs and log records again.

That’s a significant amount of extra IOs and log records to maintain each extraneous non-clustered index.

 

So, how can you tell if an index is being used?

  • In SQL Server 2000 there is no way to do it
  • In SQL Server 2005 there are a few different ways in SQL Server 2005 – the one I want to discuss in this post is the sys.dm_db_index_usage_stats DMV.

This DMV exposes the information that is tracked about index usage (as the name suggests). It does not generate any information itself; it just returns info from a cache inside SQL Server. This cache is empty when the server instance starts, and is not persisted across instance restarts. All cache entries for indexes in a database are removed when that database is closed. So, the cache tracks usage information about which indexes have been used since the database they are part of was last opened (either manually or as part of instance start-up).

 

This continues to confuse people so I'll call it out: if the output from the DMV does not have an entry for the index you're interested in, it has not been used since the last database startup.

 

The cache tracks the following info for each index (for user queries and system queries):

  • The number of times it was used in a seek operation (either looking up a single row, or doing a range scan) along with the time of the last seek.
  • The number of times it was used in a scan operation (e.g. a select * operation) along with the time of the last scan
  • The number of times it was used in a lookup operation (this means a bookmark lookup – where a non-clustered index does not fully cover a query and additional columns must be retrieved from the base table row) along with the time of the last lookup.
  • The number of times it was used in an update operation (this counts inserts, updates, and deletes) along with the time of the last update.

Let’s have a look at its use.

SELECT * FROM sys.dm_db_index_usage_stats;

GO

The output is too wide for a single image so I've split it in two (I won't post any more output from the DMV - I'll just talk about it):

indexusage11.jpg

indexusage21.jpg

Unless you've just re-started your instance, you'll see a bunch of output from this, representing all index activity since the instance/databases started. If you're interested in whether an index is being used, you can filter the output. Let's focus in on a particular table  - AdventureWorks.Person.Address.

SELECT * FROM sys.dm_db_index_usage_stats

WHERE database_id = DB_ID('AdventureWorks')

AND object_id = OBJECT_ID('AdventureWorks.Person.Address');

GO

You'll probably see nothing in the output, unless you've been playing around with that table. Let's force the clustered index on that table to be used, and look at the DMV output again.

SELECT * FROM AdventureWorks.Person.Address;

GO

SELECT * FROM sys.dm_db_index_usage_stats

WHERE database_id = DB_ID('AdventureWorks')

AND object_id = OBJECT_ID('AdventureWorks.Person.Address');

GO

Now there's a single row, showing a scan on the clustered index. Let's do something else.

SELECT StateProvinceID FROM AdventureWorks.Person.Address

WHERE StateProvinceID > 4 AND StateProvinceId < 15;

GO

SELECT * FROM sys.dm_db_index_usage_stats

WHERE database_id = DB_ID('AdventureWorks')

AND object_id = OBJECT_ID('AdventureWorks.Person.Address');

GO

And there's another row, showing a seek in one of the table's non-clustered indexes.

So, its easy to look at the index usage for particular tables and indexes. But how can you monitor this over time? This is easy too - let's see how. First we need to create our own table to store snapshots of the DMV output.

IF OBJECTPROPERTY (object_id (N'master.dbo.MyIndexUsageStats'), 'IsUserTable') = 1 DROP TABLE dbo.MyIndexUsageStats;

GO

SELECT GETDATE () AS ExecutionTime, * INTO master.dbo.MyIndexUsageStats

FROM sys.dm_db_index_usage_stats WHERE database_id=0;

GO

Next we need to take a baseline snapshot of the DMV output.

INSERT master.dbo.MyIndexUsageStats

SELECT getdate (), * FROM sys.dm_db_index_usage_stats;

GO

And now simulate a few operations and take another snapshot of the DMV:

SELECT * FROM AdventureWorks.Person.Address;

GO

SELECT * FROM AdventureWorks.Person.Address;

GO

SELECT StateProvinceID FROM AdventureWorks.Person.Address

WHERE StateProvinceID > 4 AND StateProvinceId < 15;

GO

INSERT master.dbo.MyIndexUsageStats

SELECT getdate (), * FROM sys.dm_db_index_usage_stats;

GO

And look at the filtered contents of our snapshot table:

SELECT * FROM master.dbo.MyIndexUsageStats

WHERE database_id = DB_ID('AdventureWorks')

AND object_id = OBJECT_ID('AdventureWorks.Person.Address');

GO

You should see four rows - two from the baseline snapshot and two from the final snapshot. If you ran just the statements above, you'll see that the user_scans count for the clustered index has increased by two, and the user_seeks count for the non-clustered index has increased by one.

So this is a pretty simple example of how you can track index usage. By putting something like this into a regularly run script you can tell which indexes aren't being used and could be candidates for less-regular index maintenance or removal altogether.

Let me know how you get on.

I'd like to kick off the Indexes From Every Angle series this evening by re-posting some articles from my old blog. Both of these topics continue to crop up on newsgroups and forums.

What happens to non-clustered indexes when changes are made to the underlying table? Are they always rebuilt or not? Well - it depends! (my favorite answer :-)).

Before we get into that discussion, I’ll give you a little background.

 

One way to describe a non-clustered index is whether it is a covering index or not. A covering index is one which has all the table columns necessary to satisfy a query, and so there is no need to go back to the underlying table (which I’ll call the base table) to fetch additional columns. Most non-clustered indexes are not covering indexes and so it is necessary for each non-clustered index record (even in covering indexes) to include a link back to the corresponding base table record. This link takes various forms, depending on the structure of the base table:

  • Heap: A heap does not impose any logical ordering on the records within it, so the only way to find a particular record within a heap is to do a full table scan or to know the physical location of the record (i.e. which record slot on which data page) – called the physical record ID or physical RID. Clearly doing a full table scan every time a query needs to go from a non-clustered index record back to a heap is not feasible, so each non-clustered index record stores the physical RID of the corresponding heap record.
  • Clustered index: A clustered index does impose a logical ordering on the records within it and the ordering exactly matches the definition of the clustered index keys – known as the cluster key. The fastest way to find a particular record in a clustered index is to use the cluster key to navigate through the clustered index b-tree to find the exact record. So, non-clustered indexes over a clustered index include the cluster key of the base table record in each non-clustered index record. This is known as the logical RID. There is a twist though, depending on whether the clustered index is unique or not.
    • Non-unique clustered index: Every record in a clustered index HAS to be unique, otherwise there would be no way to deterministically navigate to a particular record using the index b-tree. In a non-unique clustered index, SQL Server has to add a special column, called the uniquifier, to each record, so that if multiple records have the same cluster key values, the uniquifier column will be the tie-breaker. This uniquifier column is added as part of the cluster key, and so it is also present in all non-clustered index records as part of the logical RID.
    • Unique clustered index: Every record in a unique clustered index is already unique and so no extra column is required in the cluster key.

Ok - background out of the way. Now let's discuss what happens to non-clustered indexes when various actions are performed to the base table.

 

SQL Server 2000

  • Going from a heap to a clustered index: Changing the fundamental structure of a table from a heap to a clustered index means you're changing the way that records can be looked up in the table, from physical to logical. This means that all the physical RIDs that are stored in the non-clustered index records need to be changed to logical RIDs. So - in this case, all non-clustered indexes are rebuilt after the clustered index has been built.
  • Going from a clustered index to a heap: This is the opposite of the case above - you're changing the way records are looked up from logical to physical. This means that all the logical RIDs that are stored in the non-clustered index records need to be changed to physical RIDs. So - in this case, all non-clustered indexes are rebuilt after the clustered index has been dropped.
  • Rebuilding a unique clustered index: This operation has no effect on the cluster key, so there is no need for the non-clustered indexes to be rebuilt. Various early Service Packs of SQL Server 2000 had bugs that caused this behavior to be broken - and this is the cause of much of the confusion around this behavior.
  • Rebuilding a non-unique clustered index: In SQL Server 2000, when a non-unique clustered index (which contains uniquifier columns) is rebuilt, all the uniquifier values are regenerated. This essentially means that the cluster keys have changed and so all the non-clustered indexes are rebuilt after the clustered index rebuild has completed.
  • Changing the clustered index schema: This is simple - any change to the clustered index schema that changes the cluster keys will cause all the non-clustered indexes to be rebuilt.

 SQL Server 2005

  • Going from a heap to a clustered index: This is the same as SQL Server 2000 - all non-clustered indexes are rebuilt, regardless of whether the clustered index was created online or offline.
  • Going from a clustered index to a heap: Again, all non-clustered indexes are rebuilt, regardless of whether the clustered index is dropped online or offline.
  • Rebuilding a unique clustered index: This is the same as SQL Server 2000 - the cluster keys aren't changing and so the non-clustered indexes are not rebuilt.
  • Rebuilding a non-unique clustered index: Aha! This is different from SQL Server 2000. SQL Server 2005 will RE-USE the old uniquifier values so the cluster keys don't change. This means that non-clustered indexes are NOT rebuilt in this case - that's very cool!
  • Changing the clustered index schema:
    • Changing the cluster key: This behavior has to be the same as SQL Server 2000 - any change to the cluster keys forces all the non-clustered indexes to be rebuilt after the new clustered index has been created.
    • Changing anything else (e.g. partitioning it or moving it): This is one of the cases that confuses people in SQL Server 2005. Applying a partitioning scheme to a clustered index, or moving it to a different filegroup, doesn't change the cluster key at all - so there's no need to rebuild the non-clustered indexes.

So the rule of thumb to remember is: if the physical or logical RID contained in the non-clustered index records needs to change then all the non-clustered indexes need to be rebuilt. Its as simple as that.

 

Hopefully this will help to clear up some confusion - let me know of any questions!

Theme design by Nukeation based on Jelle Druyts