This is a question that came up today on Twitter, and is actually something I've been meaning to blog about.

One of the biggest space hogs in tempdb can be DBCC CHECKDB. It generates all kinds of information about what it's seeing in the database (called facts) and stores them in a giant worktable. A fact may be something like 'we read page F' or 'record X on page Y points to an off-row LOB column in record A of page B' or it could be something like an entire IAM page bitmap. It is usually the case that the amount of memory required for the worktable is more than is available to store it in memory and so the worktable spills out to tempdb.

CHECKDB needs to use this fact generation method because it doesn't read the data file pages in any kind of logical or depth-first order - it reads them in allocation order, which is the fastest way. In fact it spawns multiple threads and each thread reads a set of pages, which is why I/O performance is sucked down while CHECKDB is running - its special readahead drives the I/O subsystem as hard as it can. As each thread is generating all the facts, it hands them to the query processor which sorts them by a key CHECKDB defines (page ID, object ID, index ID etc) and inserts them into the worktable.

Once fact generation has finished, the query processor then gives the facts back to CHECKDB again so that it can match them up (e.g. page X points to page Y, so we better have seen page Y) - called the aggregation phase. If any mismatched or extra facts are found, that indicates a corruption and the relevant error is generated.

Here's a picture of the process:

Now - because CHECKDB can use up so much tempdb space, we put in a way to ask CHECKDB to estimate how much tempdb space will be required - it's called WITH ESTIMATEONLY. The output looks something like:

Estimated TEMPDB space needed for CHECKALLOC (KB)
-------------------------------------------------
56
(1 row(s) affected)

Estimated TEMPDB space needed for CHECKTABLES (KB)
--------------------------------------------------
3345
(1 row(s) affected)

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

This isn't the total amount of space required to check the entire database, because CHECKDB never checks the entire database in one go. To try to limit the amount of tempdb space required, it breaks the database down into batches. Batches are built by adding in more and more tables until one of the following limits is reached:

  • There are 512 or more indexes in the batch
  • The total estimate for facts for this batch is more than 32MB

The smallest possible batch is one table and all its indexes - so a very large table may easily blow the 32MB fact limit (or theoretically the 512 index limit).

The fact size estimation is calculated by looping through all partitions of all indexes3 for a table (remember a non-partitioned table or index is a special case of a single partition as far as SQL Server is concerned) and adding up:

  • Twice the sum of all pages allocated to the partition (HoBt, LOB, and SLOB)
  • Three times the number of HoBt pages in the clustered index
  • Two times the number of LOB columns in the table
  • Two times the number of tables rows, if a heap
  • Maximum row size times the number of HoBt pages

And these totals are multiplied by the sizes of the relevant facts.

The WITH ESTIMATEONLY option runs through all the batches and spits out the largest total fact estimate from all the batches. It's supposed to be a very conservative estimate, but certain pathological cases can trip it up as it's can't account for all possible schemas.

One more piece of knowledge I can safely page-out of my head now! :-)

PS Beware also that I've heard of several cases of a bug in SQL Server 2008 R2 where the output is incorrectly very low. The dev team is aware of this issue and are working on it.

I've recently found out that there's a bug some people are hitting when running DBCC CHECKDB on SQL Server 2005 - you get an error like:

Msg 831, Level 20, State 1, Line 1
Unable to deallocate a kept page.

This is a bug in the database snapshot code, and shows up sometimes when running DBCC CHECKDB because I changed DBCC CHECKDB to use a hidden, automatically-created database snapshot to get a transactionally consistent view of the database from SQL Server 2005 onwards.

If you're seeing this problem, you need to install SQL Server 2005 SP2 CU13, SQL Server 2005 SP3 CU4, or SQL Server 2008/2008 R2.

More details are in KB article 919142, which was only recently made public.

(Look in the Misconceptions blog category for the rest of the month's posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together: CommonSQLServerMyths.pdf (732.96 kb))

A short one today as I'm up to my eyeballs with client work and sorting through SQL Connections abstract submissions!

Myth #27: you can avoid having to run DBCC CHECKDB (or equivalent) by using BACKUP ... WITH CHECKSUM

FALSE

On first glance, this seems like it should be true as the WITH CHECKSUM option will cause all existing page checksums on allocated pages to be checked. But it's not and here's why:

Firstly, on a database that's been upgraded from SQL Server 2000 or earlier, page checksums must be enabled. After they're enabled, not all pages in the database will have page checksums on them - and an I/O subsystem does not distinguish between pages with and without page checksums when causing corruption. So if all you do is use BACKUP ... WITH CHECKSUM, there may be corruption that you won't find until it's too late... (I'll just leave the 'until it's too late' hanging there... you can imagine scary consequences for yourselves :-)

Secondly, you might only take a full database backup once per month, which isn't a frequent enough consistency check for my liking (I recommend at least once per week). Even with a weekly differential backup, that's not going to check all allocated pages in the database - only those in extents that it backs up (those extents that changed since the last full backup).

Lastly, and most insidiously, relying solely on BACKUP ... WITH CHECKSUM leaves you susceptible to in-memory corruptions. If a bad memory chip, badly-written XP, or other rogue Windows process corrupts a SQL Server data file page in memory, and then it gets written to disk, you've got a corrupt page with a valid checksum - and nothing will catch that except DBCC CHECKDB.

Bottom line - you can't avoid running consistency checks. If you're having trouble, take a look at my old blog post CHECKDB From Every Angle: Consistency Checking Options for a VLDB.

And while we're on the subject, check out this post: Search Engine Q&A #26: Myths around causing corruption.

(Look in the Misconceptions blog category for the rest of the month's posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together: CommonSQLServerMyths.pdf (732.96 kb))

It's the last day of our class here in Boston and I'm teaching all day today, so I need to bang out my Debunk-A-Myth-A-Day blog post before heading down from our hotel room. Incidentally, we've had fellow MVP Aaron Bertrand (Twitter|blog) in the class this week. He's a fine, very knowledgeable fellow and has added a lot to the discussions from his experiences and earlier in the week he wrote a blog post describing his views on the class, and how hard it can be to take a week off production work.

But I digress (as usual).

Myth #2: DBCC CHECKDB causes blocking because it takes locks by default.

FALSE

In 7.0 and before, DBCC CHECKDB (and the rest of the family of consistency checking commands) were a nasty mess of nested loop C code that took table locks (and the nested loops made the algorithm essentially order-n-squared, for the programmers amongst you). This was not good, and so...

In 2000, a guy called Steve Lindell (who's still on the SQL team) rewrote DBCC CHECKDB to get a consistent view of the database using transaction log analysis. Essentially DBCC CHECKDB would prevent log truncation and then at the end of reading through the inconsistent (because of concurrent user transactions) database, run crash recovery on the transaction log inside itself. Basically, there was a brand new reproduction of the recovery code, but inside DBCC CHECKDB. I helped write a bunch of the log analysis code - tortuous, but fun. No, more tortuous. And there were some little problems with it - like the possibility of false failures... "if it gave errors, run it again and see if you get the same errors". Occasionally it would take table SCH_S locks (schema-stability locks) that would only block table scans and table schema modifications. The logging code was overall not good, and so...

In 2005, a guy called Paul Randal rewrote DBCC CHECKDB again to use database snapshots to get the consistent view of the database (as a database snapshot automatically provides a transactionally-consistent, point-in-time view of the database). No more nasty transaction log analysis code, not more locks *at all* - as accesses to the source database of a database snapshot never take locks - the buffer pool manages all the possibilities of race conditions.

You can read more on the internals of this stuff (both 2000 and 2005+) in the following posts:

Now, in all versions, if you use the WITH TABLOCK option, DBCC CHECKDB will take locks to guarantee a transactionally consistent view, but I don't recommend doing that. The first thing it will try to do is grab an exclusive database lock, which in the vast majority of cases will fail (it only waits 20 seconds) because of concurrent database connections.

In 2000, the fact that it prevented log truncation could cause some issues - like the log having to grow - and in 2005, there can be issues around the use of database snapshots (see links above).

But by default, DBCC CHECKDB has been blocking-free since SQL Server 2000.

The very worst piece of advice I ever saw on the Internet was in response to someone asking on a SQL newsgroup 'how can I create a corrupt database?'. The first response was:

When I want to corrupt a database to play with, I go into the data center, find a hard-drive and flick the power switch on-and-off quickly a few times.

This was closely followed by a bunch of replies (including mine) saying 'Noooooooooooo!!!!'

So, for a few years now I've provided a zip file with a bunch of pre-corrupted databases in - so you can test your consistency checking jobs, and experiment with corrupt databases without having to create them yourself by destroying hard-drives, or less daft means. You can find the main zip file at the top of our Past Events resources page, along with a link to a blog post which explains the various databases and demo scripts.

Several times I've been asked to provide SQL Server 2008 versions of the two databases which demo system table corruption. Well, that task has finally bubbled to my long-list of blog posts and website updates. There's now a second, smaller zip file which has 2008-only versions of the DemoFatalCorruption1 and DemoFatalCorruption2 databases, which showcase corruptions that prevent DBCC CHECKDB from running.

Let me know if you have any problems (playing with the backups Wink)

Enjoy!

Back in September I blogged about an old 2005 bug that prevented DBCC CHECKFILEGROUP checking the partitions of an object on the specified filegroup unless *all* partitions of the object are on the specified filegroup (not a smart way to set things up!). The bug was fixed ages ago in 2005 but has only just been fixed in 2008. The post which explains the bug in more detail is DBCC CHECKFILEGROUP bug on SQL Server 2008. This is an important feature to be able to split the consistency checks of a partitioned VLDB over a series of days - see CHECKDB From Every Angle: Consistency Checking Options for a VLDB for more details.

SQL Server 2008 SP1 Cumulative Update 6 (which you can get here) has the bug fix for 2008 finally. The KB article which describes the bug is 975991.

You're all running regular consistency checks, right?

A couple of years ago I blogged about 3rd-party file system filter drivers and how if they're not coded correctly to cope with NTFS alternate streams they will cause all kinds of weird corruption errors to be reported when DBCC CHECKDB is running. At the time I'd only seen the issue a couple of times so thought no more about it until a few weeks ago.

I was settling down to watch a movie with my laptop nearby when I got involved helping fellow-MVP Jonathan Kehayias (blog|twitter). As things progressed it dawned on me that his CHECKDBs were fine using regular database snapshots, but failed using the automatically-created snapshots, which use NTFS alternate streams (see this blog post where I explain about them). The errors produced were as below (borrowed from Jonathan's blog post):

2009-12-10 01:35:44.04 spid75      Error: 17053, Severity: 16, State: 1.
2009-12-10 01:35:44.04 spid75      E:\SQLData\MSSQL.1\DatabaseName.mdf:MSSQL_DBCC16: Operating system error 1784(The supplied user buffer is not valid for the requested operation.) encountered.
2009-12-10 01:35:44.07 spid98      DBCC CHECKDB (DatabaseName) WITH all_errormsgs, no_infomsgs, data_purity executed by Domain\UserName found 0 errors and repaired 0 errors. Elapsed time: 0 hours 0 minutes 9 seconds.
2009-12-10 01:35:44.45 spid98      Error: 926, Severity: 21, State: 6.
2009-12-10 01:35:44.45 spid98      Database 'DatabaseName' cannot be opened. It has been marked SUSPECT by recovery. See the SQL Server errorlog for more information. 

Lo-and-behold, it turned out to be a Diskeeper 10 filter driver that had a bug in it. You can read Jonathan's more in-depth recounting of the story on his blog here.

Diskeeper issued a fix a couple of weeks back, and Microsoft Product Support also put out a blog post with some details. Links are:

Hope this doesn't happen to you!

(First blog post from Poland! We're here to present at the Microsoft Poland Technology Summit.) 

The October edition of TechNet Magazine is available on the web now and has the latest installment of my regular SQL Q&A column.

This month's topics are:

  • Unexpected consistency checks?
  • How much space is each database using in the buffer pool 
  • Difference between SUSPECT and RECOVERY_PENDING 
  • Why database mirroring failure detection isn't instant

Check it out at http://technet.microsoft.com/en-us/magazine/2009.10.sqlqa.aspx.

Here's an interesting bug that surfaced recently, first reported by Bryan Smith on the MSDN disaster recovery/HA forum three weeks ago.

One of the mechanisms I advise for running consistency checks on VLDBs with multiple filegroups is to run successive DBCC CHECKFILEGROUP operations on the filegroups holding the partitions (see CHECKDB From Every Angle: Consistency Checking Options for a VLDB). Prior to SQL Server 2005 SP3, if a table or index is split into multiple partitions, then DBCC CHECKFILEGROUP would skip checking the entire table or index if it was partitioned over multiple filegroups. From SQL Server 2005 SP3 onwards, DBCC CHECKFILEGROUP will validate only the partitions of tables and indexes that reside on the filegroup being checked - rather than skipping the whole table or index - a big improvement.

Now it seems that SQL Server 2008 has a bug where it essentially has regressed back to the old behavior where DBCC CHECKFILEGROUP will skip a table or index if it's not wholely contained on the filegroup being checked.

Here's a script you can use to test this. It creates a partitioned table over multiple filegroups and then runs DBCC CHECKFILEGROUP on the first partition. I'll discuss the results after the script.

CREATE DATABASE DBMaint2008;
GO
USE DBMaint2008;
GO

ALTER DATABASE DBMaint2008 ADD FILEGROUP DataPartition1;
ALTER DATABASE DBMaint2008 ADD FILEGROUP DataPartition2;
ALTER DATABASE DBMaint2008 ADD FILEGROUP DataPartition3;
GO

ALTER DATABASE DBMaint2008 ADD FILE
   (NAME = N'DataPartition1', FILENAME = N'C:\SQLskills\DataPartition1.ndf',
   SIZE = 10, FILEGROWTH = 10)
TO FILEGROUP DataPartition1;
GO
ALTER DATABASE DBMaint2008 ADD FILE
   (NAME = N'DataPartition2', FILENAME = N'C:\SQLskills\DataPartition2.ndf',
   SIZE = 10, FILEGROWTH = 10)
TO FILEGROUP DataPartition2;
GO
ALTER DATABASE DBMaint2008 ADD FILE
   (NAME = N'DataPartition3', FILENAME = N'C:\SQLskills\DataPartition3.ndf',
   SIZE = 10, FILEGROWTH = 10)
TO FILEGROUP DataPartition3;
GO

CREATE PARTITION FUNCTION Partitions_PFN (INT)
AS RANGE RIGHT FOR VALUES (1000, 2000);
GO

CREATE PARTITION SCHEME [Partitions_PS]
AS PARTITION [Partitions_PFN] TO (DataPartition1, DataPartition2, DataPartition3);
GO

CREATE TABLE TestTable (c1 INT IDENTITY, c2 DATETIME DEFAULT GETDATE ());
CREATE UNIQUE CLUSTERED INDEX TestPK ON TestTable (c1) ON Partitions_PS (c1);
GO

SET NOCOUNT ON;
GO
INSERT INTO TestTable DEFAULT VALUES;
GO 3000

DBCC CHECKFILEGROUP (DataPartition1);
GO

On SQL Server 2005 SP3, the output from the final batch contains:

DBCC results for 'TestTable'.
Cannot process rowset ID 72057594038452224 of object "TestTable" (ID 2073058421), index "TestPK" (ID 1), because it resides on filegroup "DataPartition2" (ID 3), which was not checked. 
Cannot process rowset ID 72057594038517760 of object "TestTable" (ID 2073058421), index "TestPK" (ID 1), because it resides on filegroup "DataPartition3" (ID 4), which was not checked. 
There are 999 rows in 3 pages for object "TestTable".

This shows that it processed the 1000 rows in the first partition, but not the other two - as we'd expect. 

On SQL Server 2008, the output for TestTable is limited to:

Cannot process rowset ID 72057594038910976 of object "TestTable" (ID 2105058535), index "TestPK" (ID 1), because it resides on filegroup "DataPartition2" (ID 3), which was not checked.

And that's it - nothing about it processing any rows in partition 1. This shows that DBCC didn't process the first partition as we'd expect - this becomes even more apparent with very large amounts of data, where DBCC CHECKFILEGROUP will just complete almost instantly.

As Bryan says in his post, Microsoft has acknowledged this is a bug and it should hopefully be fixed for 2008 CU5. In the meantime, this is something you should be aware of as your tables may not be being checked properly.

Here's a question that came up recently: if I've upgraded a database from SQL 2000 or before, how can I tell if the data purity checks will be run or not?

As you may know, DBCC CHECKDB in 2005 onwards includes 'data purity' checks. These look for column values where the value is outside the valid range of values for the column's data type. For databases created on SQL 2005 onwards, these checks are always performed by DBCC CHECKDB and cannot be turned off. For databases created on earlier versions, it's a little more complicated.

In SQL Server versions prior to 2005, it was possible to import invalid data values into a database. These invalid values could cause query execution problems, or possibly even wrong results. In 2005, when the import 'holes' were closed, the data purity checks were added to DBCC CHECKDB, but not by default for upgraded databases. Because the possibility existed of upgraded database containing invalid values, the decision was made not to enable the data purity checks by default as this could lead people to suspect that the upgrade had caused corruptions that weren't there on 2000 or before.

So, if you have a database that was upgraded and you want to run the data purity checks, you need to use the WITH DATA_PURITY option for DBCC CHECKDB. This is all documented in Books Online and I put it into the Release Notes for SQL 2005 as well. Back to the question though - at what point do you NOT need to specify the option? Well, for an upgraded database, if the WITH DATA_PURITY option is used and no problems are found, a bit is irrevocably flipped in the boot page (see Search Engine Q&A #20: Boot pages, and boot page corruption) and from that point onwards the data purity checks will be performed on the database whenever DBCC CHECKDB runs.

The problem is though, how can you tell whether the bit has been flipped? You need to look at the boot page. The easiest way to do that is to use the DBCC DBINFO command (undocumented, but perfectly safe). It's the equivalent of using DBCC PAGE ('dbname', 1, 9, 3) to look at the boot page contents, as I explained in the blog post referenced in the previous paragraph.

There are two things you need to look for: what version of SQL Server created the database, and it the 'create version' is 2000 or lower, whether the special 'data purity' flag is set or not.

For a database created on SQL Server 2005:

DBCC TRACEON (3604);
GO
DBCC DBINFO ('master');
GO

DBINFO STRUCTURE:

DBINFO @0x66C8EF64

dbi_dbid = 1                         dbi_status = 65544                   dbi_nextid = 1984726123
dbi_dbname = master                  dbi_maxDbTimestamp = 16000           dbi_version = 611
dbi_createVersion = 611              dbi_ESVersion = 0                   
dbi_nextseqnum = 1900-01-01 00:00:00.000                                  dbi_crdate = 1900-01-01 00:00:00.000
dbi_filegeneration = 0              
dbi_checkptLSN

m_fSeqNo = 7612                      m_blockOffset = 224                  m_slotId = 1
dbi_RebuildLogs = 0                  dbi_dbccFlags = 0                   
dbi_dbccLastKnownGood = 2009-05-12 16:07:15.647                          
dbi_dbbackupLSN

<snip>

If the dbi_createVersion is 611 or higher, the database was created on SQL Server 2005+ and will always have data purity checks performed.

For an upgraded database (this is one of my pre-corrupted databases, you can get it from Conference corruption demo scripts and example corrupt databases):

DBCC TRACEON (3604);
GO
DBCC DBINFO ('DemoCorruptMetadata');
GO

DBINFO STRUCTURE:

DBINFO @0x6855EF64

dbi_dbid = 7                         dbi_status = 16                      dbi_nextid = 2089058478
dbi_dbname = DemoCorruptMetadata     dbi_maxDbTimestamp = 100             dbi_version = 611
dbi_createVersion = 539              dbi_ESVersion = 0                   
dbi_nextseqnum = 1900-01-01 00:00:00.000                                  dbi_crdate = 2009-06-17 15:14:49.490
dbi_filegeneration = 0              
dbi_checkptLSN

m_fSeqNo = 10                        m_blockOffset = 303                  m_slotId = 1
dbi_RebuildLogs = 0                  dbi_dbccFlags = 0                   
dbi_dbccLastKnownGood = 1900-01-01 00:00:00.000                          
dbi_dbbackupLSN

<snip>

This database has a dbi_createVersion lower than 611, so we need to look at the dbi_dbccFlags field. A value of 0 means that the data purity checks are not enabled by default. A value of 2 means they are enabled by default. You can easily check this out for your own databases.

Have fun!

A long time ago, in a blog post far, far away (well before I went offline in July) I kicked off a weekly survey about how often you run consistency checks (see here for the survey). Now I'm back online again, and so here are the results as of 8/3/09.

 

The results are actually surprising - I didn't expect so many people to be running consistency checks so frequently - 25% daily and another 37% weekly - very cool!

The number of people who don't know what consistency checks are may look a little depressing but I think it's probably a symptom of the number of people coming into the SQL world as involuntary DBAs. For those people, consistency checks are a way of proactively checking for database corruption, which is nearly always caused by the I/O subsystem. You can read a good introduction to consistency checking and other database maintenance topics in the article I wrote for TechNet Magazine in August 2008 - Top Tips for Effective Database Maintenance and bit more in the previous survey Importance of how you run consistency checks.

Basically you need to run regular consistency checks. There's a myth that you don't need to run consistency checks - this was perpetuated by various marketing folks when SQL 7.0 shipped, because SQL 6.5 used to cause allocation corruptions and the rewrite for 7.0 removed all the corruption problems. Now, of course there have been bugs in SQL Server that cause problems, but they account for a tiny fraction of the corruptions out there. Nearly all corruptions are caused by something going wrong in the I/O subsystem - and you can't predict when that will or won't happen. Jim Gray once likened the disk heads in a hard drive as akin to a 747 flying at 500mph 1/4 inch above the ground - scary stuff. You DO need to run consistency checks, because corruptions do happen. You can read more about the causes of corruptions in this blog post: Search Engine Q&A #26: Myths around causing corruption.

So - how often should you run consistency checks? Well, it depends. When I'm teaching I like to give two examples:

  • You have a dodgy I/O subsystem that is causing corruptions. You have no backups. You have a zero data-loss requirement. With no backups, your only way to get rid of corruptions is to run repair, which usually leads to data loss (see Misconceptions around database repair). In that case, you want to know about corruption as soon as possible to limit the amount of data loss from running repair. Contrived example for sure, but you'd be surprised what I've seen...
  • You have a rock-solid I/O subsystem, with all drivers and firmware up-to-date. You have a comprehensive backup strategy which you've extensively tested and you've confident you can recover from corruptions with zero-to-minimal downtime and data loss. In this case you may be comfortable running consistency checks once a week, say.

The overwhelming factor in how often to run consistency checks is you. How comfortable are you with the integrity of your I/O subsystem and your ability to recover from a corruption problem. If you have corruptions in your database today, you'll probably run DBCC CHECKDB on it every day for a month, right?!?

Part of any database maintenance of high-availability strategy is proactively making sure that corruption doesn't exist in the database - otherwise when you DO discover it, it may be more pervasive and it will take you longer to recover, and potentially with more downtime. Therefore the answer really is that you should run it as often as you can. Sometimes that can be difficult for very large databases (your definition is likely to be different than mine - I think 500GB and larger - depends on your hardware etc) as CHECKDB can take a long time to run (see CHECKDB From Every Angle: How long will CHECKDB take to run?), but there are ways you can effectively consistency check even a VLDB - see CHECKDB From Every Angle: Consistency Checking Options for a VLDB.

So - I'm very pleased to see so many people running regular consistency checks. However, maybe I'm just a pessimist but I did expect some of the less than optimal options to have higher numbers. Let's look at each in turn.

Never. The only time this is acceptable is if you don't know what consistency checks are and why you need them. Enough said. Even if you think you *can't* run consistency checks because it takes too many resources/time/etc, take a look at the blog post I mention above. There's always a way.

Only when corruption is detected some other way. By the time corruption has occurred and is detected through regular operations, it's likely to be more pervasive. Many databases do not see all their data read/updated as part of regular operations, which means that if a part of the database that's not used for a while gets corrupted, and you're not running consistency checks, you're not going to know. That means you're not going to know that your I/O subsystem is causing problems - and so more corruption will occur. You might think this isn't a big deal, but it can be depending on what part of the database (or maybe system databases) gets corrupted. Some things can be restored or repaired relatively simply - but what if, say, the boot page of master is corrupted? Your entire instance is down until you sort that out. It's always better to proactively discover corruption before it hits you in a way that really messes you up.

Only during an event like an upgrade or migration. In this category I see people running consistency checks AFTER doing an upgrade but not before. From release to release, DBCC CHECKDB has improved - especially from 2000 to 2005, where system catalog consistency checks were added. If you run consistency checks after an upgrade and find the database is corrupt, what do you do? Run repair? Go back to the earlier version and run repair? Restore your backups? The odds are that if you've got corruption, it's also in your backups too. Also, you may think that the upgrade itself caused the corruption - I've never seen this. It's always been that the corruption was there before and only got discovered after the upgrade. Always run consistency checks BEFORE doing an upgrade - to make sure you've got a chance of putting things right before (potentially irrevocably) moving to the newer version.

Only after performing a restore, or after a failover. This is, of course, a good practice, but shouldn't be the only time a consistency check is performed - for most of the same reasons as I've explained above. What do you do if the database is found to be corrupt after restoring backups? You're looking at data loss now.

To summarize: make sure you're running regular consistency checks - with the regularity in-line with your comfort zone and your maintenance/high-availability strategies.

Next up - the next survey!

In this week's survey I'd like to know how often you run consistency checks on your *most critical* production database, regardless of *how* you run them (we did that survery already - see Importance of how you run consistency checks). I'll report on the results around July 4th.

I'd only like you to answer for your *most critical* production database, as the frequency will probably vary wildly by database, server, production vs. dev/QA and so on. If everyone answers for their most critical database then we won't get skewed results.

*Please* no comments on this post - wait for the survey results post to avoid skewing the answers. I'm very interested in your reasoning, but not until everyone else responds.

As always, a big Thanks! for contributing to the blog/community by responding. Please shoot me an email (Contact button, bottom left of the blog), or ping me on Twitter (@PaulRandal) if you have an idea for a good survey.

PS Thanks to Pat Wright for suggesting this week's topic on Twitter.

There are a couple of issues that I've heard of in the last few weeks (one while onsite at a customer) and I think they might bite some people so I'd like to share them with you.

DBCC CHECKDB in 2005 onwards uses a hidden database snapshot to create the transactionally-consistent point-in-time view of the database that it requires to run the consistency checks. The hidden database snapshot is created as a set of NTFS alternate streams on the existing database data files. The alternative to having DBCC CHECKDB do this automatically is to manually create your own database snapshot and run DBCC CHECKDB against that - it's the same thing really.

More info on DBCC CHECKDB's use of snapshots, and potential problems can be found at:

The two issues that I've heard of both are around an inability of DBCC CHECKDB to create the hidden snapshot. In that case it is forced to use locks to stabilize the database, which usually fails because the exclusive database lock required for running the allocation checks portion cannot be acquired.

The first issue is around the permissions of the SQL Server service account. To be able to create the NTFS alternate streams, the service account must have the privileges to create files in the DATA directory of the SQL Server instance. This is a really difficult problem to track down as the actual NTFS failure message is not surfaced by the snapshot creation code.

The second issue is around the use of HP PolyServe. Upgrading to Matrix Server 3.6.1 disables support for alternate streams in the filesystem, effectively breaking DBCC CHECKDB. Here's the paragraph from the 3.6.1 upgrade guide (available here):

In previous releases, MxDB for SQL Server provided ADS support internally for use with various SQL Server features such as the DBCC CHECKDB command. This internal support has been removed in HP PolyServe Software for Microsoft SQL Server. Instead, after all servers are upgraded to 3.6.1, you will need to enable ADS support on all filesystems previously used with MxDB for SQL Server. During the upgrade to 3.6.1, SQL Server operations requiring ADS will fail, as the new ADS support feature is not yet in place on the nodes running 3.6.1. For continuity of SQL Server operations, it is important to upgrade all nodes to 3.6.1 and upgrade filesystems for ADS as quickly as possible.

Enabling support after the upgrade means running the PolyServe psfscheck command (which I believe just runs the NTFS fsutil command under the covers), which unfortunately means taking the volume momentarily offline.

Hope this helps!

This week's been a busy one on the forums and Twitter, with lots of interesting problems people are hitting. One of the things I've noticed is that there are lot's of misconceptions about running repair, so to round out Friday I'm going to run through a list of them for you. Here are the misconceptions, some of which I've had to argue several times with people and eventually resort to 'Look, I wrote the repair code, I'm sorry but you're wrong', which I hate doing:

  • Repair will not cause data loss. It depends. If you have to use REPAIR_ALLOW_DATA_LOSS, then you're going to lose data. That's why the option is named that - seriously.
  • Repair should be run as the default. No. Figure out what's wrong first before deciding what to do about it. If you've got a damaged 1TB clustered index, it's going to get rebuilt by repair. If you don't have an extra 1TB of disk space, it will fail, and then you're back to square one after hours of fruitless effort. You might be able to get away with doing something that doesn't involve taking the database (essentially) offline.
  • You can run repair without running DBCC CHECKDB. No. Repair is an option to one of the consistency-checking commands (DBCC CHECKALLOC, DBCC CHECKTABLE, or DBCC CHECKDB - note that DBCC CHECKFILEGROUP and DBCC CHECKCATALOG don't support repair).
  • As soon as you've run repair, everything's fine. No. You should always run DBCC CHECKDB a second time after running repair, to make sure the first repair fixed everything. Sometimes a corruption prevents some deeper checks being done, and when it's fixed, the next DBCC CHECKDB can run the deeper check and find more corruptions. I call this 'corruption masking'. The other reason is that repair probably just deleted some of your data. What effect is that going to have on the application? What if that's medical patient data? Or insurance records? Or back account details?
  • Repair can always fix everything. No. There are some things that DBCC CHECKDB cannot fix. See CHECKDB From Every Angle: Can CHECKDB repair everything? for the list.
  • Repair is safe to use on system databases. No. It cannot be used on master or tempdb because they cannot be put into single-user mode. You can do it on model, but it's unlikely to have an effect as there aren't any user tables in model (unless you create them) and system tables generally don't get repaired. You can run it on msdb, but it might have strange side-effects. See Is running repair on msdb safe?.
  • You can run repairs online. No. Repairs are always offline, in that the database must be in single-user mode.
  • REPAIR_REBUILD will fix everything. No. REPAIR_REBUILD only fixes problem in nonclustered indexes. In 2005 onwards, REPAIR_FAST does nothing at all.
  • Repairs on a repl Publisher propagate to the Subscribers. No. Anything done by repair is NOT marked for replication. You must reinitialize your Subscribers if you repair a Publisher.
  • Repairs always fix-up constraints. No. It has no idea that constraints exist. After repairing a database with constraints, you should run DBCC CHECKCONSTRAINT to make sure they're still valid.
  • Repairs try to save data. No. It doesn't go out of it's way to delete data, but it doesn't go out of it's way to save data in most cases. 'Delete what's broken and fix up all the links' is my sound-bite explanation of what repair does. Fix things as fast as possible and as provably correct as possible.
  • EMERGENCY mode repair will always work. No. I've seen cases where something broken in the file-system caused it to fail. Don't rely on repair.
  • You can undo repairs. It depends. If you started an explicit transaction, then you can roll everything back if you don't like what it did. People rarely do this though. EMERGENCY mode repair can never be rolled back.

Repair's a dangerous beast and should only be used as a last resort, or to bring a VVVLDB online again potentially much faster than a full restore, when a small amount of data loss can be tolerated. This is just a quick brain-dump of things people get wrong about repair.

Have a great weekend - tomrorow I'll report on last week's survey and kick off a new one. Cheers!

Hopefully all of you reading my blog already know to use the WITH ALL_ERRORMSGS option whenever you run DBCC CHECKDB (which is now the default in 2005 SP3, haven't checked if it made it into 2008 SP1). If you're doing some quick analysis of the results to see, for example, whether it's just nonclustered indexes that are corrupt then you need to see all the errors - as it may look like all the errors are (relatively) benign but there's one error you didn't see that says a clustered index data page is corrupt.

That option is all well and good *except* when you run the DBCC command through SSMS. It will only show the first 1000 errors (long story, but let's just says it involves me being able to create corruption cases during 2005 development that would generate so many errors it would cause SSMS to crash) and there's no way to get around this. So, if you have more than 1000 errors, as in a case I'm helping with today where there are 19000+ errors, you can't use SSMS and expect to get them all back. Problem is, you don't know that you've got that many errors until you run it - so if you used SSMS and find there are more than 1000 errors, AND you want to do some analysis of them, you'll need to go and run it again using a different connection to the server.

So - something I advise is to run DBCC from the command line. Either osql or sqlcmd will do the trick. For instance:

osql -E -Q"DBCC CHECKDB (master) WITH ALL_ERRORMSGS, NO_INFOMSGS" -oC:\outputfile.txt

This will run the DBCC command and store the output in a file for later analysis. You can substitute sqlcmd for osql in the command above if you want.

Recently there's been a spate of people noticing strange behavior from active transactions when DBCC CHECKDB (or any of the other DBCC consistency checking commands run).

For example, I've create a database call DbccTest with a single table. In one connection I do:

BEGIN TRAN
INSERT INTO t1 VALUES (1, 1);
GO

And in another connection I do:

DBCC CHECKDB (DbccTest) WITH ALL_ERRORMSGS, NO_INFOMSGS;
GO
 

Look what gets printed in the error log:

2009-05-04 16:03:21.55 spid54      1 transactions rolled back in database 'DbccTest' (14). This is an informational message only. No user action is required.
2009-05-04 16:03:21.91 spid54      DBCC CHECKDB (DbccTest) WITH all_errormsgs, no_infomsgs executed by ROADRUNNERPR\paul found 0 errors and repaired 0 errors. Elapsed time: 0 hours 0 minutes 0 seconds.

I've highlighted the weird part in bold - it looks like crash recovery ran on the DbccTest database. What's going on?

Well, crash recovery *DID* run on the DbccTest database - only it ran crash recovery into a hidden database snapshot. In the first part of the very long (well, 13 pages - not as long as the 70 page description in the 2008 internals book) post on how DBCC CHECKDB works (see CHECKDB From Every Angle: Complete description of all CHECKDB stages), I explain why DBCC CHECKDB needs a transactionally-consistent, point-in-time view of the database to run consistency checks on. In 2005 I changed the code to use a database snapshot - which by it's very nature provides a point-in-time, transactionally-consistent view of the database. When a database snapshot is created on a database, the transaction log of that database is examined, and crash-recovery is run on it, but *into the database snapshot* - the source database is totally unaffected. The message in the error log is from DBCC CHECKDB's hidden database snapshot starting up - but it's pretty misleading, I must admit.

One more twist to this behavior is that if there is a very long-running transaction in the database being consistency-checked, the database snapshot creation could take hours (because it has to wait for the long-running transaction to rollback, in the snapshot). If you think that the consistency check has hung, and try to kill it, you won't be able to and the SPID will appear as if it's in rollback. This is confusing because DBCC CHECKB doesn't do anything - so shouldn't have anything to rollback - but it's the database snapshot creation that's the problem. Once crash recovery has started on the database snapshot, it can't be interrupted - so you have to wait for it to finish (and the database snapshot to be created) before the DBCC command will actually stop, and remove the database snapshot. It's a weird side-effect, but a necessary one unfortunately. The recovery code could be changed to check for attention signals every so often I suppose, but I'm not holding-my-breath for that change.

Hope this helps.

Last week's survey was on what method you use to run consistency checks (see here for the survey). Here are the results as of 3/27/09 - again, very encouraging:

As you can see, 70% of respondents run DBCC CHECKDB on the production server, either with PHYSICAL_ONLY or without. For those running without it, be aware that using PHYSICAL_ONLY turns DBCC CHECKDB from a CPU-bound process into an I/O-bound process, and makes it run (potentially) magnitudes faster. It will still evaluate page checksums and torn-page protection, just skipping the higher-level logical checks.

I'm surprised to see so many people using a completely separate system to run consistency checks (restoring a backup and running a CHECKDB on the restored backup) - I've been a proponent of this method for a few years now, but I didn't think it had caught on so much (assuming a reasonably representative sample of readers responded to the survey). This method allows the entire consistency checking workload to be offloaded, and completely validates the backups used - but has the downside of requiring extra disk space on another server to restore the backup (I wish they'd build the system I got a patent for - to consistency check the database inside a backup without restoring it - see here).

Three of these choices I threw in to see if anyone was doing them so I could explain why they're not good methods to use.

  • "Don't run any consistency checks at all". I'm sure I don't need to labor the point with this one - you need to run consistency checks as well as having some kind of page protection turned on. If you don't proactively check for corruption, when it does occur it will likely be more widespread, and harder to recover within your data-loss and recovery time objectives than if you'd discovered it earlier. I've written lots about this in the Corruption and CHECKDB From Every Angle categories.
  • "Run DBCC CHECKDB on a database snapshot on a mirror database". I was asked about this several times while here at SQL Connections too. Database mirroring works by shipping transaction log records between the principal and mirror databases, NOT by shipping database pages - so if a page gets corrupted on the principal, the corruption will not be transferred to the mirror. This means that DBCC CHECKDB on a snapshot of the mirror does not reflect the consistency state of the principal database at all. Saying that, it is however, possible for corruption in the principal to affect the mirror. Imagine a column value is corrupted on disk by the principal's I/O subsystem. If that value is then read and used to calculate another value, which is then persisted - that calculated value is also 'corrupt' and will be reflected in the mirror database. Kind of an insidious, second-order corruption effect.
  • "Use BACKUP WITH CHECKSUM to validate page checksums, no DBCCs". Another option that sounds feasible, but in fact isn't for a couple of reasons. Firstly, BACKUP WITH CHECKSUM will stop when it finds a bad checksum, whereas DBCC CHECKDB will continue reading the rest of the database and tell you everything that's wrong with it. Secondly, BACKUP WITH CHECKSUM will only check those pages that HAVE page checkums, whereas DBCC CHECKDB will consistency check everything, regardless of whether a page has a checksum or not. For a database upgraded from an earlier version, where not every page will have a page checksum since you enabled them after upgrading, this is a critical point.

To summarize, this week's results were great, with almost 90% of respondents running some kind of consistency check. If you're not running any, for whatever reason, rethink your decision - there's always a way to run some kind of consistency checks and give yourself more peace of mind. See CHECKDB From Every Angle: Consistency Checking Options for a VLDB for more info.

Next post - this week's survey!

Just saw this on a forum - running REPAIR_ALLOW_DATA_LOSS on msdb to cope with corruption. Yes, this will work but it's certainly not safe and something I'd only recommend as a last resort.

Here's why: msdb stores your backup history and your Agent jobs. REPAIR_ALLOW_DATA_LOSS works by usually "delete what's broken and fix up all the links" (quoting myself). If you run repair on msdb, you need to pay close attention to what was deleted by repair, as it could lead to big problems later on. Some examples:

  1. What if repair deletes a record which just happened to store the Agent job that runs the transaction log backups for your main production database? Suddenly your log isn't being backed up and you don't know about it. The log starts to grow and eventually runs out of space. The database stops and your application is down until you figure out what's wrong.
  2. What if repair deletes a record which just happened to store the details of a log backup of the production database? Your have a disaster and run the script that looks through the backup history tables and auto-generates RESTORE statements to get you up and running again with up-to-the minute recovery. Because of the missing record, there's a missing RESTORE LOG statement in the middle of the restores of the log-backup-chain for the production database. So the restores fail, and you're down until you figure out what's wrong.

Bottom line, it's not safe to run repair on msdb - proceed with extreme caution. Not that repair won't work, or will cause corruption, but that it's far better to have backups of msdb that you can restore from.

A thread cropped up on SQLServerCentral involving IAM chain corruption (see Inside the Storage Engine: IAM pages, IAM chains, and allocation units for details of IAM chains). The error from DBCC CHECKDB was:

Server: Msg 2576, Level 16, State 1, Line 1
IAM page (0:0) is pointed to by the previous pointer of IAM page (1:394336) object ID 229575856 index ID 4 but was not detected in the scan.

and there was some discussion of what the error meant, and why the initial page ID in the error of (0:0) means something special. There was a further question of how the errors would differ if the IAM page header was partially zero'd out by an I/O subsystem error.

We're on-stage here at SQL Connections doing a pre-con and I'm not on until this afternoon so I can bang out a quick blog post! I'm going to create a small database and show the difference between the two cases. The error above was from a SQL 2000 database, but the behavior is the same on SQL 2005 and 2008. Here's the script to create the database.

CREATE DATABASE CorruptIAMEXample;
GO
USE CorruptIAMExample;
GO

CREATE TABLE test (c1 INT IDENTITY, c2 CHAR (8000) DEFAULT 'a');
GO
CREATE CLUSTERED INDEX test_cl on test (c1);
GO

SET NOCOUNT ON;
GO
INSERT INTO test DEFAULT VALUES;
GO 1000

After corrupting the database, I can reproduce the error above by running DBCC CHECKDB on it:

Msg 2576, Level 16, State 1, Line 1
The Index Allocation Map (IAM) page (0:0) is pointed to by the previous pointer of IAM page (1:153) in object ID 0, index ID -1, partition ID 0, alloc unit ID 72057594042384384 (type Unknown), but it was not detected in the scan.

The error is saying that the first IAM page in the IAM chain (page 1:153) does not have a reference from metadata. The sysallocunits system table contains a link to the first IAM page, the root page, and the first page. You can see these by querying the sys.system_internals_allocation_units catalog view, or you can see this blog post  - Inside The Storage Engine: sp_AllocationMetadata - putting undocumented system catalog views to work. I corrupted the sysallocunits table so that the link to the first IAM page of the test table was removed. The clue is the first page ID in the error - if it's a (0:0), that's the missing metadata case. For SQL 2000, this can happen if someone manually updates the sysindexes table.

Now what about the other case, where the IAM page itself has a corrupt header? I recreated the database again and corrupted the header of the first IAM page of the test table. Here's the output from DBCC CHECKDB:

Msg 2575, Level 16, State 1, Line 1
The Index Allocation Map (IAM) page (1:153) is pointed to by the next pointer of IAM page (0:0) in object ID 2073058421, index ID 1, partition ID 72057594038386688, alloc unit ID 72057594042384384 (type In-row data), but it was not detected in the scan.
Msg 7965, Level 16, State 2, Line 1
Table error: Could not check object ID 2073058421, index ID 1, partition ID 72057594038386688, alloc unit ID 72057594042384384 (type In-row data) due to invalid allocation (IAM) page(s).
Msg 8928, Level 16, State 6, Line 1
Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown): Page (1:153) could not be processed. See other errors for details.

and a whole bunch of

Msg 8905, Level 16, State 1, Line 1
Extent (1:216) in database ID 21 is marked allocated in the GAM, but no SGAM or IAM has allocated it.

errors. This is because the IAM page no longer looks like an IAM page and so DBCC CHECKDB can't process it as such. If I zero out the IAM page completely, DBCC CHECKDB returns basically the same errors as above. 

I've created a zipped backup of the SQL 2005 database in each case:

You'll need to do a RESTORE FILELISTONLY and then maybe move the files when you restore. Have fun!

 

Microsoft SQL Server 2008 Internals finally starts to roll off the presses at Microsoft Press today! For me this is the first time in print (in book form at least) and my consistency checking chapter (about 1/10th of the book) represents a complete brain-dump of everything I know about how DBCC CHECKDB works.

Kimberly also wrote a great chapter on index internals with some nice examples of how and why they're linked together the way they are. And of course the rest of the book is excellent too, mostly by Kalen but with additional guest chapters on the query optimizer by one of its architects, Conor Cunningham, and on tracing and auditing by fellow-MVP Adam Machanic - all good friends of ours.

Check it out on Amazon.com here.

The April edition of TechNet Magazine is available on the web now and has the latest installment of my regular SQL Q&A column.

This month's topics are:

  • Disappearing errors with DBCC CHECKDB
  • Provisioning tempdb when moving from 2000 to 2008
  • Does fillfactor prevent fragmentation and should it be set instance-wide
  • Avoiding FILESTREAM performance problems

Check it out at http://technet.microsoft.com/en-us/magazine/2009.04.sqlqa.aspx

As you may know, DBCC CHECKDB (and some of the other DBCC CHECK*) commands use an internal database snapshot to get a transactionally consistent view of the database (if you didn't, see my blog post CHECKDB From Every Angle: Complete description of all CHECKDB stages for an explanation). It's entirely possible that the database snapshot may grow and run out of space, causing the DBCC command to fail, when the database has a lot of updates running concurrently with the DBCC command (see the blog post Database snapshots - when things go wrong for some examples of this happening). There have also been some nasty bugs in SQL Server and in Windows that can cause the database snapshot to fail, and possibly cause SQL Server to hang when DBCC CHECKDB is running multi-threaded.

These bugs have been fixed and are available in a variety of ways. Suresh Kandoth, an old friend and Senior Escalation Engineer in PSS, has published a nice post that summarizes the various bugs, along with associated KB articles and release vehicle details. Check it out at Sparse File Errors: 1450 or 665 due to file fragmentation: Fixes and Workarounds.

In SQL Server 2000 and before, the symptoms of database corruption would occasionally manifest themselves as asserts, such as:

SQL Server Assertion: File: <recbase.cpp>, line=1378 Failed Assertion = 'm_offBeginVar < m_sizeRec'.

To reduce the number of assertions being fired by the SQL Engine, my team changed these asserts into real error messages for SQL Server 2005 onwards - either 5242 or 5243 - such as:

Msg 5242, Level 22, State 1, Line 1

An inconsistency was detected during an internal operation in database 'MyCorruptDatabase'(ID:12) on page (1:34923). Please contact technical support. Reference number 4.

Error 5243 is exactly the same except that the database couldn't be determined for some reason. Both of the errors above say that the offset of the variable length column offset array is beyond the end of the record.

I've noticed increasing confusion from these errors being reported - sometimes the worry is that maintenance jobs are causing them, or DBCC CHECKDB is causing them. This may appear to be so but is really just an artifact of the fact that DBCC CHECKDB reads all allocated pages in the database and may read a corrupt page that your normal queries just don't happen to cause to be read. The messages may disappear because other maintenance jobs cause indexes to be rebuilt and so corrupt pages may be deallocated from the database - meaning they won't be read by DBCC CHECKDB.

If you have either torn-page detection or page checksums enabled, you may not ever see these 5242 or 5243 errors as you'll likely see an 824 error instead. The 824 error is raised by the buffer pool when the page is read - before the page can be processed by the record-cracking code that would raise the 5242 or 5243 errors. Bottom line is that 5242 and 5243 says you've got corruption in the structure of a record - the reference numbers at the end of error say exactly what kind of corruption there is - for instance that the record size is invalid or the record type is wrong for the type of page it resides. If you see these errors, you need to go through the motions of recovering from corruption and figuring out what's wrong with your I/O subsystem.

Hope this helps.

This blog post describes the demo "2 - NC Indexes" from my Corruption Survival Techniques conference session from various conferences in 2008. The links to the scripts and databases to use are in this blog post.

The aim of this demo is to show that sometimes its just redundant data (i.e. nonclustered indexes) that get corrupted, and so you don't have to do anything that takes the actual data offline - like restoring from a full backup or running one of the repair options (both of which require the database to be in single-user mode).

Let's look at an example. Extract and restore the DemoNCIndex database, and the NCIndexCorruption.sql script. What do we get from running DBCC CHECKDB on the DemoNCIndex database (lines 39-42)?

DBCC CHECKDB (DemoNCIndex) WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO

Msg 8951, Level 16, State 1, Line 1
Table error: table 'Customers' (ID 453576654). Data row does not have a matching index row in the index 'CustomerName' (ID 2). Possible missing or invalid keys for the index row matching:
Msg 8955, Level 16, State 1, Line 1
Data row (1:45:28) identified by (CustomerID = 29) with index values 'LastName = 'Adams' and CustomerID = 29'.
Msg 8951, Level 16, State 1, Line 1
Table error: table 'Customers' (ID 453576654). Data row does not have a matching index row in the index 'CustomerName' (ID 2). Possible missing or invalid keys for the index row matching:
Msg 8955, Level 16, State 1, Line 1
Data row (1:180:164) identified by (CustomerID = 2118) with index values 'LastName = 'Adams' and CustomerID = 2118'.

<snip - removed for brevity>

Msg 8952, Level 16, State 1, Line 1
Table error: table 'Customers' (ID 453576654). Index row in index 'CustomerName' (ID 2) does not match any data row. Possible extra or invalid keys for:
Msg 8956, Level 16, State 1, Line 1
Index row (1:24482:16) with values (LastName = 'Andersen' and CustomerID = 18718) pointing to the data row identified by (CustomerID = 18718).
Msg 8952, Level 16, State 1, Line 1
Table error: table 'Customers' (ID 453576654). Index row in index 'CustomerName' (ID 2) does not match any data row. Possible extra or invalid keys for:
Msg 8956, Level 16, State 1, Line 1
Index row (1:24482:127) with values (LastName = 'Arthur' and CustomerID = 9758) pointing to the data row identified by (CustomerID = 9758).
CHECKDB found 0 allocation errors and 26 consistency errors in table 'Customers' (object ID 453576654).
CHECKDB found 0 allocation errors and 26 consistency errors in database 'DemoNCIndex'
repair_rebuild is the minimum repair level for the errors found by DBCC CHECKDB (DemoNCIndex).
 

Lots of errors. Now, in this example there are only 26 errors, but in cases where there are hundreds of errors it can be really hard to tell whether all the corruptions are in nonclustered indexes (i.e. indexes with IDs > 1). Luckily, there's an undocumented option to all the DBCC CHECK* commands - WITH TABLERESULTS. The option is undocumented because the output can change from release to release, but basically this converts the DBCC output into tabular form. Try running lines 48-50 in the script and you'll see what I mean. One of the columns in the output is IndexId - so you can easily see whether all the errors are in nonclustered indexes. In this case, they are, and all in one index of the Customers table.

You could run lines 55-57 of the script to prove to yourself that repairs can't be run online, and then realize that we can address the problem without having to run repair or restore. First off we need to figure out the name of the index to fix - index ID 2 of the Customers table. Lines 77-80 run sp_HelpIndex on the table (although I should really be using Kimberly's sp_HelpIndex2):

USE DemoNCIndex
GO
EXEC sp_HelpIndex 'Customers';
GO

index_name     index_description                                   index_keys
-------------- --------------------------------------------------- ------------
CustomerName   nonclustered located on PRIMARY                     LastName
CustomerPK     clustered, unique, primary key located on PRIMARY   CustomerID

The nonclustered index is called CustomerName. Plug in the index name to line 82, then try fixing the index by doing an online index rebuild, and run DBCC CHECKDB afterwards (lines 82-89). The corruption hasn't been fixed! Online index rebuild reads the old index to build the new one so the new index has the same missing rows as the old one. We need to do an offline index rebuild - with lines 110-115. After the last DBCC CHECKDB, the index is fixed up. Now, on SQL Server 2008, you may or may not get a query plan for the index rebuild that doesn't use the old index, because the query optimizer has some more plan choices available to it - so on SQL Server 2008 you may need to do an actual drop and create of the broken index (carefully, if its enforcing a constraint).

So - just because DBCC CHECKDB reports a ton of errors, that doesn't necessarily mean that the database needs to be taken (essentially) offline to repair it - check through the errors to see if it's just nonclustered indexes that are affected.

For those of you who couldn't make it to a conference this year where I presented my Corruption Survival Techniques session, the folks at TechEd EMEA have just posted an 80 minute long video of the presentation I did in Barcelona in early November. It walks through I/O errors, what CHECKDB does, how it works, how to run it, CHECKDB FAQ, how to interpret the output, choosing between repair and restore and has a bunch of demos of recovering from corruptions. Lots of fun stuff!

The video is available at http://www.msteched.com/2008/NorthAmerica/DAT375. The accompanying scripts and corrupt databases are all posted on our website - see this blog post for details.

Enjoy!

This blog post describes the demo "1 - Fatal Errors" from my Corruption Survival Techniques conference session from various conferences in 2008. The links to the scripts and databases to use are in this blog post.

The aim of this demo is to show that sometimes a database is so corrupt that DBCC CHECKDB just cannot run on it. In that case, there's no way to force DBCC CHECKDB to get past the fatal corruption and so there's no way to run a repair either - you're looking at restoring from a backup or at worst, extracting as much data as possible into a new database.

Let's look at a couple of examples. Extract and restore the DemoFatalCorruption1 and DemoFatalCorruption2 databases, and the FatalErrors.sql script. What do we get from running DBCC CHECKDB on DemoFatalCorruption1 (lines 47-49 in the script)?

DBCC CHECKDB (DemoFatalCorruption1)
WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO

Msg 8928, Level 16, State 6, Line 1
Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown): Page (1:71) could not be processed. See other errors for details.
CHECKDB found 1 allocation errors and 0 consistency errors not associated with any single object.
Msg 8906, Level 16, State 1, Line 1
Page (1:19) in database ID 8 is allocated in the SGAM (1:3) and PFS (1:1), but was not allocated in any IAM. PFS flags 'MIXED_EXT ALLOCATED 0_PCT_FULL'.
Msg 2575, Level 16, State 1, Line 1
The Index Allocation Map (IAM) page (1:71) is pointed to by the next pointer of IAM page (0:0) in object ID 15, index ID 1, partition ID 983040, alloc unit ID 983040 (type In-row data), but it was not detected in the scan.
Msg 7965, Level 16, State 2, Line 1
Table error: Could not check object ID 15, index ID 1, partition ID 983040, alloc unit ID 983040 (type In-row data) due to invalid allocation (IAM) page(s).
Msg 8906, Level 16, State 1, Line 1
Page (1:71) in database ID 8 is allocated in the SGAM (1:3) and PFS (1:1), but was not allocated in any IAM. PFS flags 'IAM_PG MIXED_EXT ALLOCATED 0_PCT_FULL'.
Msg 8939, Level 16, State 5, Line 1
Table error: Object ID 15, index ID 1, partition ID 983040, alloc unit ID 983040 (type In-row data), page (1:71). Test (m_headerVersion == HEADER_7_0) failed. Values are 0 and 1.
Msg 8939, Level 16, State 6, Line 1
Table error: Object ID 15, index ID 1, partition ID 983040, alloc unit ID 983040 (type In-row data), page (1:71). Test ((m_type >= DATA_PAGE && m_type <= UNDOFILE_HEADER_PAGE) || (m_type == UNKNOWN_PAGE && level == BASIC_HEADER)) failed. Values are 0 and 0.
Msg 8939, Level 16, State 5, Line 1
Table error: Object ID 15, index ID 1, partition ID 983040, alloc unit ID 983040 (type In-row data), page (1:71). Test (m_headerVersion == HEADER_7_0) failed. Values are 0 and 1.
Msg 8939, Level 16, State 6, Line 1
Table error: Object ID 15, index ID 1, partition ID 983040, alloc unit ID 983040 (type In-row data), page (1:71). Test ((m_type >= DATA_PAGE && m_type <= UNDOFILE_HEADER_PAGE) || (m_type == UNKNOWN_PAGE && level == BASIC_HEADER)) failed. Values are 0 and 0.
CHECKDB found 5 allocation errors and 3 consistency errors in table 'sys.syshobts' (object ID 15).
Msg 7995, Level 16, State 1, Line 1
Database 'DemoFatalCorruption1': consistency errors in system catalogs prevent further DBCC checkdb processing.
CHECKDB found 0 allocation errors and 1 consistency errors in table 'ALLOCATION' (object ID 99).
CHECKDB found 6 allocation errors and 4 consistency errors in database 'DemoFatalCorruption1'.

A bunch of errors that look like regular DBCC CHECKDB output - but if you look carefully near the end of the output you'll see error 7995 stating that the system catalogs are so corrupt that DBCC CHECKDB can't continue. Notice also that there's nothing at the end of the output stating what the minimum repair level is to fix the errors - because repair cannot be run on this database.

The second example is even worse (running lines 53-55 in the script):

DBCC CHECKDB (DemoFatalCorruption2)
WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO

Msg 211, Level 23, State 51, Line 1
Possible schema corruption. Run DBCC CHECKCATALOG.

In this case, the corruption is so bad that DBCC CHECKDB didn't even get a chance to terminate gracefully - the metadata subsystem in the Query Processor just blew away the whole command. Running DBCC CHECKCATALOG as the error message states doesn't do any better - it just prints the same error! (I didn't write that error message Wink)

So - just because DBCC CHECKDB completes, doesn't always mean it completes successfully. Make sure you always check the output.

This blog post explains the demo scripts and databases I've posted to cover all the Corruption Survival Techniques and DBCC CHECKDB sessions I've presented at conferences this year. There are two zip files you need to download: the example corrupt databases (36-MB zip) (and this one if you're on 2008) and the demo scripts. These are also both available through our past events pages.

The databases zip contains the following databases:

  • DemoDataPurity
    • 192-MB SQL Server 2005 database with a single 2570 (data purity) error
  • DemoFatalCorruption1
    • 1-MB SQL Server 2005 database with a corrupt system table (that allows CHECKDB to complete)
  • DemoFatalCorruption2
    • 1-MB SQL Server 2005 database with a corrupt system table (that terminates CHECKDB)
  • DemoNCIndex
    • 192-MB SQL Server 2005 database with a bunch of nonclustered index corruptions
  • DemoRestoreOrRepair
    • 1-MB SQL Server 2005 database with a page checksum failure (in fact a zero'd out page)
  • DemoCorruptMetadata
    • 1-MB SQL Server *2000* database with corrupt syscolumns table

The scripts zip contains the following directories:

  • 1 - Fatal Errors
    • This makes use of the DemoFatalCorruption1 and DemoFatalCorruption2 databases. The FatalErrors.sql script has the steps to follow and see this blog post for a complete walk-through.
  • 2 - NC Indexes
    • This makes use of the DemoNCIndex database. The NCIndexCorruption.sql script has the steps to follow and see this blog post for a complete walk-through.
  • 3 - Data Purity
    • This makes use of the DemoDataPurity database. The DataPurityCorruption.sql script has the steps to follow. I'll blog more details on this script sometime in the next week (and then update this post).
  • 4 - Metadata
    • This makes use of the DemoCorruptMetadata database. The CorruptMetadata.sql script has the steps to follow and see this blog post for a complete walk-through.
  • 5 - Restore or Repair
    • This makes use of the DemoRestoreOrRepair database. There's a setup script (01CreateRestoreOrRepair.sql) and two demo scripts - FixUsingPageRestore.sql and FixUsingRepair.sql. I'll blog more details on these scripts sometime in the next week (and then update this post).
  • 6 - Suspect Database
    • This demo doesn't come with a corrupt database - you create your own one. The SuspectDatabase.sql script has the steps to follow and see this blog post for a complete walk-through.

Enjoy!

(Posted with permission of the dev team)

Here's an interesting bug that people are hitting. I found out about this while here in Barcelona at TechEd when I got roped into a discussion with a couple of Microsoft colleagues. Their customer was seeing errors like the following:

Msg 8914, Level 16, State 1, Line 1
Incorrect PFS free space information for page (1:35244) in object ID 1683128146, index ID 1, partition ID 223091033422352, alloc unit ID 81405523118118176 (type LOB data). Expected value 0_PCT_FULL, actual value 100_PCT_FULL

This error says that the PFS page (see this blog post) has the wrong free-space tracking bits for a text page. In SQL Server 2000, the algorithm to keep track of free space in the PFS pages wasn't perfect so CHECKDB never reported these errors - it would silently fix them. In 2005 we fixed the algorithm (supposedly), so I turned on the reporting of these errors again in CHECKDB.

When I first saw the description from the customer, my first reaction was that it was an I/O subsystem problem causing corruption, but the customer has page checksums turned on, so a corruption would result in an 824 error before an 8914 error. My conclusion then was that there's a bug in the new free-space tracking algorithm. After checking with the dev team, it turns out my suspicions were correct - there is a bug in 2005 SP2. It's fixed in 2005 SP3 and in 2008, but you may see these 8914 errors if you're not running one of those.

Here are the technical details of the problem (slightly edited from the dev team explanation):

The issue was that when minimal logging for LOBs was used (under the SIMPLE recovery model, during BULK INSERT/BCP/Large insert with TABLOCK), extents are being pre-allocated and the pages were being marked 100% full in the PFS page when the pages were allocated. The idea was that all pages eventually will be filled up with LOB data, and by marking them full during allocation we avoid an extra update to the PFS pages when the data is actually put on the page.

Suppose now that 64 pages are pre-allocated, and only 40 or so pages are used and have rows on them. When the transaction commits, the Storage Engine is supposed to deallocate the 24 extra pages that were pre-allocated, and marked 100% full, even though they don’t have any rows on them yet. There was a bug where in a certain case the deallocation wouldn't happen, so you end up with empty pages that have a PFS state of 100%, but don’t have any rows on them.

In general, even though this is a bug in the code, functionality wise, there is nothing wrong with the database, besides a number of additional pages that are empty and allocated to the LOB tree, so as long as the message tells you that the page is supposed to be empty, but is actually marked 100%, nothing can really go wrong with that page. Unfortunately, DBCC will keep reporting these errors.

Although I haven't tried this, my guess is that you can get rid of the empty LOB pages using ALTER INDEX ... REORGANIZE WITH (LOB_COMPACTION = ON). So, if you see some of these errors with no other errors, you may have hit this bug and have nothing really to worry about.

 

Very exciting! The SQL Server 2008 Internals book we're working on with Kalen is now available to pre-order on Amazon, ahead of the planned publication date of February 18th next year.

I'm neck-deep in the Consistency Checking Internals chapter right now - its shaping up to be about 100 pages on its own!

There's no description yet on Amazon but at least you can pre-order - see here.

(Be sure to join our community to get our monthly newsletter with exclusive content, advance notice of classes with discount codes, and other SQL Server goodies!)  

(If you're having to read this because no-one is managing your SQL Server, consider contacting us to manage them for you - see here for details.)

This is one I've been trying to get to since I started blogging a couple of years ago: how to re-attach a detached suspect database. This is a pretty common scenario I see on the forums - a database goes suspect so the DBA tries to detach/attach, which fails. I wrote a demo for my corruption session at TechEd this year that shows how to create a suspect database with hex editor, then detaches it and shows how to re-attach and fix it. It's going to be a long blog post, but I've got everything wrapped up before we fly out to the UK tomorrow so I've got a bit of spare time.

Creating a Suspect Database

First off I'm going to create a simple database to use, called DemoSuspect with a table and some random data.

USE MASTER
GO
CREATE DATABASE DemoSuspect
GO

USE DemoSuspect;
GO

CREATE TABLE Employees (FirstName VARCHAR (20), LastName VARCHAR (20), YearlyBonus INT);
GO
INSERT INTO Employees VALUES ('Paul', 'Randal', 10000);
INSERT INTO Employees VALUES ('Kimberly', 'Tripp', 10000);
GO

Now I'll perform an update in an explicit transaction and force it to be written out to disk with a CHECKPOINT. I've accidentally deleted Kimberly's bonus!

-- Simulate an in-flight transaction
BEGIN TRAN;
UPDATE Employees SET YearlyBonus = 0 WHERE LastName = 'Tripp';
GO

CHECKPOINT;
GO

Then in another window, I'll simulate a crash using:

SHUTDOWN WITH NOWAIT;
GO

Now that SQL Server is shutdown, I'm going to simulate an I/O failure that corrupts the log file. I'm going to use a hex editor to do this - my editor of choice is the immensely popular and useful XVI32, written by Christian Maas. I opened the log file, filled the first section with zeroes, and then saved it again. See the screenshot below.

When I startup SQL Server again, it will try to run recovery on the DemoSuspect database and fail. This will put the database into the SUSPECT state.

So I restarted SQL Server, let's try getting in to the DemoSuspect database.

USE DemoSuspect;
GO

Msg 945, Level 14, State 2, Line 1
Database 'DemoSuspect' cannot be opened due to inaccessible files or insufficient memory or disk space. See the SQL Server errorlog for details.

Now let's check the database status:

SELECT DATABASEPROPERTYEX ('DemoSuspect', 'STATUS') AS 'Status';
GO

Status
--------

SUSPECT

Now at this point, the correct procedure is to restore from backups. If there are no backups available, then the next best thing is to get the database into EMERGENCY mode and extract as much data as possible, or run emergency-mode repair. However, I'm going to try the detach/attach route instead.

Detaching the Database

I'll try detaching the database using sp_detach_db:

EXEC sp_detach_db 'DemoSuspect';
GO

Msg 947, Level 16, State 1, Line 1
Error while closing database 'DemoSuspect'. Check for previous additional errors and retry the operation.

Hmm - did it work or didn't it?

SELECT * FROM sys.databases WHERE NAME = 'DemoSuspect';
GO

And there are no results, so the detach must have succeeded.

Note: from SQL Server 2008 you cannot detach a suspect database unless you jump through a few more hoops (e.g. put the database offline, then copy off the files, then drop the database).

Re-attaching the Database

Let's try the obvious sp_attach_db:

EXEC sp_attach_db @dbname = N'DemoSuspect', 
   
@filename1 = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\DemoSuspect.mdf', 
   
@filename2 = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\DemoSuspect_LOG.ldf';
GO

Msg 5172, Level 16, State 15, Line 1
The header for file 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\DemoSuspect_LOG.ldf' is not a valid database file header. The PageAudit property is incorrect.

Hmm. How about using the ATTACH_REBUILD_LOG option on CREATE DATABASE? That should create a new log file for me:

CREATE DATABASE DemoSuspect ON
   (NAME = DemoSuspect, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\DemoSuspect.mdf')
FOR ATTACH_REBUILD_LOG;
GO

Msg 5172, Level 16, State 15, Line 1
The header for file 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\DemoSuspect_LOG.ldf' is not a valid database file header. The PageAudit property is incorrect.
File activation failure. The physical file name "C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\DemoSuspect_LOG.ldf" may be incorrect.
The log cannot be rebuilt because the database was not cleanly shut down.
Msg 1813, Level 16, State 2, Line 1
Could not open new database 'DemoSuspect'. CREATE DATABASE is aborted.

Hmm. The database knows that there was an active transaction. Using the ATTACH_REBUILD_LOG command only works if the database was cleanly shut down and the log is missing. What about if I actually remove the log file and try that - maybe I could fool it? I took a copy of the data and log file, and then deleted the original log file (this is getting more and more dodgy...). Let's try again:

CREATE DATABASE DemoSuspect ON
   
(NAME = DemoSuspect, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\DemoSuspect.mdf')
FOR ATTACH_REBUILD_LOG;
GO

File activation failure. The physical file name "C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\DemoSuspect_LOG.ldf" may be incorrect.
The log cannot be rebuilt because the database was not cleanly shut down.
Msg 1813, Level 16, State 2, Line 1
Could not open new database 'DemoSuspect'. CREATE DATABASE is aborted.

Well, that was a long-shot but SQL Server's smarter.

Basically the problem is that the database wasn't cleanly shutdown, which means that recovery HAS to run and complete before the database can be attached again. Given that our log file is corrupt, that's impossible.

So, never detach a suspect database.

The only way to get the database back into SQL Server is to use a hack. I'm going to create a new dummy database with the exact same file layout and as close as possible to the file sizes of the detached database. Then I'm going to shutdown SQL Server, swap in the corrupt database files, and re-start SQL Server. If all goes well, the corrupt, suspect database will be attached again.

The one major downside of this is that if the SQL Server instance doesn't have instant initialization enabled (see How to tell if you have instant initialization enabled?), then creating the dummy database could take a long time if the data files are very big. This means that your application is offline while the files are created and zero'd out.

So, I've already got a copy of the corrupt database files, so now I need to delete the data file too. Before doing this you want to make absolutely sure you've got multiple copies of the corrupt database files... just in case. After deleting the data file, I can create my dummy database.

CREATE DATABASE DemoSuspect
GO

If you forget to delete the existing corrupt files first, you'll get the following error:

Msg 1802, Level 16, State 4, Line 1
CREATE DATABASE failed. Some file names listed could not be created. Check related errors.
Msg 5170, Level 16, State 1, Line 1
Cannot create file 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\DemoSuspect.mdf' because it already exists. Change the file path or the file name, and retry the operation.

Ok - delete the data file and try again. Now I need to check the dummy database's files are there (at this point I'm totally paranoid), shutdown SQL Server, delete the dummy databases' files (this just seems wrong!), and swap in the corrupt files. I took another copy of the corrupt files before swapping them back in, just in case something goes wrong.

After restarting SQL Server, I can check the database state:

SELECT DATABASEPROPERTYEX ('DemoSuspect', 'STATUS') AS 'Status';
GO

Status
--------
SUSPECT

Woo-hoo - I'm back to having a suspect database attached again - after having to shutdown the server and mess about deleting and copying files around. Not good. Now I can actually fix it.

Fixing the Database

If you don't have any backups, then the only way to get into the database is to use EMERGENCY mode. This lets you into the database but you need to be aware that recovery has not completed so the contents of the database are transactionally (and possibly structurally) inconsistent. I'm going to choose to repair the database using emergency-mode repair. See CHECKDB From Every Angle: EMERGENCY mode repair - the very, very last resort for a detailed description of this tool.

ALTER DATABASE DemoSuspect SET EMERGENCY;
GO
ALTER DATABASE DemoSuspect SET SINGLE_USER;
GO
DBCC CHECKDB (DemoSuspect, REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO

Msg 5172, Level 16, State 15, Line 1
The header for file 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\DemoSuspect_log.LDF' is not a valid database file header. The PageAudit property is incorrect.
File activation failure. The physical file name "C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\DemoSuspect_log.LDF" may be incorrect.
The log cannot be rebuilt because the database was not cleanly shut down.
The Service Broker in database "DemoSuspect" will be disabled because the Service Broker GUID in the database (9E879BFC-B742-4A69-AB14-4D6BD6F99E02) does not match the one in sys.databases (B4568D23-7018-40CF-B189-9C29DE697C09).
Warning: The log for database 'DemoSuspect' has been rebuilt. Transactional consistency has been lost. The RESTORE chain was broken, and the server no longer has context on the previous log files, so you will need to know what they were. You should run DBCC CHECKDB to validate physical consistency. The database has been put in dbo-only mode. When you are ready to make the database available for use, you will need to reset database options and delete any extra log files.

First off it tries to do the regular ATTACH_REBUILD_LOG. When that fails, DBCC CHECKDB takes over and forces the log to be rebuilt, after trying to force as much of the damaged log to be processed for recovery as it can. It then runs a full repair, in case there's anything corrupt in the database - in this case there isn't so there are no corruption messages in the output.

Notice the line about the Service Broker GUID being wrong. I had to use the hack method to get the database attached again, but when I created the dummy database, it created a Service Broker GUID for the DemoSuspect database in master.sys.databases. When I swapped in the corrupt database, it has a different GUID - so now I can't use Service Broker! And all because I detached the suspect database instead of repairing it.

So what's the state of the data?

USE DemoSuspect;
GO

SELECT * FROM Employees;
GO

FirstName   LastName   YearlyBonus
----------  ---------  ------------
Paul        Randal     10000
Kimberly    Tripp      0

Kimberly doesn't get a bonus this year - she won't be happy! This is contrived and flippant, of course, but it illustrates the point that after doing an emergency-mode repair, transactions that were active at the time the log was damaged will not get a chance to roll-back, most likely. In this case, I know what was going on when the crash occured, but what about on a busy OLTP system with hundreds or thousands of active transactions? What state will the data be in?

Summary

Yes, you can recover from a detached suspect database, but it's not pretty and you have to be very careful. The best course of action is always to have a comprehensive backup strategy that allows you to restore as quickly as possible. If you do have a suspect database and no backups, use EMERGENCY mode to access and/or repair the database. Hopefully this article will help people that find themselves in these situations.

Every so often I'll see posts on the various data corruption forums discussing causes of corruption. In this post I want to debunk some of the myths around what can cause corruption. There are really two types of corruption to deal with, physical corruption and logical corruption.

Physical corruption

This is where something has altered the contents of a data or log file sector with no regard for what is being stored there. Possible causes of physical corruption are:

  • Problem with the I/O subsystem (99.8% of all cases I've ever seen - only 3 nines as I'd estimate I've seen around about a thousand corruption cases). Remember the I/O subsystem is everything underneath SQL Server in the I/O stack - including the OS, 3rd-party file system filter drivers, device drivers, RAID controllers, SAN controllers, network hardware, drives themselves, and so on. Millions of lines of code and lots of moving parts spinning very fast, very close to very fragile pieces of metal oxide (I once heard Jim Gray liken a disk drive head to a 747 jumbo jet flying at 500 mph at a height of 1/4 inch from the ground...)
  • Problem with the host machine hardware (0.1% of cases). Most of the time this is a memory error.
  • SQL Server bugs (0.1% of cases). Yes, there have been corruption bugs. Every piece of software has bugs. There are KB articles describing bugs.
  • Deliberate introduction of corruption using a hex editor or other means.

Physical corruption is what DBCC CHECKDB usually reports and the majority of cases are caused by a physical failures of some kind, with the minority caused by humans - software bugs.

Logical corruption

This is where something has altered some data so that a data relationship is broken. Possible causes of logical corruption are:

  • Humans

:-) Okay...

  • Application bug. The application deletes one part of an inherent data relationship but not the other. Or the application designer doesn't implement a constraint properly. Or the application designer doesn't cope with a transaction roll-back properly. You get the idea.
  • Accidental update/delete. Someone deletes or updates some data incorrectly.
  • SQL Server bug. See above.
  • DBCC CHECKDB when using the REPAIR_ALLOW_DATA_LOSS option. As is documented in Books Online, and I've blogged about and mentioned when lecturing, if you run repair, it doesn't take into account any inherent or explicit constraints on the data.

The point here is that a physical failure of a component does not cause logical corruption, it causes physical corruption. Conversely, application errors cause logical corruption, not physical corruption. DBCC CHECKDB errors are about physical corruption (okay, with the inclusion of DBCC CHECKCATALOG code in 2005, it will find cases where the DBA has manually altered the system tables, causing logical corruption) and applications cannot cause physical corruption as they can only manipulate data through SQL Server. If an application hits a SQL Server bug which causes physical corruption, that's still not the application causing physical corruption, it's SQL Server.

So - on to the myths.

  • Can an application cause physical corruption? No.
  • Can stopping a shrink operation cause corruption of any kind? No.
  • Can stopping an index rebuild cause corruption of any kind? No.
  • Can running DBCC CHECKDB without repair cause corruption of any kind? No.
  • Can creating a database snapshot cause corruption of any kind? No.

Hope this helps.

Today I presented my brand new session Surviving Corruption: From Detection to Recovery at TechEd. I had a lot of fun putting together the demos, presenting the session, and talking to people afterwards. During the session, I promised to blog each of the demos so that everyone can run through them - here's the first one.

On SQL 2000, it was pretty easy to get into the system tables and manually change them - all you had to do was:

EXEC sp_configure 'allow updates', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO

And then you could insert, update, and delete whatever you wanted in the all the system tables, including the critical three - sysindexes, sysobjects, and syscolumns. The problem was that sometimes people actually did this and messed things up - for instance, by manually deleting an object from sysobjects, but leaving around all the other info about the object - such as indexes and columns. DBCC CHECKCATALOG in SQL 2000 would find this, but DBCC CHECKDB would not - as it didn't run the DBCC CHECKCATALOG code - any most people do not run DBCC CHECKCATALOG at all. Many times now, I've seen databases upgraded to 2005 and suddenly DBCC CHECKDB is reporting metadata corruption errors - all because someone had manually changed the system tables on 2000, and I changed DBCC CHECKDB in 2005 to include the DBCC CHECKCATALOG checks.

This demo is all about that. I created a 2000 database, manually deleted a row in sysobjects and then upgraded the database to 2005. The corrupt database is available in a zip file - DemoCorruptMetadata.zip. If you unzip it into a folder C:\SQLskills then you can attach it using:

RESTORE DATABASE DemoCorruptMetadata FROM DISK = 'C:\SQLskills\DemoCorruptMetadata.bak'
   
WITH MOVE 'DemoCorruptMetadata' TO 'C:\SQLskills\DemoCorruptMetadata.mdf',
   
MOVE 'DemoCorruptMetadata_log' TO 'C:\SQLskills\DemoCorruptMetadata_log.ldf',
   
REPLACE;
GO

So what does the corruption look like on 2005?

DBCC CHECKDB (DemoCorruptMetadata) WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO
Msg 8992, Level 16, State 1, Line 1
Check Catalog Msg 3853, State 1: Attribute (object_id=1977058079) of row (object_id=1977058079,column_id=1) in sys.columns does not have a matching row (object_id=1977058079) in sys.objects.
Msg 8992, Level 16, State 1, Line 1
Check Catalog Msg 3853, State 1: Attribute (object_id=1977058079) of row (object_id=1977058079,column_id=2) in sys.columns does not have a matching row (object_id=1977058079) in sys.objects.
CHECKDB found 0 allocation errors and 2 consistency errors not associated with any single object.
CHECKDB found 0 allocation errors and 2 consistency errors in database 'DemoCorruptMetadata'.

This is what we expect. Notice that there's no recommended repair level at the end of the output - this is because CHECKDB can't repair metadata corruptions. We can't fix this with a backup - unless we have a backup from 2000 from before the manual delete in the system tables. To fix this we'd need to go back to 2000, fix the corruption, and then upgrade again - usually not feasible.

Instead, we're going to fix it by manually altering the system tables in 2005 - something that's purportedly not possible. First let's see what tables there are that could include column information (remembering that the system catalogs were completely rewritten between 2000 and 2005):

SELECT [name] FROM DemoCorruptMetadata.sys.objects WHERE [name] LIKE '%col%';
GO

name
------------------
sysrowsetcolumns
syshobtcolumns
syscolpars
sysiscols

I know that sysrowsetcolumns and syshobtcolumns are involved at low-levels of the Storage Engine and don't contain relational metadata, so let's try syscolpars. I want to see what columns there are to see if one of the looks like an object ID, and another looks like a column ID. This query will just return the table columns, with no rows (because the condition 1=0 is always false:

SELECT * FROM DemoCorruptMetadata.sys.syscolpars WHERE 1 = 0;
GO

Msg 208, Level 16, State 1, Line 1
Invalid object name 'DemoCorruptMetadata.sys.syscolpars'.

I can't bind to internal system tables in 2005. But - I can bind to internal system tables using the Dedicated Admind Connection (or DAC for short). This is documented in Books Online at http://msdn.microsoft.com/en-us/library/ms179503.aspx. You can get to the DAC through SQLCMD using the /A switch. So - assuming I'm now connected through the DAC, I'll try that command again:

C:\Documents and Settings\paul>sqlcmd /A
1> USE DemoCorruptMetadata;
2> GO
Changed database context to 'DemoCorruptMetadata'.
1> SELECT * FROM sys.syscolpars WHERE 1=0;
2> GO
id          number colid       name

xtype utype       length prec scale collationid status      maxinrow xmlns
 dflt        chk         idtval

----------- ------ ----------- -------------------------------------------------
-------------------------------------------------------------------------------
----- ----------- ------ ---- ----- ----------- ----------- -------- -----------
 ----------- ----------- -------------------------------------------------------
-----------

(0 rows affected)
1>

This looks like the table. Now I'll query against it using the object ID from the original corruption message:

1> SELECT colid, name FROM sys.syscolpars WHERE id = 1977058079;
2> GO
colid       name
----------- --------------------------------------------------------------------
------------------------------------------------------------
          1 SalesID
         
2 CustomerID
(2 rows affected)
1>

Cool. So I'll try deleting the orphaned columns:

1> DELETE FROM sys.syscolpars WHERE id = 1977058079;
2> GO
Msg 259, Level 16, State 1, Server ROADRUNNERPR, Line 1
Ad hoc updates to system catalogs are not allowed.
1>

Hmm. And it doesn't help if I set 'allow updates' to 1, or try putting the database into single-user mode.

There IS a way though. You can put the SERVER into single-user mode, then connect with the DAC and you can then update the system tables. This particular twist on using the DAC isn't documented anywhere except in an MSDN forum thread answered by someone from Microsoft (see here).

BEWARE (if I could put little flashing lights around this too then I would...) that this is undocumented and unsupported - misuse will lead to unrepairable corruption of your databases.

The sequence of events to follow is:
  • make a backup of the database just in case something goes wrong
  • shutdown the server
  • go to the binaries directory (e.g. C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn) and start the server in single-user mode using 'sqlservr -m'
  • connect back in using SQLCMD /A, and run the deleta again. This time it will work, but will give an error about metadata cache consistency:

C:\Documents and Settings\paul>sqlcmd /A
1> USE DemoCorruptMetadata;
2> GO
Changed database context to 'DemoCorruptMetadata'.
1> DELETE FROM sys.syscolpars WHERE id = 1977058079;
2> GO

(2 rows affected)
Warning: System table ID 41 has been updated directly in database ID 12 and cache coherence may not have been maintained. SQL Server should be restarted.
1>

  • The system table has been updated, but the in-memory cache of metadata is now out-of-sync with the system tables. So, shutdown the server again as the message suggests and restart it normally
  • run CHECKDB again and you'll see the corruption has been fixed.

 

Hope this helps some of you. Watch this space for the next demo from TechEd of repairing corruption when no backup is available.

(I'm actually on-stage here at TechEd doing the  DAT track pre-con with Kimberly - she's on now until lunch so I'm catching up on forum problems...)

Here's a question that came up on of the SQLServerCentral.com corruption forums I monitor that I think is worth blogging about. To paraphrase:

I have a bunch of corruptions in a database, that look like they've been there for a while. Repair is my only option - it works but I'd like to know what data is being deleted. How can I do that? Here are some of the errors:

Server: Msg 8928, Level 16, State 1, Line 2
Object ID 645577338, index ID 0: Page (1:168576) could not be processed. See other errors for details.
Server: Msg 8928, Level 16, State 1, Line 2
Object ID 645577338, index ID 0: Page (1:168577) could not be processed. See other errors for details.
Server: Msg 8928, Level 16, State 1, Line 2
Object ID 645577338, index ID 0: Page (1:168578) could not be processed. See other errors for details.
Server: Msg 8928, Level 16, State 1, Line 2
Object ID 645577338, index ID 0: Page (1:168579) could not be processed. See other errors for details.
Server: Msg 8928, Level 16, State 1, Line 2
Object ID 645577338, index ID 0: Page (1:168580) could not be processed. See other errors for details.
Server: Msg 8928, Level 16, State 1, Line 2
Object ID 645577338, index ID 0: Page (1:168581) could not be processed. See other errors for details.
Server: Msg 8928, Level 16, State 1, Line 2
Object ID 645577338, index ID 0: Page (1:168582) could not be processed. See other errors for details.
Server: Msg 8976, Level 16, State 1, Line 2
Table error: Object ID 645577338, index ID 1. Page (1:168576) was not seen in the scan although its parent (1:165809) and previous (1:168575) refer to it. Check any previous errors.
Server: Msg 8978, Level 16, State 1, Line 2
Table error: Object ID 645577338, index ID 1. Page (1:168583) is missing a reference from previous page (1:168582). Possible chain linkage problem.

This is a clustered index that CHECKDB  will repair by deleting pages at the leaf-level - essentially deleting a bunch of records. The pages look to be trashed (there were a bunch more errors that I didn't include here that said the page headers were all corrupted - looked like the IO subsystem trashde a whole 64KB chunk of the disk) so there's nothing much else you can do. As the table has a clustered index, you can use the error messages to find the pages on either 'logical' side of the pages being deleted - and hence figure out the range of records that have been deleted.

The errors show that pages 168576 through 168582 in file 1 are corrupt. There are also errors that say the previous page of 168576 is 168575, and the next page of 168582 is 168583. If you do a DBCC PAGE of these two pages, you can find the lower and upper bound of the clustered index key values that have been lost. Think of three ranges:

  • the lower range of records that are intact, logically before the corrupt pages in the index
  • the range of records that will be deleted by repair
  • the upper range of records that are intact, logically after the corrupt pages in the index

To find the upper bound of the lower range:

DBCC TRACEON (3604); -- allows the output to come to the console
DBCC PAGE ('dbname', 1, 168575, 3);
GO

The key value in the slot at the end of output is the upper bound of the bottom range that's intact.

Then do:

DBCC PAGE ('dbname', 1, 168583, 3);
GO

The key value in the slot at the beginning of the output is the lower bound of the upper range that's intact.

Everything in the middle will be deleted. You could also try a DBCC PAGE on the corrupt pages themselves too - you might be able to see some data in them.

I'll be blogging a bunch more about repair after my corruption session this week at TechEd - watch this space!

Before I start, I want to make it clear that you can only hit this bug if you ALREADY have corruption, that it's quite rare, and that there is a workaround.

I've noticed a few more people in the forums having CHECKDB fail with this particular error in the last month

Msg 8967, Level 16, State 216, Line 1
An internal error occured in DBCC which prevented further processing. Please contact Product Support.

instead of completing properly and listing the corruptions in the database.

Whenever CHECKDB is using a database snapshot, it must check that the page it read through the snapshot does not have an LSN (Log Sequence Number) higher than that when the snapshot was created. If it did, this would mean that the page was modified AFTER the snapshot was created and hence CHECKDB would be working from an inconsistent view of the database. If this case is discovered, CHECKDB stops immediately. When I rewrote CHECKDB for SQL Server 2005, I changed a bunch of code assertions into seperate states of the 8967 error, so that CHECKDB would fail gracefully if some condition occured that indicates a bug or something that should never happen. State 216 is for the bad LSN condition I've just described.

I used to think it was caused by a race condition with the NTFS code that implements sparse files, which is used by the hidden database snapshot that CHECKDB uses by default. However, I've come to learn that this is a bug in CHECKDB (not one of mine I should say :-)) that causes this behavior under certain circumstances when corruption is present. The bug is that if a corrupt page fails auditing inside CHECKDB, the LSN check is still performed. If the corruption affects the LSN stamped in the page header, the 8967 error could be triggered. I've seen this a handful of times in the last few weeks - hence the need for a blog post. I've discussed this with the dev team and hopefully the fix will make it into the next SPs for 2005 and 2008 (too late to fix such a rare problem in such a critical component at this stage of 2008 development). They're going to put a KB article together too - but in the meantime, I wanted to get this on the Internet so Google/Live Search pick it up.

[Edit: June 24 2010 - this still hasn't been fixed AFAIK - see KB 960791 which just discusses the workaround.)

Now let's repro the problem. Starting with a simple database and table, I'll find the first page so I can corrupt it.

CREATE DATABASE TestCheckdbBug;
GO
USE TestCheckdbBug;
GO
CREATE TABLE test (c1 INT, c2 CHAR (5000));
INSERT INTO test VALUES (1, 'a');
GO EXEC sp_AllocationMetadata 'test';
GO

Object Name  Index ID  Alloc Unit ID      Alloc Unit Type  First Page  Root Page  First IAM Page
-----------  --------  -----------------  ---------------  ----------  ---------  --------------
test         0         72057594042318848  IN_ROW_DATA      (1:143)     (0:0)      (1:152)

Now I'm going to corrupt the page type on page (1:143) to be 255 (an invalid page type), which will guarantee the page fails the audit inside CHECKDB.

DBCC CHECKDB ('TestCheckdbBug') WITH ALL_ERRORMSGS, NO_INFOMSGS;
GO

Msg 8928, Level 16, State 1, Line 1
Object ID 2073058421, index ID 0, partition ID 72057594038321152, alloc unit ID 72057594042318848 (type In-row data): Page (1:143) could not be processed. See other errors for details.
Msg 8939, Level 16, State 6, Line 1
Table error: Object ID 2073058421, index ID 0, partition ID 72057594038321152, alloc unit ID 72057594042318848 (type In-row data), page (1:143). Test ((m_type >= DATA_PAGE && m_type <= UNDOFILE_HEADER_PAGE) || (m_type == UNKNOWN_PAGE && level == BASIC_HEADER)) failed. Values are 255 and 255.
CHECKDB found 0 allocation errors and 2 consistency errors in table 'test' (object ID 2073058421).
CHECKDB found 0 allocation errors and 2 consistency errors in database 'TestCheckdbBug'.
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (TestCheckdbBug).

Now I'm going to corrupt the LSN on that page such that it's guaranteed to be higher than the creation LSN of the database snapshot (basically by filling the first part of the page header LSN field with 0xFF).

DBCC CHECKDB ('TestCheckdbBug') WITH ALL_ERRORMSGS, NO_INFOMSGS;
GO

Msg 8967, Level 16, State 216, Line 1
An internal error occurred in DBCC that prevented further processing. Contact Customer Support Services.
Msg 8921, Level 16, State 1, Line 1
Check terminated. A failure was detected while collecting facts. Possibly tempdb out of space or a system table is inconsistent. Check previous errors.

Bingo! And in the error log, there's some diagnostic information so we can tell which page caused the problem:

2008-05-22 14:55:01.95 spid53   DBCC encountered a page with an LSN greater than the current end of log LSN (31:0:1) for its internal database snapshot. Could not read page (1:143), database 'TestCheckdbBug' (database ID 15), LSN = (-1:65535:18), type = 255, isInSparseFile = 0. Please re-run this DBCC command.
2008-05-22 14:55:01.95 spid53   DBCC CHECKDB (TestCheckdbBug) WITH all_errormsgs, no_infomsgs executed by ROADRUNNERPR\paul terminated abnormally due to error state 1. Elapsed time: 0 hours 0 minutes 0 seconds.

Note the page ID (in black bold above) tells us the bad page and the LSN (in blue bold above) reflects the corruption that I caused. If the page ID field of the header was corrupt, it wouldn't be possible to tell from these diagnostics which page is corrupt. However, all is not lost. This bug means that under these circumstances the default online behavior of CHECKDB can't run. The workaround is to use the WITH TABLOCK option of CHECKDB, which does offline checking and doesn't need the snapshot - but the trade-off is that an exclusive database lock is required for a short time and then shared table locks for all tables in the database (this is why online is the default). Running this option on my corrupt database gives:

DBCC CHECKDB ('TestCheckdbBug') WITH TABLOCK, ALL_ERRORMSGS, NO_INFOMSGS;
GO

Msg 8928, Level 16, State 1, Line 1
Object ID 2073058421, index ID 0, partition ID 72057594038321152, alloc unit ID 72057594042318848 (type In-row data): Page (1:143) could not be processed. See other errors for details.
Msg 8939, Level 16, State 6, Line 1
Table error: Object ID 2073058421, index ID 0, partition ID 72057594038321152, alloc unit ID 72057594042318848 (type In-row data), page (1:143). Test ((m_type >= DATA_PAGE && m_type <= UNDOFILE_HEADER_PAGE) || (m_type == UNKNOWN_PAGE && level == BASIC_HEADER)) failed. Values are 255 and 255.
CHECKDB found 0 allocation errors and 2 consistency errors in table 'test' (object ID 2073058421).
CHECKDB found 0 allocation errors and 2 consistency errors in database 'TestCheckdbBug'.
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (TestCheckdbBug).

Which are the exact same results we had before I corrupted the LSN field (this is expected, as there is no check of a page's LSN field EXCEPT when running from a database snapshot). Now we can proceed to restore/repair as appropriate.

So - a scary little bug that has caused some people headaches, but I want to stress again - this can only happen if the database is ALREADY corrupt, and that it's quite rare. Hope this helps some of you picking this up from search engines in the future.

This came up several times during the week so I thought it was about time to blog about it. One of the new features we put into SQL Server 2005 was storing the last time that DBCC CHECKDB completed successfully (called the last-known good time). What does successfully mean? This is the catch - if DBCC CHECKDB runs to completion then it considers that a successful run - EVEN if it found some corruptions. However, the last-known good time is ONLY updated if there were NO corruptions found. Slightly confusing I know.

Cool - but how can you see it? Well, the only time it's ever reported is when the database starts up. Not too useful if the database has been running for months. So how to see it??? The trick is to use DBCC PAGE. The last-known good time is stored in the boot page of the database - page 9. The following code will dump page 9 for you:

-- you need this to get the DBCC PAGE output to your console
DBCC TRACEON (3604);
GO

-- page 9 is the boot page
DBCC PAGE (dbname, 1, 9, 3);
GO

You need to look for the dbi_dbccLastKnownGood field. That was the last time that DBCC CHECKDB ran without finding any corruptions.

Now - what about if you're trusting your SQL Agent jobs to run DBCC CHECKDB for you every so often (how often is a whole other discussion...) and relying on the Agent job failing if DBCC CHECKDB finds corruptions. Well, if all you do is run the DBCC command, you're never going to know about corruption unless DBCC CHECKDB itself fails for some reason - remember, it returns successfully even it it found corruptions.The key is to add another statement after you run DBCC that checks the value of @@ERROR (the last error code reported by SQL Server). Although DBCC CHECKDB doesn't stop with an error when it finds corruption, it will set @@ERROR if there are any error messages in its output.

Edit: The stuff I've struck-through above is not true. I thought I'd remembered it from testing previously and I've heard plenty of anecdotal evidence from customers too BUT an Agent job WILL fail if a DBCC CHECKDB within it fails. I went back and forth with Tara Kizer on SQLteam.com and we tested on 2005 and 2000 - and it worked. Now here is a real issue - the Job History that's captured will not contain all the output from DBCC CHECKDB - especially if you didn't use the WITH NO_INFOMSGS option. You should make sure you capture the output to a file to avoid having to go back and run DBCC CHECKDB all over again.

Summary - make sure you really know when DBCC CHECKDB runs successfully, without finding any corruptions!

Almost a year ago to the day I asked a question on my old blog - how long does *your* CHECKDB take to run? Well, I had a bunch of replies but I never got around to posting the results. I was reminded of this recently so threw together some graphs of the results. I've plotted database size against run-time of DBCC CHECKDB, for a number of different numbers of CPUs, and mostly on SQL Server 2000. I've also added a linear trend line too.

Now - bear in mind this doesn't take into account the multitude of other factors that can affect how long DBCC CHECKDB takes to run (see my previous post here for the list).

Hopefully these are interesting to some of you!

Here's an issue that I thought was a one-off but it just popped up again over the weekend so I want to publicize it.

DBA runs a DBCC CHECKDB and gets output like the following:

Msg 5172, Level 16, State 15, Line 1
The header for file 'E:\Data\namechanged.mdf:MSSQL_DBCC14' is not a valid database file header. The PageAudit property is incorrect.
Msg 5120, Level 16, State 9, Line 1
Unable to open the physical file "E:\Data\namechanged.mdf:MSSQL_DBCC14". Operating system error 0: "(null)".
Msg 1823, Level 16, State 2, Line 1
A database snapshot cannot be created because it failed to start.
Msg 7928, Level 16, State 1, Line 1
The database snapshot for online checks could not be created. Either the reason is given in a previous error or one of the underlying volumes does not support sparse files or alternate streams. Attempting to get exclusive access to run checks offline.

etc

Wow! Looks like something's seriously wrong with that database. Until you find that this happens on *all* the databases on the server. And on all the other servers in the data center too. What on earth's going on? I saw a similar issue once before, back in 2004 or 2005 while customers were testing pre-RTM IDW builds of SQL Server 2005. After running a test and doing some investigation for me, the person who just hit this issue over the weekend confirmed my hypothesis that he'd hit the problem I saw a few years ago.

The problem is a 3rd-party file encryption solution. It installs a file-system filter driver that filters all reads and writes to the database files so it can perform the encryption/decryption. Unfortunately it doesn't cope with NTFS alternate streams - which is how the internal database snapshots that DBCC CHECKDB uses are created - and so returns garbage for all reads from the database snapshot. The simple workaround is to create your own database snapshot (whose constituent files will exist in their own right, rather than being alternate streams of the source database's files) and then run DBCC CHECKDB against that.

Now - I don't want to cry 'wolf' here, but be careful of attributing massive corruption problems to the real IO subsystem if there's any kind of file-system filter driver installed.

This is a question that comes up a lot - how to run consistency checks on a VLDB?

We're talking hundreds of GBs or 1 TB or more. These databases are now common on SQL Server 2000 and 2005 - at TechEd IT Forum this week (and at SQL Connections the week before) there are many customers with multi-TB databases. Any experienced DBA knows the value of running consistency checks, even when the system is behaving perfectly and the hardware is rock-solid. The two problems that people have with running a full CHECKDB on their VLDB are:

  • It takes a long time to run (based on many factors – see my previous post here for details).
  • It uses lots of resources – memory, CPU, IO bandwidth, tempdb space.

So it uses lots of resources for a long time. Even with a decent sized maintenance window, the CHECKDB may run over into normal operations. There's also the case of a system that's already pegged in more or more resource dimensions. Whatever the case, there are a number of options:

  • Don't run consistency checks
  • Run CHECKDB using the WITH PHYSICAL_ONLY option
  • Use SQL Server 2005's partitioning feature and devise a consistency checking plan around that
  • Figure out your own scheme to divide up the consistency checking work over several days
  • Offload the consistency checks to a separate system

Let's look at each in turn.

Don't run consistency checks

A lot of people end up doing this as they can’t figure out how to run consistency checks on their DB. Don't even think about using this option – there’s always a way to do it. If you absolutely cannot figure out a way to get consistency checks on your system, send me email and I'll help you – I’ve helped many customers with all the schemes below, both in and out of Microsoft.

Use WITH PHYSICAL_ONLY

A full CHECKDB does a lot of stuff - see previous posts in this series for more details. You can vastly reduce the run-time and resource usage of CHECKDB by using the WITH PHYSICAL_ONLY option. With this option, CHECKDB will:

  • Run the equivalent of DBCC CHECKALLOC (i.e. check all the allocation structures)
  • Read and audit every allocated page in the database

So it skips all the logical checks, inter-page checks, and things like DBCC CHECKCATALOG. The fact that all allocated pages are read means that:

  • Any pages that cannot be read at all (i.e. 823 errors) will be discovered
  • If page checksums are enabled in SQL Server 2005, any corruptions caused by the IO subsystem will be discovered as the page checksum will be checked as part of reading the page into the buffer pool

So there's a trade-off of consistency checking depth against run-time and resource usage - but this option will pick up problems caused by the IO subsystem as long as page checksums are enabled and present.

Use the SQL Server 2005 partitioning feature

One of the obvious ways to reduce the time/resources issue is to partition the load. If you're using the partitioning feature in SQL Server 2005 then you're already setup for this. Given that you've hopefully got your partitions stored on separate filegroups, you can use the DBCC CHECKFILEGROUP command.

Consider this example - you have the database partitioned by date such that the current month is on a read-write filegroup and the past 11 months are on 11 read-only filegroups (data from more than a year ago is on some offline storage medium). The prior months also have multiple backups on various media so are considered much 'safer' than the current month. It makes sense that you don't need to check the read-only filegroups as often as the current month's filegroup so an example consistency checking scheme could be:

  • Run a DBCC CHECKFILEGROUP on each read-only filegroup every week or two
  • Run a DBCC CHECKFILEGROUP on the read-write filegroup every day or two (depending on the stability of the hardware, the criticality of the data, and the frequency and comprehensiveness of your backup strategy).

I know of several companies who've made the decision to move to SQL Server 2005 in part because of this capability to easily divide up the consistency checking.

Beware that until SP2 of SQL Server 2005, DBCC CHECKFILEGROUP would not check a table at all if it was split over multiple filegroups. This is now fixed and DBCC CHECKFILEGROUP will check partitions on the specified filegroup even if the table is now completely contained on the filegroup.

Figure out your own way to partition the checks

If you're on SQL Server 2000, or you just haven't partitioned your database on SQL Server 2005, then there are ways you can split up the consistency checking workload so that it fits within a maintenance window. You basically need to simulate what CHECKDB does in a staggered manner. Here's one scheme that I've recommended to several customers:

  • Figure out your largest tables (by number of pages) and split the total number into 7 buckets, such that there are a roughly equal number of database pages in each bucket.
  • Take all the remaining tables in the database and divide them equally between the 7 buckets (using number of pages again)
  • On Sunday:
    • Run a DBCC CHECKALLOC
    • Run a DBCC CHECKCATALOG
    • Run a DBCC CHECKTABLE on each table in the first bucket
  • On Monday, Tuesday, Wednesday:
    • Run a DBCC CHECKTABLE on each table in the 2nd, 3rd, 4th buckets, respectively
  • On Thursday:
    • Run a DBCC CHECKALLOC
    • Run a DBCC CHECKTABLE on each table in the 5th bucket
  • On Friday and Saturday:
    • Run a DBCC CHECKTABLE on each table in the 6th and 7th buckets, respectively

In pre-RTM builds of SQL Server 2005, DBCC CHECKTABLE could not bind to the critical system tables, just like with T-SQL - but that's fixed so you can cover all system tables in SQL Server 2000 and 2005 using the method above.

There's one drawback to this method - a new internal database snapshot is created each time you start a new DBCC command, even for a DBCC CHECKTABLE. If the update workload on the database is significant, then there could be a lot of transaction log to recover each time the database snapshot is created - leading to a long total run-time. In this case, you may need to alter the number of buckets you use to make the total operation fit within your available window.

Use a separate system

This alternative is relatively simple - restore your backup (you are taking regular backups, right?) on another system and run a full CHECKDB on the restored database. This offloads the consistency checking burden from the production system and also allows you to check that your backups are valid. There are some drawbacks to this however:

  • You need to have sufficient disk space on the spare system to be able to restore the backup onto. If the production database is several TB, you need the same several TB on the spare box. This equates to a non-trivial amount of money - initial capital investment plus ongoing storage management costs. (Hopefully a future release will alleviate this – while at Microsoft I invented and patented a mechanism for consistency checking a database in a backup without restoring it.)
  • If the consistency checks find an error, you don't know for sure that the database is corrupt on the production system. It could be a problem with the spare box that's caused the corruption. The only way to know for sure is to run a consistency check on the production system. This is a small price to pay though, because most of the time the consistency checks on the spare system will be ok, so you know the production database was clean at the time the backup was taken.

Summary

You've got a bunch of choices to allow you to run consistency checks, so there's really no excuse for not knowing (within a reasonable timeframe) that something's gone wrong with your database.

In my previous post on interpreting CHECKDB output, plus in my DBCC Internals session at TechEd IT Forum yesterday, I mentioned there are some things that CHECKDB can’t repair. In this post I want to go into a bit more detail – based on a post from my old Storage Engine blog.

Before anyone takes this the wrong way - what do I mean by "can't be repaired"? Remember that that purpose of repair is to make the database structurally consistent, and that to do this usually means deleting the corrupt data/structure (that's why the option to do this was aptly named REPAIR_ALLOW_DATA_LOSS – see this post for more explanation on why repair can be bad). A corruption is deemed unrepairable when it doesn't make sense to repair it given the damage the repair would cause, or the corruption is so rare and so complicated to repair correctly that it's not worth the engineering effort to provide a repair. Remember also that recovery from corruptions should be based on a sound backup strategy, not on running repair, so making this trade-off in functionality makes sense.

Here's a few of the more common unrepairable corruptions that people run into along with the reasons they can't be repaired by DBCC.

PFS page header corruption

An example of this is on SQL Server 2005:

Msg 8946, Level 16, State 12, Line 1
Table error: Allocation page (1:13280496) has invalid PFS_PAGE page header values.
Type is 0. Check type, alloc unit ID and page ID on the page.

CHECKDB uses the PFS pages to determine which pages are allocated - and so which pages to read to drive the various consistency checks. The only repair for a PFS page is to reconstruct it - they can't simply be deleted as they're a fixed part of the fabric of the database. PFS pages cannot be rebuilt because there is no infallible way to determine which pages are allocated or not. There are various algorithms I've experimented with to rebuild them, with optimistic or pessimistic setting of page allocation statuses and then re-running the various consistency checks to try to sort out the incorrect choices, but they all require very long run-times. Given the frequency with which these corruptions are seen, and the engineering effort required to come up with an (imperfect) solution, I made the choice to leave this as unrepairable, and I don’t think that will change in future.

Critical system table clustered-index leaf-page corruption

An example of this is on SQL Server 2000:

Server: Msg 8966, Level 16, State 1, Line 1
Could not read and latch page (1:18645) with latch type SH. sysindexes failed.

And on SQL Server 2005:

Msg 7985, Level 16, State 2, Server SUNART, Line 1
System table pre-checks: Object ID 4. Could not read and latch page (1:51) with
latch type SH. Check statement terminated due to unrepairable error.

In a previous post in the series I described why how and why we do special checks of the clustered indexes of the critical system tables. If any of the pages at the leaf-level of these indexes are corrupt, we cannot repair them. Repairing would mean deallocating the page, wiping out the most important metadata for potentially hundreds of user tables and so effectively deleting all of these tables. That's obviously an unpalatable repair for anyone to allow and so CHECKDB doesn't do it.

Column value corruption

Here's an example of this on SQL Server 2005:

Msg 2570, Level 16, State 3, Line 1
Page (1:152), slot 0 in object ID 2073058421, index ID 0, partition ID 72057594038321152, alloc unit ID 72057594042318848 (type "In-row data"). Column "c1" value is out of range for data type "datetime".  Update column to a legal value.

This is where a column has a stored value that is outside the valid range for the column type. There are a couple of repairs we could do for this:

  1. delete the entire record
  2. insert a dummy value

#1 isn't very palatable because then data is lost and it’s not a structural problem in the database so doesn't have to be repaired. #2 is dangerous - what value should be chosen as the dummy value? Any value put in may adversely affect business logic, or fire a trigger, or have some unwelcome meaning in the context of the table - even a NULL. Given these problems, I chose to allow people to fix the corrupt values themselves.

Metadata corruption

Here's an example of this on SQL Server 2005:

Msg 3854, Level 16, State 1, Line 2
Attribute (referenced_major_id=2089058478) of row (class=0,object_id=2105058535,column_id=0,referenced_major_id=2089058478,referenced_minor_id=0) in sys.sql_dependencies has a matching row (object_id=2089058478) in sys.objects (type=SN) that is invalid.

This example is relatively benign. There are other examples that will cause CHECKDB to terminate - not as bad as the critical system table corruption example above, but enough that CHECKDB doesn't trust the metadata enough to use it to drive consistency checks. Repairing metadata corruption has the same problems as repairing critical system table corruption - any repair means deleting metadata about one or more tables, and hence deleting the tables themselves. It's far better to leave the corruption unrepaired so that as much data as possible can be extracted from the remaining tables.

Summary

Repair can't fix everything. You may end up having to perform manual and time-consuming data extraction from the corrupt database and losing lots of data because of, say, a critical system table corruption. Bottom line (as usual) - make sure you have valid backups so you don't get into this state!

This is a subject I posted about last year on my old blog but it came up at SQL Connections last week several times so I want to repost it for those who’ve just started following my blog.

There's only one time when you should be trying to work out how long a CHECKDB is going to take - when you're planning your regular database maintenance. If you're faced with a corrupt (or suspected corrupt) database and you're only just starting to think about how long a CHECKDB is going to take - you've made a mistake while planning your disaster recovery strategy. You always need to know how long CHECKDB takes (on average) to run for your database so:

  • You can tell whether a particular run of CHECKDB is taking longer than usual - a sign that it's found some corruption
  • You know how long it will take to get results in a disaster recovery situation

At every conference I go to, someone asks me how long CHECKDB will take to run on their database. There are several ways I could answer this:

  • The unhelpful answer - I've got no idea.
  • The almost-helpful answer - how long did it take to run last time and are the conditions exactly the same?
  • The answer I usually give - it depends.

Now, many people would see the third answer as being somewhat equivalent to the first answer - unhelpful. The problem is that there are many factors which influence how long CHECKDB will take to run. Let me explain the ten most important factors so you get an idea why this is actually a helpful answer. These aren't in any particular order of importance.

1) The size of the database

Pretty obvious... CHECKDB has to read every allocated page in the database so the bigger it is, the longer it will take to read all the pages.

2) Concurrent IO load on the server

At the simplest level, what is CHECKDB going to do? It reads every allocated page in the database. That's a lot of IO. CHECKDB takes great pains to do the most efficient IO it can and read the database pages in their physical order with plenty of readahead so that the disk heads move smoothly across the disks (rather than jumping around randomly and incurring disk head seek delays). If there's no concurrent IO load on the server, then the IOs will be as efficient as CHECKDB can make them. However, introducing any additional IO from SQL Server means that the disk heads will be jumping around - slowing down the CHECKDB IOs. If the IO subsystem is at capacity already from CHECKDB's IO demands, any additional IO is going to reduce the IO bandwidth available to CHECKDB - slowing it down.

3) Concurrent CPU activity on the server

At the next level of simplicity, CHECKDB is going to process every page it reads in some way. Depending on the various options you've specified and the database schema (details below), that's going to use a lot of CPU - it's possible that the server may be pegged at 100% CPU when CHECKDB is running. If there's any additional workload on the server, that's going to take CPU cycles away from CHECKDB and it going to slow it down.

Basically what points #2 and #3 are saying is that CHECKDB is very resource intensive! It’s probably one of the most resource intensive things you can ask SQL Server to do and so it's usually a good idea to not run it during peak workload times, as you'll not only cause CHECKDB to take longer to run, you will slowdown the concurrent workload, possibly unacceptably.

4) Concurrent update activity on the database

This is relevant for both SQL 2000 and SQL 2005, but for different reasons.

In SQL 2000, CHECKDB gets its consistent view of the database from transaction log analysis of concurrent DML transactions (see here for details). The more concurrent DML there is while CHECKDB is running, the more transaction log will be generated - and so the longer it will take for CHECKDB to analyze that transaction log. It's possible that on a large multi-CPU box with a ton of concurrent DML and CHECKDB limited to a single CPU that this phase of CHECKDB could take several times longer than the reading and processing of the database pages! (I've seen this in real-life several times.)

In SQL 2005, CHECKDB gets its consistent view of the database from a database snapshot, which is stored on the same disk volumes as the database itself. If there are a lot of changes in the database while CHECKDB is running, the changed pages are pushed to the snapshot so that it remains consistent. As the snapshot files are stored in the same location as the database files, every time a page is pushed to the snapshot, the disk head has to move, which interrupts the efficient IO described in #2. Also, whenever CHECKDB goes to read a page and it needs to read the page from the snapshot files instead of the database files, that's another disk head move, and another efficient IO interruption. The more concurrent changes to the database, the more interruptions to efficient IO and the slower that CHECKDB runs.

5) Throughput capabilities of the IO subsystem

This one's simple. CHECKDB is going to do a boat-load of IOs and it could even end up being IO-bound (meaning that the CPUs are idle periodically waiting for IOs to complete) depending on the options specified and the database schema. This means that the throughput of the IO subsystem is going to have a direct effect on the run-time of CHECKDB. so, if you have a 1TB database and the IO subsystem can only manage 100MB/sec, it's going to take almost 3 hours just to read the database (1TB / 100MB / 3600 secs) and there's nothing you can do to speed that up except upgrade the IO subsystem.

I've lost count of the number of times I've heard customers complain that CHECKDB (or index rebuilds or other IO-heavy operations) are running sloooowly only to find that the disk queue lengths are enormous and the IO subsystem it entirely unmatched to the server and workload.

6) The number of CPUs (processing cores) on the box

This also really encompasses the Edition of SQL Server that's being run. In Enterprise Edition, CHECKDB can run in parallel across all the CPUs in the box (or as many as the query processor decides to parallelize over when the CHECKDB internal queries are compiled). Running in parallel can give a significant performance boost to CHECKDB and lower run times, as long as the database is also spread over multiple files too (so the IOs can be parallelized). There's a nifty algorithm that’s used that allows CHECKDB to run in parallel which I'll explain in detail in a future post.

On the other hand, the fact that CHECKDB can run in parallel in Enterprise Edition can be bad for some scenarios, and so some DBAs chose to force CHECKDB to be single-threaded. SAP typically recommends this to help with user query predictability. The way to do this is to turn on the documented trace flag 2528.

7) The speed of the disks where tempdb is placed

Running CHECKDB against a VLDB uses lots of memory for internal state and for VLDBs the memory requirement usually exceeds the amount of memory available to SQL Server. In this case, the state is spooled out to tempdb and so the performance of tempdb can be a critical factor in CHECKDB performance. See this post for more details of this and how CHECKDB can run out of disk space if tempdb is too small.

8) The complexity of the database schema

This can have a really high impact on the run-time of CHECKDB because it impacts the amount of CPU that CHECKDB requires. For example, the most expensive checks that CHECKDB does are for non-clustered indexes. It needs to check that each row in a non-clustered index maps to exactly one row in the heap or clustered index for the table, and that every heap/clustered index row has exactly one matching row in each non-clustered index. Although there's a highly efficient algorithm for doing this, it still takes around 30% of the total CPU that CHECKDB uses!

There are a bunch of other checks that are only done if the features have been used in the database - e.g. computed column evaluation, links between off-row LOB values, Service Broker, XML indexes, indexed views - so you can see that empirical factors along aren't enough to determine the run-time.

9) Which options are specified

This is almost the same as #7 in that by specifying various options you're limiting what checks CHECKDB actually performs. For instance, using the WITH NOINDEX option will turn off the non-clustered index checks that I described in #7 and using the WITH PHYSICAL_ONLY option will turn off all logical checks, vastly decreasing the run-time of CHECKDB and making it nearly always IO-bound rather than CPU-bound (in fact this is the most common option that DBAs of VLDBs use to make the run-time of CHECKDB manageable).

One thing to be aware of - if you specify any repair options, CHECKDB always runs single-threaded, even on a multi-proc box on Enterprise Edition.

10) The number and type of corruptions that exist in the database

Again, this is similar to #7 and #8. If there are any corruptions present, there may be extra checks triggered to try to figure out more details of the corruptions. For instance, for the non-clustered index checks, the algorithm is tuned very heavily for the case when there are no corruptions present (the overwhelming majority of cases considering the millions of times CHECKDB is run every day around the world). When a non-clustered index corruption is detected, a more in-depth algorithm has to be used to figure out exactly where the corruption is, which involves re-scanning a bunch of data and so taking a bunch more time. There are a few other algorithms like this too.

Summary

So you can see that there's no simple answer.

Last week at SQL Connections someone said that CHECKDB’s output is ‘useless’.  Given that I wrote CHECKDB for SQL Server 2005 I was only mildly offended :-) But there’s a lot of truth in that statement – the error messages from CHECKDB do tell you exactly what’s corrupt in the database but they’re not exactly easy to read and understand unless you’re intimately familiar with the on-disk structures used to store a SQL Server database. It’s pretty easy to understand the output if there’s only one or two error messages, but once you get past 10 or so it becomes extremely difficult to figure out what the actual problem is for normal people (i.e. not CHECKDB geeks like me, Ryan or a handful of senior people in Product Support)

So, how are you supposed to figure out what’s wrong? In this post I want to run through a few tips and tricks you can use.

  • If the recommended repair level (at the very bottom of the output) is REPAIR_REBUILD then only non-clustered indexes are damaged. This meanscyou don’t need to run repair (which needs the database in single-user mode) or restore from backups. You can just manually rebuild the damaged indexes to fix the problem, possibly even online if you're running Enterprise Edition and the index doesn't have any LOB columns.
  • There are a couple of cases where even if it’s only non-clustered indexes that are damaged, REPAIR_ALLOW_DATA_LOSS may still be the recommended repair option. In this case, check the index ID in all the error messages – if all index IDs are 2 or higher, then it’s only non-clustered indexes that are damaged and you can rebuild them yourself instead of running repair/restore.
  • Figure out a list of tables that are affected by looking at the summary messages about how many corruptions were found per table. If each of the tables is isolated on a separate filegroup then these filegroups can be taken offline and restored individually, possibly without interrupting the application workload. If there's no backup, then you can run DBCC CHECKTABLE with repair, for a faster repair operation than running a full CHECKDB.
  • Figure out the list of damaged pages. There are a number of error messages that could point to a damaged page – usually 8928 or 2537 – and then you can choose to do single-page restores from your backups to minimize downtime, possibly even online in Enterprise Edition.
  • Are there any errors that CHECKDB can’t repair?  If so, you have no choice but to restore from backups or to extract as much info as possible into a new database. Things to look for that mean repair won’t be able to fix everything are:
    • CHECKDB stops early and complains about system table pre-checks failing (errors 7984 – 7988 inclusive)
    • CHECKDB reports any metadata corruption (8992, 8995 errors)
    • CHECKDB reports any errors on PFS page headers (8939, 8946 errors with a possible 8998 error as well)

While I was at Microsoft I wrote two 200-page+ documents detailing all the CHECKDB error messages for SQL Server 2000 and 2005 – the team says they should all be translated into Books Online entries by next summer – a bunch of them are there already. In the meantime, I hope the hints above will take a little of the mystery out of CHECKDB’s output.

This is a question that came up yesterday in out Disaster Recovery class so I'm typing it up in between attending sessions at Microsoft Day here at SQL Connections. It's an interesting experience watching all the MS speakers walking around in the distinctive blue shirts and no longer having to wear one myself.

The question is the following - why does DBCC CHECKDB terminate with an out-of-space error in SS2005? A customer had a 500GB database spread over 17 LUNs on a SAN, with each LUN having only 5GB free. There's a heavy concurrent workload running while CHECKDB is running and very often it doesn't compelte but instead stops with an error. What's going on?

The reason for this is the way that CHECKDB gets a transactionally consistent view of the database. In a nutshell, it creates a internal database snapshot of the database and then checks the database snapshot (you can read more about this in my previous post detailing all the steps of CHECKDB). A database snapshot needs to have one snapshot file for each file in the source database. In the case of CHECKDB, the snapshot files are created as alternate-streams of the existing database files - i.e. stored on the same disk volume - and you have no control over this. This means any changes to the database while CHECKDB is running will cause these alternate streams to grow.
 
If there's a significant concurrent workload while CHECKDB is running, then these alternate streams can get very big very quickly. In the case described in the question, the workload caused one of the alternate streams to take up all available space on the LUN and then it ran out of space. When this happens the snapshot is no longer valid and so CHECKDB has to stop.
 
There are two solutions to this. The obvious first one is to run CHECKDB in a period with low concurrent workload. That's not feasible for many 24x7 shops so the better solution is to create your own database snapshot (so you can control the placement of it) and then run CHECKDB on that. This is no different than having CHECKDB create its own snapshot.
 
There's another reason that CHECKDB may stop with an out-of-space error.  CHECKDB has to validate the contents of the whole database. As such it needs to store info about things it has seen at one point during database scannig so that it can match that info against things it sees at later points during database scanning. These bits of info are stored in an in-memory worktable. In many cases though, the amount of info being stored exceeds the available memory and the worktable needs to spill over to tempdb. (For example, running CHECKDB against a 1TB database on a machine with 16GB of memory - the amount of info that CHECKDB needs to store the intermediate pieces of info will likely exceed the memory available to SQL Server). So - if tempdb isn't big enough to store the worktable, it has to grow. If it can't grow, then CHECKDB will fail because it needs the worktable to operate.
 
You can check in advance how much (estimated) space will be needed by CHECKDB by running it using the WITH ESTIMATEONLY option and then making sure that tempdb is sized accordingly to accomodate the CHECKDB requirements AND the regular tempdb space requirements. See Capacity Planning for tempdb in Books Online for more info.

(Be sure to join our community to get our monthly newsletter with exclusive content, advance notice of classes with discount codes, and other SQL Server goodies!)

(If you're having to read this because no-one is managing your SQL Server, consider contacting us to manage them for you - see here for details.)

This is a follow-on article from two posts:

People get themselves into situations where they have no backups (or damaged backups) and the data or log files are damaged such that the only way to access the database is with EMERGENCY mode. In these situations, prior to SQL Server 2005, there was no documented or supported way to fix a database while in EMERGENCY mode - the only guidance could be found on the Internet or from calling Product Support and paying for help. The sequence of events was:

  1. Hack the system tables to get the database into 'emergency' mode.
  2. Use the undocumented and unsupported DBCC REBUILD_LOG command to build a new transaction log.
  3. Run DBCC CHECKDB with the REPAIR_ALLOW_DATA_LOSS option to fix up corruptions in the data files - both those that may have caused the issue, and those caused by rebuilding the transaction log (e.g. because an active transaction altering the database structure was lost).
  4. Figure out what data was lost or is transactionally inconsistent (e.g. because a transaction altering multiple tables was lost) as far as your business logic is concerned
  5. Take the database out of emergency mode
  6. And then all the other stuff like root-cause analysis and getting a better backup strategy

I decided to add a new feature to SQL Server 2005 called EMERGENCY mode repair that will do steps 2 and 3 as an atomic operation. The reasons for this were:

  • Much of the advice of how to do this on the Internet missed steps out (particularly missing step 3!)
  • The DBCC REBUILD_LOG command was unsupported and undocumented and we didn't like advising customers to use it
  • Adding a documented last-resort method of recovering from this situation would reduce calls to Product Support - saving time and money for customers and Microsoft.

So, when in EMERGENCY mode, you can use DBCC CHECKDB to bring the database back online again. The only repair option allowed in EMERGENCY mode is REPAIR_ALLOW_DATA_LOSS and it does a lot more than usual:

  • Forces recovery to run on the transaction log (if it exists). You can think of this as 'recovery with CONTINUE_AFTER_ERROR' - see this post for more details on the real CONTINUE_AFTER_ERROR option for BACKUP and RESTORE. The idea behind this is that the database is already inconsistent because either the transaction log is corrupt or something in the database is corrupt in such a way that recovery cannot complete. So, given that the database is inconsistent and we're about to rebuild the transaction log, it makes sense to salvage as much transactional information as possible from the log before we throw it away and build a new one.
  • Rebuild the transaction log - but only if the transaction log is corrupt.
  • Run DBCC CHECKDB with the REPAIR_ALLOW_DATA_LOSS option.
  • Set the database state to ONLINE.

It's a one-way operation and can't be rolled back. I always advise taking a copy of the database files before doing this in case something goes wrong or there are unrepairable errors. And if it does? Probably time to update your resume for not having a water-tight backup and disaster-recovery strategies in place. Saying that, I've never seen it fail. I can think of some pathalogical cases where it would fail though (involving the file system itself having problems) but that's really unlikely.

Let's walk-through an example of using it. I'm assuming there's a database called emergencydemo that's in the same state as at the end of the Search Engine Q&A #4 blog post - the database has no log file, is in EMERGENCY mode and the salaries table is corrupt.

First off I'll try bringing the database online, just to see what happens:

ALTER DATABASE emergencydemo SET ONLINE;
GO

File activation failure. The physical file name "C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\emergencydemo_log.LDF" may be incorrect.
The log cannot be rebuilt because the database was not cleanly shut down.
Msg 945, Level 14, State 2, Line 1
Database 'emergencydemo' cannot be opened due to inaccessible files or insufficient memory or disk space. See the SQL Server errorlog for details.
Failed to restart the current database. The current database is switched to master.
Msg 5069, Level 16, State 1, Line 1
ALTER DATABASE statement failed.

The first message makes sense - the database knows it needs to be recovered because it wasn't cleanly shut down, but the log file simply isn't there. The second message is from the new feature in 2005 that will automatically create a log file if one is missing on startup or attach - as long as the database was cleanly shut down. The 945 and 5069 errors are self-explanatory but notice that the database gets switched to master underneath us. This has bitten me several times in the past.

Well, I expected that not to work. Let's run emergency-mode repair:

DBCC CHECKDB (emergencydemo, REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS;
GO

Msg 945, Level 14, State 2, Line 1
Database 'emergencydemo' cannot be opened due to inaccessible files or insufficient memory or disk space. See the SQL Server errorlog for details.

Hmm - looks like the failed ALTER DATABASE statement did change the state - but what to?

SELECT state_desc FROM sys.databases WHERE name='emergencydemo';
GO

state_desc
------------------------------------------------------------
RECOVERY_PENDING

That makes sense I guess. Ok - back to emergency mode and run repair:

ALTER DATABASE emergencydemo SET EMERGENCY;
GO
DBCC CHECKDB (emergencydemo, REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS;
GO

Msg 7919, Level 16, State 3, Line 1
Repair statement not processed. Database needs to be in single user mode.

EMERGENCY mode is not SINGLE_USER mode - a database needs to be in SINGLE_USER mode for repair to run and EMERGENCY mode allows multiple connections from members of the sysadmin role. You can set SINGLE_USER mode as well as EMERGENCY mode - however, the sys.databases field state_desc will still just say EMERGENCY.

ALTER DATABASE emergencydemo SET SINGLE_USER;
GO
DBCC CHECKDB (emergencydemo, REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS;
GO

File activation failure. The physical file name "C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\emergencydemo_log.LDF" may be incorrect.
The log cannot be rebuilt because the database was not cleanly shut down.
Warning: The log for database 'emergencydemo' has been rebuilt. Transactional consistency has been lost. The RESTORE chain was broken, and the server no longer has context on the previous log files, so you will need to know what they were. You should run DBCC CHECKDB to validate physical consistency. The database has been put in dbo-only mode. When you are ready to make the database available for use, you will need to reset database options and delete any extra log files.

This time it worked. First of all we get the same error as if we tried to bring the database online - that's from the code that's trying to run 'recovery with CONTINUE_AFTER_ERROR' on the transaction log. Next we get a nice long warning that the transaction log has been rebuilt and the consequences of doing that (basically that you need to start a new log backup chain by taking a full backup). If there had been any corruptions we'd have seen the usual output from DBCC CHECKDB about what errors it found and fixed. There's also a bunch of stuff in the error log

2007-10-02 17:21:20.95 spid51      Starting up database 'emergencydemo'.
2007-10-02 17:21:20.96 spid51      Error: 17207, Severity: 16, State: 1.
2007-10-02 17:21:20.96 spid51      FileMgr::StartLogFiles: Operating system error 2(The system cannot find the file specified.) occurred while creating or opening file 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\emergencydemo_log.LDF'. Diagnose and correct the operating system error, and retry the operation.
2007-10-02 17:21:20.96 spid51      Starting up database 'emergencydemo'.
2007-10-02 17:21:21.10 spid51      Starting up database 'emergencydemo'.
2007-10-02 17:21:21.18 spid51      Warning: The log for database 'emergencydemo' has been rebuilt. Transactional consistency has been lost. The RESTORE chain was broken, and the server no longer has context on the previous log files, so you will need to know what they were. You should run DBCC CHECKDB to validate physical consistency. The database has been put in dbo-only mode. When you are ready to make the database available for use, you will need to reset database options and delete any extra log files.
2007-10-02 17:21:21.18 spid51      Warning: The log for database 'emergencydemo' has been rebuilt. Transactional consistency has been lost. The RESTORE chain was broken, and the server no longer has context on the previous log files, so you will need to know what they were. You should run DBCC CHECKDB to validate physical consistency. The database has been put in dbo-only mode. When you are ready to make the database available for use, you will need to reset database options and delete any extra log files.
2007-10-02 17:21:21.99 spid51      EMERGENCY MODE DBCC CHECKDB (emergencydemo, repair_allow_data_loss) WITH no_infomsgs executed by ROADRUNNERPR\paul found 0 errors and repaired 0 errors. Elapsed time: 0 hours 0 minutes 1 seconds.

Note that the usual error log entry from running DBCC CHECKDB is preceded by 'EMERGENCY MODE' this time.

Checking the database state:

SELECT state_desc FROM sys.databases WHERE name='emergencydemo';
GO

state_desc
------------------------------------------------------------
ONLINE

we find that it's been brought back online again because everything worked. It's still SINGLE_USER though so let's make it MULTI_USER and see what happened to our table:

ALTER DATABASE emergencydemo SET MULTI_USER;
GO
USE EMERGENCYDEMO;
GO
SELECT * FROM salaries;
GO

FirstName            LastName             Salary
-------------------- -------------------- -----------
John                 Williamson           10000
Stephen              Brown                0
Jack                 Bauer                10000

(3 row(s) affected)

And of course its still corrupt - because even though the transaction log was rebuilt and repaired, the original transaction that changed the salary to 0 never got a chance to rollback becuase I deleted the transaction log (in the previous post).

Now remember, you should only use this as a last resort, but if you do get yourself into trouble, you know there's a command that should be able to help you.

Quickie this morning - I was surfing some MVP blogs this morning and came across a series of posts by Allen White around using VB Script and SMO to automate regular maintenance jobs. His latest post contains code to regularly run consistency checks and more, and report any failures. Very nice - check it out!

On the Storage Engine blog last year I started two series that I got distracted from - one on CHECKDB and one on fragmentation. With the benefit of hindsight, I'm going to start 3 series on my new blog here - one on CHECKDB ('CHECKDB from every angle'), one on indexes ('Indexes from every angle'), and one on internals ('Inside the Storage Engine'). The first few posts of each will be updated reposts of a few from the previous blog, just for completeness.

I realize that I promised to finalize the DBCC whitepaper this summer - well, that's been delayed a little by the wedding and other stuff. Once I've done a few blog posts in this series, I'll tie everything up into a PDF and add a link to it. This post is a good start - 6500 words. Phew.

First up is describing just what CHECKDB does. As with all things related to DBCC, this topic has its share of misinformation. In this post I'll set the record straight by running through all the stages of CHECKDB in SQL Server 2000 and 2005. This was originally split into 5 posts but I'll bring them all together today, add some more info, and make things a little clearer. There are a lot of terms in this post that may be new to some of you - I'll cover them in the internals series over time.

1. Get a transactionally consistent view of the database

CHECKDB needs a consistent view of the database. Why? Well, usually its running on a live database with in-flight transactions. It needs to read and analyze the whole database but it can't do it instantaneously so it has to take steps to ensure that what it reads is transactionally consistent.

Here's an example. Consider a transaction to insert a record into a table that is a heap and has a non-clustered index, with a concurrent CHECKDB that doesn't enforce a consistent view. The table record is inserted first, and then the non-clustered index record is inserted (that's just the way the operations are split up in the database engine).  Because this hypothetical CHECKDB doesn't have a consistent view, it could read the record in the table but not that in the index, conclude that the non-clustered index is out of sync with the table and flag an 8951 missing-index-record error.

How could this happen? Depending on the order in which the pages are read by the CHECKDB process, the page on which the new non-clustered index record should go could be read before the page on which the new heap record should go. If the index page read happens before the insert transaction, and the table page read happens after the insert transaction, then we see the inconsistent state.

The easy way to get the consistent state is through locking, which is what SQL Server 7.0 did. You can still do that in SQL Server 2000 and 2005 using the WITH TABLOCK option. Another way to do it is to put the database into single-user or read-only mode. However, locking causes workload blocking and setting a databases into read-only or single-use mode is essentially taking it (and your application) offline.

For SQL Server 2000 we came up with a neat way to get the consistent view and be able to run CHECKDB online - post-read log analysis. In a nutshell, after we've read through all the database pages, we read the transaction log to make sure we didn't miss any changes that occured while the full-database read was happening. This is more complex than it sounds - here's what it does:

  • When CHECKDB starts, it turns on 'replication-style' logging - where everything is fully logged - and prevents log truncation. Even if you're in full recovery mode and you're taking log backups - the log will not truncate until CHCEKDB releases it. This can be a problem in high-volume systems with limited log space.
  • Once CHECKDB has read through all the allocated pages in the database, it reads through all the active transaction log - from the LSN of the 'BEGIN TRAN' log record of the oldest transaction that is active at the time the database scan started, to the LSN at the time the database scan stops.
  • Log records from transactions that commit during that time are used to generate REDO facts. (A CHECKDB 'fact' is a piece of information about something in the database - e.g. page (1:345) is allocated to IAM chain 865398993 - I'll explain how and why we use these later in the series.) A REDO fact either reinforces something we've already seen (in which case it is ignored) or provides information on something we haven't seen. For example:
    • a page allocation log record would produce a REDO fact of 'page X is allocated to IAM chain Y'
    • a row insertion record (such as from the index example above) would produce a REDO fact of 'a row with these index keys was inserted into page A of table B, index C at slot position S'
  • Log records from transactions that rollback or don't commit during that time are used to generate UNDO facts. An UNDO fact either cancels something that we've already seen (e.g. the first half of the index example above, if it didn't commit while CHECKDB was doing the database scan) or reference something we haven't seen (in which case we ignore it). For example:
    • page allocation log record would produce an UNDO fact of 'page X was deallocated from IAM chain Y'. So, if we'd seen the actual page after it was allocated, we would have generated a 'page X is allocated to IAM chain Y' fact, and so this UNDO fact would cancel out that piece of information.
    • a row insert record would produce an UNDO fact of 'a row with these index keys was removed from page A of table B, index C at slot position S'

As you may have realized, what we're essentially doing is our own log recovery, inside CHECKDB, but without actually affecting the database. This can get excruciatingly complicated (e.g. having to generate UNDO facts from the compensation log records that wrap sections of a cancelled index rebuild transaction...) I spent too many days of 2000 working out what was going on in the log and making tweaks to this code. However, it worked really well and we had online CHECKDB finally. The kudos for writing most of that stuff goes to Steve Lindell - while he was busy writing the online code I was up to my eyes writing DBCC INDEXDEFRAG (another story).

There are some problems with this mechanism however.

  • As I mentioned above, the log is switched to everything being logged and is prevented from being truncated until CHECKDB has read and processed the active portion.
  • The log-reading part of CHECKDB is single-threaded. On a large, heavily-loaded multi-proc system, this can mean that the log-reading portion of the CHECKDB process can take a very long time as the single CPU struggles to catch-up with all the other CPUs generating lots of transaction log.
  • There are a few, rare pathalogical cases where not absolutely everything I needed to work out what had happened was logged in the transaction log. This meant I essentially had to make an educated guess - and the guess isn't always right. So - sometimes CHECKDB could give you false indication of corruption. This led to a certina level of paranoia about CHECKDB in SQL Server 2000 - people sometimes ran it twice if any errors were reported the first time round.

Back in late 2000, it became apparent that with all the new features we were planning for SQL Server 2005, including some changes to the transaction log to allow for fast recovery and deferred transactions and stuff like versioning and online index build, the transaction log analysis wasn't going to work any more. While it had given us the holy-grail of online consistency checks, with all the added complications of SQL Server 2005 features, it would become unfeasible to maintain and get right.

For SQL Server 2005, CHECKDB uses an internal database snapshot to provide the required transactional consistency. The in-depth details of database snapshots are beyond the scope of this post. A few things to be aware of:

  • Database snapshots use NTFS sparse-file technology (so that a database snapshot does not occupy the same amount of disk space as the source database)
  • Database snapshots use copy-on-write technology. A page is only copied from the source database to the database snapshot when it changes in the source database - and it only needs to be copied once - the first time it's changed. The only pages stored in a database snapshot are those that have changed since the snapshot was created.
  • Database recovery is run when the snapshot is created, but recovery is run into the snapshot, not the source database. This means any pages that need to be changed as part of recovery are read from the source database, altered by recovery, and then written into the snapshot.
  • Books Online has some more info about their use by DBCC - look up 'DBCC Statements' in the index, and go to the 'DBCC Internal Database Snapshot Usage' section.

By moving to database snapshots we changed to using mainline server code to get our transactionally consistent view, rather than our own version of the transaction log recovery code. This vastly reduced the complexity of the code and meant that someone else was responsible for finding and fixing its bugs during development.

There are a few slight gotchas (all documented) with this approach:

  • Sparse files are only available with NTFS so online checks can't be run on databases stored on FAT or FAT32 volumes (in fact database snapshots cannot be created on these databases either)
  • Recovery cannot be run on TEMPDB, so online checks can't be run on TEMPDB (CHECKDB automatically switches to locking in that case)

So when CHECKDB starts in SQL Server 2005, the first thing it does is work out whether it can run online - if so it creates a hidden database snapshot of the source database (i.e. CHECKDB's target database). As the database snapshot is hidden, there is no control over where the snapshot files are placed - in fact they are created as alternate streams of the files comprising the source database. That could cause a problem - depending on the transaction load concurrent with CHECKDB, the hidden database snapshot can grow in size. This means its possible for the disk to run out of space on high volume systems, which means the database snapshot (or database filaes) cannot grow - bringing your workload and CHECKDB to a halt. If this does become a problem, you can easily create your own database snapshot and run CHECKDB on that - it's exactly the same as running CHECKDBnormally and letting it create its own database snapshot.

Once the database snapshot is created, CHECKDB is guaranteed a transactionally consistent view of the database and can run the various check algorithms against the database snapshot. Although by creating a database snapshot CHECKDB is checking the database as it was at some point in the past, that point is the start time of the CHECKDB, just as it was with the log analysis mechanism in SQL Server 2000 - so there's no real difference.

2. Primitive checks of critical system tables

This stage is in SQL Server 2000 and 2005. First of all, what are critical system tables? These are the system tables that hold Storage Engine metadata. Without these CHECKDB have no idea where any data was stored in the database files or how to interpret records.

In SQL Server 2000, the critical system tables are:

  • sysindexes
  • sysobjects
  • syscolumns

These tables have to be checked first because CHECKDB uses the metadata they contain to access all the other tables and indexes in the database. These tables are freely queryable so poke about and see what's stored in there. In SQL Server 2005 these table names still exist, but they're actually views over new system tables, maintained for backwards compatibility.

In SQL Server 2005, the metadata layer has been rewritten and the critical system tables are:

  • sys.sysallocunits
  • sys.syshobts
  • sys.syshobtcolumns
  • sys.sysrowsets
  • sys.sysrowsetcolumns

More on allocation units, hobts, and rowsets in the internals series - for now you can assume they serve the same function as the three critical system tables in SQL Server 2000. You can't see these new system tables because they're 'hidden' - the parser won't allow them to be bound to in a query (except in some DBCC commands like CHECKTABLE). Try running the following to see what I mean:

SELECT * FROM sys.sysallocunits;

GO

The primitive checks are designed to check that internal queries on the metadata tables won't throw errors. Each of the critical system tables has a clustered index. The primitive checks verify that the leaf-level data pages of the clustered indexes are valid. For every one of these pages, the following is done:

  • Read and latch the page (a latch is a lightweight internal version of a lock).  This makes sure that there aren't any IO problems with the page such as a torn-page or bad page checksum and ensures that the page can be read into the buffer pool correctly. This is the most common cause of failure of the primitive system table checks and results in error 8966, which in SQL Server 2000 could look something like:

Server: Msg 8966, Level 16, State 1, Line 1
Could not read and latch page (1:33245) with latch type SH. sysobjects failed.

  • Audit the page. This is a series of checks of the page structures which I'll cover in a seperate post. If these pass, the page looks like a SQL Server page of the type its supposed to be.
  • Check the basic page linkage. Pages in each level of a clustered index are linked together in a doubly-linked list to allow range scans to work. At this stage we only check the left-to-right linkage to ensure the next page in the logical ordering actually exists.
  • Check the page linkage for loops. This is simple to do - have two pointers into the page linked-list with one advancing at every step and one advancing at every second step. If they ever point to the same thing before the faster-advancing pointer reaches the right-hand side of the leaf level then there's a loop. Its important that there are no linkage loops otherwise a range scan may turn into an infinite loop. I've never seen this occur in the field.

Any error found at this stage cannot be repaired so you must restore from a backup. This is because the repair would have to deallocate the page, effectively deleting metadata for a bunch of tables and indexes. As databases get larger and more complex (thousands of tables and indexes), the percentage of pages that comprise these critical system tables rises and so the chance of a hardware problem corrupting one of these pages also rises - I see several of these a month on the forums. Without a backup, the only alternative is to try to export as much data as you can - not good.

If all the critical system table clustered index leaf-level pages are ok then CHECKDB knows it has solid enough metadata on which to base the next set of checks.

3. Allocation checks

This stage is in SQL Server 2000 and 2005. These checks verify the various structures (IAM pages, IAM chains/allocation units, GAM/SGAM pages, PFS pages) that track the status of pages and extents that have been allocated within a database. I'll describe how we go about collecting information from the various pages and then describe what some of the actual checks are.

The allocation checks are very fast (orders of magnitude faster than the logical checks that follow in the next stage) because the number of database pages that have to be read is very small.  The algorithm for gathering allocation data is as follows:

  • For each file in each online filegroup in the database (except transaction log files):
    • Read all PFS pages (this provides a bitmap showing all IAM pages, plus another one showing all mixed pages).
    • Read the GAM pages (this provides bitmaps of all allocated extents).
    • Read the SGAM pages (this provides bitmaps of all mixed extents with at least one unallocated page).
    • Read the DIFF_MAP pages (a 'differential bitmap' page shows which extents in the GAM interval have been modified since the last full or differential backup - a differential backup only needs to backup those extents marked modified in the various DIFF_MAP pages). This is just to make sure the pages can be read.
    • Read the ML_MAP pages (a 'minimally-logged bitmap' page shows which extents in the GAM interval have been modified in bulk-logged recovery mode since the last log backup - a log backup must also backup all such extents to ensure that all changes to the database have been backed up. This can make the log backup quite large (although the log itself stays much smaller) - but that's a topic for another blog post. Again, this is just to make sure the pages can be read.
    • Read all IAM pages. This provides:
      • A list of all the mixed pages in the file, and by derivation, a list of all mixed extents in the file (remember that the first IAM page in an IAM chain/allocation unit contains an array to hold up to 8 mixed pages for the IAM chain/allocation unit it represents
      • A list of all the valid IAM pages in the file
      • A list of all the allocated dedicated extents in the file
      • Linkage information for IAM chains
  • After all the per-file stuff, read the Storage Engine metadata. This provides: 
    • Information about the root of each IAM chain (the first IAM page in the IAM chain). In SQL Server 2000, this is stored in sysindexes. In SQL Server 2005, this is stored in the sys.sysallocunits hidden system table.
    • Information about IAM chains currently waiting to be 'deferred-dropped'. Deferred-drop is a SQL Server 2005 optimization that prevents a transaction from running out of lock memory while dropping an IAM chain. It is the process by which an IAM chain with > 128 extents that is dropped - by dropping/rebuilding an index or dropping/truncating a table - does not have its actual pages and extents deallocated until after the transaction has committed. The IAM chain is unhooked from sys.sysallocunits though and hooked into an internal queue - if CHECKDB didn't scan that queue too as part of the allocation checks, it might see all kinds of inconsistencies with the various allocation bitmaps.

So, now CHECKDB has a whole bunch of allocation data that its collected and it needs to make sense of it all to ensure the allocation structures are correct. Here's a non-exhaustive list of checks that are done with this data:

  • Check that each extent is either allocated to:
    • the GAM page for the GAM interval, or
    • the SGAM page for the GAM interval, as a non-full mixed extent, or
    • exactly one IAM page that covers the GAM interval, or
    • to none of the bitmap pages, but all pages in that extent must be allocated to IAM pages as mixed pages
      • this could result in an 8903 (GAM and SGAM), 8904 (multiple IAMs), or 8905 (no page) errors depending on the combination of bitmaps that have the extent allocated
  • Check that all pages marked as being IAM pages in PFS pages really are IAM pages when they're read
  • Check that all pages marked as being mixed pages in PFS pages appear somewhere in a mixed page array on an IAM page
  • Check that each mixed page is only allocated in a single IAM page
  • Check that the IAM pages in an IAM chain have monatonically increasing sequence numbers
  • Check that the first IAM page in an IAM chain has a reference from a row in sys.sysallocunits (or sysindexes if on SQL Server 2000)
  • Check that no two IAM pages within the same IAM chain map the same GAM interval
  • Check that all IAM pages within an IAM chain belong to the same object/index/partition
  • Check that the linkages within an IAM chain are correct (no missing pages for instance)
  • Check that all IAM/GAM/SGAM pages that map the final GAM interval in a file do not have extents marked allocated that are beyond the physical end of the file

Any errors found here will require REPAIR_ALLOW_DATA_LOSS to repair and some of the repairs are very complicated (e.g. multiply-allocated extents) - topic for a future blog post.

So, the allocation checks lay the next foundation level over the system table primitive checks and then CHECKDB is ready to move on to the logical checks.

4. Logical checks

This section really has two parts - peform all the logical checks on the critical system tables, and then perform all the logical checks on all the other tables in the database.

If any errors are found in the critical system tables and repair is not specified, or repair is specified, but not all the errors can be repaired, then the CHECKDB finishes. An example of an unrepairable system table error is something that would require deleting data from one of the system tables - e.g. a corrupt key value in a clustered index data page of sys.sysallocunits (remember that this is the actual hidden table I'm talking about, not the sys.allocation_units catalog view you may have seen or used).

So if the critical system tables are clean, all the tables in the database are checked. This includes indexed views and primary XML indexes (which are both stored as clustered indexes - and as far as the Storage Engine is concerned are objects in their own right - its the Relational Engine that knows that they're not really separate objects).

The following checks are performed:

  1. Validate each table's storage engine metadata
  2. Read and check all data, index and text pages, depending on the page type
  3. Check all inter-page relationships
  4. Check the page header counts in each page
  5. Perform any necessary repairs (if a repair level was specified)

Let's look at the first four stages in more detail - there's too much to go into about repairs in this post.

4.1 Validate each table's storage engine metadata

The Storage Engine metadata is what tells CHECKDB how to crack open records on pages in particular rowsets - so if there's something wrong with it then CHECKDB will generate a bunch of misleading errors or, worse yet, miss some errors. Here's roughly what happens while the metadata is parsed for a single table (for SQL Server 2005):

  • Loop through all the indexes, and then all the rowsets of each index, building a list of known allocation units for the index (including all the DATA, LOB, and SLOB allocation units). This allows reverse lookups when CHECKDB reads a page, because the information stamped on the page is the allocation unit, it needs a very fast way to convert between an allocation unit and an index/object ID.
  • Make sure CHECKDB skips any indexes that are in the middle of being built/rebuilt online, because they will not have a complete set of data.
  • Build a list of computed columns and generate the necessary code to enable the column values to be recomputed.
  • Build a list of columns that are used in non-clustered indexes
  • Make sure the various USED, DATA, RSVD counts are not negative (see the BOL for DBCC CHECKDB for an explanation of how this can happen in versions prior to SQL Server 2005)
  • Figure out what each kind of column is (e.g. a regular column, a generated uniquifier, a dropped column)
  • Build a series of mappings to allow conversion between column IDs at different levels of Storage Engine abstraction
  • Check that the Relational Engine and Storage Eengine nullability flags for a column agree
  • Make sure the columns counters in metadata match what has just been seen

4.2 Read and check all data, index and text pages

No matter what type a page is, the page is audited and then all the records on it are audited.

Page audit first checks for IO errors when the page is read (e.g. page checksums). If there are some, then the page is not processed any further. This is what can lead to errors like 8976 being reported, for example:

Table error: Object ID 132765433, index ID 1, partition ID 72057594038321152, alloc unit ID 72057594042318848 (type DATA). Page (1:3874) was not seen in the scan although its parent (1:3999) and previous (1:3873) refer to it. Check any previous errors.

Then it checks for page header correctness and that the page has an appropriate type for the allocation unit its in (e.g. a DATA page should not be found in an allocation unit for a non-clustered index)

Record audits include checking the various fields in the record header and that the various offsets in the record make sense (e.g. the offset to the variable length columns section of the record should not point off the end of the record. I'll do a detailed post on record formats and cracking at the start of the internals series.

The more complex checks that are done per-page depend on what type the page is. As an example, here's what is done for a DATA page in the leaf level of a clustered index (excluding the inter-page relationships - I'll list those in the next section):

  • Records in the page must be strictly ordered by the defined keys of the index (although the records themselves aren't necessarily stored in sorted order in the data portion of the page, accessing the records through the slot array must yield them in the correct order)
  • No two records can have duplicate key values (remember that non-unique indexes have a hidden, automatically-generated uniquifier column added to the key - to make or extend the composite key - so that record uniqueness is guaranteed)
  • If the index is partitioned, each record is run through the user-defined partitioning function to ensure its stored in the correct partition
  • All the complex columns in each record are checked:
    • Complex columns are those storing legacy text or LOB values (text, ntext, image, XML, nvarchar(max), varchar(max), varbinary(max)) or in-row pointers to variable length columns that have been pushed off-row in rows that are longer than 8060 bytes
    • The column is checked to make sure it's storing the right kind of data - either the value itself or a text pointer or some kind of in-row root containing pointers to portions of an off-row value.
    •  The linkages between what is stored in-row and the off-row values stored in other pages are eventually checked too
  • Check computed columns:
    •  If the column is persisted (either because it's defined as a persisted computed column or because its used as a non-clustered index key), its value is recomputed and checked against the persisted value
    • This is also important when we come to do the non-clustered index cross-checks (see below) - as any discrepancy in the stored column values will cause mismatches
  •  Data purity checks
    • The column value is checked to ensure its within the bounds for its data-type (e.g. the minutes-past-midnight portion of the internal representation of a datetime value cannot be greater than 1440 - 24 hours x 60 minutes)
  • Non-clustered index cross-checks
    • This is probably my favorite part of CHECKDB and is one of the most complicated bits of code - I've rewritten it twice (for 2000 to make the locking work in the final deep-dive part of the algorithm and for 2005 to remove the locking and cope with all the new features that can affect indexes). In fact, it's been rewritten again for SQL Server 2008 to change the record matching algorithm - more details latrer.
    • What CHECKDB is trying to do is make sure that each record in a heap or clustered index has exactly one matching record in each non-clustered index, and vice-versa. The brute-force (n2 complexity) way to do this is to do a physical lookup of all the matching rows whenever one row in the relationship is read - but that's incredibly time consuming so instead there's a fast algorithm to detect problems.
    • Imagine a table defined by the following DDL and with a single row:

CREATE TABLE t (c1 INT, c2 char(10), c3 varchar(max))

CREATE INDEX index1 ON t (c1)

CREATE INDEX index2 ON t (c2) INCLUDE (c3)

    • Each row in the heap has to have two matching non-clustered index rows, one in each of index1 and index2. But how can CHECKDB tell without doing the direct lookup or having to store tremendous amounts of state? It uses a hash-bitmap algorithm.
      • Imagine a large bitmap - say half a million bits. Initially all the bits are 0.
      • CHECKDB happens to read the heap page first, with one record in it. It knows exactly what the two non-clustered index records should look like, so it generates them - exact byte-for-byte matches of what the real non-clustered index records should be. Then, for each generated non-clustered index record it computes a hash value of the record, maps the hash value to a bit in the bitmap and flips the bit. So, we have now have two bits set to 1 in the bitmap.
      • CHECKDB then reads the two non-clustered index pages, each with a single record on. For each record, just hash the record and flip the corresponding bit in the bitmap.
      • The idea is that the bit-flips should cancel each other out (as the real and generated non-clustered index records should be exactly the same and hash to the same value) and the bitmap should be left with all zeroes at the end of the checks.
    • Taking the view that corruptions are magnitudes rarer than clean databases, CHECKDB went a step further and allowed the checks for multiple tables to use the same bitmap. If you think about it, this won't cause any problems, even if two sets of records map to the same bit in the bitmap - as long as the number of bit-flips is a power of 2 (i.e. each record really does have its correct matching record) then there should be no problem.
    • Here's the catch - what happens when there's a bit left on in the bitmap? Well, this is where the trade-off comes into play. If there's a bit left on, CHECKDB can't tell which records in which table or index mapped to it so it has to re-scan the tables and indexs that used the bitmap to see which records map to the bit. For every one it find, it actually does the physical lookup of the matching record and then does a comparison of all the columns, including any LOB columns used as INCLUDE'd columns in non-clustered indexes. This process is called the deep-dive and can add a significant amount to the run-time of CHECKDB if it occurs.
    • Prior to SQL Server 2005 SP2 there was no way to tell whether CHECKDB was going to do a potentially long-running deep-dive. In SP2, error 5268 was added and will be printed to the SQL Server error log when this happens:

2007-09-15 12:26:34.45 spid52      DBCC CHECKDB is performing an exhaustive search of indexes for possible inconsistencies.  This is an informational message only. No user action is required.

4.3 Check all inter-page relationships

Inter-page relationships are relevant for:

  • Pages in heaps that have forwarding or forwarded records
  • Pages in indexes
  • Pages that have records with LOB columns that have their data stored off-row (these can be heap pages, clustered index data pages or non-clustered index leaf pages in SQL Server 2005)
  • Text pages that have records with child nodes on other pages

Here are the checks that are done for index pages, for example:

  • All pages in an index level should be pointed to by a page in the next level higher in the b-tree, and also by the left-hand and right-hand neighboring pages in the same level of the b-tree. Exceptions are made for the left-most and right-most pages in a level (where the m_prevPage and m_nextPage fields are (0:0)) and for the root page of the b-tree, where the parent page link comes from the storage-engine metadata. The 8976 error message that I referenced above is generated from this set of checks.
  • Key ranges in neighboring pages in a level should not overlap
  • Key ranges of pages should be correctly defined by the parent pages in the next level up in the b-tree (the parent pages contain the minimum key value that can exist on a child page)

I'll go into details of how the inter-page checks are done in a future post. For now, its enough to say that its not an n2 complexity algorithm.

4.4 Check the page header counts in each page

The page header contains a bunch of counters - the ones that need to be checked are:

  • slot count
  • ghost record count
  • free space count

The first two are obvious - count the rows as they're processed and make sure page header counts are valid. The free space count is only checked for text pages and data pages in a heap (the only pages for which free space is tracked in a PFS page). 

So, that's the majority of the work in running a DBCC CHECKDB.

5. Higher-level logical checks

The next set of checks are higher-level checks involving multiple structures and are only present in SQL Server 2005:

  1. Service Broker checks
  2. Metadata cross-checks
  3. Indexed view and XML index checks

Its important that these checks are done after the per-table logical checks. This is because any repairs done as part of the per-table checks may affect the outcome of these checks quite substantially. Imagine the case where an indexed view is based on a join between two tables, foo and bar. Table foo has a damaged page and is damaged in a way that reading the page as part of a query would not recognize - because full page audits are not done as part of regular page reads. The damage is such that the page has to be deleted as part of repair - thereby changing the results of the view. If the indexed view was checked before tables foo and bar, then the repair that was done would not get reflected in the indexed view and so the indexed view would essentially be corrupt. The same logic holds for checking XML indexes and Service Broker tables.

Let's look at these final checks in more detail.

5.1 Service Broker checks
The Service Broker development team wrote a comprehensive set of checks of the data stored in their internal on-disk structures. The checks validate the relationships between conversations, endpoints, messages and queues. For example:

  • A conversation must have two endpoints
  • A service must be related to a valid contract
  • A service must be related to a valid queue
  • A message must have a valid message type

What's even cooler is that they also implemented a set of logical repairs, so if one of their internal tables was damaged, and repaired by the earlier logical checks, then the Service Broker repair code can clean up any Service Broker entities and entity relationships that were damaged too.

5.2 Metadata cross-checks
The Metadata team also wrote a great set of checks for the relational metadata stored in the system tables. These checks that are run here are the same code that's run for
DBCC CHECKCATALOG. The actual checks themselves in SQL Server 2005 are far more involved than in SQL Server 2000, and they're done way more efficiently too. However, they're not comprehensive by any means - some more checks were added during SQL Server 2008.

The checks only cover the Relational Engine metadata - i.e. the relationships between system tables storing relational metadata. There are no such checks for the tables storing the Storage Engine metadata - at present the relationships between those tables are checked implicitly by the metadata checks I described above.

There are no repairs for metadata corruptions. Metadata corruptions are extremely difficult to deal with because changing/deleting metadata affects the entire table the corrupt metadata describes and could potentially be as bad as deleting the table - a table without metadata is just a collection of pages with indecipherable rows (well, not quite true - its possible to decipher any record without metadata but it requires human intervention and is incredibly hard).

Its possible the team may put in some limited metadata repairs in a future release, but the frequency of their occurence in the field is so low that I decided that the engineering investment was not justified for SQL Server 2005 or SQL Server 2005 and so didn't push it. So - if you get any metadata corruption, you need to restore from your backups.

5.3 Indexed view and XML index checks
These are very cool and many thanks to
Conor Cunningham (former Dev Lead of one of the two Query Optimizer teams) for helping work out how to do this.The indexed view contains the persisted results of a query, and the actual query is persisted in metadata - so the easiest way to check whether the indexed view is accurate is to recalculate the view results into a temp table in TEMPDB and then compare the calculated values with the values persisted in the indexed view. The view is regenerated into a temporary table and then two concurrent left-anti-semi-joins are run, that basically return all the rows in the indexed view that are not in the recalculated view results, and vice-versa. This gives CHECKDB all the rows that are extraneous in the indexed view and all the rows that are missing from it.

Indexed-view problems can also be repaired. The repair for extra rows is to delete them one by one (using internal query syntax that only works from DBCC), and the repair for missing rows is to rebuild the indexed view. This is done by simply disabling the indexed view and then bringing it back online (which rebuilds it).

There are two drawbacks to the indexed view checks if the views are large - it can take up a lot of space in tempdb and  it can take a lot of time to run the regeneration of the indexed views and to run the left-anti-semi-joins. So if you upgraded your database from SQL Server 2000 and you regularly run full CHECKDBs (i.e. without using WITH PHYSICAL_ONLY), then you may see a run-time increase for CHECKDB on SQL Server 2005 - this is documented in BOL and the README.

XML index checks work in a similar way. An XML index is an index over all the nodes in a shredded version of the XML blob. The XML blobs in the table are re-shredded and checked against the shredding that's persisted in the primary XML index. If anything is wrong, the primary XML index is recreated.

Summary

And that's it. Now you have a complete picture of what's going on with CHECKDB when it runs. In the next few posts I want to give some insight into how CHECKDB manages to do all these checks while only making a single pass through the database and shed some light on how repair works.

These next few posts are based off part of my Secrets of Fast Detection and Recovery from Database Corruptions session from TechEd and various user groups around the world (see here for a video recording from TechEd). I'll also be doing this session at ITForum in Barcelona in November, and as a live webcast for Microsoft sometime over the next month or so (I'll publicize the date nearer the time). I did a few posts on these subjects last year but now I want to reorder them, add a post about using emergency mode and walk you through some emergency mode demo scripts.

In this post I want to describe the two worst things I think you can do to your database - rebuilding your transaction log and running REPAIR_ALLOW_DATA_LOSS - that people often try doing instead of restoring from their backups.

Rebuilding your transaction log

It's pretty well known that in SQL Server 2000 (and before) there's an undocumented and unsupported command called DBCC REBUILD_LOG. What this does is pretty simple - it deletes the transaction log file(s) and creates a new one. It completely disregards any uncommitted transactions that may exist - it just deletes them. This means that these uncommitted transactions don't get a chance to roll back.

What does this mean? Well, in the best case, the only in-flight transactions were altering user data - so your business logic, inherent and constraint-enforced relationships between tables, and the basic logical integrity of your user data are all broken. In the worst case, the in-flight transactions were altering the structure of the database (e.g. doing a page split) so that fact that they didn't get achance to rollback means the databse may be structurally corrupt!

Here's are two examples (somewhat contrived) that illustrate the possible consequences of rebuilding a transaction log.

  1. Logical data loss.

    Imagine you're at an ATM transferring $1000 from your checking account to your savings account. On the SQL Server in the bank's datacenter, the transaction happens in two parts - update the record in the checking accounts table with the balance minus $1000 and then update the record in the savings account table with the balance plus $1000. The process is half-way through - $1000 has been debited from your checking account, but not yet credited to your savings account, when disaster strikes! A work crew outside the datacenter accidentally cuts the power and the machine hosting SQL Server powers down. The ATM displays one of those infuriating "We're sorry, our computers are unavailable at present" and you walk away grumbling but think nothing more about it.

    Meanwhile, the power's been restored to the datacenter and SQL Server is going through crash recovery. The partially completed transaction on your account should rollback and credit back the $1000 to your checking account. But the new DBA at the bank decides that its taking too long for the system to come back up. He delets the transaction log and rebuilds it to get the system up faster. Unfortunately, the portion of the transaction log that had not had a chance to recover included the transaction involving your bank account. Even more unfortunately, a checkpoint occured right before the power loss, and the database page containing the updated checking account balance was flushed to disk. Now, when the transaction log is deleted and rebuilt, your transaction can't rollback - because it's simply gone. So the $1000 debit from your checking account is not rolled back - you've lost $1000!!

  2. Physical database corruption.

Imagine an insert into a table with a single non-clustered index. Under the covers, the insert happens in two parts - insert the record into the table and then insert the corresponding non-clustered index record. Imagine a similar disaster recovery situation as I described above occuring after the table insert has been done but not the non-clustered index insert. If the log is rebuilt, then the table and its index are not in sync - actual phsyical corruption!

Whenever a transaction log is rebuilt, a message is output to the SQL Server error log and the Windows event log. In SQL Server 2005, the message is:

2007-09-14 15:50:48.82 spid52      Warning: The log for database 'test' has been rebuilt. Transactional consistency has been lost. The RESTORE chain was broken, and the server no longer has context on the previous log files, so you will need to know what they were. You should run DBCC CHECKDB to validate physical consistency. The database has been put in dbo-only mode. When you are ready to make the database available for use, you will need to reset database options and delete any extra log files.

In SQL Server 2005, the old DBCC command has been removed and replaced with undocumented (and still unsupported) syntax. However, there is now a fully documented and supported way to do this - which I'll get to in a couple of posts.

Sometimes there's no alternative to rebuilding a transaction log - when the log is physically damaged and there's no backup. In this case, Product Support will walk you through the process of correctly rebuilding the transaction log, running repair to fix up any corruptions, and ensuring you know that the logical integrity of your data could be broken. Unfortunately, all too often I see people simply rebuilding the transaction log and continuing with regular operations - no checks, no repairs, no root-cause analysis.

REPAIR_ALLOW_DATA_LOSS

The vast majority of the time, REPAIR_ALLOW_DATA_LOSS is the repair level that CHECKDB recommends when it finds corruptions. This is because fixing nearly anything that's not a minor non-clustered index issue requires deleting something to repair it. So, REPAIR_ALLOW_DATA_LOSS will delete things. This means it will probably delete some of your data as well. If, for instance it finds a corrupt record on a data page, it may end up having to delete the entire data page, including all the other records on the page, to fix the corruption. That could be a lot of data. For this reason, the repair level name was carefully chosen. You can't type in REPAIR_ALLOW_DATA_LOSS without realizing that you're probably going to lose some data as part of the operation.

I've been asked why this is. The purpose of repair is not to save user data. The purpose of repair is to make the database structurally consistent as fast as possible (to limit downtime) and correctly (to avoid making things worse). This means that repairs have to be engineered to be fast and reliable operations that will work in every circumstance. The simple way to do this is to delete what's broken and fix up everything that linked to (or was linked from) the thing being deleted - whether a record or page. Trying to do anything more complicated would increase the chances of the repair not working, or even making things worse.

The ramifications of this are that running REPAIR_ALLOW_DATA_LOSS can lead to the same effect on your data as rebuilding a transaction log with in-flight transactions altering user data - your business logic, inherent and constraint-enforced relationships between tables, and the basic logical integrity of your user data could all be broken. BUT, the database is now structurally consistent and SQL Server can run on it without fear of hitting a corruption that could cause a crash.

To continue the contrived example from above, imagine your bank checking and savings accounts just happen to be stored on the same data page in the bank's SQL Server database. The new DBA doesn't realize that backups are necessary for disaster recovery and data preservation and so isn't taking any. Disaster strikes again in the form of the work crew outside the datacenter accidentally cutting the power and the machine hosting SQL Server powers down. This time, one of the drives has a problem while powering down and a page write doesn't complete - causing a torn page. Unfortunately, it's the page holding your bank accounts. As the DBA doesn't have any backups, the only alternative to fix the torn-page is to run REPAIR_ALLOW_DATA_LOSS. For this error, it will delete the page, and does so. In the process, everything else on the page is also lost, including your bank accounts!!

Summary

So, you can see how these two operations are really very, very bad things to do to a database and can cause havoc with your data. And yet people still have to  use these operations because they don't have valid backups...

In the next post I'll introduce EMERGENCY mode and how to use it.

Theme design by Nukeation based on Jelle Druyts