The September edition of TechNet Magazine is available on the web now and has the latest installment of my regular SQL Q&A column.

This month's topics are:

  • Online index operations logging changes in SQL Server 2008
  • Is that transaction contained in my full backup?
  • ALTER INDEX ... REBUILD vs. ALTER INDEX ... REGORGANIZE
  • Avoiding regular log file shrinking

Check it out at http://technet.microsoft.com/en-us/magazine/hh395481.aspx.

We're teaching on-site with a financial client in New York for two weeks and in a session this morning on log file architecture I was asked why the VLF (virtual log file) sequence numbers in a new database don't start at one.

Here's an example:

CREATE DATABASE foo;
GO
DBCC LOGINFO ('foo');
GO

FileId  FileSize  StartOffset FSeqNo Status  Parity  CreateLSN
------- --------- ----------- ------ ------- ------- ------------------
2       253952    8192        137    2       64      0
2       253952    262144      0      0       0       0

Why is the first sequence number (FSeqNo) 137 and not 1? (For an explanation of the various fields in the output of the undocumented DBCC LOGINFO, see the post Inside the Storage Engine: More on the circular nature of the log.)

The answer is that it's one more than the highest VLF sequence number currently in the model database. Let's see:

DBCC LOGINFO ('model');
GO

FileId  FileSize  StartOffset FSeqNo Status  Parity  CreateLSN
------- --------- ----------- ------ ------- ------- ------------------
2       253952    8192        136    2       128     0
2       262144    262144      135    0       128     0

The highest FSeqNo in the output above is 136, which accounts for the 137 in the new user database.

Now I'll prove it by increasing the sequence number in the model database:

USE model;
GO
CREATE TABLE BigRows (c1 INT IDENTITY, c2 CHAR (8000) DEFAULT 'a');
GO
INSERT INTO BigRows DEFAULT VALUES;
GO 50
DBCC LOGINFO;
GO

FileId  FileSize  StartOffset FSeqNo Status  Parity  CreateLSN
------- --------- ----------- ------ ------- ------- ------------------
2       253952    8192        136    0       128     0
2       262144    262144      137    2       128     0

And now I'll create another user database and the first VLF sequence number should be 138:

CREATE DATABASE foo2;
GO
DBCC LOGINFO ('foo2');
GO

FileId  FileSize  StartOffset FSeqNo Status  Parity  CreateLSN
------- --------- ----------- ------ ------- ------- ------------------
2       253952    8192        138    2       64      0
2       360448    262144      0      0       0       0

And it is - proven!

The eagle-eyed among you may notice that the log file size of the database foo2 I just created is larger than the size of the foo database I initially created. This is because the default log file size is the higher of 0.5MB or 25% of the sum of the data files created. In this case, the data file is copied from the model database, and I made it larger when I created the BigRows table, so the log file of the new database has to be slightly larger.

One question that came up in the post comments - why is the highest VLF sequence number in my model database 136? No particular reason except I've done a bunch of things in model which generated transaction log and so the VLF sequence numbers have increased. It's almost certain to be different in your SQL Server instances for that reason.

Next up tomorrow - results from the index count survey from last month.

The July edition of TechNet Magazine is available on the web now and has the latest installment of my regular SQL Q&A column.

This month's topics are:

  • Deferred log truncation from concurrent data and log backups
  • Database mirroring monitoring
  • Multiple transaction log files
  • Best use of SSDs in a SQL environment (high-level)

Check it out at http://technet.microsoft.com/en-us/magazine/hh334997.aspx.

The January edition of TechNet Magazine is available on the web now and has the latest installment of my regular SQL Q&A column.

This month's topics are:

  • Diagnosing I/O subsystem bottlenecks
  • Capacity planning for transaction logs
  • Why there are no non-logged operations in user databases

Check it out at http://technet.microsoft.com/en-us/magazine/gg552991.aspx.

(Happy Independence Day to everyone back home in the USA! We're in Dublin for another week...) 

Back at the end of April I conducted a survey of transaction log characteristics (see here) and now I present the results to you.

I had a great response to the survey from almost 100 people, and the total data set for these observations is 16841 databases on 847 SQL Server instances (SQL Server Edition is irrelevant).

This isn't the most exciting blog post, but I at least found the numbers interesting - hope you do too! I've got some pointers in the summary to log management blog posts.

Number of Databases Per Instance 

Each plot on the graph below is a count of the number of instances having that many databases, including the four system databases.

 

Just under 75% of instances have 20 or fewer databases (including system databases). The highest number per instance is 571.

Number of Log Files Per Database

This wasn't worth plotting. The results are:

Log Files   Number of Databases
1           16657
2           158
3           9
4           8
5           1
6           4
7           1
24          1

I'm impressed by this - I expected to see more databases with multiple log files. The one with 24 is tempdb - someone setting the number of log files to equal the number of processor cores I'd guess.

System Databases: Log File Size

Each plot on the graph below shows the number of log files of that size across all the instances, color-coded for master, msdb and tempdb.

 

89% of master database log files are less than 10MB (a number I arbitrarily chose as a comparison point).

Only 40% of msdb log files are less than 10MB. This would suggest that there are a high proportion of msdb databases not using the SIMPLE recovery model. However, only 5% of them aren't using SIMPLE.

Only 33% of tempdb log files are less than 10MB. I would expect this number to be low.

Looking at the distribution on the graph, master log files (blue) are generally smaller, tempdb log files (green) are generally larger, with msdb (red) in the middle. Again, this is what I'd expect to see.

System Databases: Number of VLFs

Each plot on the graph below shows the number of log files with that many VLFs across all the instances, color-coded for master, msdb and tempdb. (If you're unfamiliar with the term 'VLFs', see the TechNet Magazine article I link to at the bottom).

 

98.6% of all master log files have less than 100 VLFs (our arbitrary 'you may have too many VLFs' tipping point), with the maximum being 331.

Only 65% of all msdb log files have less than 100 VLFs. From the other 35%, only 5% of them aren't using the SIMPLE recovery model. The maximum number of VLFs is 7646.

Surprisingly, 75% of tempdb log files have less than 100 VLFs. I expected this number to be higher in general, because of tempdb autogrowth. The maximum number of VLFs is 59708.

Looking at the graph, master (blue) is as I expected, but msdb (red) and tempdb (green) are the opposite of what I expected - with more msdb log files having more VLFs.

User Databases: Log File Size

Out of all 13451 user databases in the sample:

  • 6797 use the SIMPLE recovery model (50.5%)
  • 177 use the BULK_LOGGED recovery model (1.3%)
  • 6477 use the FULL recovery model (48.2%)

This in itself is surprising - I expected to see a higher proportion of FULL recovery model databases. 

Given that the log file behavior is the same for FULL and BULK_LOGGED recovery models, and as so few databases use the BULK_LOGGED recovery model, I lumped the FULL and BULK_LOGGED databases together for analysis.

Each plot on the graphs below shows the number of log files of a certain size across all the instances.

  

 

The graphs above have different sizes because more databases in the SIMPLE recovery model are larger sized. The overall distribution looks very similar though.

90.7% of user databases using the SIMPLE recovery model have a log file size less than 1GB, compared with 83.0% for non-SIMPLE user databases.

98.2% of SIMPLE user databases have a log file size less than 10GB, compared with 96.8% for non-SIMPLE user databases.

The maximum size of a log file for a SIMPLE user database is 1020GB, dropping to 326GB for a non-SIMPLE user database.

The non-SIMPLE user databases seem to be slightly better managed. My guess would be this represents databases that have been in the FULL or BULK_LOGGED recovery models, have grown out of control and been switched to SIMPLE, but not cleaned up.

User Databases: Number of VLFs

Each plot on the graphs below shows the number of log files with a certain number of VLFs across all the instances.

 

 

As with the log file sizes, the distributions of these two graphs look very similar. 

76.9% of user databases using the SIMPLE recovery model have less than 100 VLFs, dropping to 67.8% for those not using SIMPLE.

98.4% of user databases using the SIMPLE recovery model have less than 500 VLFs (definitely a point at which maintenance is required), dropping to 97.0% for those not using SIMPLE.

The maximum number of VLFs for the SIMPLE user databases is 34057, and 27093 for those not using SIMPLE.

Given that more user databases using SIMPLE are larger, but have slightly fewer VLFs, I'd guess this indicates that autogrowth of thse databases was slightly higher.

User Databases: Log File Size vs. Number of VLFs

At first glance, the graph below looks confusing. It shows the number of VLFs each log file has compared to its size in MB. If you look at the first few log file sizes, you'll see that there are multiple points plotted on the Y-axis. This is because there are databases that have the same size log files, but the log files have different numbers of VLFs.

 

There's definitely a loose trend-line in the graph, with a knee at around 50MB log file size.

Summary

I'm impressed. 75% of all databases in the sample have less than 100 VLFs, and 97.4% have less than 500 VLFs. I expected the results to be worse, but they seem show that the majority of you are managing your logs well. Now, these results are a bit skewed though because they're coming from people who read my blog, where I bang on and on about good maintenance and management.

I was very surprised by the 50/50 split of SIMPLE/non-SIMPLE recovery model for user databases. My guess is that more of these *should* be in FULL to comply with business SLAs around RPO/data-loss, but for some reason aren't, or aren't any longer because of out-of-control growth issues.

Thanks to everyone who sent me results!

Here are some links for further reading:

Enjoy!

Categories:
Surveys | Transaction Log

One of the first things you should always check when a database has been damaged and you're going to perform a restore operation is whether you need to back up the tail of the log.

The tail of the log contains the transaction log that's been generated since the most recent log backup was taken. If you want to be able to recover right up to the point of the disaster, you need to be able to get those log records so they can be part of the restore sequence.

There are two cases to consider here 1) the server is still available 2) the server is not available. For case 1, you can just perform a regular tail-of-the-log backup. For case 2, you'll need to hack-attach the log into another server to be able to back it up.

Case 1: tail-of-the-log backup when server is available

If the database is damaged but the server is still available, taking a tail-of-the-log backup is simple. The only exception is when there has been a minimally-logged operation performed in the BULK_LOGGED recovery model since the previous log backup - in that case a tail-of-the-log backup is not possible at all, and you'll have lost all transaction log generated since the last log backup. See A SQL Server DBA myth a day: (28/30) BULK_LOGGED recovery model for more details.

When the data files are damaged and you want to take a log backup, you'll get an error if you try to back up the log normally.

As an example, I'll create a database and put some transactions in it:

CREATE DATABASE DBMaint2008;
GO
USE DBMaint2008;
GO

CREATE TABLE TestTable (C1 INT IDENTITY, C2 CHAR (100));
GO

-- Take a full backup
BACKUP DATABASE DBMaint2008 TO DISK = 'D:\SQLskills\DemoBackups\DBMaint_Full.bck' WITH INIT;
GO

-- Insert some rows
INSERT INTO TestTable VALUES ('Transaction 1');
INSERT INTO TestTable VALUES ('Transaction 2');
GO

-- Take a log backup
BACKUP LOG DBMaint2008 TO DISK = 'D:\SQLskills\DemoBackups\DBMaint_Log1.bck' WITH INIT;
GO

-- Insert some more rows
INSERT INTO TestTable VALUES ('Transaction 3');
INSERT INTO TestTable VALUES ('Transaction 4');
GO

Now disaster will strike - I'll do the following to simulate a disaster:

  1. ALTER DATABASE DBMaint2008 SET OFFLINE
  2. Delete the data file

If I try to then access the database, I'll get:

ALTER DATABASE DBMaint2008 SET ONLINE;
GO

Msg 5120, Level 16, State 101, Line 1
Unable to open the physical file "C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\DBMaint2008.mdf". Operating system error 2: "2(The system cannot find the file specified.)".
Msg 945, Level 14, State 2, Line 1
Database 'DBMaint2008' cannot be opened due to inaccessible files or insufficient memory or disk space.  See the SQL Server errorlog for details.
Msg 5069, Level 16, State 1, Line 1
ALTER DATABASE statement failed.

So I'll try a normal log backup:

BACKUP LOG DBMaint2008 TO DISK = 'D:\SQLskills\DemoBackups\DBMaint_Log_Tail.bck' WITH INIT;
GO

Msg 945, Level 14, State 2, Line 1
Database 'DBMaint2008' cannot be opened due to inaccessible files or insufficient memory or disk space.  See the SQL Server errorlog for details.
Msg 3013, Level 16, State 1, Line 1
BACKUP LOG is terminating abnormally.

It doesn't work - as the data files aren't all accessible.

The trick is to use the NO_TRUNCATE option, which allows the log backup even if the database files aren't there:

BACKUP LOG DBMaint2008 TO DISK = 'D:\SQLskills\DemoBackups\DBMaint_Log_Tail.bck' WITH INIT, NO_TRUNCATE;
GO

Processed 2 pages for database 'DBMaint2008', file 'DBMaint2008_log' on file 1.
BACKUP LOG successfully processed 2 pages in 0.010 seconds (0.972 MB/sec).

I can then use the tail-of-the-log backup as the final backup in the restore sequence, saving transactions 3 and 4 from above. Try it for yourself.

Case 2: tail-of-the-log backup when server is no longer available

This is the case where the server crashed and cannot be brought online. If you have access to all the data and log files for the database, you can attach it to another server and crash recovery will run automatically. If you only have access to the log file, you'll need to perform some extra steps to allow a tail-of-the-log backup to be performed - basically performing what I call a hack-attach.

Assuming I've run the script above to setup the database, this time I'll do the following to simulate a server-crash disaster:

  1. ALTER DATABASE DBMaint2008 SET OFFLINE
  2. Delete the data file
  3. Copy the log file somewhere else
  4. DROP DATABASE DBMaint2008

Now all I have is some backups and a log file. I'll need to attach the log file to SQL Server somehow so that I can perform the vital tail-of-the-log backup. The way to do it is:

  1. Create a dummy database with the same name as the one that we're interested in (make sure you have instant file initialization enabled so the file creations don't take ages - see this blog post)
  2. Set the database offline (or shutdown the server)
  3. Delete all the files from the dummy database
  4. Drop in the log file from our real database

Like so for steps 1 and 2:

CREATE DATABASE DBMaint2008;
GO
ALTER DATABASE DBMaint2008 SET OFFLINE;
GO

Now I'll perform steps 3 and 4.

I need to attempt to bring the database online again:

ALTER DATABASE DBMaint2008 SET ONLINE;
GO

Msg 5120, Level 16, State 101, Line 1
Unable to open the physical file "C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\DBMaint2008.mdf". Operating system error 2: "2(The system cannot find the file specified.)".
Msg 945, Level 14, State 2, Line 1
Database 'DBMaint2008' cannot be opened due to inaccessible files or insufficient memory or disk space.  See the SQL Server errorlog for details.
Msg 5069, Level 16, State 1, Line 1
ALTER DATABASE statement failed.

And then I can perform the tail-of-the-log backup as before, and use it to recover everything up to the point of the disaster.

Note: This procedure does not work if I try to perform the hack-attach to a more recent version of SQL Server. I tried hacking a 2005 log into a 2008 server and taking the tail-of-the-log backup - which worked fine, but the tail-of-the-log backup could not be used in conjunction with the first set of backups from the 2005 server. The reason for this is that the database version number in the tail-of-the-log backup is 655 (SQL Server 2008) and those for the 2005 backups are 611 (SQL Server 2005). The database doesn't get upgraded when restoring on the 2008 server until the end of the restore sequence - so as far as the 2008 server is concerned, my 2008 tail-of-the-log backup can't be used in the restore seqeunce of a still-2005-really database. Hope that makes sense!

Summary

Checking whether a tail-of-the-log backup is required is essential during a disaster recovery, and it's not hard to do. As with any disaster recovery procedures, make sure you've practiced doing it in advance!

I was teaching an internals/maintenance/performance class for Microsoft DBAs this week on the Redmond campus and one of the students (thanks Crystal!) showed me a really strange bug that I'd never seen before.

One of the things that Kimberly and I both recommend is not having transaction log VLFs be too large, with 500MB being our recommended maximum so as not to interrupt the clearing/truncating of the log. See Kimberly's blog post Transaction Log VLFs - too many or too few? for more details.

If you try to grow a log file by a multiple of 4GB, it will fail on the first attempt and only grow by about 31KB, with no error message. The second attempt will succeed!

We've examined the SQL code and it's a bug that the code miscalculates the growth size when the specified size is an exact multiple of 4GB. Simple type conversion error. It's fixed in the SQL11 code already.

This could cause you problems if you follow our advice and try to grow a log file in increments of 8GB to keep the VLF size at 500MB (a growth of 1GB or more create 16VLFs, with each VLF being 1/16th the size of the growth).

I tested this on 2005 SP3 and 2008 SP1.

Here's an example script to show what I mean. First off I'll create a database and grow the log file to 1GB.

CREATE DATABASE TestLogFileGrowth;
GO

ALTER DATABASE TestLogFileGrowth MODIFY FILE
(
    NAME = TestLogFileGrowth_Log, SIZE = 1GB
);
GO

That took 30 seconds to run, performing the zero initialization of the log file (which must *always* occur - see Search Engine Q&A #24: Why can't the transaction log use instant initialization?).

Checking the log size:

DBCC SQLPERF (LOGSPACE);
GO

Database Name       Log Size (MB) Log Space Used (%) Status
------------------- ------------- ------------------ -------
<snip>
TestLogFileGrowth   1023.992      0.0733858          0

Now I'll try to grow the log file by 4GB:

ALTER DATABASE TestLogFileGrowth MODIFY FILE
(
    NAME = TestLogFileGrowth_Log, SIZE = 5GB
);
GO

This took zero seconds. Impossible if it actually grew the log by 4GB and zero initialized it!!

Let's check the log size:

DBCC SQLPERF (LOGSPACE);
GO

Database Name      Log Size (MB) Log Space Used (%) Status
------------------ ------------- ------------------ -----------
<snip>
TestLogFileGrowth  1024.234      0.07994729         0

Weird! It only grew by 1MB! If I try the grow again (and as long as I don't specify a size that's an exact multiple of 4GB from the current size), it takes a couple of minutes and grows correctly.

Don't be surprised if you see this.

Cheers

Categories:
Transaction Log

(Look in the Misconceptions blog category for the rest of the month's posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together: CommonSQLServerMyths.pdf (732.96 kb))

The month is finally over so time for the grand finale!

Although it's been fun debunking all these myths, it's been a tad stressful making sure I come up with an interesting and useful myth to debunk every day. I'd like to give kudos to fellow-MVP Glenn Berry (blog|twitter) who's been running an excellent DMV-a-Day series through April too!

To round out the month, I present to you 30 myths around backups - one for each day of the month of April. Last night I sat down to write this post and was a few myths short so reached out to the fabulous SQL community on Twitter (follow me!) for help - too many people to list (you know who you are) - I thank you!

A few folks have asked if I'll pull the month's posts into a PDF e-book - let me know if you'd like that.

I *really* hope you've enjoyed the series over the last month and have had a bunch of myths and misconceptions debunked once and for all - I know quite a few of you are going to use these explanations as ammunition against 3rd-party vendors, developers, and other DBAs who insist on incorrect practices.

Ok - here we go with the last one...

Myth #30: various myths around backups...

All are FALSE!!

For a good primer on understanding backups and how they work see my TechNet Magazine article Understanding SQL Server Backups. 

30-01) backup operations cause blocking

No. Backup operations do not take locks on user objects. Backups do cause a really heavy read load on the I/O subsystem so it might *look* like the workload is being blocked, but it isn't really. It's just being slowed down. There's a special case where a backup that has to pick up bulk-logged extents will take a file lock which could block a checkpoint operation - but DML is never blocked.

30-02) switching from the FULL recovery model to the BULK_LOGGED recovery model and back again breaks the log backup chain

No. It just doesn't. Switching from either FULL or BULK_LOGGED to SIMPLE *does* break the log backup chain however.

30-03) breaking the log backup chain requires a full backup to restart it

No. You can restart the log backup chain with either a full or differential backup - anything that bridges the LSN gap from the point at which the log backup chain was broken. See my blog post A SQL Server DBA myth a day: (20/30) restarting a log backup chain requires a full database backup for more details.

30-04) concurrent log backups are not possible while a full or differential backup is in progress 

No, this changed in SQL Server 2005. See my blog post Search Engine Q&A #16: Concurrent log and full backups.

30-05) a full or differential backup clears the log

No. A log backup includes all the log since the last log backup - nothing can change that - no matter whether that log was also backed up by a full or differential backup. I had a famous argument on Twitter last year and wrote this blog post as proof: Misconceptions around the log and log backups: how to convince yourself. In the FULL or BULK_LOGGED recovery models, the *only* thing that clears the log is a log backup.

30-06) using the BULK_LOGGED recovery model for minimally-logged operations reduces the size of the next transaction log backup

No. A minimally-logged operation is so-named because only the page allocations are logged. A log backup needs all the information necessary to resconstitute the transaction, so a log backup following a minimally-logged operation must backup the log plus all extents changed by the minimally-logged operation. This will result in the log backup being roughly the same size as if the operation was fully logged.

30-07) full and differential backups only contain the log generated while the backup was running

No. A full or differential backup contains enough log to be able to recover the database to a transactionally-consistent view of the database at the time the data-reading portion of the backup finished (or as far back as the oldest log record that transactional replication has not yet processed - to ensure that replication works properly after a restore). Check out these two blog posts for details:

30-08) backups always test existing page checksums

No. It only does it when you use the WITH CHECKSUM option - which you should.

30-09) backups read data through the buffer pool

No. The backup subsystem opens its own channels to the database files to avoid the performance hit of having to read everything into SQL Server's memory and back out to the backup device (and also effectively flushing the buffer pool in the process). If you ask the for page-checksum checking, it uses it's own small portion of memory.

30-10) backups perform consistency checks (a la DBCC CHECKDB)

No. Nothing else to say.

30-11) if the backup works, the restore will too

No. Please don't fall into this trap. You must regularly validate your backups to give yourself a high level of confidence that they will work if a disaster occurs. See Importance of validating backups for more details.

30-12) a mirrored backup will succeed if the mirror location becomes unavailable

No. If any one of the paths to a mirrored backup fails, the entire mirrored backup operation fails. I'd really like it to work the other way around - where the local backup succeeds and the remote backups fail, but it doesn't unfortunately.

30-13) a tail-of-the-log backup is always possible

No. A tail-of-the-log backup is one that backs up all the log generated since the last log backup, in an exceptional situation. If the data files are damaged, you can still do a tail-of-the-log backup EXCEPT if the un-backed-up log contains a minimally-logged operation. That would require reading data extents - which cannot be done if the data files are damaged. For this reason, the BULK_LOGGED recovery model should not be used on databases that have 24x7 user transactions.

30-14) you can use backups instead of DBCC CHECKDB

No. See A SQL Server DBA myth a day: (27/30) use BACKUP WITH CHECKSUM to replace DBCC CHECKDB.

30-15) you can backup a database snapshot

No. It's not implemented, but would be great if you could.

30-16) you can use database snapshots instead of log backups

No. A database snapshot is only usable while the database on which it is based is usable and online. If the source database is corrupted, the database snapshot most likely is too. If the source database goes suspect, so does the database snapshot.

Also, having multiple database snapshots (equating to multiple log backups) incurs an increasing performance drain - as every page that changes in the source database may need to be synchronously written to all existing snapshots before it can be written to the source database data files, and all existing database snapshots will grow as more pages are pushed into them.

30-17) log backups will be the size of the log

No. The log has to accomodate the space necessary to roll back active transactions, the amount of space returned by DBCC SQLPERF (LOGSPACE) on a busy system doesn't accurately refect the amount of log records in the log. This blog spot explains: Search Engine Q&A #25: Why isn't my log backup the same size as my log? And apart from that, a log backup is just all the log generated since the last log backup - not the whole log file usually - and if it happens to be, the first part of the explanation comes into play.

30-18) you cannot backup a corrupt database

No. In most cases you can use the WITH CONTINUE_AFTER_ERROR option to back up the corrupt database.  If that fails (maybe because of a damaged boot page or file-header page), there are no other options apart from OS-level file backups.

30-19) you cannot stop someone doing a BACKUP LOG .. WITH NO_LOG or TRUNCATE_ONLY operation

No. In SQL Server 2008 it's not possible any more (yay!) and in 2005 and before, use trace flag 3231 which turns the operation into a no-op.

30-20) log backups always clear the log

No.

If there's no concurrent data backup running, a log backup will always *try* to clear the log, and only succeed in clearing the inactive portion of the log - the log that's only considered 'required' by SQL Server because it hasn't yet been backed up. If anything else is holding the log 'required', it cannot be cleared, even though it has been backed up. Subsequent log backups will check again and again until the time comes when that portion of the log can be cleared. The TechNet Magazine article Understanding Logging and Recovery in SQL Server I wrote last year explains a lot more about how the log works.

Also, if there is a concurrent data backup running, the log clearing will be delayed until the data backup finishes. See the blog post in myth 30-04 for more details.

30-21) differential backups are incremental

No. Differential backups are all the data extents that have changed since the last full backup - so they are cumulative. Log backups are incremental - all log generated since the last log backup. Many people call differential backups 'incrementals', when they're not really.

30-22) once a backup completes, you can safely delete the previous one

No. No. No.

If you go to restore, and you find your full backup is corrupt, what do you do? Well, if you don't have an older full backup, you most likely start updating your resume. You need to keep a rolling-window of backups around in case a disaster occurs and you need to restore from an older set of backups.

30-23) you can back up a mirror database

No. A mirror database is not accessible except through a database snapshot. And you can't back up that either.

30-24) you can back up a single table

No. You can effectively back up single table if it happens to be wholely contained on a single filegroup, but there's no way to say BACKUP TABLE.

30-25) SQL Server has to be shut down to take a backup

No. No idea how this myth started... [Edit: apparently this myth started with Oracle - and we all know how good Oracle is compared to SQL Server... :-)]

30-26) my transaction is guaranteed to be contained in the backup if it committed before the backup operation completed

No. The commit log record for the transaction has to have been written out before the data-reading portion of the backup finished. See my blog post Search Engine Q&A #6: Using fn_dblog to tell if a transaction is contained in a backup for more details.

30-27) you should shrink the database before a backup to reduce the backup size

No. Shrink just moves pages around so won't make any difference. See my old blog post Conference Questions Pot-Pourri #10: Shrinking the database before taking a backup. And of course, shrink is evil. See A SQL Server DBA myth a day: (9/30) data file shrink does not affect performance. And what's even worse as someone reminded me, is if you do the shrink *after* the full backup, the next differential backup may be huge, for no actual data changes!

30-28) backups are always the best way to recover from a disaster

No. Backups are usually the best way to recover with zero data-loss (as long as you have log backups up to the point of the disaster), but not necessarily the best way to recover with minimal downtime. It may be way faster to fail over, or to run repair and accept some data loss if the business requirements allow it.

30-29) you don't need to back up master, msdb, model...

No. You should always back up the system databases. Master contains all the security info, what databases exist - msdb contains all the SSIS packages, Agent jobs, backup history - model contains the configuration for new databases. Don't fall into the trap of only backing up user databases otherwise you'll be in a world of hurt if you have to do a bare-metal install.

30-30) you should always plan a good backup strategy

No. Now you're thinking 'Huh?'...

You should plan a restore strategy. Use the business requirements and technical limitations to figure out what you need to be able to restore in what time, and then use that to figure out what backups you need to take to allow those restores to happen. See the blog posts:

The vast majority of the time people plan a backup strategy without testing or thinking about restores - and come a disaster, they can't restore within their SLAs. Don't let that be you.

(Look in the Misconceptions blog category for the rest of the month's posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together: CommonSQLServerMyths.pdf (732.96 kb))

The BULK_LOGGED recovery model continues to confuse people...

Myth #28: various myths around the BULK_LOGGED recovery model.

28a) regular DML operations can be minimally-logged

No.

Only a small set of bulk operations can be minimally-logged when in the BULK_LOGGED (or SIMPLE) recovery model. The list is in the Books Online topic Operations That Can Be Minimally Logged has the list. This is the 2008 link - make sure to check the link for the version you're running on.

28b) using the BULK_LOGGED recovery model does not affect disaster recovery

No.

Firstly, if a minimally-logged operation has been performed since the last log backup, and one or more data files were damaged and offline because of the disaster, a tail-of-the-log backup cannot be performed and so all user transactions performed since the last log backup will be lost.

Secondly, if a log backup contains a minimally-logged operation, a point-in-time restore cannot be performed to any time covered by the log backup. The log backup can either not be restored, or be restored in its entirety (plus additional log backups if required) - i.e. you can restore to a point:

  • Before the beginning of that log backup
  • At the end of that log backup
  • After the end of that log backup

But you can't restore to a point during that log backup.

28c) using the BULK_LOGGED recovery model also reduces the size of log backups

No.

A log backup that includes a minimally-logged operation must backup the minimal amount of transaction log *and* all the data file extents changed by that operattion - otherwise the restore of the log backup would not fully reconstitute the minimally-logged operation. This means that log backups are roughly the same size whether in the FULL or BULK_LOGGED recovery model.

(Look in the Misconceptions blog category for the rest of the month's posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together: CommonSQLServerMyths.pdf (732.96 kb))

Nested transactions are an evil invention designed to allow developers to make DBAs' lives miserable. In SQL Server, they are even more evil...

Myth #26: nested transactions are real in SQL Server.

FALSE!!!

Nested transactions do not actually behave the way the syntax would have you believe. I have no idea why they were coded this way in SQL Server - all I can think of is someone from the dim and distant past is continually thumbing their nose at the SQL Server community and going "ha - fooled you!!".

Let me explain. SQL Server allows you to start transactions inside other transactions - called nested transactions. It allows you to commit them and to roll them back.

The commit of a nested transaction has absolutely no effect - as the only transaction that really exists as far as SQL Server is concerned is the outer one. Can you say 'uncontrolled transaction log growth'? Nested transactions are a common cause of transaction log growth problems because the developer thinks that all the work is being done in the inner transactions so there's no problem.

The rollback of a nested transaction rolls back the entire set of transactions - as there is no such thing as a nested transaction.

Your developers should not use nested transactions. They are evil.

If you don't believe me, here's some code to show you what I mean. First off - create a database with a table that each insert will cause 8k in the log.

CREATE DATABASE NestedXactsAreNotReal;
GO
USE NestedXactsAreNotReal;
GO
ALTER DATABASE NestedXactsAreNotReal SET RECOVERY SIMPLE;
GO
CREATE TABLE t1 (c1 INT IDENTITY, c2 CHAR (8000) DEFAULT 'a');
CREATE CLUSTERED INDEX t1c1 ON t1 (c1);
GO
SET NOCOUNT ON;
GO

Test #1: Does rolling back a nested transaction only roll back that nested transaction?

BEGIN TRAN OuterTran;
GO

INSERT INTO t1 DEFAULT Values;
GO 1000

BEGIN TRAN InnerTran;
GO

INSERT INTO t1 DEFAULT Values;
GO 1000

SELECT @@TRANCOUNT, COUNT (*) FROM t1;
GO

I get back the results 2 and 2000. Now I'll roll back the nested transaction and it should only roll back the 1000 rows inserted by the inner transaction...

ROLLBACK TRAN InnerTran;
GO

Msg 6401, Level 16, State 1, Line 1
Cannot roll back InnerTran. No transaction or savepoint of that name was found.

Hmm... from Books Online, I can only use the name of the outer transaction or no name. I'll try no name:

ROLLBACK TRAN;
GO

SELECT @@TRANCOUNT, COUNT (*) FROM t1;
GO

And I get the results 0 and 0. As Books Online explains, ROLLBACK TRAN rolls back to the start of the outer transaction and sets @@TRANCOUNT to 0. All changes are rolled back. The only way to do what I want is to use SAVE TRAN and ROLLBACK TRAN to the savepoint name.

Test #2: Does committing a nested transaction really commit the changes made?

BEGIN TRAN OuterTran;
GO

BEGIN TRAN InnerTran;
GO

INSERT INTO t1 DEFAULT Values;
GO 1000

COMMIT TRAN InnerTran;
GO

SELECT COUNT (*) FROM t1;
GO

I get the result 1000, as expected. Now I'll roll back the outer transaction and all the work done by the inner transaction should be preserved...

ROLLBACK TRAN OuterTran;
GO

SELECT COUNT (*) FROM t1;
GO

And I get back the result 0. Oops - committing the nested transaction did not make its changes durable.

Test #3: Does committing a nested transaction at least let me clear the log?

I recreated the database again before running this so the log was minimally sized to begin with, and the output from DBCC SQLPERF below has been edited to only include the NestedXactsAreNotReal database.

BEGIN TRAN OuterTran;
GO

BEGIN TRAN InnerTran;
GO

INSERT INTO t1 DEFAULT Values;
GO 1000

DBCC SQLPERF ('LOGSPACE');
GO

Database Name         Log Size (MB) Log Space Used (%) Status
--------------------- ------------- ------------------ -----------
NestedXactsAreNotReal 12.05469      95.81983           0

Now I'll commit the nested transaction, run a checkpoint (which will clear all possible transaction log in the SIMPLE recovery model), and check the log space again:

COMMIT TRAN InnerTran;
GO

CHECKPOINT;
GO

DBCC SQLPERF ('LOGSPACE');
GO

Database Name         Log Size (MB) Log Space Used (%) Status
--------------------- ------------- ------------------ -----------
NestedXactsAreNotReal 12.05469      96.25324           0

Hmm - no change - in fact the Log Space Used (%) has increased slightly from writing out the checkpoint log records (see How do checkpoints work and what gets logged). Committing the nested transaction did not allow the log to clear. And of course not, because a rollback can be issued at any time which will roll back all the way to the start of the outer transaction - so all log records are required until the outer transaction commits or rolls back.

And to prove it, I'll commit the outer transaction and run a checkpoint:

COMMIT TRAN OuterTran;
GO

CHECKPOINT;
GO

DBCC SQLPERF ('LOGSPACE');
GO

Database Name         Log Size (MB) Log Space Used (%) Status
--------------------- ------------- ------------------ -----------
NestedXactsAreNotReal 12.05469      26.4339            0

And it drops right down.

Nested transactions - just say no! (a public service announcement from the nice folks at SQLskills.com :-)

Over the last few months there's been some noise (mostly of my making) on Twitter about the number of VLFs in transsction logs. Given the large numbers of people who read the blog and follow me on Twitter, I thought it would be very interesting to collect some statistics from a few hundred of you about how your transction logs are configured - in terms of the size, number of log files, and number of VLFs.

To that end, I've created a little script that you can download and run which will generate output of the form:

DB ID  Recovery Model  Log Size (MB)  Log Used (%)   Log File Count  VLF Count 
------ --------------- -------------- -------------- --------------- ----------
1      SIMPLE          6.12           34.42          1               22
2      SIMPLE          0.49           75.60          1               2 
3      FULL            14.68          97.90          1               51
4      SIMPLE          2.49           45.92          1               10
11     SIMPLE          1.99           33.55          1               4
12     FULL            359.99         77.84          1               96
13     FULL            0.48           45.67          1               2

I'd like as many of you as possible to send me the results for some of your systems - there are no database names in the output and the results will be completely anonymous. I'll collect all the results together and blog some scatter graphs showing how people have things set up.

You can get the code from SQLSkillsLogInfo.zip (2.05 kb) (it works on 2000, 2005, 2008) and send me as many results as you like (paul@sqlskills.com) either in plain text or spreadsheet format - or paste them totally anonymously in a comment. The script only takes a few seconds to run, won't affect performance in any way, and won't affect your transaction logs. Now - if your transaction log has many thousands of VLFs, it might take a few minutes to run...

You might think your system is boring - but as far as this is concerned, all results are useful results.

I'll publish the results in a couple of weeks.

Thanks!

(Look in the Misconceptions blog category for the rest of the month's posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together: CommonSQLServerMyths.pdf (732.96 kb))

Today's myth is very persistent, so it's high time it was debunked with a nice script to prove it too!

Myth #19: a TRUNCATE TABLE operation is non-logged.

FALSE

There is no such thing as a non-logged operation in a user database. The only non-logged operations that SQL Server performs are those on the version store in tempdb.

A TRUNCATE TABLE operation does a wholesale delete of all data in the table. The individual records are not deleted one-by-one, instead the data pages comprising the table are simply deallocated. The allocations are unhooked from the table and put onto a queue to be deallocated by a background task called the deferred-drop task. The deferred-drop task does the deallocations instead of them being done as part of the regular transaction so that no locks need to be acquired while deallocating entire extents. Before SQL Server 2000 SP3 (when this process was put into SQL Server), it was possible to run out of memory while acquiring locks during a TRUNCATE TABLE operation.

Here's an example script:

CREATE DATABASE TruncateTest;
GO
USE TruncateTest;
GO
ALTER DATABASE TruncateTest SET RECOVERY SIMPLE;
GO
CREATE TABLE t1 (c1 INT IDENTITY, c2 CHAR (8000) DEFAULT 'a');
CREATE CLUSTERED INDEX t1c1 on t1 (c1);
GO

SET NOCOUNT ON;
GO

INSERT INTO t1 DEFAULT VALUES;
GO 1280

CHECKPOINT;
GO

The database is in the SIMPLE recovery mode so the log clears out on each checkpoint (for simplicity - ha ha :-)

Wait for a minute or so (there may be some ghost record cleanup that occurs) and check how many rows are in the log:

SELECT COUNT (*) FROM fn_dblog (NULL, NULL);
GO

If you don't get a result of 2, do another checkpoint and check the log record count again until it comes back at 2. Now the database is completely quiescent and any new log records are from stuff we're doing. Now we'll do the truncate:

TRUNCATE TABLE t1;
GO

SELECT COUNT (*) FROM fn_dblog (NULL, NULL);
GO

I get back a result of 541 log records - clearly the operation is not non-logged, but it's clearly also not deleting each record - as I inserted 1280 records. If we look in the log we'll see:

SELECT
 [Current LSN], [Operation], [Context],
 [Transaction ID], [AllocUnitName], [Transaction Name]
FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation           Context             Transaction ID  AllocUnitName               Transaction Name
----------------------  ------------------  ------------------  --------------  --------------------------  ----------------
00000081:000001a6:0016  LOP_BEGIN_CKPT      LCX_NULL            0000:00000000   NULL                        NULL
00000081:000001a9:0001  LOP_END_CKPT        LCX_NULL            0000:00000000   NULL                        NULL
00000081:000001aa:0001  LOP_BEGIN_XACT      LCX_NULL            0000:00001072   NULL                        TRUNCATE TABLE
00000081:000001aa:0002  LOP_LOCK_XACT       LCX_NULL            0000:00001072   NULL                        NULL
00000081:000001aa:0003  LOP_LOCK_XACT       LCX_NULL            0000:00001072   NULL                        NULL
00000081:000001aa:0004  LOP_LOCK_XACT       LCX_NULL            0000:00001072   NULL                        NULL
00000081:000001aa:0005  LOP_COUNT_DELTA     LCX_CLUSTERED       0000:00000000   sys.sysallocunits.clust     NULL
00000081:000001aa:0006  LOP_COUNT_DELTA     LCX_CLUSTERED       0000:00000000   sys.sysrowsets.clust        NULL
00000081:000001aa:0007  LOP_COUNT_DELTA     LCX_CLUSTERED       0000:00000000   sys.sysrowsetcolumns.clust  NULL
00000081:000001aa:0008  LOP_COUNT_DELTA     LCX_CLUSTERED       0000:00000000   sys.sysrowsetcolumns.clust  NULL
00000081:000001aa:0009  LOP_COUNT_DELTA     LCX_CLUSTERED       0000:00000000   sys.sysrowsetcolumns.clust  NULL
00000081:000001aa:000a  LOP_HOBT_DDL        LCX_NULL            0000:00001072   NULL                        NULL
00000081:000001aa:000b  LOP_MODIFY_ROW      LCX_CLUSTERED       0000:00001072   sys.sysallocunits.clust     NULL
00000081:000001aa:000c  LOP_MODIFY_COLUMNS  LCX_CLUSTERED       0000:00001072   sys.sysallocunits.clust     NULL
00000081:000001aa:000d  LOP_DELETE_ROWS     LCX_MARK_AS_GHOST   0000:00001072   sys.sysserefs.clust         NULL
00000081:000001aa:000e  LOP_MODIFY_HEADER   LCX_PFS             0000:00000000   Unknown Alloc Unit          NULL
00000081:000001aa:000f  LOP_SET_BITS        LCX_PFS             0000:00000000   sys.sysserefs.clust         NULL
00000081:000001aa:0010  LOP_INSERT_ROWS     LCX_CLUSTERED       0000:00001072   sys.sysserefs.clust         NULL
00000081:000001aa:0011  LOP_MODIFY_ROW      LCX_SCHEMA_VERSION  0000:00000000   sys.sysobjvalues.clst       NULL
00000081:000001aa:0012  LOP_INSERT_ROWS     LCX_CLUSTERED       0000:00001072   sys.sysallocunits.clust     NULL
00000081:000001aa:0013  LOP_INSERT_ROWS     LCX_CLUSTERED       0000:00001072   sys.sysserefs.clust         NULL
00000081:000001aa:0014  LOP_HOBT_DDL        LCX_NULL            0000:00001072   NULL                        NULL
00000081:000001aa:0015  LOP_MODIFY_ROW      LCX_CLUSTERED       0000:00001072   sys.sysrowsets.clust        NULL
00000081:000001aa:0016  LOP_IDENT_SENTVAL   LCX_NULL            0000:00001072   NULL                        NULL
00000081:000001aa:0017  LOP_MODIFY_ROW      LCX_CLUSTERED       0000:00001072   sys.syscolpars.clst         NULL
00000081:000001aa:0018  LOP_COMMIT_XACT     LCX_NULL            0000:00001072   NULL                        NULL
00000081:000001b0:0001  LOP_BEGIN_XACT      LCX_NULL            0000:00001073   NULL                        DeferredAllocUnitDrop::Process
00000081:000001b0:0002  LOP_LOCK_XACT       LCX_NULL            0000:00001073   NULL                        NULL
00000081:000001b0:0003  LOP_MODIFY_ROW      LCX_IAM             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:0004  LOP_MODIFY_ROW      LCX_PFS             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:0005  LOP_SET_BITS        LCX_SGAM            0000:00000000   Unknown Alloc Unit          NULL
00000081:000001b0:0006  LOP_MODIFY_ROW      LCX_IAM             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:0007  LOP_MODIFY_ROW      LCX_PFS             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:0008  LOP_MODIFY_ROW      LCX_IAM             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:0009  LOP_MODIFY_ROW      LCX_PFS             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:000a  LOP_MODIFY_ROW      LCX_IAM             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:000b  LOP_MODIFY_ROW      LCX_PFS             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:000c  LOP_MODIFY_ROW      LCX_IAM             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:000d  LOP_MODIFY_ROW      LCX_PFS             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:000e  LOP_MODIFY_ROW      LCX_IAM             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:000f  LOP_MODIFY_ROW      LCX_PFS             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:0010  LOP_MODIFY_ROW      LCX_IAM             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:0011  LOP_MODIFY_ROW      LCX_PFS             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:0012  LOP_MODIFY_ROW      LCX_IAM             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:0013  LOP_MODIFY_ROW      LCX_PFS             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:0014  LOP_SET_BITS        LCX_SGAM            0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:0015  LOP_SET_BITS        LCX_GAM             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:0016  LOP_SET_BITS        LCX_IAM             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:0017  LOP_MODIFY_ROW      LCX_PFS             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:0018  LOP_SET_BITS        LCX_GAM             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:0019  LOP_SET_BITS        LCX_IAM             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:001a  LOP_MODIFY_ROW      LCX_PFS             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:001b  LOP_SET_BITS        LCX_GAM             0000:00001073   Unknown Alloc Unit          NULL
00000081:000001b0:001c  LOP_SET_BITS        LCX_IAM             0000:00001073   Unknown Alloc Unit          NULL
etc

The transaction with ID 0000:00001072 is the implicit transaction of my TRUNCATE TABLE statement (as you can see from the transaction name). It commits at LSN 00000081:000001aa:0018 and then straight afterwards is the start of the deferred-drop transaction. As you can see from the log records, it's just deallocating the pages and extents.

Well, you can't really see that unless you know what all the log records are doing, so let's have a quick look at the descriptions:

SELECT
 [Current LSN], [Operation], [Lock Information], [Description]
FROM fn_dblog (NULL, NULL);
GO

and you'll be able to see the locks that are logged to allow fast recovery to work (see my blog post Lock logging and fast recovery for an in-depth explanation) and also the description of the operations being performed. Here's a small selection from the start of the deferred-drop transaction:

Operation       Lock Information                                             Description
--------------  -----------------------------------------------------------  --------------------------------------------------------
LOP_BEGIN_XACT  NULL                                                         DeferredAllocUnitDrop::Process
LOP_LOCK_XACT   HoBt 0:ACQUIRE_LOCK_IX ALLOCATION_UNIT: 8:72057594042384384
LOP_MODIFY_ROW  HoBt 72057594042384384:ACQUIRE_LOCK_X RID: 8:1:153:0
LOP_MODIFY_ROW  HoBt 72057594042384384:ACQUIRE_LOCK_X PAGE: 8:1:152          Deallocated 0001:00000098
LOP_MODIFY_ROW  HoBt 72057594042384384:ACQUIRE_LOCK_X RID: 8:1:153:1
LOP_MODIFY_ROW  HoBt 72057594042384384:ACQUIRE_LOCK_X PAGE: 8:1:156          Deallocated 0001:0000009c
LOP_MODIFY_ROW  HoBt 72057594042384384:ACQUIRE_LOCK_X RID: 8:1:153:2
LOP_MODIFY_ROW  HoBt 72057594042384384:ACQUIRE_LOCK_X PAGE: 8:1:157          Deallocated 0001:0000009d
LOP_MODIFY_ROW  HoBt 72057594042384384:ACQUIRE_LOCK_X RID: 8:1:153:3
LOP_MODIFY_ROW  HoBt 72057594042384384:ACQUIRE_LOCK_X PAGE: 8:1:158          Deallocated 0001:0000009e
LOP_MODIFY_ROW  HoBt 72057594042384384:ACQUIRE_LOCK_X RID: 8:1:153:4
LOP_MODIFY_ROW  HoBt 72057594042384384:ACQUIRE_LOCK_X PAGE: 8:1:159          Deallocated 0001:0000009f
LOP_MODIFY_ROW  HoBt 72057594042384384:ACQUIRE_LOCK_X RID: 8:1:153:5
LOP_MODIFY_ROW  HoBt 72057594042384384:ACQUIRE_LOCK_X PAGE: 8:1:160          Deallocated 0001:000000a0
LOP_MODIFY_ROW  HoBt 72057594042384384:ACQUIRE_LOCK_X RID: 8:1:153:6
LOP_MODIFY_ROW  HoBt 72057594042384384:ACQUIRE_LOCK_X PAGE: 8:1:161          Deallocated 0001:000000a1
LOP_MODIFY_ROW  HoBt 72057594042384384:ACQUIRE_LOCK_X RID: 8:1:153:7
LOP_MODIFY_ROW  HoBt 72057594042384384:ACQUIRE_LOCK_X PAGE: 8:1:162          Deallocated 0001:000000a2
LOP_SET_BITS    NULL                                                         ClearBit 0001:000000a0
LOP_SET_BITS    NULL                                                         Deallocated 1 extent(s) starting at page 0001:000000a0
LOP_SET_BITS    NULL
LOP_MODIFY_ROW                                                               Deallocated 0001:000000a8;Deallocated 0001:000000a9;Deallocated 0001:000000aa;Deallocated 0001:000000ab;Deallocated 0001:000000ac;Deallocated 0001:000000ad;Deallocated 0001:000000ae;Deallocated 0001:000000af
LOP_SET_BITS    NULL                                                         Deallocated 1 extent(s) starting at page 0001:000000a8
LOP_SET_BITS    NULL
LOP_MODIFY_ROW                                                               Deallocated 0001:000000b0;Deallocated 0001:000000b1;Deallocated 0001:000000b2;Deallocated 0001:000000b3;Deallocated 0001:000000b4;Deallocated 0001:000000b5;Deallocated 0001:000000b6;Deallocated 0001:000000b7

The first 8 operations are deallocating the 8 pages that are allocated from mixed extents when the table was first populated and after that it switches to deallocating an entire extent at a time. Have a poke around - this stuff's really fascinating. Note also the LOP_LOCK_XACT log record, which just describes the acquisition of a lock - not a change to the database. You'll notice that the extent deallocations don't have any locks protecting them - that's what the allocation unit IX lock is doing.

By the way, if you have nonclustered indexes on the table too, they are also dealt with the same way and there will be a single deferred-drop transaction which deallocates all the pages from both the table and all nonclustered indexes, one allocation unit at a time. Try it and you'll see what I mean.

Myth debunked!

PS There's another myth that a TRUNCATE TABLE can't be rolled back - I debunk that in this old blog post: Search Engine Q&A #10: When are pages from a truncated table reused?

(Look in the Misconceptions blog category for the rest of the month's posts and check out the 60-page PDF with all the myths and misconceptions blog posts collected together: CommonSQLServerMyths.pdf (732.96 kb))

Today is the big day - 8am PST I'm presenting my DBA Mythbusters session at SQL Connections for the first time. I'll hopefully do it as a longer Spotlight Session at PASS this Fall too.

So a little quicky today as I'm doing 4 sessions in a row.

Myth #14: clearing the log zeroes out log records.

FALSE

The transaction log is *always* zero initialized when first created, manually grown, or auto-grown. Do not confuse this with the process of clearing the log during regular operations. That simply means that one or more VLFs (Virtual Log Files) are marked as inactive and able to be overwritten. When log clearing occurs, nothing is cleared or overwritten. 'Clearing the log' is a very confusing misnomer. It means the exact same as 'truncating the log', which is another unfortunate misnomer, because the size of the log doesn't change at all.

You can read more about zero initialization of the log in my blog post Search Engine Q&A #24: Why can't the transaction log use instant initialization? and about how log clearing works in my TechNet Magazine article from February 2009 Understanding Logging and Recovery in SQL Server.

You can prove this to yourself using trace flag 3004. Turning it on will let you see when SQL Server is doing file zeroing operations (as I described in A SQL Server DBA myth a day: (3/30) instant file initialization can be controlled from within SQL Server). Turn it on and watch for messages coming during the day - you shouldn't see anything unless the log grows.

Here's a script to show you what I mean:

DBCC TRACEON (3004, 3605);
GO

-- Create database and put in SIMPLE recovery model so the log will clear on checkpoint
CREATE DATABASE LogClearTest ON PRIMARY (
    NAME = 'LogClearTest_data',
    FILENAME = N'D:\SQLskills\LogClearTest_data.mdf')
LOG ON (
    NAME = 'LogClearTest_log',
    FILENAME = N'D:\SQLskills\LogClearTest_log.ldf',
    SIZE = 20MB);
GO

-- Error log mark 1
ALTER DATABASE LogClearTest SET RECOVERY SIMPLE;
GO

USE LogClearTest;
GO

-- Create table and fill with 10MB - so 10MB in the log
CREATE TABLE t1 (c1 INT IDENTITY, c2 CHAR (8000) DEFAULT 'a');
GO
INSERT INTO t1 DEFAULT VALUES;
GO 1280

-- Clear the log
CHECKPOINT;
GO

-- Error log mark 2
ALTER DATABASE LogClearTest SET RECOVERY SIMPLE;
GO

And in the error log we see:

2010-04-13 13:20:27.55 spid53      DBCC TRACEON 3004, server process ID (SPID) 53. This is an informational message only; no user action is required.
2010-04-13 13:20:27.55 spid53      DBCC TRACEON 3605, server process ID (SPID) 53. This is an informational message only; no user action is required.
2010-04-13 13:20:27.63 spid53      Zeroing D:\SQLskills\LogClearTest_log.ldf from page 0 to 2560 (0x0 to 0x1400000)
2010-04-13 13:20:28.01 spid53      Zeroing completed on D:\SQLskills\LogClearTest_log.ldf
2010-04-13 13:20:28.11 spid53      Starting up database 'LogClearTest'.
2010-04-13 13:20:28.12 spid53      FixupLogTail() zeroing D:\SQLskills\LogClearTest_log.ldf from 0x5000 to 0x6000.
2010-04-13 13:20:28.12 spid53      Zeroing D:\SQLskills\LogClearTest_log.ldf from page 3 to 63 (0x6000 to 0x7e000)
2010-04-13 13:20:28.14 spid53      Zeroing completed on D:\SQLskills\LogClearTest_log.ldf
2010-04-13 13:20:28.16 spid53      Setting database option RECOVERY to SIMPLE for database LogClearTest.
2010-04-13 13:20:29.49 spid53      Setting database option RECOVERY to SIMPLE for database LogClearTest.

The two ALTER DATABASE commands serve as markers in the error log. There's clearly no zeroing occuring from the CHECKPOINT between the two ALTER DATABASE commands. To further prove to yourself, you can add in calls to DBCC SQLPERF (LOGSPACE) before and after the CHECKPOINT, to show that the log is clearing when the CHECKPOINT occurs (watch the value in the Log Space Used (%) column decrease).

Now it's session time here in Vegas! (And there's no-one to blame but yourself if you're the Conference Chair and you've got an 8am session! :-)

(Edited 2/27/10 to add database name to the output) 

Here's a little script I knocked up this afternoon to tell me who has open transactions on the server - not just the single oldest active transaction that DBCC OPENTRAN returns.

It gives back:

  • session ID
  • login name
  • database context
  • transaction begin time
  • how many log records have been generated by the transaction
  • how much log space has been taken up by those log records
  • how much log space has been reserved in case the transaction rolls back
  • the last T-SQL that was executed in the context of the transaction
  • the last query plan that was executed (only for currently executing plans)

It's ordered by the transaction begin time. I had some trouble using CROSS APPLY with the sys.dm_exec_query_plan DMV - if the plan isn't available, it blows out the entire result-set for that transaction. After messing around for ten minutes I discovered the joys of the OUTER APPLY operator - which is the same as CROSS APPLY but allows NULL values from the function being cross-applied.

Also thanks to fellow MVP Aaron Bertrand (twitter|blog) for pointing out a mistake in the way I was calling sys.dm_exec_query_plan.

Here's the script with some example output:

SELECT s_tst.[session_id],
   s_es.[login_name] AS [Login Name],
   DB_NAME (s_tdt.database_id) AS [Database],

   s_tdt.[database_transaction_begin_time] AS [Begin Time],
   s_tdt.[database_transaction_log_record_count] AS [Log Records],
   s_tdt.[database_transaction_log_bytes_used] AS [Log Bytes],
   s_tdt.[database_transaction_log_bytes_reserved] AS [Log Rsvd],
   s_est.[text] AS [Last T-SQL Text],
   s_eqp.[query_plan] AS [Last Plan]
FROM sys.dm_tran_database_transactions s_tdt
   JOIN sys.dm_tran_session_transactions s_tst
      ON s_tst.[transaction_id] = s_tdt.[transaction_id]
   JOIN sys.[dm_exec_sessions] s_es
      ON s_es.[session_id] = s_tst.[session_id]
   JOIN sys.dm_exec_connections s_ec
      ON s_ec.[session_id] = s_tst.[session_id]
   LEFT OUTER JOIN sys.dm_exec_requests s_er
      ON s_er.[session_id] = s_tst.[session_id]
   CROSS APPLY sys.dm_exec_sql_text (s_ec.[most_recent_sql_handle]) AS s_est
   OUTER APPLY sys.dm_exec_query_plan (s_er.[plan_handle]) AS s_eqp
ORDER BY [Begin Time] ASC;
GO

session_id Login Name        Database Begin Time              Log Records Log Bytes Log Rsvd Last T-SQL Text                      Last Plan
---------- ----------------- -------- ----------------------- ----------- --------- -------- ------------------------------------ ---------
54         ROADRUNNERPR\paul foo      2010-02-01 15:28:48.560 2           236       8550     begin tran insert into t1 values (1) NULL
55         ROADRUNNERPR\paul foo      2010-02-01 16:38:18.373 3           356       8852     insert into t1 values (3)            NULL

(For the hardware setup I'm using, see this post. For the baseline performance measurements for this benchmark, see this post.)

In my previous post in the series, I described the benchmark I'm optimizing - populating a 1-TB clustered index as fast as possible using default values. I proved to you that I had an IO bottleneck because the IOs to the data file (from checkpoints) and the transaction log file (from transactions committing) were causing contention.

Several people commented that I might have mis-configured the iSCSI IO subsystem - so first off I want to look at that. Fellow MVP Denny Cherry (twitter|blog), who knows a lot more than me about IO subsystems, volunteered to discuss my iSCSI setup with me to make sure I hadn't goofed anywhere (many thanks Denny!). It seems like I haven't. I'm using a single iSCSI array right now, with a single NIC on the server dedicated to the iSCSI traffic - once I move to multiple volumes, I'll add in more NICs.

Looking at Task Manager in the middle of a 6-hour test run to see the network utilization through the NIC shows that it's not saturated, as shown below.

 

I ran the DELL smcli utility for two hours during the most recent test to see what peak throughput I'm getting, using the following command:

smcli -n Middle_MD3000 -c "set session performanceMonitorInterval=5 performanceMonitorIterations=1440;save storageArray performanceStats file=\"c:\\MiddlePerfStats.csv\";"

I saw around 101MBytes/sec. and earlier when testing the smcli settings I'd seen 106MBytes/sec. I'm sure once I remove some of the contention that this will get better.

On to the test!

The first thing I wanted to try was optimizing my use of the transaction log - i.e. doing as few and as large IOs as possible to the log. My hypothesis is that by changing the batch size from one to, say, 10 or 100 SQL Server can make more efficient use of the log.

I changed my T-SQL script to take a batch size parameter and use explicit transactions inserting <batch-size> records. Everything else remained the same as the baseline. I picked 128 concurrent threads as my test point. In the baseline, the 128-thread insert test took 8 hours 8 minutes 27 seconds to complete (29307 seconds). My simple T-SQL code changed to (with $(rows) and $(batch) being SQLCMD parameters to the script):

DECLARE @counter BIGINT;
DECLARE @inner   SMALLINT;
DECLARE @start   DATETIME;
DECLARE @end     DATETIME;

SELECT @counter = 0;
SELECT @start = GETDATE ();

WHILE (@counter < $(rows))
BEGIN
   SELECT @inner = 0;
   BEGIN TRAN;
   WHILE (@inner < $(batch))
   BEGIN
      INSERT INTO MyBigTable DEFAULT VALUES;
      SELECT @inner = @inner + 1;
   END
   COMMIT TRAN;
   SELECT @counter = @counter + $(batch);
END;

SELECT @end = GETDATE ();

INSERT INTO msdb.dbo.Results VALUES (CONVERT (INTEGER, DATEDIFF (second, @start, @end)));
GO

Below are the results for 128 threads with batch sizes varying from 10 to 10000:

Increasing the batch size from 1 record to 10 records caused the elapsed time to drop from 29307 seconds to 21167 seconds - a 28% improvement! However, increasing the batch size further didn't improve things any more. Hmmm.

Next I tried different numbers of concurrent connections with a batch size of ten to see if the improvement was universal:

 

So moving from a single-record batch to a ten record batch has the same dramatic effect on performance for varying numbers of connections.

What's going on?

The answer involves some deep internals of the transaction log structure and IO behavior. The transaction log is split up internally into sections called virtual log files (VLFs for short). These VLFs are used to manage which parts of the log can be overwritten with new log records. If this is all unfamiliar, go read my TechNet Magazine article on Understanding Logging and Recovery in SQL Server which explains about the log in more detail and about transactions, otherwise what's coming next may not make much sense.

Whenever a transaction commits, all the transaction log records up to and including the LOP_COMMIT_XACT log record for that transaction (including log records from other transactions that are inter-mingled with those from the one that just committed) must be written out to disk, so the transaction is durable (called write-ahead logging). But it's not individual log records that are written to disk, the unit of IO for the transaction log is a log block. VLFs are split internally into dynamically sized log blocks, ranging from 512-bytes to 60KB in 512-byte increments, as needed. There are algorithms to figure out how big log blocks should be based on what log records are being flushed out.

With the single record batch, the amount of log generated in the transaction totals 4952 bytes. You can find this by doing something like:

CHECKPOINT;  -- to force the log in the SIMPLE recovery model to clear
GO

INSERT INTO MyBigTable DEFAULT VALUES;
GO

SELECT * FROM fn_dblog (NULL, NULL); -- show me all active log
GO

Then add up the values in the Log Record Length column for all records from the first LOP_BEGIN_XACT log record to the LOP_COMMIT_XACT with the matching Transaction ID.

Anyway, a single-record transaction generates 4952 bytes of transaction log, which will be flushed out in our case in a log block that's 5120 bytes (the nearest multiple of 512 above 4952), with a bunch of zero-padding at the end. For ten single-record transactions, that's 10 small IOs.

The ten-record transaction generates 48688 bytes of transaction log and will be flushed out in a log block that's 49152 bytes (the nearest multiple of 512 above 48688). This is clearly more efficient than 10 smaller IOs and is why changing to batches makes things go faster.

A 100-record transaction generates 489628 bytes of transaction log, which is clearly more than the 60KB log-block maximum, so it actually covers multiple log blocks (varying from 14-16 in my tests - I don't remember the exact algorithms). You can see the log block changing when the middle number in the Current LSN column of the fn_dblog output increases. It looks like it's jumping up, and it is - this is the offset of the log block within the current VLF divided by 512.

Because the 100-record transaction is split into multiple log blocks, there's no real IO gain during log flushes over the 10-record transaction - which is illustrated by the results I saw.

Now, even with this speed increase from the increased batch size, the average-disk-write-queue-length is still anywhere from 20-40 when running the 128-connection test, so there's still a significant bottleneck there. In fact, the transaction log grew significantly still during these tests (up to 23GB in one case) - for an explanation of this phenomenon, see Interesting case of watching log file growth during a perf test. I picked up some wait stats queries from fellow-MVP Glenn Berry (twitter|blog) to run while the tests were running. I'm using his as they're published and anyone can download and use them (see his blog post for the queries, and Bob Ward's CSS blog post that's starting to document the wait types).

His queries tell me that with the 10-record per batch and 128-connections:

  • Average write-stall to the data file is 37ms
  • Average write-stall to the log file is 18ms
  • Top wait types are PAGELATCH_EX (55% of all waits), PAGELATCH_SH (28% of all waits), and WRITELOG (14% of all waits)

The first two waits are the Storage Engine waiting for the buffer pool to kick out dirty pages so that buffers can be used to hold newly created pages, and the third is the log manager waiting for log block flushes to disk to complete.

Clearly IO IO IO.  Let's look at a a perfmon capture during the 10-records per batch test with 128 connections:

 

Let's go through each counter (top to bottom in the list in the image) and explain what's going on. I deliberately picked this time-slice, as it really simply shows the effect of contention:

  • Pages Allocated/sec: this is the light blue line and is the Access Methods part of the Storage Engine (the dev team I used to run) creating new data and index pages for the clustered index we're populating.
  • Checkpoint pages/sec: this is the pink line at the bottom left and bottom right of the capture. This is the buffer pool writing out dirty pages during a periodic checkpoint.
  • Avg. Disk sec/Write: this is the dark blue line that's tracking just above the thick black line. It's the amount of time in ms for a write to complete. You can see that it has a minimum around 12:51:00 and then varies wildly, hitting as high as 50+ms for a single write.
  • Avg. Disk Write Queue Length: this is the highlighted line in thick black. It also has a minimum around 12:51:00 and varies wildly the rest of the time.
  • Disk Write Bytes/sec: this is the dark green line at the top that shows the number of bytes being written to disk from all IO sources. Same story around 12:51:00.
  • Log Growths: A simple counter since the database was created/server started. It's over 100 and off the chart.
  • Log Bytes Flushed/sec: this is the red, fairly constant line around 1/3 the way up and is log blocks being flushed to disk because of transaction commits or checkpoints.
  • Lazy writes/sec: this is the light green line at the bottom and is the buffer pool having to force data pages to be written to disk (along with all transaction log flushed up to the point of the last log record to change the page being written) to make space for images of newly created pages.

This time-slice is really cool in that it shows what happens when contention goes away. Just before 12:51:00, a checkpoint ends and the lazywriter has nothing to do - so the only IOs hitting the disks are those coming from the transaction log flushing out log blocks as transactions commit. You can see the Avg. Disk Write Queue Length drop down to 2-3, the Avg. Disk sec/Write drop to about 5ms, and most beautiful of all (look, I'm a big geek ok? :-), the Disk Write Bytes/sec (the green line) drops down to be exactly equal to the Log Bytes Flushed/sec - proving that it's just log flushes hitting the disk. This is the no-contention case. It happens again for brief spell about 10 seconds later - the lazywriter most likely created a temporary surfeit of empty buffers. All the rest of the time, the lazywriter and checkpoints play havoc with the write throughput on the drives by causing contention.

It's clearly time to try some separation of files to relieve the contention - and that's what I'll cover in the next post in the series.

Hope you're enjoying the series - these take a long time to write up!

I'm running some performance tests on the hardware we have (more details on the first of these tomorrow) and I was surprised to see some explosive transaction log growth while running in the SIMPLE recovery model with single row insert transactions!

Without spoiling tomorrow's thunder too much, I've got a setup with varying numbers of connections populating a 1TB table with default values, with the single data and log files on the same 8-drive RAID-10 array (again more details on why tomorrow). I was looking at perfmon while the test was running with 128 concurrent connections, and noticed some weird-looking behavior - basically what looked like a never-ending checkpoint occuring. So I dug in deeper and discovered the transaction log had grown over 7GB since the start of the test. Huh?

I restarted the test from scratch, with a 1TB data file and a 256MB log and watched what happened in perfmon; here's the screenshot of the first 90 seconds:

 

This is *so* interesting. The black line is the number of log growths, so you can see the log grows every time the line level goes up. The bright blue line is the number of pages being allocated per second to hold all the table rows my 128 connections are inserting. You can clearly see that every time there's a log growth, the allocations take a nose-dive - because no transactions can commit while the new portion of the log is being zeroed (remember that instant file initialization does not apply to the log - see this blog post). The green line (disk write bytes per second) spikes when the log grows because of the zeroes being written out to the log file by SQL Server (remember that SQL Server does the zeroing).

But why is the log growing in the first place? I'm in the SIMPLE recovery model and doing single-row implicit transactions (yes, I'm deliberately doing this) so there shouldn't be anything stopping the log from clearing during a checkpoint, right?

Wrong. The log starts out small (256MB) so one of the thresholds for triggering a checkpoint gets hit pretty fast (70% of the log file is used). So a checkpoint occurs (you can see the checkpoints occuring when the pink line at the bottom of the perfmon screen is above zero), and starts writing out to the data file, which is on the same disk as the log (see How do checkpoints work and what gets logged for an explanation of checkpoint mechanics), but it can't write fast enough (because of disk contention) to get to the log-clearing part before the log fills up completely and has to grow (because transactions are continuing at break-neck speed). So the log grows, and the insert transactions stop while the log is zeroed. And then the log starts to fill up again very quickly and another checkpoint is triggered, and so on and so on.

Eventually a steady state is reached where there's enough free log space during a checkpoint that no new log is required for the concurrent transactions to commit. You might ask why the checkpoint is so slow in the first place? Because I deliberately put the data file on the same RAID array as the log file, and both are being steadily hammered with writes - classic disk contention. Even though the RAID array is RAID-10 with 8x300GB 15k SCSI drives, the average disk queue write length is over 20 most of the time during the 128-way test because I'm simply trying to do too much.

The point of my blog post? Just because you don't have any of the classic causes of transaction log growth going on, doesn't mean you're going to be immune. In this case my (deliberate) poor physical layout of the database files and workload growth up to 128 concurrent connections caused the log to grow. What started out working when I was running 16 connections didn't work any more at 128 (actually I went back and re-ran some of the earlier tests and even with only 64 connections, the log grew to over 1GB before reaching steady-state).

Interesting eh?

Tomorrow I'll be continuing the perf/benchmarking series by creating my first benchmark and then tweaking the setup to see how I can improve performance (for instance with multiple data files, separation of log and data files - all the things I preach but have never *demonstrated*), but this behavior merited a post all on its own.

Earlier today there was a question on SQL Server Central where someone wanted to know what could be causing so many reads on their transaction log. I was asked to chime in by fellow MVP Jonathan Kehayias (who also sent me some questions that I've answered in this post - thanks Jon!), so I did, with a list of everything I could think of. I thought it would make for a good post, so here it is, with a few more things I remembered while writing the post.

Before I start, if you're not comfortable talking log records and transaction log architecture, see my TechNet Magazine article on Understanding Logging and Recovery, which explains everything clearly, including how having too many VLFs can affect operations on the log that have to scan VLFs.

Each of these things can cause reads of the log:

  • Transaction rollback: when a transaction has to roll back (either because you say ROLLBACK TRAN or something goes wrong and SQL Server aborts the transaction), the log records describing what happened in the transaction have to be read so that their effects can be removed from the database. This is explained in the TechNet Magazine article. Note that it doesn't matter if you're using explicit transactions or not (i.e. BEGIN TRAN), SQL Server always starts a transaction for you (called an implicit transaction) so that it can put a boundary on what needs to be rolled back in case of a failure.
  • Crash recovery: crash recovery must read the transaction log to figure out what to do with all the log records in the active portion of the log (all the way back to the earlier of the most recent checkpoint or the start of the oldest active transaction). The log is read twice - once going forward from that oldest point (called the REDO phase) and then going backwards (called the UNDO phase). Again, this is explained in great depth in the article.
  • Creating a database snapshot: a database snapshot is a point-in-time view of a database. What's more, it's a transactionally consistent point-in-time view of a database - which means that, essentially, crash recovery must be run on the real database to create the transactionally consistent view. The crash recovery is run into the database snapshot, the real database isn't affected - apart from having all the active transaction log read so that crash recovery can run.
  • Running DBCC CHECKDB: creates a database snapshot by default on 2005 onwards, and runs the consistency checks on the snapshot. See above. There's a much more detailed description, including how this worked in 2000, in the first part of the 10-page blog post CHECKDB From Every Angle: Complete description of all CHECKDB stages.
  • Transaction log backups: this one's kind of obvious. A transaction log backup contains all the transaction log records generated since the last log backup finished (or since the log backup chain was established). To back up the log it has to read it. What's not so obvious is that a log backup will also scan through all the VLFs in the log to see if any active ones can be made inactive (called clearing or truncating the log - both misnomers as nothing is cleared and nothing is truncated). See my TechNet Magazine article on Understanding SQL Server Backups and in the blog post Importance of proper transaction log size management.
  • Any kind of data backup: (full/differential backup of a file/filegroup/database). Yup - data backups always include transaction log - so the backup can be restored and give you a transactionally consistent view of the database. See Debunking a couple of myths around full database backups and More on how much transaction log a full backup includes for details if you don't believe me.
  • Transactional replication: transactional replication works by harvesting committed transactions from the transaction log of the publication database (and then sending them to the subscriber(s) via the distribution database - beyond the scope of this post). This is done by the Log Reader Agent job, running from the Distributor. It needs to read all the log records generated in the publication database, even if they're nothing to do with the publications. More log equals more reads. My whitepaper on combining database mirroring and transactional replication in 2008 has more details on this stuff, as does Books Online.
  • Change data capture (in 2008): CDC uses the transactional replication log reader agent to harvest changes from the transaction log. See above. This means the CDC can cause the log to not be able to clear properly, just like transactional replication or database mirroring - see my blog post Search Engine Q&A #1: Running out of transaction log space for more details. Note the I didn't say Change Tracking - it uses a totally different mechanism - see my TechNet Magazine article on Tracking Changes in Your Enterprise Database for more details.
  • Database mirroring: DBM works by sending physical log records from the principal to the mirror database. If the mirroring sessions drops out of the SYNCHRONIZED state, then the log records won't be able to be read from memory and the mirroring subsystem will have to get them from disk - causing log reads. This can happen if you're running asynchronous mirroring (where you're specifically allowing for this), or if something went wrong while running synchronous mirroring (e.g. the network link between the principal and mirror dropped out, and a witness wasn't configured or the principal could still see the witness - again, beyond the scope of this post). Regardless, this is called having a SEND queue on the principal.
  • Restoring a backup: whenever backups are restored, even is you've said WITH NORECOVERY, the REDO portion of recovery is run for each restore, which reads the log.
  • Restoring a log backup using WITH STANDBY: in this case, you've essentially said you'd like recovery to run, but not to affect the transaction log itself. Running recovery has to read the log. For more info on using WITH RECOVERY, NORECOVERY, or STANDBY, see my latest TechNet Magazine article on Recovering from Disasters Using Backups, which explains how restores work.
  • A checkpoint, in the SIMPLE recovery mode only: see my blog post How do checkpoints work and what gets logged for a description of what checkpoints are and what they do. In the SIMPLE recovery mode, checkpoints are responsible for clearing the log (described with links above) so must read through all the VLFs to see which can be marked inactive.
  • When processing a DML trigger (on 2000): (thanks to Clay Lenhart for the comment that reminded me of this). In SQL Server 2000, the before and after tables that you can process in a DML trigger body are actually found from looking at the log records generated by the operation that caused the trigger to fire. My dev team changed this in 2005 to store the before and after tables using the version store, giving a big perf boost to DML trigger processing. 
  • Manually looking in the log (with DBCC LOG or the table-valued function fn_dblog): this one's pretty obvious.

Phew - a lot of things can cause log reads, the trick is knowing which one it is!

As you can see, there could be a lot of activity reading from your log as well as writing to it, which could cause an IO bottleneck. Make sure that the IO subsystem on which you place the log file (note: you don't get ANY performance benefit from having multiple log files) can handle the read and write workload the log demands. RAID 1 or RAID 10 with a bunch of spindles to spread the IOs out (note/warning/achtung: that's a big generalization - don't reply with a comment saying it's wrong because you've seen something different - different scenarios have different demands), and a proper RAID configuration (64k multiple for a stripe size, NTFS allocation unit size, volume partition alignment).

One of the things I love teaching is how the transaction log and logging/recovery work. I presented a session on this at both PASS and SQL Connections in the last two weeks, and in both sessions I promised to write some blog posts about the deep internals of logging operations. This is the first one in the series. Previous blog posts that dive into logging operations are:

Ok, on with the show. 

SQL Server 2005 introduced a feature called 'fast recovery' in Enterprise Edition. This allows a database to become available for use after the first part of recovery (REDO) completes and before the (usually longer running) second part of recovery (UNDO) completes. See my TechNet Magazine article Understanding Logging and Recovery in SQL Server if you don't know what I'm talking about. But how does SQL Server do this?

The answer is lock logging. A log record describes a single change made to a database. For log records describing changes that can be used as part of UNDO (yes, some changes to the database are one-way only - for instance PFS page changes), from 2005 onwards the log record also includes a description of which locks were being held at the time the change was made. These locks were necessary to protect the change being made when the original transaction was running (before the crash) and so the same locks will be necessary to protect the anti-operation which reverses the change. I'll explain more about these anti-operations in one of the next in-depth logging blog posts.

The Storage Engine does two passes through the log as part of crash recovery. The first pass does REDO and also reads the log records that will be processed as part of the second pass (UNDO), looking at the lock description and actually acquiring those locks. For fast recovery, at that point the database is brought online. This is possible because the recovery system knows that it already has the correct locks to guarantee that it can safely generate and perform the anti-operations necessary to perfom UNDO. One side-effect of this is that although the database is available for use, a query may bump into one of the locks being held to allow fast recovery - in which case it will have to wait for that lock to be dropped as UNDO progresses.

Cool eh?

That was the introduction to allow me to do some gratuitous spelunking around the internals :-) I'm going to create a few simple examples to show you lock logging in the log. Now, don't get confused - it's not logging actual locks (the memory used to hold the lock itself), it's just logging a description of which locks were held and in which modes.

Here's the script to create a database with a simple table. I'm using a LOB column and specifically setting it to be stored off row (see Importance of choosing the right LOB storage technique) so we can see some text page locks too. I'm using the SIMPLE recovery model for simplicity (ha ha) - so I can clear the log when a checkpoint occurs rather than having to muck around with log backups. I'll insert the first row and then clear the log.

CREATE DATABASE LockLogging;
GO
USE LockLogging;
GO

CREATE TABLE LockLogTest (c1 INT, c2 INT, c3 VARCHAR (MAX));
GO
EXEC sp_tableoption 'LockLogtest', 'large value types out of row', 'on';
GO

INSERT INTO LockLogTest VALUES (1, 1, 'a');
GO

ALTER DATABASE LockLogging SET RECOVERY SIMPLE;
GO
CHECKPOINT;
GO

Now let's try the first operation - a simple insert - and look at the log records using fn_dblog (and I'm skipping the checkpoint log records):

INSERT INTO LockLogTest VALUES (2, 2, 'b');
GO
SELECT [Operation], [Context], [Page ID], [Slot ID], [Number of Locks] AS Locks, [Lock Information]
FROM fn_dblog (NULL, NULL);
GO

Operation        Context       Page ID        Slot ID  Locks Lock Information
---------------- ------------- -------------- -------- ----- -------------------------------------------------------------------
LOP_BEGIN_XACT   LCX_NULL      NULL           NULL     NULL  NULL
LOP_INSERT_ROWS  LCX_TEXT_MIX  0001:00000098  1        2     ACQUIRE_LOCK_IX PAGE: 18:1:152; ACQUIRE_LOCK_X RID: 18:1:152:1
LOP_INSERT_ROWS  LCX_HEAP      0001:0000009a  1        3     ACQUIRE_LOCK_IX OBJECT: 18:2073058421:0;
                                                                 ACQUIRE_LOCK_IX PAGE: 18:1:154; ACQUIRE_LOCK_X RID: 18:1:154:1
LOP_COMMIT_XACT  LCX_NULL      NULL           NULL     NULL  NULL

We can see page IX and row X locks for the LOB value being inserted into the text page, plus table IX, page IX, and row X locks for the data record being inserted into the heap. The lock resources break out as follows:

  • 18:1:152 is page 152 in file 1 of database ID 18
  • 18:1:152:1 is slot 1 on page 152 in file 1 of database ID 18
  • 18:2073058421:0 is object ID 2073058421 (the object ID of the table LockLogTest) in database ID 18

Notice also the LOP_BEGIN_XACT and LOP_COMMIT_XACT log records - even though I didn't do an explicit transaction, SQL Server has to start one internally for me (called an implicit transaction) so that there's a boundary for where to rollback if something goes wrong during the operation. 

And now an update operation (with a checkpoint first to clear out the log):

CHECKPOINT;
GO
UPDATE LockLogTest SET c1 = 3;
GO
SELECT [Operation], [Context], [Page ID], [Slot ID], [Number of Locks] AS Locks, [Lock Information]
FROM fn_dblog (NULL, NULL);
GO

Operation        Context       Page ID        Slot ID  Locks Lock Information
---------------- ------------- -------------- -------- ----- -------------------------------------------------------------------
LOP_BEGIN_XACT   LCX_NULL      NULL           NULL     NULL  NULL
LOP_MODIFY_ROW   LCX_HEAP      0001:0000009a  0        3     ACQUIRE_LOCK_IX OBJECT: 18:2073058421:0;
                                                                 
ACQUIRE_LOCK_IX PAGE: 18:1:154; ACQUIRE_LOCK_X RID: 18:1:154:0
LOP_MODIFY_ROW   LCX_HEAP      0001:0000009a  1        3     ACQUIRE_LOCK_IX OBJECT: 18:2073058421:0;
                                                                 ACQUIRE_LOCK_IX PAGE: 18:1:154; ACQUIRE_LOCK_X RID: 18:1:154:1
LOP_COMMIT_XACT  LCX_NULL      NULL           NULL     NULL  NULL

Just as we expected - a table IX lock, a page IX lock, and two row X locks on that page.

Now, what about something more complicated like a TRUNCATE TABLE? Have you heard the myth about it not being logged? Right - it's a myth:

CHECKPOINT;
GO
TRUNCATE TABLE LockLogTest;
GO
SELECT [Operation], [Context], [Page ID], [Slot ID], [Number of Locks] AS Locks, [Lock Information]
FROM fn_dblog (NULL, NULL);
GO

Operation        Context       Page ID        Slot ID  Locks Lock Information
---------------- ------------- -------------- -------- ----- -------------------------------------------------------------------
LOP_BEGIN_XACT   LCX_NULL      NULL           NULL     NULL  NULL
LOP_LOCK_XACT    LCX_NULL      NULL           NULL     1     ACQUIRE_LOCK_SCH_M OBJECT: 18:2073058421:0
LOP_MODIFY_ROW   LCX_IAM       0001:0000009b  0        1     ACQUIRE_LOCK_X RID: 18:1:155:0
LOP_MODIFY_ROW   LCX_PFS       0001:00000001  0        1     ACQUIRE_LOCK_X PAGE: 18:1:154
LOP_MODIFY_ROW   LCX_PFS       0001:00000001  0        1     ACQUIRE_LOCK_X PAGE: 18:1:155
LOP_MODIFY_ROW   LCX_IAM       0001:00000099  0        1     ACQUIRE_LOCK_X RID: 18:1:153:0
LOP_MODIFY_ROW   LCX_PFS       0001:00000001  0        1     ACQUIRE_LOCK_X PAGE: 18:1:152
LOP_MODIFY_ROW   LCX_PFS       0001:00000001  0        1     ACQUIRE_LOCK_X PAGE: 18:1:153
LOP_SET_BITS     LCX_SGAM      0001:00000003  1        NULL  NULL
LOP_SET_BITS     LCX_GAM       0001:00000002  1        NULL  NULL
LOP_COUNT_DELTA  LCX_CLUSTERED 0001:00000014  89       NULL  NULL
LOP_COUNT_DELTA  LCX_CLUSTERED 0001:00000011  78       NULL  NULL
LOP_COUNT_DELTA  LCX_CLUSTERED 0001:00000014  90       NULL  NULL
LOP_COUNT_DELTA  LCX_CLUSTERED 0001:00000041  164      NULL  NULL
LOP_COUNT_DELTA  LCX_CLUSTERED 0001:00000041  165      NULL  NULL
LOP_COUNT_DELTA  LCX_CLUSTERED 0001:00000041  166      NULL  NULL
LOP_HOBT_DDL     LCX_NULL      NULL           NULL     NULL  NULL
LOP_MODIFY_ROW   LCX_CLUSTERED 0001:00000014  89       2     ACQUIRE_LOCK_IX OBJECT: 18:7:0;
                                                                
ACQUIRE_LOCK_X KEY: 18:458752 (0000c2681664)
LOP_HOBT_DDL     LCX_NULL      NULL           NULL     NULL  NULL
LOP_MODIFY_ROW   LCX_CLUSTERED 0001:00000014  90       2     ACQUIRE_LOCK_IX OBJECT: 18:7:0;
                                                                 ACQUIRE_LOCK_X KEY: 18:458752 (00007a581379)
LOP_MODIFY_ROW   LCX_CLUSTERED 0001:00000011  78       2     ACQUIRE_LOCK_IX OBJECT: 18:5:0;
                                                                 ACQUIRE_LOCK_X KEY: 18:327680 (00001df3833b)
LOP_COMMIT_XACT  LCX_NULL      NULL           NULL     NULL  NULL

Lots of logging and lots of locks. If you look at the Context column, you'll see that the operation is modifying allocation bitmaps (LCX_IAM, LCX_PFS, LCX_SGAM, LCX_GAM) but taking locks on the table pages, not on the allocation bitmaps themselves - they're only ever latched (an internal, much lighter-weight, synchronization mechanism). This is done as the pages comprising the table are deallocated - this is all done because the table's small enough that the Storage Engine chooses to deallocate all the storage immediately, instead of pushing it all onto the task queue for the deferred drop background task. See my previous post Search Engine Q&A #10: When are pages from a truncated table reused? which discusses this too.

There are no actual row operations performed on the table itself. The only table row operations are down at the bottom on table with object IDs 7 and 5 (sysallocunits and sysrowsets, respectively) to update the page counts, first IAM, and first page entries for the table.

So - hopefully this has been useful to you. In the next post in the series, I'll discuss compensation log records and how rollback operations work.

PS Send me an email or put in a comment if there's something in particular about the log (or log records) you'd like to see explained.

One of the perennial problems facing both experienced and involuntary DBAs is how to tell whether a database is really in the FULL recovery model or not? This is complicated by the fact that when you switch a database into the FULL recovery mode, it actually behaves as if it's in the SIMPLE recovery mode until the log backup chain is established (this is commonly called being in 'pseudo-SIMPLE').

It's a problem for several reasons:

1) if the database is really in the FULL recovery model then log backups must be taken so the log can clear/truncate properly and it doesn't grow out of control

2) if the database is in the FULL recovery model but the log backup chain has been broken (or not established at all since the database was created) then log backups are not possible (except for the yuckiness in SQL 2000 when log backups would succeed without complaint but be totally useless during disaster recovery)

I don't know of any script to easily determine whether a database is really in the FULL recovery mode, so I knocked one together - and I present it here for you to use.

The trick to the script is finding the last LSN that's been backed up for the database. if this is non-NULL, then a log backup chain exists and the database is really in the FULL recovery mode. This is stored in the dbi_dbbackupLSN field in the database boot page (see Search Engine Q&A #20: Boot pages, and boot page corruption) but also nicely available in the DMV sys.database_recovery_status.

I've tested this on 2005 SP3 and 2008 SP1.

Note that this doesn't work on SQL 2000 - I've poked around and can't find a way to get at the LSN without reading the boot page directly, which can't be done gracefully inside a function - I'll leave that as an exercise for you. You'd expect the IsTruncLog property returned by DATABASEPROPERTY to be correct when the database is in pseudo-SIMPLE, but it's not unfortunately.

Here are some test cases for the script:

CREATE DATABASE SimpleModeDB;
CREATE DATABASE BulkLoggedModeDB;
CREATE DATABASE FullModeDB;
GO

ALTER DATABASE SimpleModeDB SET RECOVERY SIMPLE;
ALTER DATABASE BulkLoggedModeDB SET RECOVERY BULK_LOGGED;
ALTER DATABASE FullModeDB SET RECOVERY FULL;
GO

SELECT [Name], msdb.dbo.SQLSkillsIsReallyInFullRecovery ([Name]) AS 'ReallyInFULL'
FROM sys.databases
WHERE [Name] LIKE '%ModeDB';
GO

Name              ReallyInFULL
----------------- -------------
SimpleModeDB      0
BulkLoggedModeDB  0
FullModeDB        0

This makes sense - the new FullModeDB database is still in pseudo-SIMPLE. Now what if we take a full database backup?

BACKUP DATABASE FullModeDB To DISK='C:\SQLskills\FullModeDB.bck' WITH INIT;
GO
SELECT msdb.dbo.SQLSkillsIsReallyInFullRecovery ('FullModeDB') AS 'ReallyInFULL';
GO

Processed 152 pages for database 'FullModeDB', file 'FullModeDB' on file 1.
Processed 1 pages for database 'FullModeDB', file 'FullModeDB_log' on file 1.
BACKUP DATABASE successfully processed 153 pages in 0.230 seconds (5.449 MB/sec).

ReallyInFULL
------------
1

Perfect. Now what about switching it back to SIMPLE and back to FULL again?

ALTER DATABASE FullModeDB SET RECOVERY SIMPLE;
ALTER DATABASE FullModeDB SET RECOVERY FULL;
GO
SELECT msdb.dbo.SQLSkillsIsReallyInFullRecovery ('FullModeDB') AS 'ReallyInFULL';
GO

ReallyInFULL
------------
0

Just as we expect - the log backup chain has been broken and the database is back to pseudo-SIMPLE again.

Now what if we restart the log backup chain using a full database backup?

BACKUP DATABASE FullModeDB To DISK='C:\SQLskills\FullModeDB.bck' WITH INIT;
GO
SELECT msdb.dbo.SQLSkillsIsReallyInFullRecovery ('FullModeDB') AS 'ReallyInFULL';
GO

Processed 152 pages for database 'FullModeDB', file 'FullModeDB' on file 1.
Processed 1 pages for database 'FullModeDB', file 'FullModeDB_log' on file 1.
BACKUP DATABASE successfully processed 153 pages in 0.095 seconds (13.193 MB/sec).

ReallyInFULL
------------
1

Perfect. Now what about if we break the chain and try to restart it using a differential database backup?

ALTER DATABASE FullModeDB SET RECOVERY SIMPLE;
ALTER DATABASE FullModeDB SET RECOVERY FULL;
GO
BACKUP DATABASE FullModeDB To DISK='C:\SQLskills\FullModeDB_diff.bck' WITH INIT, DIFFERENTIAL;
GO
SELECT msdb.dbo.SQLSkillsIsReallyInFullRecovery ('FullModeDB') AS 'ReallyInFULL';
GO

Processed 40 pages for database 'FullModeDB', file 'FullModeDB' on file 1.
Processed 1 pages for database 'FullModeDB', file 'FullModeDB_log' on file 1.
BACKUP DATABASE WITH DIFFERENTIAL successfully processed 41 pages in 0.080 seconds (4.192 MB/sec).

ReallyInFULL
------------
1

Perfect - that works too, as I'd expect. You may wonder why a differential backup works - either a full or differential backup will work as they bridge the LSN gap since the last full or differential backup before the log backup chain was broken - both of these backups include transaction log - see More on how much transaction log a full backup includes.

And here's the script itself - enjoy!

/*============================================================================
   File: SQLskillsIsReallyInFullRecovery.sql

   Summary: This script creates a function in msdb that returns a BIT value of
   1 if the supplied database is really in the FULL recovery mode and not still
   in pseudo-SIMPLE.

   Date: October 2009

   SQL Server Versions:
      10.0.2531.00 (SS2008 SP1)
      9.00.4035.00 (SS2005 SP3)
------------------------------------------------------------------------------
   Copyright (C) 2009 Paul S. Randal, SQLskills.com
   All rights reserved.

   For more scripts and sample code, check out
     
http://www.sqlskills.com/

   You may alter this code for your own *non-commercial* purposes. You may
   republish altered code as long as you give due credit.

   THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF
   ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED
   TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
   PARTICULAR PURPOSE.
============================================================================*/

USE [msdb];
GO

IF EXISTS (SELECT * FROM sys.objects WHERE NAME = 'SQLskillsIsReallyInFullRecovery')
   DROP FUNCTION SQLskillsIsReallyInFullRecovery;
GO
 
CREATE FUNCTION SQLskillsIsReallyInFullRecovery (
   @DBName sysname)
RETURNS BIT
AS
BEGIN
  
DECLARE @IsReallyFull  BIT;
   DECLARE @LastLogBackupLSN NUMERIC (25,0);
   DECLARE @RecoveryModel  TINYINT;

   SELECT @LastLogBackupLSN = [last_log_backup_lsn]
   FROM sys.database_recovery_status
   WHERE [database_id] = DB_ID (@DBName);

   SELECT @RecoveryModel = [recovery_model]
   FROM sys.databases
   WHERE [database_id] = DB_ID (@DBName);

   IF (@RecoveryModel = 1 AND @LastLogBackupLSN IS NOT NULL)
      SELECT @IsReallyFull = 1
   ELSE
      SELECT @IsReallyFull = 0;

   RETURN (@IsReallyFull);
END;
GO

Last week I posted in-depth about how checkpoints work and what exactly goes on (see How do checkpoints work and what gets logged). About a year ago I posted about why the buffer pool on a busy system may seem to have an inordinate amount of dirty tempdb pages in it, and now I want to clarify a bit more why this is the case and how checkpoints work for tempdb. To see the buffer pool contents, see my post Inside the Storage Engine: What's in the buffer pool?.

A checkpoint is only done for tempdb when the tempdb log file reaches 70% full - this is to prevent the tempdb log from growing if at all possible (note that a long-running transaction can still essentially hold the log hostage and prevent it from clearing, just like in a user database).

If you read the first post I reference above, you'll see that when a checkpoint occurs for a user database, all dirty pages for that database are flushed to disk (as well as other operations). This does not happen for tempdb. Tempdb is not recovered in the event of a crash, and so there is no need to force dirty tempdb pages to disk, except in the case where the lazywriter process (part of the buffer pool) has to make space for pages from other databases.

The other operation that occurs during a checkpoint of databases in the SIMPLE recovery model is that the VLFs in the log are examined to see if they can be made inactive (see TechNet Magazine: feature article on understanding logging and recovery and Inside the Storage Engine: More on the circular nature of the log). This process allows the log to wrap-around and overwrite itself without having to grow - and this process *IS* required by tempdb, for proper management of it's log size.

This is all that happens for a checkpoint of tempdb - no data pages are flushed to disk.

This is a blog post I've been meaning to do for a while, and I've recently noticed some info on the web about checkpoints which is a little misleading, so I want to do a quick post to explain how checkpoints work as far as log records are concerned.

When a checkpoint operation occurs, no matter how it's triggered (for instance through a manual CHECKPOINT, from a database or differential backup, or automatically) the same set of operations occurs:

  • All dirty data file pages for the database are written to disk (all pages that have changed in memory since they were read from disk or since the last checkpoint), regardless of the state of the transaction that made the change.
  • Before a page is written to disk, all log records up to and including the most recent log record describing a change to that page are written to disk (yes, log records can be cached in memory too). This guarantees recovery can work and is called write-ahead logging. Log records are written to the log sequentially, and log records from multiple transactions will be interspersed in the log. The log cannot be selectively written to disk, so writing a dirty page to disk that only has a single log record affecting it may mean writing many more previous log records to disk as well.
  • Log records describing the checkpoint are generated.
  • The LSN of the checkpoint is recorded in the database boot page in the dbi_checkptLSN field (see Search Engine Q&A #20: Boot pages, and boot page corruption).
  • If in the SIMPLE recovery model, the VLFs in the log are checked to see whether they can be marked inactive (called clearing or truncating the log - both of which are terrible misnomers, as nothing is either physically cleared or truncated).

I'm using some terms you may not have come across - for a background primer on logging, recovery, and transaction log architecture see my article in the February 2009 TechNet Magazine - Understanding Logging and Recovery in SQL Server (or come to my Wednesday November 4th PASS Spotlight session with the same title).

Checkpoints are not really tracked in the transaction log - it just serves as a useful repository for information about which transactions are active at the time of the checkpoint. The LSN of the last checkpoint is recorded in the database boot page. This is where recovery starts, and if this page is inaccessible, the database cannot be attached, opened, or processed in any way - partly because it's the boot page that knows whether the database was cleanly shut down or not, and partly because it's the only place that records the LSN of the last checkpoint record. You may say, well it's recorded in the transaction log too, but what if the log is corrupt in some way?

One area of confusion I've seen is that the checkpoint log records are overwritten by subsequent checkpoints. Absolutely not - once written, a log record is NEVER updated or overwritten - it will only be overwritten when the log wraps and the VLFs are re-used (see Inside the Storage Engine: More on the circular nature of the log). This has led to further confusion about when checkpoint information is retrievable from the log, using commands such as fn_dblog.

For the rest of this post, I want to show you what's going on in the transaction log when checkpoints occur under different circumstances.

Consider the following example:

CREATE DATABASE CheckpointTest;
GO
USE CheckpointTest;
GO
CREATE TABLE t1 (c1 INT);
GO
INSERT INTO t1 VALUES (1);
GO
CHECKPOINT;
GO
SELECT [Current LSN], [Operation] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation
----------------------- -------------------------------
00000047:00000051:009b  LOP_BEGIN_CKPT
00000047:00000091:0001  LOP_END_CKPT

We see the log records for the checkpoint. In this case the checkpoint is very simple and so there are only two records - beginning and ending the checkpoint.

If we run another checkpoint, what do we see?

CHECKPOINT;
GO
SELECT [Current LSN], [Operation] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation
----------------------- -------------------------------
00000047:00000092:0001  LOP_BEGIN_CKPT
00000047:00000093:0001  LOP_END_CKPT

Only information about one checkpoint, but with different LSNs for the log records. It's not that the previous checkpoint was overwritten, it's that we did a checkpoint - so the active portion of the log has moved forward and the log records for the previous checkpoint aren't considered active any more as they're not required (for instance, for database mirroring, an active transaction, a log backup, transactional replication). They're still there in the log though, but just aren't part of the required portion of the log and so aren't dumped by fn_dblog.

Now, what if I create an active transaction? In another connection, I'll do:

USE CheckpointTest;
GO
BEGIN TRAN;
GO
INSERT INTO t1 VALUES (1);
GO

And what if we do a checkpoint and look at the log now?

CHECKPOINT;
GO
SELECT [Current LSN], [Operation] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation
----------------------- -------------------------------
00000047:00000094:0001  LOP_BEGIN_XACT
00000047:00000094:0002  LOP_INSERT_ROWS
00000047:00000094:0003  LOP_COUNT_DELTA
00000047:00000094:0004  LOP_COUNT_DELTA
00000047:00000094:0005  LOP_COUNT_DELTA
00000047:00000094:0006  LOP_BEGIN_CKPT
00000047:00000096:0001  LOP_XACT_CKPT
00000047:00000096:0002  LOP_END_CKPT

We see the start of the open transaction, the insert of the record, the update of row counts in metadata, and the checkpoint.

You may notice that there's another log record being generated for the checkpoint - LOP_XACT_CKPT. This is only generated when there are active (uncommitted) transactions and it lists information about all actvie transactions at the time the checkpoint begins. This is used during crash recovery to work out how far back in the transaction log to go to start REDO and UNDO operations (well, technically only UNDO will need to go this far back). Focusing in on this log record, we can see:

SELECT [Current LSN], [Operation], [Num Transactions], [Log Record]
FROM fn_dblog (NULL, NULL) WHERE [Operation] = 'LOP_XACT_CKPT';
GO

Current LSN             Operation                       Num Transactions
----------------------- ------------------------------- ----------------
00000047:00000096:0001  LOP_XACT_CKPT                   1

Log Record
-----------------------------------------------------------------------------------------------------------
0x000018 <snip> 780500000000000047000000940000000100040147000000940000000200000001 <snip> 6621000000000000

This log record contains information about each active (uncommitted) transaction at the time of the checkpoint. Without going into all the details about what's in the payload for this log record, you can see two things in there:

  • The LSN of the LOP_BEGIN_XACT log record for the oldest active transaction (the first bold number above - match it against the LOP_BEGIN_XACT in the log dump a little higher up)
  • The LSN of the first log record making a database change for that transaction (the second bold number above - match it against the LOP_INSERT_ROWS in the log dump a little higher up)

You'll notice that these LSNs are stored byte-reversed. 

How about if we do another checkpoint?

CHECKPOINT;
GO
SELECT [Current LSN], [Operation] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation
----------------------- -------------------------------
00000047:00000094:0001  LOP_BEGIN_XACT
00000047:00000094:0002  LOP_INSERT_ROWS
00000047:00000094:0003  LOP_COUNT_DELTA
00000047:00000094:0004  LOP_COUNT_DELTA
00000047:00000094:0005  LOP_COUNT_DELTA
00000047:00000094:0006  LOP_BEGIN_CKPT
00000047:00000096:0001  LOP_XACT_CKPT
00000047:00000096:0002  LOP_END_CKPT
00000047:00000097:0001  LOP_BEGIN_CKPT
00000047:00000098:0001  LOP_XACT_CKPT
00000047:00000098:0002  LOP_END_CKPT

This time we see the log records for the current checkpoint and the previous one - as the active log stretches all the way back to the start of the oldest active transaction - no matter how many checkpoints we do while there's an active transaction.

Now what if we start another active transaction in a third connection?

USE CheckpointTest;
GO
BEGIN TRAN;
GO
INSERT INTO t1 VALUES (2);
GO

And then go back to the original connection and do another checkpoint and dump the log again:

CHECKPOINT;
GO
SELECT [Current LSN], [Operation] FROM fn_dblog (NULL, NULL);
GO

Current LSN             Operation
----------------------- -------------------------------
00000047:00000094:0001  LOP_BEGIN_XACT
00000047:00000094:0002  LOP_INSERT_ROWS
00000047:00000094:0003  LOP_COUNT_DELTA
00000047:00000094:0004  LOP_COUNT_DELTA
00000047:00000094:0005  LOP_COUNT_DELTA
00000047:00000094:0006  LOP_BEGIN_CKPT
00000047:00000096:0001  LOP_XACT_CKPT
00000047:00000096:0002  LOP_END_CKPT
00000047:00000097:0001  LOP_BEGIN_CKPT
00000047:00000098:0001  LOP_XACT_CKPT
00000047:00000098:0002  LOP_END_CKPT
00000047:00000099:0001  LOP_BEGIN_XACT
00000047:00000099:0002  LOP_INSERT_ROWS
00000047:00000099:0003  LOP_COUNT_DELTA
00000047:00000099:0004  LOP_COUNT_DELTA
00000047:00000099:0005  LOP_COUNT_DELTA
00000047:00000099:0006  LOP_BEGIN_CKPT
00000047:0000009b:0001  LOP_XACT_CKPT
00000047:0000009b:0002  LOP_END_CKPT

You can see that we now have three sets of checkpoint log records, and two active transactions. Only one of these sets of checkpoint log records is the pertinent one - the previous two have been superceded, but the log records haven't been overwritten or removed, as you can clearly see.

Looking inside all the LOP_XACT_CKPT records, we can see (with a bit of creative formatting of the output):

SELECT [Current LSN], [Operation], [Num Transactions], [Log Record]
FROM fn_dblog (NULL, NULL) WHERE [Operation] = 'LOP_XACT_CKPT';
GO

Current LSN             Operation                       Num Transactions
----------------------- ------------------------------- ----------------
00000047:00000096:0001  LOP_XACT_CKPT                   1
00000047:00000098:0001  LOP_XACT_CKPT                   1
00000047:0000009b:0001  LOP_XACT_CKPT                   2

Log Record
-------------------------------------------------------------------------------------------------------------
0x000018 <snip> 780500000000000047000000940000000100040147000000940000000200000001 <snip> 21000000000000
0x000018 <snip> 780500000000000047000000940000000100040147000000940000000200000001 <snip> 21000000000000
0x000018 <snip> 780500000000000047000000940000000100040147000000940000000200000001 <snip> 21000000000000 ...
    ... 79050000000000004700000099000000010004014700000099000000020000000100000002000000DC000000

The first two checkpoints only list one active transaction and the most recent one lists two, as we'd expect. The log record payload for the first two lists the same oldest active transaction (highlighted in bold). The payload for the last checkpoint lists the same oldset active transaction (as that hasn't changed), but now lists an additional transaction (match the 470000009900000001000 with the LSN of the second LOP_BEGIN_XACT in the log dump above), and has a transaction count of two.

Just to finish things off, let's look at the boot page of the database either using DBCC PAGE or DBCC DBINFO:

DBCC TRACEON (3604);
GO
DBCC DBINFO ('CheckpointTest');
GO

DBINFO STRUCTURE:


DBINFO @0x6711EF64

dbi_dbid = 18                        dbi_status = 65536                   dbi_nextid = 2089058478
dbi_dbname = CheckpointTest          dbi_maxDbTimestamp = 2000            dbi_version = 611
dbi_createVersion = 611              dbi_ESVersion = 0                   
dbi_nextseqnum = 1900-01-01 00:00:00.000                                  dbi_crdate = 2009-09-28 07:06:35.873
dbi_filegeneration = 0              
dbi_checkptLSN

m_fSeqNo = 71                        m_blockOffset = 153                  m_slotId = 6
dbi_RebuildLogs = 0                  dbi_dbccFlags = 2                   
dbi_dbccLastKnownGood = 1900-01-01 00:00:00.000                          
<snip>

The dbi_checkptLSN is dumped out in decimal - converting to hex gives us  (47:99:6), which exactly matches the LSN of the most recent LOP_BEGIN_CKPT record in the log dump above.

Hopefully this explains things clearly!

The August edition of TechNet Magazine is available on the web now and has the latest installment of my regular SQL Q&A column.

This month's topics are:

  • Proactive page checksum failure detection
  • Why GUIDs make for bad clustered index keys 
  • Possible problems using a log shipping secondary for reporting 
  • Recovery model choice for log size management

Check it out at http://technet.microsoft.com/en-us/magazine/dd776512.aspx.

I came across a question on ServerFault this afternoon that inflamed my desire to be ultra-geeky (it was really already inflamed after teaching backup and restore internals all afternoon). Basically the question boiled down to how to find out who dropped a table if there's no other way except the transaction log (e.g. no tracing is available, even the default trace). So I hacked around and figured out at least how to find out *when* a particular table was dropped plus the UID and SPID of who dropped it.

Everything hinges on using undocumented commands to look into the transaction log. I've played with this before on the blog: fn_dblog.

First off I created a script to create a database, populate a table and then drop it.

USE master;
GO
CREATE DATABASE FnDbLogTest;
GO
USE FnDbLogTest;
GO

CREATE TABLE TestTable (
    c1 INT IDENTITY,
    c2 CHAR (100) DEFAULT 'a');
GO
CREATE CLUSTERED INDEX TT_CL ON TestTable (c1);
GO
INSERT INTO TestTable DEFAULT VALUES;
GO

SELECT OBJECT_ID ('TestTable');
GO

DROP TABLE TestTable;
GO

First we need to find the transactions that drop tables in the log. The following code works for 2000, 2005, and 2008:

SELECT [Transaction Id], [Begin Time], [UID], [SPID]
FROM ::fn_dblog (NULL, NULL)
WHERE [Transaction Name] = 'DROPOBJ'
GO

The (NULL, NULL) is the starting LSN and ending LSN to process - NULL means process everything available. 

Results on SQL Server 2000 will look as follows for me logging in as a Windows login in the SA role:

Transaction Id Begin Time              UID         SPID       
-------------- ----------------------- ----------- -----------
0000:000000e0  2009/06/16 18:23:03:320 1           51

And for the same circumstances on 2005 and 2008, the results look like:

Transaction Id Begin Time               UID         SPID
-------------- ------------------------ ----------- -----------
0000:00000587  2009/06/16 17:49:56:927  -1          51

If the user who dropped the table logged in as a role member, and they're not connected as that SPID any more, you may not be able to tell who it was unless you're also tracking successful logins into your server - but you'll at least know what role it was.

Now, this only shows us that a table was dropped, not which table it was. There's no way to get the name of the table that was dropped, only the object ID - so you'll need to have some other way to determine what the table ID is if there are multiple table drops and only one of them is malignant.

For SQL Server 2000, the code to find which object ID we're talking about is as follows (dropping the Transacation Id into the WHERE clause):

SELECT DISTINCT [Object Name] FROM ::fn_dblog (NULL, NULL)
WHERE [Transaction Id] = '0000:000000e0'
AND [Context] = 'LCX_IAM';
GO

Object Name     
--------------------
(2009058193)

The object ID in parentheses is the ID of the table that was dropped. 

For SQL Server 2005 and 2008, the code is as follows (with the same Transaction Id substitution):

SELECT TOP (1) [Lock Information] FROM ::fn_dblog (NULL, NULL)
WHERE [Transaction Id] = '0000:00000587'
AND [Lock Information] LIKE '%SCH_M OBJECT%';
GO

Lock Information
--------------------------------------------
ACQUIRE_LOCK_SCH_M OBJECT: 8:2073058421:0

The 8:2073058421 is the database ID and object ID of the table that was dropped.

Now you can go find whoever it was and take whatever action you deem appropriate Wink

Hope this helps!

PS If you find the you don't get enough info from ::fn_dblog, try turning on trace flag 2537. It allows the function to look at *all* possible log, not just the active log.

There's still a widely held misconception that when properly in the FULL or BULK_LOGGED recovery models that full or differential backups can truncate the log. No. It *NEVER* happens. This is one of the reasons why I'm doing a whole spotlight session on this at PASS this year - the transaction log and its behavior is IMHO one of the most misunderstood parts of SQL Server.

Notice that I said 'when properly in the FULL or BULK_LOGGED recovery models'. If you switch recovery models to FULL or BULK_LOGGED, until you take the first full backup, you are still essentially in the SIMPLE recovery model, and so the log will truncate on checkpoint. Once you take that first full backup, you are then in I-will-manage-the-size-of-the-log-through-log-backups mode. After that, the ONLY thing that will allow the log to clear/truncate is a log backup, as long as nothing else requires those transaction log records.

If you're not familiar with the term 'log clearing' or 'log truncating', they mean exactly the same thing - part of the transaction log is marked as no longer needed and can be overwritten. Nothing is zeroed out, the log file size is not altered. Some background reading on this:

Earlier this week I was involved in a discussion about log backup size management and how to prevent a log backup following a maintenance operation to not contain details of the maintenance operation.

There's a very simple answer: you can't.

If you do an operation in the FULL or BULK_LOGGED recovery models, the next log backup will contain all information required to replay that operation. In the FULL recovery model, everything is fully logged, so the log backup will contain all the log records generated by the operation. In the BULK_LOGGED recovery model, you may perform a minimally-logged operation, which generates hardly any transaction log, but the next log *backup* will be about the same size as if the operation was fully logged - because the log backup will pick up all the data extents modified by the minimally-logged operation.

One point in the discussion was that if you're running in the FULL or BULK_LOGGED recovery models, and you do a full backup after the maintenance operation, and before the log backup, the full backup will contain all the changes made by the maintenance operation, yes, and will clear the log.

No. Never.

A log backup is *ALL* the log generated since the last log backup. If this were not the case, how would log shipping work? You could take a full backup on the log shipping primary and suddenly you've broken the log backup chain and log shipping breaks. No, this is not how things work. A full backup contains only enough transaction log necessary to be able to restore that database to a transactionally consistent time - the time at which the data reading portion of the full backup completed. I blogged about this extensively previously:

But you don't have to believe me - it's very simple to convince yourself. The following script will show you that a full backup has no effect on the transaction log. It does the following:

  • Create a database and put it into the FULL recovery model, with a full backup.
  • Create and populate and index.
  • Take log backup 1 (just to clear things out)
  • Rebuild the index.
  • Take log backup 2.
  • Rebuild the index.
  • Take a full backup.
  • Take log backup 3.

And we will see that log backup #3 is the same size as log backup #2. The full backup will make no difference whatsoever.

Here's the script:

USE master;
GO
DROP DATABASE LogBackupTest;
GO
CREATE DATABASE LogBackupTest;
GO
USE LogBackupTest;
GO

ALTER DATABASE LogBackupTest SET RECOVERY FULL;
GO
BACKUP DATABASE LogBackupTest TO
DISK = 'C:\SQLskills\LogBackupTest_Full1.bak' WITH INIT;
GO

CREATE TABLE t1 (c1 INT IDENTITY, c2 CHAR (8000) DEFAULT 'a');
GO
CREATE CLUSTERED INDEX t1c1 ON t1 (c1);
GO
SET NOCOUNT ON;
GO
INSERT INTO t1 DEFAULT VALUES;
GO 1000

BACKUP LOG LogBackupTest TO
DISK = 'C:\SQLskills\LogBackupTest_Log1.bak' WITH INIT;
GO

-- Rebuild the index to generate some log and get a baseline
ALTER INDEX t1c1 ON t1 REBUILD;
GO
BACKUP LOG LogBackupTest TO
DISK = 'C:\SQLskills\LogBackupTest_Log2.bak' WITH INIT;
GO

-- Now do it again, but take a full backup before the log backup
ALTER INDEX t1c1 ON t1 REBUILD;
GO
BACKUP DATABASE LogBackupTest TO
DISK = 'C:\SQLskills\LogBackupTest_Full2.bak' WITH INIT;
GO
BACKUP LOG LogBackupTest TO
DISK = 'C:\SQLskills\LogBackupTest_Log3.bak' WITH INIT;
GO

And here's the result:

 

Log backup #3 is the same size as log backup #2. It contains all the log generated since log backup #2 was taken. The full backup had no affect whatsoever, because that would break the log backup chain.

If you don't believe me, run the script yourself and you'll see. A full backup does not and cannot affect the transaction log.

About a year ago, I blogged a cool script that would work out how much of a database has changed since the last full backup - i.e. how big will the next differential backup be. You can find that script at New script: How much of the database has changed since the last full backup?. I'm in the middle of writing a script that will tell you how big your next log backup will be, and as part of it, I need a way to determine how many extents from minimally logged operations will be included in the next log backup. I thought it might be useful to someone so I tidied it up and here it is for you to play with. It's almost the same as the previous script I published.

The code below creates a function and a stored procedure. The basic idea behind the code is as follows:

For each online data file in the database
   For each GAM interval in the file
      Crack the ML map page using DBCC PAGE
      Interpret the ML bitmap to aggregate the changed extents
      Add the sum to the total changed extents for the database
   End
End
Report results

There's a function that I create in msdb call SQLskillsConvertToExtents that cracks some of the DBCC PAGE output, and the main procedure is called sp_SQLskillsMLChangedData and it created as a system object in master. I tried making it a table-valued function but you can't do things like INSERT-EXEC in a function, and that's required for processing the DBCC PAGE output. So - create your own wrapper function or whatever to use it. The interface/output is:

EXEC sp_SQLskillsMLChangedData 'testdb';
GO

Total Extents Changed Extents Percentage Changed
------------- --------------- ----------------------
355           153              43.1

Anyway - here it is. You can download it in a zip file from SQLskillsMLChangedData.zip (2.50KB). Enjoy!

/*============================================================================
   File: SQLskillsMLChangedData.sql

   Summary: This script creates a system-wide SP SQLskillsMLChangedData that
   works out what percentage of a database has been changed by minimally
   logged operations since the last log backup.

   Date: May 2009

   SQL Server Versions:
      10.0.2531.00 (SS2008 SP1)
      9.00.4035.00 (SS2005 SP3)
------------------------------------------------------------------------------
   Copyright (C) 2009 Paul S. Randal, SQLskills.com
   All rights reserved.

   For more scripts and sample code, check out
      http://www.sqlskills.com/

   You may alter this code for your own *non-commercial* purposes. You may
   republish altered code as long as you give due credit.

   THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF
   ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED
   TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
   PARTICULAR PURPOSE.
============================================================================*/

-- Create the function in MSDB
--
USE msdb;
GO

IF EXISTS (SELECT * FROM sys.objects WHERE NAME = 'SQLskillsConvertToExtents')
   DROP FUNCTION SQLskillsConvertToExtents;
GO

-- This function cracks the output from a DBCC PAGE dump
-- of an allocation bitmap. It takes a string in the form
-- "(1:8) - (1:16)" or "(1:8) -" and returns the number
-- of extents represented by the string. Both the examples
-- above equal 1 extent.
--
CREATE FUNCTION SQLskillsConvertToExtents (
   @extents VARCHAR (100))
RETURNS INTEGER
AS
BEGIN
   DECLARE @extentTotal INT;
   DECLARE @colon INT;
   DECLARE @firstExtent INT;
   DECLARE @secondExtent INT;
   SET @extentTotal = 0;
   SET @colon = CHARINDEX (':', @extents);

   -- Check for the single extent case
   --
   IF (CHARINDEX (':', @extents, @colon + 1) = 0)
      SET @extentTotal = 1;
   ELSE
      -- We're in the multi-extent case
      --
      BEGIN
      SET @firstExtent = CONVERT (INT,
         SUBSTRING (@extents, @colon + 1, CHARINDEX (')', @extents, @colon) - @colon - 1));
      SET @colon = CHARINDEX (':', @extents, @colon + 1);
      SET @secondExtent = CONVERT (INT,
         SUBSTRING (@extents, @colon + 1, CHARINDEX (')', @extents, @colon) - @colon - 1));
      SET @extentTotal = (@secondExtent - @firstExtent) / 8 + 1;
   END

RETURN @extentTotal;
END;
GO

USE master;
GO

IF OBJECT_ID ('sp_SQLskillsMLChangedData') IS NOT NULL
   DROP PROCEDURE sp_SQLskillsMLChangedData;
GO

-- This SP cracks all minamally-logged bitmap pages for all online
-- data files in a database. It creates a sum of changed extents
-- and reports it as follows (example small msdb):
--
-- EXEC sp_SQLskillsMLChangedData 'msdb';
-- GO
--
-- Total Extents Changed Extents Percentage Changed
-- ------------- --------------- ----------------------
-- 355           153             43.1
--
CREATE PROCEDURE sp_SQLskillsMLChangedData (
   @dbName VARCHAR (128))
AS
BEGIN
   SET NOCOUNT ON;

   -- Create the temp table
   --
   IF EXISTS (SELECT * FROM msdb.sys.objects WHERE NAME = 'SQLskillsDBCCPage')
      DROP TABLE msdb.dbo.SQLskillsDBCCPage;

   CREATE TABLE msdb.dbo.SQLskillsDBCCPage (
      [ParentObject] VARCHAR (100),
      [Object] VARCHAR (100),
      [Field] VARCHAR (100),
      [VALUE] VARCHAR (100));

   DECLARE @fileID INT;
   DECLARE @fileSizePages INT;
   DECLARE @extentID INT;
   DECLARE @pageID INT;
   DECLARE @MLTotal INT;
   DECLARE @sizeTotal INT;
   DECLARE @total INT;
   DECLARE @dbccPageString VARCHAR (200);

   SELECT @MLTotal = 0;
   SELECT @sizeTotal = 0;

   -- Setup a cursor for all online data files in the database
   --
   DECLARE files CURSOR FOR
      SELECT [file_id], [size] FROM master.sys.master_files
      WHERE [type_desc] = 'ROWS'
      AND [state_desc] = 'ONLINE'
      AND [database_id] = DB_ID (@dbName);

   OPEN files;

   FETCH NEXT FROM files INTO @fileID, @fileSizePages;

   WHILE @@FETCH_STATUS = 0
   BEGIN
      SELECT @extentID = 0;

      -- The size returned from master.sys.master_files is in
      -- pages - we need to convert to extents
      --
      SELECT @sizeTotal = @sizeTotal + @fileSizePages / 8;

      WHILE (@extentID < @fileSizePages)
      BEGIN
         -- There may be an issue with the ML map page position
         -- on the four extents where PFS pages and GAM pages live
         -- (at page IDs 516855552, 1033711104, 1550566656, 2067422208)
         -- but I think we'll be ok.
         -- PFS pages are every 8088 pages (page 1, 8088, 16176, etc)
         -- GAM extents are every 511232 pages
         --
         SELECT @pageID = @extentID + 7;

         -- Build the dynamic SQL
         --
         SELECT @dbccPageString = 'DBCC PAGE ('
            + @dbName + ', '
            + CAST (@fileID AS VARCHAR) + ', '
            + CAST (@pageID AS VARCHAR) + ', 3) WITH TABLERESULTS, NO_INFOMSGS';

         -- Empty out the temp table and insert into it again
         --
         DELETE FROM msdb.dbo.SQLskillsDBCCPage;
         INSERT INTO msdb.dbo.SQLskillsDBCCPage EXEC (@dbccPageString);

         -- Aggregate all the changed extents using the function
         --
         SELECT @total = SUM ([msdb].[dbo].[SQLskillsConvertToExtents] ([Field]))
         FROM msdb.dbo.SQLskillsDBCCPage
            WHERE [VALUE] = ' MIN_LOGGED'
            AND [ParentObject] LIKE 'ML_MAP%';

         SET @MLTotal = @MLTotal + @total;

         -- Move to the next GAM extent
         SET @extentID = @extentID + 511232;
      END

      FETCH NEXT FROM files INTO @fileID, @fileSizePages;
   END;

   -- Clean up
   --
   DROP TABLE msdb.dbo.SQLskillsDBCCPage;
   CLOSE files;
   DEALLOCATE files;

   -- Output the results
   --
   SELECT
      @sizeTotal AS [Total Extents],
      @MLTotal AS [Changed Extents],
      ROUND (
         (CONVERT (FLOAT, @MLTotal) /
         CONVERT (FLOAT, @sizeTotal)) * 100, 2) AS [Percentage Changed];
END;
GO

-- Mark the SP as a system object
--
EXEC sys.sp_MS_marksystemobject sp_SQLskillsMLChangedData;
GO

-- Test to make sure everything was setup correctly
--
EXEC sp_SQLskillsMLChangedData 'msdb';
GO

This is a true story, and unfolded over the last few days. It's deliberately written this way, I'm not trying to be patronizing - just illustrating the mistakes people can make if they don't know what not to do.

Once upon a time (well, a few days ago), there was a security person who had access to a 2000 SQL Server instance. He saw that the server had run out of disk space, but had no SQL server knowledge. He decided that the best thing to do was to delete the log file, instead of contact the real DBA to take corrective action. This person should not have had any access to SQL Server, or the ability to delete the log. Nevertheless, the log was deleted. And then the database became suspect. Eventually, with no other option, someone decided to create a new transaction log file using DBCC REBUILD_LOG. This was the right thing to do. The database wasn't being backed up so restore wasn't an option. Some of the details of who did what are unclear, as it so often the case. Needless to say, the real DBA was kept in the dark.

Unfortunately, whoever rebuilt the log didn't run DBCC CHECKDB afterwards to find out what corruption had been caused in the database by having the transaction log unceremoniously ripped out from under the database's feet. Rebuilding a transaction log is one of the worst things that can happen to a database, as all active transactions are thrown away, with no chance of rolling back. This leaves the database in a transactionally inconsistent, and potentially structurally inconsistent state (see Corruption: Last resorts that people try first... filed under my Bad Advice category, and Q4 from TechNet Magazine: February 2009 SQL Q&A column: Is it ever safe to rebuild a transaction log?).

Instead they carried on running. The real DBA thought everything was cool and even upgraded the instance from SQL 2000 to SQL 2005. To be honest, after seeing the 19000+ corruptions they had in the database, I'm surprised that the database upgraded successfully. But it did. Then he tried to rebuild some indexes, which failed with corruption errors. Which brought us to this morning, when DBCC CHECKDB was run. It reported a lot of corruption, but running from SSMS only showed the first 1000 errors (which was the inspiration for one of my previous posts today How to get all the corruption messages from CHECKDB. So they started to run REPAIR_ALLOW_DATA_LOSS, which will cause data loss - we chose the name of the repair option well. I got involved at this point, and saw from the list of errors that massive data loss would be the result.

Once they had re-run DBCC CHECKDB though the command line (luckily only 20 minutes) I could see all the errors. And I must say I was astounded. It was the *worst* case of multiply-allocated extents (see Inside the Storage Engine: Anatomy of an extent) that I've ever seen. Not only were the extents allocated by multiple IAM pages (see Inside the Storage Engine: IAM pages, IAM chains, and allocation units) - in other words, two objects thought they had the same pages in the database allocated, but the two objects had actually each proceeded to overwite the other's updates in the pages - getting the two clustered indexes hopelessly interlinked. Lots of errors (1000s) like:

Msg 8904, Level 16, State 1, Server NAMESCHANGEDTOPROTECTTHEINNOCENT, Line 1
Extent (1:9528) in database ID 5 is allocated by more than one
allocation object.
...
Msg 8978, Level 16, State 1, Server NAMESCHANGEDTOPROTECTTHEINNOCENT, Line 1
Table error: Object ID 1445580188, index ID 1, partition ID
376212519911424, alloc unit ID 94737543200768 (type In-row data). Page
(1:52696) is missing a reference from previous page (1:427112).
Possible chain linkage problem.
...
Msg 8977, Level 16, State 1, Server NAMESCHANGEDTOPROTECTTHEINNOCENT, Line 1
Table error: Object ID 1445580188, index ID 1, partition ID
376212519911424, alloc unit ID 94737543200768 (type In-row data).
Parent node for page (1:143210) was not encountered.
...
Msg 2533, Level 16, State 1, Server NAMESCHANGEDTOPROTECTTHEINNOCENT, Line 1
Table error: page (1:405139) allocated to object ID 1445580188, index
ID 1, partition ID 376212519911424, alloc unit ID 94737543200768 (type
In-row data) was not seen. The page may be invalid or may have an
incorrect alloc unit ID in its header.
...
CHECKDB found 1653 allocation errors and 17646 consistency errors in
database 'NAMESCHANGEDTOPROTECTTHEINNOCENT'.

The repair for all of this? Delete it all and fix up the links. MASSIVE data loss.

Lessons to learn from this:

  • Don't give people with no clue about SQL Server access to SQL Server. 
  • Don't delete a transaction log to reclaim space. Cardinal sin.
  • Don't rebuild a transaction log UNLESS you run a full DBCC CHECKDB afterwards and satify yourself that you know the extent of the damage.
  • Don't upgrade a database without running DBCC CHECKDB first. Best case - the upgrade fails. Worst case - it upgrades and then you might not be able to fix the corruptions.
  • Don't just run repair without doing a little analysis on the errors reported, especially with a *huge* number of errors reported.

Finally, I'm really grateful to the DBA in question for letting me help him out with this, and to post this blog post - we all learn from our own and others' mistakes.

PS And I got involved in this from Twitter - I just *love* it. After initially being skeptical of how much time I'd spend on it, I'm finding the benefits of connecting to the SQL community in 'real-time' vastly outweigh the time I'm putting into it. Follow me on http://twitter.com/paulrandal and you'll see a bunch of other SQL MVPs on there too.

Page splits are always thought of as expensive, but just how bad are they? In this post I want to create an example to show how much more transaction log is created when a page in an index has to split. I'm going to use the sys.dm_tran_database_transactions DMV to show how much more transaction log is generated when a page has to split. You can find the list of columns and a small amount of explanation of each column in Books Online here - I was reminded of its existence by someone on Twitter (sorry, don't remember who it was and I couldn't find it in search).

In the example, I'm going to create a table with approximately 1000-byte long rows:

CREATE DATABASE PageSplitTest;
GO
USE pagesplittest;
GO

CREATE TABLE BigRows (c1 INT, c2 CHAR (1000));
CREATE CLUSTERED INDEX BigRows_CL ON BigRows (c1);
GO

INSERT INTO BigRows VALUES (1, 'a');
INSERT INTO BigRows VALUES (2, 'a');
INSERT INTO BigRows VALUES (3, 'a');
INSERT INTO BigRows VALUES (4, 'a');
INSERT INTO BigRows VALUES (6, 'a');
INSERT INTO BigRows VALUES (7, 'a');
GO

I've engineered the case where the clustered index data page has space for one more row, and I've left a 'gap' at c1=5. Let's add it as part of an explicit transaction and see how much transaction log is generated:

BEGIN TRAN
INSERT INTO BigRows VALUES (8, 'a');
GO

SELECT [database_transaction_log_bytes_used] FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID ('PageSplitTest');
GO

database_transaction_log_bytes_used
-----------------------------------
1228

That's about what I'd expect for that row. Now what about when I cause a page split by inserting the 'missing' c1=5 row into the full page?

-- commit previous transaction
COMMIT TRAN
GO

BEGIN TRAN
INSERT INTO BigRows VALUES (5, 'a');
GO

SELECT [database_transaction_log_bytes_used] FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID ('PageSplitTest');
GO

database_transaction_log_bytes_used
-----------------------------------
6724

Wow. 5.5x more bytes are written to the transaction log as part of the system transaction that does the split.

The ratio gets worse as the row size gets smaller. For a row with an approximately 100-byte long row (use the same code as above, but change to a CHAR (100), insert 67 rows with a 'gap' somewhere then insert the 68th to cause the split), the two numbers are 328 and 5924 - the split cause 18 times more log to be generated! For a row with an approximately 10-byte long row, I got numbers of 240 and 10436, because I created skewed data (about 256 rows with the key value 8) and then inserted key value 5 which forced a (rare) non-middle page split. That's a ratio of more than 43 times more log generated! You can try this yourself if you want: I changed the code to have a CHAR (10), inserted values 1, 2, 3, 4, 6, 7, then inserted 256 key values of 8 and then 2 of 5. The resulting page had only 6 rows - it split after the key value 5 - the Storage Engine doesn't always do a 50/50 page split. And that's not even causing nasty cascading page-splits, or splits that have to split a page multiple times to fit a new (variable-sized) row in.

Bottom line: page splits don't just cause extra IOs and index fragmentation, they generate a *lot* more transaction log. And all that log has to be (potentially) backed up, log shipped, mirrored....

A short post to start the day (in India) and then at lunchtime I'll do how it works: FILESTREAM garbage collection.

This is a question that came up recently on an MVP discussion list: why does a 200MB log backup take 40 minutes to restore on my log-shipping secondary?

First answer I thought of was that the I/O subsystem on the log-shipping secondary was very slow, or broken in some way, so I/Os were taking ages to complete. Nope - I/O subsystem was performing well.

Next thing I suggested was the the log-shipping secondary was restoring the log backups using WITH STANDBY and there was something like an index rebuild that was being rolled back. As an aside, when you restore log backups on the log-shipping secondary, you have a choice how they recovery: either WITH NORECOVERY or WITH STANDBY. The first option doesn't allow any access to the database, as the database is still "in recovery". The second option runs the REDO part of recovery, then runs the UNDO part of recovery, but saves the details of what it had to do for UNDO into a file (who's name and location you specify). It then allows read-only access to the database, for queries/reporting/whatever. Obviously if there's a lot of transaction log that has to be undone (i.e. rolling back transactions that weren't committed at the time the log backup completed), then this could take some time. But 40 minutes? No. That wasn't it.

I was partly right on my second guess. The *previous* log backup that was restored WITH STANDBY contained a long-running index operation, and so the undo file that the restore created was *huge*. The next log backup that's restored after a RESTORE ... WITH STANDBY, must first undo everything in the undo file (i.e. put the database back to the exact state it was in as if the WITH STANDBY part of the restore never happened) before it can restore more transaction log. In this case, no-one had noticed that the previous log restore *also* took a lot of time and created the huge undo file. It just looked like the 200MB log backup was causing the problem.

My advice: in any kind of there's-suddenly-a-corruption-or-performance-problem situation, don't just look at what's happening now. Always look at what happened leading up to the problem, as the answer is often there.

Every so often (well, ok, very frequently) my internal list of 'I must blog about that' gets too big and I have an almost uncontrollable urge. Luckily, these urges end up with me blogging several times, otherwise who knows what kind of trouble I'd get into? Smile

First up today is some pretty deep internals about how the transaction log works in a specific situation. This has come up a few times in the last few internals/maintenance classes I've taught in the transaction log module so I wanted to get a blog post out there that proves my answer is correct. The question is easier to frame with a picture, borrowed from my TechNet Magazine article from February (see here).

  

The image shows the circular natue of the transaction log, and how VLFs can be overwritten once the log has wrapped around, as long as they are no longer active (red = active log/VLF, blue = inactive log/VLF). If none of this makes sense to you, go read the TechNet Magazine article and it should do.

The question is: what happens if log records continue being written to VLFs 1 and 2 in the picture, and eventually the end of VLF 2 has been reached, but VLF 3 is still active, and can't be overwritten? Does the database just stop?

Let's find out.

I'm going to create a database and then engineer the situation above.

CREATE DATABASE LogTestDB ON PRIMARY (
    NAME = LogTestDB_data,
    FILENAME = N'C:\SQLskills\LogTestDB_data.mdf')
LOG ON (
    NAME = 'LogTestDB_log',
    FILENAME = N'C:\SQLskills\LogTestDB_log.ldf',
    SIZE = 5MB);
GO

The database is in the SIMPLE recovery model. Well, to be completely honest, it's in the FULL recovery model, but there hasn't been a full database backup yet, so as far as transaction log behavior is concerned, it behaves as if it was in SIMPLE (the log can be cleared by a CHECKPOINT) - again, if this makes no sense, go read the TechNet Magazine article. 

And then see how many VLFs it has using the DBCC LOGINFO command (and this is the *only* way to see how many VLFs there are):

DBCC LOGINFO ('LogTestDB');
GO

FileId   FileSize   StartOffset   FSeqNo   Status   Parity CreateLSN
-------- ---------- ------------- -------- -------- ------ -----------
2        1245184    8192          31       2        64     0
2        1245184    1253376       0        0        0      0
2        1245184    2498560       0        0        0      0
2        1499136    3743744       0        0        0      0

We've got 4 VLFs. The Status column tells us whether the VLF is active or not. A status of 2 means the VLF is active, 0 means it's not. The sequence number (FSeqNo) is the logical order of the VLFs within the log. The FileSize is in bytes (so each VLF is about 1.25 MB). Right now there's only one active VLF.

Now I'll engineer the situation in the image above. I'm going to fill the log so that VLFs 1, 2, and 3 are active. Then I'm going to start an explicit transaction that will hold VLF 3 and onwards active. Then I'll continue filling the log so it wraps around and starts to fill up VLF 1 again.

USE LogTestDB;
GO
CREATE TABLE BigRows (c1 INT IDENTITY, c2 CHAR (8000) DEFAULT 'a');
GO

SET NOCOUNT ON;
INSERT INTO BigRows DEFAULT VALUES;
GO 300

I've filled up VLFs 1 and 2, and started on 3. Let's check with DBCC LOGINFO:

DBCC LOGINFO ('LogTestDB');
GO

FileId   FileSize   StartOffset   FSeqNo   Status   Parity CreateLSN
-------- ---------- ------------- -------- -------- ------ -----------
2        1245184    8192          31       2        64     0
2        1245184    1253376       32       2        64     0
2        1245184    2498560       33       2        64     0
2        1499136    3743744       0        0        0      0

As you can see, the first 3 VLFs now have a status of 2 (active), and they're in sequence. Now I'll create an explicit transaction that will prevent VLF 3 and onwards being cleared.

BEGIN TRAN
INSERT INTO BigRows DEFAULT VALUES;
GO

If I explicitly do a CHECKPOINT now, VLFs 1 and 2 will clear:

CHECKPOINT
GO

DBCC LOGINFO ('LogTestDB');
GO

FileId   FileSize   StartOffset   FSeqNo   Status   Parity CreateLSN
-------- ---------- ------------- -------- -------- ------ ----------
2        1245184    8192          31       0        64     0
2        1245184    1253376       32       0        64     0
2        1245184    2498560       33       2        64     0
2        1499136    3743744       0        0        0      0

Now I'll carry on filling up the BigRows table so the log wraps around and starts filling up VLFs 1 and two again.

INSERT INTO BigRows DEFAULT VALUES;
GO 300

DBCC LOGINFO ('LogTestDB');
GO

FileId   FileSize   StartOffset   FSeqNo   Status   Parity CreateLSN
-------- ---------- ------------- -------- -------- ------ ----------
2        1245184    8192          35       2        128    0
2        1245184    1253376       32       0        64     0
2        1245184    2498560       33       2        64     0
2        1499136    3743744       34       2        64     0

You can see that the log has wrapped around now, but VLFs 3 and 4 are still active. Look at the sequence numbers of the active VLFs... the active log is VLF 3 then 4 then 1, with sequence numbers 33 then 34 then 35. Now if I carry on filling up the table, what's going to happen when the log bumps up against VLF 3 that is still active? Will it stop or will it grow?

INSERT INTO BigRows DEFAULT VALUES;
GO 300

DBCC LOGINFO ('LogTestDB');
GO

FileId   FileSize   StartOffset   FSeqNo   Status   Parity CreateLSN
-------- ---------- ------------- -------- -------- ------ ------------------
2        1245184    8192          35       2        128    0
2        1245184    1253376       36       2        128    0
2        1245184    2498560       33       2        64     0
2        1499136    3743744       34       2        64     0
2        253952     5242880       37       2        64     36000000049300052
2        270336     5496832       38       2        64     36000000049300052
2        253952     5767168       0        0        0      36000000107500066
2        335872     6021120       0        0        0      36000000107500066
2        253952     6356992       0        0        0      36000000190700020
2        401408     6610944       0        0        0      36000000190700020
2        253952     7012352       0        0        0      37000000037300040
2        466944     7266304       0        0        0      37000000037300040

The answer is that it grew, and kind of skipped the active VLFs! Look at the sequence numbers. The new sequence of the active log is VLF 3 then 4 then 1 then 2 then 5 then 6, as you can see from the sequence numbers. Once the active transaction I created commits or rolls back, VLFs 3, 4, 1, and 2 can clear and then the 'normal' sequencing of VLFs in the log will resume.

Every VLF has a small header which contains the sequence number of the VLF within the transaction log, so the log can kind of do contortions to work around active VLFs in the middle of the log. Very cool.

Ok - that was fun - now I feel better!

PS In the last couple of DBCC LOGINFO dumps, where the log has wrapped around, you can see that the parity bits for the log blocks in the VLFs have changed, as I explained in one of my previous posts Search Engine Q&A #24: Why can't the transaction log use instant initialization?.

Last week's survey was on how *you* manage the size of your transaction log (see here for the survey). Here are the results as of 4/10/2009 - the most popular survey yet:

In my opinion (guarantee that I'm going to say stuff that some of you won't agree with!), there are only two valid answers in this survey: #1 - run in the FULL recovery model and take regular log backups, and #4 - run in the SIMPLE recovery model all the time. The last answer is applicable if you run out of log space even though you're in either of these situations but isn't a general strategy like #1 or #4. IMHO, you should be in one of these two situations and in the rest of this editorial I'll explain why. I'm not going to touch on *why* your transaction log might start to fill up, instead here are some links:

Now for the survey options:

  1. Take regular log backups. I'm very pleased to see the vast majority of respondents doing this, as it's the only valid log size management strategy when in the FULL recovery model (same thing applies to BULK_LOGGED of course, but with a few twists). Once you take that first database backup in the FULL recovery model, SQL Server assumes you're going to manage the transaction log size from that point on by taking log backups. Unfortunately that isn't documented in big, red flashing letters in Books Online - so people can get themselves into troubel inadvertently. Also, the FULL recovery model is the default, and is required for database mirroring - which further adds to the potential for people (such as involuntary DBAs) to accidentally switch into I-will-take-log-backups mode and then not take them. If you don't want to take log backups for recovery purposes, or you don't want to use database mirroring, don't use the FULL recovery model - it's as SIMPLE as that (ha ha). You might argue and say that you're only using FULL because of database mirroring, and don't want to take log backups. I'd argue back and say that if you care enough to have hot standby of your database, you must also take backups - as you can't rely solely on a redundant copy of your database on a different server.
  2. Use BACKUP LOG WITH NO_LOG/TRUNCATE_ONLY regularly. These two commands do basically the same thing - allow the log to be cleared without taking a log backup. What's the point if you're not taking log backups? - just switch to SIMPLE and let the checkpoints clear the log. In fact, in 2008 these two commands have been removed. See my blog post BACKUP LOG WITH NO_LOG - use, abuse, and undocumented trace flags to stop it.
  3. Use BACKUP LOG WITH NO_LOG/TRUNCATE_ONLY when it fills up. Same as above. You might argue that you're only keeping the log around in case there's a disaster, so that you can take a log backup at that point and use it to recover up to the point of the disaster. I'd argue that's broken on two counts: 1) what if the log file is damaged and you can't back it up? 2) that's *all* the transaction log since the last full database backup you took (if you break the log backup chain and then take a full database backup, that backup becomes the base of subsequent log backups) so that may take a long time to restore and replay...
  4. Run in the SIMPLE recovery model all the time. If you don't need to use FULL, don't. Running in SIMPLE is perfectly acceptable, as long as you don't mind losing work in the event of a disaster.
  5. Switch to SIMPLE when it fills up, then back to FULL. This is like #s 2 and 3 - what's the point?
  6. Switch to SIMPLE when it fills up, shrink the log, then switch back to FULL. This is worse than 2, 3, or 5. If you shrink the log, then it's going to grow again - possibly causing VLF fragmentation (see Transaction Log VLFs - too many or too few?), and definitely causing your workload to pause while the log grows, as the log can't use instant initialization (see Search Engine Q&A #24: Why can't the transaction log use instant initialization?).
  7. Shutdown SQL Server and delete the transaction log file(s). Just don't get me started on this one - I'm glad no-one 'fessed up to doing it. There are many reasons why this is daft, including: 1) you have to shutdown to do it, so your workload is off-line 2) if the database wasn't cleanly shut down, it won't be able to be started again without using EMERGENCY mode repair, and your data will be transactionally inconsistent 3) as the log can't be instant initialized, the database won't come online until the log has been created and zero'd. Just don't do this. Ever.
  8. Add more transaction log files. As a general strategy for managing the size of the log, this is not good. Extra log files have zero effect on performance (the myth that SQL Server writes in parallel to the log files is just that - a myth) and make management more tricky. However, if your log fills up for some out-of-the-ordinary reason, then adding another log file may be the only way to allow the database to keep running. If you have to do this though, make sure that you remove the extra log file(s) when you can to keep log management uncomplicated.

To summarize, if you want to be able to take log backups to aid in point-in-time or up-to-the-second recovery, use the FULL recovery model. If not, use SIMPLE and you won't need to mess around with the log when it fills up because you're not taking log backups.

Next post - this week's survey! (And thanks to all those who are responding to them!)

PS For those of you who sent me details about your databases from the survey back at the start of March (see here) - I haven't forgotten. I was waiting to get a decent sample size and now I'm going to go through the data. If you want to send me any more data, you've got until Sunday.

In this week's survey, I'm interested in how *you* manage the size of your transaction log. I'll report on the results around 4/10/2009.

Thanks!

(No comments please... wait till the results post next week) 

Way back at the start of me blogging here I wrote a comprehensive description of ghost records and the ghost cleanup process - see Inside the Storage Engine: Ghost cleanup in depth. A question came up in the class I'm teaching this week that's worth answering in a blog post - do ghost records occur in heaps? The answer is no, not during normal processing.

When snapshot isolation is enabled, deletes from a heap are ghosted, as part of the overall versioning process, which can lead to some interesting side-effects. A versioned record has an extra 14-bytes tagged on the end, so a heap record that suddenly becomes versioned is 14-bytes longer - which may mean it doesn't fit on the page any longer. This could lead to it being moved, resulting in a forwarding/forwarded record pair - just because the record was deleted! Now, the page has to be full for this time happen, and the Storage Engine will take steps to avoid this happening for rows less than 32 bytes long - but that's getting a little too deep.

Anyway, I digress. I want to show you the difference between deleting from a clustered index and from a heap. I'm going to create two such tables, then delete row from each and roll it back.

CREATE TABLE t1 (c1 CHAR (10));
CREATE CLUSTERED INDEX t1c1 on t1 (c1);
GO

CREATE TABLE t2 (c1 CHAR (10));
GO

INSERT INTO t1 VALUES ('PAUL');
INSERT INTO t1 VALUES ('KIMBERLY');

INSERT INTO t2 VALUES ('PAUL');
INSERT INTO t2 VALUES ('KIMBERLY');
GO

-- prevent random background transactions
ALTER DATABASE GhostTest SET AUTO_CREATE_STATISTICS OFF;
GO

BEGIN TRAN DelFromClust;
DELETE FROM t1 WHERE c1='KIMBERLY';
ROLLBACK TRAN;
GO

BEGIN TRAN DelFromHeap;
DELETE FROM t2 WHERE c1='KIMBERLY';
ROLLBACK TRAN;
GO

SELECT * FROM ::fn_dblog (null, null);
GO

Here's a portion of the results from looking in the transaction log. The line of code where I turn off auto-update stats is just to prevent the auto-create transactions from cluttering up my view of the transaction log.

The first (highlighted) transaction is for the delete/rollback in the clustered index. You can clearly see that the third column shows a log context of ghosting for the LOP_DELETE_ROWS log record, plus the setting of the 'this page has at least one ghost record' in the PFS byte for that page.

The second (unhighlighted) transaction is for the delete/rollback in the heap. Here you can see that it just does a straight delete.

If you look at the data page contents before the rollback in both cases, for the clustered index you'll still be able to see the deleted (ghosted) record, and for the heap you'll see the deleted record really is deleted.

Hope this helps.

This is an interesting case that cropped up yesterday - the transaction log is damaged so a log backup doesn't work (with the error below):

Backup detected log corruption in database FakeDBName. Context is FirstSector. LogFile: 2 'F:\SQLLOGS\XYZ\FakeDBName_Log.ldf' VLF SeqNo: x502e VLFBase: x2ce40000 LogBlockOffset: x2d0a9000 SectorStatus: 2 LogBlock.StartLsn.SeqNo: x4ee3 LogBlock.StartLsn.Blk: x1348 Size: x200 PrevSize: x400
2009-03-06 10:00:02.61 Backup      Error: 3041, Severity: 16, State: 1.
2009-03-06 10:00:02.61 Backup      BACKUP failed to complete the command BACKUP LOG FakeDBName Check the backup application log for detailed messages.
2009-03-06 10:00:03.61 Backup      Error: 3041, Severity: 16, State: 1.

However a full backup succeeds, as does a DBCC CHECKDB. What's going on?

The answer comes with understanding what portions of the transaction log are required for these operations. (For a good understanding of the transaction log itself, along with logging and recovery, see my article in the February TechNet Magazine.)

A transaction log backup, by its very nature, has to backup *all* transaction log generated since the last log backup - so it will try to backup the corrupt portion and fail.

A full database backup only has to backup enough transaction log to allow the database to be restored and recovered to a transactionally consistent point. In other words, it only requires the transaction log back to the beginning of the oldest active transaction at the point that the data-reading section of the full backup completes. This is a source of immense confusion - many people don't believe that a full (or differential) backup needs to also backup some transaction log. For a more in-depth study of this, see my previous blog posts Debunking a couple of myths around full database backups and More on how much transaction log a full backup includes.

A DBCC CHECKDB operation uses a database snapshot to get a transactionally consistent view of the database on which to run consistency checks. When the database snapshot is created, crash recovery is run on it to make it transactionally consistent. That requires the same amount of log as if a full backup was taken - back to the beginning of the oldest active transaction at the time the database snapshot is created. See CHECKDB From Every Angle: Complete description of all CHECKDB stages for more info.

So - it's entirely possible for the situation reported above to exist. The question then becomes, how to recover from it?

Assuming that the database files are intact, there is a simple solution. This solution will break the log backup chain, but given that the log is corrupt so a log backup cannot be taken, the log backup chain is *already* broken. Here's what to do:

  1. Stop all user activity in the database
  2. Switch to the SIMPLE recovery model (breaking the log backup chain and removing the requirement that the damaged portion of log must be backed up)
  3. Switch to the FULL recovery model
  4. Take a full database backup (thus starting a new log backup chain)
  5. Start taking log backups

You might want to manually shrink and grow the log file in between steps 2 and 3 too - in case the log file is on a damaged portion of disk - or maybe even shrink it right down and add another log file on an undamaged disk. You also will need to do some root-cause analysis to determine why the corruption occured in the first place, and to take preventative measures to stop it happening again.

Hope this helps

PS In my previous post, Testing a new survey method: backup validation, the answer with the largest number of responses so far is that people never verify their backups - very disturbing!

Back in 2005 Kimberly produced two very popular webcast series - an 11-part webcast series for TechNet called SQL Server 2005 for the IT Professional and a 10-part webcast series for MSDN called A Primer for Proper SQL Server Development. The webcast links and blog posts were broken for quite a while but now they're all fixed up and working again. I've created some web pages that link to all the webcasts and blog posts, along with abstracts. I've also included some more recent ones too and will be adding to the list over the next few weeks.

There's over 30 hours of good stuff to watch - check them out at http://www.sqlskills.com/webcasts.asp

One of the things I mentioned in my recent TechNet Magazine article on Understanding Logging and Recovery was the need to manage the transaction log so the number of VLFs (virtual log files) does not get too large (with too large being more than, say, 100 VLFs). Linchi Shea (a fellow MVP) has just posted some performance numbers comparing insert/update/delete performance between a database with 16 VLFs and one with 20000 VLFs, representing a poorly managed transaction log. You can see his blog post here.

For those wishing to take charge of unruly transaction logs, checkout Kimberly's blog posts - starting with this one. And if you want to quickly know how many VLFs your transaction log has, use the undocumented DBCC LOGINFO command - the number of lines of output is the number of VLFs you have.

Next posts coming up will be photos - we're in Bangkok right now and I've got 3 Where In The World Are Paul and Kimberly posts queued up - St. Lucia, Hyderabad, and Bangkok.

OK - last content post today. I forgot that the February TechNet Magazine also has the latest edition of my regular SQL Q&A column. This month's column covers:

  • Should backup compression be enabled at the instance level?
  • Client redirection during database mirroring failovers
  • Partition-level lock escalation in SQL Server 2008
  • Is it ever safe to rebuild a transaction log?

Check out the column at http://technet.microsoft.com/en-us/magazine/2009.02.sqlqa.aspx

Wow - today is all about new content. As if I haven't already blogged about enough stuff to keep you reading through next week, the February issue of TechNet Magazine is now available and contains a feature article I wrote about understanding how logging and recovery work inside SQL Server.

The article covers:

  • What is logging?
  • What is recovery?
  • The transaction log (include logical and physical architecture)
  • Recovery models and how they affect the behavior of the transaction log

There's also a ten-minute screencast video where I demonstrate a runaway transaction log.

Check it out at http://technet.microsoft.com/en-us/magazine/2009.02.logging.aspx.

Here's a question I got from someone who attended our database maintenance workshop at PASS last week (paraphrased):

I attended your pre-conference session on database maintenance and found it to be very informative.  From what you told use though, I think I need to change my nightly backup procedure.  I like to get my databases back to as small of a size as possible before backing them up, so I run the following commands to do this before taking the full database backup.  Could you help me with a better way of doing this? We're on SQL Server 2005.

BACKUP LOG <mydbname> WITH NO_LOG

DBCC SHRINKDATABASE (<mydbname>)

And here's the answer I sent back:

How large is the database? And how long must you keep the backups around? If the cumulative size of the backups takes up a large proportion of your available storage space (and we're talking more than just a single direct-attached 100+GB drive), then it may be worth compressing the backups - otherwise you're likely causing yourself more trouble than its worth.
 
By doing BACKUP LOG WITH NO_LOG you're effectively throwing away log records and removing the possibility of doing any kind of point-in-time, or up-to-the-second recovery (see BACKUP LOG WITH NO_LOG - use, abuse, and undocumented trace flags to stop it). If you're running in the FULL recovery model, and you don't care about either of these features, then you should switch to the SIMPLE recovery model. If you really want to be in FULL, don't ever use WITH NO_LOG.

The amount of transaction log that a full backup requires cannot be changed by you truncating the log. The full backup will backup any log it requires to enable the restored database to be a transctionally consistent copy of the database. See Debunking a couple of myths around full database backups and More on how much transaction log a full backup includes.

Doing a DBCC SHRINKDATABASE (the same exact operation as a database auto-shrink) will cause massive index fragmentation, and cause file-system fragmentation of the data files, as they will likely need to grow again after you've squeezed all the space out of them. See Auto-shrink - turn it OFF! for more details on the effects.

If you're really concerned about backup sizes and space is at a premium, I recommend using a 3rd-party backup compression tool such as LiteSpeed or HyperBac so you're not affecting the actual database. Remember also that SQL Server 2008 has native backup compression too - see my blog post here for more details.

Hope this helps

Today's post is based on a bunch of questions I've had around the various compression features in SQL Server 2008.

Does turning on data compression or backup compression compress the transaction log files on disk?

No. The transaction log is not compressed in any way under any circumstances. Rows from tables and indexes that have compression enabled will be logged in their row compressed forms, even if page compression is enabled.

Does data compression compress LOB columns?

No. There is no native compression support for any LOB columns (n/text, image, n/varchar(max), varbinary(max), XML), whether stored in-row or out-of-row. There's also no native compression support for FILESTREAM data.

Does log shipping use compression to compress the logs being shipped?

Log shipping does not ship transaction logs - it ships log *backups*. If backup compression is enabled for the instance hosting the log shipping primary database, or the log shipping job is changed to enable backup compression, then the log backups will be compressed and less data will be sent over the wire to the log shipping secondary(s).

Is backup compression the same as log stream compression with database mirroring?

No. Backup compression compresses backups (see my previous blog post here). Log stream compression with database mirroring compresses transaction log records before sending them between the principal and the mirror (see my previous blog post here).

Should I just turn on backup compression at the instance level?

Not necessarily. It depends whether the majority of database on the instance will benefit from backup compression. Backup compression (and any compression algorithm) uses CPU whether a decent compression ratio is achieved or not. Check what compression ratio is achieved first and then enable backup compression if its worth it. Otherwise, just enable it for individual databases.

Does data compression use the same algorithm as backup compression?

No. Backup compression uses a proprietary block-based compression algorithm that is part of Windows. Data compression uses up to 3 algorithms, depending on the level of compression configured. Row compression just makes all non-LOB columns into variable-length columns. Page compression does row compression, then common-prefix compression for each table columns, then common-value dictionary compression for each page. Details can be found at the following BOL sections: Row Compression Implementation  and Page Compression Implementation.

Hope this helps!

During early SQL Server 2008 development, the Storage Engine team introduced additional minimally-logged functionality. And then it was removed again for RTM. Now it seems that they've bowed under customer pressure and have made the functionality available again (see Sunil's announcement here).

Using trace flag 610 in the RTM build, you enable the potential for minimal-logging when:

  • Bulk loading into an empty clustered index, with no nonclustered indexes
  • Bulk loading into a non-empty heap, with no nonclustered indexes

Sunil's previous blog post here gives more info on the required syntax - very useful!

It's been almost two weeks since my last post as we've been offline in Scotland (another photo post to follow - I owe you two now...) but now we're back for the crazy Fall conference and teaching season.

The latest installment of my regular Q&A column in TechNet Magazine is available at http://technet.microsoft.com/en-us/magazine/cc895648.aspx. This month I cover the following topics:

  • How backups and restores work and why the times for each may differ
  • The difference between log shipping and database mirroring around BULK_LOGGED operations
  • How other factors apart from log backups can contribute to excessively large transaction logs
  • Why database repair exists and why it shouldn't be used
  • Tracking index usage with sys.dm_db_index_usage_stats

Enjoy!

Now this one's sure to spark some controversy...

I was checking my RSS feeds of the SQL blogs that I follow and noticed an interesting post on Kalen's blog (the post is here). She'd been lambasted for suggesting in the weekly SQL Server Magazine newsletter that *Microsoft* shouldn't provide a log reader tool and asked for comments. It's worth reading her original newsletter too, as it provides some interesting history of the development of log readers.

Donning my flame-proof suit, I wholeheartedly agree. I don't think Microsoft should spend engineering resources on a log reader tool that tries to show the SQL that generated the log records. So why shouldn't Microsoft build one? Well, IMHO, in a perfect world with no mistakes and perfect applications, there really isn't a need for a log reader tool. In an imperfect world, there is a need - but should Microsoft be the one to provide it? I think there are way more important tools and features that Microsoft should spend engineering resources on.

So why do people want a log reader tool, apart from curiosity? The three uses I see are change tracking, auditing, and the ability to rollback mistakes.

The first use, change tracking, is viable, and in fact the change data capture feature in 2008 is built on top of the transactional replication log reader Agent job (I'll post more on this, and I've just written an article on tracking changes in 2008 for the November issue of TechNet Magazine).

For auditing, how can a log reader tell whether the SQL statement was being run under a different security context, such as after an EXECUTE AS statement? How can it tell the difference between a single statement UPDATE with a multi-part WHERE clause, and multiple UPDATE statements of single rows? And on top of that, it needs to read through all the transaction log, causing contention on the log drive. In 2008 there's an in-built, synchronous auditing solution (SQL Server Audit), although it has issues with parameterized queries. In 2005, you could roll your own auditing by having all DML done through stored-procs that log what they did, for instance, or using DML triggers.

For the ability to rollback mistakes... don't get me started! Recovering from user mistakes is not a situation you want to be in - you can setup your system to avoid mistakes altogether. DDL triggers, DML triggers, schema separation, no direct access to tables for DML, and so on. Any of these are better to do than having to figure out what someone did so you can undo a mistake. Ok - so people make mistakes and you'd like to rollback one statement. How about using your backups? Oh, you don't have a good backup strategy. Well, that's where a log reader can help, if it works. But should Microsoft have to provide it?

Yes, I know the alternatives I mentioned above are sometimes easier said than done, especially with 3rd-party applications, but that's for the application writers to fix. Microsoft shouldn't have to provide a tool because of broken applications, or 3rd-party license agreements that preclude adding triggers, or DBAs that haven't implemented safe-guards. It already provides features that can stop mistakes happening, and allow auditing to happen. Why provide a log reader tool that allows other companies to then produce tools on top of that which do what SQL Server already provides out-of-the-box? And why provide a log reader at all when other companies do it already?

If anything, I'd like to see the existing tools be made to work in all circumstances. AFAIK there isn't a log reader tool on the market today that copes with absolutely everything 2005 can put into the log. Although the log internals are supposedly proprietary, there's an internals document that Microsoft licenses for free (at least when I was on the team until last summer) to companies wishing to build such tools, and they're not *secret*. There's plenty of info about the log internals on the web (some provided by me) and you can poke about to your heart's content using the undocumented tools (that's what DBCC LOG and fn_dblog are for).

However, what I'd *really* like to see is the need for a log reader tool to slowly die away as more devs and DBAs are educated and implement techniques for preventing the problems that log reader tools help to rectify (sometimes). One thing I haven't mentioned above is to have a log-shipping secondary with a load-delay - that way you have a redundant copy of the data that's always several hours behind your primary system. Or even using regular database snapshots.

Ok - that was a bit of a rant, and this is the same view when I worked for Microsoft too. To summarize, I don't think Microsoft should provide a log-reader tool. I see the need for them, when a system isn't setup to prevent mistakes happening, and there's no good backup or redundancy strategy, but I think that need can be filled by 3rd-party vendors.

Happy to hear well-thought-out arguments on this either way, either privately or as comments.

I woke up this morning and someone had replaced my wife with someone who likes to blog :-). Kimberly's turned over a new leaf and is going to blog much more often - in fact she's blogged 4 times today already. Check out her blog here.

Continuing on the transaction log theme of the last few Search Engine Q&A posts, this one addresses a question I've heard a few times, most recently on an MVP discussion group. Let me paraphrase:

If I have a transaction that inserts a huge amount of data, the transaction log grows to 50-GB. I then rollback the transaction. When I take the next log backup, it's way smaller than 50-GB. What's going on?

Let's see if we can repro the scenario. I've created a database with a 500-MB data file and a 1-MB log file, with 100-MB and 1-MB auto-growth intervals. I want the log to be as small as possible and to grow in small chunks so I can see just how much it *has* to grow by, rather than having a large growth size. Then I set the recovery mode to full and took a database backup to make sure the log won't truncate until it's backed up.

CREATE DATABASE LogSizeTest ON
   
(NAME = N'LogSizeTest',
   
FILENAME = N'C:\SQLskills\LogSizeTest.mdf',
   
SIZE = 512MB,
   
FILEGROWTH = 100MB)
LOG ON 
   
(NAME = N'LogSizeTest_log',
   
FILENAME = N'C:\SQLskills\LogSizeTest_log.ldf',
   SIZE = 1MB,
   
FILEGROWTH = 1MB);
GO

ALTER DATABASE LogSizeTest SET RECOVERY FULL;
GO

BACKUP DATABASE LogSizeTest TO DISK = 'C:\SQLskills\LogSizeTest.bak';
GO

Let's check the size of the log:

DBCC SQLPERF (LOGSPACE);
GO

Database Name  Log Size (MB) Log Space Used (%) Status
-------------- ------------- ------------------ -----------
LogSizeTest    0.9921875     36.66339           0

This gives back info for all databases, I've trimmed down the output just for the LogSizeTest database.

Now I'm going to create a table, start an explicit transaction and add about 500-MB of info to the table.

USE LogSizeTest;
GO
SET NOCOUNT ON;
GO
CREATE
TABLE Test (c1 INT IDENTITY, C2 CHAR (8000) DEFAULT (REPLICATE ('a', 8000)));
GO

BEGIN TRAN;
GO

DECLARE @count INT;
SELECT @count = 0;
WHILE (@count < 64000)
BEGIN
   
INSERT INTO Test DEFAULT VALUES;
   
SELECT @count = @count + 1;
END;
GO

Checking the log file size again gives:

DBCC SQLPERF (LOGSPACE);
GO

Database Name  Log Size (MB) Log Space Used (%) Status
-------------- ------------- ------------------ -----------
LogSizeTest    703.9922      99.98737           0

The log size has grown to about 700-MB, way more than the size of the data I was inserting, and it's completely full. Now let's rollback the transaction and check the log size again.

ROLLBACK TRAN;
GO

DBCC SQLPERF (LOGSPACE);
GO

Database Name  Log Size (MB) Log Space Used (%) Status
-------------- ------------- ------------------ -----------
LogSizeTest    703.9922      85.21268           0

The size of the log file is the same, but the percentage used has actually gone down! How can that happen? Let's take a backup and checkout it's size:

BACKUP LOG LogSizeTest TO DISK = 'C:\SQLskills\LogSizeTest_log.bck';
GO
RESTORE HEADERONLY FROM DISK = 'C:\SQLskills\LogSizeTest_log.bck';
GO

The BackupSize in the output from the RESTORE HEADERONLY is 631454208, which is 602.2-MB. Taking the numbers from the DBCC SQLPERF output above, 85.21268% of 703.9922-MB is 599.89-MB - so the backup is roughly the same size as the used transaction log. That's what I'd expect, but why is it smaller than the total size of the transaction log?

So what's going on? Why did the transaction log need to grow so much larger than it needed to, and why did the percentage used actually *drop* after the transaction rolled back?

The answer is in the way the transaction log works. Whenever a logged operation occurs in a transaction, there is some transaction log space reserved in case the transaction rolls back. The idea is that there's always enough space available in the transaction log for a transaction to roll back, without having to grow the transaction log and potentially have that fail. If a transaction could not roll back successfully because the log didn't have enough space, the database would become transactionally inconsistent, would be taken offline and the state changed to SUSPECT.

The behavior we saw was the Storage Engine reserving transaction log space for a potential roll back. When the roll back occured, the transaction log records necessary to undo the effects of the transaction (called compensation log records) are created and written to the log. The issue is that they usually don't take up as much space as the Storage Engine reserved, as it tends to be very conservative in its estimates of how much log space to reserve, to avoid the potential for SUSPECT databases. This explains the difference between the various sizes and percentages we saw above.

The Storage Engine code to do the reservations is quite interesting - I remember fixing a couple of bugs in it during SQL Server 2000 development in 1999 while I was getting to know the internals of the logging and recovery system before tackling some of the (since removed) log-reading code in DBCC CHECKDB in SQL Server 2000.

Anyway, there you have it. Log space reservation is the answer, and is also one of the reasons why it can be tricky to estimate how large a transaction log should be when a database is created.

This is a quickie in response to a blog comment from my previous post on instant initialization - How to tell if you have instant initialization enabled?. The comment was:

I must say, I love instant initialization. It's awesome. But I always wondered why it's not available for the log file. I assume there's a technical reason... but what is it? Does it depend on having the rest of the file be zeroed out? Doesn't it already know where it's start and stop points are anyways, since the log is circular?

I couldn't remember the exact answer so I discussed with Peter Byrne on the Storage Engine dev team and now I have the answer to share. There is a lot of metadata kicking around in the Storage Engine about the transaction log (mostly in the boot page - see my post Search Engine Q&A #20: Boot pages, and boot page corruption), including where to start reading the log during crash recovery. However, there's nothing that can be used after a crash occurs to determine where the active transaction log ends (i.e. where should crash recovery stop processing log records).

The way this is done is to have each log sector have parity bits stamped on it. When the log is first created, it is zero-initialized (with zero being an illegal log sector parity value). As the log is written, each sector has parity bits in it. When the end of the log is reached, and it wraps around to the start of the log file, the parity bits are flipped, so that overwritten log sectors have the opposite parity from when they were last written. When a crash occurs, log sectors are read and processed until a log sector with an out-of-sequence parity is found.

This entire process will not work if there's already random data in the space used by the log file - some of the random data could just look like a valid set of parity bits and cause the recovery system to try to process a log sector full of garbage, leading to a suspect database, at best.

So - it's not just a "there wasn't time" - there really is a good, architectural reason why instant initialization cannot be done with the transaction log.

There's been a spate of problems on the forums and in my inbox with people's transaction logs filling up. With the increase in "involuntary DBAs", there are more people being thrust into the role of database administration without being given the chance to learn the skills, tips, and tricks necessary. One of the most common problems I see is the tranaction log filling up and the database grinding to a halt, and the most common cause of this is the lack of log backups when in the FULL recovery mode.

The very first post in my Search Engine Q&A series last September dealt with this problem - and showed two examples of filling up a transaction log plus the command to see why a transaction log is full (examining the log_reuse_wait_desc column in master.sys.databases). See here for more details. One thing I don't go into too much detail about in that post is the way that transactional replication or database mirroring can cause the log to grow. Both of these technologies rely on reading the transaction log for an unbroken stream of changes to be sent to the secondary systems (but with slightly different mechanisms) - but both will cause the log to grow if there is transaction log that hasn't yet been read (usually because of a performance issue). The same thing applies to the new Change Data Capture technology in SQL Server 2008 - it uses the transactional replication log reader to harvest changes from the log. The addition of any of these technologies could cause your log to grow excessively.

So it's grown too big - now what? Let's deal with the case where you've taken a full database backup in the FULL recovery mode but haven't taken any log backups. There are a number of options:

  1. Take a log backup! This might be easier said than done if you're like the customer I once had who had a 10GB database with an (I kid you not) 987GB log file.
  2. Switch to the SIMPLE recovery model (either permanantly or temporarily). If you're not interested in point-in-time recovery, there's no need for you to be in the FULL recovery mode (unless you're using a feature that requires it, like database mirroring, in which case you have no choice but to start taking log backups)
  3. In SQL Server 2005 and before, you can manually throw away the log contents - see my blog post here. This is not advisable.
  4. Grow the log file. This really just addresses the symptom but not the cause - the log will fill up again eventually.

Note that options #1-3 will just remove the restriction that the log contents are still needed (called "clearing" or "truncating" the log). None of these will cause the log file to shrink. The ONLY way to change the size of the log file is to use DBCC SHRINKFILE. After doing one of #1-3, you must manually shrink the log file using DBCC SHRINKFILE. If you do #2, and then go back into the FULL recovery mode, you should immediately take a full database backup. After shrinking the log file, you should manually grow it again using ALTER DATABASE so that it doesn't suffer repeated auto-grows and lead to performance issues with the log (see Kimberly's post here).

This topic is really a two-hour lecture in the database maintenance class I teach, but at least here (and with the blog post links) I've given you a bit of insight into why it can happen and how to solve the problem.

Over the last few weeks I've seen (and helped correct) quite a few myths and misconceptions about index rebuild operations. There's enough now to make it worthwhile doing a blog post (and it's too hot here in Orlando for us to go sit by the pool so we're both sitting here blogging)...

Myth 1:  index rebuild pre-allocates the necessary space

This myth has two variations:

  1. The space for the new copy of the index is pre-allocated
  2. The space for the sort portion of the rebuild is pre-allocated

Neither of these are true. Index rebuild (whether online or offline, and at least as far back as 7.0) will create a new copy of the index before dropping the old copy. The pages and extents required to do this will always be allocated as needed, as with any other operation in SQL Server. The sort phase of an index rebuild, if required (in certain cases it is skipped in 2005), will adhere to the same allocation behavior.

Myth 2: indexes are rebuilt within a single file in a multi-file filegroup

This is a new one that I just heard yesterday - (paraphrasing) "In a two-file filegroup, an index in file 1 will be rebuilt into file 2. The next time it is rebuilt, it will be built in file 1. And so on".

This is untrue. Any time any allocations are done in a multi-file filegroup, the allocations are spread amongst all the files using the allocation system's proportional fill algorithm. In a nutshell, this says that space will be allocated more frequently from larger files with more free space than from smaller files with less free space. There is no concept in SQL Server of limiting allocations to a particular file in a multi-file filegroup.

Myth 3: non-clustered indexes are always rebuilt when a clustered index is rebuilt

This is untrue. The rules are a little complex here but can be summed up as follows:

  • In 2005+, rebuilding a unique or non-unique clustered index (without changing its definition) will NOT rebuild the non-clustered indexes
  • In 2000:
    • Rebuilding a non-unique clustered index WILL rebuild the non-clustered indexes
    • Rebuilding a unique clustered index will NOT rebuild the non-clustered indexes

The first few service packs of 2000 had bugs that changed the behavior of rebuilding unique clustered indexes back and forth - this is the source of much of the confusion around this myth.

For a much more detailed discussion of this, see my blog post from last Fall - Indexes From Every Angle: What happens to non-clustered indexes when the table structure is changed?.

Myth 4: BULK_LOGGED recovery mode decreases the size of the transaction log and log backups for an index rebuild

This myth is partly true.

Switching to the BULK_LOGGED recovery mode while doing an index rebuild operation WILL reduce the amount of transaction log generated, which is very useful for limiting the size of the transaction log file (note I say 'file', not 'files' - you only need one log file).

Switching to the BULK_LOGGED recovery mode while doing an index rebuild will NOT reduce the size of the transaction log BACKUP. Although the operation will be minimally-logged, the next transaction log backup will read all the transaction log since the last backup plus all the extents that were changed by the minimally-logged index rebuild. This will result in a log backup that's almost exactly the same size as for a fully-logged index rebuild. The ONLY time a log backup will contain data extents is when a minimally-logged operation has taken place since the last log backup - see here on MSDN for more info.

If you're considering using the BULK_LOGGED recovery mode, beware that you lose the ability to do point-in-time recovery to ANY point covered by a transaction log backup that contains even a single minimally-logged operation. Make sure that there's nothing else happening in the database that you may need to effectively roll-back with P.I.T. recovery. The operations you should perform if you're going to do this are:

  • In FULL recovery mode, take log backup immediately before switching to BULK_LOGGED
  • Switch to BULK_LOGGED and do the index rebuild
  • Switch back to FULL and immediately take a log backup

This limits the time period in which you can't do P.I.T. recovery.

Myth 5: online index rebuild doesn't take any locks

This myth is untrue. The 'online' in 'online index operations' is a bit of a misnomer.  Online index operations need to take two very short-term table locks. An S (Shared) table lock at the start of the operation to force all write plans that could touch the index to recompile, and a SCH-M (Schema-Modification - think of it as an Exclusive) table lock at the end of operation to force all read and write plans that could touch the index to recompile.

The most recent time this came up on the forums was someone noticing insert queries timing out after an online index rebuild operation had just started. The problem is that the  table lock that online index rebuild needs has to be entered into the grant queue in the lock manager until it can be acquired - and it will stay there until existing transactions that are holding conflicting locks either commit or roll-back. Any transaction that requires a conflicting lock AFTER the index rebuild lock has been queued but not acquired (and then released) will wait behind it in the lock grant queue. If the query timeout is reached before the transaction can get it's lock, it will timeout.

This is still much better than the table lock being held for the entire duration of the index rebuild operation. For more info, checkout this whitepaper on Online Index Operations in SQL Server 2005.

In a previous post (see here) I debunked a myth about how much transaction log a full backup would include. I had a question in the blog post comments that asked (paraphrasing):

The full backup has to include all the transaction log from the begin LSN of the oldest active transaction at the time the data read portion of the backup ends, until the LSN at which the data read portion ends. If that begin LSN is later in time than the LSN of the checkpoint that backup does initially, why does the full backup need to include all thr transaction log between the checkpoint and the begin LSN? What is it used for?

I replied in the comments with a quip that it would be easier to reply with a whiteboard and a timeline - so I got all enthusiastic and created a picture in Powerpoint to help explain better.

Consider the timeline in the picture above for a full backup (the red numbers match the list below):

  1. The backup operation take a checkpoint to force all dirty pages in the buffer pool to disk - both those containing changes from transactions that have committed and those containing changes from transactions that are still in-flight. The backup operation then starts reading the allocated pages in the database.
  2. The read operation reads page X
  3. Transaction A starts
  4. Transaction A makes a change to page X. The copy in the backup is now out-of-date. Note that the backup will not read page X again - it's already passed that point in the database.
  5. Transaction B starts. It won't complete before the data read operation completes so it's begin LSN is the oldest active transaction begin LSN.
  6. Transaction A commits. This commits the changes to page X.
  7. The backup data read operation completes and transaction log reading starts.

Now, the reason that the transaction log is read is so that the restore operation can recover the database so it is transactionally consistent as of the point in time when the read data operation completed.

If the transaction log was only included from the oldest active transaction begin LSN (point 5), then the copy of page X that was restored from the backup (read at point 2) would not be updated with the changes from transaction A (that happened at point 4). This means that it would not be transactionally consistent with the rest of the database as of the time the read data operation completed (point 7).

So, (ignoring replication) the minimum LSN of the transaction log that's included in the full backup is MIN (LSN of last checkpoint, LSN of oldest active transaction). This ensures that recovery can REDO log records to bring pages up-to-date and UNDO log records for transactions that had not committed.

Much easier to explain with aid of a picture than without! :-)

A couple more questions from the last couple of classes.

Q1) Why doesn't performing an index rebuild alter the fragmentation?

A1) Here are the possibilities - all of which I've seen happen:

  • There isn't an index - either DBCC DBREINDEX or ALTER INDEX ... REBUILD are being run on a table that only has a heap, and so the (extent) fragmentation of the heap isn't changing because there's no way to rebuild a heap (except by the heavily NOT recommended method of creating and then dropping a clustered index).
  • The index is too small. An index with only a handful of pages may not show any changes in fragmentation because all the pages are single, mixed pages (see my previous post on extent types for more info) and so rebuilding the index does nothing.
  • The workload and schema are such that by the time the rebuild has finished and the fragmentation calculation has been done again, the index is already getting fragmented again.
  • The Extent Scan Fragmentation result from DBCC SHOWCONTIG is being used to gauge fragmentation for an index stored in a filegroup with multiple files. The Extent Scan Fragmentation in DBCC SHOWCONTIG does not cope with multiple files (as is documented in Books Online) and so the value may even go UP in some cases!

Q2) What operations take advantage of minimal-logging when the recovery mode is BULK_LOGGED?

A2) The list is very small - 4 four classes of operations:

  • Index builds, rebuilds, or drop of a clustered index (NOT index defrags with DBCC INDEXDEFRAG or ALTER INDEX ... REORGANIZE - this is a common misconception).
  • Bulk load operations (i.e. BCP, INSERT ... SELECT * FROM OPENROWSET (BULK...), and BULK INSERT).
  • Insert or appends of LOB data (either using WRITETEXT/UPDATETEXT for TEXT/NTEXT/IMAGE data types, or UPDATE with a .WRITE clause).
  • SELECT INTO operations on permanent tables.

For these operations, only the allocations are logged in the transaction log. Any extents that are allocated and changed through a minimally-logged operation are marked in the ML bitmaps (one for every 4GB of each file) and then the next transaction log backup will also read all those extents and include them in the backup.

Quick post this morning before this gets knocked out of my head by the Disaster Recovery session we're doing today at SQL Connections. This came up yesterday for a few people both on 2000 and 2005 - the database is running in SIMPLE recovery mode but the log isn't getting cleared as it usually does with checkpoints. There are two possibilities here - one for 2000 and one for both 2000 and 2005.

Very large VLFs

If the transaction log is comprised of very large (multi-GB) virtual log files, then a checkpoint operation may not clear out all the active log from the VLF - and so the log may not clear as frequently as when its comprised of more, smaller VLFs. Kimberly's blog post yesterday touched on this plus a solution.

SQL 2000 checkpoint bug

This is one I hadn't heard of. There's a bug in 2000 SP4 that stops checkpoint working properly on some configurations and so the log file can grow until it runs out of space. A few people in yesterdays Database Maintenance class were hitting this. Searching through the Knowledge Base articles turned up KB 909369 which explains the issue and has a pointer to the hotfix.

Hope this helps some of you out.

Categories:
Transaction Log

I've been involved in a few conversations today that have highlighted some big misconceptions about how backups work. I'd like to use this blog post to debunk them. I checked everything I say here with my friend Steve Schmidt, the developer on the Storage Engine team responsible for BACKUP/RESTORE for the last ten years.

Myth 1: A full database backup only contains the transaction log from the start of the backup to the end of the backup

When you restore a full database backup, you get a transactionally consistent database. Consider the case where there's an active transaction that doesn't commit until after the backup completes. If the backup only contained the log that occured while the database was being backed up, how would it roll back the active transaction. It *has* to include enough transaction log to roll back the active transaction. The start LSN of the log included in a database backup is the minimum of:

  • LSN of the last checkpoint
  • LSN of the start of the oldest active transaction
  • LSN of the last replicated transaction

Let me prove it to you. I'm going to create a database, put it into FULL recovery mode, start a transaction, checkpoint, and then take a backup. The checkpoint ensures the page I've altered is flushed to disk.

CREATE DATABASE stopattest;

GO

ALTER DATABASE stopattest SET RECOVERY FULL;

GO

BACKUP DATABASE stopattest TO DISK = 'c:\sqlskills\stopattest.bck' WITH INIT;

GO

USE stopattest;

GO

CREATE TABLE t1 (c1 INT);

GO

BEGIN TRAN;

INSERT INTO t1 VALUES (1);

GO

Now in another connection I'll take another full database backup.

BACKUP DATABASE stopattest TO DISK = 'c:\sqlskills\stopattest.bck' WITH INIT;

GO

The msdb.dbo.backupmedia table will tell us the relevant LSNs in the backup (I added some spaces to delineate the prts of the LSN for clarity):

SELECT last_lsn, checkpoint_lsn, database_backup_lsn FROM msdb.dbo.backupset

WHERE database_name = 'stopattest';

GO

last_lsn              checkpoint_lsn        database_backup_lsn
--------------------- --------------------- ---------------------
21 0000000190 00001   21 0000000174 00037    21 0000000058 00037

[Edit: After swapping some email with Kalen Delaney, I realized that when I originally put this together I had more log records in the post and when I removed them I messed up the description of the (21:174:37) LSN - its now corrected below)

So you can see the checkpoint that begins the backup was at (21:174:37). The LSN of the first log record that the backup contains is (21:58:37), which is before the start of the backup. And the backup contains all the log from then until (21:190:1). Now let's look at the actual transaction log to see what these LSNs correspond to.

SELECT [Current LSN], Operation, [Transaction Name] FROM fn_dblog (null,null);

GO

Here's some selected output:

Current LSN              Operation        Transaction Name
------------------------ ---------------- ------------------
00000015:0000003a:0025   LOP_BEGIN_CKPT   NULL
                (this is the calculated minimum LSN the backup must contain (21:58:37) - which is (15:3a:25) in hex)
.
.
00000015:00000061:0001   LOP_BEGIN_XACT   user_transaction
                (here's my transaction starting - before the backup started but within the LSN range contained in the backup)
.
.
00000015:000000ab:0004   LOP_BEGIN_XACT   Backup:InvalidateDiffMaps
                (this is the backup clearing the differential bitmaps)
.
.
00000015:000000ae:0025  LOP_BEGIN_CKPT    NULL
                (this is the checkpoint that BACKUP does - matching the checkpoint LSN above)
.
.

So - this clearly shows that the backup contains more than just the log from the time the backup was running.

Myth 2: It's possible to do a STOPAT with only a full database backup

This myth is that its possible to use STOPAT with a full database backup to stop during the time the backup was being taken. The argument FOR this myth is that the backup contains the log for all the changes that happened while the backup was being taken, so it must be possible to stop at any point in time. Technically, that's correct, but in practice it's wrong - you cannot stop at a point while the backup was running, using only the database backup.

This one's more complicated to explain. Doing a STOPAT operation means getting the database to a state where operations later than the time or LSN specified in the STOPAT clause haven't affected the database yet. A database backup reads pages that may or may not have been changed while the backup was running. If they are changed, it could be at any point while the backup is running.

Consider the case where page X is changed at LSN (10:45:12), *just* before the backup completes and is read by the backup at the time equivalent to LSN (10:45:13). The backup will contain the changed page image, plus the log record for the change. What if I want to stop at a point while the backup was running but *before* the change to page X, say at LSN (10:44:00). The backup only contains the image of page X at LSN (10:45:12) - how can it be put back to the image at the time we want to stop at? The argument is that we have the log record for the change - can't SQL Server just undo it?

No. It won't even see it. STOPAT works by recovering the database up to the point that the STOPAT specified. If we ask to stop at LSN (10:44:00), then the log will only be read and recovered up to that point. However, because the database backup didn't read page X until LSN (10:45:13), it only has the image of it from when it was altered at (10:45:12). This clearly won't give a database image as of (10:44:00).

The only way to stop at a particular time/LSN, is to have images of *all* database pages from before that time/LSN (i.e. from the *previous* database backup) and then restore all the transaction logs up to and including the time/LSN to stop at.

Hopefully that makes sense.

There's been some discussion over on the SQL Server Central forums about database mirroring performance, and one question in particular on how to do index rebuilds on mirrored VLDBs.

Remember that in database mirroring, the database has to be in the full recovery mode, so all index rebuilds are fully logged. It all comes down to the amount of transaction log generated and whether this causes a problem.

  • In synchronous mirroring, the additional log being generated could overload the network link between the principal and mirror. This will slow down the hardening of regular transactions in the mirror's transaction log, and thus led to a decrease in transaction throughput on the principal.
  • In asynchronous mirroring, the additional log being generated could again overload the network link - but this time there's no requirement for transactions to harden in the mirror before they can commit on the principal, so instead the SEND queue on the principal gets really large. Now, in asynchronous mirroring the SEND queue represents the amount of work that will be lost if a failover occurs, as its all the transaction log that hasn't yet been sent to the mirror. So, in asynchronous mode, a large index rebuild operation could lead to increase exposure to data loss in the event of a failover.

In SS2008, the log stream compression I blogged about here should go a long way to alleviating this problem. There are three ways I can think of to reduce the amount of transaction log generated by and index maintenance plan in SS2005:

  1. Use a potentially less expensive (in terms of logging) solution for removing index fragmentation. Doing an index rebuild is guaranteed to generate an equivalent amount of transaction log to the size of the index being rebuilt, no matter how much fragmentation there is - because an index rebuild always rebuilds the entire index. The alternative is to do an index reorganize, either using my old DBCC INDEXDEFRAG or the new ALTER INDEX ... REORGANIZE. These will only generate transaction log when index pages are compacted and reorganized - so for less heavily fragmented indexes. There's no hard and fast rule here but I generally say where Logical Scan Fragmentation/Average Fragmentation in Percent from DBCC SHOWCONTIG/sys.dm_db_index_physical_stats, respectively, is less than 30%. You also need to consider page density too - but really this is a topic for a whole other post.
  2. Be very selective on which indexes you choose to rebuild/reorganize. A lot of people have a maintenance plan that does this for every index every week, without checking whether the index is even fragmented or whether removing fragmentation for an index improves workload performance. Again, this a whole other topic but there is an old whitepaper for SS2000 I helped with that describes some of this - Microsoft SQL Server 2000 Index Defragmentation Best Practices.
  3. Partition the tables/indexes so that the changing portion of the data is the only portion that's affected by index maintenance. If most of your data is read-only, there's no point in having it included in reindex/reorganize operations, right? Kimberly recently wrote a blog post about such an architecture here.

For more info on database mirroring performance considerations, checkout the whitepaper Database Mirroring: Best Practices and Performance Considerations. There's also a slide deck presentation based on this whitepaper that's been presented at various TechEds (I did it in China and Hong Kong last year and Kimberly did it in South Africa last year) - you can download it from the Hong Kong website here.

The bottom line is when mirroring is in the mix, you need to be more intentional with your database maintenance.

This has come up several times over the last few days, and is something that Kimberly and I discuss (and deplore) when we lecture - the use of BACKUP LOG WITH NO_LOG (or TRUNCATE_ONLY - they're synonymous) to allow log truncation.

How is it used?

The common use is when the transaction log grows to be inordinately large on a database that's in FULL (or BULK_LOGGED) recovery mode. This happens after a database backup has been taken, which switches the log into a mode where it won't truncate until its been backed up. In these circumstances, if you don't take a transaction log backup, the log will continue to grow until it runs out of disk space.

The alternative to taking a real log backup is to issue a BACKUP LOG dbname WITH NO_LOG command. Let's see this in action - note that I'm not advocating its use but I want to show you what it does. First off I'll create some transaction log after taking a full database backup:

USE nologtest;
GO

BACKUP DATABASE nologtest TO DISK = 'c:\sqlskills\nologtest.bck' WITH INIT;
GO

CREATE TABLE t1 (c1 INT, c2 VARCHAR (8000))
GO

SET NOCOUNT ON
GO

DECLARE @a INT;
SELECT @a = 1;
WHILE (@a < 10000)
BEGIN
   
INSERT INTO t1 VALUES (@a, REPLICATE ('a', 8000));
   
SELECT @a = @a + 1;
END;
GO

How large is the transaction log now?

SELECT name, size FROM sys.database_files;
GO

name            size
--------------- --------
nologtest       90264
nologtest_log   104128

A little bit larger than the data file (which is what I'd expect after the operation I just performed) and they're both around 100MB. I'll pretend that I haven't been paying attention to the size of the database and log and now I don't have any space to perform a backup. Can I just shrink the log?

DBCC SHRINKFILE (nologtest_log, 2);
GO

Cannot shrink log file 2 (nologtest_log) because all logical log files are in use.
DbId   FileId      CurrentSize MinimumSize UsedPages   EstimatedPages
------ ----------- ----------- ----------- ----------- --------------
11     2           102944      63          102944      56

No. What's stopping me (well DBCC SHRINKFILE just told me, but let's double-check)?

SELECT log_reuse_wait_desc FROM sys.databases WHERE NAME = 'nologtest';
GO

LOG_BACKUP

Ok - so I can't perform a backup so I'll use BACKUP LOG WITH NO_LOG.

BACKUP LOG nologtest WITH NO_LOG;
GO

SELECT log_reuse_wait_desc FROM sys.databases WHERE NAME = 'nologtest';
GO

NOTHING

Now it looks like I can do the shrink:

DBCC SHRINKFILE (nologtest_log, 2);
GO

DbId   FileId      CurrentSize MinimumSize UsedPages   EstimatedPages
------ ----------- ----------- ----------- ----------- --------------
11     2           256         63          256         56

Success!

Hold on, are you sure?

Why is it bad?

Do you realize what just happened? We discarded the contents of the transaction log since the last full/differential/log backup. That means that until the next full or differential backup completes, all work since the last backup completed will be lost if the database is destroyed. So if your business relies on not losing data/work in the event of a disaster - then what I just did is sacrilege!

The whole point of FULL (and BULK_LOGGED) recovery modes is to preserve the transaction log so that the database can be recovered right up to the point of a failure, or to do point-in-time restores. Using BACKUP LOG WITH NO_LOG negates all of that. (For completeness, note that you can't do a point-in-time restore to any time in a transaction log backup containing a bulk-logged transaction).

What are the alternatives?

If you don't want the FULL recovery mode behavior, then don't use FULL recovery mode - switch to SIMPLE mode. But be aware that you lose the ability to restore to any point in between your full or differential backups.

If you want the FULL recovery mode behavior, but don't want to run out of log space - then the answer is again simple (pun intended :-), take log backups! If your log is growing too quickly, take more frequent log backups! If you don't have enough disk space to store the log backups, talk to your management about the options - buying more disks and continuing to use the FULL recovery options, or being forced to move to SIMPLE recovery mode.

One thing to bear in mind - you actually have to monitor the size of your log to tell whether its growing. That's what gets people into trouble in the first place - a combination of:

  • Not knowing that the database is in FULL recovery mode
  • Not tracking the size of the log

The problem comes for those people who just use an app and have no idea that the developer put the database (maybe for a web application log) into FULL recovery mode, or even that they have SQL Server on their system. For these people there is no good answer unfortunately apart from educating developers.

How to prevent it being used

If you're a sysadmin who wants to stop your database owners and backup operators from using the NO_LOG option, there is way to do it. Trace flag 3231 in SS2000 and SS2005 will turn the NO_LOG and TRUNCATE_ONLY options into no-ops in FULL/BULK_LOGGED recovery mode, and will clear the log in SIMPLE recovery mode. Trace flag 3031 in SS2005 turns them in checkpoints in all recovery modes. These trace flags are undocumented but they're allowed to be publicized.

This way no-one can truncate the log unless you allow them to, and when they come to you to complain, you can educate them!

As I mentioned in a previous post, one of the new features for database mirroring in SQL Server 2008 is log stream compression. There's a good explanation of what this is (along with some example workloads and performance graphs) on the SQL Customer Advisory Team blog so I'm not going to duplicate all that here.

Basically, the way mirroring works is by shipping all the transaction log from the principal to the mirror, and then replaying the log records on the mirror database. Network bandwidth can be a bottleneck for mirrored databases where the rate of transaction log generation is high enough to saturate the network. On synchronously mirrored databases (where a transaction cannot commit on the principal until all the transaction log for it has been hardened in the log of the mirror database), this can even lead to workload throughput degradation.

Log compression is a way to alleviate this problem. SS2008 Books Online states that compression rates of at least 12.5% are achieved - obviously the compression ratio could be much higher than that and is dependant on what's being compressed - i.e. the data that is being processed by the application. One of the tests they did for the post above shows a 5x compression ratio - that's pretty good.

The downside of this is that compression is not free - extra CPU must be used. The obvious CPU load increases comes from having to compress the transaction log on the principal and then decompress it again on the mirror before hardening it in the log. The non-obvious increase will come because the principal and mirror should end up being able to process more transactions/second, which means more CPU is needed.

This extra CPU could be a problem for some systems that are already pegged in terms of CPU load, and so they may actually see a drop in performance when log stream compression is enabled. Now, it's on by default when you upgrade to SS2008 but there is a way to turn it off. The blog post above divulges that there is a trace flag, 1462, that turns off log stream compression and effectively reverts the behavior back to SS2005.

Apart from the obvious benefits of this feature for systems that can afford the extra CPU, I'm excited because it may allow more mirrored databases to perform effective database maintenance - in the form of fragmentation control and index maintenance. When database mirroring is enabled, the database has to be running in the full recovery mode. This means that operations such as index rebuilds are fully logged - that's a lot of extra log being generated if you're used to switching to bulk-logged recovery mode to perform index operations. Of course, this isn't an issue for reorganizing an index (with ALTER INDEX ... REORGANIZE) as this has been fully logged (and has to be that way because of how it works) since I wrote the old DBCC INDEXDEFRAG for SS2000.

Anyway, for some customers the fact that these two operations are fully logged means that running them produces so much additional log that, in conjunction with the log from the workload itself, the network becomes a bottleneck for mirroring and it slows down both the maintenance and reduces the workload throughput. The addition of log stream compression means that for these systems, the log may be compressed enough so that the network is no longer a bottleneck and more regular fragmentation control can take place. This in turn will increase workload throughput as well - a double benefit!

Over the years I was in the Storage Engine team I saw a lot of concern on the various forums about the ghost cleanup task. There have been a few bugs with it in previous versions  (see these KB articles - 932115 and 815594) and there's very little info available on it. For some reason I didn't get around to posting about it on my old blog but today I want to go into some depth on it.

So what is ghost cleanup? It's a background process that cleans up ghost records - usually referred to as the ghost cleanup task. What's a ghost record? As I described briefly in the Anatomy of a record post last week, a ghost record is one that's just been deleted in an index on a table (well, actually it gets more complicated if snapshot isolation of some form is enabled but for now, a record in an index is a good start). Such a delete operation never physically removes records from pages - it only marks them as having been deleted, or ghosted. This is a performance optimization that allows delete operations to complete more quickly. It also allows delete operations to rollback more quickly because all that needs to happen is to unmark the records as being deleted/ghosted, instead of having to reinsert the deleted records. The deleted record will be physically removed (well, its slot will be removed - the record data isn't actually overwritten) later by the background ghost cleanup task. The ghost cleanup task will leave a single record on the page to avoid having to deallocate empty data or index pages.

The ghost cleanup task can't physically delete the ghost records until after the delete transaction commits because the deleted records are locked and the locks aren't released until the transaction commits. As an aside, when ghost records exist on a page, even a NOLOCK or READ UNCOMMITTED scan won't return them because they are marked as ghost records.

When a record is deleted, apart from it being marked as a ghost record, the page that the record is on is also marked as having ghost records in one of the allocation maps - the PFS page (post coming soon!) - and in its page header. Marking a page as having ghost records in a PFS page also changes the database state to indicate that there are some ghost records to cleanup - somewhere. Nothing tells the ghost cleanup task to clean the specific page that the delete happened on - yet. That only happens when the next scan operation reads the page and notices that the page has ghost records.

The ghost cleanup task doesn't just start up when it's told to - it starts up in the background every 5 seconds and looks for ghost records to cleanup. Remember that it won't be told to go cleanup a specific page by a delete operation - it's a subsequent scan that does it, if a scan happens. When the ghost cleanup task starts up it checks to see if its been told to cleanup a page - if so it goes and does it. If not, it picks the next database that is marked as having some ghost records and looks through the PFS allocation map pages to see if there are any ghost records to cleanup. It will check through or cleanup a limited number of pages each time it wakes up - I remember the limit is 10 pages - to ensure it doesn't swamp the system. So - the ghost records will eventually be removed - either by the ghost cleanup task processing a database for ghost records or by it specifically being told to remove them from a page. If it processes a database and doesn't find any ghost records, it marks the database as not having any ghost records so it will be skipped next time.

How can you tell its running? On SQL Server 2005, you can use the following code to see the ghost cleanup task in sys.dm_exec_requests:

SELECT * INTO myexecrequests FROM sys.dm_exec_requests WHERE 1 = 0;

GO

SET NOCOUNT ON;

GO

DECLARE @a INT

SELECT @a = 0;

WHILE (@a < 1)

BEGIN

INSERT INTO myexecrequests SELECT * FROM sys.dm_exec_requests WHERE command LIKE '%ghost%'

SELECT @a = COUNT (*) FROM myexecrequests

END;

GO

SELECT * FROM myexecrequests;

GO

And on SQL Server 2000 you need to use sysprocesses (well, on SQL Server 2005 this works as well but its fake view derived from the DMVs):

SELECT * INTO mysysprocesses FROM master.dbo.sysprocesses WHERE 1 = 0;

GO

SET NOCOUNT ON;

GO

DECLARE @a INT

SELECT @a = 0;

WHILE (@a < 1)

BEGIN

INSERT INTO mysysprocesses SELECT * FROM master.dbo.sysprocesses WHERE cmd LIKE '%ghost%'

SELECT @a = COUNT (*) FROM mysysprocesses

END;

GO

SELECT * FROM mysysprocesses;

GO

The output from sys.dm_exec_requests is (with most unused and uninteresting columns stripped off):

session_id request_id  start_time              status       command
---------- ----------- ----------------------- ------------ ----------------
15         0           2007-10-05 16:34:49.653 background   GHOST CLEANUP

So how can you tell if a record is ghosted? Let's engineer some and look at it with DBCC PAGE - I've stripped out the uninteresting bits of the output and highlighted the interesting ghost parts:

CREATE TABLE t1 (c1 CHAR(10))

CREATE CLUSTERED INDEX t1c1 on t1 (c1)

GO

BEGIN TRAN

INSERT INTO t1 VALUES ('PAUL')

INSERT INTO t1 VALUES ('KIMBERLY')

DELETE FROM t1 WHERE c1='KIMBERLY';

GO

DBCC IND ('ghostrecordtest', 't1', 1);

GO

DBCC TRACEON (3604);

GO

DBCC PAGE ('ghostrecordtest', 1, 143, 3);

GO

<snip>

m_freeData = 130    m_reservedCnt = 0   m_lsn = (20:88:20)
m_xactReserved = 0  m_xdesId = (0:518)  m_ghostRecCnt = 1
m_tornBits = 0

<snip>

Slot 0 Offset 0x71 Length 17

Record Type = GHOST_DATA_RECORD      Record Attributes =  NULL_BITMAP
Memory Dump @0x6256C071

00000000:   1c000e00 4b494d42 45524c59 20200200 †....KIMBERLY  ..
00000010:   fc†††††††††††††††††††††††††††††††††††.
UNIQUIFIER = [NULL]

Slot 0 Column 1 Offset 0x4 Length 10

c1 = KIMBERLY

Slot 1 Offset 0x60 Length 17

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP
Memory Dump @0x6256C060

00000000:   10000e00 5041554c 20202020 20200200 †....PAUL      ..
00000010:   fc†††††††††††††††††††††††††††††††††††.
UNIQUIFIER = [NULL]

Slot 1 Column 1 Offset 0x4 Length 10

c1 = PAUL

Let's see what goes on the transaction log during this process (remember this is undocumented and unsupported - do it on a test database) - I've stripped off a bunch of the columns in the output:

DECLARE @a CHAR (20)

SELECT @a = [Transaction ID] FROM fn_dblog (null, null) WHERE [Transaction Name]='PaulsTran'

SELECT * FROM fn_dblog (null, null) WHERE [Transaction ID] = @a;

GO

Current LSN              Operation         Context             Transaction ID
------------------------ ----------------- ------------------- --------------
00000014:00000054:0011   LOP_BEGIN_XACT    LCX_NULL            0000:00000206
00000014:0000005a:0012   LOP_INSERT_ROWS   LCX_CLUSTERED       0000:00000206
00000014:0000005a:0013   LOP_INSERT_ROWS   LCX_CLUSTERED       0000:00000206
00000014:0000005a:0014   LOP_DELETE_ROWS   LCX_MARK_AS_GHOST   0000:00000206
00000014:0000005a:0016   LOP_DELETE_ROWS   LCX_MARK_AS_GHOST   0000:00000206

So there are the two inserts followed by the two deletes - with the rows being marked as ghost records. But where's the update to the PFS page? Well, changing the ghost bit in a PFS page is not done as part of a transaction. We'll need to look for it another way (apart from just dumping everything in the transaction log and searching manually):

SELECT Description, * FROM fn_dblog (null, null) WHERE Context like '%PFS%' AND AllocUnitName like '%t1%';

GO

Description               Current LSN              Operation        Context   Transaction ID
------------------------- ------------------------ ---------------- --------- ----------------
Allocated 0001:0000008f   00000014:00000054:0014   LOP_MODIFY_ROW   LCX_PFS   0000:00000208
                          00000014:0000005a:0015   LOP_SET_BITS     LCX_PFS   0000:00000000

The first one is just allocating a page but the second one is the one we're looking for - it's changed the bit for the page to say it has ghost records on. Let's commit the transaction and see what happens, filtering out all the previous transaction log:

SELECT MAX ([Current LSN]) FROM fn_dblog (null, null);

GO

-- 00000014:0000005e:0001

COMMIT TRAN

GO

SELECT [Page ID], * FROM fn_dblog (null, null) WHERE [Current LSN] > '00000014:0000005e:0001';

GO

Page ID         Current LSN              Operation          Context         Transaction ID
--------------- ------------------------ ------------------ --------------- --------------
NULL            00000014:0000005f:0001   LOP_COMMIT_XACT    LCX_NULL        0000:00000206
0001:0000008f   00000014:00000060:0001   LOP_EXPUNGE_ROWS   LCX_CLUSTERED   0000:00000000

We see that almost as soon as the transaction has commited, the ghost cleanup task goes in and process the page. Let's check a page dump to make sure the record is gone, and show that the contents of the record are still on the page (again, with non-relevant bits snipped out):

DBCC PAGE ('ghostrecordtest', 1, 143, 3);

GO

<snip>

m_freeData = 130         m_reservedCnt = 0        m_lsn = (20:94:1)
m_xactReserved = 0       m_xdesId = (0:518)       m_ghostRecCnt = 0
m_tornBits = 0

<snip>

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP
Memory Dump @0x6212C060

00000000:   10000e00 5041554c 20202020 20200200 †....PAUL      ..
00000010:   fc†††††††††††††††††††††††††††††††††††.
UNIQUIFIER = [NULL]

Slot 0 Column 1 Offset 0x4 Length 10

c1 = PAUL

DBCC PAGE ('ghostrecordtest', 1, 143, 2);

GO

<snip>

6212C040:   01000000 00000000 00000000 00000000 †................
6212C050:   00000000 00000000 00000000 00000000 †................
6212C060:   10000e00 5041554c 20202020 20200200 †....PAUL      ..
6212C070:   fc1c000e 004b494d 4245524c 59202002 †.....KIMBERLY  .
6212C080:   00fc0000 00000000 00000000 01000000 †................
6212C090:   00000000 13000000 01000000 00000000 †................

<snip>

So even though the record no longer exists, all that happened was that the slot was removed from the slot array at the end of the page - the record contents will remain on the page until the space is reused.

In the next post I'll go into details of the PFS and other allocation maps. Btw - please let me know if this stuff is interesting - I'd like to know where to spend blogging time. Thanks!

Here's a really interesting question that was in my search engine logs yesterday - if I have a transaction that runs and completes while a backup is running, will the complete transaction be in the backup? The answer is.... it depends!

In terms of what gets backed up, the way a full backup works is:

  1. Note the transaction log's LSN (Log Sequence Number)
  2. Read all allocated extents in the various data files
  3. Note the LSN again
  4. Read all the transaction log between the starting LSN and the ending LSN

Any transaction that commits before or on the LSN read in step 3 will be fully reflected when the database is restored. If not, the transaction will be undone. So you can't just go by the completion time of the backup and the completion time of the transaction. The transaction may well commit before the backup operation completes, but it may complete during step 4, and so it will get rolled back during a restore. In this case, it's necessary to take a log backup as well and restore that too to make the transaction be fully reflected after a restore.

As you know I always like to prove things  - so here's my proof of what I just said. I'm going to use the AdventureWorks database to do this. First thing is to set it to full recovery mode (and take the first full backup to start full recovery mode logging):

ALTER DATABASE AdventureWorks SET RECOVERY FULL;

BACKUP DATABASE AdventureWorks TO DISK='C:\SQLskills\AdventureWorks.bck' WITH INIT;

GO

Now I'm going to flush out the backup history tables in MSDB:

USE msdb;

GO

EXEC sp_delete_backuphistory '10/6/07';

GO

I've got a really contrived example that I'm going to use to show whether my transaction is wholely contained in the full backup. Using the HumanResources.Employee table, there's a column VacationHours which I'm going to set to 0 and then force all the changes pages to disk. This is my base state:

UPDATE AdventureWorks.HumanResources.Employee SET VacationHours = 0;

GO

CHECKPOINT;

GO

My contrived transaction is going to be in a tight loop updating all the rows in the table, which will generate lots of transaction log, and timed to complete just before the backup completes (i.e. in stage 4). In one connection I start the backup:

BACKUP DATABASE AdventureWorks TO DISK='C:\SQLskills\AdventureWorks.bck' WITH INIT;

SELECT GETDATE ();

GO

and in another I start my contrived transaction, after starting the backup:

BEGIN TRAN

DECLARE @a INT

DECLARE @b INT

SELECT @a = 1

WHILE (@a < 6)

BEGIN

SELECT @b = 1

WHILE (@b < 201)

BEGIN

UPDATE AdventureWorks.HumanResources.Employee SET VacationHours = @b

SELECT @b=@b+1

END

SELECT @a=@a+1

END

COMMIT TRAN;

SELECT GETDATE ();

GO

It's not pretty but it does the job. The backup finishes at 2007-10-05 17:42:38.983 and the transaction finishes at 2007-10-05 17:42:38.107 - before the backup finishes. Remember I set the VacationHours all to zero before running my transaction - let's check the transaction did actually change them:

SELECT MAX (VacationHours) FROM AdventureWorks.HumanResources.Employee;

GO

This returns 200 - which is what I'd expect. But is the transaction wholely contained in the backup? Let's look at the backup history to find out the last LSN that was captured in the full backup:

SELECT Backup_Start_Date, Backup_Finish_Date, First_LSN, Last_LSN

FROM msdb.dbo.backupset WHERE database_name = 'AdventureWorks';

GO

And the output we get is:

Backup_Start_Date       Backup_Finish_Date      First_LSN           Last_LSN
----------------------- ----------------------- ------------------- -------------------
2007-10-05 17:42:22.000 2007-10-05 17:42:38.000 86000000001600029   91000000625600001

The LSNs are in decimal, so we need to convert the three numbers to hex so we can compare them to what's in the log - giving: 5B:1870:1. This is the LSN of the last log record that was backed up in the full backup. Now let's take a look at the transaction log for AdventureWorks using the undocumented fn_dblog function. This is undocumented but very well known. It's a fully composable alternative to using the old DBCC LOG command.

USE AdventureWorks;

GO

SELECT [Current LSN], Operation, [Transaction ID], AllocUnitName FROM fn_dblog (NULL, NULL);

GO

The log record at that LSN is:

00000058:00001870:0001  LOP_MODIFY_ROW     0000:00001338  HumanResources.Employee.PK_Employee_EmployeeID

This is obviously in the middle of my contrived transaction - showing that it isn't all in the full backup. The end of the transaction isn't until way later in the log:

0000005e:00000628:01b1  LOP_MODIFY_ROW     0000:00001338  HumanResources.Employee.PK_Employee_EmployeeID
0000005e:00000628:01b2  LOP_COMMIT_XACT    0000:00001338  NULL

Before I do anything else, I want to take a log backup to preserve my transaction:

BACKUP LOG AdventureWorks TO DISK='C:\SQLskills\AdventureWorks_Log.bck' WITH INIT;

GO

Now I want to restore the full backup and really show that my transaction isn't all in there:

USE master;

GO

RESTORE DATABASE AdventureWorks FROM DISK='C:\SQLskills\AdventureWorks.bck' WITH REPLACE, RECOVERY;

GO

SELECT MAX (VacationHours) FROM AdventureWorks.HumanResources.Employee;

GO

This returns 0. Clearly my transaction isn't all in there, and the parts that are were rolled back during the restore. Now let's do the same thing but using NORECOVERY for the restore of the full backup and also applying the log backup I took:

RESTORE DATABASE AdventureWorks FROM DISK='C:\SQLskills\AdventureWorks.bck' WITH REPLACE, NORECOVERY;

GO

RESTORE LOG AdventureWorks FROM DISK='C:\SQLskills\AdventureWorks_Log.bck' WITH RECOVERY;

GO

SELECT MAX (VacationHours) FROM AdventureWorks.HumanResources.Employee;

GO

This time the SELECT returns 200. And now you can start playng around with fn_dblog if you didn't know about it before. I'll be posting more about Storage Engine internals that you can figure out from the transaction log in future.

(Be sure to join our community to get our monthly newsletter with exclusive content, advance notice of classes with discount codes, and other SQL Server goodies!)

(If you're having to read this because no-one is managing your SQL Server, consider contacting us to manage them for you - see here for details.)

This is a follow-on article from two posts:

People get themselves into situations where they have no backups (or damaged backups) and the data or log files are damaged such that the only way to access the database is with EMERGENCY mode. In these situations, prior to SQL Server 2005, there was no documented or supported way to fix a database while in EMERGENCY mode - the only guidance could be found on the Internet or from calling Product Support and paying for help. The sequence of events was:

  1. Hack the system tables to get the database into 'emergency' mode.
  2. Use the undocumented and unsupported DBCC REBUILD_LOG command to build a new transaction log.
  3. Run DBCC CHECKDB with the REPAIR_ALLOW_DATA_LOSS option to fix up corruptions in the data files - both those that may have caused the issue, and those caused by rebuilding the transaction log (e.g. because an active transaction altering the database structure was lost).
  4. Figure out what data was lost or is transactionally inconsistent (e.g. because a transaction altering multiple tables was lost) as far as your business logic is concerned
  5. Take the database out of emergency mode
  6. And then all the other stuff like root-cause analysis and getting a better backup strategy

I decided to add a new feature to SQL Server 2005 called EMERGENCY mode repair that will do steps 2 and 3 as an atomic operation. The reasons for this were:

  • Much of the advice of how to do this on the Internet missed steps out (particularly missing step 3!)
  • The DBCC REBUILD_LOG command was unsupported and undocumented and we didn't like advising customers to use it
  • Adding a documented last-resort method of recovering from this situation would reduce calls to Product Support - saving time and money for customers and Microsoft.

So, when in EMERGENCY mode, you can use DBCC CHECKDB to bring the database back online again. The only repair option allowed in EMERGENCY mode is REPAIR_ALLOW_DATA_LOSS and it does a lot more than usual:

  • Forces recovery to run on the transaction log (if it exists). You can think of this as 'recovery with CONTINUE_AFTER_ERROR' - see this post for more details on the real CONTINUE_AFTER_ERROR option for BACKUP and RESTORE. The idea behind this is that the database is already inconsistent because either the transaction log is corrupt or something in the database is corrupt in such a way that recovery cannot complete. So, given that the database is inconsistent and we're about to rebuild the transaction log, it makes sense to salvage as much transactional information as possible from the log before we throw it away and build a new one.
  • Rebuild the transaction log - but only if the transaction log is corrupt.
  • Run DBCC CHECKDB with the REPAIR_ALLOW_DATA_LOSS option.
  • Set the database state to ONLINE.

It's a one-way operation and can't be rolled back. I always advise taking a copy of the database files before doing this in case something goes wrong or there are unrepairable errors. And if it does? Probably time to update your resume for not having a water-tight backup and disaster-recovery strategies in place. Saying that, I've never seen it fail. I can think of some pathalogical cases where it would fail though (involving the file system itself having problems) but that's really unlikely.

Let's walk-through an example of using it. I'm assuming there's a database called emergencydemo that's in the same state as at the end of the Search Engine Q&A #4 blog post - the database has no log file, is in EMERGENCY mode and the salaries table is corrupt.

First off I'll try bringing the database online, just to see what happens:

ALTER DATABASE emergencydemo SET ONLINE;
GO

File activation failure. The physical file name "C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\emergencydemo_log.LDF" may be incorrect.
The log cannot be rebuilt because the database was not cleanly shut down.
Msg 945, Level 14, State 2, Line 1
Database 'emergencydemo' cannot be opened due to inaccessible files or insufficient memory or disk space. See the SQL Server errorlog for details.
Failed to restart the current database. The current database is switched to master.
Msg 5069, Level 16, State 1, Line 1
ALTER DATABASE statement failed.

The first message makes sense - the database knows it needs to be recovered because it wasn't cleanly shut down, but the log file simply isn't there. The second message is from the new feature in 2005 that will automatically create a log file if one is missing on startup or attach - as long as the database was cleanly shut down. The 945 and 5069 errors are self-explanatory but notice that the database gets switched to master underneath us. This has bitten me several times in the past.

Well, I expected that not to work. Let's run emergency-mode repair:

DBCC CHECKDB (emergencydemo, REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS;
GO

Msg 945, Level 14, State 2, Line 1
Database 'emergencydemo' cannot be opened due to inaccessible files or insufficient memory or disk space. See the SQL Server errorlog for details.

Hmm - looks like the failed ALTER DATABASE statement did change the state - but what to?

SELECT state_desc FROM sys.databases WHERE name='emergencydemo';
GO

state_desc
------------------------------------------------------------
RECOVERY_PENDING

That makes sense I guess. Ok - back to emergency mode and run repair:

ALTER DATABASE emergencydemo SET EMERGENCY;
GO
DBCC CHECKDB (emergencydemo, REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS;
GO

Msg 7919, Level 16, State 3, Line 1
Repair statement not processed. Database needs to be in single user mode.

EMERGENCY mode is not SINGLE_USER mode - a database needs to be in SINGLE_USER mode for repair to run and EMERGENCY mode allows multiple connections from members of the sysadmin role. You can set SINGLE_USER mode as well as EMERGENCY mode - however, the sys.databases field state_desc will still just say EMERGENCY.

ALTER DATABASE emergencydemo SET SINGLE_USER;
GO
DBCC CHECKDB (emergencydemo, REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS;
GO

File activation failure. The physical file name "C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\emergencydemo_log.LDF" may be incorrect.
The log cannot be rebuilt because the database was not cleanly shut down.
Warning: The log for database 'emergencydemo' has been rebuilt. Transactional consistency has been lost. The RESTORE chain was broken, and the server no longer has context on the previous log files, so you will need to know what they were. You should run DBCC CHECKDB to validate physical consistency. The database has been put in dbo-only mode. When you are ready to make the database available for use, you will need to reset database options and delete any extra log files.

This time it worked. First of all we get the same error as if we tried to bring the database online - that's from the code that's trying to run 'recovery with CONTINUE_AFTER_ERROR' on the transaction log. Next we get a nice long warning that the transaction log has been rebuilt and the consequences of doing that (basically that you need to start a new log backup chain by taking a full backup). If there had been any corruptions we'd have seen the usual output from DBCC CHECKDB about what errors it found and fixed. There's also a bunch of stuff in the error log

2007-10-02 17:21:20.95 spid51      Starting up database 'emergencydemo'.
2007-10-02 17:21:20.96 spid51      Error: 17207, Severity: 16, State: 1.
2007-10-02 17:21:20.96 spid51      FileMgr::StartLogFiles: Operating system error 2(The system cannot find the file specified.) occurred while creating or opening file 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\emergencydemo_log.LDF'. Diagnose and correct the operating system error, and retry the operation.
2007-10-02 17:21:20.96 spid51      Starting up database 'emergencydemo'.
2007-10-02 17:21:21.10 spid51      Starting up database 'emergencydemo'.
2007-10-02 17:21:21.18 spid51      Warning: The log for database 'emergencydemo' has been rebuilt. Transactional consistency has been lost. The RESTORE chain was broken, and the server no longer has context on the previous log files, so you will need to know what they were. You should run DBCC CHECKDB to validate physical consistency. The database has been put in dbo-only mode. When you are ready to make the database available for use, you will need to reset database options and delete any extra log files.
2007-10-02 17:21:21.18 spid51      Warning: The log for database 'emergencydemo' has been rebuilt. Transactional consistency has been lost. The RESTORE chain was broken, and the server no longer has context on the previous log files, so you will need to know what they were. You should run DBCC CHECKDB to validate physical consistency. The database has been put in dbo-only mode. When you are ready to make the database available for use, you will need to reset database options and delete any extra log files.
2007-10-02 17:21:21.99 spid51      EMERGENCY MODE DBCC CHECKDB (emergencydemo, repair_allow_data_loss) WITH no_infomsgs executed by ROADRUNNERPR\paul found 0 errors and repaired 0 errors. Elapsed time: 0 hours 0 minutes 1 seconds.

Note that the usual error log entry from running DBCC CHECKDB is preceded by 'EMERGENCY MODE' this time.

Checking the database state:

SELECT state_desc FROM sys.databases WHERE name='emergencydemo';
GO

state_desc
------------------------------------------------------------
ONLINE

we find that it's been brought back online again because everything worked. It's still SINGLE_USER though so let's make it MULTI_USER and see what happened to our table:

ALTER DATABASE emergencydemo SET MULTI_USER;
GO
USE EMERGENCYDEMO;
GO
SELECT * FROM salaries;
GO

FirstName            LastName             Salary
-------------------- -------------------- -----------
John                 Williamson           10000
Stephen              Brown                0
Jack                 Bauer                10000

(3 row(s) affected)

And of course its still corrupt - because even though the transaction log was rebuilt and repaired, the original transaction that changed the salary to 0 never got a chance to rollback becuase I deleted the transaction log (in the previous post).

Now remember, you should only use this as a last resort, but if you do get yourself into trouble, you know there's a command that should be able to help you.

By far the most common search engine query leading to the blog is about fixing a suspect or unrecovered database. The very best way to do this is to use your backups, and to have a backup strategy that allows you to recover in the smallest amount of time and with no data loss. But what if you don't have a backup for some reason? Well, it depends what's damaged in the database and when the damage is noticed.

There are three states the database can be in when its damaged:

  1. ONLINE
    • If it's one of the data files that is damaged and the damage is hit during a query or some other operation then the database will remain online and accessible.
  2. RECOVERY PENDING
    • The database will be in this state if SQL Server knows that recovery needs to be run on the database but something is preventing recovery from starting. This is different from SUSPECT because there's nothing to say that recovery is going to fail - it just hasn't started yet.
    • An example of this is when the database wasn't cleanly shut down (i.e. there was at least one uncommitted transaction active at the time the database was shut down) and the log file has been deleted.
  3. SUSPECT
    • The database will be in this state if the transaction log is damaged and it prevents recovery or a transaction rollback from completing.
    • Some examples of this are:
      • When the database wasn't cleanly shut down and recovery tries to read a corrupt data page or comes across a corrupt log record.
      • A regular transaction rolls back and tries to read a corrupt data page or comes across a corrupt log record.

You can check the state of a database in the sys.databases catalog view:

SELECT state_desc FROM sys.databases WHERE name = 'master';

GO

or by using the DATABASEPROPERTYEX function:

SELECT DATABASEPROPERTYEX ('master', 'STATUS');

GO

Beware however, as DATABASEPROPERTYEX returns SUSPECT when the database is RECOVERY PENDING, as I'll show you below.

So the state the database is in determines what you can do if you don't have a backup. The easiest case is when it's still ONLINE. In this case you can probably run repair to remove the damage, most likely with some data loss (see my previous post on REPAIR_ALLOW_DATA_LOSS for more details), and then take steps to prevent the damage occuring again. If repair can't fix all the errors then your only option without a backup is to extract as much data as you can into a new database.

The other two database states are more difficult and are what's causing people to search for help. In this case the database isn't accessible at all, because recovery hasn't run or completed and so the database is in an inconsistent state. It could just be logically inconsistent (e.g. a transaction modifying data hasn't recovered) or worse it could structurally inconsistent (e.g. a system transaction modifying index linkages has't recovered). Either way, SQL Server wants to prevent you from getting into the database because it doesn't know what state the data and structures in the database are in. But if you don't have a backup, you need to get into the database, no matter what state things are in.

You can do this using EMERGENCY mode. In versions prior to SQL Server 2005, EMERGENCY mode wasn't documented and you had to hack the sysdatabases table to get a database into it (worse still, the exact hack to use was changed from version to version in a bid to obfuscate things). In 2005 though, EMERGENCY mode was documented and proper syntax added to support it. Members of the sysadmin role can put the database into EMERGENCY mode using:

ALTER DATABASE foo SET EMERGENCY;

GO

Once in EMERGENCY mode, the database is accessible only by members of the sysadmin role. The database is also read-only as nothing can be written to the transaction log.

Let's see an example of this (based on a demo from my Secrets of Fast Detection and Recovery from Database Corruptions presentation). I'm going to create a database and a sample table:

IF DATABASEPROPERTY (N'emergencydemo', 'Version') > 0 DROP DATABASE emergencydemo;

GO

CREATE DATABASE emergencydemo;

GO

USE emergencydemo;

GO

CREATE TABLE salaries (

FirstName CHAR (20),

LastName CHAR (20),

Salary INT);

GO

INSERT INTO salaries VALUES ('John', 'Williamson', 10000);

INSERT INTO salaries VALUES ('Stephen', 'Brown', 12000);

INSERT INTO salaries VALUES ('Jack', 'Bauer', 10000);

GO

I'm going to start an explicit user transaction and update a row in the table:

BEGIN TRANSACTION;

GO

UPDATE salaries SET Salary = 0 WHERE LastName='Brown';

GO

Now I'm going to force the data page holding the updated row to be written to disk:

CHECKPOINT;

GO

So we have an active, uncommitted transaction that's modified the table, and the table modification has been written to disk. If the power failed at this point, crash recovery would run and the transaction would be rolled back. I'm going to simulate this by shutting down SQL Server. In another connection:

SHUTDOWN WITH NOWAIT;

GO

Server shut down by NOWAIT request from login ROADRUNNERPR\paul.

SQL Server is terminating this process.

I'm also going to simulate damage to the transaction log:

C:\Documents and Settings\paul>del "C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\emergencydemo_log.LDF"

C:\Documents and Settings\paul>

Now when I start up SQL Server again, we see the following in the error log:

2007-10-02 11:39:47.14 spid18s     Starting up database 'emergencydemo'.
2007-10-02 11:39:47.46 spid18s     Error: 17207, Severity: 16, State: 1.
2007-10-02 11:39:47.46 spid18s     FileMgr::StartLogFiles: Operating system error 2(The system cannot find the file specified.) occurred while creating or opening file 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\emergencydemo_log.LDF'. Diagnose and correct the operating system error, and retry the operation.
2007-10-02 11:39:47.60 spid18s     File activation failure. The physical file name "C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\emergencydemo_log.LDF" may be incorrect.
2007-10-02 11:39:47.60 spid18s     The log cannot be rebuilt because the database was not cleanly shut down.

The database wasn't cleanly shut down and the transaction log isn't available so recovery couldn't run. The final message is interesting - there's a feature in SQL Server 2005 that if you attach or startup a database without a transaction log file, and the database was cleanly shut down, SQL Server will create a new log file automatically. In our case that can't happen though.

What happens if I try to get into the database?

USE emergencydemo;

GO

Msg 945, Level 14, State 2, Line 1

Database 'emergencydemo' cannot be opened due to inaccessible files or insufficient memory or disk space. See the SQL Server errorlog for details.

So what state is the database in?

SELECT DATABASEPROPERTYEX ('emergencydemo', 'STATUS');

GO

returns SUSPECT. But checking the sys.databases table

SELECT state_desc FROM sys.databases WHERE name='emergencydemo';

GO

returns RECOVERY PENDING. This is what I'd expect, as recovery didn't get a chance to even start.

Now I'll set the database into EMERGENCY mode so I can get in and see what state things are in:

ALTER DATABASE emergencydemo SET EMERGENCY;

GO

In the errorlog you can tell when a database has been put into EMERGENCY mode:

2007-10-02 11:53:52.57 spid51      Setting database option EMERGENCY to ON for database emergencydemo.
2007-10-02 11:53:52.59 spid51      Starting up database 'emergencydemo'.
2007-10-02 11:53:52.62 spid51      The database 'emergencydemo' is marked EMERGENCY_MODE and is in a state that does not allow recovery to be run.

Let's try that again:

USE emergencydemo;

GO

This time it works. What's the state of the data?

SELECT * FROM salaries;

GO

FirstName            LastName             Salary
-------------------- -------------------- -----------
John                 Williamson           10000
Stephen              Brown                0
Jack                 Bauer                10000

It's inconsistent, as I'd expect.

That's the catch with EMERGENCY mode - you can get into the database but recovery hasn't run or completed so you don't know whether the database is logically or structurally consistent. However, at least you can get into the database to extract data out or repair any damage.

In the next post (later today) I'll show you how to repair any damage using the emergency-mode repair feature of DBCC CHECKDB.

One of the great things about the blog engine we use is that it shows all the search engine queries that led to someone clicking through to the site. I've been looking through the logs to see what kind of problems people are having that end up here. In this occasional series, I'm going to pick a search engine query and blog about it - hopefully helping out people who have the problem in future.

First up is running out of transaction log space. This happens when the transaction log fills up to the point where it has to grow but either autogrow is not turned on or the volume on which the transaction log is placed has no more space for the file to grow. So what causes the transaction log to fill up in the first place? It could be a number of different things. The two most common ones I've seen are:

  • The database is in full recovery mode with normal processing, a full database backup has been taken but no log backups have been taken. This commonly happens when an application goes into production and someone decides to take a backup, without realizing the consequences. Log truncation is not automatic in such a case - a log backup is required to allow the log to truncate.
  • The database is in any recovery mode but there's a very long-running, uncommitted transaction that prevents log truncation. (Even in full recovery mode with regular log backups, all the log records for the long-running transaction are required in case it rolls-back - and SQL Server can't selectively truncate log records from the log for some transactions but not others.)

Let's have a look at both of these in SQL Server 2005 and example solutions.

The log backup case...

First thing to do is create a database with a very small transaction log that we can easily fill up.

USE master;
GO
IF DATABASEPROPERTY (N'tinylogtest', 'Version') > 0 DROP DATABASE tinylogtest;
GO

CREATE DATABASE tinylogtest ON (
   
NAME = fgt_mdf,
   
FILENAME = 'c:\tinylogtest\tinylogtest.mdf',
   
SIZE = 2MB)
LOG ON (
   
NAME = fgt_log,
   
FILENAME = 'c:\tinylogtest\tinylogtest.ldf',
   
SIZE = 512KB,
   
FILEGROWTH = 0);
GO

Notice that I've specifically turned auto-growth of the log file off by setting FILEGROWTH = 0. Now let's set the database into full recovery mode and take a full database backup.

ALTER DATABASE tinylogtest SET RECOVERY FULL;
GO
BACKUP DATABASE tinylogtest to DISK = 'C:\tinylogtest\dummybackup.bck';
GO

And then create a simple table and fill up the transaction log.

SET NOCOUNT ON;
GO
CREATE TABLE tinylogtest..testtable (c1 INT, c2 CHAR (3000));
GO
WHILE (1=1)
BEGIN
   
INSERT INTO tinylogtest..testtable VALUES (1, 'a')
END;

GO

And we get the following error:

Msg 9002, Level 17, State 2, Line 4
The transaction log for database 'tinylogtest' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases

This is much better than you get on SQL Server 2000:

Msg 9002, Level 17, State 2, Line 4
The log file for database 'tinylogtest' is full.

This has no helpful information at all. So, looking at the 2005 message, it's telling us to look in the sys.databases table for more info. Let's do that:

SELECT log_reuse_wait_desc FROM sys.databases WHERE [name] = 'tinylogtest';
GO

And the result is:

LOG_BACKUP

That's pretty clear - take a log backup! Here are the various values this can take, from the Books Online entry for sys.databases:

  • NOTHING
  • CHECKPOINT
  • LOG_BACKUP
  • ACTIVE_BACKUP_OR_RESTORE
  • ACTIVE_TRANSACTION
  • DATABASE_MIRRORING
  • REPLICATION
  • DATABASE_SNAPSHOT_CREATION
  • LOG_SCAN
  • OTHER_TRANSIENT

Their meanings are explained in the Books Online entry Factors That Can Delay Log Truncation. Now if we take a log backup, the log can be truncated.

BACKUP LOG tinylogtest to DISK = 'C:\tinylogtest\dummybackup.bck';
GO

And checking the state in sys.databases again returns:

NOTHING

Simple - but it's amazing how many times I see this problem on newsgroups and forums.

The long-running transaction case...

For this test, assume I've run the T-SQL above to create the database, put it into full recovery mode, take the first full backup, and then create the table. Now in a second connection, I'll create a long-running transaction:

BEGIN TRANSACTION PAULSTRAN;
GO

INSERT INTO tinylogtest..testtable VALUES (1, 'a');
GO

Note that the transaction hasn't been committed or rolled back. In the original connection, I'll execute the WHILE loop to fill up the log again. When we check sys.databases we get back:

LOG_BACKUP

So we take a log backup and check the state again. This time we get:

ACTIVE_TRANSACTION

We need to use the DBCC OPENTRAN command to find out what the long running transaction is:

DBCC OPENTRAN ('tinylogtest');
GO

And we get back:

Transaction information for database 'tinylogtest'.

Oldest active transaction:
    SPID (server process ID): 54
    UID (user ID) : -1
    Name            : PAULSTRAN
    LSN               : (18:98:1)
    Start time      : Sep 23 2007  5:49:53:077PM
    SID               : 0x010500000000000515000000ae4da6eadcd59cf661d6bb58ed030000
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

From this info we can track down the transaction and commit it or roll it back. In our example, once we roll back the PAULSTRAN transaction, there's enough log space to insert one more record into the table, but then we run out of log space again. What's going on? Because the PAULSTRAN transaction we very close to the start of the transaction log, all of the log it was effectively holding hostage was backed up but couldn't be cleared. In this case we need to perform another log backup to allow the log to be cleared. This makes sense and is documented behavior.

Summary

Of course, there are other causes of the transaction log filling up and other things you can do apart from taking log backups or identifying long-running transactions - such as growing the log file or moving the log file to a volume with more free space. Here are some good resources you can use for further reading:

Hope this helps!

These next few posts are based off part of my Secrets of Fast Detection and Recovery from Database Corruptions session from TechEd and various user groups around the world (see here for a video recording from TechEd). I'll also be doing this session at ITForum in Barcelona in November, and as a live webcast for Microsoft sometime over the next month or so (I'll publicize the date nearer the time). I did a few posts on these subjects last year but now I want to reorder them, add a post about using emergency mode and walk you through some emergency mode demo scripts.

In this post I want to describe the two worst things I think you can do to your database - rebuilding your transaction log and running REPAIR_ALLOW_DATA_LOSS - that people often try doing instead of restoring from their backups.

Rebuilding your transaction log

It's pretty well known that in SQL Server 2000 (and before) there's an undocumented and unsupported command called DBCC REBUILD_LOG. What this does is pretty simple - it deletes the transaction log file(s) and creates a new one. It completely disregards any uncommitted transactions that may exist - it just deletes them. This means that these uncommitted transactions don't get a chance to roll back.

What does this mean? Well, in the best case, the only in-flight transactions were altering user data - so your business logic, inherent and constraint-enforced relationships between tables, and the basic logical integrity of your user data are all broken. In the worst case, the in-flight transactions were altering the structure of the database (e.g. doing a page split) so that fact that they didn't get achance to rollback means the databse may be structurally corrupt!

Here's are two examples (somewhat contrived) that illustrate the possible consequences of rebuilding a transaction log.

  1. Logical data loss.

    Imagine you're at an ATM transferring $1000 from your checking account to your savings account. On the SQL Server in the bank's datacenter, the transaction happens in two parts - update the record in the checking accounts table with the balance minus $1000 and then update the record in the savings account table with the balance plus $1000. The process is half-way through - $1000 has been debited from your checking account, but not yet credited to your savings account, when disaster strikes! A work crew outside the datacenter accidentally cuts the power and the machine hosting SQL Server powers down. The ATM displays one of those infuriating "We're sorry, our computers are unavailable at present" and you walk away grumbling but think nothing more about it.

    Meanwhile, the power's been restored to the datacenter and SQL Server is going through crash recovery. The partially completed transaction on your account should rollback and credit back the $1000 to your checking account. But the new DBA at the bank decides that its taking too long for the system to come back up. He delets the transaction log and rebuilds it to get the system up faster. Unfortunately, the portion of the transaction log that had not had a chance to recover included the transaction involving your bank account. Even more unfortunately, a checkpoint occured right before the power loss, and the database page containing the updated checking account balance was flushed to disk. Now, when the transaction log is deleted and rebuilt, your transaction can't rollback - because it's simply gone. So the $1000 debit from your checking account is not rolled back - you've lost $1000!!

  2. Physical database corruption.

Imagine an insert into a table with a single non-clustered index. Under the covers, the insert happens in two parts - insert the record into the table and then insert the corresponding non-clustered index record. Imagine a similar disaster recovery situation as I described above occuring after the table insert has been done but not the non-clustered index insert. If the log is rebuilt, then the table and its index are not in sync - actual phsyical corruption!

Whenever a transaction log is rebuilt, a message is output to the SQL Server error log and the Windows event log. In SQL Server 2005, the message is:

2007-09-14 15:50:48.82 spid52      Warning: The log for database 'test' has been rebuilt. Transactional consistency has been lost. The RESTORE chain was broken, and the server no longer has context on the previous log files, so you will need to know what they were. You should run DBCC CHECKDB to validate physical consistency. The database has been put in dbo-only mode. When you are ready to make the database available for use, you will need to reset database options and delete any extra log files.

In SQL Server 2005, the old DBCC command has been removed and replaced with undocumented (and still unsupported) syntax. However, there is now a fully documented and supported way to do this - which I'll get to in a couple of posts.

Sometimes there's no alternative to rebuilding a transaction log - when the log is physically damaged and there's no backup. In this case, Product Support will walk you through the process of correctly rebuilding the transaction log, running repair to fix up any corruptions, and ensuring you know that the logical integrity of your data could be broken. Unfortunately, all too often I see people simply rebuilding the transaction log and continuing with regular operations - no checks, no repairs, no root-cause analysis.

REPAIR_ALLOW_DATA_LOSS

The vast majority of the time, REPAIR_ALLOW_DATA_LOSS is the repair level that CHECKDB recommends when it finds corruptions. This is because fixing nearly anything that's not a minor non-clustered index issue requires deleting something to repair it. So, REPAIR_ALLOW_DATA_LOSS will delete things. This means it will probably delete some of your data as well. If, for instance it finds a corrupt record on a data page, it may end up having to delete the entire data page, including all the other records on the page, to fix the corruption. That could be a lot of data. For this reason, the repair level name was carefully chosen. You can't type in REPAIR_ALLOW_DATA_LOSS without realizing that you're probably going to lose some data as part of the operation.

I've been asked why this is. The purpose of repair is not to save user data. The purpose of repair is to make the database structurally consistent as fast as possible (to limit downtime) and correctly (to avoid making things worse). This means that repairs have to be engineered to be fast and reliable operations that will work in every circumstance. The simple way to do this is to delete what's broken and fix up everything that linked to (or was linked from) the thing being deleted - whether a record or page. Trying to do anything more complicated would increase the chances of the repair not working, or even making things worse.

The ramifications of this are that running REPAIR_ALLOW_DATA_LOSS can lead to the same effect on your data as rebuilding a transaction log with in-flight transactions altering user data - your business logic, inherent and constraint-enforced relationships between tables, and the basic logical integrity of your user data could all be broken. BUT, the database is now structurally consistent and SQL Server can run on it without fear of hitting a corruption that could cause a crash.

To continue the contrived example from above, imagine your bank checking and savings accounts just happen to be stored on the same data page in the bank's SQL Server database. The new DBA doesn't realize that backups are necessary for disaster recovery and data preservation and so isn't taking any. Disaster strikes again in the form of the work crew outside the datacenter accidentally cutting the power and the machine hosting SQL Server powers down. This time, one of the drives has a problem while powering down and a page write doesn't complete - causing a torn page. Unfortunately, it's the page holding your bank accounts. As the DBA doesn't have any backups, the only alternative to fix the torn-page is to run REPAIR_ALLOW_DATA_LOSS. For this error, it will delete the page, and does so. In the process, everything else on the page is also lost, including your bank accounts!!

Summary

So, you can see how these two operations are really very, very bad things to do to a database and can cause havoc with your data. And yet people still have to  use these operations because they don't have valid backups...

In the next post I'll introduce EMERGENCY mode and how to use it.

Theme design by Nukeation based on Jelle Druyts