This blog post is part of the monthly T-SQL Tuesday series that fellow-MVP Adam Machanic (twitter|blog) organizes. This month's T-SQL Tuesday is being run by Michael Coles (twitter|blog) and is on the subject of reporting - see this blog post for details.

The theme of this month's T-SQL Tuesday is LOB data so I'm going to stretch things a little and explain why LOB data makes shrink performance really suck (that's a technical term :-)

You all know that I really don't like data file shrinking (although it can sometimes be absolutely necessary. See these posts for my soap-box diatribes:

The way that data file shrink works is that it picks up pages from the end of the data file and tries to move them as far towards the beginning of the data file as it can. As part of doing this, it has to fix up any structures that page is part of.

For example, if the page is part of an index, there will be other pages in the index that have physical links to the page being moved - these linkages have to be fixed up. The left- and right-hand neighboring pages in the same level of the index are easy to fix, as the page being moved points to them.

However, the page in the lext level up towards the root page in the index is more tricky, as there's no back-link pointing up to it - a scan of the next level in the index is required to find the page that points down to the page being moved. This should be pretty fast as the index levels get much smaller pretty quickly as you move up towards the root page (the magnitude change in size depends on the fanout of the index - I explain this in the post Inside sys.dm_db_index_physical_stats).

Important point: Whenever a page is being moved and there's some portion of a structure on the page that doesn't have a backlink to what is pointing to it, a scan is involved.

LOB values can be stored in-row or off-row (i.e. as part of the data or index record, or in a tree of text records on different pages altogether). Legacy LOB types (text, ntext, image) are stored off-row by default. New LOB types (varchar(max), nvarchar(max), XML, varbinary(max)) are stored in-row by default, up to a maximum size of 8000 bytes as long as there is space in the record.

When a LOB value is stored off-row, there is a complex pointer stored in the data or index row (called a blob root) that contains a pointer to the physical location of the top of the tree of text records that make up the LOB value, the size of the first record being pointed at, and a timestamp (not a time, a timestamp data-type value). The text record being pointed to also contains the same timestamp.

Let's take a look:

CREATE TABLE TextTest (c1 INT, c2 TEXT);
GO
INSERT INTO TextTest VALUES (1, 'a');
GO

Now figure out the first page in the table using DBCC IND or my sp_AllocationMetadata script, or whatever, and run DBCC PAGE on it:

DBCC TRACEON (3604);
DBCC PAGE (TestDB, 1, 737, 3);
GO

<snip>

Slot 0 Column 0 Offset 0x4 Length 4

c1 = 1                              

c2 = [Textpointer] Slot 0 Column 1 Offset 0xf Length 16

TextTimeStamp = 131137536            RowId = (1:152:0)

If we follow the pointer to page (1:152) and run DBCC PAGE on it, we see:

DBCC TRACEON (3604);
DBCC PAGE (foo, 1, 152, 3);
GO

<snip>

Blob row at: Page (1:152) Slot 0 Length: 84 Type: 0 (SMALL_ROOT)

Small Blob Id: 131137536 Size:1

6855C074:        61                               a

Notice that the two numbers in bold match. Notice also that there's no backlink from the off-row LOB value back to the 'owning' data record.

When a text page is moved by data file shrink, a table scan must be performed for each LOB value to find the 'owning' data record so the pointer can be updated. (If the LOB value is from an INCLUDEd column in a nonclustered index, then a similar scan must be performed of the nonclustered index. If the LOB value is lower down in the text tree for a >8000 byte LOB value, all text pages for that table or index must be scanned.)

Very slow. Very, very slow.

There's a similar issue if DBCC CHECKDB finds an orphaned text record - it can't tell who is supposed to own it so it has to go back and rescan all tables in its current batch to figure it out - which can make DBCC CHECKDB take a lot longer than usual too.

During SQL Server 2005 development when the new LOB types were implemented, I pushed for a backlink to alleviate these problems, but the engineering effort wasn't worth it. So we're stuck with it.

Yet one more reason that shrink is nasty!

Categories:
LOB data | Shrink

2005 SP3 CU6 contains a fix for a corruption bug that can happen after rebuilding an XML index that contains XML elements greater than 8000 bytes.

The bug was fixed in 2008 but hadn't made it back to 2005 until the last CU was released. I had email from someone back in August experiencing corruption and I thought it looked like a bug - so I advised them to call PSS, who told them they were the first to report it on 2005 and it was a bug. Cool! (if you see what I mean).

If you run a CHECKDB or a CHECKTABLE after rebuilding an XML index, you'll see 8964 errors like the one below:

DBCC results for 'sys.xml_index_nodes_980198542_32000'.
Msg 8964, Level 16, State 1, Line 2
Table error: Object ID 909246294, index ID 1, partition ID 72057594130530304, alloc unit ID 72057594149535744 (type LOB data). The off-row data node at page (1:621), slot 0, text ID 1302003712 is not referenced.

If the XML index contains XML elements that are larger than 8000 bytes, they will be pushed off-row as row overflow columns, stored in LOB pages. The XML index rebuild isn't cleaning up its LOB pages properly so leaves orphaned ones lying around. Not really nasty corruption, but enough to set off alarm bells in your monitoring.

You can get the CU at http://support.microsoft.com/kb/974648/, and a high-level description of the bug at http://support.microsoft.com/kb/974985.

Categories:
Bugfixes | Corruption | LOB data

Here's an interesting bug that was just recently fixed, and someone actually hit it today, so it's time to blog it.

Bottom line is that if you're doing NOLOCK scans of tables involving LOB data, the perfmance might suck. 2005 SP3 CU3 has the bug fix, but unfortunately the KB article was written by someone who didn't seem to know what they were talking about, so I suggested it was rewritten, which it has been (but not republished yet). The original KB article is at http://support.microsoft.com/kb/961049/. It's also fixed in 2008, but I don't know which build (I believe CU4 at least).

Here's my explanation of the problem.

LOB data can be stored in-row or off-row (my previous LOB post Importance of choosing the right LOB storage technique has more details). When it's stored off-row, it must be accessed by first reading the pointer to the LOB data from the data/index record, and then following the pointer (remember that 'record' is synonymous with 'row'). When a LOB data value is updated, the off-row value is updated first, and then if the off-row link changed, the data/index record is updated with the new link. There's obvioulsy a window here, where someone reading the data/index record might see the wrong/non-existent off-row pointer. This is exactly what can happen when a NOLOCK scan occurs.

To mitigate the possibility of a NOLOCK scan trying to follow a bad off-row link, the old behavior was to scan all the IAM pages (see Inside the Storage Engine: IAM pages, IAM chains, and allocation units) for the table/index to make sure that the off-row link actually pointed to a page allocated to the table/index. If there are lots of IAM pages, this means lots of logical IOs, and poorly performing SELECT queries. And it does the scan once for *every row*. The person that hit it today had a 500 row select of ~20KB per row taking 20 seconds - 10MB of physical IOs and 30MB of logical IOs!

The fix is to make further use of an already existing in-memory cache of IAM pages to do a quick lookup of the right IAM page covering the GAM interval of the LOB page being read, without having to scan the whole IAM chain.

Hope this helps explain things, and track down perf problems for some of you.

Last week's survey was on how you should store large-value character data in SQL 2005+ (see here for the survey). Here are the result as of 4/3/2009 - and I think my favorite answer is starting to catch-on:

My favorite answer is, of course, it depends! For all those who didn't answer 'it depends', your answer is valid, but only for particular circumstances, as each method has its pros and cons and won't be applicable in all cases. It's extremely important when designing a schema to consider how to store LOB data, as making the wrong choice can lead to nasty performance issues (where 'performance' is a catch-all to include things like slow queries, fragmentation, and wasted space). Now I'd like to run through each of the options and detail what I think of as the pros and cons. A couple of definitions first: 'in-row' means the column value is stored in the data or index record with the other columns; 'out-of-row' or 'off-row' means the column value is stored in a text page somewhere in the data file(s), with a physical pointer stored in the data/index record (taking either 16 or 24 bytes itself).

  • As a N/CHAR column. This is a great choice when the data that's stored in the column is a fixed size all the time, and always uses the full width of the column. Any time that the data may be smaller than the defined wdith of the column, space is being wasted in the row. Wasted space leads to fewer rows per page, more disk space being used to store the data, more I/Os to read the data, and more memory used in the buffer pool. However, if the character values are very volatile, and can change size, then having a fixed-width column can avoid the problem of a row having to expand and there not being enough space on the page to allow that - leading to a fragmentation-causing page split in an index (or forwarding record in a heap). There's a tipping point that can be hard to identify for your particular application...
  • As a N/VARCHAR (1-8000) column. For data values less than 8000 bytes, this is the common choice as it avoids wasted space. However, if the application can change the size of the data after the initial creation of the row, then there is the possibility of fragmentation occuring through page-splits. In SQL Server 2005+, a row can also be created that is more than 8060 bytes - one or more variable-length columns is pushed into off-row storage and replaced by a physical pointer. This means any access of the column has to do an extra I/O to reach the data - and this is commonly a physical I/O as the text page is not already in memory. This can lead to hard-to-diagnose performance issues if a query selects the column and some rows have the data in-row, and some out-of-row. Also, if the data values tend towards the larger end of the 1-8000 byte spectrum, individual rows can become vary large, leading to very few rows per page - and the problems described in the first option. If the data isn't used very much, then storing it in-row like this isn't very efficient.
  • As a N/VARCHAR (MAX) column in-row. This has the same pros and cons as the option above, with the added benefit that the value can grow larger than 8000 bytes. In that case it will be pushed off-row automatically, and start incurring the extra I/O for each access. These data types also work with the intrinsic functions in the same way as the character data types discussed above. I guess one drawback of this type compared to FILESTREAM is that it's limited to 2GB. Also, if there's a LOB data column in the table definition, the table's clustered index cannot have online operations peformed on it - even if all the LOB values are NULL or stored in-row!
  • As a N/VARCHAR (MAX) column out-of-row. The drawback of storing this data out-of-row is that accessing it requires an extra I/O to retrieve it, but if the data isn't used very much then this is an efficient way to go, but still uses space in-row to store the off-row pointer. An additional benefit of storing the data off-row is that it can be placed in a separate filegroup, possibly on less expensive storage (e.g. RAID 5) - but then there's the drawback that it can't be moved after being created except with an export/import operation. This option has the same online operations drawback as storing the data in-row.
  • As a N/TEXT column in-row. This has the same pros and cons as the N/VARCHAR (MAX) column in-row option, but these data types are deprecated and don't work with the majority of the intrinsic functions.
  • As a N/TEXT column out-of-row. Same as above.
  • In a seperate table and JOIN to it when required. This option is great when the data isn't used very much, as it doesn't require any storage at all in the main table (except for a value to use for the JOIN), but it does require some extra up-front design and slightly more complicated queries. There's another HUGE benefit to doing this - by moving the LOB data to another table, online operations become available on the main table's clustered index. (This concept is 'vertical partitioning' a huge topic in itself...)
  • As a FILESTREAM column. (Yes, I didn't have this in the survey, but it's a possibility). If your data values are going to be more than 1MB, then you may want to consider using the FILESTREAM data type in SQL 2008 to allow much faster access to the data than having to read it through the buffer pool before giving it to the client. There are lots of pros and cons to using FILESTREAM - see my whitepaper for more info here.

So, as you can see, the best answer for a general question like this is definitely It Depends!. Although I haven't covered every facet of each storage option, the aim of this post is to show that it is very important to consider the implications of the method you choose, as it could lead to performance problems down the line.

Next post - this week's survey!

(And it's official - this is blog post 39 this month, making this my most prolific month yet for blogging. An arguably dubious achievement...)

In this week's survey, I'm interested in your views on the best way to store large-value character data. I'll report on the results next week (around 4/3/09).

Thanks!

Categories:
LOB data | Surveys

Both 2005 SP3 CU2 and 2005 SP2 CU12 contain fixes for two bugs that cause corruption in LOB data.

The first bug may happen when a column type is converted to varbinary(max), change the large-value-types-out-of-row option to true, insert a row, and then try to update it - described in KB 961648. The second bug may happen when you update an out-of-row LOB value in a database with snapshot isolation enabled - described in KB 962209. Either of these problems will result in a 7105 error:

Msg 7105, Level 22, State 11, Line 1
The Database ID Database ID, Page (N:N), slot N for LOB data type node does not exist. This is usually caused by transactions that can read uncommitted data on a data page. Run DBCC CHECKTABLE.

You can get the fixes at 2005 SP3 CU2 and 2005 SP2 CU12.

Theme design by Nukeation based on Jelle Druyts