Time flies when you're having fun, eh? This week Paul and I are in Dublin, Ireland delivering a deep technical "SQL Immersion Event" on Internals, Performance Tuning and Maintenance with our partner ProData (and having a blast - we have a really fun and very interactive group). And, today's an even more interesting day to be in Dublin as it's the 250th anniversary of the Guinness Brewery...everyone around Dublin will be toasting the most popular drink in Ireland today (Sept 24) at 5:59pm GMT. Unfortunately, our class runs until 6pm but Paul tells me that we're still going to celebrate. ;-)

For now, we'll just have to celebrate some SQL (I can hear minor grumblings everywhere :)...

Filtered indexes and filtered stats are an incredibly powerful feature in SQL Server 2008 - one that I've posted on before and will again. Today's post is about some of the gotchas with filtered index stats and filtered stats. Cutting to the punchline - you might have some trouble with their accuracy as your data changes. The SIMPLE solution is to update these specific stats manually and more frequently (meaning that you don't want to rely on the database option AUTO_UPDATE_STATISTICS). In fact, depending on your data volatility, you might want to update these statistics on a VERY frequent basis. Part of the question you'll have to answer is how much of the filtered set changes over the course of a day/week/month. However long it takes for 5-10% of the filtered set to change is roughly how often I'd suggest updating the statistics: if that takes a week, update them weekly; if 5-10% changes within the course of a day, update them daily. Let me give you the full story...

Statistics are traditionally updated when roughly 20% (+ a minimum of 500 rows) of the data has changed. If you have a table with 100,000 rows then SQL Server will hit the threshold at 20,500 rows (just a tad over 20%) and a table with 10,000 rows will hit the threshold at 2,500 rows (which is 25%). Additionally, in SQL Server 200x (2000, 2005 and 2008) statistics are NOT immediately updated when the threshold is reached; instead, they are invalidated. It's not until someone needs the statistic that SQL Server updates it. This reduces the thrashing that occurred in SQL Server 7.0, when stats were updated immediately instead of just being invalidated. Another interesting point is what is meant by "20% of the data has changed"... How is that defined? Is it based on updates to columns or inserts of rows? Of course the answer is... it depends - here, it depends on the version of SQL Server that you're using:

  • SQL Server 2000 defines 20% as 20% of the ROWS having changed. You can see this in sysindexes.rowmodctr.
  • SQL Server 2005/8 defines 20% as 20% of the COLUMN data having changed. You cannot see this unless you are accessing SQL Server through the DAC, as it's in a base system table (2005: sysrowsetcolumns.rcmodified and 2008: sysrscols.rcmodified).
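To make the threshold arithmetic concrete, here's a tiny Python sketch of the rule of thumb above: 500 rows plus 20% of the table (the rule for tables with more than 500 rows). Python is only acting as a calculator here; nothing in it talks to SQL Server, and whether the counter is row-based (2000) or column-based (2005/2008) changes what gets counted, not the threshold itself.

```python
def stats_invalidation_threshold(table_rows: int) -> float:
    """Modifications needed before a statistic is invalidated, using the
    500 + 20% rule of thumb described above (tables with more than 500 rows)."""
    return 500 + table_rows * 20 / 100


for rows in (10_000, 100_000):
    threshold = stats_invalidation_threshold(rows)
    # Show the threshold and what fraction of the table it represents
    print(f"{rows:,} rows -> threshold {threshold:,.0f} ({threshold / rows:.1%} of the table)")
```

For the two tables in the paragraph above, this gives the same 2,500 and 20,500 modifications.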

Now, for regular indexes (those that do not include a filter) both strategies have pros and cons. Let's start with the con for SQL 2000... If you have a particularly volatile column, then the rowmodctr increases quickly and ALL statistics on the table (even those on columns that have not changed) are invalidated. Without a way to understand an individual column's volatility, some statistics get invalidated too soon. So, SQL Server 2005/2008 added a column modification counter to better define when a statistic should be invalidated. But, the con for SQL 2005/2008 (which is the pro for the row-based algorithm of SQL 2000) is that a lot more data must change in a single column before its statistics get updated. If your modifications are fairly well balanced (across the table's columns) then the row-based algorithm could suit you better. If your modifications are isolated (to a specific column) then the column-based algorithm will suit you better. But, neither is perfect... and the new column-based algorithm poses another HUGE problem for filtered indexes.
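Here's a deliberately simplified Python model of the two counting strategies. It assumes one single-column statistic per column and that each modified row changes exactly the columns listed for it; the engine's real bookkeeping is more involved, and the column names are hypothetical. It only illustrates why one hot column hurts the row-based counter and why evenly spread changes hurt the column-based one.

```python
from collections import Counter


def threshold(table_rows: int) -> float:
    # 500 rows + 20% of the table, as described above
    return 500 + table_rows * 20 / 100


def invalidated_row_based(updates, table_rows, stat_columns):
    """SQL Server 2000 style (simplified): one counter per table, bumped once per
    modified row. Crossing the threshold invalidates EVERY statistic on the table."""
    if len(updates) >= threshold(table_rows):
        return set(stat_columns)
    return set()


def invalidated_column_based(updates, table_rows, stat_columns):
    """SQL Server 2005/2008 style (simplified): one counter per column. Only the
    statistics whose column crossed the threshold are invalidated."""
    per_column = Counter(col for changed in updates for col in changed)
    return {col for col in stat_columns if per_column[col] >= threshold(table_rows)}


stats = ["Status", "LastName"]
volatile = [{"Status"}] * 3_000                           # one hot column
balanced = [{"Status"}] * 1_500 + [{"LastName"}] * 1_500  # changes spread out

print(invalidated_row_based(volatile, 10_000, stats))     # both: LastName invalidated needlessly
print(invalidated_column_based(volatile, 10_000, stats))  # only Status
print(invalidated_row_based(balanced, 10_000, stats))     # both: 3,000 rows >= 2,500
print(invalidated_column_based(balanced, 10_000, stats))  # neither: 1,500 < 2,500 per column
```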

For filtered indexes, the invalidation threshold is tied SOLELY to the column (and to the row count of the whole table), not to the filtered set. So, if your table has 10,000 rows it takes 2,500 modifications in that column to update statistics. If your filtered index only has 1,000 rows... then you could theoretically modify this specific filtered set 2.5 times before ITS statistics would be updated. Yes, I realize that this sounds unlikely but it's really not. What if your filter is for ACTIVE = 1? It's only 1% of your table and it's the most active data set. On a large table (where the 500-row minimum barely matters), you will effectively have to modify this set roughly 20 times over before the statistics for your filtered index get updated.
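A quick Python sketch of that arithmetic, assuming (as in the examples) that every modification to the column lands inside the filtered set and that the threshold is 500 rows + 20% of the whole table:

```python
def filtered_set_passes(table_rows: int, filtered_rows: int) -> float:
    """How many times the entire filtered set must be modified before the
    column's modification count reaches the TABLE-wide threshold."""
    table_threshold = 500 + table_rows * 20 / 100
    return table_threshold / filtered_rows


print(filtered_set_passes(10_000, 1_000))        # the 1,000-row filtered index above
print(filtered_set_passes(10_000, 100))          # ACTIVE = 1 at 1% of a small table
print(filtered_set_passes(10_000_000, 100_000))  # ACTIVE = 1 at 1% of a large table
```

The first call returns 2.5. The last two show that "about 20 times" is the large-table case; on a small table the 500-row minimum pushes it even higher (25 times when only 100 of 10,000 rows qualify).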

But, having said that - I also have a simple solution. Don't wait... For filtered indexes (especially on volatile data), add a special UPDATE STATISTICS job that specifically updates their stats with a FULLSCAN. The good news is that they are likely to be smaller indexes (and therefore easier/less-costly statistics to update) and you'll clearly get a lot better use out of them. Additional good news is that many filtered indexes are designed to be covering indexes, and covering indexes don't rely as heavily on statistics, so there are many covered queries that won't care. The following query will get the list of all of your tables/indexes that have filters. Once you have this list, you can create a loop to update statistics.

SELECT OBJECT_SCHEMA_NAME(object_id) AS [Schema Name]
       , OBJECT_NAME(object_id) AS [Table Name]
       , name AS [Index/Statistics Name]
       , filter_definition AS [Filter]
       , STATS_DATE(object_id, stats_id) AS [Last Updated]
FROM sys.stats
WHERE has_filter = 1
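If you'd rather generate the commands than hand-write the loop, here's a minimal Python sketch that turns rows from the query into UPDATE STATISTICS ... WITH FULLSCAN statements. It assumes you also capture the schema name (for example, with OBJECT_SCHEMA_NAME(object_id)); the table and statistics names in the example are hypothetical. The bracket quoting mirrors what T-SQL's QUOTENAME does with its default delimiters. Inside SQL Server, a cursor over the same query that builds the same string with QUOTENAME and runs it with sp_executesql would do the equivalent job.

```python
def quote_name(identifier: str) -> str:
    """Bracket-quote an identifier like T-SQL's QUOTENAME (default delimiters):
    wrap it in [ ] and double any embedded ]."""
    return "[" + identifier.replace("]", "]]") + "]"


def update_stats_commands(filtered_stats):
    """filtered_stats: iterable of (schema, table, statistics_name) tuples.
    Returns one UPDATE STATISTICS ... WITH FULLSCAN statement per statistic."""
    return [
        f"UPDATE STATISTICS {quote_name(schema)}.{quote_name(table)} "
        f"{quote_name(stat)} WITH FULLSCAN;"
        for schema, table, stat in filtered_stats
    ]


# Hypothetical rows, shaped like the query's output plus a schema name
rows = [("dbo", "Orders", "IX_Orders_Active"), ("Sales", "Customer", "FS_Customer_Current")]
for command in update_stats_commands(rows):
    print(command)
```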

So, there are lots of options and definitely a few things of which to be aware... filtered indexes and filtered stats are a powerful option to leverage, just make sure you keep those stats up to date!

Cheers... it's time for a pint of Guinness!

kt

PS - For more information on statistics, check out the SQL Server whitepapers:

OK, for years I've been saying that SQL Server doesn't care about the order in which you define the columns of your table because internally SQL Server will re-arrange your columns to store all of the fixed-width columns first and the variable-width columns last. In both the fixed-width portion of the row as well as the variable-width portion of the row, the columns are stored in the order in which they are declared. So, what does matter?

It's all in the cost of the variable block's offset array. Every variable-width column up to (and including) the last non-NULL one needs a 2-byte entry in that array, but trailing NULL columns need no entry at all. So, if the large majority of your NULLable columns are at the end (and are NULL), then SQL Server doesn't need to completely populate the variable block's offset array (which saves 2 bytes per column). If you have a table where 36 columns are NULLable and generally they are NULL, then defining those columns at the end of the row can save you space.

The following script will show you how the maximum length of the row changes based on whether or not a later column in the variable block is non-NULL - even when most/all of the prior columns are NULL!

CREATE TABLE RowSizeVariableBlock
(
    ID  int          NOT NULL identity,
    c01 char(10)     NOT NULL default 'test',
    c02 datetime2(7) NOT NULL default sysdatetime(),
    c03 char(80)     NOT NULL default 'junk',
    c04 varchar(100) NULL,
    c05 varchar(100) NULL,
    c06 varchar(100) NULL,
    c07 varchar(100) NULL,
    c08 varchar(100) NULL,
    c09 varchar(100) NULL,
    c10 varchar(100) NULL,
    c11 varchar(100) NULL,
    c12 varchar(100) NULL,
    c13 varchar(100) NULL,
    c14 varchar(100) NULL,
    c15 varchar(100) NULL,
    c16 varchar(100) NULL,
    c17 varchar(100) NULL,
    c18 varchar(100) NULL,
    c19 varchar(100) NULL,
    c20 varchar(100) NULL,
    c21 varchar(100) NULL,
    c22 varchar(100) NULL,
    c23 varchar(100) NULL,
    c24 varchar(100) NULL,
    c25 varchar(100) NULL,
    c26 varchar(100) NULL,
    c27 varchar(100) NULL,
    c28 varchar(100) NULL,
    c29 varchar(100) NULL,
    c30 varchar(100) NULL,
    c31 varchar(100) NULL,
    c32 varchar(100) NULL,
    c33 varchar(100) NULL,
    c34 varchar(100) NULL,
    c35 varchar(100) NULL,
    c36 varchar(100) NULL,
    c37 varchar(100) NULL,
    c38 varchar(100) NULL,
    c39 varchar(100) NULL,
    c40 varchar(100) NULL
)
go

insert RowSizeVariableBlock DEFAULT VALUES
go

select * from RowSizeVariableBlock
go

select * from sys.dm_db_index_physical_stats
    (db_id(), object_id('RowSizeVariableBlock'), null, null, 'detailed')
-- review "max" record size = 114
go

insert RowSizeVariableBlock (c01, c03, c20)
values ('med row', 'up to c20', 'test')
go

select * from RowSizeVariableBlock
go

select * from sys.dm_db_index_physical_stats
    (db_id(), object_id('RowSizeVariableBlock'), null, null, 'detailed')
-- review "max" record size = 154
go

insert RowSizeVariableBlock (c01, c03, c30)
values ('med+ row', 'up to c30', 'test')
go

select * from RowSizeVariableBlock
go

select * from sys.dm_db_index_physical_stats
    (db_id(), object_id('RowSizeVariableBlock'), null, null, 'detailed')
-- review "max" record size = 174
go

insert RowSizeVariableBlock (c01, c03, c40)
values ('large row', 'up to c40', 'test')
go

select * from RowSizeVariableBlock
go

select * from sys.dm_db_index_physical_stats
    (db_id(), object_id('RowSizeVariableBlock'), null, null, 'detailed')
-- review "max" record size = 194
go
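The record sizes in those comments can be reproduced by hand. Here's a Python sketch of the arithmetic, assuming the usual data-record layout: a 4-byte header (status bits plus the offset to the column count), the fixed-width data, a 2-byte column count and the NULL bitmap, then, only when some variable-width column is non-NULL, a 2-byte count of variable-width columns, a 2-byte end offset for each variable-width column up to the last non-NULL one, and the variable-width data itself. It ignores extras such as row-versioning tags that don't apply to this demo.

```python
# Fixed-width columns of RowSizeVariableBlock: int + char(10) + datetime2(7) + char(80)
FIXED_DATA_BYTES = 4 + 10 + 8 + 80
TOTAL_COLUMNS = 41   # ID plus c01..c40
FIRST_VARCHAR = 4    # c04 is the first variable-width column


def record_size(last_non_null_varchar=None, varchar_data_bytes=0):
    size = 4                               # status bits A and B + offset to column count
    size += FIXED_DATA_BYTES
    size += 2                              # total column count
    size += (TOTAL_COLUMNS + 7) // 8       # NULL bitmap: one bit per column, rounded up
    if last_non_null_varchar is not None:
        offsets = last_non_null_varchar - FIRST_VARCHAR + 1
        size += 2                          # number of variable-width columns stored
        size += 2 * offsets                # end offsets for c04 .. the last non-NULL column
        size += varchar_data_bytes         # the actual variable-width data
    return size


print(record_size())        # all varchar columns NULL (DEFAULT VALUES row)
print(record_size(20, 4))   # 'test' in c20
print(record_size(30, 4))   # 'test' in c30
print(record_size(40, 4))   # 'test' in c40
```

Those four calls return 114, 154, 174 and 194, matching the script's comments: every ten additional trailing columns that have to be tracked cost another 20 bytes of offsets, even though they're all NULL.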

While there are some other optimizations at this level, most tables cannot benefit from this, as their data populations aren't as predictable, nor are most tables filled with so many variable-width and NULLable columns. However, if you do have this data pattern, defining these columns at the end of your table's definition MIGHT save a tremendous amount of space, especially when the table is very large!

Paul's blogged more on these structures as well as the NULL bitmap here: http://www.sqlskills.com/BLOGS/PAUL/post/Misconceptions-around-null-bitmap-size.aspx.

Enjoy! And, thanks for reading,
kt

I've always been concerned with security and I've always stressed the importance of auditing the REAL user context, not just the current user (see this post on EXECUTE AS and auditing). So, I generally try to avoid using dynamic string execution and, if it's necessary, I create well-tested/protected parameters (fyi - using QUOTENAME can be a fantastic solution for protecting identifiers passed as input parameters, but it can't protect more complex strings).
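To see why QUOTENAME works so well for identifiers, here's a small Python model of its documented behavior with the default [ ] delimiters: the value is wrapped in brackets, every ] inside it is doubled, and input longer than 128 characters returns NULL (None here). The build_count_query helper and the table names are hypothetical; in T-SQL you'd simply call QUOTENAME.

```python
from typing import Optional


def quotename(identifier: str) -> Optional[str]:
    """Model of T-SQL QUOTENAME(identifier) with the default [ ] delimiters."""
    if len(identifier) > 128:
        return None                                   # QUOTENAME returns NULL for >128 characters
    return "[" + identifier.replace("]", "]]") + "]"  # doubling ] keeps the value inside the brackets


def build_count_query(table_name: str) -> Optional[str]:
    """Hypothetical helper: dynamic SELECT COUNT(*) against a caller-supplied table name."""
    quoted = quotename(table_name)
    if quoted is None:
        return None
    return f"SELECT COUNT(*) FROM {quoted};"


print(build_count_query("Orders"))
print(build_count_query("Orders]; DROP TABLE dbo.Users; --"))
```

The second call produces `SELECT COUNT(*) FROM [Orders]]; DROP TABLE dbo.Users; --];`, which SQL Server parses as one (odd, and probably non-existent) table name rather than a second statement. That only works because the input is a single identifier; a WHERE clause or other expression can't be protected by bracket-quoting, which is exactly the limitation above.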

Having said that, what if I'm looking at a database for the first time... just poking around trying to see if there's anything that needs further attention? I've come up with a quick query... And, while it's not going to "solve" your problem (as that's going to take some re-writing of code) or even truly verify if you're vulnerable, it gives you a "quick list" of where you should look first! If your code uses dynamic strings AND it's elevated - then start there! 

SELECT OBJECT_NAME(sm.object_id) AS [Procedure Name],
  CASE
      WHEN sm.definition LIKE '%EXEC (%' OR sm.definition LIKE '%EXEC(%' THEN
          'WARNING: code contains EXEC'
      WHEN sm.definition LIKE '%EXECUTE (%' OR sm.definition LIKE '%EXECUTE(%' THEN
          'WARNING: code contains EXECUTE'
  END AS [Dynamic Strings],
  CASE
      -- execute_as_principal_id = -2 means EXECUTE AS OWNER; USER_NAME(-2) returns NULL,
      -- which would otherwise turn the whole warning into NULL
      WHEN sm.execute_as_principal_id = -2 THEN N'WARNING: EXECUTE AS OWNER'
      WHEN sm.execute_as_principal_id IS NOT NULL THEN
          N'WARNING: EXECUTE AS ' + USER_NAME(sm.execute_as_principal_id)
      ELSE 'Code to run as caller - check connection context'
  END AS [Execution Context Status]
FROM sys.sql_modules AS sm
ORDER BY [Procedure Name]
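If the list is long, one simple way to use it is to sort it so that modules which both build dynamic strings and run elevated come first. Here's a Python sketch of that ordering over rows exported from sys.sql_modules (name, definition, execute_as_principal_id). It reuses the query's four EXEC/EXECUTE patterns, uppercases the text to approximate a case-insensitive collation, and treats a NULL execute_as_principal_id as EXECUTE AS CALLER. The ordering after the first group (dynamic only, then elevated only, then the rest) and the module names are assumptions for illustration.

```python
from typing import Iterable, List, Optional, Tuple

# The same patterns the query checks with LIKE
DYNAMIC_PATTERNS = ("EXEC (", "EXEC(", "EXECUTE (", "EXECUTE(")

Module = Tuple[str, str, Optional[int]]  # (name, definition, execute_as_principal_id)


def uses_dynamic_strings(definition: str) -> bool:
    upper = definition.upper()  # approximates a case-insensitive collation
    return any(pattern in upper for pattern in DYNAMIC_PATTERNS)


def is_elevated(execute_as_principal_id: Optional[int]) -> bool:
    # NULL = EXECUTE AS CALLER; -2 = EXECUTE AS OWNER; other values = a specific principal
    return execute_as_principal_id is not None


def triage(modules: Iterable[Module]) -> List[str]:
    """Order module names: dynamic + elevated, dynamic only, elevated only, everything else."""
    def rank(module: Module):
        name, definition, principal_id = module
        dynamic, elevated = uses_dynamic_strings(definition), is_elevated(principal_id)
        group = 0 if dynamic and elevated else 1 if dynamic else 2 if elevated else 3
        return (group, name)

    return [module[0] for module in sorted(modules, key=rank)]


# Hypothetical modules
modules = [
    ("dbo.GetOrders", "SELECT * FROM dbo.Orders", None),
    ("dbo.RunReport", "EXEC (@sql)", -2),
    ("dbo.Search", "exec(@where)", None),
    ("dbo.Archive", "DELETE dbo.OrdersOld", 5),
]
print(triage(modules))
```

With these sample rows, dbo.RunReport comes first, followed by dbo.Search, dbo.Archive and dbo.GetOrders.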

Is this enough? Anything else you'd check? What do you think?

THANKS!
kt

OK, I'll definitely take a beating from all of you for having gone so long between my survey posts and now. I won't even go into the details but between some crazy work schedules, multiple sinus problems and even migraines... well, I've been a bit behind. Let's just say that April/May were rough at best. I'm feeling better and well, now I'm trying to catch up. I had really gotten the blogging bug in March but I completely lost it in April. But, this tipping point series is in dire need of lots of explaining so I'm really hoping to get a few posts done in this area for sure!

First, I started the discussion around this in a few surveys:

Survey/Question 1

Q1 was described as this: if a table has 1 million rows at 20 rows per page (50,000 pages), at what percentage (roughly) of the data would a nonclustered index no longer be used. Blogged here. Here's what the survey said as of today:

And, for Q1 the correct result (Between 0-2% of the rows) is actually the most popular answer (but by no means the overwhelming majority, at only 28%). However, often people just "think" the answer is very small. So... I did a few more questions/surveys.

Survey/Question 2

Q2 was described as this: if a table has 1 million rows at 100 rows per page (10,000 pages), at what percentage (roughly) of the data would a nonclustered index no longer be used. Blogged here. Here's what the survey said as of today:

And, for Q2 the correct result (Less than .5% of the rows) is actually tied for the most popular answer (but, again, with only a small percentage at 22%). Again, often people just "think" the answer is very small. So... I did one more question/survey.

Survey/Question 3

Q3 was described as this: if a table has 1 million rows at 2 rows per page (500,000 pages), at what percentage (roughly) of the data would a nonclustered index no longer be used. Blogged here. Here's what the survey said as of today:

And, for Q3 the correct result (Between 10-20% of the rows) is actually NOT the most popular answer. And, this is even more convincing evidence that there's confusion around what's going on and why.

The Tipping Point

What is the tipping point?

It's the point where the number of rows returned is "no longer selective enough". SQL Server chooses NOT to use the nonclustered index to look up the corresponding data rows and instead performs a table scan.

When does the tipping point occur?

It depends... it's MOSTLY tied to the number of pages in the table. Generally, around 30% of the number of PAGES in the table is about where the tipping point occurs. However, parallelism, some server settings (processor affinity and I/O affinity), memory and table size - all can have an impact. And, since it can vary - I typically estimate somewhere between 25% and 33% as a rough tipping point (and, you'll see from a bunch of my examples, that number is not EXACT). Then, I translate that into rows.

Math for Tipping Point Query 3: If a table has 500,000 pages then 25% = 125,000 and 33% (one third) = 166,666. So, somewhere between 125,000 and 166,666 ROWS the query will tip. Turning that into a percentage 125,000/1million = 12.5% and 166,666/1million = 16.6%. So, if a table has 500,000 pages (and 1 million rows) then queries that return less than 12.5% of the data are likely to USE the nonclustered index to lookup the data and queries over 16.6% of the data are LIKELY to use a table scan. For this table, that percentage seems "reasonable". But, most of us say that the tipping point happens at a much lower percentage... why? Because row size (which determines table size and therefore the number of pages) is really what has the greatest impact. So, let's look at Tipping Point Query 2...

Math for Tipping Point Query 2: If a table has 10,000 pages then 25% = 2,500 and 33% = 3,333. So, somewhere between 2,500 and 3,333 ROWS the query will tip. Turning that into a percentage 2,500/1million = .25% and 3,333/1million = .33% (not even 1%). So, if a table has only 10,000 pages (and 1 million rows) then queries that return less than a quarter of 1% of the data are likely to USE the nonclustered index to lookup the data and queries over one third of one percent are LIKELY to use a table scan. For this table, that percentage seems really low. At the same time, it makes sense (to a point) that a small table would be scanned... but, at less than 1% of the rows? Here, even 1% is NOT selective enough. For small tables, it might not matter all that much (they're small, they fit in cache, etc.) but for bigger tables - it might be a big performance problem.

Math for Tipping Point Query 1: If a table has 50,000 pages then 25% = 12,500 and 33% = 16,666. So, somewhere between 12,500 and 16,666 ROWS the query will tip. Turning that into a percentage 12,500/1million = 1.25% and 16,666/1million = 1.66% (under 2%). So, if a table has 50,000 pages (and 1 million rows) then queries that return less than 1.25% of the data are likely to USE the nonclustered index to lookup the data and queries over 1.66% are LIKELY to use a table scan. Again, this seems like a low number. Again, for small tables, it might not matter all that much (they're small, they fit in cache, etc.) but as tables get larger and larger - it CAN be a big performance problem. 
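All three calculations follow the same recipe, so here's a small Python sketch of it: take 25% and one third (the "33%") of the table's pages, treat those as row counts, and express them as a percentage of the table's rows. This is only the rough estimate described above, not SQL Server's actual costing, which also depends on parallelism, server settings, memory and table size.

```python
def tipping_point_band(pages: int, rows: int):
    """Rough tipping-point band: from 25% to one third of the table's PAGES,
    returned as (low_rows, high_rows, low_pct, high_pct)."""
    low_rows, high_rows = pages // 4, pages // 3
    return low_rows, high_rows, low_rows * 100 / rows, high_rows * 100 / rows


for label, pages in (("Query 1", 50_000), ("Query 2", 10_000), ("Query 3", 500_000)):
    low_rows, high_rows, low_pct, high_pct = tipping_point_band(pages, 1_000_000)
    print(f"{label}: {pages:,} pages -> tips between {low_rows:,} and {high_rows:,} rows "
          f"({low_pct:.2f}% to {high_pct:.2f}% of 1,000,000 rows)")
```

The printed bands line up with the figures above, give or take rounding (the text truncates the upper percentages, while this rounds them to two decimals).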

Why is the tipping point interesting?

  • It shows that narrow (non-covering) nonclustered indexes have fewer uses than often expected (just because a query has a column in the WHERE clause doesn't mean that SQL Server's going to use that index)
  • It happens at a point that's typically MUCH earlier than expected... and, in fact, sometimes this is a VERY bad thing!
  • Only nonclustered indexes that do not cover a query have a tipping point. Covering indexes don't have this same issue (which further proves why they're so important for performance tuning)
  • You might find larger tables/queries performing table scans when in fact, it might be better to use a nonclustered index. How do you know, how do you test, how do you hint and/or force... and, is that a good thing?

Real example of an interesting tipping point

Earlier today, I went on Facebook and Twitter and gave the following information (very vaguely) and asked "why" Q2 is so much slower than Q1 when Q2 returns only 10 more rows. Same table and no hints (other than MAXDOP)...

Q1: SELECT * FROM table WHERE colx < 597420 OPTION (MAXDOP 1)

  • returns 197,419 rows
  • takes 116,031 ms (1 minute, 56 seconds)
  • 1,197,700 logical reads, 5 physical reads, 137,861 read-ahead reads
  • 7,562 ms CPU time

Q2: SELECT * FROM table WHERE colx < 597430 OPTION (MAXDOP 1)

  • returns 197,429 rows
  • takes 244,094 ms (4 minutes, 4 seconds)
  • 801,685 logical reads, 1,410 physical reads, 801,678 read-ahead reads
  • 9,188 ms CPU time
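The numbers themselves hint at what happened. Here's a quick Python back-of-the-envelope, assuming Q2's logical reads are a reasonable proxy for the number of pages in the table (a scan reads each page about once):

```python
q1_rows, q1_logical_reads = 197_419, 1_197_700  # Q1: nonclustered index + lookups
q2_rows, q2_logical_reads = 197_429, 801_685    # Q2: table scan

approx_table_pages = q2_logical_reads           # assumption: a scan ~ one read per page

reads_per_row = q1_logical_reads / q1_rows
rows_vs_pages = q1_rows / approx_table_pages

print(f"Q1 logical reads per row returned: {reads_per_row:.1f}")
print(f"Rows returned as a fraction of table pages: {rows_vs_pages:.1%}")
```

That's roughly 6 logical reads per row for the lookups, and a row count that's about 24.6% of the table's pages, right around the 25% end of the rough estimate. Notice that the faster Q1 actually did more logical reads; its far smaller number of physical and read-ahead reads suggests why it won on this hardware.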

There were lots of great guesses... but, it's the tipping point. SQL Server chose to "tip" the second query because it was "over the line". But, it's important to realize that there are cases when that's NOT a good idea. And, what are your options?

In SQL Server 2005 - the only option is to force the nonclustered index to be used:

Q2: SELECT * FROM table WITH (INDEX (NCInd)) WHERE colx < 597430 OPTION (MAXDOP 1)

But, this can be TERRIBLY bad on some machines where the IOs could be a lot faster (and where data might already be in cache). These specific numbers are exactly that - specific to this HARDWARE (and, I chose not-so-optimal HW in this case to highlight this problem). And, depending on what number you use (what if this is a parameter in sps?) you might force SQL Server to do WAY more IOs by forcing the index than allowing the tipping point to do its job. But, depending on your hardware (and/or what you know to be in cache at the time of execution), it might be better to force an index instead of letting SQL Server choose. So, should I force the index? Be careful, if you're wrong - it could take more time and actually be slower.

In SQL Server 2008 - there's a new hint - FORCESEEK:

Q2: SELECT * FROM table WITH (FORCESEEK) WHERE colx < 597430 OPTION (MAXDOP 1)

FORCESEEK is better because it doesn't tie you to a particular index directly but it also doesn't let SQL Server tip to a table scan. However, just like forcing an index - you can be wrong!

So, what should you do? It depends. If you know your data well and you do some extensive testing you might consider using a hint (there are some clever things you can do programmatically in sps; I'll try to dedicate a post to this soon). However, a much better choice (if at all possible) is to consider covering (that's really my main point :). In my queries, covering is unrealistic because my queries want all columns (the evil SELECT *) but, if your queries are narrower AND they are high-priority, you are better off with a covering index (in many cases) than with a hint, because an index that covers a query never tips.

That's the answer to the puzzle for now but there's definitely a lot more to dive into. The Tipping Point can be a very good thing - and it usually works well. But, if you're finding that you can force an index and get better performance you might want to do some investigating and see if it's this. Then consider how likely a hint is to help and now you know where you can focus.

Thanks for reading,
kt

Recently, the SQL Server 2008 Internals title was released (and only in the past few days have people actually received their copies)! In fact, I still haven't seen the book in person... soon though!

Anyway, Kalen, Paul, Conor, Adam and I worked to create a comprehensive resource on SQL Server 2008 internals and to supplement the written content, many of us created demo scripts and examples. I've now gone back and created a sample script based on ALL of the code in the entire chapter (and in many cases I extended the code in the samples). And, while this updated content is going to be located on the companion content site, I thought I’d also release it here with some notes.

The zip contains a solution with 3 projects, each with a few scripts:

I know the names seem a bit strange but everything is ordered EXACTLY as it is shown in the book. And, in the book, I referenced "a" script called EmployeeCaseStudy-AnalyzeStructures.sql but that script was so big that I broke it down into 7 parts (hence the naming convention of 06...01, 06...02, etc.). Regardless, each script contains a brief header and a few details about the script. To get the most from the scripts, don't just open a script and execute it. If you really work your way through each script, you should see all of the comments and they will help you to make instance-specific changes so that everything runs without error. Just take your time and really try to step back and think about each command (and what you're expecting the output to be) to test yourself while you're working through the results. Taking your time and really grasping all of these internals is what makes it fun!

Quite a bit of this content can stand alone but it's really best as companion content to the title as there's a lot more "text" and detailed information in the book. But, the scripts are really a great way to dive deeper, learn documented/undocumented commands and really get to know what the heck is really going on internally.

Finally, I only worked on Chapter 6 so here it is: 20090810-IndexInternals-Chapter6-Resources.zip (6.13 mb). As for the other companion content, you'll need to get the links from the book. Ah, or from Kalen :). 

And, certainly, if you find a typo or anything that you think needs more clarification, let me know! I'm more than happy to post updates to this content. 

Have fun,
kt

UPDATES/ERRORLOG:
2009-Apr-13 (8am): Updated the zip after remembering in my sleep (yes, sad, but true!) that one of my comments that referenced some line numbers didn't get updated in the final version. So, no errors and if you don't get this update, it's not going to break anything. But, the script that's been tweaked is script: 05_EmployeeCaseStudy-TableDefinition.sql. Enjoy! kt
2009-Apr-13 (4:30pm): Ha... guess what arrived today. Yes - our copies of the book. Wow, it's great to see it in person. Again, enjoy!
2009-Aug-10: Added a :CONNECT option inside the IndexInternals restore script AND, cleaned up the zip as it had an extra copy of the IndexInternals database in it.

A couple of weeks ago I wrote a blog post titled Whose job is it anyway? It's an interesting debate and something I've been hearing more and more - that SQL Server is a "set it and forget it" technology - a black box where you just don't need to know how it works to do well with it. In fact, I've even had a few folks comment that they think it would be better to "roll their own" database rather than have to learn how to work in a "general purpose" database. And, while there are certainly lots of different angles to this debate, one fact remains... if you don't know anything about the database on which you're developing (whether it's SQL Server, MySQL, Oracle, whatever), I *PROMISE* you won't have a truly scalable, optimal solution. Why do you think there are so many knobs? It's because there are so many different ways to work with data. There is more than one way to query, more than one way to design. This is also why every answer to a "how should I do this" question starts with "It depends". And, while that seems like a scary response, it's actually a good one. It means that you have lots of options - options that can offer many different pros/cons. And, as a result of knowing these pros/cons, you can make better decisions - decisions that will ultimately determine how well you can scale.

So..... while I don't think this debate will EVER be finished (as to WHOSE job it is to know these things), I do think a lot of folks are seeing the effects of not knowing more about their store (and, again, this is NOT limited to SQL Server in any way, shape or form).

At a minimum, listen to the discussion on RunAsRadio with Richard, Greg and me and let us know what you think!

Kim Tripp on the Roles of Developers and DBAs with the Database!

Cheers,
kt

I started the series here: http://www.sqlskills.com/BLOGS/KIMBERLY/post/Spring-cleaning-your-indexes-Part-I.aspx and I want to continue with Part II today by clarifying some great questions/comments that have come up on the series. In Part III, I'll give you a few more ways to get rid of (or consolidate) indexes. And, I think there's still a bit more that Paul and I will investigate further (wrt operational stats) but I want to address a few comments and a few interesting things that both Paul and I have found.

In the Part I post, I talked about using sys.dm_db_index_usage_stats to see if there are any indexes that just aren't being used at all... A few comments asked why I didn't use operational_stats instead. To address that first, there are a few key differences:

  • dm_db_index_operational_stats is persisted only as long as an object is in cache (however, it's not cleared when objects are forced out of cache with DBCC DROPCLEANBUFFERS). If you want to clear ALL DMVs for a specific database, then a relatively easy way to do this (IN TESTING) is to take the database offline and then immediately bring it online again.
    • ALTER DATABASE <dbname> SET OFFLINE
    • ALTER DATABASE <dbname> SET ONLINE
      • NOTE: If there are any suspect files, you will NOT be able to bring your database back online without FIRST taking all suspect files OFFLINE. And, if you take a FILE offline then it's even more important to know that THERE IS NO WAY TO BRING A FILE ONLINE without restoring it from backups. So, it's VERY important to understand that OFFLINE/ONLINE for a database is really easy IF AND ONLY IF there are no other problems with the DB. You really need to resolve those problems first (or at least know that you're going to need to resolve those problems later through backup/restore) before you take a database offline.
  • dm_db_index_operational_stats is (from BOL) neither persistent nor transactionally consistent. This means you cannot use these counters to determine whether an index has been used or not, or when the index was last used. For information about this, see sys.dm_db_index_usage_stats.

Having said that though, none of these are really any guarantee of perfect information. And, they're not meant to be. I look at these DMVs as being a quick and easy way to get some relatively decent insight into what is or is not happening in my environment. However, even though sys.dm_db_index_operational_stats might give you insight that you have a problem, it still doesn't give you good insight into exactly what that problem might be. For example, it *might* be splits that cause some of your wait times to increase (column: page_io_latch_wait_in_ms) but it could be something else too (some other system issue).

The main point: you can use these to get insight into which tables have the biggest problems (i.e. the biggest waits) and where they might have a lot of splits (column: nonleaf_allocation_count) but, in all honesty, that's not a guarantee. In fact, the reason I said "might" is that inserts at the end of the leaf level STILL allocate a new page and require an entry to be made in the next level up in the index. So, a lot of nonleaf allocations COULD be for a perfectly unfragmented index. So, it still doesn't tell you how fragmented the objects are or what the REAL problem is (or even if it is a problem yet).

Basically, these just tell you where you have the most activity and give you a starting point for problem solving. But, none of these (usage or operational) really tell you how to solve the problem. However, sometimes even knowing where to start IS the problem in and of itself. So, I'm not against these DMVs and I really do think you can get some good insight from them. Just use them as a tool to help focus your investigations. Use better tools like sys.dm_db_index_physical_stats to really see if you have fragmentation and where it's the worst.

hth,
kt

First and foremost, happy spring! I truly hope we're on the path to summer (although who would know it here - we're in Florida for SQL Connections and the weather is a bit chilly and it's been raining off/on today - I hope this is short term (no, I don't want to look at the forecast as I don't want to jinx it :)). But, wherever you are - I hope you're on your way to nicer weather and minimal cold weather (ok, I guess I only have warm wishes for the northern hem... for you southies - I hope your fall is lovely!!).

But, for everyone - now's a good time to start thinking about cleaning out the [non-literal index] closet... and getting rid of some of those dusty indexes?

Why/when should you get rid of some of your indexes?

  1. It's possible that some of them aren't being used at all...
  2. Especially when they're not being used but even when they're "redundant" (or minimally useful) they're still costing you in many ways:
    1. Wasting space on disk
    2. Wasting space in memory (well, if they are being used then they're cluttering up your cache)
    3. Wasting space/time in your maintenance routines (so, here they're cluttering up your cache for sure!)
    4. Wasting space in your backups
  3. You might be able to reduce your overall indexes with index consolidation...

So, for this post, I'm going to target #1 - are there any indexes that just aren't being used at all...

First, how do you know if your indexes are being used?
In SQL Server 2005 and higher, there's a DMV (dynamic management view) called sys.dm_db_index_usage_stats and it's there to track index usage patterns. However, it's not persisted forever and, as a result, if you look at this and believe that it's telling you ALL of the indexes that have been used in your database, then you might be mistaken. The index usage stats DMV is cleared when SQL Server is restarted as well as when you detach/attach the database or restore it from a backup.

Therefore, you don't want to just run the following query and drop all of the indexes that aren't being used. A better way to "trust" this information is to periodically persist the data from the DMV in your own table and then query it after you've completed a business cycle's worth of activity - logging all of the usage stats. Then you can trust this much more. Again, here are a couple of negatives:
1) it's not persisted
2) it only keeps IDs (database_id, object_id and index_id), not names, and those IDs could change over time. You're right in thinking they probably shouldn't change but a nonclustered index's ID is not permanent, so it's better to track the index name in addition to the index_id. And, when you run your queries to determine what to delete, you can easily verify indexes against the current indexes because your comparison is within the same db (more on this below).

If you want to persist this, then you have two ways to do this:

Store the index usage patterns in a table within the specific database you're tracking:

Pro: it goes with the database when you back it up, etc. and, it's easier to reverse engineer which actual indexes you're referring to (grabbing the names and not just the IDs).

Con: it's a bit more complex of a query to run and you'll need to run it for all of your databases (ok, it's really not all that bad - but, using something like sp_msForEachDB will really help)

Store the index usage patterns from all databases in a table within master or your own "performance database":

Pro: you only need one job to handle all the index usage info AND object_name *does* support TWO parameters (object_id and database_id) so, as long as you trap the name at the time of insert then you'll be good.

Con: it doesn't go with the database (e.g. backup/restore - and if you're restoring to a test system and you want to see what the usage patterns were then you'll need to get this information as well...)

Here's a simple query that shows all of the indexes that have been used since the DMV was last cleared, with the database name and object name added to the results (plus a run time column, which you'll want in a persisted table):

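```sql
SELECT getdate() AS RunTime                                   -- when this snapshot was taken
, DB_NAME(i.database_id) AS DatabaseName
, OBJECT_NAME(i.object_id, i.database_id) AS ObjectName       -- name resolved in the index's own database
, *
FROM sys.dm_db_index_usage_stats AS i
WHERE i.object_id > 100                                       -- filter out system tables (low object_ids)
```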
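Here's a minimal sketch of the first option (storing the history inside the database you're tracking). The table name dbo.IndexUsageHistory and its column list are hypothetical, and you'd schedule the INSERT (for example, as a SQL Agent job) so that snapshots are taken before a restart or restore clears the DMV. Because it runs in the tracked database, it can join to sys.indexes and capture the index name along with the index_id.

```sql
-- Hypothetical history table, created once in the database being tracked.
IF OBJECT_ID(N'dbo.IndexUsageHistory', N'U') IS NULL
    CREATE TABLE dbo.IndexUsageHistory
    (
        RunTime          datetime NOT NULL,
        ObjectName       sysname  NOT NULL,
        IndexName        sysname  NULL,      -- NULL for a heap (index_id = 0)
        index_id         int      NOT NULL,
        user_seeks       bigint   NOT NULL,
        user_scans       bigint   NOT NULL,
        user_lookups     bigint   NOT NULL,
        user_updates     bigint   NOT NULL,
        last_user_seek   datetime NULL,
        last_user_scan   datetime NULL,
        last_user_lookup datetime NULL
    );
GO

-- Snapshot the current counters for THIS database only, capturing names as well as IDs.
INSERT INTO dbo.IndexUsageHistory
    (RunTime, ObjectName, IndexName, index_id,
     user_seeks, user_scans, user_lookups, user_updates,
     last_user_seek, last_user_scan, last_user_lookup)
SELECT GETDATE()
     , OBJECT_NAME(s.object_id)
     , i.name
     , s.index_id
     , s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
     , s.last_user_seek, s.last_user_scan, s.last_user_lookup
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id
 AND i.index_id  = s.index_id
WHERE s.database_id = DB_ID()
  AND OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1;
```

Since the counters are cumulative from the last time the DMV was cleared, you'd compare snapshots (for example, the highest values captured before each restart) rather than simply adding them all up.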

And, if you want a few more insights into how to persist this, check out Paul's blog post here: http://www.sqlskills.com/BLOGS/PAUL/post/Indexes-From-Every-Angle-How-can-you-tell-if-an-index-is-being-used.aspx.

OK, so, you have a few options to think about and I have a few more parts to post!
kt

PS - The Tipping Point is coming soon too. I'm still adding a few things to that one!!! ;-)

This is a tough topic. It's a big topic and, more than any other, I think there are a lot of misunderstandings about what the log is for, why it's so critical and ESPECIALLY when/why it gets extremely large. Simply put, it gets large when it's not managed correctly. OK, there are times when it can become large - even if it is well managed. But, more often than not, when a transaction log is wildly out of control (orders of magnitude larger than the data itself) it indicates a management/maintenance problem.

There are a lot of places where you can go to find out the technical details behind the transaction log, but I'm going to target this blog post at the relatively straightforward (no, really easy!!) facts about transaction log maintenance.

What kind of transaction log management is right for YOUR database?

First and foremost, you MUST decide whether or not you need to do log backups. SQL Server *requires* you to make some form of decision. Well, I take that back. SQL Server doesn't tell you anywhere that you need to make this decision, but the transaction log can get wildly out of control if you don't (see the next section for more details on this one :)).

Why? Transaction log backups give you better recovery options in the event of a disaster. With a good backup strategy, you should be able to recover to a point very close to the time of the disaster (possibly even up-to-the-minute). However, you are not required to do log backups. Instead, you can do only database-level backups and recover with those. That's fine. There's really nothing wrong with that strategy. However, it does mean that you have a greater potential for data loss. Basically, if you decide that you're going to do weekly full backups, then you need to be OK with losing everything that's happened since your last full backup. If that's OK, then performing full database backups (and never worrying about the log) is absolutely fine.

However, if you want more granular control and more recovery options (again, possibly even up-to-the-minute recovery - which is transactional recovery up to the time of the disaster), then you MUST add transaction log backups into your disaster recovery strategy.

So, make this decision FIRST:

  1. Am I OK with some data loss? (then you're probably OK with just database-level backups... but, you will need to do something else! be sure to keep reading!!!)
  2. Do I want to minimize data loss to the smallest amount possible? (then you're going to want to AUTOMATE transaction log backups)

But I didn't do anything - why is the log WAY out of control (in terms of size)?

OK, even if you consciously make the decision to ONLY do database-level backups, you are NOT DONE!!! In fact, this is actually what led me to do this post. I found these two (relatively dated but interesting nonetheless) MSDN forum discussions for TFS (Team Foundation Server) databases:

    MSDN Forum discussion "Recommended SQL Maintenance Plan": http://social.msdn.microsoft.com/forums/en-US/tfsadmin/thread/b23f7018-3eaa-4596-96e4-728b02cf6211/
    MSDN Forum discussion "Huge log files": http://social.msdn.microsoft.com/forums/en-US/tfsadmin/thread/605d51f7-23fd-470c-945e-53fa7ed5aa87/

And, I know EXACTLY what happened in ALL of these cases (and MANY more... Paul and I see this ALL the time, in fact). In the "Huge log files" thread, there's a database mentioned (TfsWareHouse) with a 124MB mdf and a transaction log of 61.8GB. It didn't mention whether or not there were other data files but my guess is that there weren't. I suspect they were completely shocked that the log had grown to roughly 510 TIMES the size of the data... The reason is actually somewhat simple (no pun intended). If you're not going to do transaction log maintenance (meaning transaction log backups), then you need to tell SQL Server that. (This is the part that's completely unexpected.)

When a database is created, SQL Server runs that database in a "pseudo simple recovery model". (Yes, I know - that didn't help.) What that means is that SQL Server automatically clears inactive records from the transaction log once it knows that it no longer needs them. It no longer needs them to be stored in the log because no one is using the log (i.e. you're not doing ANY backups). However, once you do start to do backups (and, people generally start by doing a full database backup), then SQL Server looks to your recovery model to determine what to do with log records. If the recovery model is set to full (and, yes, this is the default), then SQL Server gives you the "full feature set" with regard to backup/restore. SQL Server is expecting YOU to manage the transaction log by backing it up. Once it's backed up, SQL Server can remove the inactive records from the transaction log (and when you do a transaction log backup, it automatically clears the inactive records by default).

So, there are really two choices - and ONLY two choices here:

  1. Perform transaction log backups as part of your maintenance plan
  2. Change the recovery model to the SIMPLE recovery model so that SQL Server clears inactive log records automatically
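As a sketch of those two choices (the database name MyDatabase and the backup path are hypothetical), you can first check each database's recovery model and what, if anything, is currently preventing log reuse, and then pick ONE of the two options:

```sql
-- Recovery model of each database and why its log can't be reused right now
-- (LOG_BACKUP means SQL Server is waiting for a transaction log backup).
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases;

-- Choice 1: keep the FULL recovery model and back up the log on a schedule.
-- (BACKUP LOG requires that a full database backup has already been taken.)
BACKUP LOG MyDatabase
    TO DISK = N'D:\Backups\MyDatabase_log.trn';

-- Choice 2 (INSTEAD of choice 1): if database-level backups are enough,
-- switch to SIMPLE so inactive log records are cleared automatically.
-- ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;
```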

Is there anything else to do for the transaction log? 

Yes! If you decide that you want to do transaction log backups then I would recommend a few things. I'd first recommend reading 8 Steps to Better Transaction Log Throughput and when you decide how large your transaction log needs to be, then also read Transaction Log VLFs - too many or too few?. These two posts will help you to create a more appropriately sized log as well as one that won't be prone to performance problems (such as internal VLF fragmentation).
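If you want to see how many VLFs a log has today, one commonly used (but undocumented) command is DBCC LOGINFO, which returns one row per VLF in the current database's log. A quick sketch, assuming a hypothetical database named MyDatabase:

```sql
USE MyDatabase;   -- hypothetical database name
GO

-- Undocumented: returns one row per virtual log file (VLF) in this database's log.
DBCC LOGINFO;

-- Current size and growth setting of each log file
-- (size is in 8KB pages; growth is in pages unless is_percent_growth = 1).
SELECT name
     , size * 8 / 1024 AS size_mb
     , growth
     , is_percent_growth
FROM sys.database_files
WHERE type_desc = N'LOG';
```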

If you want to learn more about the transaction log, I'd suggest a few of Paul's resources (it's probably because he has such a fantastic tech editor... oh, I'm asking for trouble with this comment!! ;-):

  1. Read Paul’s blog post about his TechNet article on Logging & Recovery. It’s a great article that covers a lot of different aspects of logging. He also did a great short video on why the transaction log grows wildly out of control. Here’s a link to the blog post that pulls all of the TechNet resources together: http://www.sqlskills.com/BLOGS/PAUL/post/TechNet-Magazine-feature-article-on-understanding-logging-and-recovery.aspx.
  2. Read Paul’s blog post about his TechNet article on Database Maintenance. It’s a great overview of all of these maintenance tasks and will give you a good overview of what each one does. Here’s a link to the blog post that pulls all of the TechNet resources together: http://www.sqlskills.com/BLOGS/PAUL/post/TechNet-Magazine-Effective-Database-Maintenance-article-and-August-SQL-QA-column.aspx

OK, so, I think that sums up part III. I think that's the last one in the series for now. I'll go through and explain "The Tipping Point" next. However, I was hoping for more results to my brain teasers (in those two posts)!!

Cheers,
kt

OK, it seems as though there's A LOT of confusion about what steps are required for proper database maintenance. And, it seems as though some recommendations are being given as "quick fixes" without any real recommendation for root cause analysis. I'm not saying that the generalizations are horribly wrong but in many cases they're just too broad and/or unspecific to actually be useful (and, well, in all honesty, some of them are just really bad recommendations because they’re so ambiguous). And, in my random internet trolling for the day, I found 4 different references that I want to go through (which is why this is only Part I). For this post, I’ll focus just on Sharepoint.

First, what did I see that’s motivating this post?
I found the following KB article – which was referenced by numerous sites as recommended reading. And, without knowing a lot about SQL (and, that’s NOT meant as a dig at all – it’s just that most apps that sit on SQL don’t ever even recommend that you need to know SQL and I can argue certain aspects of that point as well BUT, wrt maintenance, it can really become a problem if you don't know a few things about these tasks), it does seem like good reading: Information about the Maintenance Plan Wizard in SQL Server 2005 and about tasks that administrators can perform against SharePoint databases

Here is the part that over-simplifies picking what maintenance tasks to run vs. what not to run:

DIRECTLY TAKEN FROM THIS KB ARTICLE IS THE FOLLOWING:

We have tested these tasks and the effects that these tasks have on database schema and performance. The following table summarizes the results of these tests.

| Task | Safe to perform this task? |
|---|---|
| Check database | Yes |
| Reduce a database | Yes |
| Reorganize an index | Yes |
| Clean up the history | Yes |
| Update statistics | Yes. However, this task is unnecessary because the SharePoint Timer service performs this task automatically. |
| Rebuild an index | No. The task does not restore existing index options before the rebuild operation. However, you can use scripts that restore index options. Note: This problem was corrected in SQL Server 2005 Service Pack 2. |

We used the following criteria to determine whether a task was safe to perform:

  • Whether the task modified the database schema from its default state
  • Whether the task decreased performance

Results may vary depending on the environment.

However, if you use the Maintenance Plan Wizard to perform the tasks that are listed in the table as "safe to perform," you are likely to experience increased performance in SQL Server 2005.

The big problem is: this is just too little information about too many VERY important tasks!

Let me break this down task by task and give you a few other places to go for more information.

Check database

The check database task refers to DBCC CHECKDB. This is definitely an important part of any maintenance plan. And, it really is a safe task to run as it’s NOT corrective by default. However, there is nothing mentioned about how this command may completely flush your buffer pool as it reads all of the pages of all of the objects it’s checking. So, this might impact performance but, of all of the tasks, this is the safest to run and it’s definitely a recommended task.
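A minimal sketch of running the check yourself (WSS_Content is a hypothetical SharePoint content database name); since it reads every page, you'd typically schedule it for a quiet period:

```sql
-- Check the logical and physical integrity of every object in the database.
-- By default CHECKDB only reports problems; it doesn't repair anything.
DBCC CHECKDB (N'WSS_Content') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```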

If you want to learn more about DBCC CHECKDB, check out these things: 

  1. Read Paul’s blog post about his TechNet article on Database Maintenance. It’s a great overview of all of these maintenance tasks and will give you a good overview of what each one does. Here’s a link to the blog post that pulls all of the TechNet resources together: http://www.sqlskills.com/BLOGS/PAUL/post/TechNet-Magazine-Effective-Database-Maintenance-article-and-August-SQL-QA-column.aspx  
  2. Read Paul’s blog post on Myths around causing corruption – so that you can get better insight into where/why the actual corruptions are occurring. 
  3. Finally, if you’re really interested in the internals of CHECKDB and how it works – Paul’s written a ton about it in his CHECKDB from Every Angle category. FYI, 3 of his 9 years on the SQL Server Development Team were spent writing CHECKDB and repair for SQL Server 2005 (so, he definitely knows how it works :)). Here’s the link to the category: http://www.sqlskills.com/BLOGS/PAUL/category/CHECKDB-From-Every-Angle.aspx

Reduce a database 

OK, I’m sure I’ll get a lot of responses to this one but IMO, a database maintenance plan SHOULD NEVER INCLUDE A SHRINK.

Let me explain… :)
To be honest, I'm not even a fan of manually running database-level shrinks (DBCC SHRINKDATABASE) either. Don't get me wrong - there are ACCEPTABLE times to shrink parts of a database but, in general, I'd recommend only using DBCC SHRINKFILE for individual file-level shrinks. I wouldn't schedule shrinks nor would I EVER turn on [the database option] autoshrink. I don't think shrinks should EVER be automated - either through the database option OR through maintenance plans.
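If you want to find and turn off autoshrink, here's a sketch (MyDatabase and the logical file name MyDatabase_Data are hypothetical). The SHRINKFILE line is commented out on purpose: it's the kind of one-off, file-level shrink you'd run manually, and only after understanding why the free space is there.

```sql
-- Databases that currently have the autoshrink option turned on
SELECT name
FROM sys.databases
WHERE is_auto_shrink_on = 1;

-- Turn it off (MyDatabase is a hypothetical name)
ALTER DATABASE MyDatabase SET AUTO_SHRINK OFF;

-- A manual, one-off, file-level shrink to a target size in MB
-- (MyDatabase_Data is a hypothetical logical file name; run it in that database).
-- DBCC SHRINKFILE (N'MyDatabase_Data', 10240);
```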

If you need to do regular shrinks - then it's likely that you have some other problem. And, without DIRECTLY addressing this problem, you *might* be making things worse.

This is a bigger discussion and I’ve found a few other references that I want to pull together. I’ll write another post about this within the next day or so – and link to it from here BUT, for right now…Know this – free space is generally GOOD. Excessive free space has happened FOR A REASON. Maybe there’s a pattern to it but often shrinking is worse than just leaving the free space for the next data explosion (a bunch of data comes in, the database grows, the data is archived, the free space remains for the next set of data that comes in).

If you shrink the database you might make things worse by fragmenting everything. Paul’s video that goes with the TechNet article on Database Maintenance shows you the [shocking if you didn’t know this] effect of shrinking a database on indexes.

Reorganize an index, Update statistics and Rebuild an index

These need to be grouped together to start because this KB article does NOT address the impact of running these together. In fact, the problem, if you run these together – is that you MIGHT make things worse. First, let me give you an overview of each:

Reorganizing an index removes fragmentation in the largest part of an index (it’s called the leaf level of the index), and removing fragmentation in this level has the greatest (and positive) effect on range query scans and cache. So, this is really the most important type of fragmentation to remove. However, this is NOT the only way to do it… 

Rebuilding an index completely and totally removes ALL forms of fragmentation in all levels of an index; however, this is the most expensive (yet most effective) way to do it. As part of rebuilding an index, SQL Server also updates the statistics for the indexes that were rebuilt (with the equivalent of a full scan). Therefore you do not need to update an index's statistics OR reorganize it if it gets rebuilt.

Updating statistics is important for query processing and optimization. The query processor uses statistics on your data to help determine how many rows will be processed by your query/statement. If SQL Server can accurately estimate the rows, then it can choose a more effective plan. However, if it doesn’t have good statistics, then it may not do as good of a job at accurately estimating rows and therefore it might not come up with as optimal of a plan. So, this is an integral part to good database health. However, some of this might be done via the database option: auto update statistics which is ON by default in SQL Server (and, YES, you should leave this on). Check out this post on: Auto update statistics and auto create statistics - should you leave them on and/or turn them on??
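A quick sketch for checking how current your statistics are, confirming that the database option is still on, and updating a table's statistics manually (dbo.MyTable is a hypothetical name):

```sql
-- When was each statistic on the table last updated?
SELECT s.name AS stats_name
     , STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.MyTable');

-- Is the auto update statistics database option still on (the default)?
SELECT name, is_auto_update_stats_on
FROM sys.databases;

-- Manually update every statistic on the table, reading all of its rows
UPDATE STATISTICS dbo.MyTable WITH FULLSCAN;
```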

However, if you use a maintenance plan then I really see two problems: 

  1. You’ll end up doing maintenance on things that may not need it. The default behavior for these tasks is just to run them on the selected objects. And, since many people will choose all objects (possibly even of all databases) then you’ll probably select objects that won’t really need this as frequently as you run this maintenance plan. 
  2. You might end up running a combination of things that wastes cycles/CPU and a MASSIVE amount of log space (which can translate into all sorts of concerns for DR technologies like database mirroring, which needs to send all log records to the secondary server). For example, if you run ALL three of these things then they’ll have to be run in a certain order (you can change this in a maintenance plan). However, the default order is: Reorganize Index(es), Rebuild Index(es), Update Statistics. This means that the work that’s done by reorganizing is effectively wasted, as the rebuild would have taken care of it, AND the indexes that were just rebuilt have their statistics updated TWICE (during the rebuild AND after). The end result can even be WORSE because the update statistics command might use a sampling mechanism to generate statistics (which can lead to LESS EFFECTIVE statistics information). However, this is ONLY if you change the wizard’s default. The default is for the updating statistics command to do a “full scan”. So, even if the statistics will end up being the same, it’s still problematic because it means that, for all indexes you’ve just rebuilt, you’ve now updated their statistics TWICE.

SUMMARY

A database maintenance plan is CRITICAL for best performance (especially for databases that are prone to some of the problems corrected by these maintenance tasks, and yes, you can read SHAREPOINT into that statement). Sharepoint uses GUIDs as PRIMARY KEYs (read this post to hear more about the side effects of this choice) and, as a result, as clustering keys. This means that many Sharepoint tables are prone to [potentially a MASSIVE amount of] fragmentation.

You absolutely need to have a maintenance plan. But, what should it be?

My absolute preference is to NOT use the Database Maintenance Plan Wizard UNLESS you really know what you’re doing. It just doesn’t give enough prescriptive advice. And, if you just select the defaults, you will end up with a suboptimal maintenance plan.

A better approach would be to create your own maintenance plan. If you write the code yourself (or leverage one of the custom ones that are already out on the web), then you can strategically target ONLY the objects that have the warning signs and/or are out of date, and you can set when to rebuild vs. when to reorganize (generally people rebuild if an index has more than 30% fragmentation and reorganize when it's less than 30%). Fragmentation can be detected programmatically using the DMV sys.dm_db_index_physical_stats (in SQL 2005/2008) or by using DBCC SHOWCONTIG (in SQL 7.0/2000). Here are a few places to go to see the more flexible and programmatic way of rebuilding/reorganizing indexes:

Smart Indexing Part II - Conditional Rebuilding a blog post (with conditional index rebuild code) from SQLMCA Bob Duffy (a good friend who is located in Dublin, Ireland and whose wife (Carmel) just had a baby last week – congrats again Bob!! You guys are seriously outnumbered now!!!) here: http://blogs.msdn.com/boduff/archive/2007/06/08/smart-indexing-part-ii-conditional-rebuilding.aspx

Custom Index Defrag / Rebuild Procedures - a blog post with some posted code as well. http://www.sqlstuff.info/post/2008/03/Custom-Index-Defrag--Rebuild-Procedures.aspx

Rebuild and Reorganize Indexes in SQL 2005 – an article (with conditional index rebuild code) from SQL Server Central here: http://www.sqlservercentral.com/scripts/31857/  (NOTE: You will need to become a subscriber to get to this article.)

Rebuild Only the Indexes that Need Help - an article by Andrew Kelly (SQL MVP) on SQL Server Magazine here: http://www.sqlmag.com/articles/index.cfm?articleid=99019&pg=1 (NOTE: You will need to become a subscriber to get to the full text of the article.)

Or, build your own! Check out the BOL topic for sys.dm_db_index_physical_stats for SQL 2005 here: http://msdn.microsoft.com/en-us/library/ms188917(SQL.90).aspx. Example D has sample code to help you get started! For SQL 2008 it’s here: http://msdn.microsoft.com/en-us/library/ms188917.aspx. It’s still Example D for the sample code to leverage. :)
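Here's a minimal sketch of a conditional rebuild/reorganize loop along those lines (it's not the BOL example itself). It uses the 30% split mentioned above; the 10% floor and the 1,000-page minimum are assumptions (to skip indexes that aren't worth touching), and it assumes non-partitioned indexes, since a partitioned index returns one row per partition. Review the PRINTed commands before letting it run unattended.

```sql
-- Run in the database you want to maintain.
DECLARE @schema sysname, @table sysname, @index sysname,
        @frag float, @sql nvarchar(max);

DECLARE frag_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT sch.name, o.name, i.name, ps.avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
    JOIN sys.indexes AS i
      ON i.object_id = ps.object_id AND i.index_id = ps.index_id
    JOIN sys.objects AS o
      ON o.object_id = ps.object_id
    JOIN sys.schemas AS sch
      ON sch.schema_id = o.schema_id
    WHERE ps.index_id > 0                          -- skip heaps
      AND ps.alloc_unit_type_desc = N'IN_ROW_DATA'
      AND o.is_ms_shipped = 0                      -- user objects only
      AND ps.page_count >= 1000                    -- assumption: ignore small indexes
      AND ps.avg_fragmentation_in_percent > 10;    -- assumption: ignore light fragmentation

OPEN frag_cursor;
FETCH NEXT FROM frag_cursor INTO @schema, @table, @index, @frag;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'ALTER INDEX ' + QUOTENAME(@index)
             + N' ON ' + QUOTENAME(@schema) + N'.' + QUOTENAME(@table)
             + CASE WHEN @frag > 30
                    THEN N' REBUILD;'      -- all levels; also updates this index's statistics
                    ELSE N' REORGANIZE;'   -- leaf level only; statistics untouched
               END;
    PRINT @sql;
    EXEC sys.sp_executesql @sql;

    FETCH NEXT FROM frag_cursor INTO @schema, @table, @index, @frag;
END;

CLOSE frag_cursor;
DEALLOCATE frag_cursor;
```

Because a rebuild already refreshes that index's statistics, a separate statistics job following this would only need to cover what wasn't rebuilt.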

The most important thing I can tell you is that a SMALL amount of time getting familiar with what’s really happening in SQL as well as WHY it’s happening to you WILL BE A LOT MORE PRODUCTIVE than just slamming in a maintenance plan that solves some problems but probably creates others.

Hope this helps!
kt

OK, so this is interesting. I've got a few answers to my last survey (Tipping Point Query #1) and well, there's a good mix of answers (and, yes, some are correct! ;)). Be sure to go back and review that last post so that you can evaluate it and these two tipping point questions completely. So, now I want to see if people really know the basis of "the tipping point".

Try these two:

Tipping Point Query #2

Table1 (t1) has 1 million rows at 100 rows per page. The table has 10,000 pages. A nonclustered index exists (on name) but it does not cover the query. At what percentage (of the table) is this nonclustered index no longer selective enough to use?

Tipping Point Query #3

Table2 (t2) has 1 million rows at 2 rows per page. The table has 500,000 pages. A nonclustered index exists (on name) but it does not cover the query. At what percentage (of the table) is this nonclustered index no longer selective enough to use?

OK, so I'd really love to see quite a few responses to these *3* "tipping point" questions. I PROMISE to do a nice long (and detailed) post for what is the actual tipping point AND the answers to all three of these questions. I'll explain the math as well as how you can generalize "what is selective enough" so that you can better create your nonclustered indexes!!!

Thanks for reading - and responding to these brain teasers!!

Cheers,
kt

PS - It's snowing here (ah...again)... maybe I'll spend the day creating brain teasers??! Do you guys like this kind of a post? (well, I suppose you won't really know until I post the answer part of it... but, just in general??). I think it's pretty cool. But, don't worry, I won't (nor will Paul) make all of my posts surveys. But, I think this is a really good one. I'm anxious to see if the answers come in correctly for these two as well! Have at it!

Along the same lines of improving database design and getting better performance on SQL Server (which [IMO] DOES take an experienced SQL Server database developer - but, we'll talk more about "whose job this really is" in many more posts and probably even a RunAs - which Richard and I just set up to record on Thursday (Mar 12)), I started thinking about how I could convince people of why they NEED a database developer. So, I thought I'd ask this VERY important question...

What percentage of data IS selective enough to use a nonclustered index which doesn't cover the query... in other words (just in case you're not entirely sure of what I mean :)), think of indexes in the back of a book... if you need to go to the back of the book to reference a bunch of data (this is called a [bookmark] lookup in SQL Server), there's a point where the randomness of the lookups (especially if you think in terms of many rows on a page) becomes too expensive. For example, imagine that the index is customer name and the data (the book) is customer orders - and, each page (of this rather weird book ;)) has 20 orders on it. Doing a query to look up customer number 12's orders might be really easy (if they have only a few orders) BUT, what if the query is "show me all of the orders for people that have an 'e' in their name". First, the number of people that have an 'e' in their name is probably better than 50% (that's TOTALLY a guess) and, if there are 20 orders per page, then a lookup from the index into the book would require SQL Server to touch every page roughly 10 times. If the table has 50,000 pages (therefore 1 million rows - at 20 rows per page), then to find the 500,000 rows (remember, I'm estimating half), SQL Server would have to do 500,000 bookmark lookups. For a table with only 50,000 pages that's terribly expensive.
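In the example above, 1 million rows at 20 rows per page is 50,000 pages, so 500,000 lookups means 10 lookups for every page in the table. To put your own tables into those terms (rows, pages and rows per page), here's a sketch using sys.dm_db_partition_stats. It only reports the inputs to the question; it doesn't calculate the tipping point itself.

```sql
-- Rows, data pages and rows per page for each user table in the current database
-- (index_id 0 = heap, 1 = clustered index, so each table is counted once).
SELECT OBJECT_SCHEMA_NAME(ps.object_id)            AS schema_name
     , OBJECT_NAME(ps.object_id)                   AS table_name
     , SUM(ps.row_count)                           AS row_count
     , SUM(ps.in_row_data_page_count)              AS data_pages
     , SUM(ps.row_count) * 1.0
       / NULLIF(SUM(ps.in_row_data_page_count), 0) AS rows_per_page
FROM sys.dm_db_partition_stats AS ps
WHERE ps.index_id IN (0, 1)
  AND OBJECTPROPERTY(ps.object_id, 'IsUserTable') = 1
GROUP BY ps.object_id
ORDER BY schema_name, table_name;
```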

So, here's the question - what's the tipping point? When is a nonclustered index on customer name NOT going to be used to look up rows of sales orders? I'm going to use a survey to see what you think and then within a week, I'll give the specific SQL Server math AND a query you can run within your own DBs to see EVERY one of your tables' "tipping points". It's really interesting and I think will really help you to understand why SQL Server might not be using those nonclustered indexes.............

Cheers,
kt

It's an exciting year for us for DevConnections! SQL Server 2008 has now been out for a few months and an SP is coming up soon. This is the sign that some customers wait for to migrate over to the new release feeling that an SP indicates a higher level of stability. But, this is also a time when some companies are shying away from upgrades because of the immediate and very quantifiable costs. And so Paul and I really struggled with what to focus on when we put together our Connections line-up.

What we decided to do is focus on your getting the most from the system that you have now - with best practices that apply to SQL Server 2000, SQL Server 2005 and SQL Server 2008. Personally, Paul and I are going to demo and focus on 2005/2008 but the concepts work on all 3 versions (and even most of the syntax as well - but, for index fragmentation analysis and maintenance the commands changed between 2000 and 2005/2008 so that's one minor difference). For the conference itself, we're focusing on upgrade, new features in 2008 and things to be aware of architecturally in all areas of Administration/Ops, Development and Business Intelligence. And, given that this isn't a "new product year" for SQL, other big conferences are likely to have fewer SQL sessions than usual (and most do...seriously).

So, with SQLConnections you get 46 *SQL* sessions and 3 full-day *SQL* workshops (and workshops on other technologies as well - all of which are spread over 2 pre-con days and 1 post-con day - with none of the SQL ones running concurrently so you could attend all 3). Top it all off with a more intimate event than many others, and you get more interaction to get your tough questions answered! In fact, to help make sure we see as many people as possible, Paul and I usually schedule our sessions before and after lunch so that we can spend the entire lunch gap inside our session room answering even more questions! We even have a session called "Follow the Rabbit" where YOU drive the session with your questions. It's great fun and we've been doing this for the past few years with a lot of success!

If you want to see a bit of the personality and flair offered at Connections - check out MyConnections - it's our conference magazine (note: it's 9.80 MB to download but, it's 84 pages). It's something you get automatically after attending and it's filled with technical articles and all sorts of additional information that comes from Connections. And, here's a link to a fun and fast-paced video with highlights of the conference itself (nothing technical - just fun shots of the event). For example, did you know that EVERY year a Harley is given away at Connections? Here's the quick video: mms://bcast.sswug.org/sswugtv/DevConnectionsFall08.wmv.

Finally, did you know that EVERY attendee gets a FREE SQL Server 2008 Standard Edition license with one CAL? That can cover your attendance right there and get you started on development and learning with SQL Server 2008.

We really have a great time at Connections and we hope to see you there!! (And, Florida in March is a nice destination from the winter weather for many of us as well. :)

We hope to see you there!!
Kimberly and Paul

Something I learned while the SQL Server 2008 Internals book was in tech edit (thanks to our *awesome* tech editor Ben Nevarez - who, unfortunately, does not have a blog or anything...yet! (well, I'm hopeful)), was that you can use a FOREIGN KEY constraint to reference a UNIQUE index - one without a PRIMARY KEY or UNIQUE key constraint. At first glance this might seem like something relatively insignificant but in terms of reducing indexes and/or consolidating indexes it offers something that constraints do not. When you create a UNIQUE index you can use INCLUDE to reference (and include) non-key columns in the leaf level of an index. This offers more choices for covering and if you want to cover a query using INCLUDE but also have a UNIQUE column(s) as the key - you can do that with a regular index but not with a constraint-based index. So, that got me thinking even more - can I reference a UNIQUE index with INCLUDE and even a filter from a FOREIGN KEY? My guess was that it probably wouldn't work because it would be too costly to have to verify it on every referencing row BUT, I did have hopes that a filter of IS NOT NULL would work. However, it does not. ;-( 

So, you CAN reference a UNIQUE index with INCLUDEd columns but not filters. Even that's really cool!
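If you want to try this yourself, here's a sketch using hypothetical dbo.Customers and dbo.Orders tables in a scratch database:

```sql
-- Hypothetical tables for a scratch database.
CREATE TABLE dbo.Customers
(
    CustomerID   int           NOT NULL,
    CustomerCode char(10)      NOT NULL,
    Name         nvarchar(100) NOT NULL
);

-- A UNIQUE index (no PRIMARY KEY or UNIQUE constraint) that also INCLUDEs Name.
CREATE UNIQUE NONCLUSTERED INDEX UQ_Customers_CustomerCode
    ON dbo.Customers (CustomerCode)
    INCLUDE (Name);

CREATE TABLE dbo.Orders
(
    OrderID      int      NOT NULL,
    CustomerCode char(10) NOT NULL
);

-- Succeeds: the FOREIGN KEY references the key column of the unique index.
ALTER TABLE dbo.Orders
    ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (CustomerCode) REFERENCES dbo.Customers (CustomerCode);

-- Per the test described above, if UQ_Customers_CustomerCode were instead a
-- filtered index (e.g., WHERE CustomerCode IS NOT NULL), the FOREIGN KEY could not reference it.
```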

And, when you start your spring cleaning - try and cleanup and/or consolidate some of those redundant indexes!!

Cheers,
kt 

Given the general state of the economy...many companies are looking to cut back. Going back over what we've done and "optimizing" things -> budgets, expenses, etc. is the norm right now. And, scaling back is not always a bad thing - unless the wrong things are cut. Unless the wrong things are used to motivate you. Prioritizing and/or really assessing what gives you the biggest gains for your dollars is hard. In fact, one of the things that always seems to be first on the chopping block is training. Training is hard to quantify. And, the results of good training are also hard to quantify. Instead of fixing a problem (where you can often see the exact improvement) you might instead avoid a problem. Avoid downtime. Avoid data loss. Process more rows - with the same hardware. But, how do you know the cost of what could have happened? Ugh. To be honest, if I could do that - Paul and I would be on a beach. ;)

But, I do have a reason for this post... what should you be thinking? Where should you focus your attention? What can you cut - safely, temporarily, permanently and what might you help to prioritize?

Should you upgrade software?

  • Is there a feature that makes something easier? Some new features are really powerful "big" features. For example, Policy-Based Management (PBM) might help you to better centralize certain rules (in PBM-speak "policies") and then enforce them on many servers - even 2005 servers... so, you might be able to upgrade a smaller number of servers and still get some of the benefits. Many of the tools work against multiple versions so you might be able to minimize (and/or prioritize) which servers you upgrade and slowly migrate others. Potentially following an every-other-version upgrade strategy... upgrading some servers from 2000 to 2008 and leaving some of your 2005 servers to wait to upgrade until SQL11 (the next version after SQL10 - which is SQL Server 2008).
  • Are you starting a new project - architecting a new database? Wouldn't it be easier to start on the newer version and get better longevity (maybe?!)? For example, sparse columns might make a major difference in your base table's architecture...and be easier than if you were to architect (and write all of the code) for 2005 but then later need to do a major architectural change to move to 2008 (well, to *really* benefit from things like sparse columns). There are some really good features in 2008 and some *might* warrant upgrading... upgrading now. But, if you don't have a direct need then I'd argue that you could probably stay with 2005 (or even 2000) and then push this out a bit until you absolutely need to move forward.

Should you upgrade hardware?

  • Again, are there features that will directly impact: performance, availability, manageability?
  • Can you wait? I can't really answer this and - for everyone - the answer is going to be "it depends". There might be something that significantly reduces costs and/or minimizes downtime and as a result, you'll just have to do cost-benefit analysis. This is a tough one... but, maybe you can do rolling upgrades and let some of the lesser servers take the hand-me-downs. :)
  • Can you do rolling upgrades, moving the most critical workload to a new server and then moving a less critical one onto the server freed up by the last upgrade?

Is there anything you can do to get more out of what you already have??

In my opinion, this is probably even more important than the two above. Upgrading hardware and software is something you will ALWAYS need to consider, but if you could get better performance, scalability and availability out of the hardware/software you have now, then you'll benefit *now* without additional funds spent (actual outgoing funds) and you'll still be able to leverage what you do today when you do upgrade. So, what this really translates to (IMO) is tweaking - and tweaking a bit more - what you already have. How? What can you look for? What can you do to help??

  • Upgrade to the latest service packs/hotfixes (at least upgrade to the free stuff). You might see some gains, and in some cases (like SQL Server 2005 SP2+) you might get some new features. (important note: test this on a non-production server FIRST!!)
  • Update your hardware's firmware? You might have missed an update that improves performance (important note: test this on a non-production server FIRST!!)
  • Bottleneck Analysis - Some good resources for this are: Performance Tuning Using Waits and Queues and the SQLCAT team.
  • Workload Analysis - Some good resources for this are: Troubleshooting Performance Problems in SQL Server 2005, Working with Tempdb in SQL Server 2005, Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005...well, there are lots of good whitepapers that are specific to certain types of workloads and/or perf problems...check out our whitepapers page here: http://www.sqlskills.com/whitepapers.asp and the CAT team's whitepapers pages here: http://sqlcat.com/whitepapers/default.aspx and the general SQL Server on microsoft.com pages here: http://www.microsoft.com/sqlserver/2008/en/us/white-papers.aspx and for 2005 here: http://www.microsoft.com/sqlserver/2005/en/us/white-papers.aspx
  • Maintenance - often overlooked and incredibly important. A database that has solid maintenance practices (fragmentation analysis and cleanup, VLF analysis and cleanup, transaction log management, finding corruption in its early stages through automated CHECKDB executions...) performs better, is easier to recover, might naturally stay smaller (more compact) and therefore require less hardware. In fact, analyzing indexes - to get rid of unused indexes and to consolidate redundant indexes can end up saving disk space, backup space, cache, maintenance costs, etc. Both Paul and I have blogged quite a bit about many of these!
  • Other tips and tricks
    • Blogs... which is why you're here and there are so many out there! Here's a link I recently found that lists a bunch of SQL-related blogs: http://technet.microsoft.com/en-us/sqlserver/bb671052.aspx and, of course, Paul's post on "So many blogs" and the PASS list of blogs here: http://www.sqlpass.org/Community/BlogDirectory.aspx.
    • Webcasts... there are lots out there and we now have a page which has most of ours listed on it (thanks to Paul for creating this!!) here: http://www.sqlskills.com/webcasts.asp and there are LOTS more on TechNet, MSDN, etc.
    • Conferences... OK, maybe a shameless plug for conferences like SQLConnections *but* in having put together the agenda (with Paul) where we specifically focused on best practices topics and performance tuning - I can tell you that some of the tips and tricks that we recommend can significantly improve performance, may minimize needed disk space (by creating more optimal and often fewer indexes), may improve availability with better design practices and/or maintenance and much more than that! And, in getting away from the office for a few days and focusing just on learning you might do two things. First, you might learn some tips and tricks that you never would have (or it would have taken *a lot* more time and/or been harder to really understand?). Second, you might come back with a whole new and renewed enthusiasm for doing things - and with an ordered/prioritized list of things to try. And, this might even help to motivate you because it also shows that your company really is committed to you/your job (having spent money specifically on your learning) - and you to them.

So, I do think that there are SMARTER ways to save. A well trained employee is worth a lot more than a cheaper one. And, there are smarter things to cut. I hope this might help you think of things to do and/or places to look to get better performance with what you have! I think blanket "no training" or "no upgrades" statements are never good for anything - even the budget (the longer term effects can be much worse - but also much harder to quantify).

Really, the answer is always different. It depends............

kt

Paul and I started discussing a comment that came up regarding the many issues surrounding logging & recovery. It's one of our favorite topics and in fact was the title of an article that Paul recently wrote for TechNet here: http://technet.microsoft.com/en-us/magazine/2009.02.logging.aspx. And, as a sidenote, depending on how much you already know about the transaction log - you might want to review that article first!

The comment that came up was related to a common misunderstanding on what is and what is not required to make a backup transactionally consistent when restored. And, in my opinion, some of the confusion as to whether or not log backups are "required" is because many changes have occurred release to release. Also, a lot of us say "log backups are required for better recovery" and while restoring log backups is what allows features like up-to-the-minute recovery and point-in-time recovery, not all strategies or recovery procedures actually require additional and/or separate log backups (some backups actually back up part of the log during their backup - and this is something that has changed release to release). And so, that's the reason for this post: I want to try and clear up a few of the many misconceptions about what happens with regard to the log during backup and restore. What's really interesting is that some of the best (seemingly minor) features have been around exactly this - the behavior of the transaction log during other backups and the requirements during restore. So, I thought I'd give a play by play from 2000 to 2005/2008 to discuss the differences, what's changed and why those changes were significant. The biggest changes were between 2000 and 2005.

First and foremost, the log portion of a database is required to make that database transactionally consistent. The transaction log is the key to SQL Server's durability (data integrity even after power loss). Transaction log backups are the key to our being able to recover from more catastrophic failures (possibly even point-in-time recovery if the right backup strategy exists). Inside the database, SQL Server doesn't really need all of the transaction details after it has guaranteed a transaction's durability (or, more simply put, once the effect of the change has been reflected in the data portion of the database, the details of that change are no longer needed in the log portion of the database). As a result, you can have SQL Server clear the "inactive" portion of the log by setting the database's recovery model to SIMPLE. Loosely translated, the SIMPLE recovery model means "when SQL Server no longer requires the transaction information to guarantee durability, the log information can be removed from the log". Setting the recovery model to SIMPLE limits your backup options and makes administration easier (i.e. simple :)); however, it does not offer any other protection in the event of a more catastrophic disaster (because the log is regularly cleared, there's no transactional information to back up). For some development/test databases, and for databases where data loss is not a major concern, this can be an easy choice because log management (i.e. log backups) does not need to be performed. However, if you want to minimize data loss, you can't choose the SIMPLE recovery model; you must choose either the FULL (which is the default) or the BULK_LOGGED recovery model. The discussion on when/why to choose BULK_LOGGED is a lengthy one, and it does NOT impact the rest of this blog post.
However, I did write a chapter for a SQL Server 2000 HA book and I described in detail the best uses for the BULK_LOGGED recovery model as well as the benefits and concerns. While this was written for SQL Server 2000, most of it *still* applies (and there are a few timeline based examples as well). You can download a pdf of this chapter here: http://www.sqlskills.com/resources/SQLServerHAChapter9.pdf.
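To make the recovery model difference concrete, here's a minimal Python sketch (a model, not SQL Server code) of what lets the inactive portion of the log be reused. It assumes the commonly described behavior that SIMPLE clears inactive log at checkpoint, while FULL and BULK_LOGGED keep it until a log backup has copied it, and it ignores edge cases such as a database that has never had a database backup. The names `RecoveryModel`, `log_clearing_event` and `allows_log_backups` are hypothetical, chosen only for illustration.

```python
from enum import Enum


class RecoveryModel(Enum):
    SIMPLE = "SIMPLE"
    FULL = "FULL"  # the default, per this post
    BULK_LOGGED = "BULK_LOGGED"


def log_clearing_event(model: RecoveryModel) -> str:
    """Return what lets the inactive portion of the log be cleared (simplified model)."""
    if model is RecoveryModel.SIMPLE:
        # Once a change is reflected in the data files, its log records are no
        # longer needed for durability, so a checkpoint lets that space be reused.
        return "checkpoint"
    # FULL and BULK_LOGGED keep inactive log records until a log backup copies them.
    return "log backup"


def allows_log_backups(model: RecoveryModel) -> bool:
    """SIMPLE clears the log regularly, so there is no log chain to back up."""
    return model is not RecoveryModel.SIMPLE


for m in RecoveryModel:
    print(f"{m.value:12} cleared by: {log_clearing_event(m):10}  log backups: {allows_log_backups(m)}")
```

The point of the model is the asymmetry: in FULL/BULK_LOGGED, *something* (a log backup) has to run for the log to clear, which is exactly why log management is required there.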

As for the main purpose of this post - there are basically a few key questions that I want to answer/clarify by version:

  • Is the log backed up as part of the other backups?
  • Is it cleared?
  • Is there anything else that's affected?
  • What happens to the log during other database, filegroup, file, database-differential, filegroup-differential and file-differential backups? And, since the behaviors and internals fall into two groups, I will differentiate between these two groups of backup strategies with the following types:
    • Database-level backup strategies are backups that use database and optionally database-differential backups
    • Granular backup strategies are backup strategies that use file and/or filegroup backups and optionally file-differential/filegroup-differential backups

SQL Server 2000
Database-level backups cannot occur simultaneously with log backups. However, granular backups *CAN* occur concurrently with log backups.
If a log backup is attempted while a database-level backup is running, then the log backup is paused. This can have the following effects:

  • the transaction log may require auto-growth and become very large
  • the transaction log for a secondary server (i.e. through log shipping) can fall *very* far behind the primary server. And, this is a HUGE concern for high availability. If a full backup takes 4 hours to run, then logs cannot be shipped for 4 hours. As a result of this limitation, some chose to use a granular backup strategy. The reason why log backups CAN occur concurrently with granular backups in SQL Server 2000 is because, in implementation, SQL Server does NOT back up the log as part of these more granular backups. As a result, transactional integrity is not guaranteed until the appropriate log chain is rolled forward. This has the following effects:
    • Granular backups only support the BULK_LOGGED or FULL recovery models (somewhat negative but not really)
    • The transaction log backups could run and even clear the inactive portion of the log while these granular backups were running (this is a huge benefit because it limits the need to auto-grow during these backups)
    • Recovery during restore is required (for transactional integrity) which means that all logs need to be restored to cover the time of the granular backup (and then all of those up-to-the-minute or to the desired point-in-time). And, even if a filegroup is set to READ_ONLY - *all* transaction logs need to be restored (this is a big negative but there is a trick: perform periodic file/filegroup differentials (after setting the filegroup to READ_ONLY) so that you can avoid having to perform numerous transaction log restores).
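The "restore enough log to cover the time of the granular backup" requirement can be thought of as a coverage problem. Here's a minimal Python sketch under loudly stated assumptions: LSNs are plain integers, each log backup covers the half-open range [first_lsn, last_lsn), and the names `LogBackup` and `logs_needed_for_granular_backup` are hypothetical. This is not SQL Server's own algorithm for the "minimum effective log sequence"; it only finds the log needed to make the granular backup itself consistent, and you'd still roll forward further for up-to-the-minute or point-in-time recovery.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LogBackup:
    name: str
    first_lsn: int  # first LSN in this log backup (inclusive)
    last_lsn: int   # first LSN *after* this log backup (exclusive)


def logs_needed_for_granular_backup(backup_first_lsn: int, backup_last_lsn: int,
                                    log_backups: List[LogBackup]) -> List[LogBackup]:
    """Return the contiguous log backups covering the granular backup's LSN span."""
    needed: List[LogBackup] = []
    next_lsn = backup_first_lsn  # earliest LSN not yet covered
    for b in sorted(log_backups, key=lambda lb: lb.first_lsn):
        if b.last_lsn <= next_lsn:
            continue  # ends before the log we still need
        if b.first_lsn > next_lsn:
            raise ValueError(f"log chain gap: LSN {next_lsn} is not in any log backup")
        needed.append(b)
        next_lsn = b.last_lsn
        if next_lsn > backup_last_lsn:
            return needed  # the whole span of the granular backup is covered
    raise ValueError(f"log backups do not yet reach LSN {backup_last_lsn}")


logs = [LogBackup("log1", 100, 200), LogBackup("log2", 200, 300),
        LogBackup("log3", 300, 400), LogBackup("log4", 400, 500)]
# A hypothetical filegroup backup that ran while LSNs 250..320 were being generated:
print([b.name for b in logs_needed_for_granular_backup(250, 320, logs)])  # ['log2', 'log3']
```

Raising on a gap mirrors the real-world consequence: if a log backup in the middle is missing, the granular backup can't be made consistent.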

So, to answer the questions for SQL Server 2000:

  • Is the log backed up as part of the other backups?
    • for Database-level backups: YES
    • for Granular backups: NO
  • Is it cleared?
    • When a transaction log backup is performed then the default behavior is to clear the inactive portion of the log
    • When a database-level backup is performed AND there's no corresponding log chain (meaning the first time you backup the database OR the first time you backup the database after the transaction log chain was broken), then the transaction log is cleared. NOTE: Breaking the continuity of the log is relatively easy to do in SQL Server 2000: someone just executes BACKUP LOG with NO_LOG or BACKUP LOG with TRUNCATE_ONLY. To prevent these commands from executing for databases in the FULL or BULK_LOGGED recovery model, use trace flag 3231. This is a VERY COOL and *SAFE* trace flag. I blogged about this trace flag in a "MSDN webcast Q&A" here. An important side note here is that in SQL Server 2000, log backups can be performed AFTER the continuity of the log has been broken. So, if someone manually cleared the log (using NO_LOG or TRUNCATE_ONLY) and did NOT follow that with a database-level backup (or the appropriate granular backups), then scheduled log backups could continue to run without failures or errors. However, log backups performed AFTER the continuity of the log has been broken CANNOT be restored. So, during recovery you might receive an error that a log backup cannot be applied because it's too "late" to apply. Using trace flag 3231 reduces this possibility. And, SQL Server 2005 fixes some of these issues.
  • Is there anything else that's affected?
    • Log backups are paused during database-level backups
    • When restored, database-level backups are transactionally consistent (and can be recovered directly - without restoring additional logs)
    • When restored, granular backups require transaction log backups to guarantee transactional integrity (note: determining the "minimum effective log sequence" can be complex, and I wrote a series of articles for SQL Server Magazine on how to determine the appropriate log sequence here)
  • What happens to the log during other database, filegroup, file, database-differential, filegroup-differential and file-differential backups?
    • for Database-level backups: log backups cannot occur concurrently 
    • for Granular backups: log backups can occur concurrently and are required for recovery
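The SQL Server 2000 "silent" chain break is easier to see as a tiny state machine. This is a minimal Python sketch, assuming a database that already has a database backup; the class `Sql2000LogChain` and its methods are hypothetical stand-ins for BACKUP DATABASE, BACKUP LOG and BACKUP LOG with TRUNCATE_ONLY/NO_LOG, and trace flag 3231 is modeled simply as rejecting those two options in FULL/BULK_LOGGED, which is a simplification of the behavior described above.

```python
from typing import List, Tuple


class Sql2000LogChain:
    """Toy model of log-chain continuity in SQL Server 2000 (hypothetical class)."""

    def __init__(self, recovery_model: str = "FULL", trace_flag_3231: bool = False):
        self.recovery_model = recovery_model
        self.trace_flag_3231 = trace_flag_3231
        self.chain_intact = True  # assume a database backup has already started the chain
        self.log_backups: List[Tuple[str, bool]] = []  # (name, usable in a restore)

    def database_backup(self) -> None:
        # A database-level backup gives later log backups a valid starting point again.
        self.chain_intact = True

    def backup_log_truncate_only(self) -> None:
        # Same model for BACKUP LOG ... WITH NO_LOG.
        if self.trace_flag_3231 and self.recovery_model in ("FULL", "BULK_LOGGED"):
            raise PermissionError("TRUNCATE_ONLY/NO_LOG disabled by trace flag 3231")
        self.chain_intact = False  # log cleared without being backed up

    def backup_log(self, name: str) -> None:
        # 2000 does not stop this after a break; the backup just can't be restored.
        self.log_backups.append((name, self.chain_intact))

    def restorable_log_backups(self) -> List[str]:
        return [name for name, usable in self.log_backups if usable]


db = Sql2000LogChain()
db.backup_log("log1")
db.backup_log_truncate_only()  # someone "clears" the log
db.backup_log("log2")          # runs without error, but is too "late" to apply
db.database_backup()           # a new database backup restarts the chain
db.backup_log("log3")          # usable on top of the NEW database backup
print(db.restorable_log_backups())  # ['log1', 'log3']
```

Note that all three log backups "succeed"; only restore time reveals that log2 is useless, which is exactly why the trace flag is worth turning on.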

SQL Server 2005
The biggest improvement in SQL Server 2005 was that log backups are no longer paused by database-level backups - they *can* occur simultaneously; however, this change to database-level backups also applied to granular backups. While 2000 did allow log backups at the same time as a granular backup, it did so by NOT maintaining transactional integrity in the backup. In SQL Server 2000, you need to restore logs to make the granular backup transactionally consistent. In SQL Server 2005, they changed ALL backup strategies to follow the same behavior - database-level and granular backup strategies ALL back up the log information needed to recover the backup to a transactionally consistent point in time, which is essentially when the backup completes (this is a lot more complex than it sounds but Paul wrote a comprehensive post on exactly what this means here). Simply put, this requirement means that transaction log backups CAN occur concurrently; however, the log CANNOT be cleared until the backup completes. The primary negative effect is that the transaction log may require auto-growth and become very large. However, the positive is that you can do granular backups in any recovery model. There are still some limitations to how this works in the SIMPLE recovery model, but they added a new option during backup to allow a backup of ALL of the READ_WRITE_FILEGROUPS as a unit - separately from the read-only filegroups, which can be backed up at any time after they are set to READ_ONLY.

So, to answer the questions for SQL Server 2005:

  • Is the log backed up as part of the other backups?
    • for Database-level backups: YES
    • for Granular backups: *YES*
  • Is it cleared?
    • When a transaction log backup is performed then the default behavior is to clear the inactive portion of the log
    • When a database-level backup is performed AND there's no corresponding log chain (meaning the first time you backup the database), then yes, the inactive portion of the log is cleared. As far as breaking the continuity of the transaction log... In SQL Server 2005, they significantly reduced the problems that occur after the log chain is broken by NOT allowing log backups to continue. If a log backup is attempted after the continuity of the log is broken then you will receive error: 
        Msg 4214, Level 16, State 1, Line 1
        BACKUP LOG cannot be performed because there is no current database backup.

      So, this means that you don't necessarily need the trace flags. However, I still recommend using a trace flag because it's better not to break the continuity of the log to begin with! And, in fact, in SQL Server 2005, there are two trace flags: 3231 and 3031. They are both safe and here's how the two differ:
      • Trace Flag 3231 (same as 2000): When set, BACKUP LOG with TRUNCATE_ONLY and BACKUP LOG with NO_LOG are not allowed to run if the database's recovery model is FULL or BULK_LOGGED.
      • Trace Flag 3031 (new in 2005): When set, BACKUP LOG with TRUNCATE_ONLY and BACKUP LOG with NO_LOG run as a CHECKPOINT - regardless of recovery model.
  • Is there anything else that's affected?
    • Log backups are *NOT* paused during database-level backups
    • When restored, database-level backups are transactionally consistent (and can be recovered directly - without restoring additional logs)
    • When restored, granular backups are transactionally consistent (and can be recovered directly - without restoring additional logs). However, you must always remember that the database cannot be brought online until the entire database is at a single transactionally consistent point in time. All read-write-filegroups must be restored as a unit (if in the SIMPLE recovery model) OR you must use transaction log backups to recover the entire database up to the SAME point in time.
  • What happens to the log during other database, filegroup, file, database-differential, filegroup-differential and file-differential backups?
    • for Database-level backups: log backups *can* occur concurrently (but the log will not be cleared until the backup completes) 
    • for Granular backups: log backups can occur concurrently (but the log will not be cleared until the backup completes) 
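Here's a rough back-of-the-envelope sketch of what this change means for the 4-hour full backup example from the SQL Server 2000 section. It's a simplified Python model, not a measurement: it assumes a steady log generation rate (the 2 GB/hour used below is a hypothetical number), ignores copy/restore time on the secondary, and the function name `full_backup_window` is made up for illustration. In both versions the log generated during the backup can't be cleared until the backup ends; what changes is how far behind a log shipping secondary can fall.

```python
def full_backup_window(version: int, backup_hours: float, log_gb_per_hour: float,
                       log_backup_every_hours: float) -> dict:
    """Rough effect of a long database-level backup on the log (simplified model)."""
    # Neither version can clear log generated while the backup runs:
    # 2000 because log backups are paused, 2005+ because clearing waits for the backup.
    uncleared_log_gb = backup_hours * log_gb_per_hour
    if version == 2000:
        # Log backups are paused, so nothing new ships until the backup finishes.
        worst_shipping_lag_hours = max(backup_hours, log_backup_every_hours)
    else:
        # 2005+: log backups keep running on schedule, so the secondary stays close behind.
        worst_shipping_lag_hours = log_backup_every_hours
    return {"uncleared_log_gb": uncleared_log_gb,
            "worst_shipping_lag_hours": worst_shipping_lag_hours}


for v in (2000, 2005):
    # 4-hour full backup, hypothetical 2 GB/hour of log, log backups every 15 minutes
    print(v, full_backup_window(v, backup_hours=4, log_gb_per_hour=2, log_backup_every_hours=0.25))
```

Both versions need room for the same amount of log during the backup (so size the log accordingly), but only 2005+ keeps the secondary current while the backup runs.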

SQL Server 2008
Almost everything is the same in SQL Server 2008 as it was in 2005 - they made the largest number of improvements in 2005. However, one thing did change. In SQL Server 2008, the BACKUP LOG with NO_LOG and BACKUP LOG with TRUNCATE_ONLY options are not allowed at all. There is no need for the trace flags (3231/3031) because breaking the continuity of the log is not allowed (well, there is still a way... I'll get to that in a moment :)). In SQL Server 2008, if BACKUP LOG with NO_LOG or BACKUP LOG with TRUNCATE_ONLY are attempted, you will receive this error:
        Msg 3032, Level 16, State 2, Line 1
        One or more of the options (no_log) are not supported for this statement. Review the documentation for supported options.
But, what if you really don't want to backup the log? Why? Take this scenario (from a real customer!)... You have a 10GB database that's been around for quite some time AND you're doing regular full database backups... then, all of a sudden you run out of disk space. In looking around for large files (to investigate why you ran out of space), you find that this 10GB database's log is 987GB... so, you wonder - what happened? A database that is in the FULL recovery model (remember, this is the default) requires transaction log management. The easiest way to manage the log is with regular log backups; however, you're only doing full database backups (which do NOT clear the log). As a result, the transaction log grows and grows and grows and grows - until you're out of disk space (Paul demo'ed this in a TechNet Podcast here). At this point, how do you get rid of this 987GB transaction log? In prior releases, you could "clear" the log by using TRUNCATE_ONLY or NO_LOG, but in 2008, what do you do? Switch to the SIMPLE recovery model. And, if you only want to do full database backups, stay there. And, if you want to physically shrink down the size of the transaction log file to a reasonable size - check out these two related blog posts: 8 Steps to Better Transaction Log Throughput and Transaction Log VLFs - Too many or too few?. And, in related news, Linchi Shea posted a good post on some tests he ran related to too many VLFs here and a second post that shows that some workloads don't see any issues wrt lots of VLFs here. But, the long story short is that you still want to be proactive about creating a reasonably sized transaction log (see my two previously mentioned posts). Significant auto-growth can cause problems, and backup operations (and management in general) can be more difficult with lots of VLFs.
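The 987GB scenario is just arithmetic: in FULL, full database backups never clear the log, so log space in use only goes up. Here's a minimal Python sketch of that, assuming a steady (hypothetical) amount of log per day and an idealized clear where each day's log becomes inactive and is cleared; it ignores VLF boundaries and open transactions, and `log_in_use_gb` is a made-up name. It tracks log space *in use*; the file itself doesn't shrink on its own, which is what the shrink-related posts above are about.

```python
def log_in_use_gb(days: int, log_gb_per_day: float, recovery_model: str,
                  daily_log_backup: bool) -> float:
    """Log space in use after `days` (toy model: ignores VLFs and open transactions)."""
    in_use = 0.0
    for _ in range(days):
        in_use += log_gb_per_day  # every day's changes are written to the log
        if recovery_model == "SIMPLE" or daily_log_backup:
            # SIMPLE clears at checkpoint; FULL clears after a log backup.
            # Idealized: the whole day's log is inactive and gets cleared.
            in_use = 0.0
        # FULL with only full database backups: nothing ever clears the log.
    return in_use


print(log_in_use_gb(365, 2.0, "FULL", daily_log_backup=False))   # grows all year
print(log_in_use_gb(365, 2.0, "FULL", daily_log_backup=True))    # stays bounded
print(log_in_use_gb(365, 2.0, "SIMPLE", daily_log_backup=False)) # the "switch to SIMPLE" fix
```

Either of the last two strategies keeps the log bounded; the first is the one that quietly fills the disk.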

Wow, that was much longer than I was expecting... and, in writing it all down - pretty complex (I had a hard time trying to section things but I think this works?!). Regardless, all the facts are there, so this should help to clarify the when, where and why of what happens to the transaction log. Let me know if you have more questions!

Thanks for reading,
kt

Well, this has been a great "conference season" but I'm also glad that it's over for a bit. In the past 4 weeks, Paul and I have been to Vienna (check out Paul's post here), Barcelona (for TechED EMEA ITPro), Las Vegas (for SQL Connections) and finally to Seattle (for PASS). We live in Redmond - about 15 miles from Seattle - so, this was a nice and rare "conference at home" but I have to admit that a conference "at home" is even a bit harder than one on the road. See, when we're on the road, we're only moments away from the conference (and often even in the hotel of the conference) and we don't have any of our regular "home obligations"... And, so, a conference "at home" is actually even harder to coordinate. And, wow, we almost got bitten by the Seattle traffic with it taking an hour to get into Seattle the day we did our PASS pre-conference workshop. However, we *did* start on time (ah, it was close though :).

And, so, after 4 weeks on the road, we're *finally* starting to post all of our demo scripts and continue to respond to all of the great emails we've received from folks (and, we have received quite a few!). Here's a quick rundown of what we did over these 4 weeks:

  • We delivered the equivalent of 7 days of lecture at a customer (we did 3 days together and then 2 days separated (with different teams))
  • We delivered 5 sessions and 4 ILLs at TechED EMEA
  • We delivered 2 full day workshops and 4 sessions at SQL Connections
  • We delivered a full day workshop and 2 sessions at PASS
  • I also joined in on the Women In Technology panel at PASS and it was *great* fun - we had a great discussion around advancing your career, how to start your own business (and network!) and just in general, how to strive to have a better work/life balance (and, as many admit - this is NOT super easy regardless of whether you're a woman or a man!!)

And, now we're tired... Actually, it's amazing how much energy I seem to have when I'm presenting but when all of these events are over, wow can I sleep! In fact, this weekend I've been a bit in a haze just starting to adjust back to regular-non-conference life... and, it's included a lot of Blokus with the kids and some Wii time.

So... check out our "Past Events" page as we've posted all of our demo scripts there. And, if for some reason I missed a script that you remember but can't find... let me know. I'll either add it to the zip (and let you know) and/or respond with a blog post on it. There are definitely some really good scripts and some interesting demos in terms of internals, indexing for performance, optimizing procedural code, etc. and so I'm happy to find that script and send it to you if for some reason you can't find it (and/or I forgot - which is always possible?! :)).

Finally, DVDs... over these past few weeks there have been a lot of DVDs flying around. For some events, we handed out DVDs. And, for some events - we just found out - Microsoft handed out DVDs (specifically the SQL Server 2008 Resource Kit - a 3-DVD pack). So, I thought I'd write a bit here to clear up the madness because one of the Resource Kit DVDs is actually the same as one of ours - but, with all of the files renamed (and, for virtually no reason). There is one *tiny* difference - which I'll explain - but, I thought I'd clear everything up so that we're all on the same page, so to speak.

The DVD we handed out: SQL Server 2008 Hands-On Lab DVD, August 2008
This is a packed DVD that includes a self-extracting executable to create a virtual PC environment. There's a "Readme1st.txt" file which gives you a bit of insight into the requirements (16GB of disk space for the virtual PC hard drive (.vhd) and 1.5GB of memory - by default - for the virtual PC). The DVD also includes all of the lab manuals in .doc form. This DVD was created in August (make sure it says "August 2008" on the right hand side) and there are some specific setup/installation steps that you need to do to get everything "up to date". Check out my blog post on these instructions here: http://www.sqlskills.com/BLOGS/KIMBERLY/post/SQL2008HOLsVPC-setup-instructions-and-a-note-for-the-August-2008-release.aspx. This VPC expires on December 31, 2009.

The SQL Server 2008 Resource Kit - DVD #2
This is a 3-DVD set. I haven't reviewed all of the DVDs but a few folks were asking about the 2nd DVD so I grabbed a copy to check it out. It turns out that the second DVD is the same as the August 2008 DVD - with one exception. After I burned about 1000 of our August DVDs, I found out about the password expiration (which is explained/fixed in the setup instructions in the link here/above) and rebuilt a new image for the SQL team. However, instead of using that image - they renamed it and rebuilt it again (not sure why - other than to rename it). Anyway, on DVD2 of the SQL Server Resource Kit you can run the exe in the root and it will bring up a nice interface (ok, this is nicer than ours but - it's also a bit confusing). The interface is confusing because it says that you need to "DOWNLOAD" the virtual PC (and, a few folks asked me about how long it would take to download - because it's 4GB+). Anyway, there's NOTHING to download (luckily!!), it's actually on the DVD in the DVD:\AlwaysOn subdirectory. See, this is the frustrating part... about 3 years ago I created a DVD that was called "the AlwaysOn DVD". Unfortunately, (since then) all of my HOLs DVDs seem to be named this - even though they're definitely NOT all AlwaysOn Technology related. And, I had renamed it (on my version)... and, well, they renamed it back on the resource kit. LOL. In the end, it really doesn't matter. I'm just *VERY* happy to see that this content is freely available (yeah!) and that it's included in the Resource Kit. Finally, if it turns out that you can actually order this... I'll let you know. I've got an email off to a few folks to see if it's possible.

And, there was still some interest in SQL Server 2005 Hands-On Labs (yes, the SQL Server 2005 Always On Technologies DVD and the SQL Server 2005 Manageability DVD - which were combined into one for SQL Server 2008) and well... since I was home... I grabbed some of these *EXPIRED* DVDs (DVDs that were earmarked for the bin). And, I just want to make sure (if you requested one) that people know there's a trick to using "expired" DVDs... Before the DVD expires - you need to disable the Virtual PC clock synchronization (so that the VPC does NOT pick up your computer's local time). As a result, you can set the date inside the virtual PC back (to before the expiration date). This is something that I've read about on other bloggers' blogs. So, do a "windows live search" :) :) and check out how to disable the virtual PC clock synchronization.

What's going on right now?
Paul and I have a few more weeks of craziness. We're still completing our writing for the new SQL Server 2008 Internals book by Kalen Delaney (available for pre-order here) - she's doing the lion's share of the work (I don't know how she does it!!) and we're each writing a chapter (Paul on DBCC, Conor Cunningham on the QP, Adam Machanic on XEvents/Profiling and me on Indexing - not a surprise for me/Paul, eh?). We're also editing/working on a new title called the SQL Server MVP Project - with Paul Nielsen. PaulN really had the initial idea and drive for this project and there are a few editors (PaulN, Kalen, Adam, Greg Low, Paul [Randal], and myself) bringing everything together from dozens of Microsoft SQL Server MVPs. For this project, many MVPs have submitted a topic or two and in it they are describing a tip/trick and/or best practice - so this title will be all about tips and tricks from the trenches AND the proceeds from the title will completely go to benefit a children's charity (Steve Ballmer had challenged MVPs to "give more" at the MVP Global Summit 2008). And, next week, I'm heading off to Dublin for a week (and, I'm speaking at their local SQL Server user group - again - as we were just there in Sept!) and then after that, it's time to relax for the holidays............... yippie! Next year we'll be heading west a lot more (India, Thailand, Australia...). We'll keep you posted for sure!!

I hope everyone's had fun at these conferences and may all your queries/procedures be optimized!

Thanks for reading,
kt

This is something that both Paul and I want to research more but, there have been a lot of questions about this recently. So, I thought I'd pull together a quick post of some important resources if you're interested in virtualization.

1) You should know what the support implications are - Bob Ward (Microsoft PSS Principal Escalation Engineer extraordinaire) - has blogged about the support implications here: SQL Server Support in a Hardware Virtualization Environment

2) You should know some of the best practices in configuration and setup - and the SQL team (specifically the SQLCAT team) has just recently released a whitepaper: Running SQL Server 2008 in a Hyper-V Environment - Best Practices and Performance Recommendations.

3) Finally, here are a few additional - and helpful resources: 

  • SQL Server's virtualization section on microsoft.com: http://www.microsoft.com/sqlserver/2008/en/us/virtualization.aspx which has a lot of links/resources/references.
  • If you're a SQL Server Magazine subscriber, check out Demystifying SQL Server Virtualization by Mike Otey in the September 2008 edition of SQL Server Magazine

Have fun and post back with any issues on virtualization that you've run into!

kt

Way back in June 2005, I blogged about '8 Steps to better transaction log throughput'. I did this blog post after seeing (again and again) overly fragmented transaction logs... Transaction logs can become *VERY* fragmented when they are not preallocated and instead grow in many small increments through unmanaged auto-growth (quite possibly using the default settings).

While having WAY too many VLFs because of auto-growth is still the most common problem within transaction logs, another problem has been creeping up more and more... too few VLFs. If you preallocate a very large transaction log (10s to 100s of GB), SQL Server may only allocate a few VLFs. Log backups will run normally, but SQL Server can only clear a VLF once it is entirely inactive, which means the active portion of the log has moved on into a different VLF. If your VLFs are 8GB in size, then you need to accumulate 8GB of log information before the log can be cleared... so, many of your log backups will occur normally but then one (the one that finally hits > 8GB in used size) will take quite a bit more time AND possibly cause you performance problems because it's now clearing 8GB of log information.

First, here's how the log is divided into VLFs. Each "chunk" that is added is divided into VLFs at the time the log grows (regardless of whether this is a manual or auto-grow addition), and the number of VLFs depends on the size that is ADDED, not the size of the log itself. So, take a 10MB log that is extended to 50MB: here a 40MB chunk is being added, and this 40MB chunk will be divided into 4 VLFs. Here's the breakdown by chunk size:

  • chunks less than 64MB = 4 VLFs
  • chunks of 64MB and up to (but not including) 1GB = 8 VLFs
  • chunks of 1GB and larger = 16 VLFs

And, what this translates into is that a transaction log of 64GB created in a single chunk would have 16 VLFs of 4GB each. As a result, the transaction log could only be cleared in 4GB pieces, and only once an entire 4GB VLF is completely inactive.

To get more reasonably sized VLFs, consider creating the transaction log in 8GB chunks (create it at 8GB, then extend it to 16GB, then extend it to 24GB and so forth) so that the number (and size) of your VLFs is more reasonable (in this case, 512MB each).
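To make the arithmetic concrete, here's a small Python model of the chunk rule above. It's a sketch, not engine code: it assumes every growth chunk is split into equally sized VLFs using the 4/8/16 thresholds listed (the 4096MB issue in the PS below shows real sizes can deviate), and the function names are made up for illustration. On a real database, DBCC LOGINFO returns one row per VLF if you want to check your own layout.

```python
# Hypothetical model of the VLF-per-growth rule described above (not engine code).
# Assumption: every growth chunk is split into equally sized VLFs.

GB = 1024  # all sizes are in MB


def vlfs_for_chunk(chunk_mb):
    """Number of VLFs created for one growth of chunk_mb megabytes."""
    if chunk_mb < 64:
        return 4
    if chunk_mb < 1 * GB:
        return 8
    return 16


def vlf_layout(chunks_mb):
    """VLF sizes (MB) produced by a sequence of growth chunks.

    Each entry is the size ADDED (the initial size counts as the first
    chunk), not the resulting total size of the log.
    """
    layout = []
    for chunk in chunks_mb:
        count = vlfs_for_chunk(chunk)
        layout.extend([chunk / count] * count)
    return layout


one_shot = vlf_layout([64 * GB])    # 64GB log created in one step
stepped = vlf_layout([8 * GB] * 8)  # 8GB, then +8GB, ... up to 64GB
small = vlf_layout([10, 40])        # the 10MB log extended to 50MB

print(len(one_shot), one_shot[0])   # 16 4096.0
print(len(stepped), stepped[0])     # 128 512.0
print(small)  # [2.5, 2.5, 2.5, 2.5, 10.0, 10.0, 10.0, 10.0]
```

Both logs end up at 64GB; the difference is that the stepped one can be cleared in 512MB pieces instead of 4GB ones.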

Have fun and thanks for reading!!
kt

PS - I've been made aware of a bug when you use an exact size of 4096MB. I'll get more details and post them here but the long story short is to avoid 4096MB as an exact value. I've been told (though I haven't played with this one yet) that 4095MB doesn't have the problem. Oh, and the problem is that the 4GB chunk does NOT get divided into equally sized VLFs.

There's a new feature of SQL Server 2008 that during beta was called a Configuration Server... In RTM it was renamed to a Central Management Server. It's not overly obvious and without knowing how cool it can be - you might skip right by it. And, if you're using SQL Server 2008 to manage 2000, 2005 and 2008 servers - this actually works for all of those (and can be quite cool).

In the "Registered Servers" pane, expand the "Database Engine" option to see this feature. It's basically a separate option in addition to Local Server Groups. Right-click to register a Central Management Server.

A Central Management Server (at first glance) seems as though it's solely a way to store server groups and registered servers - centrally - so that you (and your team) have easier access to the registration properties of a group of servers. And, in setting it up, adding groups, and registering servers - well, it doesn't seem like it is any different. In fact, in terms of creation, that's all it is. However, there are new options in terms of how to use it. It definitely IS different! In fact, its coolness lies in this simplicity (and the options it exposes). Let me show you :)

On a VPC (that we typically use for demos/labs, etc. for SQL Server 2008), we have 5 instances installed (SQLDev01, SQLDev02, SQLDev03, SQLExpress and ConfigServer). All instances (except Express) are SQL Server 2008 RTM Developer Edition and they're just simply named instances (there is no special designation, other than its name, for "ConfigServer"). I created the "ConfigServer" instance as a server to hold configuration and general centralized management features - like Policy-Based Management, the management data warehouse for Performance Data Collection, a Master Server (for Master Server/Target Server Administration), and even a centralized management server. For many of these things you'll want to use a non-Express edition of SQL Server as some features won't work on SQL Express (I'll explain more on this later). Centralized Management Servers *DO* work with SQL Express (something I previously thought was not possible... and, this is WAY cool), but the overall features that I plan to use with this server (like Policy-Based Management) require an edition other than SQL Express, so for this example I'm going to use the ConfigServer (Developer Edition) instance as the centralized management server.

OK, so in this case, I created two groups (Development and Production) and each group has servers...

If I right-click on ANY level (a specific server, a group, or the Centralized Management Server itself), I get multiple options:

  • New Query
  • Object Explorer
  • Evaluate Policies
  • Import Policies

In this case, I'm going to right-click on the Production Group and choose New Query - this opens a query window with one slight difference: the status bar at the bottom has a different color. OK, I bet you won’t even really notice this, but a regular query window has a pale yellow status bar and a centralized management server query window has a pale pink status bar. I find that this isn’t overly noticeable - so I change it to a much bolder color (fuchsia!). To change this, use Tools, Options, Text Editor, Editor Tab and Status Bar and then change the setting for the Group Connections option under the Status Bar Layout and Colors section. Anything that stands out is preferred. The reason why will be apparent soon… Within this query window, anything I execute will be executed against each of the servers in this group. I can even choose to execute a query against the centralized management server itself and this will execute against every server in every group. Powerful, but potentially-too-easily dangerous.

Here, I’ll execute SELECT @@version against the Production group:

The default behavior is that this is executed against all servers and the results are unioned (or merged). If you want to change this behavior, go to Tools, Options, Query Results, SQL Server, Multiserver Results and change “Merge Results” (which defaults to True). You can also set whether or not the login name is appended to the results (this defaults to False) and whether or not the server name is appended (this defaults to True). You can change all three of these settings.

So, why did I make the status bar fuchsia… what if I execute DROP DATABASE dbname… yes, it will try to drop this database from all servers in the group. While this is very powerful, it's something to be careful with.

OK, so here are the most interesting things about centralized management servers:

1) To designate a server as a CMS - that server must have an msdb. Originally, I thought that you couldn't use SQLExpress but because SQLExpress *does* have an msdb, this does work. However, SQLExpress lacks many other features/capabilities so outside of simplifying connectivity and query access, I'd probably target a higher edition of SQL Server. Even more specifically, if you're interested in where all of the server group information is stored:

Local Server Groups stored in: C:\Documents and Settings\<user>\Application Data\Microsoft\Microsoft SQL Server\100\Tools\Shell\RegSvr.xml

Centralized Management Server details are stored within the msdb in:

dbo.sysmanagement_shared_registered_servers_internal

dbo.sysmanagement_shared_server_groups_internal

2) Not only can you execute queries but you can right-click and choose Object Explorer, and each server will be connected to and added to your Object Explorer window. This saves you from manually connecting to/opening each server one by one.

3) AND, the servers can be servers other than SQL Server 2008... I've connected to 2005 and 2008 and you shouldn't have a problem adding any other SQL Servers - even 7.0...but, I haven't tried that (let me know if it works for 6.5 :).

4) Finally, in addition to executing queries, you can also Evaluate or Import Policies. This is something that can be really powerful. In addition to executing something directly, you can evaluate a policy against a number of servers in one step. Policy-Based Management is something that you'll start hearing a lot more about (if you haven't already). And, since I'm always behind at blogging - check out these other links/blogs for more info on PBM:

Enjoy and thanks for reading,

kt

Well... I think I had had too much tea that morning ;-). But, as always, chatting with Richard and Greg was great. Here's the specific show link: http://www.runasradio.com/default.aspx?showNum=76.

Oh, and just for the record, I didn't come up with that title. But, I do hope that all your [high-priority and important] queries are indexed!

Enjoy!
kt

OK, I know this post doesn't really apply to most folks but I did want to have the setup instructions on my blog so that I could refer people to it. For the August release we updated all 17 labs and all of the demos for SQL Server 2008 RTM; however, we did use a "security enabled" VPC. And, well, security is bugging people right about now because the password policy in this VPC was set to expire passwords after 42 days. I know all of you run into this on production servers but on a test VPC, it's pretty much nothing more than an annoyance. So... I thought I'd mention how to disable it for the VPC.

(1) Remove the 42-day password expiration policy

    Local Security Policy Editor (Admin Tools, Local Security Policy)
        Security Settings
            Account Policies
                Password Policy
                    Set the Maximum password age to 0 days (which means passwords never expire)

(2) Change the SQLService account so that it isn't required to change its password at next logon

    Computer Management
        Local Users and Groups
            SQLService
                Deselect the option "User must change password at next logon"

    You might not actually need to set this but it's good to check!

(3) Windows Update

    While you're at it, you should also consider running Windows Update...

(4) Shut down and save changes...

And, here are the full instructions that we usually give out at conferences/workshops: SQL2008Aug HOLs DVD SETUP Instructions.pdf (25.93 KB)

As for accessing this content on the web, it's part of Jumpstart and it will be part of the SQL Server 2008 Resource Kit. Also, I just refreshed the image for the resource kit so that it does NOT have the password policy enabled so you won't need to do that step. Also, the Jumpstart folks will probably refresh the image there so that it also uses this version. Regardless, you might still want to run Windows Update. And, you'll also find a lot of these resources at upcoming events and conferences as well as through these resources.

Enjoy!
kt

I've tackled this type of question before but it's a very challenging thing to decide. Budgets are tight and training is always hard to quantify. However, working more efficiently and effectively - when properly trained - means smoother operations, less downtime, less data loss, less stress.

So, if you're looking for some ideas on training (conferences vs. traditional classes vs. "immersion events") then check out my post where I explain different events and different types of training here.

As for the most in-depth public event that Paul and I offer... it's coming up quickly in San Fran and I thought I'd do a quick post to get you some details.

San Francisco, CA - Oct 6-7-8, 2008
SQL Server 2008 Workshop for DBAs/ITPros (and even developers who just want to know more about how things work!). This is probably closest to an Immersion Event but it's great because we'll be doing lecture and hands-on (bring your own laptop) by handing out our newly updated 2008 Hands-on lab DVD (which has a 2008 RTM VPC on it that doesn't expire until 12/31/2009!). Paul did a detailed post about the event here. This will be a relatively small and in-depth event where we get 2.5 days to cover a ton of new 2008 features (and general best practices/tips/tricks)!

Finally, Paul and I have been offline a bit recently as we've been out of the country - first, presenting in London, Dublin and Edinburgh and then attending Paul's sister's wedding in Scotland (and then celebrating my (yes, 29 [again!]) birthday with a sight-seeing trip around Skye). I think I need to do a post with a few pictures... Paul in a kilt springs to mind!

Happy September!
kt

I first posted an update to sp_helpindex here. My version, sp_helpindex2, was written solely to expand what sp_helpindex shows, and it adds 1 or 2 things based on version: for SQL2005+ it adds included columns and for SQL2008 it also adds the filter predicate. So, there are two versions of sp_helpindex2 depending on which version you're using. A lot of folks like the changes to this sp but, alas, it had a bug (or two :) and, in fact, I found a few others when I went back over it. So, thanks to Josh (who commented here) and to a private email (thanks Vasco!), I have an updated version of sp_helpindex2:

For SQL Server 2005, here's your new sp_helpindex2 script: sp_helpindex2_2005.zip (2.89 KB)

And, here's a simple test script for 2005:

-- Drop the test table only if it already exists (a plain DROP fails on the first run)
IF OBJECT_ID('tbl1', 'U') IS NOT NULL
    DROP TABLE tbl1
GO

CREATE TABLE tbl1 (c1 int, c2 int, c3 int, c4 int)
GO
CREATE INDEX ix_1 ON tbl1 (c1) INCLUDE (c2)
CREATE INDEX ix_2 ON tbl1 (c1)
CREATE INDEX ix_3 ON tbl1 (c1) INCLUDE (c2, c3)
CREATE INDEX ix_4 ON tbl1 (c1, c3) INCLUDE (c2)
CREATE INDEX ix_5 ON tbl1 (c3) INCLUDE (c1, c2, c4)
CREATE INDEX ix_6 ON tbl1 (c1, c2) INCLUDE (c3, c4)
GO

EXEC sp_helpindex2 tbl1
GO

index_name  index_description            index_keys  included_columns
----------  ---------------------------  ----------  ----------------
ix_1        nonclustered located on fg1  c1          c2
ix_2        nonclustered located on fg1  c1          NULL
ix_3        nonclustered located on fg1  c1          c2, c3
ix_4        nonclustered located on fg1  c1, c3      c2
ix_5        nonclustered located on fg1  c3          c1, c2, c4
ix_6        nonclustered located on fg1  c1, c2      c3, c4

For SQL Server 2008, here's your new sp_helpindex2 script: sp_helpindex2_2008.zip (2.84 KB)

And, here's a simple test script for 2008:

-- Drop the test table only if it already exists (a plain DROP fails on the first run)
IF OBJECT_ID('tbl1', 'U') IS NOT NULL
    DROP TABLE tbl1
GO

CREATE TABLE tbl1 (c1 int, c2 int, c3 int, c4 int)
CREATE INDEX ix_1 ON tbl1 (c1) INCLUDE (c2)
CREATE INDEX ix_2 ON tbl1 (c1)
CREATE INDEX ix_3 ON tbl1 (c1) INCLUDE (c2, c3)
CREATE INDEX ix_4 ON tbl1 (c1, c3) INCLUDE (c2)
CREATE INDEX ix_5 ON tbl1 (c3) INCLUDE (c1, c2, c4)
CREATE INDEX ix_6 ON tbl1 (c1, c2) INCLUDE (c3, c4)

-- Filtered versions of the same indexes
CREATE INDEX ix_1f ON tbl1 (c1) INCLUDE (c2)
WHERE c3 IS NOT NULL

CREATE INDEX ix_2f ON tbl1 (c1)
WHERE c4 > 2

CREATE INDEX ix_3f ON tbl1 (c1) INCLUDE (c2, c3)
WHERE c4 > 2 AND c1 < 50 AND c2 = 12

CREATE INDEX ix_4f ON tbl1 (c1, c3) INCLUDE (c2)
WHERE c4 IS NOT NULL AND c1 = 12

CREATE INDEX ix_5f ON tbl1 (c3) INCLUDE (c1, c2, c4)
WHERE c1 > 5

CREATE INDEX ix_6f ON tbl1 (c1, c2) INCLUDE (c3, c4)
WHERE c4 < 20
GO

EXEC sp_helpindex2 tbl1
GO

index_name  index_description                index_keys  included_columns  filter_definition
----------  -------------------------------  ----------  ----------------  --------------------------------------
ix_1        nonclustered located on PRIMARY  c1          c2                NULL
ix_1f       nonclustered located on PRIMARY  c1          c2                ([c3] IS NOT NULL)
ix_2        nonclustered located on PRIMARY  c1          NULL              NULL
ix_2f       nonclustered located on PRIMARY  c1          NULL              ([c4]>(2))
ix_3        nonclustered located on PRIMARY  c1          c2, c3            NULL
ix_3f       nonclustered located on PRIMARY  c1          c2, c3            ([c4]>(2) AND [c1]<(50) AND [c2]=(12))
ix_4        nonclustered located on PRIMARY  c1, c3      c2                NULL
ix_4f       nonclustered located on PRIMARY  c1, c3      c2                ([c4] IS NOT NULL AND [c1]=(12))
ix_5        nonclustered located on PRIMARY  c3          c1, c2, c4        NULL
ix_5f       nonclustered located on PRIMARY  c3          c1, c2, c4        ([c1]>(5))
ix_6        nonclustered located on PRIMARY  c1, c2      c3, c4            NULL
ix_6f       nonclustered located on PRIMARY  c1, c2      c3, c4            ([c4]<(20))

Have fun!
kt

YES!!!

OK, well, I guess I should be more specific because, as with most things in SQL Server, the real answer is "it depends". And for these two options, it depends mostly on your SQL Server version. Since SQL Server 7.0, the way that auto update works has changed (very much for the better!!). So, if you're on SQL Server 2005 or SQL Server 2008, I would say most definitely - leave these ON (or if you turned them off - turn them back on!!!)! If you still have problems with a specific index causing you grief, then turn off auto update at the index level, NOT at the database level. To turn off auto update at the index level, use STATISTICS_NORECOMPUTE in the index definition (or NORECOMPUTE in the statistics definition).

Now, as for why (and how!) this has changed over the versions... here we go:

SQL Server 7.0

  • Invalidation: Statistics were *invalidated* when a row modification counter (sysindexes.rowmodctr) was reached. This meant that SQL Server could not tell where the modifications were occurring and, even if modifications were isolated to a specific column, ALL of the statistics for the TABLE would be invalidated (so, statistics could be invalidated earlier than necessary).
  • Updating: Even worse, in SQL Server 7.0, when statistics were invalidated, they were immediately updated. This caused two problems: one, thrashing at the time of invalidation because all of the stats needed to be updated; and two, if the statistics were not used for a while, the update was wasted work, because by the time they were used they might already be somewhat out of date again.

SQL Server 2000

  • Invalidation: Statistics were still invalidated based on a row modification counter.
  • Updating: SQL Server 2000 fixed the "updating-potentially-too-often" problem by only updating statistics when they were needed.

SQL Server 2005

  • Invalidation: The biggest changes were introduced in SQL Server 2005, which NO LONGER uses sysindexes.rowmodctr and instead uses an internal (and undocumented) column-specific modification counter. Now, statistics invalidation is isolated to only those columns which are heavily modified. This internal/undoc'ed column is sysrowsetcolumns.rcmodified and can only be seen when connecting to SQL Server using the DAC (Dedicated Admin Connection).
  • Updating: Updating didn't really change, but SQL Server 2005 added "Async Auto Update" for statistics (the AUTO_UPDATE_STATISTICS_ASYNC database option) so that when the QO (query optimizer) encounters an out-of-date (i.e. invalidated) statistic, it can "trigger" the update but not wait for it (meaning that it optimizes using the out-of-date statistic). This can be both positive (faster) and negative (might not be the best plan if the statistics have changed drastically). It is off by default and IMO, I'd leave it off in most cases, but if you find that auto update events (which can be Profiled) are causing you grief, then you can turn this on at the database level.
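To see why "invalidate now, update when needed" matters, here's a toy Python model. It's a sketch under stated assumptions, not how the engine is written: it uses the commonly cited threshold (500 changes for tables of up to 500 rows, otherwise 500 + 20% of the rows when the statistic was built), tracks one counter for one statistic (2005+ actually tracks modifications per column), ignores special cases such as temp tables, and the class and function names are hypothetical. The `eager` flag mimics the 7.0 behavior of updating immediately at invalidation.

```python
# Toy model of statistics invalidation and auto update (hypothetical, not engine code).


def modification_threshold(rows):
    """Changes before a statistic is invalidated (commonly cited 2005/2008 rule)."""
    if rows <= 500:
        return 500
    return 500 + rows * 20 // 100


class ToyStatistic:
    """One statistic with a single modification counter."""

    def __init__(self, rows, eager=False):
        self.rows = rows
        self.eager = eager        # True mimics SQL Server 7.0 (update at invalidation)
        self.modifications = 0
        self.invalid = False
        self.updates = 0

    def _update(self):
        self.modifications = 0
        self.invalid = False
        self.updates += 1

    def modify(self, n):
        """Record n row modifications; invalidate once the threshold is reached."""
        self.modifications += n
        if self.modifications >= modification_threshold(self.rows):
            self.invalid = True
            if self.eager:
                self._update()

    def optimize(self):
        """The optimizer needs this statistic: 2000+ updates it only now, if invalid."""
        if self.invalid:
            self._update()


# Ten bursts of 25,000 changes against a 100,000-row table, then one query.
lazy = ToyStatistic(100_000)
eager = ToyStatistic(100_000, eager=True)
for _ in range(10):
    lazy.modify(25_000)
    eager.modify(25_000)
lazy.optimize()
eager.optimize()
print(modification_threshold(100_000), eager.updates, lazy.updates)  # 20500 10 1
```

In this scenario the eager (7.0-style) statistic is rebuilt ten times even though only one query ever needed it; the lazy version does the work once, right when it's used.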

SQL Server 2008

Nothing new except "Filtered Statistics" and these are interesting as the density vector is still relative to the table (not the predicate) but the histogram is just over the predicate (OK, I know I'll have to blog a lot more about this one!). Anyway, I'm still playing/learning a lot more about these and they make the most sense with filtered indexes (as opposed to just a filtered statistic) but, just like statistics on secondary columns you will also potentially want statistics on the secondary columns of your indexes. The next question is should they have a filter or not. I've found that sp_createstats doesn't seem to create statistics with filters and I'm going to need to do some testing here but I think statistics with filters (filters that match the non-clustered index) should help to make the stats better (and even allow better usage of filtered indexes) but, I'm really going to need a bunch of time with this - and another post :). As for auto create/auto update - no changes there!

Long story short, if you're using SQL Server 2005 or SQL Server 2008, you should leave auto create/auto update ON.

Thanks for reading!
kt

PS - A few of you have mailed me about a bug in the sp_helpindex2 script(s). OK, that's my next post!!! Possibly with an sp_helpstats2 script as well!

OK, so SQL Server 2008 came out *BEFORE* Visual Studio 2008 SP1, yet VS 2008 SP1 was required (this was a bit annoying and created a lot of confusion/emails/newsgroup posts, etc.); luckily, VS SP1 came out only a couple of days after SQL Server 2008 RTM'ed. As for Data Dude, well, it's not RTM'ing until later this year (I think there will be one more release - probably an RC0 - before RTM) but they've been pushing out various CTPs... so, what works with what and how the heck do you get everything to install?? Oh my!

Really, it's very simple:

  1. Install Visual Studio 2008 (with Team System/Database Edition), then, install SP1
  2. Install SQL Server 2008
  3. Download/Install the latest Data Dude CTP. BOTH CTP15 and CTP16 work with SQL Server 2008 RTM but 16 has a lot of updates.

SQL/VS can really be installed in any order but this (above) is what was recommended as the best order. However, I haven't heard of any problems installing SQL first and then VS. As for where to go to get more details on Data Dude - check out Gert's "Data Dude" blog for links to the downloads and a lot of tips/tricks.

Have fun!
kt

OK, so, I don't blog very often. I don't know what it is... I think it's that I feel like I always need to blog huge posts and the thought of writing my huge post makes me not want to blog... so, sometimes my time-between-blogging (TBB :) is long. I'm going to turn over a new leaf (no, really... I'm *really* going to try this time!!) and I'm going to try and write smaller posts and more of them. And, yes, don't worry, I'll still write the long in-depth ones but I'll at least try to give you a few tidbits of things that I encounter - more frequently. And, that's partially why I'm posting this entry...

I've been working quite a bit with SQL Server 2008 and yesterday I went to test a few things with Database Mail. Normally Database Mail gets set up quite early, along with a lot of other things, so the server/Agent end up getting restarted anyway as I save/tweak my test VPC. I don't usually go in and set up Database Mail and then immediately try to send mail. Yesterday, I did... and, guess what, it didn't work. I got error [264] An attempt was made to send an email when no email session has been established. I felt like I had seen that before... and, with a bit of web searching, I landed on Gops Dwarak's blog for a known issue of SQL Server 2005: http://blogs.msdn.com/gopsdwarak/archive/2006/04/25/583434.aspx. And, yes, restarting SQL Server Agent solved the problem. I'm surprised that this is still a bug in 2008, but it's not the worst thing I could come upon. Having said that, I also thought that there was a general initiative for software to stop requiring restarts of services and/or the OS. And, yes, it's *just* an Agent restart so it won't directly impact your server's availability, but I'm still surprised that it wasn't resolved. Has anyone else found a bug (or even a documented "issue") that requires a service to be restarted - and you think it shouldn't?

And, well, that's definitely not the most interesting thing I've learned/found with SQL Server 2008, there's lots more (yes, I know - I need to blog a lot of it :). The most fun I've had has probably been learning sparse columns and filtered indexes. For right now, I want to give you some quick key points about some of the changes around indexes and in particular around sparse columns/filtered indexes:

  1. Do NOT create non-clustered indexes on sparse columns without filters (a filter that says WHERE sparsecolumn IS NOT NULL) because indexes do NOT have sparse columns in their definition. So, if you don't use a filter you will end up storing all of the NULL values in the index - which will waste a tremendous amount of space.
  2. A non-clustered index on a sparse column (without a filter) will be the same size as a nonclustered index on a non-sparse column (which is essentially the same point as above but, this might help clarify it a bit).
  3. Do NOT think that the increase in total indexes (from 250 in 2005 to 1000 in 2008) is because you should have more indexes... it's NOT. It's specifically because you might have a lot more columns (the column limit also changed, from 1024 in 2005 to 30000 in 2008), and even that increase should not be used unless you're using sparse columns... So, you STILL want to use a lot of the best practices we've recommended in past webcasts, whitepapers, etc. and you still want to care about row size (and page density) and therefore work to create narrower tables (in general and relatively speaking - depending mostly on usage patterns). However, if you have a need for *lots* of wider tables BECAUSE you have an interesting set of properties that only some rows will have (i.e. the main reason to use sparse columns), then using sparse columns to handle these columns that will largely be filled with NULLs is a good thing. It's a good thing because a NULL in a sparse column takes ZERO BYTES. Absolutely no space is used for a sparse column that is NULL. So, this allows your tables to be wide (in definition) but your rows to be narrow (in practice). And, with well-defined indexes you can VERY efficiently and effectively search on these properties.
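Points 1 and 2 really come down to counting index rows. Here's a hypothetical Python model (not how SQL Server actually stores anything): an unfiltered nonclustered index gets one entry per table row, NULL or not, while a filtered index with WHERE sparsecolumn IS NOT NULL only gets entries for rows that actually have a value. The 1,000,000-row table with 2% populated values is an assumed example.

```python
# Hypothetical model: entries in a nonclustered index on a mostly-NULL column.


def index_entries(column_values, filtered):
    """Rows stored in a nonclustered index on one column.

    filtered=False: one entry per table row, NULLs included.
    filtered=True:  mimics a filter of WHERE sparsecolumn IS NOT NULL.
    """
    if filtered:
        return sum(1 for v in column_values if v is not None)
    return len(column_values)


# Assumed example: 1,000,000 rows where only every 50th row (2%) has a value.
sparse_column = [42 if i % 50 == 0 else None for i in range(1_000_000)]

print(index_entries(sparse_column, filtered=False))  # 1000000
print(index_entries(sparse_column, filtered=True))   # 20000
```

Whatever the per-entry size turns out to be, the unfiltered index in this example carries 50 times as many rows, and 98% of them only record NULLs.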

OK, I hope to get some examples posted as well. Between Paul and me (and our upcoming events, where we're demo'ing/discussing a lot of these principles), we'll plan to post a demo or two on how effective these really can be. And, I know... some of you will fight back with the thought that sparse columns introduce bad database design practices... I know, you want (instead) tables that have name/value pairs - which is the way most of us did this in all versions prior to 2008. However, name/value pair tables become fragmented messes that also have fragmented indexes and therefore overall poor performance (for both inserts and queries). Not to mention, they can be difficult to query/join with (because you need to join multiple times to retrieve multiple properties) and the code gets messy quickly. Anyway, sparse columns - while they may not seem quite right at first - can really be a *MUCH BETTER* way to design (and perform!!!) around this problem.

And, speaking of events. We have a bunch of upcoming events... I'll give you the short bulleted list here because I'm sure many of you also read Paul's blog and he's mentioned quite a few of these coming up. Also, if you're interested in learning more details for these, the full abstract/links for many of these can be found on our Upcoming Events page. However, specific links are also listed below!

  • Best Practices in Performance and Availability for SQL Server 2005/2008, 1-3 September 2008 in Hatfield, England. You can get more details/register here
  • Dublin SQL Server User Group, Index Internals and Fragmentation, 4 September 2008, Dublin, Ireland. Bob Duffy blogged about this here and you can get more details/register here.
  • Microsoft SQL Academy 2008 - Session 1, 5 September 2008, Dublin, Ireland. You can get more details/register here.
  • Indexing for Performance in SQL Server 2000/2005/2008, 8-9 September 2008, Edinburgh, Scotland. You can get more details/register here.
  • SQL Connections "Power Workshops Series" Microsoft SQL Server 2008 Overview for Database Professionals (Hands-on — Bring Your Own Laptop), 6-8 October 2008, San Francisco, CA, USA. You can get more details/register here.
  • Microsoft Tech Ed EMEA ITPro, 3-7 November 2008, Barcelona, Spain. You can get more details/register here.
  • SQL Connections Fall Conference, 9-14 November 2008, Las Vegas, USA. You can get more details/register here.
  • PASS Community Summit 2008, 17-21 November 2008, Seattle, WA, USA. You can get more details/register here.

And, the last thing I'll leave you with is a recipe for the best darn Chocolate Chip Oatmeal Cookies *ever*. OK, I did a web search for exactly that (it all started when Paul, who doesn't really like sweets, decided that for his birthday he wanted Choc Chip Oatmeal Cookies; we asked because the girls and I wanted to bake something). The search led me to here. And, they're definitely right that these are the best cookies ever! They also make a few really good points:

  1. Don't microwave the butter... set it out for an hour before you're going to make the cookies. It's definitely different/better when the butter is naturally soft.
  2. Definitely take the cookies out when they look like they have a few more mins to go...

But, Paul's birthday was in July... and, it was not the only time we've made these cookies... mostly because *everyone* we introduce to these cookies is addicted - yes, the girls, me, Paul, my Mother, Brian Randell (who was forced to take some "to go" after a BBQ here) and countless other friends, since this is now our "when-we-entertain" dessert ;-). So, since I've had the opportunity to tweak the recipe myself, here's what I do differently...

  1. I add less chocolate - usually 1-1.5 cups of chocolate chips. Sometimes I mix half/half semi-sweet and milk chocolate but I always use closer to 1 cup...
  2. I add about 1 cup of coconut - right at the end, with the chocolate chips...
  3. I make a single 12x18 cookie sheet "uni-cookie" and then I cut it up like brownies. As for the cook time, I've found that this giant cookie takes right about 14 minutes (but, you'll have to experiment with your oven... I usually take the giant cookie out when it's puffed up and the edges are just starting to go lightly brown). Oh, and you don't have to spread this out perfectly to the edges - it will expand and fill in. Just get somewhat close. The nice thing about this form factor is that it's a lot faster than making 36 individual cookies and then you get to choose the size(s) when you cut up the uni-cookie.

Oh, and don't blame me when you make 10 batches of these things and get all of your friends addicted as well. They really are yummy!!

Next up... installing Data Dude CTP16 with SQL 2008 and Visual Studio... what works with what and what's the best install order!!!

Thanks for reading!
Kimberly

I started this post while Paul and I were in the TechEd Bloggers Lounge... which, from the number of folks "stopping by", must be more of an online thing :). We did have lots of folks visit with us in the DAT "Green" area, but in the bloggers lounge, I thought I'd blog (maybe that's the point?!). But, as luck would have it, I got sidetracked at the end and ended up on the Women In Technology panel, then chatting with a customer, and then another... and, well, now it's Friday and time to pack up and head home. This year's TechEd was a combination of crazy busy highs and very "where is everyone?" lows. I asked about YOUR opinion of the split for TechEd this year and I think folks are waiting for the event to complete before they form opinions, but for me, the event felt smaller... without as much buzz as TEs past. I've always really viewed TE as "the biggest US event of the year" where all of the buzz was created. It's where I often learn about new (usually outside of SQL Server :) technologies and where I hear some of the latest buzz. This year, I just didn't feel as though there was as much buzz (or as many people). And, I guess I've complained about how crowded it's felt in years past, so both have their ups/downs - but, overall, TE just didn't (in my opinion) have the same energy as TEs past... I'd definitely like to hear more opinions on this. Overall, it was nice to catch up with a few of the usual suspects.

As for a trip (and resources/tips/tricks) report... here I go :)

Monday was our "Overview of SQL Server 2008" precon and that went really well... we were completely exhausted by the end of the day but we had a lot of great feedback. The general feedback was that it was the perfect way to start the week, as it gave people a really good, more-than-just-an-overview look at the new features and helped them determine which topics/sessions might be the most relevant to attend. Also, I learned about a Performance Dashboard bug here that I thought I'd pass on - it's not likely to happen to everyone and it's related to a potential timeout problem. I didn't find too many references to this problem though, so if someone does have a more detailed link - let me know. The end result (a gentleman named Leif sent this script to me) is that you need to tweak the C:\Program Files\Microsoft SQL Server\90\Tools\PerformanceDashboard\setup.sql file so that the stored procedures deal with the timeout correctly. The change is to line 276:

From:
sum(convert(bigint, datediff(ms, login_time, getdate()))) - sum(convert(bigint, s.total_elapsed_time)) as idle_connection_time,

To:
sum(convert(bigint, CAST ( DATEDIFF ( minute, login_time, getdate()) AS BIGINT)*60000 + DATEDIFF ( millisecond, DATEADD ( minute, DATEDIFF ( minute, login_time, getdate() ), login_time ),getdate() ))) - sum(convert(bigint, s.total_elapsed_time)) as idle_connection_time,
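Why split the value into minutes and milliseconds? DATEDIFF returns an int in SQL Server 2005/2008, so a single DATEDIFF in milliseconds can only cover about 24.8 days (2^31 - 1 ms) before it overflows. My assumption is that long-lived connections are what trip up the original line. The Python sketch below models the patched expression; it does not run anything in SQL Server. It counts whole minutes the way DATEDIFF(minute, ...) does (minute boundaries crossed), uses DATEADD to step forward by that many minutes, and adds the millisecond remainder, which can be negative. It ignores datetime's 3.33 ms rounding, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta

INT32_MAX = 2**31 - 1  # largest value an int-returning DATEDIFF can hold


def minute_floor(t):
    """Truncate a datetime to the start of its minute."""
    return t.replace(second=0, microsecond=0)


def datediff_minutes(start, end):
    """Model of DATEDIFF(minute, start, end): counts minute boundaries crossed."""
    return (minute_floor(end) - minute_floor(start)) // timedelta(minutes=1)


def idle_ms_split(login_time, now):
    """Whole minutes * 60000 plus the millisecond remainder, as in the patched line."""
    minutes = datediff_minutes(login_time, now)
    anchor = login_time + timedelta(minutes=minutes)  # DATEADD(minute, minutes, login_time)
    remainder_ms = (now - anchor) // timedelta(milliseconds=1)  # may be negative
    return minutes * 60000 + remainder_ms


login = datetime(2008, 5, 1, 8, 0, 30)
now = datetime(2008, 6, 6, 17, 15, 0)
idle = idle_ms_split(login, now)
print(idle, idle > INT32_MAX)  # 3143670000 True
```

A connection that is about 36 days old needs roughly 3.1 billion milliseconds, which is more than an int can hold. The split form still arrives at the exact elapsed time, because the negative remainder cancels out the extra minute boundary that was counted.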

Tuesday, I delivered my Indexing session "Are your indexing strategies working?" and the group really enjoyed some of the things I focused on - from DMVs to DTA to SSMS to Performance Data Collection in SQL Server 2008... there are many aspects to indexing and, depending on your environment, some are more relevant/important than others. There are definitely a lot of things to talk more about here... INCLUDE columns, statistics... In the interim, I've posted all of my demo scripts to our past conferences page here.

Wednesday, Paul delivered his Corruption session and it was excellent. He showed a variety of different situations that all have different workarounds - as well as discussed a few that don't. He's posted a few things about these sessions and he's going to slowly detail each of his demos in blog posts.

Thursday, we delivered a Database Maintenance session which was a "Top 10", per se. This was a great session for us but I wish we had done it as our first session instead of our last. I felt like this session was a wider session (in terms of topics) and our other two were deeper sessions (in terms of technical content)... But, it was still great fun to do. The thing that made it the most fun for me is that I decided at about 5pm the day prior that I wanted to tweak my demos a bit... and, 14 hours later, I was done. I decided that I would create two copies of a database - one for each of two laptops - and then I would work really hard to create a "best practices database" and a "worst practices database". What was most stunning to me was that the cumulative effect of all of the worst practices was actually a lot worse than I had thought it would be. In the first test run I did on stage, my best practices database (before the index maintenance) ran at about 3500 measured statements per second. My worst practices database ran at about 71. I absolutely did not expect that large a difference... and, one of the things I'm truly looking forward to doing is breaking each of the worst practices down and comparing them to find what has the greatest impact.

The week was a big success overall and it was great to see a few folks that we seem to only see at these types of events....... see you next year!

Cheers,
kt

OK, I first posted on some of the limitations to indexes in SQL Server 2005 and 2008 in part one here. Now, I want to dive into index internals for a post (or two). And, I often get the question “who is the best audience for your blog – or, for this post?” and well, that’s a bit hard to answer. At SQL Connections in Orlando, I delivered a session titled Index Internals & Usage and while we (fyi – Paul and I co-chair the SQL Connections portion of “DevConnections”) put it in the "developer-focused track," it was more of a Dev/DBA "hybrid" session with the emphasis on database development and best practices in creating and managing indexes (rather than management/maintenance/operational tuning - which is more for DBAs). Here at TechEd this week, I'm going to focus more on the management/maintenance/operational tuning side with a session called Are your Indexing Strategies Working? I'll also do a complementary blog post for that as well...

Having said that, though, indexes are definitely in a group of topics - very much related to performance and scalability (index internals, indexing strategies, log maintenance, general database maintenance) - that really need to cross almost all database-related disciplines (dev, admin, ops, etc…). If you work with SQL Server in almost any capacity, you need to get a feel for at least some aspect of indexing for performance.

So, for this post, I’m continuing with some internals. In the first post (in this series), I wrote about limits. Limits/boundaries are interesting to discuss but it's also important to remember that good performance takes a lot more than just staying within the bounds of what’s possible. Creating indexes solely because you can – without reason and only with upper limits in mind – can be even worse than under-indexing. So, if you find that you're wanting more about indexes (I have many blog posts that are solely Q&A posts), check out my Indexing category here. Now that you know how many indexes you can create, a better question would be: when is it appropriate to create indexes at all?

So, what is “finding the right balance” in indexing? In my opinion, there are three requirements/pre-requisites:

  1. knowing the data
  2. knowing how the users use the data
  3. knowing how the database stores, manipulates and uses indexes (the underlying structures)

Bringing all of these things together is what I try to do in my workshops, seminars and lectures – in this post, I'll start with a smaller, more digestible piece: internals.

Indexes have 2 components: a leaf level and a non-leaf level (or b-tree). The non-leaf level is interesting to understand and discuss (in terms of internals) but simply put, it’s used for navigation to the leaf level (more than anything else). So, we'll start with the leaf level (as does SQL Server - the leaf level is always built first). The leaf level of an index contains something (I’ll explain more coming up) for every row of the table in indexed order (note: I am focusing on traditional indexes in every release from SQL Server 2000 up to and including SQL Server 2008 – with the exception of filtered indexes which I will write about in a later post). Once the leaf level is built, non-leaf level(s) can be built to help navigate to the leaf level, and the architecture is rather straightforward. The non-leaf level stores something for every page of the level below – and levels are added (each smaller than the previous because each level only contains one entry, the first, from every page of the level below) until the index gets to a root of one page. While it sounds like this could result in a lot of levels (i.e. a tall tree), the limitation on the size of the key (which has a maximum of 900 bytes or 16 columns) helps to keep index trees relatively small. In fact, in the example I’ll show coming up – which has a fairly large (large meaning WIDE) index and has a key definition which is at the maximum size – even the tree size of this example index (at the time the index is created) is only 8 levels high/deep…

To see this tree (and the math used to create it – which is the same thing that SQL Server would go through to create it), we’ll use an example where the leaf level of the index contains 1,000,000 “rows.” I put quotes around “rows” because I don’t want to imply that these have to be data rows – these are really just leaf level rows and I’ll explain more on what leaf level rows can be... The leaf level rows are 4,000 bytes each (therefore only 2 rows fit per page), so the leaf level needs 500,000 pages. This is not ideal but at least the pages are almost full and we’re not wasting a lot of space – if we had 3,000-byte rows, we’d still only fit 2 per page and then we’d have roughly 2,000 bytes of wasted space on every page. Now, as for why these are just “rows” and not specifically data rows: this leaf level could be the leaf level of a clustered index (therefore data rows) OR these leaf level rows could be rows in a non-clustered index that uses INCLUDE (which was new to SQL Server 2005) to add non-key columns to the leaf level of the index (which therefore creates leaf rows wider than the 900-byte or 16-column key maximum). Again, while this doesn’t currently sound interesting, I’ll explain why this can be beneficial coming up (possibly in another post depending on how long this particular post becomes… :)).
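To make the rows-per-page arithmetic concrete, here's a rough Python model. It assumes that 8,096 bytes of each 8 KB page are available for rows (8,192 minus the 96-byte page header), and that every row also costs a 2-byte slot-array entry. It treats the row size you pass in as already including any per-row overhead. The function names are hypothetical.

```python
PAGE_BYTES = 8192
HEADER_BYTES = 96
USABLE_BYTES = PAGE_BYTES - HEADER_BYTES  # 8,096 bytes for rows plus the slot array
SLOT_BYTES = 2  # each row needs a 2-byte entry in the page's slot array


def rows_per_page(row_bytes):
    """How many rows of a given size fit on one page."""
    return USABLE_BYTES // (row_bytes + SLOT_BYTES)


def unused_bytes(row_bytes):
    """Space left over on a page filled with as many of these rows as fit."""
    return USABLE_BYTES - rows_per_page(row_bytes) * (row_bytes + SLOT_BYTES)


for size in (4000, 3000, 900):
    print(size, rows_per_page(size), unused_bytes(size))
```

With 4,000-byte rows, two fit per page and very little space is left over. With 3,000-byte rows, you still get only two per page, but roughly 2 KB of each page goes unused. The 900-byte case gives the 8 rows per page that the non-leaf levels of this example rely on.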

The leaf level of this index would result in a roughly 4 GB structure (500,000 pages of 8 KB each), and this is only at the time it’s created – if a lot of rows are added and the key is not ever-increasing, then this structure could become heavily fragmented and therefore much larger/taller. The structure is relatively large (again because of the “row” width), but even with a 900-byte index key, the tree stays relatively small, with only 8 levels – as shown below.

Root page of non-leaf level (Level 7) = 2 rows = 1 page

Intermediate non-leaf level (Level 6) = 16 rows = 2 pages (8 rows per page at 900 bytes)

Intermediate non-leaf level (Level 5) = 123 rows = 16 pages (8 rows per page at 900 bytes)

Intermediate non-leaf level (Level 4) = 977 rows = 123 pages (8 rows per page at 900 bytes)

Intermediate non-leaf level (Level 3) = 7,813 rows = 977 pages (8 rows per page at 900 bytes)

Intermediate non-leaf level (Level 2) = 62,500 rows = 7,813 pages (8 rows per page at 900 bytes)

Intermediate non-leaf level (Level 1) = 500,000 rows = 62,500 pages (8 rows per page at 900 bytes)

Leaf level (Level 0) = 1,000,000 rows = 500,000 pages (2 rows per page)
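The level-by-level arithmetic above can be reproduced with a short Python sketch. Each non-leaf level holds one row per page of the level below, and levels are added until a level fits on a single page. The sketch uses the same figures as the example (2 leaf rows per page, and 8 non-leaf rows per page for 900-byte keys). It rounds each level up to whole pages, because a partial page still occupies a full page. `index_levels` is a hypothetical helper, not a SQL Server API.

```python
def ceil_div(a, b):
    """Integer division rounded up (a partially filled page is still a page)."""
    return -(-a // b)


def index_levels(leaf_rows, leaf_rows_per_page, nonleaf_rows_per_page):
    """Return (level, rows, pages) tuples from the leaf (level 0) up to the root."""
    pages = ceil_div(leaf_rows, leaf_rows_per_page)
    levels = [(0, leaf_rows, pages)]
    while pages > 1:
        rows = pages  # one non-leaf row per page of the level below
        pages = ceil_div(rows, nonleaf_rows_per_page)
        levels.append((len(levels), rows, pages))
    return levels


levels = index_levels(1_000_000, leaf_rows_per_page=2, nonleaf_rows_per_page=8)
for level, rows, pages in reversed(levels):
    print(f"Level {level}: {rows:,} rows = {pages:,} pages")
print(len(levels), "levels")  # 8 levels
```

When every level is rounded up, you get 123 pages at level 4, 16 pages at level 5 and 2 pages at level 6. Either way, the tree is 8 levels tall.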

 

Having said that though, this is NOT a goal. :) In more realistic scenarios (where the key is much smaller), even when there are more rows, there are fewer levels (3-4 is quite normal). Most importantly, the size of an index (and the number of levels) depends on two things – the width of the key (in terms of the number of bytes) and the number of pages in the leaf level of the index. The number of pages in the leaf level of an index depends on the number of rows and the size (again, in bytes) of the rows in the leaf level.
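To see how much key width matters, the hypothetical `tree_height` helper below counts levels for the same 500,000 leaf pages using two different non-leaf fan-outs. The first is 8 rows per page for 900-byte keys. The second is roughly 600 rows per page for a narrow key such as an int. That second figure is an estimate: it assumes a 4-byte key, a 6-byte child page pointer and a little row overhead.

```python
def tree_height(leaf_pages, nonleaf_rows_per_page):
    """Count index levels: the leaf plus each non-leaf level up to a one-page root."""
    height = 1
    pages = leaf_pages
    while pages > 1:
        pages = -(-pages // nonleaf_rows_per_page)  # round up to whole pages
        height += 1
    return height


print(tree_height(500_000, 8))    # 8  (~900-byte keys)
print(tree_height(500_000, 600))  # 4  (narrow key, assumed ~600 rows per page)
print(tree_height(9_303, 600))    # 3  (page count of the Charge clustered index)
```

With the same leaf level, shrinking the key cuts the tree from 8 levels to 4. With the same assumed fan-out, the 9,303 leaf pages of the credit database's Charge table give the 3 levels described below.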

You can see the size of your index by using one of the following commands:

In SQL Server 2000: DBCC SHOWCONTIG … WITH ALL_LEVELS

In SQL Server 2005/2008: querying the DMV sys.dm_db_index_physical_stats

To see the syntax of these commands and their output, we’ll use some structures created in the credit sample database. Using credit, you can see exactly how these commands work and how they return the details about every level.

NOTE: you can download a zip of a SQL Server 2000 backup of this database here – and since this is a SQL Server 2000 backup, you can restore this to SQL Server 2000, SQL Server 2005 or SQL Server 2008.

USE credit
go

SELECT *
FROM sys.dm_db_index_physical_stats
    (db_id(), object_id('Charge'), 1, NULL, 'DETAILED')
go

DBCC SHOWCONTIG('charge', 1) WITH ALL_LEVELS, TABLERESULTS
go

Using the DMV or DBCC SHOWCONTIG, you can get the same picture of the charge table. Specifying 'DETAILED' mode for the DMV (or ALL_LEVELS for DBCC SHOWCONTIG) returns the entire structure (all levels) of the clustered index (index_id = 1 is always the clustered index, IF the table is clustered).

The clustered index in this table has 1,600,000 rows (DMV column: record_count or SHOWCONTIG column: rows) and these are stored on 9,303 pages (DMV column: page_count or SHOWCONTIG column: pages). If you look at the next level up, level 1 (remember that index levels always start with the leaf at level 0 and then go up to the root), you can see that its number of "rows" is equal to the number of pages in the leaf level... and this keeps going until you get to a root of 1 page. In this case, the clustered index (which is the widest structure of the table) has a very narrow clustering key (the key is on charge_no, which is an int) and only has a total of 3 levels even though the table has 1,600,000 rows. Ideally, you should run this on a few of your production tables (in a development/test environment) to start to get some insight into how big your structures are. However, a BIG factor that you might see in production is fragmentation. If a particular level (or levels, for that matter) is heavily fragmented, then each level might be wider and less compact (and therefore less performant). Reviewing the DMV columns avg_fragmentation_in_percent (how out of order the pages are) and avg_page_space_used_in_percent (how full the pages are), you can get a feel for the health of each level. Poor page density means that your pages are not as full as they could be, and there are many possible reasons: bad row size, splits due to inserts, splits due to updates of varchar columns or even a poorly chosen fillfactor that has left too much space on the pages. However, page density is only one piece of the puzzle, and if your avg_fragmentation_in_percent is very low (0-5%), then I wouldn't be overly worried about your pages not being entirely full unless you have the time to possibly re-design tables (e.g. vertically partition them) and then rewrite your applications to direct your statements at only the appropriate base table.

But, another factor to consider is the rate at which your fragmentation occurs as well as when you can fix that fragmentation. This is a HUGE discussion that requires time... And, I want to get back to index structures for now. However, both Paul and I have blogged quite a bit about rebuilding v. defragging indexes and what those operations do/how, etc. In fact, just today, Paul has blogged a Q&A about myths and misconceptions about index rebuild operations. So, I'll get back to internals for now! :)

You can use LIMITED (which is the default mode), SAMPLED, or DETAILED. All three have excellent uses and all use IS locks (to minimize blocking). LIMITED gives you a quick overview of fragmentation and mostly describes how intact and in order the leaf level is. LIMITED is quite clever in that it only scans the first non-leaf level above the leaf to determine how much fragmentation there is... since the non-leaf level always tracks the first entry of every leaf page (and a pointer to that page), SQL Server knows about EACH and EVERY page in the leaf level by ONLY reading the non-leaf level (which is [typically] a lot smaller and therefore faster). However, because LIMITED doesn't touch every leaf page, it can't determine page density: it only tracks how out of order the pages are and not how dense/full they are (which is also a form of fragmentation), so avg_page_space_used_in_percent comes back as NULL. So, if you want a bit more detail, you can use SAMPLED. The SAMPLED mode returns the fragmentation from reading every 100th page of the index (or heap). If the table has fewer than 10,000 pages (about 80MB), every page is read instead (which is a DETAILED scan). The DETAILED mode reads every page of every level to calculate the most accurate picture of your table's fragmentation. This is the best form of analysis but also takes the most time.
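Here's a rough Python model of how much of an index's leaf level each mode examines. It is based only on the behavior described above: LIMITED reads the level above the leaf, SAMPLED reads about 1 page in 100 unless the index has fewer than 10,000 pages, and DETAILED reads everything. It illustrates the trade-off but doesn't show how SQL Server actually performs the scan. `leaf_pages_read` is hypothetical, and rounding the 1% sample up is my assumption.

```python
SAMPLED_MIN_PAGES = 10_000  # below this (~80 MB), SAMPLED reads every page


def leaf_pages_read(mode, leaf_pages):
    """Approximate number of leaf-level pages each scan mode examines for an index."""
    mode = mode.upper()
    if mode == "LIMITED":
        return 0  # only the first non-leaf level is read
    if mode == "SAMPLED":
        if leaf_pages < SAMPLED_MIN_PAGES:
            return leaf_pages  # small index: behaves like DETAILED
        return -(-leaf_pages // 100)  # roughly every 100th page, rounded up
    if mode == "DETAILED":
        return leaf_pages
    raise ValueError(f"unknown mode: {mode}")


print(leaf_pages_read("SAMPLED", 9_303))    # 9303
print(leaf_pages_read("SAMPLED", 500_000))  # 5000
```

The credit database's Charge clustered index has 9,303 pages, which is under the threshold. So SAMPLED gives you a full DETAILED-style read there, and you only see the sampling savings on larger indexes.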

If you’re interested in learning a few more of the tips/tricks with using this DMV, check out the following script: Using dm_db_index_physical_stats.zip (2.23 KB)

A favorite tip is that the database in which you want to analyze tables does NOT have to be in 9.0 compatibility mode in order to use this DMV. Don’t get me wrong, you will get errors if you try to use this DMV from within a database that’s not in 9.0 compat mode; however, if you are in master (which is set appropriately and cannot be changed) and then use the first parameter to target a non-9.0 compat mode database, then this DMV works great. However, there's a "gotcha" with parameter 2 (the object): object_id() resolves the name in the current database, so a one- or two-part name only works as expected when you run the query from within the target database. If you specify object_id('tablename') from master for a table that's in credit, then object_id will return NULL. The query will still run, but against all tables in credit rather than the one you thought you were targeting. If you want to use this DMV across databases, you will need to supply the database name in the first parameter and then make sure that you use 3-part naming for the second parameter.

Now that you are getting to know some of the structures (in terms of seeing physical structures and internals), where do we go from here? The best route to start “finding the right balance” for performance is to know the data as well as to get some general insight into usage patterns. This is probably the hardest component to know, and sometimes you only know exactly what’s going on if you profile what’s actually happening in production. Is that too late? To a certain extent yes, and to another extent no… there are still many things for which you can plan and other things you can confirm or test once the application is running (e.g. with Profiler). All of those things together are going to help to “find the right balance”.

Having said that, and having discussed the general internals of a b-tree (and therefore an index structure), what’s the difference between a clustered and non-clustered index? Well… stay tuned, that will be part 3 in this series. And, then (finally), we'll get to appropriate uses for INCLUDE (which was new for SQL Server 2005) and then appropriate uses for Filtered Indexes (a new feature in SQL Server 2008). Also, somewhere in there I'll post a few tips from my TechEd session so that you can start to determine if your indexing strategies are working??

Thanks for reading!
kt

Memorial Day weekend we were in Chicago to celebrate my Father's life. We did a "Celebration of Life" memorial and we had a few drinks (celebratory Meyers, Tonic and lime - which was my Father's favorite drink), we (7 of us) gave a few heartfelt speeches, and a few friends wrote a song (and passed out the words - to which we all sang along) and we grieved... but, in a refreshingly-not-overly-depressing way. I have to admit - it was exactly what I'd want as well. It was a wonderful day filled with memories and friends. After that, we visited with my Mom as well as my Grandmother. Paul blogged a couple of pics (yes, that chair is VERY big!). Then, we were back in Seattle for only one week...back to work...and preparing for TechEd 2008 ITPro week.

As for TechEd being spread over 2 weeks, well... I think it offers some excellent logistical options (smaller size means more possible venues AND/OR it means that they could possibly grow the size of each event). And, for some topics, I think there is a very strong separation between developer and ITPro (admin/ops) but for SQL - I think it's hard to get it perfectly right. I think there are a lot of developers who need to know more about admin/ops just so that they can develop more optimal (and even more manageable) solutions, and I think that DBAs should have a really good architectural overview of a lot of features to better administer them. So, for SQL, I'd *love* to hear your comments on what you think............

For Paul and me, the decision is relatively simple: we came for this second week for ITPro/Ops. But, we've also spoken at the developer events (and sometimes we even write/present sessions specifically targeted at developers at our SQL Connections shows and/or at User Groups). We just did a local .NET user group in Redmond and the discussion around Indexes became so popular that we're going back in August (for Indexes) and again (tbd) for Disaster Recovery techniques. Basically, developers tend to say...oh, that's why I should x or y or z....... so, maybe next year we'll hit both? Regardless, I'd still like to know what you think. Were you at the Developers event? Do you wish you could be at both? Are you at both?

As for what we're doing - Paul's already blogged it here: http://www.sqlskills.com/blogs/paul/2008/06/05/OffToTechEdUSITProTomorrow.aspx.

But, I thought I'd do a quick recap so that you can get some insight into our week as well as where to find us to come and chat. We'd love to meet you and/or hear your success (or disaster) stories!

Monday

  • Full day pre-con seminar: SQL Server 2008 Overview for DBAs

This is ACTION packed (and a very full day!) and will include giving out the updated SQL Server 2008 HOLs DVD. We weren't really sure we were going to be able to do it... we hadn't burned many copies of the CTP6 version of the DVD, nor were we sure that CTP6 would still be the most relevant. But, it's still excellent to learn on and this time our DVD includes 17 labs:

HOL Lab Filename (Length): Lab Name, then the Lab Abstract/Description

Using Policy-based Management.doc (75 minutes): SQL Server 2008 Policy-based Management
Security, best practices, proper configuration settings - how do you control these things on one or more servers? These hands-on labs show you how to implement and leverage the new policy-based management framework to define and control your business rules and your server's compliance for one or more instances of SQL Server 2008.

Data Recovery Preventative Techniques.doc (75 minutes): SQL Server 2008 Data Recovery and Preventative Techniques Hands-on Lab
Can you recover from a dropped table? Can you reconcile tables that have become out of sync due to human error? These hands-on exercises show you how to bring a database back online quickly after a table is dropped as well as how to reconcile the differences between a production environment and a recently restored version of your database - so that you can manually merge the recovered data back into your production database. Once all of the recovery techniques are shown, the last exercise shows how DDL triggers can prevent some of these human errors altogether.

Table and Index Partitioning.doc (75 minutes): SQL Server 2008 Table & Index Partitioning Hands-On Lab
Table and Index Partitioning allows large tables to be managed more granularly. These hands-on labs show you how to implement and leverage these key features: a partition function, a partition scheme and the sliding window scenario.

Database Mirroring Part I.doc (75 minutes): SQL Server 2008 Database Mirroring, Part I
Database Mirroring allows you to create a secondary (mirror) database to handle requests either automatically or manually, in the event of a disaster at the principal database. These hands-on labs show you how to implement and leverage Database Mirroring, as well as when and how to use it. You will set up database mirroring in a High Availability configuration (synchronous mirroring with a witness), see the effects of failover, and see how automatic page repair restores damaged pages in the principal or the mirror.

Peer to Peer Replication.doc (75 minutes): SQL Server 2008 Peer-to-Peer Replication Hands-On Lab
Peer-to-peer replication gives you a scale-out configuration where multiple servers participate in bi-directional transactional replication. Setting up and configuring this topology has a few requirements - many of which are minimized by using the Replication Wizards - but all need to be understood to configure a peer topology correctly. These hands-on exercises show you how to implement a peer topology correctly.

Using Performance Data Collection.doc (75 minutes): Performance Data Collection in SQL Server 2008
Performance Data Collection brings together many key tuning features into one cohesive toolset. These hands-on labs show you how to create a Management Data Warehouse, set up and control the collection intervals and analyze the results of system data collection sets.

Instant Initialization.doc (45 minutes): SQL Server 2008 Instant Initialization
Instant Initialization allows data files of any size to be created instantly - eliminating zero-initialization. These hands-on exercises show you how to configure your server's permissions to leverage instant initialization, as well as the security vulnerability created by enabling this feature.

Online Operations.doc (75 minutes): SQL Server 2008 Online Operations Hands-On Lab
Online Operations are critical to the success of any server that needs to be highly available. These hands-on labs show you how to implement and leverage these key features: online index operations, partial database availability and online piecemeal restore.

Database Development Clients Lab.doc (120 minutes): SQL Server 2008: Database Development Hands-On Labs
The goal of these hands-on lab materials is to get an understanding of when to use one or more of the advanced features of SQL Server 2008 Database Development. After completing these self-paced labs, you will be able to:
* Set up a Database Project using Visual Studio 2008 Team System Database Edition
* Make changes to the database schema and deploy those changes
* Create and edit a project that uses the LINQ to SQL Object Relational mapper
* Use LINQ to SQL to query and maintain a SQL Server database using the managed classes
* Use LINQ to SQL with stored procedures
* Create and edit a project that uses the ADO.NET Entity Data Model mapper.
* Use the ADO.NET Entity Data Model to model a many-to-many relationship in the database
* Use the ADO.NET Entity Framework classes and LINQ to Entities to query and update a database
* Use Visual Studio 2008 to quickly get an ADO.NET Sync Services application up and running.
* Set up SQL Server 2008 Change Tracking
* Use ADO.NET Sync Services with SQL Server 2008 Change Tracking

Snapshot Isolation.doc (75 minutes): SQL Server 2008 Snapshot Isolation Hands-On Lab
The goal of these hands-on lab materials is to get an understanding of the appropriate uses of transaction isolation levels as well as how snapshot isolation affects conflicting readers and writers.

Database Mirroring Part II.doc (120 minutes): SQL Server 2008 Database Mirroring, Part II
Part II of the Database Mirroring HOLs allows you to go through setup, implementation and numerous failover scenarios - step-by-step. While Part I offers quicker setup through SQLCMD scripts, Part II works through the setup process more slowly, allowing you to see how things work together. This lab is longer but also goes through changing the mirroring configuration as well as forcing failover. Part I should be completed first and Part II should be completed only if time permits.

Service Oriented Database Architecture.doc (120 minutes): SQL Server 2008 Service Oriented Database Architecture Hands-On Lab Manual
The goal of these hands-on lab materials is to get an understanding of how and when to use Service Broker in deploying a service-oriented database application.

Database Snapshots.doc (75 minutes): SQL Server 2008 Database Snapshots Hands-on Lab
The goal of these hands-on lab materials is to get an understanding of how to use the Database Snapshot feature of SQL Server 2008. After completing these self-paced exercises, you will be able to:
* Understand how to create a database snapshot
* Understand how to investigate file sizes and sparse file configuration (using both T-SQL queries and Windows Explorer)
* Understand the benefits and challenges with creating multiple snapshots
* Understand how a database snapshot is created when transactions are in flight as well as when they're not
* Understand how to use database snapshots for testing and reverting databases
* Understand the requirements to drop database snapshots and drop databases that have database snapshots
* Understand how to create a database snapshot on a mirror database

Dynamic Management Views.doc (75 minutes): Understanding and Using DMVs Hands-on Lab
The goal of these hands-on lab materials is to get an understanding of the more advanced new features of SQL Server 2008 that give access to server information that can be used for performance tuning, server health monitoring, and problem diagnosis. After completing these self-paced labs, you will be able to:
* Determine what DMVs exist, what their input parameters are, how and where their data is stored, and be able to persist DMV data to your own tables.
* Access information from the query plan cache, including determining frequently executed queries and their query plans.
* Access physical statistics information about indexes (e.g. fragmentation).
* Access information about tempdb space utilization.

Resource Governor in Action.doc (45 minutes): SQL Server 2008 Resource Governor Hands-on Lab
The goal of these hands-on lab materials is to get an understanding of when to use one of the more advanced features of SQL Server 2008: Resource Governor. After completing these self-paced labs, you will be able to:
* Understand appropriate uses for Resource Governor
* Create Resource Pools
* Create Workload Groups
* Monitor Resource Usage

Understanding Spatial Data.doc (120 minutes): SQL Server 2008: Understanding Spatial Data Hands-on Lab
The goal of these hands-on lab materials is to get an understanding of one of the newer features of SQL Server 2008: Spatial Data Support. After completing these self-paced labs, you will be able to:
* Understand what spatial data is
* Understand the different types of spatial data
* Create instances of spatial data
* Investigate the properties of spatial data
* Query the relationships between different instances of spatial data
* Integrate spatial data into a managed code application
* Move spatial data between managed code and the database
* Create spatial data graphically using WPF

Using SQLCMD.doc (75 minutes): Understanding Command-line Management with SQLCMD in SQL Server Hands-on Lab
The goal of these hands-on lab materials is to get an understanding of command-line management with SQLCMD. After completing these self-paced labs, you will be able to:
* Use SQLCMD with an initialization file, system environment variables and parameters to create customized “master” scripts for automation and administration
* Use SQLCMD and SQL Server Management Studio with the Dedicated Admin Connection for troubleshooting
* Use SQL Server Management Studio to modify and execute SQLCMD mode scripts
* Upgrade databases from SQL Server 2000 using a parameterized script running with SQLCMD

This is a GREAT resource for playing with a lot of these new technologies and it's exciting that we have enough copies to give away to our pre-con attendees! A few of these are featured as HOLs at this year's TechEd as well and some of these can also be found with our prior Jumpstart resources.

Tuesday

  • 13.15 - 14.30 (Room N230) DAT354 Are Your Indexing Strategies Working? (featuring me as speaker/presenter)
  • 15.00 - 16.00 (TechEd Online Stage) Panel: Leveraging SQL Server Technologies to Build a Solid High-Availability Strategy
  • 16.00 - 18.00 DAT track booth (green)

Wednesday

  • 10.15 - 11.30 (Room N220D) DAT375 Corruption Survival Techniques: From Detection to Recovery (featuring Paul as speaker/presenter)
  • 11.30 - 14.45 DAT track booth
  • 15.00 - 16.00 Blogger's Lounge

Thursday

  • 10.15 - 11.30 (Room S230E) DAT363 Essential Database Maintenance (we're co-presenting this one)
  • 11.45 - 13.00 Speaker Idol judging (I'll do my best to play Paula and I have hopes that Paul won't play Simon but he does have that British thing going for him)
  • 14.30 - 18.00 DAT track booth

Friday, we fly home... then, we're going to test all of our dive gear and take a little dive vacation at the end of the month. Hopefully, I'll be able to post a couple of nice underwater shots!

Oh, and I've finally tweaked my Indexing post (the one that survived the drive corruption). As an update to that post: even though I got that drive repaired, almost all of the jpgs, some of the pdfs and even a few of the Office files were still corrupt. Office opened a few of them and "repaired" them on open (which was really cool) but I did lose the photos I had taken that weekend (well, all of the ones that I removed from my camera's SD card). Anyway, I plan to (well, hope to) post the Indexing post (part 2 to this one) tomorrow!

Cheers,
kt

PS - It's hot as hell here... and the humidity is NASTY!!! But, it beats the SOLID rain that we've been having in Seattle...

In my blog post on my new sp_helpindex proc (sp_helpindex2), I mentioned that the indexes in my sample were not necessarily a recommended set of indexes - just a test set of indexes. So... in this post, I thought I'd start a series on indexes, limitations and best practices/uses... Especially, why/how to best choose when to use INCLUDE v. having columns in your key. To start, I thought I'd give some background, as well as limitations that exist in various releases from 2005 to 2008 CTP6 (Feb CTP), plus what's expected in the SQL Server 2008 RTM (ah... I did say "expected" so don't come back and yell at me if/when I'm wrong ;-))

First, let's go through a few rules and limitations and background:

SQL Server 2005

  • 250 total indexes per table: 1 clustered index and up to 249 nonclustered indexes (no, this is not a goal!)
  • The index key can be up to 16 columns OR 900 bytes - whichever comes first...
  • The leaf level is limited to 1024 columns, just as a table is (and all column types are acceptable in the leaf level of an index - even LOB columns)
  • Statistics are kept for every index (so, up to 250 index-related stats) and there can also be statistics on columns or sets of columns in addition to the index-related stats. In earlier releases, statistics used index ids and, as a result, the number of statistics was limited to 250 total minus the statistics used by indexes... in SQL Server 2005, statistics are kept/managed separately (use sys.stats to see them). As a result of this separation, you can now have 2000 statistics on a table, in addition to the 250 (total) indexes and their statistics. If you want to test this out (and check it on various versions of SQL Server), use this script to set up a test database and a test table, and then use dynamic string execution to loop (until it errors), creating nonclustered indexes and statistics: testindexmax.zip (.47 KB)
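To see where your own tables stand against these limits, you can query the catalog views directly. This is a minimal sketch (not from the original script) that counts indexes and statistics per user table with sys.indexes and sys.stats. It assumes SQL Server 2005 or later and treats a statistic as "column-level" when sys.stats flags it as auto_created or user_created (index-related statistics have both flags set to 0).

```sql
-- Sketch: index and statistics counts per user table (SQL Server 2005/2008).
SELECT
    SCHEMA_NAME(t.schema_id) AS [schema_name],
    t.name AS [table_name],
    -- index_id 0 is the heap itself, not an index
    (SELECT COUNT(*)
       FROM sys.indexes AS i
      WHERE i.object_id = t.object_id
        AND i.index_id > 0
        AND i.is_hypothetical = 0) AS [index_count],
    -- every statistic on the table, index-related or not
    (SELECT COUNT(*)
       FROM sys.stats AS s
      WHERE s.object_id = t.object_id) AS [total_stats],
    -- column statistics only (auto-created, or created with CREATE STATISTICS)
    (SELECT COUNT(*)
       FROM sys.stats AS s
      WHERE s.object_id = t.object_id
        AND (s.auto_created = 1 OR s.user_created = 1)) AS [column_stats]
FROM sys.tables AS t
WHERE t.is_ms_shipped = 0
ORDER BY [schema_name], [table_name];
```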

SQL Server 2008 CTP6

  • So far, it seems as though most of the maximums have not yet been lifted...
  • 250 total indexes per table: 1 clustered index and up to 249 nonclustered indexes (and this number  - for CTP6 - includes filtered indexes AND spatial indexes too!)
  • The index key limit hasn't changed (it can be up to 16 columns OR 900 bytes - whichever comes first)
  • The leaf level is still limited just as a table is limited to 1024 columns (and, all column types are still acceptable in the leaf level of an index)
  • Statistics in CTP6 seem to be limited to only 2000 stats per table...

SQL Server 2008 RTM (expected/target... no guarantees on this one :)

  • 30,000 columns per table (mostly to allow sparse columns)
  • 1,000 total indexes per table: 1 clustered index and up to 999 nonclustered indexes. This is also not a goal BUT, it makes sense because of both sparse columns and filtered indexes. Both Paul and I will try to post some entries about sparse columns and filtered indexes in the coming days...
  • The index key limit won't change
  • The leaf level will be limited to 30,000 columns, just as a table is (and all column types are still acceptable in the leaf level of an index)
  • Statistics are also said to be increasing, likely to 30,000... As for having extra statistics just sitting around and possibly not being used: outside of a minimal amount of disk space taken by the stat blob (which does start to get interesting at 1,000s, I suppose), even stats that don't get used don't really create much of a problem. So, I'm OK with this one increasing - even significantly - but I have to admit I'm somewhat nervous about the significant increase in indexes.........

So... you can have A LOT more indexes in SQL Server 2008 but just because you can - DOES it mean that you should?!

And on that - I'll leave you hanging for my next post where I start to talk about WHY they're increasing this (hint: sparse columns and filtered indexes = more columns/more indexes)....

Have fun,
kt

OK - so this has been frustrating me for many months... when you create indexes with included columns (which was a new feature of SQL Server 2005), they're not shown by sp_helpindex or by DBCC SHOW_STATISTICS. I understand why they don't show for statistics: included columns are not factored into the histogram (that's only the high order element, which is the first column in the index) OR the density vector (which only shows the densities (or averages) for the left-based subsets of the key). So, why doesn't sp_helpindex show them? Well... I guess it just didn't get updated for SQL 2005. So, in SQL 2008, I was hoping I'd not only see included columns but also filtered indexes... well, neither is there and sp_helpindex is still the same old proc. Don't get me wrong, you can use SSMS to see all of the index properties for a single index (pane by pane, property by property) OR you can run queries to find the included columns for a given index:

SELECT
    (CASE ic.key_ordinal WHEN 0 THEN CAST(1 AS tinyint) ELSE ic.key_ordinal END) AS [ID],
    clmns.name AS [Name],
    CAST(COLUMNPROPERTY(ic.object_id, clmns.name, N'IsComputed') AS bit) AS [IsComputed],
    ic.is_descending_key AS [Descending],
    ic.is_included_column AS [IsIncluded]
FROM sys.tables AS tbl
    INNER JOIN sys.indexes AS i
        ON (i.index_id > 0 AND i.is_hypothetical = 0) AND (i.object_id = tbl.object_id)
    INNER JOIN sys.index_columns AS ic
        ON (ic.column_id > 0 AND (ic.key_ordinal > 0 OR ic.partition_ordinal = 0 OR ic.is_included_column != 0))
        AND (ic.index_id = CAST(i.index_id AS int) AND ic.object_id = i.object_id)
    INNER JOIN sys.columns AS clmns
        ON clmns.object_id = ic.object_id AND clmns.column_id = ic.column_id
-- Replace the placeholders with your own index, table and schema names (no brackets inside the quotes)
WHERE (i.name = N'MyIndex') AND ((tbl.name = N'MyTable' AND SCHEMA_NAME(tbl.schema_id) = N'MySchema'))
ORDER BY [IsIncluded], [ID] ASC

but, there isn't a nice clean way to show all of the included columns for all indexes for a particular table... until now :)
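As a rough illustration of the idea (a sketch, not the sp_helpindex2 code itself), the query below returns one row per index on a single table, with the key columns and included columns concatenated using the FOR XML PATH('') technique, which works on both SQL Server 2005 and 2008. The table name dbo.member is a placeholder. filter_definition exists only in SQL Server 2008's sys.indexes, so drop that column on 2005. Column names containing XML-special characters (such as &) would come back entitized.

```sql
-- Sketch: key columns, included columns and filter for every index on one table.
DECLARE @object_id int;
SET @object_id = OBJECT_ID(N'dbo.member');   -- placeholder table name

SELECT
    i.name AS index_name,
    i.type_desc AS index_type,
    -- key columns in key order; "(-)" marks a descending key column
    STUFF((SELECT N', ' + c.name
                  + CASE WHEN ic.is_descending_key = 1 THEN N'(-)' ELSE N'' END
             FROM sys.index_columns AS ic
             JOIN sys.columns AS c
               ON c.object_id = ic.object_id AND c.column_id = ic.column_id
            WHERE ic.object_id = i.object_id
              AND ic.index_id = i.index_id
              AND ic.is_included_column = 0
              AND ic.key_ordinal > 0
            ORDER BY ic.key_ordinal
              FOR XML PATH('')), 1, 2, N'') AS index_keys,
    -- non-key (INCLUDE) columns
    STUFF((SELECT N', ' + c.name
             FROM sys.index_columns AS ic
             JOIN sys.columns AS c
               ON c.object_id = ic.object_id AND c.column_id = ic.column_id
            WHERE ic.object_id = i.object_id
              AND ic.index_id = i.index_id
              AND ic.is_included_column = 1
            ORDER BY ic.index_column_id
              FOR XML PATH('')), 1, 2, N'') AS included_columns,
    i.filter_definition                       -- SQL Server 2008 only
FROM sys.indexes AS i
WHERE i.object_id = @object_id
  AND i.index_id > 0                          -- skip the heap row
  AND i.is_hypothetical = 0
ORDER BY i.name;
```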

A couple of weeks ago I sat down and rewrote sp_helpindex. I was actually on a plane from Hyderabad to Frankfurt or from Frankfurt to San Fran or from San Fran to Seattle (it was a long day :) and I was using (and well, forcing myself to learn how to use :) my new Vista laptop. OK, that's a HUGE story in and of itself and it definitely warrants its own post but I'll sum up the story with the fact that I had to purchase a new laptop while in Hyderabad because BOTH my primary laptop (T61p) AND my backup laptop (T60p) (yes, BOTH!!!) suffered catastrophic disk failures on their boot drives within 24 hours of each other. In the end, I really cannot believe the "coincidence" of two laptops crashing within 24 hours of each other. Yes, I thought MTBF too (at first) but the laptops were two Lenovos - one (the T60p) was purchased in Feb 2007 and the second, a T61p, was purchased in Oct 2007. And, it was the T61p that went first. The only thing I can even begin to speculate about and/or think to attribute it to (as I was in India for 17 days from Mar 3 through Mar 20 and this all started on Mar 17) was an overactive metal detector at the hotel at which I was staying (or something related to St. Patrick but I've since ruled that out - and no, I wasn't drinking green beer either...). OK, I really need to do another post to give you all of the details about this trip BUT, I did get a new laptop... and, having bought it only shortly before I flew back, I felt like I really needed to get my money's worth so I just *had* to work on the flights home (ah, security with *3* laptops was NOT fun and I'm *VERY* glad that none of them asked me to "boot" my laptops to prove they were working... that could have been a VERY bad situation... lol).

OK - so back to the story... I was working on the flights and I was preparing to deliver some content on the Friday after I returned (yes, I taught a full day in India on Wednesday then flew back leaving India at 2:15am Thursday morning so that I could arrive back in Redmond at roughly 7pm Thursday night - about 30 hours later - and then teach Friday morning for an 8:30 start time... ah, I was *really* tired on Friday night :). Anyway, in preparing, I decided that I finally needed to re-write sp_helpindex. When I was first writing it, I was only thinking of SQL Server 2005. So, here's the 2005 version that I wrote: sp_helpindex2_2005.zip (2.71 KB).

So, I had wanted to blog that when I got back to Redmond, but between preparing for the trip we're on now AND rebuilding my primary and backup laptops, well, it got tabled. So now, today, Paul and I are in Iceland (working with our great friends at Miracle Iceland); we taught "the Accidental DBA" this past Monday and are teaching SQL Server 2008 New Features in Database Infrastructure and Scalability (Tuesday through Thursday)... I was giving a lecture on Filtered Indexes in SQL Server 2008 and I, once again, found myself needing a better sp_helpindex. So, when Paul got up to talk about Compression (which is no short lecture for him :), I had time to rewrite sp_helpindex... again. And, here's what I ended up with...

exec sp_helpindex2 'member'

index_name | index_description | index_keys | included_columns | filter_definition
member_corporation_link | nonclustered located on PRIMARY | corp_no | NULL | NULL
member_ident | clustered, unique, primary key located on PRIMARY | member_no | NULL | NULL
member_region_link | nonclustered located on PRIMARY | region_no | NULL | NULL
NCIndexCoveringLnFnMiIncludePhone | nonclustered located on PRIMARY | lastname, firstname, middleinitial | phone_no | NULL
NCIndexCoversAll4Cols | nonclustered located on PRIMARY | lastname, firstname, middleinitial, phone_no | NULL | NULL
NCIndexLNinKeyInclude3OtherCols | nonclustered located on PRIMARY | lastname | firstname, middleinitial, phone_no | NULL
NCIndexLNOnly | nonclustered located on PRIMARY | lastname | NULL | NULL
QuickFilterTest | nonclustered located on PRIMARY | lastname | phone | ([lastname]>'S' AND [lastname]<'T')

So, in the end, I can quickly see whether or not my index has included_columns and/or a filter_definition. Don't get me wrong, these indexes above are NOT necessarily a good combination of indexes (or recommendation of ANY kind) to have - these were just created to make sure that my code works. And, as my good friend Gunnar would say - "it's not my best code but it's not my worst code either" <G>. And, so, here it is: sp_helpindex2_2008.zip (2.75 KB).
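For reference, an index like QuickFilterTest in the listing could be created with SQL Server 2008's filtered index syntax. This is a hypothetical reconstruction: it assumes the table is dbo.member and that the included column is phone_no (the listing shows "phone", while the other test indexes on this table use phone_no).

```sql
-- Hypothetical reconstruction of QuickFilterTest (SQL Server 2008 filtered index).
CREATE NONCLUSTERED INDEX QuickFilterTest
    ON dbo.member (lastname)
    INCLUDE (phone_no)
    WHERE lastname > 'S' AND lastname < 'T';  -- only rows in this range are indexed
```

SQL Server stores the WHERE clause in sys.indexes.filter_definition in its own normalized, bracketed form, which is what a helper like sp_helpindex2 can then display.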

Pretty darn useful for sure! Oh, and I used the undocumented sp_MS_marksystemobject so that I could still create the sp_ proc in master but then execute it in all other databases. It's frustrating that this behavior (with sp_ named objects) no longer works in 2005/2008, but at least sp_MS_marksystemobject still enables it, so we can create this one proc in master and use it in all other databases.
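Here's a minimal sketch of that pattern. The procedure name sp_helpindex_demo, its body, and the database/table names are hypothetical placeholders (the real sp_helpindex2 is in the zip above). Keep in mind that sp_MS_marksystemobject is undocumented and unsupported, so test it on your own build before relying on it.

```sql
USE master;
GO
-- Hypothetical stand-in for sp_helpindex2: lists the indexes of one table.
CREATE PROCEDURE dbo.sp_helpindex_demo
    @objname nvarchar(776)
AS
    SELECT name, index_id, type_desc
    FROM sys.indexes
    WHERE object_id = OBJECT_ID(@objname);
GO
-- Undocumented: mark the proc as a system object so that, when it is called from
-- another database, sys.indexes and OBJECT_ID resolve in the caller's database.
EXEC sys.sp_MS_marksystemobject N'sp_helpindex_demo';
GO
USE credit;  -- hypothetical user database containing dbo.member
GO
EXEC sp_helpindex_demo N'dbo.member';
```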

Have fun!
kt

A couple of weeks ago, Paul and I recorded two interviews with TechNet Radio... both are ready for download and in multiple formats! 

Our specific interviews can be downloaded from the following links/formats:
  SQL 2008 Part 1 of 2: Security and Availability WMA | MP3 High | MP3 Low
  SQL 2008 Part 2 of 2: Management, Troubleshooting and Throttling  WMA | MP3 High | MP3 Low
  More TechNet Radio interviews (and *lots* of other shows), can be found on Channel 9.

Enjoy! 
kt

Categories:
Events | Resources | SQL Server 2008

(Cross-posted on Paul and Kimberly's blogs)

With the Spring SQL Server Connections show coming up next month, it's time to start planning for the Fall show. Once again, Kimberly and I will be co-Chairs of the Conference and we'd like to invite you to submit abstracts for the Fall 2008 SQL Connections conference, to be held in Las Vegas, November 10-14th, 2008.

The conference will take place shortly after the SQL Server 2008 launch (when it actually RTMs, not the 'launch' that happened February 27th), and will focus heavily on SQL Server 2008. Abstracts are still welcome on best practices for SQL Server 2005 and how to upgrade and migrate applications from SQL Server 200x to SQL Server 2008. However, we will consider all topics - from design to performance to troubleshooting to architectures to programming. Also, we're likely to closely follow our past conference format with three primary themes/tracks: Infrastructure and Scalability, Programming, and Business Intelligence. You should aim for the intermediate to advanced audience for any SQL Server 2005 materials and for SQL Server 2008 we'll consider any level - especially depending on the newness of the release.

Note: You will not be able to speak at a DevConnections show if you are also presenting at a competitor’s show, in the same state, within 30 days of the DevConnections show. To be more specific, if you wish to speak at SQL Connections or any other DevConnections show this Fall in Las Vegas, you are precluded from speaking at any competing conferences in the state of Nevada between October 10, 2008 and December 13, 2008.

For submitting session abstracts, please use this URL:  http://www.deeptraining.com/devconnections/abstracts

The tool will be open from March 10th to midnight EST April 9th, after which we won't accept any abstracts. If you have used this site before and have forgotten your password, you can have your secret word mailed to you. It's better to do this than to create a new account. Please keep the abstracts under 200 words each and in one paragraph. Please do not use bulleted items or line breaks, and please use a spell-checker. If you have an issue with the site, please follow up with an email to paul@sqlskills.com.

Please create a new speaker record (if you haven't submitted before) and submit at least 3 abstracts; however, it will help your chances of being selected if you submit 5 or 6 abstracts. There's one slight change for this conference only - we won't have a fourth track for speakers just delivering a single conference session, so if you only submit a single session abstract, you're unlikely to be picked.

What you will get if selected:

  • $500 per conference talk. (Additional compensation for pre/post conference workshops.)
  • Coach airfare and hotel stay paid by the conference
  • Free admission to all of the co-located conferences
  • Speaker party
  • The adoration of attendees :-)
  • etc.

Thanks for your continued support for our SQL Connections conference and we look forward to seeing your bright and shiny new abstracts!

Thanks,
Kimberly L. Tripp & Paul S. Randal
SQL Connections Conference Chairs

Categories:
Events | SQL Server 2008

I know that Paul and I recommended that you subscribe to Conor's blog... but have you? He's posted some great details on Partitioning (Part 1 and Part 2) as well as statistics and it always reminds me of how much I can learn from other people's perspectives!

And, just to dove-tail on some of his statistics comments... I, too, have found that as tables get significantly larger AND have non-standard distributions of more than 200 distinct values (and un-even distribution between those values as well), that the optimizer just cannot possibly do a perfect job. The only way an optimizer can be good is when it can "find a good plan fast" (which I first heard from Nigel Ellis (former Development Manager of the Query Processor team) - back when he delivered a Pacific Northwest SQL Server User Group meeting many moons ago). The most important thing to realize is that it's just not possible to waste time to find the absolutely best plan... mathematically analyzing all permutations would be prohibitive - you'd have to take a vacation between query executions (wait, that's not a bad idea... I digress :).

The point:

  1. Make sure that statistics are up-to-date (either through the database option: auto update stats OR by manually updating statistics)
  2. Consider re-evaluating statistics over large tables (and, when poor performance occurs - look at the estimated rows v. the actual rows - if the estimate/actual are off by a factor of 10, then it could be the statistics). I'd try updating stats first and then if that doesn't work, updating with a fullscan. If neither of those work, I'd also re-evaluate other possible indexes (there are some distributions between tables being joined that just can't show a correct correlation between the values when in multiple indexes... sometimes the best index is a multi-column (i.e., composite) index). 
  3. Consider breaking very large tables down into smaller chunks (not just table index partitioning but possibly Partitioned Tables AND Partitioned Views) as this can give the optimizer additional details about particularly interesting data sets. Even in SQL Server 2008, statistics are still table-level (filtered indexes can provide some, but not complete, relief... I'll give more details in a later post) but I'd often argue that some of the best table designs are not just for a single table. Consider the statistical, locking, and indexing implications for mixed workloads against a single table (and the tremendous amount of blocking that could occur in addition to varying access patterns). And, even while 2008 will offer Partition-level lock escalation, well-designed tables may not need it! I know I've mentioned this before but different perspectives on statistics, optimizers and the fact that a good optimizer has to be efficient in-and-of itself, remind me of some of the most basic things that are also the most common problems contributing to poor query performance.
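Points 1 and 2 translate into a handful of commands. This is a sketch against a hypothetical table dbo.member and a hypothetical query. SET STATISTICS PROFILE ON returns, for each plan operator, both the actual Rows and the EstimateRows, which is one way to spot estimates that are off by a factor of 10 or more.

```sql
-- Hypothetical table: dbo.member. Substitute your own.
-- When was each statistic on the table last updated?
SELECT s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.member');

-- First try: update all statistics on the table using the default sample.
UPDATE STATISTICS dbo.member;

-- If estimates are still badly off, rebuild the statistics by reading every row.
UPDATE STATISTICS dbo.member WITH FULLSCAN;

-- Compare estimated vs. actual rows per operator (Rows vs. EstimateRows columns).
SET STATISTICS PROFILE ON;
SELECT member_no, lastname, firstname
FROM dbo.member
WHERE lastname = 'Smith';          -- hypothetical query
SET STATISTICS PROFILE OFF;
```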

Returning to the basics and optimizing a system from the ground up always leads to better scalability!

Enjoy!
kt

 

OK, I still have a way to go in learning about data compression in SQL Server 2008 but one thing that I do know is that nothing is free. So, the trade-off will be performance (i.e. CPU) v. space. And, that's not really a new trade-off with respect to compression. Sometimes that trade-off has other benefits that still minimize the overall cost (for example, backup compression compresses in memory, before the data goes to disk... this actually makes the overall backup process faster because the actual backup written to disk is smaller). However, if we're talking about data and data access, then we need to think more about how the data is going to be used as well as the impact on performance. I can definitely think of many reasons to compress older (and read-mostly, if not read-only) data (mostly due to volume) but depending on the queries and the cost to uncompress it (based on the volume of data being accessed), I'm going to do a lot of testing before I compress high performance/OLTP data. To help estimate the savings on space, SQL Server 2008 offers a stored proc: sp_estimate_data_compression_savings.
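For example, a call like the following estimates PAGE compression savings for every index and partition of a table. dbo.member is a placeholder; pass N'ROW' instead to compare row compression.

```sql
-- SQL Server 2008: estimate space savings before actually compressing anything.
EXEC sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'member',   -- placeholder table
    @index_id         = NULL,        -- NULL = all indexes
    @partition_number = NULL,        -- NULL = all partitions
    @data_compression = N'PAGE';     -- or N'ROW' / N'NONE'
```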

Compression in SQL Server 2005
SQL Server 2005 offers the ability to have read-only data compressed using Windows NTFS file compression. File compression is only supported for secondary (non-primary) data files and only when they're set to read-only. If the entire database is set to read-only then all files (incl. the primary and log) can be on compressed drives. While this is supported, and it can make sense when you have large amounts of historical data, it's still not very granular.
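On the SQL Server side, the key step is marking the secondary filegroup read-only; the NTFS compression itself is applied to the file outside of SQL Server. A sketch, assuming a hypothetical database Sales with a secondary filegroup History (changing a filegroup's read-only state may require that no other users are connected to the database):

```sql
-- Hypothetical database Sales with a secondary filegroup History holding old data.
ALTER DATABASE Sales MODIFY FILEGROUP History READ_ONLY;
GO
-- Confirm which files are now read-only before compressing them with NTFS.
SELECT name, physical_name, is_read_only
FROM Sales.sys.database_files;
```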

The other form of compression in SQL Server 2005 was introduced in SP2 as data compression for the decimal/numeric data types, called vardecimal. First, you enable compression at the database level and then you turn it on at the table level. Vardecimal saves the most space when your actual values are generally much smaller than the defined/declared decimal/numeric column. For example, if you've chosen to define a lot of columns with precision/scale (38,4) then as a decimal column each value (per column, per row) will take 17 bytes whether you use all of it or not. If you only store the value 87.5 (which would normally take only 5 bytes as a decimal(3,1)) then you're wasting 12 bytes. This form of compression will still be supported in SQL Server 2008 so if you're interested in how the vardecimal type works, check out this whitepaper. As for the new forms of compression... row-level compression is similar to vardecimal, but the other forms are quite different, and very interesting (especially the page-level dictionary compression)!
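The two steps described above (database level, then table level) look roughly like this. It's a sketch assuming SQL Server 2005 SP2 or later on an edition that supports vardecimal (Enterprise/Developer), with a hypothetical database Sales and table dbo.Orders.

```sql
USE Sales;   -- hypothetical database
GO
-- Optional: estimate how much each row of the table would shrink.
EXEC sp_estimated_rowsize_reduction_for_vardecimal N'dbo.Orders';
GO
-- Step 1: enable the vardecimal storage format for the database.
EXEC sp_db_vardecimal_storage_format N'Sales', N'ON';
GO
-- Step 2: enable it for an individual table.
EXEC sp_tableoption N'dbo.Orders', N'vardecimal storage format', 1;
```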

Compression in SQL Server 2008
In addition to offering support for NTFS file compression and vardecimal, SQL Server 2008 offers row-level compression or page-level compression (which includes row-level compression) AND it offers the ability to turn these on at the partition-level or at the table-level for all partitions. While I think the per-partition option is excellent, you might still want to separate your OLTP and read-only data into separate tables for other benefits (like online index operations which I mentioned here) but, the "table-level only" options are certainly starting to decrease! And, more granular options always mean better manageability.
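A sketch of the per-partition option, assuming a hypothetical partitioned table dbo.Orders where partition 1 holds older, read-mostly data and partition 2 holds more volatile data:

```sql
-- SQL Server 2008: PAGE-compress an older, read-mostly partition...
ALTER TABLE dbo.Orders REBUILD PARTITION = 1
    WITH (DATA_COMPRESSION = PAGE);
-- ...and ROW-compress a more volatile one.
ALTER TABLE dbo.Orders REBUILD PARTITION = 2
    WITH (DATA_COMPRESSION = ROW);
GO
-- ALTER TABLE affects the heap or clustered index only; nonclustered indexes are
-- compressed separately with ALTER INDEX ... REBUILD WITH (DATA_COMPRESSION = ...).
SELECT index_id, partition_number, data_compression_desc
FROM sys.partitions
WHERE object_id = OBJECT_ID(N'dbo.Orders')
ORDER BY index_id, partition_number;
```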

So, how does compression work in SQL Server 2008:
   Paul wrote about backup compression here.
   Sunil wrote about data compression here and here.
   Chad Boyd wrote about both here.
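As a quick taste of backup compression, here's a sketch with a hypothetical database name and file path. In SQL Server 2008, creating compressed backups requires Enterprise edition. The msdb history query compares the logical backup size with the compressed size actually written to disk.

```sql
-- Hypothetical database and path.
BACKUP DATABASE Sales
    TO DISK = N'C:\Backups\Sales_compressed.bak'
    WITH COMPRESSION, CHECKSUM, INIT;
GO
-- Compare uncompressed vs. compressed size for the most recent backups.
SELECT TOP (5) database_name, backup_finish_date,
       backup_size, compressed_backup_size
FROM msdb.dbo.backupset
WHERE database_name = N'Sales'
ORDER BY backup_finish_date DESC;
```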

Paul and I will post more on compression... I really want to get some numbers regarding performance and Paul will dive into all of the internals using DBCC PAGE (go figure! :).

Enjoy!
kt

Does it sound too good to be true: transparent data encryption? Well, it kind of is and kind of isn't. Let me explain. Transparent means that the application developer doesn't do anything (and if you have a third party app in which you can't do anything anyway, then this is even more important). Once enabled, the data is seamlessly protected.

But, protected from what? That's the part that needs a bit of explaining...

We all talk about the importance of physical security (I hope this doesn't sound new!). We all talk about the importance of protecting the physical media to which you've backed up your database(s). We all talk about protecting the server/machine from physical access and of course, protecting the mdf/ldf files from unauthorized access if we copy them to other machines for attach. And, we all know that with all releases of SQL Server, up to and including SQL Server 2005 (with one exception - details coming up), our databases are susceptible to unauthorized access if someone gains access to the physical files (whether the actual data files and/or the backup). The one exception is if data within the database has been encrypted using SQL Server 2005 column level encryption. As long as your database master key (created to allow encryption within the database) is strong and protected, then the data that is encrypted with it is also protected (you're only as good as your key protection mechanism). However, any data that is not encrypted is fair game to any system administrator who simply attaches the database or restores it. This is not good. However, it is reality in general. Physical security is incredibly important.

Now, having said all of that, SQL Server 2008 is going to change the landscape a bit. SQL Server 2008 is going to offer a way of protecting this "resting" data - transparently. Basically, once this is turned on, your data is protected from an unauthorized attach OR, when backed up, an unauthorized restore.
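For reference, here's a sketch of the steps as the feature eventually shipped in SQL Server 2008 RTM (Enterprise edition). The database Sales, the certificate name TDECert, the file paths and the passwords are all placeholders. Backing up the certificate matters: without it (and its private key), an encrypted database or backup can't be attached or restored on another server, which is exactly the protection described above.

```sql
USE master;
GO
-- 1. Database master key in master (placeholder password).
CREATE MASTER KEY ENCRYPTION BY PASSWORD = N'<UseAStrongPasswordHere>';
GO
-- 2. Server certificate that will protect the database encryption key.
CREATE CERTIFICATE TDECert WITH SUBJECT = N'TDE certificate for Sales';
GO
-- Back up the certificate and its private key immediately (placeholder paths).
BACKUP CERTIFICATE TDECert
    TO FILE = N'C:\Keys\TDECert.cer'
    WITH PRIVATE KEY (FILE = N'C:\Keys\TDECert.pvk',
                      ENCRYPTION BY PASSWORD = N'<AnotherStrongPassword>');
GO
USE Sales;   -- hypothetical user database
GO
-- 3. Database encryption key, protected by the server certificate.
CREATE DATABASE ENCRYPTION KEY
    WITH ALGORITHM = AES_128
    ENCRYPTION BY SERVER CERTIFICATE TDECert;
GO
-- 4. Turn encryption on; a background scan encrypts the existing pages.
ALTER DATABASE Sales SET ENCRYPTION ON;
GO
-- Monitor progress: encryption_state 2 = in progress, 3 = encrypted.
SELECT DB_NAME(database_id) AS database_name, encryption_state, percent_complete
FROM sys.dm_database_encryption_keys;
```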

This feature is not in the current CTP but I hope to see it in the next publicly available CTP (which will hopefully be out soon). So, when we can finally see it, I'll give you more details such as how to implement it, whether or not it negatively/positively impacts other features and the impact on performance.

Thanks for reading!
kt

Categories:
SQL Server 2008

With the November 5-9th SQL Connections conference in Las Vegas around the corner, Shirley Brothers, the Connections Conference Manager, would like to start planning the Spring show. And, for the Spring show forward, Paul and I will be the co-chairs of the SQL Connections conference. We would like to invite you to submit abstracts for the Spring 2008 SQL Connections conference, to be held in Orlando, April 20-24th, 2008.

The conference will take place shortly after the SQL Server 2008 launch (Feb 27, 2008), and will focus on best practices for SQL Server 2005, how to upgrade and migrate applications from SQL Server 200x to SQL Server 2008 and new features to leverage in SQL Server 2008. However, we will consider all topics - from design to performance to troubleshooting to architectures to programming. Also, we're likely to closely follow our past conference format with three primary themes/tracks: Infrastructure and Scalability, Programming, and Business Intelligence. You should aim for the intermediate to advanced audience for any SQL Server 2005 materials and for SQL Server 2008 - we'll consider any level - especially depending on the newness of the feature.

For submitting session abstracts, please use this URL: 
http://www.deeptraining.com/devconnections/abstracts

Please keep the abstracts under 200 words each and in one paragraph. Please do not use bulleted items or line breaks, and please use a spell-checker. If you have an issue with the site, please follow up with an email as well to paul@sqlskills.com.

Please create a new speaker record (if you haven't submitted before) and submit at least 3 abstracts; however, it will help your chances of being selected if you submit 5 or 6 abstracts. We need to have all your abstracts by October 26th. 

Finally, please see some important notes about your sessions at Connections conferences. These comments are from our Connections Conference Manager - Shirley Brothers:

=========================================================================
I would like to thank each and every one of our Connections speakers for helping us make this event so successful. Last year in Vegas we had just over 4,700 attendees; we hope to do that or better this fall. By the end of September we had over 3,000 registrations for our Fall show.

A successful show happens when you have a combination of things come together: great speakers, good venue, great partners, fun events, and ongoing relationship building. I hope we can all continue to work together to make Connections the very best event outside of Tech Ed, for years to come. As our show has become more successful, our competitors are trying to compete by moving their shows closer to our dates and in some cases, to the same cities.

It’s very disheartening for me to see our speakers presenting similar topics at competing shows that are scheduled so close to our own shows. So I want to make a small change in how we handle speakers. Essentially, I don’t want to schedule a speaker at a Connections show who is also presenting at one of our competitor’s shows, in the same state, within 30 days of a Connections show. Many of you have known me for years and you know that I would never discourage a speaker from doing something that is good for his/her career and company. I have never asked a speaker not to speak for a competitor and I am NOT saying that now. I am saying that if another show sits within 30 days of ours in the same state, that the speaker should choose to speak at one or the other conference, but not both. If for whatever reason a speaker does not choose Connections during that time period, there will NOT be hard feelings and they can speak for one of our shows at another time. 

I want Connections conferences to be different and unique in the minds of our attendees, our sponsors, and our speakers. I have given this a lot of thought and I think the best way to produce our Connections shows is not to have any of our speakers presenting at our show and at a competitor’s show a few weeks earlier or later in the same place.

Shirley
=========================================================================

Thanks for your continued support for our SQL Connections conference and we look forward to seeing your bright and shiny new abstracts!

Thanks,
Kimberly L. Tripp & Paul S. Randal
SQL Connections Conference Chairs

Categories:
Events | SQL Server 2008

OK, so after SQL Connections in Las Vegas, Paul and I head off to Barcelona for the second week of TechEd's two week event (week one for developers and week two for IT professionals). November's definitely a busy month. So, if you're in the US - we hope to see you at Connections and if you're in Europe, we hope to see you at Microsoft TechEd ITForum 2007.

Here's what we're doing:

Sessions

  • (DAT205) The Next Release of Microsoft SQL Server: Manageability Overview

The next release of SQL Server will contain exciting new manageability features targeted at reducing total cost of ownership. Come learn more about what's in store in areas like policy-driven administration and performance data collection and analysis. The session focuses on the database engine.

  • (DAT301) SQL Server Indexing - Unravelling the Unknown

Knowing tips and tricks to indexing is extremely helpful and will help you to solve "known" query problems. But what's lurking in the unknown? Is SQL Server using your indexes? Or, do you have a bunch of indexes sitting around wasting space and negatively impacting performance? Finally, SQL Server 2005 has an answer! SQL Server 2005 DMVs (Dynamic Management Views) can provide you with valuable information about your current indexing strategies, what should be removed, and even what's missing. Do you know how to find this information, leverage it, and then programmatically respond to it? Come to this session to learn a few tips and tricks as well as how to figure out the unknown indexing problems!

  • (DAT305) Secrets to Fast Detection and Recovery from Database Corruptions

How can you tell whether your data is corrupt? If you have corruption, how do you work out what’s wrong with the database? How do you ensure you have a valid backup? If you don’t have a valid backup, how and what do you repair? If you do have a backup, how do you work out whether you should restore or repair? And at what granularity? How do you go about determining what went wrong in the first place? It’s all about limiting downtime and data-loss when a corruption occurs - from knowing the tools to understanding the choices to planning a successful strategy. Some of the features discussed:

  • Torn-page detection and page checksums
  • IO read-retry
  • Backup checksums
  • Consistency checks (DBCC CHECKDB and related commands)
  • Database repairs

Facing database corruption is almost inevitable in every DBA's career - make sure you're prepared when it happens to you.

Chalk-talks

  • (DAT01-IS) SQL Server Upgrade Best Practices, Tips, and Tricks

Even though SQL Server 2005 has been out for a while, many companies are only just getting ready to upgrade. Come to this session to learn best practices, tips, and tricks distilled from two years of customer experiences. We'll also touch on some issues you'll face going to SQL Server 2008 when it's released next year. Come to this session to learn and share - bring your questions and experiences!

  • (DAT02-IS) SQL Server 2005 Database Mirroring: Setup to Implementation to Monitoring

Database Mirroring is one of the most exciting technologies in SQL Server 2005. With more and more people including it in their Disaster Recovery Strategies, it's important to know when to implement Database Mirroring as well as the implications of the architecture you choose. In this session there will be no slides, just demos that will explore how Database Mirroring works in its various configurations and how that may affect your performance. Join us and see database mirroring in action and get your questions answered!

  • (DAT07-IS) DBCC Internals

All DBAs should have heard of (and used) DBCC – especially for consistency checking. Get down deeper than 400-level with this session on how some of the most important DBCC commands work. Topics covered include CHECKDB, SHRINKFILE, INDEXDEFRAG, and more.

Instructor-led labs

  • (DAT03-ILL) SQL Server Always On Technologies Instructor-Led Lab: Part 1 - Database Mirroring

See Database Mirroring in action! From implementation to monitoring to failover, database mirroring provides an ideal solution for many disaster recovery scenarios and this session will prepare you to handle them with minimal downtime or data loss.

  • (DAT04-ILL) SQL Server Always On Technologies Instructor-Led Lab: Part 2 - Database Snapshots

Database Snapshots are useful in many situations: database maintenance, data recovery, and point-in-time data access. You can even create a snapshot on a mirror database to get better return-on-investment (ROI) on your high-availability (HA) investments. In this session we will explain how database snapshots work as well as go through several exercises, including working with multiple database snapshots and creating database snapshots on a mirror database.

  • (DAT05-ILL) SQL Server Always On Technologies Instructor-Led Lab: Part 3 - Online Operations

The bane of any DBA's life is to have to take data offline to perform maintenance or recover from a disaster. The various Online Operations in SQL Server 2005 alleviate much of this frustration. This session will show you how to move a table online for better isolation and control, partition a table online, access a database that's partially damaged, and perform online piecemeal restore.

Lunchtime Demos

  • (DAT01-PD) Database Recovery Techniques

In this fast-paced demo session, nasty things will be done repeatedly to databases. Then the methods and approaches to recovery will be shown. Not for the faint-hearted!

So, just like SQL Connections the week before, serious amounts of info with tips and tricks for you to take home and implement!

It's going to be a great week. We hope to see you there!!

Kimberly (and Paul)

OK, it's about that time again - the Fall conference season is here! Building on our co-presented Database Maintenance workshop at SQL Connections in Orlando, Paul and I are doing a *ton* of stuff at SQL Connections this Fall. The conference is back at the Mandalay Bay hotel and officially runs from November 5th to 8th, with pre-con workshops on the 5th. But after the Spring conference, we decided to add so much recovery content to our maintenance content that we moved the maintenance content to a pre-pre-conference workshop on the 4th. Then, on the 5th, we have all-new content on Disaster Recovery and Lessons Learned. AND, after the week of sessions, we decided to add a HANDS-ON (bring your own laptop) post-con workshop on the 9th! Our day off is Tuesday the 6th as it's Microsoft day... with a session line-up that looks great, including lots of juicy details about SQL Server 2008 as well as some best-practices sessions for those of you who are happy with SQL Server 200x for now.

Here's what we're doing:

Workshops

  • November 4th - Pre-pre-con: SPR301: SQL Server Database Maintenance: From Planning to Practice to Post-Mortem

No matter how much effort you spend on the design of your database, if you don't maintain it in production then it will suffer from performance and manageability problems. The key to continued performance and smooth operations is a well thought-out and automated database maintenance plan. This full-day workshop has three sections: planning, practice, and post-mortem. Planning for database maintenance actually starts with database design, so one of the things covered will be how to avoid design choices that limit database maintenance or contribute to maintenance problems. We'll discuss a laundry-list of maintenance problems and then explore how to tell if you need to mitigate them, strategies and best-practices for doing so, and how to avoid having your mitigation choices cause unforeseen and undesirable side-effects. Topics covered will include database files (shrink, grow, virtual log files, log size/management), consistency checks and corruption detection, fragmentation, statistics, backup/restore (options, granularity, strategies) and recovery models. The workshop will vary between 200-400 level covering ALL the key concepts of maintenance features. Finally, we'll spotlight some real-world examples where people made good and bad choices and discuss how you can repeat or avoid them, respectively. If you're wondering how to bring your database back under control, and keep it there, then this full-day workshop will help you tame maintenance problems whether you're a full-time system administrator or a reluctant DBA.

  • November 5th - Pre-con: SPR303: SQL Server Disaster Recovery: From Planning to Practice to Post-Mortem

Every DBA's nightmare is having down time and data loss and not knowing how to recover. However, designing and implementing a successful disaster recovery strategy is easier said than done. It's about asking all the right questions and figuring out all the best answers for your situation. This full-day workshop has three sections: planning, practice and post-mortem. Planning is a critical part of disaster recovery, but also the most often disregarded. Topics we'll cover here include: How do you choose technologies to fit requirements and effectively use key features of SQL Server 2005? How do technology choices affect workload performance? Putting a well-thought out plan into practice requires even more planning and in this section we'll discuss technology implementation, building step-by-step recovery/operation guides for when disasters happen, and, most importantly, testing your plan by simulating real problems. In the final section, we'll spotlight some real-world examples where people made costly mistakes and show you how they could have been avoided with a little planning and practice. If you've ever had nightmares about disaster recovery (or actually had a disaster!) and been at a loss for what to do, then this full-day workshop will give you the direction and technical details you need for success!

  • November 9th - Post-con: SPS302: SQL Server - Put Your Knowledge Into Action (Bring Your Own Laptop)

After a week of learning and watching demos, spend your last conference day putting your knowledge into action and diving deeper into the implementation details. Bring your own laptop to install our VPC environment, set up with hands-on lab exercises that walk you through some of the most important features in Database Maintenance and Disaster Recovery. All labs will be ILLs (instructor-led labs) with supporting hands-on lab content *and* you will walk away with your own copy of the DVD to continue the exercises back at your office. You can attend without a laptop but your experience will be significantly better with one! This is meant as an advanced workshop and you're expected to bring a reasonable laptop configuration in order to participate:

  • Virtual Server or Virtual PC - already installed
  • At least 1GB of physical memory w/512MB dedicated to the VPC environment (2GB is preferred w/1GB dedicated to VPC)
  • 12 GB of physical disk space (20+ GB is preferred)
  • DVD Drive

Sessions

  • SDB351: Follow the Rabbit - Interactive Q&A on Availability

In this session, Kimberly Tripp and Paul Randal will have only 5-10 slides. The focus of this session is on mixing availability technologies to create the best overall architecture to minimize downtime and data loss. In general, we're going to focus on best practices and then open up to your questions so that you can drive the discussion! This session might not seem as structured as other sessions, but you'll be surprised at how informative and fun it is! Grab your lunch and come back - we'll probably still be hanging out!

  • SDB350: SQL Server Table Strategies - Designing for Performance and Availability

Often tables are designed based solely on the data that needs to be tracked (here's a column name, here's a data type - done!). Unfortunately, design does not usually take into account how the data is going to be used OR how SQL Server uses the data. Knowing the internals of table structures as well as the optimizations that come with good design will make your database truly scalable. Come to this session to learn some internals as well as various design strategies such as vertical and horizontal partitioning. Additionally, are there any other features that require changes in your design and thinking? For example, online index operations impact design because of the limitations that exist with partitioning and LOB columns. If you want to scale, you need to be here!

  • SDB347: SQL Server Indexing for Performance - Finding the Right Balance

In terms of performance tuning, there are few silver bullets. If I had to choose ONE area that improves performance the most (when designed appropriately!), it's indexing. However, indexing strategies depend on the data and even more so, the usage of the data. Come to this session to see what indexing strategies help the base table the most as well as how to optimize your worst performing queries.

  • SDB348: SQL Server Indexing Strategies - Are You Sure?

Knowing tips and tricks to indexing is extremely helpful and will help you to solve "known" problems. But what's lurking in the unknown? Is SQL Server using your indexes? Or, do you have a bunch of useless indexes? Finally, SQL Server 2005 has an answer! SQL Server 2005 DMVs (Dynamic Management Views) can provide you with valuable information about your current indexing strategies, what should be removed, and even what's missing. Do you know how to find this information, leverage it, and then programmatically respond to it? Come to this session to figure it out!

  • SDB349: Follow the Rabbit - Interactive Q&A on the Storage Engine and the Relational Engine

In this session, Kimberly Tripp and Paul Randal will have only 5-10 slides. Each slide covers topics for discussion as well as the reason(s) why something might be behaving badly and/or things to try to solve your problems. In general we're going to focus on best practices and then open up to your questions so that you can drive the discussion! Paul will focus on the SE (Storage Engine) and internals and Kimberly will focus on the RE (Relational Engine) and query tuning/performance. This session might not seem as structured as other sessions, but you'll be surprised at how informative and fun it is!

At this event, you'll be able to get what most conferences offer (and that's breadth - in terms of session choices, etc.) but with the large number of workshops and the detailed planning that went into sequencing the conference sessions, you'll also get depth that no other conference offers. Serious amounts of tips and tricks for you to geek-out on with us and take home to immediately apply.

It's going to be a great week. We hope to see you there!!

Kimberly (and Paul)

OK, let me start by saying that I absolutely love when a feature improves in granularity options. Better granularity in locks means that contention is reduced and concurrency improved. And even though the overhead to manage smaller locks (and typically more of them) is usually higher, the improved concurrency benefits often significantly outweigh the costs. Additionally, design is often simplified as more granular locks typically mean you don't have to work as hard to minimize contention. Let me give you some history...

In the old days (ok, remember, I started working with SQL Server when I was 12 :) :), SQL Server used to have page-level locking (all releases prior to SQL Server 6.5 sp3). In SQL Server 6.5 sp3 they made an internal change to allow "insert row locks" but that was very targeted in what it improved (in terms of locking). However, in SQL Server 7.0 the locking architecture completely changed (as well as the SE and most of the RE) and that's where they introduced true row-level locking. This significantly reduced the complexity of table design. No longer did we have to choose clustered indexes specifically to avoid the insert hotspots that page-level locking created. And, in fact, some designs improved their performance solely by upgrading. The nice thing about internal changes like these is that they mean you can get away without knowing all of the internals, not worry as much about design and yet still get gains in performance. All of which is good.

However, if you do know the internals and you leverage this knowledge then you might be able to see even greater gains. With the change in locking from page to row (as well as other changes to the internal dependencies of non-clustered indexes on the clustering key), databases whose indexing strategies changed between 6.5 and 7.0 made the greatest gains in performance. How did they change? I've blogged about "the clustered index debate" a few times so I'll stay away from that one here... but the key point is that while these changes might allow you to do more with less work, a bit more work to truly leverage the new features/changes might result in the best combination!

And so, that's what brings me to partition-level lock escalation. This is an absolutely necessary step toward truly allowing SQL Server to treat partitions like mini-tables. Here are a few of the concerns I've had with regard to SQL Server 2005 table and index partitioning:

* lock escalation can still occur between the read-only and read-write portions of your partitioned table if the read-only portion is accessed by large queries that escalate (in SQL Server 2005 escalation is either row to table or page to table)
* indexes must be exactly the same for all partitions (not related to escalation but it does have bearing on my solution)
* index rebuilds are supported at the partition level; however, partition-level rebuilds must be performed OFFLINE. Only table-level index rebuilds can be performed online (again, not directly related to escalation but it's another problem around blocking)

And, this last one is very frustrating to me in general as I'm finding more and more environments moving to "real-time analysis" where they want to do queries on as-close-to-current data as possible. In fact, real-time data warehousing in a relational database is one of the primary areas of improvement for SQL Server 2008 with features such as partition-level lock escalation, improvements in indexed views, grouping sets, and star join optimizations - just to name a few.

So, in terms of partition-level lock escalation, am I happy that it's going to be there? For sure! However, the other two issues mentioned above might not change. Having different indexes at the partition level may become possible through a feature called "Filtered Indexes", which has not yet appeared in any CTPs but has been discussed at conferences/events. So, we might solve 2 out of 3, but what about online index rebuilds at the partition level? At this point, I'm pretty sure that they won't be able to solve that for SQL Server 2008... As a result, I would suggest a slightly different architecture. Instead of using only a single partitioned table for both read-only and read-write data, use at least two tables: one table for read-only data and another for read-write data. If you think this might be defeating the purpose of partitioning... then look at these benefits:

* the read-only portion of the table (typically the *much* larger portion of the table) can still be managed with partitioning
* the read-only portion - once separated from the read-write - can have additional indexes for better [range] query performance
* the read-only portion of the table can actually be partitioned into multiple partitioned tables - to give better per-table statistics (statistics are still at the table-level only so even if your partitioning scheme is "monthly" you might want to have tables that represent a year's worth of data...especially if your trends seem to change year to year)
* large range queries against the read-only portion of the data will only escalate to the "table" (which is now separated from the read-write data)
* the read-write portion of the data can have fewer indexes
* the read-write portion of the data can be placed on different disks (MORE fault tolerant disks) due to the importance/volatility of the data
* finally, and most importantly, the read-write portion of the data can be maintained completely separately from the read-only portion with regard to index rebuilds

So, then how do you make it appear as one table? Use partitioned views over partitioned tables and consider using a synonym for the hot/insert table. At the end of each month (or whatever period your partitioning strategy uses - daily, weekly, monthly, etc.), "switch" the read-write portion of the table into the read-only portion of the table. You should be able to do all of this with no data movement and the synonym used for inserts will mean that your applications don't need to change either.
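To make the idea concrete, here's a minimal sketch, written in Python only to print the T-SQL, of the three pieces: a UNION ALL view over a read-only table and a read-write table, a synonym that the application inserts through, and the month-end switch. All object names and the partition number are hypothetical. For the switch to succeed, the two tables need matching structures, the target partition must be empty and on the same filegroup, and the read-write table needs a CHECK constraint that fits the target partition's range. None of those prerequisites are generated here.

```python
# Hypothetical object names throughout; this only builds T-SQL text for review.

def partitioned_view_sql(view: str, cold_table: str, hot_table: str) -> str:
    """UNION ALL view that presents the read-only and read-write tables as one."""
    # Each member table also needs a CHECK constraint on the partitioning
    # column so the optimizer can skip the table a query doesn't need.
    return (
        f"CREATE VIEW {view} AS\n"
        f"    SELECT * FROM {cold_table}\n"
        "    UNION ALL\n"
        f"    SELECT * FROM {hot_table};"
    )


def insert_synonym_sql(synonym: str, hot_table: str) -> str:
    """Synonym the application inserts through, so it never names the hot table."""
    return f"CREATE SYNONYM {synonym} FOR {hot_table};"


def month_end_switch_sql(hot_table: str, cold_table: str, partition_number: int) -> str:
    """Metadata-only move of the read-write rows into an empty read-only partition."""
    return f"ALTER TABLE {hot_table} SWITCH TO {cold_table} PARTITION {partition_number};"


if __name__ == "__main__":
    print(partitioned_view_sql("dbo.Sales", "dbo.Sales_History", "dbo.Sales_Current"))
    print("GO")  # CREATE VIEW must be the only statement in its batch
    print(insert_synonym_sql("dbo.SalesInsert", "dbo.Sales_Current"))
    print(month_end_switch_sql("dbo.Sales_Current", "dbo.Sales_History", 13))
```

Because the application only ever inserts through dbo.SalesInsert, the month-end switch doesn't require any application change.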

In summary, I do like the partition-level lock escalation feature especially as it doesn't require rearchitecting your solution/design. However, by creating two or more tables where read-only data is isolated from read-write, you can leverage many other features (like online index rebuilds).

If you're more interested in hearing about lock escalation at the partition-level check out Paul's recent blog entry on it: SQL Server 2008: Lock escalation changes.

Have fun and thanks for reading!
kt

Yesterday, Microsoft released the July Community Technical Preview (CTP) for SQL Server 2008. The release makes available several performance and feature enhancements designed to support a mission-critical platform and enable pervasive insight across the enterprise. (ok, those last two sentences weren't mine :)

Check out http://www.microsoft.com/sql/prodinfo/futureversion/default.mspx for all of the latest info. You can download the SQL Server 2008 July CTP at: http://connect.microsoft.com/sqlserver/

Thanks and have fun!
kt

I had a customer ask me about the debate of skipping SQL Server 2005 entirely and just moving straight to SQL Server 2008... I haven't thought much about this as most of my direct customers have moved already BUT, I know there are a few out there that have not yet moved.

So, I definitely need to spend some time on SQL Server 2008 upgrade requirements, restrictions, etc. - and that's my plan in the not-too-distant-future BUT, I haven't thought much about it so far.

Your thoughts???

Cheers,
kt

Well, I'm not sure where May went but I do know that part of it went to my 2 weeks in Hyderabad, India at Microsoft IDC (the India Development Center) and what was left was spent getting ready for upcoming events such as TechEd (and some personal events too :). And, the first big hurdle of TechEd is over...yesterday, Bob and I delivered a full day preconference seminar on Leveraging SQL Server Always On Technologies to Achieve High Availability and Scalability and the day could have been a week long (ok, do any of you ever know of a one day event where we couldn't expand into more time! :). But, the precon event went really well and was great fun. We had a lot of questions and ended the day close to on time but then we stayed late (Bob and I - obviously, and Paul Randal, too) for questions. In fact, Paul blogged about something he learned: the black box trace. Check out his blog entry and tips for using the trace.

Today was all about exciting future releases and the one at the top of my list: Katmai - now officially announced as SQL Server 2008. There are certainly many more details coming soon on Katmai but for now, here are a few interesting items to read/watch:

Website: SQL Server Katmai website
Press release: Microsoft SQL Server “Katmai” Builds on Proven Success of SQL Server 2005, Empowering Customers to Manage the Data Explosion
Whitepaper: SQL Server 2008 Product Overview
Download CTP3: SQL Server 2008 CTP3

And, there are quite a few other resources available! So, start checking out the upcoming version so that you are ready when it's released! I'll have a lot more to say...shortly (relatively speaking :).

More to come from TechEd!

Cheers,
kt

NOTE/UPDATE: Be sure to read this post as well: Transaction Log VLFs too many or too few? after reading this one. Not only can you have too many small VLFs but if incorrectly sized, you can have too few! Enjoy!!

On a few of my last trips, I've had the pleasure of doing a lot of customer visits - visiting some of the larger implementations of SQL Server. Many of these implementations are at banks where both performance and recovery are critical. After my trip to Turkey (where I did customer visits prior to presenting at the Microsoft Professional Developers Summit), I received this email:

As you might remember we talked at xxxbank, Turkey about performance problems in transaction log backups and splitting tempdb data file into equal size chunks equal to number of CPUs. I have implemented both optimizations in my troubled server, and the results are great! Transaction log backups do not impact the server at all and with lowered congestion in tempdb, overall system performance has gone up as well.

So - this is my first of two blog entries targeting these two VERY typical problems:
* Transaction log fragmentation (both internal and external - this customer was having problems with internal fragmentation)
* Optimizing TempDB

We'll start with 8 Steps to Optimizing your Transaction Log - not necessarily in any order. It's best to review all of these to make sure you have a comprehensive view of how to improve both performance and recovery of your transaction logs!

1) Try to dedicate the transaction log portion of your database to its own physical disk. In a high-volume OLTP system, isolating the transaction log can allow the disk head to be ready for the next write because no other files contend for the physical disk resource. If your database already exists, the best way to "move" your transaction log is to detach your database (sp_detach_db) and then reattach it (sp_attach_db). Here's a great KB article titled: Moving SQL Server databases to a new location with Detach/Attach.
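If it helps to see the commands, here's a small Python sketch that just builds the detach/attach script. It assumes a database with exactly one data file and one log file. The database name and paths are hypothetical, and you'd move the .ldf to the new disk between the two commands. (On SQL Server 2005, sp_attach_db still works but is deprecated in favor of CREATE DATABASE ... FOR ATTACH.)

```python
# Builds T-SQL text only; the database name and file paths are hypothetical.

def tsql_literal(value: str) -> str:
    """Return value as a Unicode T-SQL string literal, doubling embedded quotes."""
    return "N'" + value.replace("'", "''") + "'"


def move_log_script(db_name: str, data_file: str, new_log_file: str) -> str:
    """Detach, then reattach pointing at the log file's new location."""
    db = tsql_literal(db_name)
    return "\n".join([
        f"EXEC sp_detach_db @dbname = {db};",
        "-- Move the .ldf file to the dedicated log disk before attaching.",
        f"EXEC sp_attach_db @dbname = {db},",
        f"    @filename1 = {tsql_literal(data_file)},",
        f"    @filename2 = {tsql_literal(new_log_file)};",
    ])


print(move_log_script("Sales", r"D:\Data\Sales.mdf", r"L:\Logs\Sales_log.ldf"))
```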

2) Defrag the disk(s) on which your transaction logs reside. This will get rid of external fragmentation of the transaction log - better known as disk file fragmentation. This will require that your server be taken offline but if your files have had a lot of autogrowth and/or they reside on a disk with a lot of other files that have been modified, then all of your files (incl. the transaction log file) are likely to be interleaved and fragmented. The good news is that this is a one-time operation (unless the files continue to grow and shrink excessively - but we'll fix that as well). The bad news is that you'll need to shut down SQL Server and then use some form of system tool - Microsoft or third party - to defrag. On Windows XP and Windows Server 2003 machines, the disk defragmenter tool can be launched by running: dfrg.msc.

3) Create only ONE transaction log file. Even though you can create multiple transaction log files, you only need one... SQL Server does NOT "stripe" across multiple transaction log files. Instead, SQL Server uses the transaction log files sequentially. While this might sound bad - it's not. If you want to get better performance out of the transaction log, place it on faster disks and/or a more performant disk (RAID) configuration. You could place the transaction log on a RAID 0 array but you'll likely want fault tolerance as well as performance. Isolation, as per number 1, is a good start but if you also need increased capacity then using hardware-based RAID to combine disks for both better performance and redundancy is your next choice. See 4.

4) Not only should you try to isolate the transaction log to its own physical disk but you should make sure that the logical/physical disk configuration is as efficient as possible. Try to use an isolated RAID 1 mirroring set if you don't need significant capacity. If you need a greater capacity OR you want better performance, consider a combination of RAID 0 and RAID 1 (either RAID 0 + 1 or RAID 1 + 0). While RAID 0 + 1 can often offer better performance, RAID 1 + 0 offers better reliability. If you're new to RAID and are interested in learning more - check out the RAID Tutorial here: http://www.raidarray.com/04_01_00.html.

5) Don't get so caught up in transaction log speed that you forget about availability. You'll also want to make sure that your transaction log is always available, as this can help you in times of disaster. Even if the data is damaged, if the transaction log is available and you have a series of backups up to and including the last transaction log backup, then you can make a final backup of the transaction log that will represent all of the changes since your last transaction log backup. If this backup is possible (it's called backing up the "tail" of the log), then you can achieve up-to-the-minute recovery. This is only possible when the log is available. By placing the log on mirrored disks you can increase your chances of recovering data and minimize data loss!
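Here's a minimal sketch of that recovery sequence, again as Python that prints T-SQL. It assumes a full backup followed by an unbroken chain of log backups with no differentials. All names and paths are hypothetical. NO_TRUNCATE is what lets the tail backup run even when the data files are damaged. Depending on the state of the database, the RESTORE DATABASE step may also need WITH REPLACE, so treat this as an outline rather than a ready-to-run script.

```python
# Outline of a tail-of-the-log recovery; names and paths are hypothetical.

def lit(value: str) -> str:
    """Unicode T-SQL string literal with embedded quotes doubled."""
    return "N'" + value.replace("'", "''") + "'"


def tail_log_recovery_script(db, full_backup, log_backups, tail_backup):
    steps = [
        # 1. Capture the log records written since the last log backup.
        #    NO_TRUNCATE allows this even when the data files are damaged.
        f"BACKUP LOG [{db}] TO DISK = {lit(tail_backup)} WITH NO_TRUNCATE;",
        # 2. Restore the last full backup, leaving it ready to accept log restores.
        f"RESTORE DATABASE [{db}] FROM DISK = {lit(full_backup)} WITH NORECOVERY;",
    ]
    # 3. Apply every log backup in order, finishing with the tail.
    for backup in [*log_backups, tail_backup]:
        steps.append(f"RESTORE LOG [{db}] FROM DISK = {lit(backup)} WITH NORECOVERY;")
    # 4. Run recovery and bring the database online.
    steps.append(f"RESTORE DATABASE [{db}] WITH RECOVERY;")
    return "\n".join(steps)


print(tail_log_recovery_script(
    "Sales", r"F:\Backup\full.bak",
    [r"F:\Backup\log1.trn", r"F:\Backup\log2.trn"], r"F:\Backup\tail.trn"))
```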

6) Create transaction log files with a reasonable initial size. When you create a database it's ideal to PRE-ALLOCATE both your data files and your transaction log file. A little bit of capacity planning goes a long way... Now, if you think you've got absolutely no idea how large a transaction log you're going to need, well - here are the things that have the greatest impact:

  • Type of activity - transaction processing or decision support
  • Frequency of that activity - the more frequent the changes, the faster the transaction log will grow
  • Recovery Model - the recovery model of the database
  • Frequency of transaction log backups
  • Whether or not replication is used (since the log reader relies on the transaction log)

I wish I could give you a rough idea on sizing but if you look only at database size and none of these other factors, you could end up with a transaction log that's seriously oversized or seriously undersized. I've seen recommendations of 10-25% of the size of the data and you can use that but I would also add a bit of common sense. A larger database with very frequent transaction log backups may not need a transaction log that's even 1% of the size of the data... The best way is to set up your development environment similar to that of your production environment (including backup jobs) and then see how the transaction log grows. If you have a lot of autogrowth (because your guess was wrong), you can later clean up the fragmentation that has occurred and get back to a reasonable, intact, and optimal transaction log. If you want to know more about Recovery Models and their impact on the transaction log, review a sample book chapter from an MSPress book (Microsoft® SQL Server™ 2000 High Availability) that I helped to co-author. I only helped out on a couple of chapters but this specific chapter explains a lot about the general database environment settings that impact recovery. You can download Chapter 9: Database Environment Basics for Recovery here: http://www.sqlskills.com/resources/SQLServerHAChapter9.pdf

7) Don't let autogrowth get out of control. As important as capacity planning is, you're not likely to be spot-on in your estimates. I don't recommend completely turning off autogrowth but in general I also don't like SQL Server 2000's default growth rate (or max size). In general, I would recommend setting ALL of the transaction log file's attributes: initial size, growth rate AND maximum size. For the growth rate, I recommend something that can be allocated somewhat quickly and something of a fixed size. In general, I recommend a value which is less than or equal to 1GB (based on total size) but large enough that you won't need to autogrow again soon. So, for databases whose transaction logs are under 1GB, you might set autogrow to somewhere between 20 and 100 MB. For those of you who have transaction logs measured in GB, I'd set the autogrowth to 500MB or 1GB. In general, I don't like the percentage because it needs to be calculated (which I realize isn't really all that big of a deal) but the larger the file, the larger the autogrowth and the longer it takes. In SQL Server 2000, autogrowth can create blocking...so, it's best to minimize this in general.
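A quick way to see why fixed-size growth is easier to live with is to simulate both settings. This minimal Python sketch uses illustrative sizes, not measurements. It lists each autogrowth increment and also prints ALTER DATABASE statements that set all three attributes. The statements deliberately change one property per MODIFY FILE, which is the conservative form. The database and logical file names are hypothetical.

```python
# Illustrative simulation of autogrowth; sizes are examples, not recommendations.

def growth_increments(initial_mb, target_mb, fixed_mb=None, percent=None):
    """Return each autogrowth increment (MB) needed for the file to reach target_mb."""
    if (fixed_mb is None) == (percent is None):
        raise ValueError("specify exactly one of fixed_mb or percent")
    size = float(initial_mb)
    increments = []
    while size < target_mb:
        # A percentage is applied to the *current* size, so each growth is bigger.
        step = fixed_mb if fixed_mb is not None else size * percent / 100.0
        if step <= 0:
            raise ValueError("growth increment must be positive")
        increments.append(step)
        size += step
    return increments


def log_file_settings_sql(db, logical_name, size_mb, growth_mb, max_mb):
    """One MODIFY FILE per property: initial size, fixed growth rate, maximum size."""
    target = f"ALTER DATABASE [{db}] MODIFY FILE (NAME = N'{logical_name}'"
    return [
        f"{target}, SIZE = {size_mb}MB);",  # must be larger than the current size
        f"{target}, FILEGROWTH = {growth_mb}MB);",
        f"{target}, MAXSIZE = {max_mb}MB);",
    ]


for stmt in log_file_settings_sql("Sales", "Sales_log", 4096, 500, 20480):
    print(stmt)
```

For example, growth_increments(1000, 2000, percent=10) needs 8 growths, each larger than the last, ending at roughly 195MB. The same 10% on a 50GB log would add 5GB in a single growth.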

8) Check/fix your internal fragmentation. OK, so this is the one that really helped the customer in Turkey. Often, when transaction logs are not pre-allocated and/or when there have been a lot of autogrowths, the transaction log can become internally fragmented. Internally, your transaction logs are broken down into smaller, more granular chunks called VLFs (Virtual Log Files). The size and number of VLFs you'll have depends largely on the size of each chunk when it's added to your transaction log. If you add a new chunk of up to 64MB (20MB, for example) to the transaction log, whether through autogrowth or through manual growth, then 4 VLFs are added. If you add a chunk which is greater than 64MB but less than or equal to 1GB, you'll add 8 VLFs. If you add more than 1GB then you'll add 16 VLFs. In general, most transaction logs will only have 20 or 30 VLFs - even 50 could be reasonable depending on the total size of the transaction log. However, in many cases excessive autogrowths cause an excessive number of VLFs to be added - sometimes resulting in hundreds of VLFs. Having an excessive number of VLFs can negatively impact all transaction log related activities and you may even see degradation in performance when transaction log backups occur. To see how many VLFs you have, simply look at the number of rows returned by DBCC LOGINFO. The number of rows returned equals the number of VLFs your transaction log file has. If you have more than 50, I would recommend fixing it and adjusting your autogrowth so that it doesn't occur as frequently. To get rid of all of the excessive VLFs, follow these easy steps to shrink off the fragmented chunk and add a new, clean chunk to your transaction log:

1. Wait for an inactive time of day (ideally, it would be best to put the database into single user mode first) and then clear all transaction log activity through a regular transaction log backup. If you're using the simple recovery model then you don't need to do a log backup... Instead, just clear the transaction log by running a checkpoint.

BACKUP LOG databasename TO devicename

2. Shrink the log to as small a size as possible (truncateonly)

DBCC SHRINKFILE(transactionloglogicalfilename, TRUNCATEONLY)

NOTE: if you don't know the logical filename of your transaction log use sp_helpfile to list all of your database files.

3. Alter the database to modify the transaction log file to the appropriate size - in one step

ALTER DATABASE databasename
MODIFY FILE
(
      NAME = transactionloglogicalfilename
    , SIZE = newtotalsize
)

And, that should do it!
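To sanity-check a log's growth history against the rule of thumb in step 8, here's a minimal Python sketch. It assumes that a chunk of 64MB or less adds 4 VLFs, and it treats the log's initial allocation as its first chunk. The real count always comes from the rows returned by DBCC LOGINFO.

```python
# Estimates VLF counts from the chunk-size rule of thumb in step 8 (an approximation).

GB_IN_MB = 1024


def vlfs_added(chunk_mb: float) -> int:
    """VLFs added when a chunk of chunk_mb is added to the log (creation or growth)."""
    if chunk_mb <= 64:
        return 4
    if chunk_mb <= GB_IN_MB:
        return 8
    return 16


def total_vlfs(chunks_mb) -> int:
    """Estimated number of rows DBCC LOGINFO would return for this growth history."""
    return sum(vlfs_added(c) for c in chunks_mb)


def needs_cleanup(chunks_mb, threshold: int = 50) -> bool:
    """True when the estimate exceeds the 50-VLF guideline from the post."""
    return total_vlfs(chunks_mb) > threshold


# A 20MB log that autogrew 40 times by 10MB, versus a log resized in one
# 8000MB step (ignoring the few VLFs left behind by the shrink).
fragmented = [20] + [10] * 40
resized = [8000]
print(total_vlfs(fragmented), needs_cleanup(fragmented))  # 164 True
print(total_vlfs(resized), needs_cleanup(resized))        # 16 False
```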
