It's official! We're doing another 5-day SQL Immersion Event (our ever-popular 5-day Deep Dive into Internals, Performance and Maintenance) this year in the US!

We decided to have a class near Seattle because Washington is *fabulous* in August!

The class will cover our best content in these areas:

  • On-disk structures: how the data is stored
  • Index internals: how the data is organized
  • Logging and recovery: how the data is protected
  • Choosing the RIGHT Data Type
  • Table & Index Partitioning Strategies
  • Data Access
  • Indexing Strategies
  • Data and log file provisioning and management
  • Tempdb
  • Index and statistics maintenance
  • Using backup and restore (plus internals)
  • Consistency checking and disaster recovery

The event will be held in the Marriott Courtyard Bellevue Downtown and will be fully catered - there's a special room rate of $159/night and even a hotel room discount for the first 10 people to register and stay at the event hotel.

The full cost is US $3,100.00 with an early-bird special of US $2,500.00 for registrations before midnight PST June 18th.

Check out the Immersion Event page for full details, registration, and feedback from attendees of our event in Boston earlier this year (or jump straight to registration).

We hope to see you there!
kt

Categories:
Events

NOTE: I've released other versions of sp_helpindex since this post. Check out the category: sp_helpindex rewrites for all of the options!

OK, I first blogged about re-writing sp_helpindex here as sp_helpindex2 (April 2008). Shortly thereafter (Aug 2008), a reader found a bug and when I went digging, so did I (a couple of others). I did a few tweaks and the version that's stuck for the past couple of years is in this blog post: Updates (fixes) to sp_helpindex2.

However, as much as I've used that - I've always been frustrated by its output. And, so, over the months+, I wrote sp_helpindex3 (this solely added the index_id as the first column and then ordered the output; I've always hated the weird way in which sp_helpindex/2 ordered the output). Then, I started writing sp_helpindex4 which listed the clustering key columns with each index so that I could more easily identify columns on large tables without a lot of scrolling. Ultimately, I didn't like this because the clustering key (or some of the columns in a multi-column clustering key - might already be in the nonclustered index and so I still had to figure things out more manually). And, so I finally sat down and created what's jokingly being called sp_helpindex8.

Geeky note: Er, you might wonder with how many geeky SQL people I've joked about sp_helpindex8 but in fact, in my last few events (both onsite with a couple of customers as well as when teaching the Indexing Strategies and Performance Tuning portion of the Microsoft Certified Masters - SQL Server course (two weeks ago)), I've actually joked - and promised - that I'd have this blogged before this particular 3 week MCM rotation ends. And, well, that's this week. So, I've actually joked with quite a few folks on this and I'm close to the end of time on blogging it. So, here I am - sitting in a hotel room in Houston, after having presented at the Houston Area User Group (7-9pm) (yes, with Paul) and after having been onsite all day with a customer - blogging this darn thing. I just have to get it out there. ;-)

Having said that - I'm pretty happy with this version but it was pretty painful to write. And, before I get into what it does - and why I wanted this version - let me explain what I didn't do and why. There are a lot of other things that could be added. And, quite a few folks have said - why don't you do this or why don't you do that... Specific suggestions have been made such as adding index usage stats (from the current state shown by the DMV) or add index operational stats (to show if an index is suffering from latch IO waits, for example) or add index physical stats (to show if an index is fragmented), etc. And while many of these suggestions would make for lots of interesting uses of this output, I've stayed away from these. Why? Some are expensive to run - such as physical stats - even a limited scan can be painful on a really large table. Or, the results might not be able to be truly trusted  - such as with index_operational_stats. Why? Because the information from that DMV might only cover a short period of time. The information help by them is cleared whenever SQL Server is restarted (or if a database is taken offline or restored, etc.). So, when were those usage or operational stats last cleared? Moreover, is this single "look" at the DMV information really the best on which to base our decisions (even just in general).

In the end, I think that physical stats, operational stats and even usage stats - should be used wisely and carefully (definitely used but not by this). In fact, we often recommend that the best way to use these is by studying them over time and then analyzing their trends not just the point in time values. And, there are LOTS of posts that can help you with understanding this information; here's just a couple to get you started:

So, having said that - I've kept my sp_helpindex8 more pure ;-). What I want to see is EXACTLY what's stored in that index. In fact, this has always been my main goal. The first version (sp_helpindex2) added INCLUDED columns (2005+) as well as filter definitions (2008) to the output. So, what was missing? The added columns that SQL Server might have to add if a nonclustered is nonunique and/or when the clustering key is nonunique. The reason this is so important to me is that there are many things coming together and recommending indexes (DTA, the missing index DMVs, application tools, etc.) and while many of these tools are solely there to "help" us get better performance and "help" us create more effective indexing strategies - these tools sometimes recommend the same index over and over again (see Paul's recent blog post: Missing index DMVs bug that could cost your sanity... about the bug in the missing index DMVs) OR the tools might recommend similar indexes to what we already have.

Additionally, the output from sp_helpindex/2 only shows the columns that are EXPLICITLY defined in the key and in the leaf (using INCLUDE) but they don't show what SQL adds for you. Most folks know that SQL Server adds the clustering key to the nonclustered indexes and when you have a simple single-column clustering key it's not difficult to know that this IS "there" in your index even though it doesn't show up. But, even with that, I think there are still a few things that are lesser known. For example, if your nonclustered index is nonunique - did you know that the clustering key is not only added to the leaf level of the index (like INCLUDE does) but that the clustering key is added into the tree (as part of the key and ordering?). And, if have one index that lists this clustering key explicitly and another that doesn't - are you actually going to be able to easily tell that these are in fact the SAME. Ultimately, I want you to be able to better recognize what's REALLY in that index. What you might end up seeing is that two indexes are more similar than you first thought and as a result - maybe you can drop one of them. Or, you might be able to combine them to reduce your overall indexes (I call this index consolidation). And, all of this becomes significantly more challenging when you have more complex clustering keys (and, yes, they do exist!). The other thing I did is delimit the columns and index names properly using brackets. This is also nice because you might have identifiers that make the output of sp_helpindex/2 hard to read. This should help. However, I do have to admit that you probably have other problems if your index/column names have commas, spaces or brackets in them. But, this should still help/work.

And, that's why I wrote "sp_helpindex8" it's a MUCH better version of sp_helpindex but it requires a few other components. And, it was a lot more challenging to write. Basically, this information is more internal to SQL Server and not directly exposed anywhere. So, we had to do a bit of digging. Paul started the process (thanks!) but, I ended up spending about 5 hours getting it exactly the way I wanted it. And, I also ended up breaking it up into two parts: one sp that gets column definitions and the other that builds the final output. And, I've actually named it with a longer name (sorry!). But, I rarely type in the name anyway. What I typically do is go to SSMS, Tools | Options, then under Environment | Keyboard, I set the keyboard shortcut for Ctrl+F1 to my sp_SQLskills_SQL2008_helpindex.

Now, when I want to see the true defintions of indexes on a table - I highlight the tablename and hit Ctrl+F1. But, if you want to rename this to sp_helpindex8 - I won't be offended. ;-)

So, how do you create this?

  1. You need to create sp_SQLskills_ExposeColsInIndexLevels (sp_SQLskills_ExposeColsInIndexLevels.sql (6.73 kb)). This is what gives us the tree/leaf definitions. And, this works for both SQL Server 2005 and SQL Server 2008.
  2. You need to create the version specific sp_helpindex8. These are named: sp_SQLskills_SQL2005_helpindex (sp_SQLskills_SQL2005_helpindex.sql (10.50 kb)) and  sp_SQLskills_SQL2008_helpindex (sp_SQLskills_SQL2008_helpindex.sql (10.73 kb)). Again, I know they're long names but using the keyboard shortcuts can help. And, if you do rename these make sure you do a replace ALL. There are multiple places where there are comments and/or statements that need to execute to get everything to work (for example the sp_MS_marksystemobject execution).

NOTE: The very first version of the 2005 script had a bug in it (thanks Calvin!). Funnily one I already knew about and thought I had caught (it was fixed in the 2008 version). Anyway, as of 3:15pm CT on May 19, it's fixed in the scripts here. There could be others but so far - it's looking good!

Enjoy!
kt

As a follow-on discussion to my SQL Server Magazine Q&A titled “Determining the position of search arguments in a join” where I was responding to a reader question about whether or not search arguments could be moved up into the FROM clause instead of placed in the WHERE clause and whether or not it would improve performance. While I was so focused on creating a scenario to highlight the problems when doing this, I missed a great point about what happened in my specific example. Fellow SQL Server MVP, Phil Brammer (blog | twitter) brought up this point first and it sparked a very passionate discussion on the SQL Server MVP forum. As a result, I thought I’d take a few minutes to create a few more examples and state a few key points. The most important of which is that search arguments should not be moved from the WHERE clause into the FROM clause unless you really know what you’re doing - you may completely change the meaning of the query… Let me explain.

Details:

In the prior Q&A, Listing 1 and Listing 2 show INNER JOINs that describe the Customers along with some of their Sales information (specifically: OrderDate, AccountNumber, FirstName, LastName, and StateProvinceCode). In the joins, we asked for only Customers with a RegionCode of US. Regardless of having the search argument in the WHERE clause or the FROM clause, SQL Server executes the same plan, has the same performance and returns the same data. While people often state that a WHERE clause applies to the entire set (and also imply order), that’s NOT the case in terms of how the join is processed. SQL Server knows that the search argument applies to the entire set but it can certainly apply the filter prior to the join if it’s selective enough and would help improve performance. Moving the search argument up into the FROM clause is not a “tip” for improving performance. However, if all of the joins are inner joins, then it really doesn’t matter. You can certainly move the search arguments up; they won’t change the result set and it’s very unlikely that they’ll change the query plan. Listing 1 and Listing 2 were both executed at the same time to produce these two plans:

First, notice that they are 50% each. Also, without worrying about the plan itself, notice that each plan has the same pattern, the same percentages, and the same join types. These are in fact, the same plan – Listing 1 has the search argument in the WHERE clause and Listing 2 has the search argument in the FROM clause.

To illustrate how moving a search argument could affect results, I created OUTER joins, added a somewhat random search argument and then did the same thing – Listing 3 was written as an outer join and Listing 4 was a copy of that query with the search argument moved up into the FROM clause. While my examples (solely looking at the results between Listing 3 and Listing 4) prove the point that a search argument cannot be moved from the WHERE clause into the FROM clause without understanding the effects, my example also introduced another phenomena. It was this phenomena that sparked debate. When a search argument exists on an inner table of an outer join, the purpose of the outer join is effectively lost. In fact, the query is not likely to even work the way you expect. To illustrate this, I’ve created a [quick] Venn diagram of what’s logically happening:

 

In fact, the two following queries will produce identical results:

-- The original Listing 3 query
SELECT so.OrderDate, c.AccountNumber, p.FirstName
     , p.LastName, sp.StateProvinceCode
FROM Sales.Customer AS c
     LEFT OUTER JOIN Sales.SalesOrderHeader AS so
          ON so.CustomerID = c.CustomerID
     LEFT OUTER JOIN Person.Person AS p
          ON c.PersonID = p.BusinessEntityID
     LEFT OUTER JOIN Person.Address AS pa
          ON so.ShipToAddressID = pa.AddressID
     LEFT OUTER JOIN Person.StateProvince AS sp
          ON sp.StateProvinceID = pa.StateProvinceID
WHERE sp.CountryRegionCode = 'US'
ORDER BY sp.StateProvinceCode, so.OrderDate 

-- Listing 3 is really an inner join
SELECT so.OrderDate, c.AccountNumber, p.FirstName
     , p.LastName, sp.StateProvinceCode
FROM Sales.Customer AS c
     INNER JOIN Sales.SalesOrderHeader AS so
          ON so.CustomerID = c.CustomerID
     INNER JOIN Person.Person AS p
          ON c.PersonID = p.BusinessEntityID
     INNER JOIN Person.Address AS pa
          ON so.ShipToAddressID = pa.AddressID
     INNER JOIN Person.StateProvince AS sp
          ON sp.StateProvinceID = pa.StateProvinceID
WHERE sp.CountryRegionCode = 'US'
ORDER BY sp.StateProvinceCode, so.OrderDate 

A better example – which preserved the purpose of the outer join – would be to place a search argument on the table which is our outer table in the query. Before I show the query itself, let’s get some background on the data here.

In the Sales.Customer table there are 3,520 Customers in TerritoryID 1 (the northwestern US):

SELECT COUNT(*)
FROM Sales.Customer AS c
WHERE c.TerritoryID = 1 -- 3520 customers 

These Customers represent a total number of Sales of 4,594 orders:

SELECT COUNT(*)
FROM Sales.Customer AS c
     JOIN Sales.SalesOrderHeader AS so
          ON so.CustomerID = c.CustomerID
WHERE c.TerritoryID = 1  -- 4594 orders 

There are 92 customers that have never placed an order:

SELECT COUNT(*)
FROM Sales.Customer AS c
WHERE NOT EXISTS (SELECT *
                  FROM Sales.SalesOrderHeader AS so
                  WHERE so.CustomerID = c.CustomerID)
AND c.TerritoryID = 1 -- 92 customers have not placed orders 

If we want to see ALL Customers in TerritoryID 1 – regardless of whether or not they have placed an order and with their order information if they have, we can use an outer join to do this. For those customers who have not placed an order, NULLs will be produced:

SELECT so.OrderDate, c.AccountNumber, p.FirstName
     , p.LastName, sp.StateProvinceCode
FROM Sales.Customer AS c
     LEFT OUTER JOIN Sales.SalesOrderHeader AS so
          ON so.CustomerID = c.CustomerID
     LEFT OUTER JOIN Person.Person AS p
          ON c.PersonID = p.BusinessEntityID
     LEFT OUTER JOIN Person.Address AS pa
          ON so.ShipToAddressID = pa.AddressID
     LEFT OUTER JOIN Person.StateProvince AS sp
          ON sp.StateProvinceID = pa.StateProvinceID
WHERE C.TerritoryID = 1 -- 4686 rows
ORDER BY sp.StateProvinceCode, so.OrderDate 

This is different from an inner join because the INNER join will return only those customers that have a matching row in each and every joined table. In this case, all 3,520 customers have a matching row in Person.Person, Person.Address and Person.StateProvince. As a result, the inner join produces all 3,520 customers with their sales for 4,594 total rows. If you notice, the outer join above returned 4686 rows (4686 - 4594 = 92).

SELECT so.OrderDate, c.AccountNumber, p.FirstName
     , p.LastName, sp.StateProvinceCode
FROM Sales.Customer AS c
     INNER JOIN Sales.SalesOrderHeader AS so
          ON so.CustomerID = c.CustomerID
     INNER JOIN Person.Person AS p
          ON c.PersonID = p.BusinessEntityID
     INNER JOIN Person.Address AS pa
          ON so.ShipToAddressID = pa.AddressID
     INNER JOIN Person.StateProvince AS sp
          ON sp.StateProvinceID = pa.StateProvinceID
WHERE C.TerritoryID = 1 -- 4594
ORDER BY sp.StateProvinceCode, so.OrderDate 

Using this query as our example, let’s go back to the original discussion about whether or not moving the search argument can improve performance… In the query below, I *solely* move the search argument (c.TerritoryID = 1) up into the FROM clause for the following query:

SELECT so.OrderDate, c.AccountNumber, p.FirstName
     , p.LastName, sp.StateProvinceCode
FROM Sales.Customer AS c
     LEFT OUTER JOIN Sales.SalesOrderHeader AS so
          ON so.CustomerID = c.CustomerID
              AND C.TerritoryID = 1 -- 20986 rows
     LEFT OUTER JOIN Person.Person AS p
          ON c.PersonID = p.BusinessEntityID
     LEFT OUTER JOIN Person.Address AS pa
          ON so.ShipToAddressID = pa.AddressID
     LEFT OUTER JOIN Person.StateProvince AS sp
          ON sp.StateProvinceID = pa.StateProvinceID
ORDER BY sp.StateProvinceCode, so.OrderDate 

And it returns a drastically different result set of 20,986 rows. What’s happened here?

In this case, we’ve preserved ALL customer rows and then found the matching rows for ONLY those customers where TerritoryID = 1. To show you the numbers, it breaks down like this:

There are 19,820 customers – of which, 3,520 are in TerritoryID 1. Of these 3,520 customers in TerritoryID 1, all but 92 have placed orders – for a total of 4,594 orders. Bringing everything together:

19,820 customers – 3,520 in TerritoryID 1 = 16,300 rows

.4,594 orders + 92 customers that haven’t placed an order (remember – all customers are preserved in the outer join) = 4,686 rows

16,300 + 4,686 = 20,986 rows

Bringing this all together and to quote fellow MVP Hugo Kornelis (blog) in the discussion: "For OUTER joins, moving predicates does change the results. This makes performance comparisons moot. If a query gives incorrect results, I don't care if it's faster or slower than one that does yield the desired results."

This is absolutely the most important point. You cannot just move a search argument “up” into the FROM without repercussions. In general, I recommend search arguments to be in the WHERE clause and join conditions in the FROM clause.

But, to make it even better, I thought I’d end with an even more simplified example of the sets described above:

SELECT c.CustomerID, so.SalesOrderNumber, so.OrderDate
FROM Sales.Customer AS c
     INNER JOIN Sales.SalesOrderHeader AS so
          ON so.CustomerID = c.CustomerID
WHERE c.TerritoryID = 1 -- Northwest US, 4594 rows
go 

-- in this simple inner join
SELECT c.CustomerID, so.SalesOrderNumber, so.OrderDate
FROM Sales.Customer AS c
     INNER JOIN Sales.SalesOrderHeader AS so
          ON so.CustomerID = c.CustomerID
              AND c.TerritoryID = 1 -- Northwest US, 4594 rows
go

-- What if we want ALL Customers in the Northwest of the US,
-- regardless of whether or not they placed an order:
SELECT c.CustomerID, so.SalesOrderNumber, so.OrderDate
FROM Sales.Customer AS c
     LEFT OUTER JOIN Sales.SalesOrderHeader AS so
          ON so.CustomerID = c.CustomerID
WHERE c.TerritoryID = 1 -- Northwest US, 4594 rows
go -- Northwest US, 4686 rows 

-- So, now what if we thought that the performance could be
-- better if we pushed this up into the FROM clause...
SELECT c.CustomerID, so.SalesOrderNumber, so.OrderDate
FROM Sales.Customer AS c
     LEFT OUTER JOIN Sales.SalesOrderHeader AS so
          ON so.CustomerID = c.CustomerID
              AND c.TerritoryID = 1 -- 20986 rows

-- Absolutely ALL customers, regardless of whether or not they
-- placed an order. For the Northwest territory we will see their
-- order information but for all other customers (even those that
-- have placed an order), we will NOT see their order information.
go 

What's probably most important is for you to thoroughly review the output of the last result set (of 20,986 rows). While I've explained the number - do you understand the results? Out of the 3,520 customers in TerritoryID 1 - 3,428 of them create 4,594 sales. For ONLY these sales, we see the SalesOrderNumber and SalesOrderDate. For all other customers - even those that have placed orders - we see NULLs. Why? Because moving this search argument up into the FROM has defined the criteria for the INNER join *BEFORE* the outer join is performed. This is why we see NULLs for those that have placed orders. If you order the results by TerritoryID this may help you better visualize what's happening!

Thanks for reading and thanks for the comments and discussions... I love the SQL Community for its never ending passion!

Cheers,
Kimberly

OK, it seems like when it rains it pours! I still have a few blog posts on my ToDo list and I do hope to get to them over the next few days (one that I've promissed to the SQL MCM folks in this month's rotation as well so now there's even more pressure!).

Anyway, in the midst of working on some really fun projects (mostly architecture/partitioning/tuning), there are a few upcoming public events. Here's a quick update so you can find all of the details: 

Houston, Texas (the Woodlands) - May 18, 2010

London, England - June 17, 2010

Dublin, Ireland - June 28th to July 8th, 2010

Bellevue, Washington, August 9th to August 13th, 2010

  • SQL Immersion: 5 day workshop with the same format as our popular Boston Immersion Event that was presented in March.
  • Registration details coming soon and event details are here.
  • EARLYBIRD discount offer expires on June 18th. 
  • NOTE: This will be our only public Immersion Event in the 2nd half of 2010.

We hope to see you at one of these events and feel free to email one of us (kimberlyATsqlskillsDOTcom or paulATsqlskillsDOTcom) if you have any questions!

Thanks,
Kimberly

Categories:
Events

Theme design by Nukeation based on Jelle Druyts