This "comment" started out as one but I've decided to make it a post as I hope that a few people that may follow me but not Stacia - will check out her post on WIT.

The post to read is here: http://blog.datainspirations.com/2010/07/29/maybe-its-just-me-a-perspective-from-one-woman-in-it

First - Great post.

Second - Fantastic comments from folks in the SQL community (once again reminding me of why I like it so much)!

I do love the SQL community for so many reasons and this is DEFINITELY one of them. To be honest, I've never really felt singled out as a "WOMAN" in tech. I've always felt like just another person struggling to do the tech and to do it well. I've found that you must do things well to get ahead but I (luckily) haven't been in a situation where nothing but my gender stopped me from getting ahead or getting something done. To take that even further since I'm sometimes the only woman in the room where I'm teaching (big smiles) the thing I love the most is the willingness and interest for everyone to listen, contribute and grow - regardless of the fact that their instructor is a woman. I suspect that maybe I don't meet the people that don't want to hear tech from a woman as they go to someone else's sessions instead of mine but the good news is that I've often had very well attended sessions (so it truly can't be the majority of men that can't listen to a woman OR work with a woman OR deal with women in tech). And, this week my class was the norm - in terms of men/women ratios (maybe even a bit higher than normal). We had 31 students + 2 instructors - 6 students were women (Paul Randal and I were the instructors). That's only 20% and still fairly low (overall - but again, a bit higher than some) but I didn't feel like there were any issues with men/women in the class - everyone was happy to be there and sharing information (regardless of gender, ethnicity, etc.).

Having said all of that - I do think that there have been sterotypes over the years. I remember (*VERY* clearly) something that was said to me when I was in high school (I wish I didn't have to say that it was over 20 years ago but it was roughly 25 years ago when this occurred). My favorite math teacher (I was also his TA) introduced me to a mathematician that was a woman... She was definitely an inspiration and talked to me about different fields and directions in which to go - in Math. It was a nice dinner. I remember thinking that I might want to continue my studies in Math (which I did as a Math/CompSci major in college). But, what I remember clearly is her telling me that she didn't wear makeup or paint her nails because she wouldn't be respected in the field. I've never forgotten that but I've also never lived by that. And, unless it's me (and my bad memory ;-), I really don't think it has hurt me. Maybe it's SQL. Maybe I've been sheltered - but I guess I have to say that I'm happy it's been this way as well.

Thanks for (another!) great post Stacia and some really great comments.

Once again - I'm reminded of how much I love our community!

Cheers,
kt

PS - Check out a past post about Women in Technology and some of the resources that are out there: http://www.sqlskills.com/BLOGS/KIMBERLY/post/Resources-A-panel-on-HA-and-a-second-on-Women-In-Tech.aspx. Also, if you're interested in helping your daughters get more involved in technology check out:

Categories:
Opinions | WIT | Community

As a follow-on discussion to my SQL Server Magazine Q&A titled “Determining the position of search arguments in a join” where I was responding to a reader question about whether or not search arguments could be moved up into the FROM clause instead of placed in the WHERE clause and whether or not it would improve performance. While I was so focused on creating a scenario to highlight the problems when doing this, I missed a great point about what happened in my specific example. Fellow SQL Server MVP, Phil Brammer (blog | twitter) brought up this point first and it sparked a very passionate discussion on the SQL Server MVP forum. As a result, I thought I’d take a few minutes to create a few more examples and state a few key points. The most important of which is that search arguments should not be moved from the WHERE clause into the FROM clause unless you really know what you’re doing - you may completely change the meaning of the query… Let me explain.

Details:

In the prior Q&A, Listing 1 and Listing 2 show INNER JOINs that describe the Customers along with some of their Sales information (specifically: OrderDate, AccountNumber, FirstName, LastName, and StateProvinceCode). In the joins, we asked for only Customers with a RegionCode of US. Regardless of having the search argument in the WHERE clause or the FROM clause, SQL Server executes the same plan, has the same performance and returns the same data. While people often state that a WHERE clause applies to the entire set (and also imply order), that’s NOT the case in terms of how the join is processed. SQL Server knows that the search argument applies to the entire set but it can certainly apply the filter prior to the join if it’s selective enough and would help improve performance. Moving the search argument up into the FROM clause is not a “tip” for improving performance. However, if all of the joins are inner joins, then it really doesn’t matter. You can certainly move the search arguments up; they won’t change the result set and it’s very unlikely that they’ll change the query plan. Listing 1 and Listing 2 were both executed at the same time to produce these two plans:

First, notice that they are 50% each. Also, without worrying about the plan itself, notice that each plan has the same pattern, the same percentages, and the same join types. These are in fact, the same plan – Listing 1 has the search argument in the WHERE clause and Listing 2 has the search argument in the FROM clause.

To illustrate how moving a search argument could affect results, I created OUTER joins, added a somewhat random search argument and then did the same thing – Listing 3 was written as an outer join and Listing 4 was a copy of that query with the search argument moved up into the FROM clause. While my examples (solely looking at the results between Listing 3 and Listing 4) prove the point that a search argument cannot be moved from the WHERE clause into the FROM clause without understanding the effects, my example also introduced another phenomena. It was this phenomena that sparked debate. When a search argument exists on an inner table of an outer join, the purpose of the outer join is effectively lost. In fact, the query is not likely to even work the way you expect. To illustrate this, I’ve created a [quick] Venn diagram of what’s logically happening:

 

In fact, the two following queries will produce identical results:

-- The original Listing 3 query
SELECT so.OrderDate, c.AccountNumber, p.FirstName
     , p.LastName, sp.StateProvinceCode
FROM Sales.Customer AS c
     LEFT OUTER JOIN Sales.SalesOrderHeader AS so
          ON so.CustomerID = c.CustomerID
     LEFT OUTER JOIN Person.Person AS p
          ON c.PersonID = p.BusinessEntityID
     LEFT OUTER JOIN Person.Address AS pa
          ON so.ShipToAddressID = pa.AddressID
     LEFT OUTER JOIN Person.StateProvince AS sp
          ON sp.StateProvinceID = pa.StateProvinceID
WHERE sp.CountryRegionCode = 'US'
ORDER BY sp.StateProvinceCode, so.OrderDate 

-- Listing 3 is really an inner join
SELECT so.OrderDate, c.AccountNumber, p.FirstName
     , p.LastName, sp.StateProvinceCode
FROM Sales.Customer AS c
     INNER JOIN Sales.SalesOrderHeader AS so
          ON so.CustomerID = c.CustomerID
     INNER JOIN Person.Person AS p
          ON c.PersonID = p.BusinessEntityID
     INNER JOIN Person.Address AS pa
          ON so.ShipToAddressID = pa.AddressID
     INNER JOIN Person.StateProvince AS sp
          ON sp.StateProvinceID = pa.StateProvinceID
WHERE sp.CountryRegionCode = 'US'
ORDER BY sp.StateProvinceCode, so.OrderDate 

A better example – which preserved the purpose of the outer join – would be to place a search argument on the table which is our outer table in the query. Before I show the query itself, let’s get some background on the data here.

In the Sales.Customer table there are 3,520 Customers in TerritoryID 1 (the northwestern US):

SELECT COUNT(*)
FROM Sales.Customer AS c
WHERE c.TerritoryID = 1 -- 3520 customers 

These Customers represent a total number of Sales of 4,594 orders:

SELECT COUNT(*)
FROM Sales.Customer AS c
     JOIN Sales.SalesOrderHeader AS so
          ON so.CustomerID = c.CustomerID
WHERE c.TerritoryID = 1  -- 4594 orders 

There are 92 customers that have never placed an order:

SELECT COUNT(*)
FROM Sales.Customer AS c
WHERE NOT EXISTS (SELECT *
                  FROM Sales.SalesOrderHeader AS so
                  WHERE so.CustomerID = c.CustomerID)
AND c.TerritoryID = 1 -- 92 customers have not placed orders 

If we want to see ALL Customers in TerritoryID 1 – regardless of whether or not they have placed an order and with their order information if they have, we can use an outer join to do this. For those customers who have not placed an order, NULLs will be produced:

SELECT so.OrderDate, c.AccountNumber, p.FirstName
     , p.LastName, sp.StateProvinceCode
FROM Sales.Customer AS c
     LEFT OUTER JOIN Sales.SalesOrderHeader AS so
          ON so.CustomerID = c.CustomerID
     LEFT OUTER JOIN Person.Person AS p
          ON c.PersonID = p.BusinessEntityID
     LEFT OUTER JOIN Person.Address AS pa
          ON so.ShipToAddressID = pa.AddressID
     LEFT OUTER JOIN Person.StateProvince AS sp
          ON sp.StateProvinceID = pa.StateProvinceID
WHERE C.TerritoryID = 1 -- 4686 rows
ORDER BY sp.StateProvinceCode, so.OrderDate 

This is different from an inner join because the INNER join will return only those customers that have a matching row in each and every joined table. In this case, all 3,520 customers have a matching row in Person.Person, Person.Address and Person.StateProvince. As a result, the inner join produces all 3,520 customers with their sales for 4,594 total rows. If you notice, the outer join above returned 4686 rows (4686 - 4594 = 92).

SELECT so.OrderDate, c.AccountNumber, p.FirstName
     , p.LastName, sp.StateProvinceCode
FROM Sales.Customer AS c
     INNER JOIN Sales.SalesOrderHeader AS so
          ON so.CustomerID = c.CustomerID
     INNER JOIN Person.Person AS p
          ON c.PersonID = p.BusinessEntityID
     INNER JOIN Person.Address AS pa
          ON so.ShipToAddressID = pa.AddressID
     INNER JOIN Person.StateProvince AS sp
          ON sp.StateProvinceID = pa.StateProvinceID
WHERE C.TerritoryID = 1 -- 4594
ORDER BY sp.StateProvinceCode, so.OrderDate 

Using this query as our example, let’s go back to the original discussion about whether or not moving the search argument can improve performance… In the query below, I *solely* move the search argument (c.TerritoryID = 1) up into the FROM clause for the following query:

SELECT so.OrderDate, c.AccountNumber, p.FirstName
     , p.LastName, sp.StateProvinceCode
FROM Sales.Customer AS c
     LEFT OUTER JOIN Sales.SalesOrderHeader AS so
          ON so.CustomerID = c.CustomerID
              AND C.TerritoryID = 1 -- 20986 rows
     LEFT OUTER JOIN Person.Person AS p
          ON c.PersonID = p.BusinessEntityID
     LEFT OUTER JOIN Person.Address AS pa
          ON so.ShipToAddressID = pa.AddressID
     LEFT OUTER JOIN Person.StateProvince AS sp
          ON sp.StateProvinceID = pa.StateProvinceID
ORDER BY sp.StateProvinceCode, so.OrderDate 

And it returns a drastically different result set of 20,986 rows. What’s happened here?

In this case, we’ve preserved ALL customer rows and then found the matching rows for ONLY those customers where TerritoryID = 1. To show you the numbers, it breaks down like this:

There are 19,820 customers – of which, 3,520 are in TerritoryID 1. Of these 3,520 customers in TerritoryID 1, all but 92 have placed orders – for a total of 4,594 orders. Bringing everything together:

19,820 customers – 3,520 in TerritoryID 1 = 16,300 rows

.4,594 orders + 92 customers that haven’t placed an order (remember – all customers are preserved in the outer join) = 4,686 rows

16,300 + 4,686 = 20,986 rows

Bringing this all together and to quote fellow MVP Hugo Kornelis (blog) in the discussion: "For OUTER joins, moving predicates does change the results. This makes performance comparisons moot. If a query gives incorrect results, I don't care if it's faster or slower than one that does yield the desired results."

This is absolutely the most important point. You cannot just move a search argument “up” into the FROM without repercussions. In general, I recommend search arguments to be in the WHERE clause and join conditions in the FROM clause.

But, to make it even better, I thought I’d end with an even more simplified example of the sets described above:

SELECT c.CustomerID, so.SalesOrderNumber, so.OrderDate
FROM Sales.Customer AS c
     INNER JOIN Sales.SalesOrderHeader AS so
          ON so.CustomerID = c.CustomerID
WHERE c.TerritoryID = 1 -- Northwest US, 4594 rows
go 

-- in this simple inner join
SELECT c.CustomerID, so.SalesOrderNumber, so.OrderDate
FROM Sales.Customer AS c
     INNER JOIN Sales.SalesOrderHeader AS so
          ON so.CustomerID = c.CustomerID
              AND c.TerritoryID = 1 -- Northwest US, 4594 rows
go

-- What if we want ALL Customers in the Northwest of the US,
-- regardless of whether or not they placed an order:
SELECT c.CustomerID, so.SalesOrderNumber, so.OrderDate
FROM Sales.Customer AS c
     LEFT OUTER JOIN Sales.SalesOrderHeader AS so
          ON so.CustomerID = c.CustomerID
WHERE c.TerritoryID = 1 -- Northwest US, 4594 rows
go -- Northwest US, 4686 rows 

-- So, now what if we thought that the performance could be
-- better if we pushed this up into the FROM clause...
SELECT c.CustomerID, so.SalesOrderNumber, so.OrderDate
FROM Sales.Customer AS c
     LEFT OUTER JOIN Sales.SalesOrderHeader AS so
          ON so.CustomerID = c.CustomerID
              AND c.TerritoryID = 1 -- 20986 rows

-- Absolutely ALL customers, regardless of whether or not they
-- placed an order. For the Northwest territory we will see their
-- order information but for all other customers (even those that
-- have placed an order), we will NOT see their order information.
go 

What's probably most important is for you to thoroughly review the output of the last result set (of 20,986 rows). While I've explained the number - do you understand the results? Out of the 3,520 customers in TerritoryID 1 - 3,428 of them create 4,594 sales. For ONLY these sales, we see the SalesOrderNumber and SalesOrderDate. For all other customers - even those that have placed orders - we see NULLs. Why? Because moving this search argument up into the FROM has defined the criteria for the INNER join *BEFORE* the outer join is performed. This is why we see NULLs for those that have placed orders. If you order the results by TerritoryID this may help you better visualize what's happening!

Thanks for reading and thanks for the comments and discussions... I love the SQL Community for its never ending passion!

Cheers,
Kimberly

This is a bit overdue as Kevin Kline tagged me in a meme post to help people get ready for their travel to PASS. However, we were too busy with travel (we were in Australia at the time) to get to it. And, while this year's PASS has past, it's not too late for travel in general OR for next year! I suspect that a few folks are already planning for PASS, or Connections, or other business travel! So, here we go - my travel recommendations/tips.

Travel tips:
  • Join the frequent flyer program even if you're not a frequent flyer. As long as you travel within another 12 months (and sometimes even longer) your miles won't expire... Miles (for upgrades or free tickets) are always good :) and even if you're only a 2 or 3 times a year flyer - they'll add up (again, as long as you keep traveling).
  • Try to pick an airline and stick with it. The more you fly, the higher the bonuses. In fact, I'm so focused on flying just one airline that I'll often fly two flights (for example, SEA -> DEN, DEN->MCO [Orlando]) even when another airline (like American) has a direct SEA->MCO. This happens more than I would like but it also means that I get things like extra bonuses as the end of the year. For example, as I hit each 25K after 100K (ah, I really do fly *A LOT*), I get 2 extra systemwide upgrades. These allow me to upgrade (at the time of booking) for any international flight from departure to destination. So, I can upgrade SEA->MEL (even though that's actually SEA->LAX, LAX->SYD and SYD->MEL). But, systemwides are only given at the highest levels... so, you really have to fly a lot and you MUST stick with the same airline. My airline of choice: United. Also, to help you earn points faster - be sure to check the airline promotions REGULARLY. For example, many airlines offered a "double your EQM (Elite Qualifying Miles)" for Sept to Dec flight... all solely for registering. This might get you to the next level with little work! Here's United's page on promotions: http://www.united.com/page/article/1,,52102,00.html?navSource=Dropdown07&linkTitle=earnmiles.
  • When you book, start by doing some research online. Try general travel sites (Expedia, Travelocity, etc.) first but be sure to check the specific airlines as well. And, you *might* need to get creative if your itinerary is messy. Paul and I recently had to travel to Dublin and then Warsaw on the same trip but when we booked the entire set of flights the tickets were OUT OF CONTROL expensive. We ended up booking SEA->DUB and then WAW->SEA (it's called an open jaw) and we saved over 60% (no, I'm NOT kidding). Now, don't get me wrong - we spent something like 5 hours coming up with different options/combinations both online and on the phone. And, in many cases this kind of thing costs more not less so it can get tricky; it really isn't all that common but can really help. One thing of which to be aware is that connecting flights on a different ticket may cause you A LOT of grief (even though they can save you $$$). If your final destination if DUB and you purchase one RT ticket (let's say SEA to LHR) and then a separate RT ticket LHR to DUB but you plan to connect in LHR to fly immediately to DUB... some airlines/partners do NOT allow the baggage to be checked all the way through - this means that you'll need to pick up baggage in LHR and check-in again (and since this is international - it means going through customs/immigration). What this means is that you'll need more time (so pick those connections with 2-3 hours - maybe more). And, if there are any delays, etc. you won't be a high priority for them... this is where it can get messy. But, I have to admit this is something I do OFTEN. I always *try* to fly on partner airlines (meaning Star Alliance) as I have "Gold" privileges BUT, this doesn't always help (when checking in). However, it can help while en route. Again, fly the same airline/group and it does have its benefits.
  • Seats – when you book, try and get your seats assigned. And, be sure to check out your seat on SeatGuru (http://www.SeatGuru.com). If you book online (which can often offer discounts) be sure to follow up by calling the specific airline to get your seats assigned. In fact, if the price isn’t all that much better you might be MUCH better off using the specific company’s website and/or phone system so that you CAN get your seats assigned. And, in the worst case, if you can’t get your seats assigned – get to the airport EARLY (really target the 1.5-2 hour rule) and then be VERY nice to the folks that check you in… see if you can get an exit row or bulk head. Hmmmm, bulkhead seating is debatable. On some planes, bulkhead
Getting ready for travel:
  • A couple of weeks prior – go through all of your dates and itineraries and double-check everything. Once, I arrived at an airport for a flight and I had been booked on the wrong day. I remember the phone call well – I had been going back/forth about specific dates (and costs) and in the end, my E-ticket was wrong. Had I double-checked it, it wouldn’t have been a problem… Luckily, seats were available and all was well (and, I had elite status) otherwise, it could have been a nightmare!
  • Is there anything special you’re taking? Does it need to make it to the cleaner, etc.? Are you going to remember to pick it up in time? Really, planning a bit for this is important. I’ve forgotten to go to the cleaner the night before a trip only to realize that my flight is so early that I can’t even stop on the way to the airport. Now I need to go to plan b. This is why Paul creates a comprehensive “Pack List” and he’s got it online. When he thinks of something he wants to take on a trip, he updates this doc. Then, the day before he packs – he double-checks it, prints it and then starts packing. I’m pretty good at remembering most things so I haven’t done this but I have forgotten a few things over the years and travels. So, I’d really recommend this and I’m probably going to do this moving forward too!
Making the travel less painful:
  • Sleeping. Does melatonin work for you? Or, have you even tried it? If you’re crossing many time zones, you might want to pick up a bottle of this. It’s an anti-stress/herbal sleep aid and taking it right before bedtime might help you sleep better and adjust more quickly.
  • Sinus rinse. I swear by this and I honestly think this helps me avoid getting sick on some flights. I rinse my sinuses twice daily regardless of travel but given that you’re traveling with so many other folks and breathing recycled air – rinsing your sinuses can’t hurt. And, you might want to pick up some saline solution for in-flight moisture. Here's a link for the specific sinus rinse I recommend: http://www.neilmed.com/usa/products.php#isotonic. And, they have a nasal moisturizer as well: http://www.neilmed.com/usa/products.php#nasogel.
  • Flight socks. Honestly, I just recently found out about these and I LOVE them. My feet can swell up a bit while flying (I just recently had foot surgery) and now, I wear these even for shorter flights.
  • Slippers. Yes, I actually travel with wool slippers. Seriously, I get cold...really cold... And, when I travel with these they make a HUGE difference. Especially for long haul flights. These aren't hard slippers - more like thick socks (easy to pack).
  • Snack. Even if you do get an upgrade – having a snack with you might make an otherwise hungry time a lot more bearable.
  • Water. I buy water in the airport prior to boarding so that I’m never without. Flying can be very dehydrating and that will only help you get sick.
  • Entertainment. Have something ready for you to read, watch, work-on… whatever. This is always my excuse to buy the trashy mags at the airport before I board. However, I have to admit that I know fewer and fewer of the stars referenced. Not sure how I’m supposed to feel about that?! J

Well… I’ll plan to come back to this and update it with other tips/tricks as I think about them!

Safe – and happy – travels to all! And, I look forward to seeing you at an event sometime soon…if not, at PASS, or Connections again next year.

Thanks for reading!
kt

Categories:
Conferences | Opinions

OK, I know I don't blog all that often but when I do, I do try and post as much useful information as I can :). I've got a few posts in the queue and a few more tests to do and code to write before I can wrap them up. In the interim, Paul and I have both decided to throw a bit of our "spare time" to keeping up with friends and family more and just staying "more connected." In that effort, we've both joined facebook and twitter (www.twitter.com/KimberlyLTripp and www.twitter.com/PaulRandal). Our end goal is:

  • Blogging: large/complex posts with detailed info/code, etc.
  • Twitter: short, quick posts on things to check out, review, etc. Maybe I'll do a weekly summary of tweets that I do and/or followed? Would you be interested in that? And, as for my interface, I'm currently using TweetDeck and Twitter.com and I've also joined via WeFollow.com: #sqlserver, #mvp and #womenintech.
  • Facebook: more fun stuff and keeping up with friends, etc. I have to admit that I really love the interface and I'm constantly impressed at how easy it is to upload video, photos, etc... It's just NOT what I expected before I tried it out. I'd truly recommend this to anyone that wants to asynchronously keep up with a large number of people and wants to share photos, comments, video, etc. It's really well done. However, beware of many of the facebook apps. They tend to spam your friends list - sometimes even when they don't ask. Outside of better requirements on fb app developers, I don't have a lot of complaints there.

Having said that, neither of us is doing anything else (no Plaxo, no LinkedIn, etc.) so if you want to find us - we're definitely around but we're going to stay somewhat focused. ;-)

And, now that I'm back home again, I hope to have a few more of my longer posts done.

Cheers everyone!
kt

Kimberly L. Tripp's Facebook profile Paul Randal's Facebook profile

Categories:
Opinions | Personal

A couple of weeks ago I wrote a blog post titled Whose job is it anyway? It's an interesting debate and something I've been hearing more and more - that SQL Server is a "set it and forget it" technology - a black box where you just don't need to know how it works to do well with it. In fact, I've even had a few folks comment that they think it would be better to "roll their own" database rather than have to learn how to work in a "general purpose" database. And, while there are certainly lots of different angle to this debate - one fact remains... if you don't know anything about the database on which you're developing (whether it's SQL Server, mySQL, Oracle, whatever), I *PROMISE* you won't have a truly scalable, optimal solution. Why do you think there are so many knobs? It's because there are so many different ways to work with data. There is more than one way to query, more than one way to design. This is also why every answer to a "how should I do this" question starts with "It depends". And, while that seems like a scary response it's actually a good one. It means that you have lots of options - options that can offer many different pros/cons. And, as a result of knowing these pros/cons, you can make better decisions - decisions that will ultimately determine how well you can scale.

So..... while I don't think this debate will EVER be finished (as to WHOSE job it is to know these things), I do think a lot of folks are seeing the effects of not knowing more about their store (and, again, this is NOT limited to SQL Server in any way, shape or form).

At a minimum, hear the discussion on RunAsRadio with Richard, Greg and I and let us know what you think!

Kim Tripp on the Roles of Developers and DBAs with the Database!

Cheers,
kt

OK, I was tagged by Ward Pond here. Ward was tagged by Jimmy May here. Jimmy was tagged by Kevin Kline here. Kevin was tagged by Chris Shaw here. Chris was tagged by Michelle Uford here. And, Michelle was tagged by Mike Walsh here.

Ah... at least I know who to blame. Ah, thanks Mike! ;-)

So, as for some things I know now that I wish I knew then and/or what I value the most. First, I took a peek at a lot of the other posts. There are some really good tips and tricks in all of them and so it's hard to come up with something really clever. But, I had to admit the following:

You don't know what you don't know..

Yeah, I know. Then you don't know. So, does it matter?! Actually, yes. There have been times where I wished I had listened better before I jumped to what turned out to be the wrong solution. Sometimes taking more time at design - can REALLY help. This is something that I've been working really hard to prove in a lot of my recent posts. We've all thrown together a schema... got things working... and then found out it wasn't ideal... But, how much can you really know if you don't know the workload? Well, this is where the listening part comes in. Talk to your users (as painful as it may be ;). Talk to the stakeholders. Even if you can't figure out the specific queries that are going to run, give them sheets of paper on which they can draw their own dialogs and tell you what they'd LIKE to see (tell them to visualize what they want to see on the screen - on the paper). This is an ideal way of getting to the meat of what they want/need - without getting lost in the "glitz" of the UI. Even if you only have some insight into what they'll be doing - some is WAY better than none and the more you listen - the better you'll design. The better you design; the better it will scale/perform.

You can't know it all!!!

Give up now. Don't try to know everything - you'll never sleep. I work really hard knowing a lot about a lot of different things - but, I still specialize AND, I'm HAPPY to say "I don't know" (and say it often). In fact, what I pride myself on the most is that I'm never afraid to prototype and/or try something out. One of my favorite sayings for SQL Server is: "Jack of all trades, master of SOME." Know enough to know you need to learn more (because you plan to use that feature)... And, know enough to know when a feature is NOT appropriate. But, don't get lost in all of the minutia. We just don't have time for it. Choose your battles.

And, since I'm starving and I know when to quit. ;-) I'm going to tag a few folks including a couple of NON-SQL folks and hear what they say!! And, Paul was also tagged - check out his post here: http://www.sqlskills.com/BLOGS/PAUL/post/Things-you-know-now.aspx.

And, I'd love to hear what you know now!

Thanks,
kt

Categories:
Opinions | Tips

When you decide to rebuild or reorganzie an index, you have an option to set something called FILLFACTOR. This is probably the MOST IMPORTANT thing to understand about index maintenance and reducing fragmentation (especially in databases that are prone to it). Unfortunately, we need to dive into some internals to really understand why this is so helpful...

What is FILLFACTOR?

FILLFACTOR defines how full the leaf level pages of the index are filled when rebuilding or reorganizing an index. The leaf level of an index is the largest structure of an index and it's the most prone to fragmentation. A good analogy would be to use the phone book (ah, loosely for this one) and imagine that new people come to the area. The *entire* phone book needs to get reprinted. Obviously this is terribly expensive and therefore it's not done often (well, we all use the internet anyway...). But, imagine if EACH page of the phone book were only filled to 90%. When new people come to the area we could just add their information - on the specific page where it should reside - and the phone book stays relatively intact and we don't have to kill a bunch of trees (OK, you could argue that you would have had to kill more trees to make a phone book that has 10% whitespace...but, let's not go there right now. :). OK, I know this analogy is weak at best but think back to "a page in the phone book" and think of it representing a "page" within SQL Server (quite literally here - SQL Server actually stores data on 8KB pages and indexes are doubly-linked lists of these 8KB pages).

So, if the pages are full and there's nowhere to put a particular row (let's say someone with a lastname of 'Tripp'), how does SQL Server "insert" this new row? They can't possibly shift all of the data down and rewrite all of the pages AFTER 'Tripp' could they? I suppose they could but it would get REALLY expensive - especially if someone with the last name of 'Anderson' moves to town...

How does SQL Server make room for new rows? They do something called a page split. Now, this can be A LOT more technical than I'm going to explain here but simply put, they split the page (on which the row is to be located) into half. The first 50% of the rows STAYS on the current page and the 50% MOVES to a new page (and this new page is probably nowhere near the page being split). However, they keep this list LOGICALLY in order by maintaining a doubly-linked list. So, SQL Server knows where to go when it's reading the data - even if the pages are not physically/sequentially in order. If this happens once or twice, it's not a huge problem. But as it continues to happen, it makes scans of your data more and more expensive and the process of the split can be expensive in and of itself. The biggest problems are:

  1. The data is not physically in order (and can create lot random IOs when scanning)
  2. The pages are no longer very dense (they've gone from 100% to 50%). Splitting does has a positive side in that it's going to take awhile before the page splits again BUT, splitting is meant as a temporary fix and ideally, it shouldn't be happening to every page. But, this also negatively impacts cache - you're only using 50% of the page that's split so now you're wasting half of that page in memory (and, on disk). So, while making a table 10% wider from the start seems bad; it reduces the need for splitting and therefore your table will stay more intact over time. 

What indexes have fragmentation?

Some indexes are prone to more fragmentation than others. For example, indexes with a high-order element (the first column of the index) that is NOT in ever-increasing order. This does NOT mean that ALL indexes should be created such that their first column is ever-increasing (that's NOT a good way to create indexes!). But, it does mean that you should pay special attention to some of your larger tables and therefore larger indexes (meaning ESPECIALLY the clustered index since the clustered index is over ALL of the columns). 

In terms of index management, here are the main things to think about for each index type: 

Rebuilding the clustered index is the most expensive index to rebuild. It's also the one that you'd rather avoid if you can. It has the most limitations... A clustered index CANNOT be rebuilt as an online operation IF the table has ANY LOB columns in it at all. What this means is that the clustered index - if fragmented - must be rebuilt using an offline operation. This translates into downtime for that table - while it's being maintained. So, if you can create a good clustering key EARLY, then you might be able to minimize clustered index maintenance.

Rebuilding a nonclustered index is much easier and likely to be an ONLINE operation. Because nonclustered indexes contain a much smaller number of columns - and rarely ever include a LOB column - they can be managed/maintained through ONLINE index operations. So, even if these are prone to more fragmentation (and, in general, they are) then it's not as big of a deal.

How do I solve this problem?

So, the BEST thing to do is to visually picture the ORDER of the data in the index (just think of it like a phone book - a LIST of records on PAGES - ordered by the KEY of the index). If you can visualize that structure then you can visualize what happens on an INSERT, UPDATE and a DELETE. Once you have a good visual you should be able to define a FILLFACTOR based on these two things:

  1. How much fragmentation is happening (is it becoming massively fragmented quickly) - then set a lower fillfactor (like 70 or 80 - depending on how big the table is)
  2. How frequently are you going to be running maintenance (if you can do this daily or weekly - for some objects) then you might not need much free space. If you can run maintenance against the table frequently then you won't need as much free space.

And, of course, you can directly check the level of fragmentation your table has by using the DMV: dm_db_index_physical_stats (which I referenced in Part I) and ideally, you'll create a procedure that runs through your objects determining how much fragmentation they have and the appropriate FILLFACTOR. Sometimes the best way to do this is to create your own "management table" which lists all of your tables and then have definitions for when to rebuild, when to reorg and what fillfactor to use. There are a few folks that have done things like this. Check out Ola's site here. He's put a lot of effort into a more robust and flexible combination of options for database maintenance. It might be more than you need but it's a GREAT EXAMPLE of all of the things that you can evaluate and consider wrt to maintenance.

Solving this using Database Maintenance Plans

Finally, if you CHOOSE to use a maintenance plan you ONLY have two options:

Reorganize pages with the default amount of free space
Does NOT sound like what it actually does (IMO). It sounds like it uses some form of SQL Server default value. However, it actually uses the "OrigFillFactor" value that's set for the index. This is actually a good thing IF you've set the FILLFACTOR on the CREATE INDEX statement OR on a rebuild. Reorganizing does NOT allow you to set this ONLY a rebuild does. So, if you have set this then this choice is probably best. If you run a query against sysindexes (yes, the backward compatibility view NOT the new sys.indexes catalog view), then you can see the OrigFillFactor setting for ALL of your indexes. (And, it's probably 0 - which sounds strange... 0 actually means 100... I could explain this but just trust me on this one.)

Change free space per page percentage to _____ %
Well, this is certainly an option you can consider. If you've never set the FILLFACTOR (or, even if you have...), this would overwrite it AND change the OrigFillFactor value to the percent you use here. If you decide to take this "sledgehammer" approach then I'd suggest something quite high - like 90-95% so that you don't end up wasting a lot of space. Some indexes (ones that have a high-order element that IS ever-increasing) don't really need free space and this might be completely wasted. So, I think it's better to set this indivually using stored procedures and/or a more custom maintenance procedure.

NOTE/TIP FROM COMMENTS POSTED: While I explain this concept (above) using the index option/setting (called FILLFACTOR) the UI is wanting the FREE SPACE percentage. So, if you want to use 90-95% FULL then be sure to set this value to 5-10%. Using the syntax, you will set page fullness.

OK, I still have at least 2 more in this series and EVEN THOUGH I've been completely sucked into facebook (as Paul blogged here), I'm still trying to manage to work and save SQLskills. Paul, on the other hand, just uploaded more photos. ;-)

Cheers,
kt

OK, it seems as though there's A LOT of confusion about what steps are required for proper database maintenance. And, it seems as though some recommendations are being given as "quick fixes" without any real recommendation for root cause analysis. I'm not saying that the generalizations are horribly wrong but in many cases they're just too broad and/or unspecific to actually be useful (and, well, in all honesty, some of them are just really bad recommendations because they’re so ambiguous). And, in my random internet trolling for the day, I found 4 different references that I want to go through (which is why this is only Part I). For this post, I’ll focus just on Sharepoint.

First, what did I see that’s motivating this post?
I found the following KB article – which was referenced by numerous sites as recommended reading. And, without knowing a lot about SQL (and, that’s NOT meant as a dig at all – it’s just that most apps that sit on SQL don’t ever even recommend that you need to know SQL and I can argue certain aspects of that point as well BUT, wrt to maintenance, it can really become a problem if you don't know a few things about these tasks), it does seem like good reading: Information about the Maintenance Plan Wizard in SQL Server 2005 and about tasks that administrators can perform against SharePoint databases

Here is the part that over-simplifies picking what maintenance tasks to run vs. what not to run:

DIRECTLY TAKEN FROM THIS KB ARTICLE IS THE FOLLOWING:

We have tested these tasks and the effects that these tasks have on database schema and performance. The following table summarizes the results of these tests.

Task                                        Safe to perform this task?
Check database Yes
Reduce a database Yes
Reorganize an index Yes
Clean up the history Yes
Update statistics Yes. However, this task is unnecessary because the SharePoint Timer service performs this task automatically.
Rebuild an index No. The task does not restore existing index options before the rebuild operation. However, you can use scripts that restore index options.
Note This problem was corrected in SQL Server 2005 Service Pack 2.

We used the following criteria to determine whether a task was safe to perform:

  • Whether the task modified the database schema from its default state
  • Whether the task decreased performanceResults may vary depending on the environment.

However, if you use the Maintenance Plan Wizard to perform the tasks that are listed in the table as "safe to perform," you are likely to experience increased performance in SQL Server 2005.

The big problem is: this is just too little information about too many VERY important tasks!

Let me break this down task by task and give you a few other places to go for more information.

Check database

The check database task refers to DBCC CHECKDB. This is definitely an important part of any maintenance plan. And, it really is a safe task to run as it’s NOT corrective by default. However, there is nothing mentioned about how this command may completely flush your buffer pool as it reads all of the pages of all of the objects it’s checking. So, this might impact performance but, of all of the tasks, this is the safest to run and it’s definitely a recommended task.

If you want to learn more about DBCC CHECKDB, check out these things: 

  1. Read Paul’s blog post to his TechNet article on Database Maintenance. It’s a great overview of all of these maintenance tasks and will give you a good overview of what each one does. Here’s a link to the blog post that pulls all of the TechNet resources together: http://www.sqlskills.com/BLOGS/PAUL/post/TechNet-Magazine-Effective-Database-Maintenance-article-and-August-SQL-QA-column.aspx  
  2. Read Paul’s blog post on Myths around causing corruption – so that you can get better insight into where/why the actual corruptions are occurring. 
  3. Finally, if you’re really interested in the internals of CHECKDB and how it works – Paul’s written a ton about it in his CHECKDB from Every Angle category. FYI, 3 of his 9 years on the SQL Server Development Team were spent writing CHECKDB and repair for SQL Server 2005 (so, he definitely knows how it works J). Here’s the link to the category: http://www.sqlskills.com/BLOGS/PAUL/category/CHECKDB-From-Every-Angle.aspx

Reduce a database 

OK, I’m sure I’ll get a lot of responses to this one but IMO, a database maintenance plan SHOULD NEVER INCLUDE A SHRINK.

Let me explain… J
To be honest, I'm not even a fan of manually running database-level shrinks (DBCC SHRINKDATABASE) either. Don't get me wrong - there are ACCEPTABLE times to shrink parts of a database but, in general, I'd recommend only using DBCC SHRINKFILE for individual file-level shrinks. I wouldn't schedule shrinks nor would I EVER turn on [the database option] autoshrink. I don't think shrinks should EVER be automated - either through the database option OR through maintenance plans.

If you need to do regular shrinks - then it's likely that you have some other problem. And, without DIRECTLY addressing this problem, you *might* be making things worse.

This is a bigger discussion and I’ve found a few other references that I want to pull together. I’ll post another post about this within the next day or so – and link to it from here BUT, for right now…Know this – free space is generally GOOD. Excessive free space has happened FOR A REASON. Maybe there’s a pattern to it but often shrinking is worse than just leaving the free space for the next data explosion (a bunch of data comes in, the database grows, the data is archived, the free space remains for the next set of data that comes in).

If you shrink the database you might make things worse by fragmenting everything. Paul’s video that goes with the TechNet article on Database Maintenance shows you the [shocking if you didn’t know this] effect of shrinking a database on indexes.

Reorganize an index, Update statistics and Rebuild an index

These need to be grouped together to start because this KB article does NOT address the impact of running these together. In fact, the problem, if you run these together – is that you MIGHT make things worse. First, let me give you an overview of each:

Reoganizing an index removes fragmentation in the largest part of an index (it’s called the leaf level of the index) and removing fragmentation in this level has the greatest (and positive) effect on range query scans and cache. So, this is really the most important type of fragmentation to remove. However, this is NOT the only way to do it… 

Rebuilding an index completely and totally removes ALL forms of fragmentation in all levels of an index; however, this is the most expensive (yet most effective) way to do it. As a result of rebuilding an index, SQL Server also updates the statistics for the indexes that were rebuilt. Therefore you do not need to update statistics OR reorganize an index if it gets rebuilt.

Updating statistics is important for query processing and optimization. The query processor uses statistics on your data to help determine how many rows will be processed by your query/statement. If SQL Server can accurately estimate the rows, then it can choose a more effective plan. However, if it doesn’t have good statistics, then it may not do as good of a job at accurately estimating rows and therefore it might not come up with as optimal of a plan. So, this is an integral part to good database health. However, some of this might be done via the database option: auto update statistics which is ON by default in SQL Server (and, YES, you should leave this on). Check out this post on: Auto update statistics and auto create statistics - should you leave them on and/or turn them on??

However, if you use a maintenance plan then I really see two problems: 

  1. You’ll end up doing maintenance on things that may not need it. The default behavior for these tasks is just to run them on the selected objects. And, since many people will choose all objects (possibly even of all databases) then you’ll probably select objects that won’t really need this as frequently as you run this maintenance plan. 
  2. You might end up running a combination of things that either – wastes cycles/CPU and a MASSIVE amount of log space (which can translate into all sorts of concerns for DR technologies like database mirroring which will need to send all log rows to the secondary server). For example, if you run ALL three of these things then they’ll have to be run in a certain order (you can change this in a maintenance plan). However, the default order is: Reorganize Index(es), Rebuid Index(es), Update Statistics. This means that the work that’s done by reorganizing is effectively wasted as the rebuild would have taken care of it AND the work that’s done for updating statistics could mean that they update statistics TWICE (during the rebuild AND after) and the end result is WORSE because the update statistics command might use a sampling mechanism to generate statistics (which can lead to LESS EFFECTIVE statistics information). However, this is ONLY if you change the wizard’s default. The default is for the updating statistics command to do a “full scan”. So, even if the statistics will end up being the same – it’s still problematic because it means that for all indexes you’ve just rebuilt – you’ve now updated their statistics TWICE.

SUMMARY

A database maintenance plan is CRITICAL for best performance (especially for databases that are prone to some of the problems corrected by these maintenance tasks (yes, you can read SHAREPOINT into that statement). Sharepoint uses GUIDs as PRIMARY KEYs (read this post to hear more about the side effects of this choice) and as a result, as clustering keys. This means that many Sharepoint tables are prone to [potentially a MASSIVE amount of] fragmentation.

You absolutely need to have a maintenance plan. But, what should it be?

My absolute preference is to NOT use the Database Maintenance Plan Wizard UNLESS you really know what you’re doing. It just doesn’t give enough prescriptive advice. And, if you just select the defaults, you will end up with an inoptimal maintenance plan.

A better approach would be to create your own maintenance plan. If you write the code yourself (or leverage one of the custom ones that are already out on the web) then you can strategically target ONLY the objects that have the warning signs and/or are out of date and you can set when to rebuild vs. when to reorganize (generally people rebuild if a table has more than 30% fragentation and they reorganize when it's less than 30%). Fragmentation is something that can be detected programmatically using the DMV: dm_db_index_physical_stats (in SQL 2005/2008) or by using DBCC SHOWCONTIG (in SQL 7.0/2000). Here are a few places to go to see the more flexible and programmatic way of rebuilding/reorganizing indexes:

Smart Indexing Part II - Conditional Rebuilding a blog post (with conditional index rebuild code) from SQLMCA Bob Duffy (a good friend who is located in Dublin, Ireland and whose wife (Carmel) just had a baby last week – congrats again Bob!! You guys are seriously outnumbered now!!!) here: http://blogs.msdn.com/boduff/archive/2007/06/08/smart-indexing-part-ii-conditional-rebuilding.aspx

Custom Index Defrag / Rebuild Procedures - a blog post with some posted code as well. http://www.sqlstuff.info/post/2008/03/Custom-Index-Defrag--Rebuild-Procedures.aspx

Rebuild and Reorganize Indexes in SQL 2005 – an article (with conditional index rebuild code) from SQL Server Central here: http://www.sqlservercentral.com/scripts/31857/  (NOTE: You will need to become a subscriber to get to this article.)

Rebuild Only the Indexes that Need Help - an article by Andrew Kelly (SQL MVP) on SQL Server Magazine here: http://www.sqlmag.com/articles/index.cfm?articleid=99019&pg=1 (NOTE: You will need to become a suscriber to get to the full text of the article.)

Or, build your own! Check out the BOL topic for the sys.dm_db_index_physical_stats for SQL 2005 here: http://msdn.microsoft.com/en-us/library/ms188917(SQL.90).aspx, Example D has sample code to help you get started! For SQL 2008 it’s here: http://msdn.microsoft.com/en-us/library/ms188917.aspx. It’s still Example D for the sample code to leverage. J

The most important thing I can tell you is that a SMALL amount of time getting familiar with what’s really happening in SQL as well as WHY it’s happening to you WILL BE A LOT MORE PRODUCTIVE then just slamming in a maintenance plan that solves some problems but probably creates others.

Hope this helps!
kt

Well, today has definitely been enlightening. I've been around and around with a few folks (mostly Tim Huckaby and Richard Campbell - who are BOTH great friends and people whose opinions I greatly respect!!!) and I've come a conclusion... Paul started down this line of commentary here: http://www.sqlskills.com/BLOGS/PAUL/post/Mandatory-SQL-breathalyzer-test.aspx and this was in reference to something I was saying in our long thread...

But, where I'm at is somewhere that I'm not sure entirely understand AND I need your help. I need to understand where and with whom do you think each role should be.... Here's what I think:

The DBA has to be able to trust the code that a developer submits – plain and simple (yes, this is absolutely true). But, (IMO) it’s still the database developer that writes the code and they’re the ones that need to know what goes on IN the db and the best practices of database design/development. The database developer should know how to create scalable schemas, they should know best practices, they should have a fundamental understanding about their specific platform as well (whether it's SQL Server or DB2 or even that O one :)). At a minimum, they should have a basic understanding of things to avoid... But, I'll give you this - it's not easy and it's not like the best practices really jump out at you. I'll even argue that some of the defaults INSIDE of SQL Server itself aren't ideal (ah, let me give a list: database size and autogrowth settings have HORRIBLE defaults, the PRIMARY KEY of a table is - by default - enforced with a UNIQUE CLUSTERED INDEX (and, well, there are many cases when that's not the best CLUSTERING KEY even though it might be a completely legitimate PRIMARY KEY), databases default to the FULL recovery model but run in a pseudo simple mode where log truncation on checkpoint occurs automatically (ah, until you do your first backup...)). Ugh, many of these things can catch you out and can completely cause you grief... and some are horribly complex to change after the fact (like what is your PRIMARY KEY and/or what is the index type of a key). So, it's definitely NOT easy. BUT, back to my original point - someone needs to know this and I need to find out who YOU think should? I think it's the database developer and NOT the DBA. I think there should be a somewhat blurred line between the database developer and the DBA. IMO, the DBA needs to know operational stuff (backup/restore, availability, security, maintenance… and, this is a HUGE list of things to really understand in and of itself - especially when you're managing this for many servers), it’s not their job (IMO) to know TSQL coding, error handling, stored procedure recompilations (although I can debate both sides of this last one), I would even argue in many cases it's not the DBA that should make the primary decisions on base indexing - I *REALLY* feel that it is the database developer - they should know the database schema, know the general access patterns AND know how the platform uses those objects so that they can make better design decisions EARLY. My favorite line is "the sooner you begin to code, the longer it's going to take" and it's really true.

However, with all of the discussions I've had today (and, I'm debating on posted an edited version of each of our rants... it's colorful in places :)), it feels to me like (and what I’m seeing more and more and more) is something that got mentioned early on in the thread – the database is “just plumbing” and the real application is in the client. Is this true... are there really only two players:

The Application Developer (who doesn't touch (because they're not allowed) OR possibly because they don't want to) the database
The DBA (who doesn't allow the AppDev to touch the database BUT doesn't touch it for them either because they're too busy and/or too protective over changes??)

Seriously, IS there a database developer in this formula above? And, is it a distinct person? I don't think so (personally)... I think the Application Developer SHOULD be the database developer. I think the AppDev should do EVERYTHING in terms of schema, keys, indexes, procedures, etc. AND once in production the two should work together to handle efficient and effective change control policies and procedures. There is a way to make this work. But, when good design doesn't exist - the database performance/scalability suffers, bad practices fester and then everyone blames SQL Server and no one really makes a change. In some cases, people just accept the poor performance as "how it works".

Thoughts??
kt

Given the general state of the economy...many companies are looking to cut back. Going back over what we've done and "optimizing" things -> budgets, expenses, etc. is the norm right now. And, scaling back is not always a bad thing - unless the wrong things are cut. Unless the wrong things are used to motivate you. Prioritizing and/or really assessing what gives you the biggest gains for your dollars is hard. In fact, one of the things that always seems to be first on the cutting block is training. Training is hard to quantify. And, the results of good training are also hard to quantify. Instead of fixing a problem (which you can often see the exact improvement) you might instead avoid a problem. Avoid downtime. Avoid data loss. Process more rows - with the same hardware. But, how do you know the cost of what could have happened. Ugh. To be honest, if I could do that - Paul and I would be on a beach. ;)

But, I do have a reason for this post... what should you be thinking? Where should you focus your attention? What can you cut - safely, temporarily, permanently and what might you help to prioritize?

Should you upgrade software?

  • Is there a feature that makes something easier? Some new features are really powerful "big" features. For example, Policy-Based Management (PBM) might help you to better centralize certain rules (in PBM-speak "policies") and then enforce them on many servers - even 2005 servers... so, you might be able to upgrade a smaller number of servers and still get some of the benefits. Many of the tools work against multiple versions so you might be able to minimize (and/or prioritize) which servers you upgrade and slowly migrate others. Potentially following an every-other-version upgrade strategy... upgrading some servers from 2000 to 2008 and leaving some of your 2005 servers to wait to upgrade until SQL11 (the next version after SQL10 - which is SQL Sever 2008).
  • Are you starting a new project - architecting a new database? Wouldn't it be easier to start on the newer version and get better longevity (maybe?!)? For example, sparse columns might make a major difference in your base table's architecture...and be easier than if you were to architect (and write all of the code) for 2005 but then later need to do a major architectural change to move to 2008 (well, to *really* benefit from things like sparse columns). There are some really good features in 2008 and some *might* warrant upgrading... upgrading now. But, if you don't have a direct need then I'd argue that you could probably stay with 2005 (or even 2000) and then push this out a bit until you absolutely need to move forward.

Should you upgrade hardware?

  • Again, are there features that will directly impact: performance, availability, manageability?
  • Can you wait? I can't really answer this and - for everyone - the answer is going to be "it depends". There might be something that significantly reduces costs and/or minimizes downtime and as a result, you'll just have to do cost-benefit analysis. This is a tough one... but, maybe you can do rolling upgrades and let some of the lesser servers take the hand-me-downs. :)
  • Can you do rolling upgrades moving the most critical to a new server and then a less critical server to the one freed up by the last upgrade...

Is there anything you can do to get more out of what you already have??

In my opinion, this is probably even more important than the two above. Upgrading hardware and software is something you will ALWAYS need to consider but if you could get better performance, scalability and availability out of the hardware/software you have now, then you'll benefit *now* without additional funds spent (actual outgoing funds) and you still be able to leverage what you do today when you do upgrade. So, what this really translates to (IMO) is tweaking and tweaking a bit more - what you already have? How? What can you look for? What can you do to help??

  • Upgrade to the latest service packs/hotfixes (at least upgrade to the free stuff - you might see some gains and in some cases (like SQL Server 2005 SP2+) you might get some new features. (important note: test this on a non-production server FIRST!!)
  • Update your hardware's firmware? You might have missed an update that improves performance (important note: test this on a non-production server FIRST!!)
  • Bottleneck Analysis - Some good resources for this are: Performance Tuning Using Waits and Queues and the SQLCAT team.
  • Workload Analysis - Some good resources for this are: Troubleshooting Performance Problems in SQL Server 2005, Working with Tempdb in SQL Server 2005, Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005...well, there are lots of good whitepapers that are specific to certain types of workloads and/or perf problems...check out our whitepapers page here: http://www.sqlskills.com/whitepapers.asp and the CAT team's whitepapers pages here: http://sqlcat.com/whitepapers/default.aspx and the general SQL Server on microsoft.com pages here: http://www.microsoft.com/sqlserver/2008/en/us/white-papers.aspx and for 2005 here: http://www.microsoft.com/sqlserver/2005/en/us/white-papers.aspx
  • Maintenance - often overlooked and incredibly important. A database that has solid maintenance practices (fragmentation analysis and cleanup, VLF analysis and cleanup, transaction log management, finding corruption in its early stages through automated CHECKDB executions...) performs better, is easier to recover, might naturally stay smaller (more compact) and therefore require less hardware. In fact, analyzing indexes - to get rid of unused indexes and to consolidate redundant indexes can end up saving disk space, backup space, cache, maintenance costs, etc. Both Paul and I have blogged quite a bit about many of these!
  • Other tips and tricks
    • Blogs... which is why you're here and there are so many out there! Here's a link I recently found that lists a bunch of SQL-related blogs: http://technet.microsoft.com/en-us/sqlserver/bb671052.aspx and, of course, Paul's post on "So many blogs" and the PASS list of blogs here: http://www.sqlpass.org/Community/BlogDirectory.aspx.
    • Webcasts... there are lots out there and we now have a page which has most of ours listed on it (thanks to Paul for creating this!!) here: http://www.sqlskills.com/webcasts.asp and there are LOTS more on TechNet, MSDN, etc.
    • Conferences... OK, maybe a shameless plug for conferences like SQLConnections *but* in having put together the agenda (with Paul) where we specifically focused on best practices topics and performance tuning - I can tell you that some of the tips and tricks that we recommend can significantly improve performance, may minimize needed disk space (by creating more optimal and often fewer indexes), may improve availability with better design practices and/or maintenance and much more than that! And, in getting away from the office for a few days and focusing just on learning you might do two things. First, you might learn some tips and tricks that you never would have (or it would have taken *a lot* more time and/or been harder to really understand?). Second, you might come back with a whole new and renewed enthusiam for doing things - and with an ordered/prioritized list of things to try. And, this might even help to motivate you because it also shows that your company really is committed to you/your job (having spent money specifically on your learning) - and you to them.

So, I do think that there are SMARTER ways to save. A well trained employee is worth a lot more than a cheaper one. And, there are smarter things to cut. I hope this might help you think of things to do and/or places to look to get better performance with what you have! I think blanket "no training" or "no upgrades" statements are never good for anything - even the budget (the longer term effects can be much worse - but also much harder to quantify).

Really, the answer is always different. It depends............

kt

Well, I had wanted to come up with a clever reply to my husband's oh-so-romantic blog post (here). And, well, in all honesty, that *is* the best present for me... but, probably not what most would say is romantic ;-). We've both had quite an effect on each other's presentation styles. We constantly remind each other of what works and what doesn't work. In fact, we even take notes when we watch each other's sessions so that we can very clearly and concisely state what we think works and/or doesn't work in a session. So, that got me thinking that I'd like to add a few things - mostly about how to get started in presenting - as his list for tips/tricks in presenting was extremely thorough!

Presenting tips:

  • Demos: OK, everyone says this - yes, your demo should work. A demo that fails is easily the most frustrating thing to happen on stage. Not only is it frustrating for you (as the presenter) but the attendees feel like their time is wasted. And, in all honesty, this should never happen. When it does it usually happens because of one of the following: the demo wasn't prepared, the demo was overly complicated or something changed recently (and you hadn't recently tested it). And, all of these are preventable.
  • Passion: Paul mentioned that you need to find a topic for which you have a passion - and I completely agree. This is probably the most important part for me. I've often been asked to talk about X or Y and if I can't find something really compelling to focus on, then it's definitely not my best session. In fact, less and less am I presenting other people's content and/or other people's abstracts. I really want to construct the session, the content and the message (in general). Having said that, what I want to add is passion/excitement/fun. The more fun you have with a session (and, I don't mean jokes - I just mean good content that makes sense and that is interesting), the more fun it will be to present. If you're having fun and presenting something with passion (that you have specifically for that topic), then it's contagious. People will remember more and take more away with them.
  • Mentor: Paul mentioned this in his post, Greg Low mentions this in his series Presenting at Large Events, and large conferences such as TechEd have people like Richard Klees on-site to help with this as well. But, if you can find a colleague or other presenter to really watch your actual session (and you watch theirs), then you will get the best feedback. The actual session - in front of real attendees - is the only place where everything counts (unfortunately, all of your presentation gotchas don't always show in a "test run"). So, it's here where you can get the best feedback and where you can really learn.
  • Don't stop learning: I've presented in one way, shape or form since 1987 when I gave "short-courses" on WordPerfect for the local students and staff at my University computer center (where I worked). And, in over 22 years, I still read evaluations, still look for books on presenting, still read blogs, watch webcasts, etc. A session can *always* be better. A presenter *always* has things they can work on.

Getting Started (which can also translate into improving your own skills/knowledge and even your own position within the company):

  • Create a work/study group: Something that can help to "find the right tool" for the job is to divide and conquer. In our technical fields, no one can ever know everything. And, sometimes we get stuck in a rut - solving problem after problem with the same solution. I always use the "tool" analogy because I think it works well. If you have a problem in your house you don't always grab the hammer, right? But, I don't know how to use every tool either - if I don't know exactly what tool to use, sometimes I'll ask someone else to look at something. Sometimes a second set of eyes is exactly the thing to do. They might see an easier and/or a quicker solution. So, since you can't know everything - know a lot about a lot of things. Have a small work/study group that meets weekly or bi-monthly where each of you tackles a topic, a feature, something. Or, maybe you all read different blogs or newsgroups - then come back together and each do a 10 minute presentation on what you found. This can keep you better informed about other tools you might need to use someday AND can help you to start presenting (on a small scale)

NOTE: Before you do any of the following, you might want to check to see what your company policies are with regard to blogs, user groups, conferences, etc. With an appropriate disclaimer and/or no direct references to the company, you might be fine. Just check to make sure! 

  • Present at a local user group and/or consider creating one if one doesn't exist: these a typically 10-50 people who meet at least once a month. Some user groups are much larger... But, this is a great way to learn more AND present what you know to a bigger group but, with (usually) a bit less pressure. The PASS website has a large list of chapters: http://www.sqlpass.org/Chapters.aspx. Consider even working with the local user group to have a side group that meets solely for improving presenting skills - you can present to each other in the same way that the work/study group would but this is a group outside of your company.
  • Create a blog: there are lots of sites you can join to blog on and my personal opinion is to see if you can write consistent posts for a month or so (without them being published publicly) and then slowly publish the ones you've written so that you can stay ahead of the game. Blogging is hard. I have a hard time keeping up with it but if you have a good message - that's useful to others - then it's worth it.
  • Create podcasts to post on your blog: This can be another way to present but without the stress of an audience.
  • Write for magazines (TechNet, MSDN, SQL Server Magazine): This is a bit harder but if it's a good article then it's likely to be published either in print and/or on their associated websites. This can give you more exposure and in turn, help you to hone your message and your presentation skills.
  • Submit sessions for larger conferences: This is the hardest and will take the most time. It's really nothing personal if you're not chosen the first or second time around (and these days the large conferences are even more competitive so it's *really* hard - don't get discouraged!!). Most large conferences want people who have a name that attendees can recognize. So, this takes time. The previous bullets *will* get you to a point where submitting to a larger conference makes sense - but, even then it's still not a guarantee that you'll be accepted. The more you blog, write, present at user groups, etc. the more people will get to know your name and your presentation skills. This is what the larger conferences are looking for - people that can give a good, technical session and make those 75 minutes the attendee spends with you worthwhile. Consider getting into a speaker competition like Speaker Idol which is usually at conferences such as TechEd but I've also been seeing this in local areas as well (with local user groups).

Key point: Clear and Concise "Presentation" for every aspect of life...

So, in re-reading Paul's post, I also read the comments and followed the links (which is what got me to Greg Low's 4 part series on presenting for large conferences - which is a really good series btw). In his presentation he mentioned a flight attendant... which, then made me surf/read a few things there. And, somehow I go sidetracked and ended up:

here: Galley Gossip: Flight Attendant Pet Peeve #1: Answer please!
and here: A woman missed her flight at the boarding gate HKIA
and even here: An open letter to unsatisfied users on the newsgroups

And, the combination made me think of a few things that really can go back to presenting but also just in general about comments, criticisms, and everything in life. If you want something then you need to ask for it precisely. If you want to get somewhere or do something or improve your position (in life, in work, whatever), then you need to be specific, concise, and clear. It's all about the presentation. Paul says this occassionally to the girls: Stop, Think, Speak. It's somewhat frustrating when he says this to me but it's usually after I've had a couple of glasses of wine ;) but, I think it's really interesting in the context of user groups, forums, presenting, life... You need to *present* something clearly and concisely for people to understand.

If you're posting a question on a forum and/or newsgroup - take time to research it first. Take time to present it clearly (what version, what's the EXACT syntax (can you provide a repro script?), what exactly are you doing, what is it doing that's wrong, what do you think it should be doing?). The better your question (i.e. the presentation), the easier it is to get people to understand. And, in the case of a forum/newsgroup, the easier it is to get a [useful] reply. Remember Tom Cruise/Cuba Gooding, Jr. in Jerry Maquire - "Help me, help you."

If you're presenting a session - present the problem (or idea), present the solution (code/demo/etc) and summarize why it works. Clear, concise and then follow through. And, this also reminds me of the most important presentation skills that everyone says but that I'm going to repeat one more time... your demo should WORK. Keep is simple. Straightforward (concise) and simple demos are really the most effective!

So, on that note, I'll remind you to look at one of the links I already posted above. This woman is very clearly stating the problem and reasonably coming up with a solution (to her missed flight): A woman missed her flight at the boarding gate HKIA. Yep, that's going to get someone to help you. Hmm. Clearly, she did not see Jerry Maguire.

Thanks for reading!
kt

Categories:
Opinions | Presenting | Conferences

I know where I go but where the heck does the time go (it seems to move much faster than I)? First off, I hope everyone had a truly lovely holiday season. I know they've long since past for everyone but maybe this reminder will bring back a nice memory from the holidays as it does for me. Paul and I escaped the world of SQL and focused on finding frogfish. Unfortunately, it just wasn't the right time of year to find them though. Oh well - we did see some very coolstuff including a batfish (the odd-shaped-bottom-dweller kind) and, we also read a ton of books (a completely strage combination of books - list below) and we relaxed.  It was really quite lovely.

Once the holidays were over, we were back into the fire... Paul worked from home (mostly writing) and did a lot of work from there - including blogging (as always) and I was back in the classroom. We were only home for 10 days and then we were off again. Right now, we're both on the road - in Hyderabad, India this week (and last) and Bangkok, Thailand next week - teaching some classes on performance tuning, maintenance and high availability. I thought I'd do a quick post to remind you that I do exist (yes, I know - I really do *want* to blog more but it's not easy for me... first of all, before I publish a blog post I probably read it 50 times and that's after I even get the courage to write one... blogging is just different for me than it is for Paul... for that, I'm sorry. I'm *really* going to try... no, really. I will try to get a really good post out there as often as possible but for right now, I'm not going to give a number or a goal - just know that it drives me nuts that I don't blog all that often.)

In the interim, interviews/podcasts - for some reason - are a lot easier for me. Paul and I did a few interviews with TechNet in December and these are being made available for download as they're edited for publishing. The first one is available now and it's titled: New Features in Partitioning and Indexing for SQL Server 2008. Here are the download links:

OK, that's it for me for today. But, I do have a couple of other posts brewing. I will work to get these out as soon as possible as I have a small break from teaching, writing, and conferences (yes, I've been writing too!). Oh, speaking of which, we've *all* finished the SQL Server 2008 Internals work (Kalen doing the large majority) and it's getting really close! You can pre-order it on Amazon here: http://www.amazon.com/Microsoft%C2%AE-SQL-Server%C2%AE-2008-Internals/dp/0735626243/ref=sr_1_1?ie=UTF8&s=books&qid=1233824791&sr=1-1.

Thanks for reading - I'll be back soon!
Kimberly

As for books, in no particular order:  

  • Robin Maxwell's Mademoiselle Boleyn
  • Josh Grogan's Marley & Me
  • Sheridan Hay's The Secret of Lost Things
  • Tucker Max's I hope They Serve Beer in Hell
  • J.K. Rowling's Harry Potter books 2, 3, 4, 5 (these were re-reads - always a safe bet to bring with me and a fun re-read as Kiera (7) goes through the series now too) and where is the 6th movie? Ugh. ;)
  • Tom Philbin's The Killer Book of Serial Killers: Incredible Stories, Facts and Trivia from the World of Serial Killers
  • Stephenie Meyer's Twilight (last, but not least!)

OK, I told you it was a very wide-ranging book list; a few were purchased at the airport on the way. I've since read the entire Twilight series by Stephenie Meyer: New Moon, Eclipse, and Breaking Dawn (and I have to admit that I loved the series (especially Breaking Dawn))... I really hope she completes Midnight Sun; I absolutely loved Twilight from Edward's perspective (maybe even more than from Bella's). And, if she continues writing in this world - I really hope she writes as Renesmee. The Twilight movie was not spectacular - defnitely read the book before going to the movie - but, I loved the casting of Bella and Edward. And, I'm hopeful that the next movie will be *a lot* better. Next on my reading list, I'm planning to read many Robin Maxwell books starting with The Secret Diary of Anne Boleyn. In the end, the holidays were really relaxing.

Oh, and here's a great shot from India... At least he has a helmet??

OK, in my quest to blog more often, I realized I have the perfect thing to blog about today... stuff I did that I haven't yet told you about. :) :)

At TechEd US, Paul and I were on a panel created to debunk some of the more common questions/concerns when setting up High Availability. It's an interesting panel and I think there are some particularly interesting moments based on the strange turns the discussions take... but, overall, it's expecially good for people seeking a 100-200 level overview of HA (Clustering, Database Mirroring, Replication, Log Shipping) with a few other related topics (Database Snapshots and Backup/Restore). Here's the direct link to the recording: http://mfile.akamai.com/14853/wmv/microsofttec.download.akamai.com/14853/TechEdOnline/Videos/08_NA_ITP_TEOPanel_57_low.asx. And, here's a general link to everything TechNet so that you can check out TechNet Radio, TechTalks, etc. there's some really great stuff out there: http://technet.microsoft.com/en-us/default.aspx. Have fun!

Also at TechEd US, I hung out for a bit in the bloggers lounge...when it was just about to end, it turned out there was a "Women in Tech" panel starting - in which I was then asked to join in. I have to admit that I'm sometimes reluctant to get involved in these because my general (and this is *very* general) opinion is that it doesn't matter if you're a woman or a man (wait - wait - let me explain!!). What I mean by this is that the key to ANY job is that you have to enjoy it. You need to find something in it that's exciting. If you can do that - then you'll be good at your job... plain and simple. If you don't like it (regardless of whether you're a man or a woman) then it's never going to be satisfying and/or you're never going to get ahead. Having said all of this, I do have to admit that I find it especially exciting when I do find other women that can geek out and really enjoy doing this whole IT thing (like MLB, Kate Gregory, Stacia Misner, Julia Lerman, Kalen Delaney, Eileen Brown). So, I don't really intend on actively trying to get more women into technology but the fun part about this panel was the fact that a lot of us talked about how children these days are already way more into IT - without realizing it (our girls asked what film was recently because we were scanning a lot of negatives... and, can you imagine any of our lives without cell phones??). And, my hope, is that if people find ways of getting more into IT (and computers/digital stuff - in general) at an earlier and earlier age... then maybe more kids (girls as well) will find a passion in IT. Here's a link to the panel from TechEd: http://mfile.akamai.com/14853/wmv/microsofttec.download.akamai.com/14853/TechEdOnline/Videos/08_NA_ITP_TEOPanel_67_low.asx.

And, here are a few resources regarding Women In Technology:

And, I think that's it for now...

Thanks for reading,
kt

Categories:
Opinions | Personal | Resources | WIT

Have you ever written something and then lost it... for whatever reason: your own stupidity (come on we've all accidentally done something at some point where we lost data or a spreadsheet or a document or something...), the software eats it (this might be self-inflicted but I've been in apps that just hang and that's it... there's nothing you can do except power off), or....whatever.

Well, during those times... have you ever thought - I'd do almost anything to get that data (and time) back?

Before I go any further - yes, backups are good. No, let me be clearer... Backups are an absolute requirement of ANY environment!!! 

Even personal/home environments should have something to protect the data. Something to consider is even offsite copies. Burn a DVD or two of your precious family photos and then swap DVDs with a friend... there's your simple/cheap/offsite data recovery. We all focus on critical data when a business depends on it - what about the personal stuff... Paul and I always talk about database backups and this post is not really targeting enterprise systems or even business critical systems... Really, there's NOTHING THAT SHOULD EVER REPLACE A GOOD DISASTER RECOVERY STRATEGY...

Having said that though, I have an interesting story (with a data-saved/happy ending)!

I was on a plane flying back from SQL Connections (I was actually flying from Tampa after having visited family in St. Pete) and well, disaster struck again (yes, this has NOT been a good year for hard drives for me as this was my 3rd and MOST catastrophic disaster so far...).

Anyway, Paul and I were "discussing" what I STILL think is a bug in a SQL function (ok, I'll get to that in a later post) and I had been feverishly completing a multi-page blog post AND some index examples/metadata queries, etc. when my laptop slipped off of my crappy airplane table (I was in an exit row so I had a table that came out of an arm chair and folded - it was very wobbly) and crashed to the floor (and, the irony of being in an exit row... had I been in a regular seat there wouldn't have been room for the laptop to have fallen to the ground :).

To my surprise, I picked it up and still saw the screen I was working on. I typed in another line... and then got a little dialog that said "windows hard error" or something... I don't even remember now. The only thing I could do is power it off. My laptop was dead. Very dead. I [expect] that I had had a major head crash when my laptop hit the floor because it's likely the disks were still spinning. However, I still did not know this at the time.

I rebooted and received "Error 2100 - Hard drive initialization error"... so, when we landed in Denver, I was off to one of the small stores to buy an eyeglasses kit (I needed a screwdriver :) and then I went to find a spot to do some laptop surgery. The good news is that I've had so many disasters over the years that almost none really freak me out anymore (this is probably the 10th drive I've had personally fail) and, I always carry at least one spare laptop... But after the terrible time I had March in India, I actually had 2 spare laptops on this trip (yes, airline security hates me even more now - oh, and Paul travels with 2... so, we go through security with 5 laptops... we get some interesting looks!).

Anyway, I took the take the drive out, loaded it into my secondary drive bay (if you travel a lot - having this second bay that can hold a primary/secondary drive is INVALUABLE as these secondary drive bay usually use the same setup (Serial ATA in my case) as the boot drive). So, if your boot drive doesn't boot, you *might* be able to still read and/or save data by using the secondary drive. So, on a second machine, I gave it a try to see if it would spin. No dice. I even tried my third machine (my primary was XP, my secondary is XP and my backup/backup is Vista... I thought... well, maybe?) Ah, I thought wrong. My drive would never spin again. So, on my flight from Denver to Seattle, I was not overly pleased (this is an understatement to be honest - just ask Paul) because I was at least a couple of days from having done a backup AND I was even more frustrated about having lost the detailed indexing post I was working on - and even losing the code that was on screen just THAT day.

We finally got home (which seemed like a much-longer-than-normal trip ;) and I got online thinking/hoping - is there ANYTHING I can do... And, I stumbled on a reference to a possibly out of date BIOS version and the needed update which also happens to generate this SAME error. And, being hopeful (and opportunistically forgetting the drop/crash/thump which led me to this problem), I *attempted* the BIOS update and well, it didn't recognize that a drive was attached. OK, that was my last hope. Hours lost. Let's move over to my backup laptop and shift everything I do have backed up over........ which I did and I was up and running that morning. Yes, I had lost a few things and yes I was pretty frustrated but, I wasn't totally down. It wasn't as bad as it could have been without any backups...... but, I was still annoyed.

Then, I thought... is there any other option(s)? It's been at least 10 years since I sent a drive into a drive recovery place. So, I thought this needed some research. I wanted to see what it might take to (and/or IF I could) recover data. About 10 years ago, I had a single drive of a RAID 0 array fail and the disk recovery place couldn't recover any of it (well, I think they could get 128kb out it and it was going to be 800 bucks). But, that was 10 years agao. Have things changed??! Hmmm... what could they do?

Anyway, I got in touch with Drive Solutions, Inc. and they gave me the rundown of what it would take to get data back. I wasn't sure if it was really going to be worth it (especially for the costs) but I still wanted to go through the process (for multiple reasons - some of it was for the data but some of it was for this post - and to remind people of what's possible (myself included)) given that we talk about the importance of backups and the UNLIKELY potential for data recovery off of damaged disks...

The long story short (ah, too late :) is that they can do amazing things these days (NO GUARANTEES THOUGH!!!) and they can completely rebuild the drives in a clean room - replacing drive heads, etc. Once complete, IF they get anything back, they'll give you a directory structure of what they've recovered and different options for getting it back to you (DVDs of just critical directories (there's a cost for each DVD after the first one or two) OR you can purchase a new comparable drive and they'll copy it over). The whole process took about a week (and this was for expedited service - which was also an extra charge). However, and amazingly, they recovered everything (well, I've only done a bit of spelunking but so far, so good)...

And, here's the coolest part, I was working on some SQL files at the exact time of the disaster and well, I went to the \Documents and Settings\username\My Documents\SQL Server Management Studio\Backup Files\ directory and found a directory of Solution1. In it were 3 files with similarly ugly names (~AutoRecover.~vsC.sql, ~AutoRecover.~vsC.sql~RF93d469.TMP, etc...) and the AutoRecover file was 2KB (the others were 0KB). Anyway, I opened up the file and viola! I actually recovered the .sql script I was working on at the time I dropped my laptop (well, I still blame this on Paul cause he made me lean over to talk to him and this ultimately pushed my laptop off the crappy airline table ;) ;) ;) :).

Needless to say, I am pretty amazed at what they can do now... but, I'm certainly not going to rely on that AND, it wasn't cheap!!!

So... what did I learn??!

1) First and foremost, data recovery is NEVER GUARANTEED. (yes, ok, we all knew that. However, I think we just need to say it out loud a few times :).
2) Even if they can recover some of the data, data recovery is not lightning fast. Even with the more expensive expedited service it takes time to ship (overnight), get the drive into a clean room, rebuild it from parts of an exactly matching (including BIOS/firmware) drive, test it, copy it to another drive, ship it back (overnight). Maybe you can find a place that's local, that would help but, it's still time...
3) It's expensive... expect about $10-15 per GB. And, I guess that some will think that's a crazy amount... Again, I had multiple motivating factors - one of which was also related to some pictures I had taken over the weekend with family in FL, some was for data, some was even better to understand this overall process... I expect that got about 10 hours back plus some photos and, I'm pretty impressed with the overall process (in general).

More than anything, I'm going to get even better at daily full system backups when I'm on the road (scheduled/overnight to an external drive) and I might even copy critical stuff to something like Windows Live Skydrive (or something like that). And, while on a plane, I might keep a small/simple USB stick handy if I do something that I really don't want to lose while on a long plane ride. I think new technologies like "mesh" and "cloud" are really interesting and definitely the direction to better performance AND *possibly* minimizing data loss but you're always at risk if there's only one copy. All of this might seem crazy but well...... I've been called worse ;-)).

So, just to wrap things up, I'll be doing a bit of final tweaking on my indexing blog post + my indexing demo scripts from Connections + my metadata script that I was working on at the time of the laptop disaster (which is also why it's been a while since my last post) and I'll be leveraging some of my favorite tools (Beyond Compare) to determine all of the differences between my recovered data/directories and the system I rebuilt from a backup (especially now that it has also changed over the course of this week since I moved over to my backup laptop). And, I've now ordered a new harddrive for my primary laptop. Sadly, I'm getting good at laptop rebuilding.

Thanks for reading,
kt

PS - When did you last backup your home/personal/less-critical system... is it really less-critical?

Categories:
Opinions | Personal | Resources

I had a customer ask me about the debate of skipping SQL Server 2005 entirely and just moving straight to SQL Server 2008... I haven't thought much about this as most of my direct customers have moved already BUT, I know there are a few out there that have not yet moved.

So, I definitely need to spend some time on SQL Server 2008 upgrade requirements, restrictions, etc. - and that's my plan in the not-too-distant-future BUT, I haven't thought much about it so far.

Your thoughts???

Cheers,
kt

Categories:
Opinions | SQL Server 2008

While at SQL Server Connections in Orlando, Stephen Wynkoop of SSWUG stole some morning time for an interview (morning time is not my best but we did get a lovely "I got my mug on SSWUG tv" mug so that made it OK :) :). We (Paul and I) had a great time chatting about Diaster Recovery, Backup/Restore, general best practices and well - games (specifically - the VERY addictive game of Blokus). Here's the interview link: http://www.sswug.org/columnists/editorial.asp?id=1135.

Enjoy!
kt

PS - If any of you pick up (and become completely addicted to) Blokus, let us know! It's great for 2 to 4 players and extremely fun when a 5 year old "wild card" sits in and throws moves that you just can't understand (but later come to really frustrate you :) :).

Categories:
Events | Opinions | SQL Server 2005

Last week while Paul and I were in the UK delivering a one day seminar on Crucial Database Maintenance Techniques, we met David McMahon from the Next Generation User Group. They're doing some exciting things in the UK and even for the wider community - for example - podcasts. Paul and I were interviewed for one and it's ready for download here.

Enjoy!
kt

Categories:
Opinions | SQL Server 2005

This is a bit of a rare post for me as most of my posts are technical. I guess it's a bit of a change in attitude, as a few things (from my personal life) have come to light recently. (and, I thank Richard and Carl for this :) :).

Anyway, I'm sitting in Tokyo/Narita airport waiting to board my final flight home to Seattle. This is after a trip which has brought me over every longitude possible - in only 8 days. It sounds "glamorous" as many people always tell me "how lucky I am that I get to see the world" and while many trips allow me to see some amazing things AND I absolutely LOVE what I do... this trip really wasn't all that much fun. The long story short is that my Father became very ill where he was vacationing. It really didn't sound all that good when we heard (which was Wednesday - in London - where Paul and I had just completed a one-day workshop on Database Maintenance on Tuesday). Anyway, we thought that even if he were to make it, we knew he couldn't do the return travel alone. I was a bit of a basket case and in the short period of time we had to make a decision the only logical one was to get to him to help him get home (and for Paul to help me through it). Fortunately, for me, everyone (Miracle, Paul, Tony Rogerson, Tony's wife Alex, etc...) were extremely supportive and everyone just said they'd take care of what we left behind in our wake of crazy travels. And... well, that's where we are now (on our way home). We're amazingly happy to say that he's getting his strength back (slowly) and that we're making our way back home today (which is why we're in Narita). In the end, we (Paul and I) had to miss the OpenWorld event in Lalandia (to which we were truly looking forward) but I can't say that we regret the decision in any way.

Having said all of that, we (Miracle, Paul and I) are looking at options to give the OpenWorld attendees something special for our not having been there. We're looking at options that might include a special LiveMeeting presentation with an open Q&A as well as options for later in the year when both Paul and I are back in Europe (which is at least once or twice more this year).

So, while I'm not so fond of lots of [my] personal [life] posts (I certainly don't mind other people's posts :), we hope that you'll accept our sincerest apologies for missing the event. But - we have to admit - without Paul there, there must have been more beer.

Cheers! And, thanks for reading......
kt

Categories:
Events | Opinions

OK - I feel like I know a fair amount about SQL Server but sometimes I also feel like I don't :) :) I'm continuously amazed at how big a product SQL Server is... today was one of those days when I felt "I don't"!

I've been wanting to know more and more about the new features in the tools and the direction in which the tools are headed so... I setup a meeting with Paul Mestemaker (it helps that I live in Redmond and I'm working on some SP2 resources for the team :); it was a great meeting. Some exciting new features and some great new directions in which the tools are headed. I like the way they're thinking and I especially like the options that are now in place to discover, use and customize the "reports" feature within Management Studio (just to name one!). What I learned (that was the highest on my "I didn't know that" list) is that quite a few gems are released as part of the Feature Pack for SQL Server. The Feature Pack "is a collection of standalone install packages that provide additional value for SQL Server 2005. It has been updated for SP2." From that description, it doesn't leap out at me as exciting AND I often know about many of these tools through other channels - but usually it's just "a tool at a time". The thing that's nice about the FP page is that it seems to be a nice and central, single location for ALL of the "add-ons" for SQL Server. It includes things like the Upgrade Advisor (which I typically point people to individually on it's main page) and will include (once it ships) things like the BPA (SQL Server 2005 Best Practices Analyzer) BUT it also includes things like a standalone download for SQLCMD so that you don't need to install all of the tools if you just want this lightweight client for automation. Additionally, it includes the SQL Server 2000 DTS Designer Components if you want to edit/modify/maintain DTS packages in 2005 before you rewrite/convert them to SSIS.

So - the point is that there's lots of great stuff out there and sometimes it just takes another person, a blog entry, or a few minutes hitting a company site to see what's new. I'd strongly suggest that you and your team pick a morning - maybe once or twice a month (and round robin who brings the coffee/doughnuts :)) to just browse around and see if there's anything new on your hardware, software and other supporting sites - especially those that don't already have a blog/rss feed (or other form for notifications). No one needs to know everything but knowing where to look can really make all the difference when you do need something (or when you have a concern/problem). And browsing a few sites (occasionally) might make the difference in applying a hotfix/patch before something becomes a big problem. Staying current with hotfixes, service packs, bios updates, firmware updates, etc. is difficult so make it a team effort.

Speaking of service packs, here's the primary page for SQL Server 2005 SP2: http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/servicepacks/sp2.mspx
And don't forget the Books Online Update as it is NOT installed when you update an instance to SP2: http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx

Also, be sure to watch the SQL Server Manageability Blog (aka Paul Mestemaker's blog) moving forward as he'll have the first news about BPA and many other tips/tricks with regard to the tools.

Finally, (and this is great timing too), Paul Randal - prolific author at the SQL Server Storage Engine Blog - blogged about all of the active "SQL Server Product Team"-related blogs here.

And........ if that doesn't keep you busy, I'm not sure what will! :)

Instant Initialization technical details are next and then I'll get back to the Clustered Index Debate. Thanks for reading!
kt

Well, I've promised to blog more and I'm really going to try to do so. This morning I got the perfect question/comment (in email) to respond to and after working through a response that was taking me upwards of 3 hours (you'll learn later why I have 3 "spare" hours :)......... I figured that it was time to turn the response into a blog post. ;)

Background: The Clustered Index Debate
In the years since the storage engine was re-architected (SQL Server 7.0+) there's been constant debate on how to appropriately choose the clustered index for your tables. I've generally recommended an ever-increasing key to use as a clustered index and many find that counterintuitive. The primary reason people feel it's counterintuitive is that it creates a hotspot of activity. [If "hotspot" is not a familar term - a hotspot is solely an active place within your table.] Hotspots were something that we greatly tried to avoid PRIOR to SQL Server 7.0 because of page level locking (and this is where the term hot spot became a negative term). In fact, it doesn't have to be a negative term. However, since the storage engine was rearchitected/redesigned (in SQL Server 7.0) and now includes true row level locking, this motivation (to avoid hotspots) is no longer there. In fact (and probably even more counterintuitive), the opposite is true. Hotspots (specifically hot PAGES not hot ROWS) can be very beneficial because they; minimize the number of pages needed in cache, improve the likelihood of the required page already being in cache and in general, they minimize the overall amount of cache required. So, this is why many of us have changed our recommendation on where to create the clustering key in 7.0+. Instead of focusing on range queries we now focus on placing the clustering key on an ever-increasing key. In earlier releases, focusing on range queries for the clustered index reduced hotspots for insert/update and this in fact was the PRIMARY motivation to choose them, NOT range query performance! But - there are even MORE reasons to choose an ever-increasing key and they are based on internals as well. These internals are based on the significant changes made in the storage engine for 7.0+. For a quick start on these, I went through them in the Blog entry here.

And, today's email is not uncommon. This is the basis for the title clustered index debate. In general, there are still a lot of questions related to creating clustered indexes to improve "range query" performance. Don't get me wrong, there's definitely a benefit in performance for some range queries but the first thing to remember is that you get only one CL index per table (therefore only one type of range query can benefit). In the real world, t's not likely that you want to see your data exactly in the same way all the time. Therefore it's very challenging to come up with the "right clustered" index if you're using range queries as your strategy. Even worse, the affect of choosing the clustering key to improve range queries causes problems for modifications against that table (INSERTs/DELETEs and UPDATEs). So.............. this is what started my day today. A great email from a reader that brought up these points. The question/comment (modified to hit only the highlights and to protect their identity :) was this:

The most important characteristic for a Clustered Index key is to satisfy range queries. More often than not, if a sufficient range of data will be scanned, the Optimizer will choose the Clustered Index over all others due to the excessive cost of Bookmark Lookup operations. As such, the table KEY is a more suitable clustered index candidate than any surrogate (few every query a database by range of surrogate keys).  [kt note: this second sentence is not entirely true... SQL Server will certainly choose a clustered index over non-clustered that require table scans but there are A LOT of algorithms that SQL Server can use instead of either of these and my examples later show this... non-clustered covering seekable indexes, non-clustered scanable indexes, index-intersection, etc. ] 

Now, when the default behavior for SQL Server was designed such that the PRIMARY KEY was chosen as the default clustered index, it was exactly for this reason.  It is the business key.  It would satisfy uniqueness (by definition of logical KEY).  And, it is well suited for a wide variety of range scans.  However, this is when the PRIMARY KEY is defined on the Business Key of the data.But, when you introduce the usage of surrogate keys (i.e., IDENTITY) as a physical implementation, and thus transfer the PRIMARY KEY definition to it, two things must be considered.  First, the Business Key this IDENTITY will be a proxy for must still exist as it is still apart of the logical design.  As part of the physical design, the logical key needs to be implemented as a physical constraint to maintain logical uniqueness.  Second, just because a proxy has been defined does not make it a natural candidate for the clustered index.  The business key still maintains this distinction.What is often cited as the “reason” for IDENTITY PRIMARY KEY clustered index definitions is its monotonic nature, thus minimizing page splits.  However, I argue that this is the only “reason” for defining the clustered index as such, and is the poorest reason in the list.  Page Splits are managed by proper FILLFACTOR not increasing INSERTS.  Range Scans are the most important “reason” when evaluating clustered index key definitions and IDENTITies do not solve this problem.Moreover, although clustering the IDENTITY surrogate key will minimize page splits and logical fragmentation due to its monotonic nature, it will not reduce EXTENT FRAGMENTATION, which can cause just as problematic query performance as page splitting.

In short, the argument runs shallow.

Luckily, this email arrived with perfect timing for me as I'm sitting in a "bootcamp" event on Always On technologies and I'm not speaking this morning (my colleague Bob Beauchemin is doing lectures on Scale Out technologies: Scalable Shared Databases, Service Broker, DPVs, etc.). Anyway, in addition to listening to Bob, I've decided to continue the blog series on "the clustered index debate". The first and most important point to stress is that minimizing page splits is NOT the only reason nor is it the most important. In fact, the most important factors in choosing a clustered index are that it's unique, narrow and static (ever-increasing has other benefits to minimizing splits).

The Clustered Index Debate Continued
First, there are many angles to look at wrt to "the clustered index debate" and it's not until all of the issues are reviewed, that this strategy (a monotonically increasing key) becomes obvious. So, I think it will probably take a couple of blog posts to really prove this. I'll start up this debate again here...... When you look at a general purpose table (which is most) where the table has ALL DML (S/I/D/U) then you are best off with an ever-increasing key (again, you have to look at the overall impact of all operations against the table - not just select... because I/D/U will also impact select in the long term). So, I'll break this down into each DML operation here. If you don't look at the overall impact, then large tables can end up having a tremendous number of problems once they're put into production. I've certainly heard this concern/debate before (and most people are skeptical at first glance) but when you look at the situation overall, you'll find that "finding the right balance" includes not just looking at range queries. In fact, here's a quick list of the things/tests/numbers/scenarios that help to prove my strategy:

  • Inserts are faster in a clustered table (but only in the "right" clustered table) than compared to a heap. The primary problem here is that lookups in the IAM/PFS to determine the insert location in a heap are slower than in a clustered table (where insert location is known, defined by the clustered key). Inserts are faster when inserted into a table where order is defined (CL) and where that order is ever-increasing. I have some simple numbers but I'm thinking about creating a much larger/complex scenario and publishing those. Simple/quick tests on a laptop are not always as "exciting". But - this is a well documented issue (IAM/PFS lookups) and poor performance on a heap is also referenced in this KB: PRB: Poor Performance on a Heap. note: this KB is quite dated and I don't actually agree with everything in this article however, the general concern of poor performance for inserts is still true on SQL Server 2005.
  • Updates are often faster (when the row needs to be relocated) and for the same reason (IAM/PFS lookups) BUT there are many types of updates and not all updates cause records to be relocated. Here are a few things to think about wrt to updates:
    • Updates that are completely in-place (some examples are where the update is updating a fixed-width column OR to variable-width columns where the row size doesn't change, etc.). These types of updates don't really care.
    • Updates that cause record relocation (where the row size changes) are definitely better by having a clustering key because the record relocation (which will be handled by a split) is defined by the clustering key
    • Updates to the clustering key are the WORST (in this case) which is one of the key reasons for having a cl key that is static (so we have to keep this in mind when we choose a clustering key).
  • Deletes aren't nearly as big of a concern BUT deletes in heaps create more gaps and more gaps creates more work in PFS/IAM lookups and while this helps to reduce wasted space, it still requires the time to find the space........ hence the slowed performance of Inserts/Updates. I've also written some blog entries that cover very interesting test cases for large scale deletes and why you'd want to consider partitioning to optimize for the "sliding window scenario" in this blog entry: MSDN Webcast Q&A: Index Defrag Best Practices - Fragmentation, Deletes and the “Sliding Window” Scenario and it's the LAST one!.
  • Selects.............. now this is the hardest one to go through in just a couple of bullets (ah, I guess this will lead to another one or two posts :) BUT I'll start by saying that the best way to tune the vast majority of range queries is through non-clustered [covering] indexes. But, it's also important for me to stress that I do NOT advocate covering every query (it's impossible to do). What's important to realize in terms of covering is that SQL Server 7.0 and up continues to include internal algorithms to improve performance when you don't have the "perfect" non-clustered covering seekable index and instead still gives better performance than going to the base table (or performing bookmark lookups - as mentioned in the mail...and I completely agree that these [bookmark lookups] can be evil!). To start this discussion, I'll give one of my favorite examples of a large-scale aggregate. The absolute best way to improve the performance is through an indexed view but the data can be gathered through many other algorithms - ideally through a non-clustered covering index that is in order by the group by and that includes the column(s) being aggregated. For example, take this query:

SELECT c.member_no AS MemberNo,
 sum(c.charge_amt) AS TotalSales
FROM dbo.charge AS c
GROUP BY c.member_no

On a charge table of 1.6 million rows here are the performance numbers to handle this aggregation:

  • Clustered table scan (CL PK on Charge_no) with a hash aggregate = 2.813 seconds
  • Index scan (non-clustered covering but NOT in order of the group by) with a hash aggregate = 1.436 seconds
  • Index scan (non-clustered covering in order of the group by) with a hash aggregate = .966 seconds
  • Indexed view = .406 seconds

Now this was a pretty small table (narrow rows and only 1.6 million rows) AND I didn't have any concurrent activity. The concurrent activity would have caused this to be even slower for hash aggregates, etc. Regardless, it proves the point (at least generally). Now, if I wanted to improve this range query then I'd have to cluster on the member_no column (and this is an ideal example because I often hear people say that clustering on a foreign key column helps to improve range/join queries - which can be true as well)......... But - this strategy has a few problems in addition to a few benefits (and we have to look at everything to be sure of our choice/decision). First, member_no is not unique (in the charge table) so SQL Server has to "uniquify" the rows. The process of "uniquification" impacts both time (on insert) and space (the rows will be wider to store each duplicate row's uniqufier). Also, theoretically it could change (in this case that's not true). Anyway, the time it takes for the clustered index is 2.406 seconds which is better than the clustered on the PK (of course) but if I were to also start modifying the rows (which creates splits) or even just insert 15% more rows........ then my table would become fragmented. At that point, the query performance should get worse in the table clustered by member_no table and it will continue to get even worse in the table clustered by charge_no (because of the worktable created in tempdb by the hash aggregate) BUT it won't be all that much worse in the non-clustered index examples (especially the covering index that's in the order of the group by - because this doesn't require a worktable).........

  • CL on member_no = 4.906 seconds
  • CL on charge_no = 6.173 seconds
  • Index scan (non-clustered covering but NOT in order of the group by) with a hash aggregate = 3.906 seconds
  • Index scan (non-clustered covering in order of the group by) with a hash aggregate = 1.250 seconds
  • Indexed view = .516 seconds

This is a great start to furthering the clustered index debate but I do have to admit that it's a counterintuitive and difficult issue to tackle because often isolated tests lead you to different conclusions. In this case though, the non-clustered indexes are better for this range query and the indexed view is the best (but I wouldn't consider the Indexed unless this were more of a read focused database rather than read/write). [and - of course, that statement warrants yet another blog post :)]

So, depending on the tests that you do - especially if you focus only on selects and you don't have modifications (i.e. fragmentation) - then they will make "creating the clustered index for range queries" appear to be best. Again, I'm not just saying this to prevent fragmentation, I'm saying this because I wouldn't use the clustered index OR a non-clustered index with bookmark lookups to handle this query. I'd consider a non-clustered covering that's seekable OR even a non-clustered covering that's scanable before I'd even choose the clustered (and that's what the optimizer would prefer as well). In the end it's really a bit of an art and a science to "finding the right balance" of indexing.

Oh - and if you arbitrarily add a column to use for clustering (maybe not as the primary key) that can help but many would prefer to use actual data... which means [potentially] creating your primary key with a new identity [or similar] column and this can impact your business logic (absolutely). I'm certain that certain tests can show that range queries are faster and it's absolutely correct that business application/usage can be a concern but when you look at the big picture (and the impact on I/D/U) then the benefits of the monotonically increasing key significantly outweigh these concerns. Simply put, a small/narrow key can help join performance and an ever increasing key can also help lookups for rows! (yes, definitely more coming)

Happy Friday! Have a great weekend. I'll try to continue more threads on this debate shortly!
kt

Ok - a strange title indeed but it's been a strange couple of months. It all started with a "much needed" vacation and I thought that would get me through the persistent "cold" that I was having all through my travels. Anyway, vacation didn't help and I came back to find that what I had was a sinus infection. Janaury has been filled with antibiotics, sleep and well......still a lingering cough even though the month is over today. The long story short is that I'm starting to come out of it and I promise to start blogging a lot more frequently starting now. In fact, I have 3 or 4 entries in the queue that I'm plotting for upcoming posts.

To get you started - there are some great and NEW resources that were posted just this week by some of my SQLskills colleagues. If you read their blogs then you've probably already seen this but for completeness, I'm going to post them here:

Bob Beauchemin's Blog Entry: http://www.sqlskills.com/blogs/bobb/2007/01/30/TheFirstOfMyScaleoutWhitepapersIsAvailable.aspx
Bob's Whitepaper: Planning, Implementing, and Administering Scaleout Solutions with SQL Server 2005

Liz Vitt's Blog Entry: http://www.sqlskills.com/blogs/liz/2007/01/30/AnalysisServicesPerformanceTuningWhitepaperHasArrived.aspx  
Liz's Whitepaper: Analysis Services 2005 Performance Guide

Ok - there's my first post of 2007 and NOT my last. More to come. Thanks for reading and HAPPY NEW YEAR (at least I got that in January :),
Kimberly

OK, it's been a heck of a long time since I blogged... and for that I apologize. I'm also WAY overdue in my posting my demo scripts from a TON of conferences BUT... now everything has been posted. Check out the past events page on SQLskills and you can find the demo scripts that you're looking for......lots of fun stuff and TONS of scripts to play with and test.

Now - as for the reason(s).... many are business and for that I blame the following (yes, 17 flights [yes, one boarding pass is missing] over ONE 5 week trip with 7 events and 5 continent changes):

The other reason(s) are personal...suffice it to say that the last 6 months have been some life changing times for me and what I'm finding (or trying to find) is that ever important balance between work and life. During this holiday season (and always), I wish you and your loved ones well and I hope that you too can find (and cherish) what's most important to you.

So, you won't see anything else from me for this year but I do hope to be better (and more frequent) with blogging in the New Year and I especially hope to see you again at an upcoming conference. Let me leave you with the most exciting picture I've witnessed this year...it was during my one day of site seeing in Cape Town - where I went cage diving with Great White Sharks (and got horribly sea sick - which is rare for me) but where I was able to witness these amazing and powerful creatures....

Have a happy and safe holiday season!
kt

There are SO many new features in the updated DasBlog that I don't even know where to start. Here's just a quick list of the things I really love:

1) Scrolling through the entries - if you click a specific entry's title, you will get a "last blog entry" | main | "next blog entry" section above the entry's title and you can scroll through the blog entries. It's really a nice way of reading through a series of blog entries.

2) the SUPER cool complete archive by category - here's a link: http://www.sqlskills.com/blogs/kimberly/Archives.aspx

And - those are just a couple of my favorite new features with 1.9 but you can read more about it from the source - Scott Hanselman - here. Definitely worth the upgrade, definitely worth the consideration to move from some other blog engines (depending on how much you blog) OR consider creating your own blog (here's why)........

Enjoy,
kt

Categories:
Opinions | Resources

I am asked often.....

  • How can I get into presenting?
  • How do I get into external consulting?
  • How can I do something similar to what you do?

It's a challenge to work for yourself...

  • There are periods of feast and periods of famine
  • There are [way too many] times when the only person you can blame is yourself ;)
  • You need to do current work, work on getting future work and do the admin for past work (billing/receivables/paying subcontractors)
  • No one pays you when you are sick or when you want to vacation or when you just want to veg... and you really need to schedule all of that way in advance (which is challenging)

BUT - it's also very rewarding...

  • If you plan ahead - you can take time for yourself and you don't have to deal with a corporate limitation in vacation days
  • If the work is light you can take the morning to mess about and blog ;)
  • You feel really good when things go well

So...what's my point?

If you've been thinking that you'd like to expand your horizons and possibly get out and do something on your own, I'd *really* suggest NOT doing it without planning ahead. First, you should plan to have some cash reserves (the first 6-12 months are the hardest as you'll have a lot of setup costs (hardware, network, accounting and legal fees, etc.) AND it might take time to get customers and even more time to get paid). Second, you should already know from where your business will come. What I mean by this one is that you should start trying to figure out who might hire you before you take the leap. I'm not suggesting that you try to steal customers...not at all. You just want to think a lot about where you will get business before you have 0 income. And - that really leads me to the reason for this post. One of the best ways to get business is to be desirable. In this industry that means many things:

(1) Get really good with a technology that you like... which #2 will help you know if you are good or not AND whether or not you really do like it.

(2) Make a name for yourself - create a blog, write magazine articles, post on other people's blogs, answer newsgroup postings...start to learn more but also get more involved. Do a bit of research before posting and even test and/or create sample code. And present at conferences - this is a bit challenging BUT it can be done. Some conferences are community based (like SQLPASS) and they look for new and exciting presentations from the field... Others expect a lot more prior history and sometimes have a smaller number of speakers in general BUT, TechEd ITForum (week of Nov 13 in Barcelona) is doing something that's pretty interesting called "Speaker Idol" and this might be the break you need?

OK, so Speaker Idol is really the reason for why I've posted this blog entry. I wasn't really sure what the heck it was when I first bumped into it (accidentally actually - after someone asked me to verify something totally unrelated). Anyway, it's a competition for attendees of TechEd ITForum to submit a presentation that will be voted on as "best presentation" and the winner will get free attendance to next year's TechEd ITForum (and a VERY cool item to put on your blog, resume, etc.).

(3) Research the whole process of starting a business in your neck of the woods A LOT more before you actually do it!!! I truly wish you well if you do!

So, check out TechEd ITForum's Speaker Idol and have fun! It's certainly an interesting way to jump start the process if you've been thinking about this but just hadn't made the leap. GOOD LUCK!

Cheers,
kt

Categories:
Events | Opinions

Hey there everyone - Been a LONG time since I last blogged (sorry!)... key reason (fyi) is that I'm trying to find the ever-challenging work/life balance during the best months of the year (here in Seattle July/Aug are GREAT! months - September is almost always good too). Anyway, it's been a few weeks and I thought I'd catch you up... It all started with some travel (of course) and I was in Chicago for a SQLskills Immersion event and then off to London for another event with SQL Server FAQ (aka Tony Rogerson). I returned during the CRAZY travel restrictions and had to check two laptops (sigh) BUT they both made it back without damage after I purchased kitchen towels, bed sheets, a blanket and a duvet (and two new [cheap] suitcases) in which to pack them. When I got back, I relaxed! I've been up to see the Athabasca Glacier and Johnston Canyon and just this week I took off a day to do the "Lotus Experience" Advanced Driving Course at Pacific Raceways/ProFormance Racing School. I've always LOVED driving so driving 700 miles to Johnston Canyon didn't bother me at all (especially with great company and fantastic tunes - have you heard the latest Snow Patrol?) and the driving course in the Lotus was not only great fun but VERY informational. I've taken the Advanced Driving course before and I think EVERYONE should consider continuing their education in driving - things like Collision Avoidance, Advanced Braking (understanding proper driving under ABS), "high eyes" and so many other things...just make you safer on the road. OK, enough about all that fun/practical stuff.....let's get back to SQL.

Some great new resources are out:

And - it's now time to start enjoying Labor Day weekend, there ain't no labor going on here this weekend. Enjoy - and check out those links next week. ;-)

Cheers everyone,
kt

OK - so, lots of exciting things going on. I've just completed my first two day partner event in Zurich (first of two in Switzerland). Today I head over to Geneva by train so that I can deliver the second two day partner event there. The partner I've been working with in Switzerland is Trivadis and it's always been fun coming over here and working with them (this is my third trip to Switzerland since Mar last year and second with Trivadis)! Trivadis is home to one of my fellow RDs Meinrad Weiss and he's always great to chat with and very clever too! In fact, I have a blog post coming based on something we chatted about during class (it's a cool trick... stay tuned). But, regardless, one of the things that came up at the end of the day was the type of training that I did this week... (well, really the type of training that we do at SQLskills). The discussion ended up talking about the complete learning/training cycle and how do you choose from ALL of the different choices??! First, (if we're talking about SQL Server), it takes a layered approach and a variety of different techniques, technologies and even a lot of planning/practice/coordination among staff, etc. to keep a system running WELL. In the end, it takes a combination of things (and probably training) to make that happen. As a result and based on all of these choices, we talked about training in three primary buckets...

Conferences
Conferences are great when you want get a wide variety of topics covered. They're great when you want to network with large groups on a variety of levels. They're great when you want to see a variety of products and how they work and/or what they do. Conferences really are great when you want to go wide more than deep. Don't get me wrong, I like doing 300-400 level sessions at conferences (and there are a lot of other speakers that do as well) BUT, in only a 60-75-90 minute session, you can only go so deep. You can do two sessions or three sessions in a topic but sometimes that's hard to coordinate and to be honest, there's always so much going on that you tend to lose people that are getting distracted by other tempting sessions so I'm not all that fond of going beyond two for a conference. Having said that, you can certainly hit a lot of best practices and technical depth BUT I think a conference is BEST at giving you breadth and there's nothing wrong with that! In fact, some of the best conferences I've been to in a long time were TechEd in Boston and SQL Connections in Orlando... lots of great sessions and I got some great ideas of different architectures, etc. and I also learned about some great upcoming products, etc.

Training
Depending on the vendor there are lots of options in this bucket. Probably the most obvious is MOC (Microsoft Official Curriculum). MOC has been around for a long time... in fact, that's where it all started for me! Back in '90 I got my MCP in something (hmmm...DOS?) and then shortly thereafter got my MCP in OS/2, LAN Manager and SQL Server and in '91, I became an MCT (all of this while I was working for a small ATEC in Chicago called LAN\mind). Then, I went to work for Microsoft as a trainer for Microsoft University (the building 0 of Microsoft even though by that time Microsoft was a lot larger and had moved to it's main campus in Redmond). It was a GREAT place to work. Both of these training centers (LAN\mind and then Microsoft) were a BLAST to work at and I learned an amazing amount with my team members. It seems like all of the trainers had great fun working together...one of my best friends actually rollerbladed through my class once (those were the good old days!). My point, is that this kind of training is a lot more focused... it allows you to focus for 3-4-5 days on one area of the product and it gives you some hands-on. This can be exactly what you need. The only problem is that you're somewhat at the mercy of the instructor (well, this is always true!) and you have to hope you get someone who has really worked with the product and/or customers and really knows depth (I’ve heard EVERYTHING in terms of comments – both good and bad about MOC and most of the time it’s about the instructor). To add to this, there are some companies that “host” MOC training but don’t even really have instructors on site (which is also good and bad). In fact, I’ve received a few emails that are clearly being blasted at a ton of people (I guess MCTs) and the mails are saying things like “Is anyone available NEXT week to teach course XYX” and that just completely blows me away. Don’t get me wrong I used to jump in when possible and say “yes” I’m available and I have to think that it’s still a good class BUT… it just seems like the RANGE of quality is the most widely varying here and that’s the part that can be disappointing. With regard to the courses, these are generally good. However, I do think that MOST of the courses try to cover too many topics. Basically, each class tends to focus on a discipline - like DBA or Dev or BI and while that can be good...sometimes you need more AND sometimes certain topics really hit at least two and possibly all three of those areas. That's where I *think* (well, I hope!), that I (we) come in with "Immersion" events.

Immersion Events
Other companies have things which are similar but my thoughts on Immersion Events are that some areas just need time and incredible depth to really understand. I always make fun of this old Chinese proverb but it's so true: give a man a fish and he eats for a day, teach a man to fish and he eats for a lifetime. The point is that anyone can tell you a command and how to implement something... but if I can really tell you why something works the way that it does and show you how you can see it and troubleshoot it, then you're going to be able to use that and leverage that when other things go right/wrong. I believe heavily in internals and the lower levels of how pages, extents, files, filegroups, data v. log, locking... if you start to get a feel for how all of these things are interrelated, you can make better architectural decisions AND you’ll know what the other impacts are of a variety of decisions. For example, transaction design (esp. long running transactions) impacts the types and lengths of time that locks are held (ok, that’s a “well duh”) but poorly written transactions not only cause blocking (which in turn, affects performance) but can also impact the size of your transaction log and whether or not the log can be cleared. This can end up not only causing performance problems but can impact recovery and even some of your secondary systems (or cluster) down the road in terms of restart recovery (because a long UNDO can take time and in SQL Server 2000 and SQL Server 2005 (in 2005 it’s ONLY the non-Enterprise Engine editions), users cannot access a database until UNDO has completed. (sidenote: the SQL Server 2005 Enterprise Engine editions (Developer, Enterprise Eval and Enterprise) have Fast Restart which allows users to access the database after REDO has completed and while UNDO is processing. The good news is that data integrity is maintained because the records being processed by UNDO are locked and inaccessible (unless lock hints are used).) Finally, if you’re waitinf for UNDO, this can lead to the perception of potentially A LOT of downtime. And – all of this could be because of a poorly written application. So, my point is that knowing the impacts of something from end to end and really diving into how things work – that’s what some classes do. That’s what Immersion Events do. We take one area and we really try to focus on a subset of features (new features/critical features) and then we just try to go deep. We’ll often hit on best practices in other areas and other options but we really try to go deep in the more critical areas. The other thing that we do with Immersion Events is that we try to extend the courses with other types of learning – group troubleshooting, guest speakers, hands-on, *interesting* demos and then we end the day with BYOP (“Bring Your Own Problem”). BYOP sessions are unstructured sessions that typically run from 4:30 - 6pm after an 8:30am-4:30pm course day. BYOP is designed so that you can apply what you’ve learned during the class day to your own systems and challenges. If it’s a tuning class, we strongly urge you to bring your laptops and external drives – loaded with database backups, sample applications/workloads and Profiler Traces – so that you have real world data/problems for real world (and specifically YOUR WORLD
J) tuning scenarios. If it’s an architectural class (like Bob’s classes in SODA), then we want you to come with your ideas and current architectures so that we can see if other features and/or alternatives could provide better x, y, z (insert performance, flexibility, control, etc…). The BYOP sessions begin with group discussions – focusing on similar systems and/or challenges and then the groups will move into deep troubleshooting discussion groups from there. Basically, the aim of an Immersion Event is to have an intense and focused event where not only do you come back with a few new/critical features to leverage but you may have even solved some of your most frustrating challenges.

OK, this blog entry really wasn’t planned to be this long but it’s a discussion that I got into last night that started to motivate this. At the end of my partner event at Trivadis (not quite a full blown Immersion Event but *very* close – certainly in terms of technical depth but just not the really long days and BYOP), they were really pleased that we had just spent two full days on nothing but Indexing (Internals to Statistics to Advanced Indexing Strategies to Maintenance). They told me that they felt like they were ready to tackle a few problems and a couple folks said that they already knew what they were going to do to fix a thing or two. One person came to me telling me that they really didn’t think that I could tell them WHY one of their more complex problems “worked” the way that it did but after the module on statistics he said he knew exactly what was going on. And – I love when I get mail from folks after class… I just chatted with someone from my last class who told me that they had a query that was taking 15 mins (over a 2.8 billion row table) and now (after some of my Advanced Indexing Strategies) the query is taking 9 seconds. Can’t complain about that, eh? J

Anyway, learning can be fun AND there are lots of ways to learn. I think the best way to a better system is to combine all of the above (and don’t forget webcasts, blogs, whitepapers, Hands-On Labs, etc.). The basic point is that you should attend some conferences to get a wider variety of topics and architectural options, attend some training to get started with a variety of technologies and then attend Immersion Events to really get the depth and focus to solve some of your more complex challenges. And – you can certainly substitute webcasts, blogs, whitepapers and a few other options for some of these training options but sometimes you really need to get away and “Immerse Yourself” in the technology!

And – OK, I have another reason for blogging on this topic today… we’ve (SQLskills) just announced our new Immersion Event lineup for summer/fall 2006. We’re hitting three cities in the US (Chicago[land], Redmond, and New York City) and we’re targeting some intense classes around Performance Tuning, Service Oriented Database Architectures, Tuning BI (IS/AS/RS) Environments AND Always On Technologies. They’re going to be great fun, in-depth, hands-on courses with long days so that we can all really focus and “get it done”. And we’re still adding some guest speakers (Gert (aka Data Dude) is joining us in New York and it looks like the famous “SQL Apprentice” will join us in Chicago for some BYOP on SQL – details to be posted shortly). Check out the complete lineup of events here.

Let me know your thoughts on training in general and/or in specific. Really, I think all types are good and all have their time/place. I just happen to like the deep “Immersion” ones more than anything!

Hope to see you there!
kt

Categories:
Events | Opinions

So, I've now spent the last couple of hours playing with Database Mail and HTML formatted messages being sent to the SQLskills subscribers. It's been a fun learning experience as I think I've found a bug with the email account name length...let's just put it this way - don't be too descriptive with your account names.

Outside of that - it's amazing how well queue based email works. The old mapi based mail would take a LONG time to complete the batch mail processing but now - with queue based mail it's done in seconds.

The best part is that I'm on a flight over the Atlantic right now...on my way to Frankfurt. I think this is the MOST productive flight I've ever had!

Have a great weekend,
kt

Categories:
Events | Opinions | SQL Server 2005

What a GREAT week here in Boston. I'm just about to pack up and go home (kind of). Right now it's early and I think I'm still feeling really good from last night's fun. The big party at Fenway, dinner on the town, walking around the city, etc... It was a beautiful spring (almost summer) night last night and the stars were out. We couldn't have asked for a better day (weather-wise) to end the week. There's still one day left at TechEd but this is when a few of us start heading out. I'm going to Chicago for the night to hang out with my Father (don't forget it's Father's Day this weekend) and then on Sat I head back to Seattle. I want to get back fairly soon as I'm only home for a bit under a week before I head to Switzerland to deliver a couple of master classes with my partner Trivadis.

But - part of why I'm writing this is that this was one of the best weeks I can remember at TechEd. Everyone seemed to be enjoying themselves and for once (in a long time) I really liked the layout of the venue. Eventhough TechEd is a MONSTER conference (12K people), the venue was easy to navigate and (most of) the sessions never seemed too full. Even the Expo hall didn't feel overly crowded. Don't get me wrong - there were a lot of people in there a few times BUT, it always seemed manageable. And - even when sessions did fill, there were overflow rooms setup and capacity seemed to be managed well. So, I do feel like I walked quite a bit this year but it felt reasonable (compared to Orlando last year, this was a dream).

And - the WAY more fun part of TechEd (for me) is meeting people. I've meet quite a few customers - many of you - who read my blog and/or have attended some of my webcasts and it's GREAT to put faces with the names.

So, that was my week and now it's done (and if you didn't make it - check out Virtual TechEd). In my next few posts, I'll try to find some time to chat a lot more about the technology that I saw and the technology that I demoed. In fact, all of my demo scripts and sample code will be "tweaked" and posted over the weekend and then I have to move on to planning some of our (SQLskills) Immersion Events... At least I don't have to plan an event the size of TechEd! Our events are 5-days, hands-on, and intense technical events where we dive deep into one technology (and you can count the number of people with 2 digits instead of 5). In fact, I think that's the combination I like. Go to conferences for depth in small chunks and over a wide variety of topics and technologies (and futures too, so that's always fun/helpful). Go to training/"immersion events" for deep, technically-focused, hands-on courses to really become fluent in a specific area. I'll post all of the final details when we're all set to open for registrations (next week). But, if you think you might want to attend - subscribe on SQLskills and you'll get an early mail to let you know the details. Subscribe here!

Have a great weekend and HAPPY FATHER'S DAY!
kt

Categories:
Events | Opinions

OK - today's session was quite fun... lots of demos and quite a few "tie-ins" where I tried to bring together many things that we've touched on in our series. And - that's really the point of the series - creating a reliable, robust, scalable and available environment takes MANY different features. You really need to architect a complete solution in order to handle the many potential problems that may occur. And, unfortunately, it's a never ending process; you're never done and you're never going to get everything (sorry!). You will need to re-evaluate, monitor, and manage your system as long as it runs to keep it reliable, available and fast. Something will come up...someday...that you didn't think about, evaluate and/or prevent. But, then you'll know and then you'll put something into place to keep it from happening again.

So - to tie back into some of the other sessions and resources, here is a list of everything to date in the series as well as a few specific references I made during the session.

Demo Scripts are here: 20060512_TechNetWebcast-Part10.zip (25.46 KB) (updated on Sat, May 13 at 2:55 PDT)
Credit Database zip is here. NOTE: This is a 48MB zip which expands to a 175MB backup and restores to a 700 MB database (with a lot of free space for testing, etc.).

TechNet Webcast Series

Session 1: A Fast-Paced Feature Overview and Series Introduction (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 2: Security (Level 200)
   Presenter: Bob Beauchemin, SQLskills.com, 
   Session's corresponding blog entry, here

Session 3: Understanding Installation Options and Initial Configuration (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 4: Upgrade Considerations and Migration Paths (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 5: Effective Use of the New Management Tools (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 6: New Application Design Patterns for Scalability and Availability and the Operational Implications of Service Broker (Level 200)   
   Presenter: Bob Beauchemin, SQLskills.com, 
   Session's corresponding blog entry, here

Session 7: Technologies and Features to Improve Availability (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 8: Implementing Database Mirroring, Part 1 of 2 (Level 200)
   Presenter: Mark Wistrom, SQL Server Program Manager - Microsoft Corp., 
   Session's corresponding blog entry, here

Session 9: Implementing Database Mirroring, Part 2 of 2 (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here.

Session 10: Recovering from Isolated Disasters and Human Error (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   You're reading it! :-)

Recovery Models and Backup/Restore

  • MSDN Webcast Parts I and II cover Recovery Models and some issues/best practices related to changing recovery models. Check out the blog entry here which has links to the sessions and their associated blog entries.
  • MSPress Title: SQL Server 2000 High Availability, Chapter 9: Database Environment Basics for Recovery is here. The MSPress page for this title is here.
  • SQL Server Magazine Article on Isolated Disasters and Recovery (using RESTORE with STANDBY/STOPAT to investigate when a database became damaged) is here. Check out a consolidated list of all of my SQL Server Magazine Articles here and SQL Server Magazine here.

Table and Index Partitioning

RAID 0+1 and RAID 1+0

There was a question that came up on this and the question basically asked - which is better. Well, this is a hard question to answer because they both have pro's and con's BUT before I get to the pro's/con's there's also another [more important] issue; these two get confused and swapped all the time. In fact, many vendors USED to refer to these interchangeably and even just lumped them together as RAID 10. Today, most people don't do this and most people also try to refer to the underlying technlogy instead of the numbers. Having said all of that, RAID 1+0 is Striped Mirrors and is my general recommendation because it tends to be more reliable than 0+1 and can tolerate more drive failures than 0+1. RAID 0+1 is Mirrored Stripes - which generally outperforms RAID 1+0 but cannot tolerate the loss of more than one drive and because of that it's more vulnerable. In the end, I'd suggest a simple "educational" site here (it's on a commercial site but it has a nice - and short - description of the different types of RAID arrays).

See you next week - for our LAST part in this series - Part 11: Best Practices in Building Robust, Recoverable, and Reliable Systems (Level 200).

Thanks for reading, listening and continuing to ask great questions!
kt

I completely spaced in blogging about a recent interview I did...poolside, in Orlando, FL when I was at SQL Connections back in April. I had the pleasure of meeting Chuck Boyce (a DBA from Philly, PA) who feverishly works in his spare time to spread the word about technology and specifically about all things SQL. His blog is here and he does a great job of summarizing good links and useful resources - almost daily (just so you don't have to!) on his "WHERE Clause" resource blog posts. You should check that out while you have your morning coffee. A great way to quickly find some useful stuff.

Additionally, Chuck has a radio program (What's Happening in SQL Server) that he does for SSWUG (Steven Wynkoop's excellent SQL Server Worldwide User Group). The entire list of SSWUG Broadcasts are here and specifically, the chat that we did poolside is here.

So - sorry that took me so long to remember... I wish I could blame it on too much sun (and/or drinks) poolside but......... sadly, I can't.

See you Friday on our 10th part of our TechNet Webcast series. Wow, we're on the home stretch!

Thanks,
kt

First - for what is logging needed?

This seems like an easy question - with possibly an easy answer... it's to aid in transaction durability and help in recovery - when the system loses power. Simply put, the transaction log is a way for SQL Server to ensure that a transaction "survives" a power failure. While a transaction is processing, information about that transaction exists within memory. Once that transaction is complete, log rows are written to the log portion of the database on disk. In the event of a power failure - and when SQL Server restarts, SQL Server performs restart recovery (two phases - REDO and then UNDO). Restart recovery happens everytime SQL Server starts; this ensures that completed transactions are persisted into the data portion and that no incomplete transactions end up within the database. For this discussion the specifics about log rows are not important - just that they are enough to "redo" the operations from *just* log information...in the event of a power failure. The information that's needed to recover the log in the event of a power failure - is really just - what's on disk. The data portion is probably out of date (how much so?) and the information kept in the log is used to bring the data up to date. A good question at this point is - how out of date is my data? The answer depends on a background process that runs almost solely to minimize this restart recovery process; it is called CHECKPOINT. A checkpoint occurs to make the data and log more current (but not neccesarily transactionally consistent). What this means is that periodically what is in memory is "synchronized" to disk. Since users do NOT directly read from disk, the data portion of the database (on disk) does not need to be up to the minute. Users accessing data ONLY read from cache - which is current - so only the data in memory needs to be accurate. It is VERY possible that at any given time that not only is your disk out of date but it's not even transactionally consistent. This is NOT a problem. If memory were to be lost (i.e. a power failure) then SQL Server would perform recovery on restart. Restart Recovery runs everytime SQL Server starts. In fact, if you think ONLY about restart recovery needing to bring a database "forward" after a power failure then you could argue that SQL Server would not need information to stay within the log after it's been "synchronized" with the data portion of the database - as long as the transaction(s) had been completed. And - YES - that's true. You can choose to clear the information from the log by changing your recovery model. Where you might have a problem is when you have a more significant failure - such as the loss of a hard drive (and even more interesting - which hard drive: a data drive or a log drive).

Key points:

  • The Log is a "write-ahead" log
  • The data on disk is NOT guaranteed to be accurate without the transaction information in the log
  • The Transaction Log (on disk) ensures transaction durability
  • Restart Recovery happens everytime SQL Server starts

OK - so that's it for now... In the next blog entry, I'll tackle "what affects logging."

Thanks for reading,
kt

Categories:
Opinions | SQL Server 2005

Hey there everyone! I know I still owe you a few Q&A entries (for sessions 7, 8 and 9) but I wanted to get this blog entry out there so that you can play a bit with some of the resources. This series was targeted at developers but really helps to "bridge the gap" between development and administration by always remembering the impacts of what you implement (and techniques to help you to see if you do). More specifically, everything you do and/or design, has the potential for a negative impact to something else - there's no free lunch, eh?

So, this series focused more on Scalability but always remembered the impact to availability and/or reliability. This last session brought together the three primary areas to remember while developing a scalable and reliable architecture:

  1. Know your data
    • Design for Performance - Sessions 1, 2, 3, 6, and 8
  2. Know your users
    • Indexing for Performance - Sessions 4, 5 and 9
    • Optimizing Procedural Code - Session 7
    • Controlling Mixed Workloads and Concurrency - Session 6
  3. Users lie
    • Profile - to make sure that you're tuning what's really happening as opposed to what you think was going to happen! - Session 9

This last session had some great questions and as a result, a few new resources were used. Here are a few of the things we talked about:

Event Notifications and DDL Triggers

DMVs

Webcast links for the entire series!

Part 1: Creating a Recoverable Database
For the MSDN Download for Part 1, click here.
For the SQLskills Blog Entry for Part 1, click here.

Part 2: Creating a Reliable and Automated Backup Strategy
For the MSDN Download for Part 2, click here.
For the SQLskills Blog Entry for Part 2, click here.

Part 3: Designing Tables that Scale, Best Practices in Data Types and Initial Table Structures
For the MSDN Download for Part 3, click here.
For the SQLskills Blog Entry for Part 3, click here.

Part 4: SQL Server Indexing Best Practices
For the MSDN Download for Part 4, click here.
For the SQLskills Blog Entries for Part 4
Resource links blog entry, click here.
Q&A blog entry, click
here.

Part 5: SQL Server Index Defrag Best Practices
For the MSDN Download for Part 5, click here.
For the SQLskills Blog entry, click here.

Part 6: Mixed Workloads, Secondary Databases, Locking and Isolation
For the MSDN Download for Part 6, click here.
For the SQLskills Blog Entry for Part 6, click here.

Part 7: Understanding Plan Caching and Optimizing Procedure Performance
For the MSDN Download for Part 7, click here.

Part 8: Data Loading and Aging Strategies - Table and Index Partitioning
For the MSDN Download for Part 8, click here.

Part 9: Profiling for Better Performance
For the MSDN Download for Part 9, click here.

Part 10: Session Summary - Common Roadblocks to Scalability
For the MSDN Download for Part 10, click here.
Transcript can be found here.

So, the series comes to an end (even though I still have more work to do). I have to say that it was a lot of fun and I enjoyed everyone's questions. And /start shameless plug here/ starting in March, SQLskills will begin a 10-12 part series on TechNet. The series will include sessions from my colleague Bob Beauchemin as well as me. This will definitely be more Operations and DBA focused but for all of you developers - it may help you better understand the system, High Availability and a myriad of New Features in SQL Server 2005.

I hope to see you there - or at least your DBA... ;-)

Thanks again everyone,

Kimberly

This is a much needed and much overdue blog entry... In 8 Steps to Better Transaction Log throughput, I mentioned a customer that was helped by TWO typical optimization problems I see. In that blog entry, I said I would write two blog entries - that one on transaction log optimization and another on common tempdb optimizations. Well, I forgot...until I was reminded with an email this morning (thanks Marcus!).

First - a bit of understanding of TempDB - what goes there?

  • Internal temporary objects needed by SQL Server in the midst of other complex operations. For example, worktables created by a hash aggregate will be stored in TempDB or interim tables uses in hash joins (almost anything that shows as "hash" something in your query plan output is likely to go to tempdb).
  • User objects created with either # (for local temporary objects), ## (globabl temporary objects) or @ (table variables)
    • # = Local temporary object
      Local temp objects are objects accessible ONLY in the session that created it. These objects are also removed automatically when the session that created it ends (unless manually dropped).
    • ## = Globabl temporary object
      Global temporary objects are objects that are accessible to ANYONE who can login to your SQL Server. They will only persist as long as the user that created it lasts (unless manually dropped) but anyone who logs in during that time can directly query, modify or drop these temporary objects. These objects are also removed automatically when the session that created it ends (unless manually dropped) OR if being used by another session when the session that created it ends, when the session using it finishes using it (and it's only as long as any locks are held). If other sessions need more permanent use of a temporary object you should consider creating a permanent objects and dropping it manually.
    • @ = User-defined Table Variable
      User-defined Table Variables were introduced in SQL Server 2000 (or, wow - was it 7.0?) and provide an alternative to temporary tables by allowing you to create a variable defined as type TABLE and then you can populate and use it in a variety of ways. There has been A LOT of debate over whether or not you should always use table variables or always use temp tables. My response is that I ALWAYS avoid the word always! My point is that table variables are NOT always better nor are temp tables always better. There are key uses to each. I tend to like temp tables in scenarios where the object is used over a longer period of time - I can create non-key indexes on it and it's more flexible to create to begin with (SELECT INTO can be used to create the temp table). I also have the ability to use the temporary table in nested subprocedures because it's not local to the procedure in which it was created. However, if you don't need any of those things then a table variable might be better. When it is likely to be better - when you have smaller objects that don't need to be accessed outside of the procedure in which it was created and when you only need KEY indexes (a table variable ONLY supports the indexes created by a create table statement - meaning PRIMARY KEY and UNIQUE KEY).
  • Objects created by client applications - this is possibly a large part of your problem... Profiling can help you to determine if there's a lot of TempDB usage from your client applications.

OK, so now that you know what goes there - how do you make it optimal?

First and foremost, TempDB is in cache just as any other database is in cache. TempDB does not spill to disk unless you are low on cache and/or if you have a lot of inflight transactions in TempDB. Although TempDB is not persisted from shutdown to restart - it still needs to do some logging and therefore you should consider its optimization a lot like other databases.

Things you should do for TempDB (that are a lot like what you should do for every database):

  1. Isolate the data and log portion of TempDB.
  2. Place them on clean, defragmented disks.
  3. Pre-allocate them so they don't need to do a lot of autogrowth.
  4. Make sure you have sufficient memory to support active objects (check for disk activity to the disks that contain TempDB files).
  5. Make sure that transactions are written efficiently so that there are no unusually long running transactions that are unnecessarily holding resources (and therefore locks and therefore log activity).

And - if you need to move TempDB, you should review this KB Article: Moving SQL Server databases to a new location with Detach/Attach

Things you should do SPECIFICALLY for TempDB (especially if you're running on a multiproc machine):

Before I say what... let me tell you why? TempDB has a large number of objects being created all the time. For an object to be created, space must be allocated to it. Space allocation is determined by looking at some of the internal system pages (the GAM, and SGAM). In the end, it is these pages that start to have significant contention (with just one file) in a VERY active TempDB. To minimize that contention you can create multiple files.

  1. Consider creating multiple files for TempDB (even if on the same physical disks) so that there is less of a bottleneck when objects are being allocated. Make sure to read associated KB.
  2. Consider setting a trace flag to have object allocation grab extents rather than pages. Make sure to read associated KB.

BOTH of these last two are described in detail by a KB article: FIX: Concurrency enhancements for the tempdb database.

OK - so that should really help! Moving forward (meaning SQL Server 2005), having multiple files can still help for TempDB.

Effectively Designing a Scalable and Reliable Database

A Primer to Proper SQL Server Development

New Features in Indexing and Index Maintenance Best Practices, Part 5 of 10

Presented by Kimberly L. Tripp, SQLskills.com

Q: Can I view a recording of this webcast? Part 5 can be replayed by clicking here.

Q: Where can we get the demo scripts AND the sample database: Credit? The demo scripts are in this zip (20050902MSDNDemoScripts.zip (8.52 KB)); here in this blog entry. However, at the series completion, I will also create an entry under Past Event Resources for the entire webcast series.  To download the ZIP of the Credit Database Backup click here. Once unzipped, restore this backup to SQL Server 2000 or SQL Server 2005. The backup is a SQL Server 2000 backup and can be restored to either version! If restoring to SQL Server 2005, you might want to change the destination for the data and log file as the path will probably be different.

Q: Where are the links to all prior Webcast Q&As from this series?

Part 1: Creating a Recoverable Database
For the MSDN Download for Part 1, click here.
For the SQLskills Blog Entry for Part 1, click here.

Part 2: Creating a Reliable and Automated Backup Strategy
For the MSDN Download for Part 2, click here.
For the SQLskills Blog Entry for Part 2, click here.

Part 3: Designing Tables that Scale, Best Practices in Data Types and Initial Table Structures
For the MSDN Download for Part 3, click here.
For the SQLskills Blog Entry for Part 3, click
here.

Part 4: SQL Server Indexing Best Practices
For the MSDN Download for Part 4, click here.
For the SQLskills Blog Entries for Part 4
Resource links blog entry, click here.
Q&A blog entry, click here.

Q: How can I replay previous sessions? I thought we were going to get emails for replaying -- but I haven't received any replay emails. You will receive replay emails ONLY when you register for these sessions through MSDN. We’ve come to find out that there are other ways to register but it’s only through MSDN that we know for sure you will receive the replay information. Regardless, you can always find the “on-demand” version of the sessions here.

Related Resources

MSDN Webcast: Indexing for Performance – Proper Index Maintenance MSDN Whitepaper: Microsoft SQL Server 2000 Index Defragmentation Best Practices TechNet It’s ShOwtime Webcast: Index Defragmentation with SQL Server 2005 

Technical Questions

Q: In your script, what is "HA Requirements"? HA = High Availability. This is the requirement that your table stay online and available. Some companies are trying to achieve 99.999% uptime, this is especially challenging when even maintenance operations take a table offline.

Q: If you create extra indexes is there a easy to configure utility that you can run across an application after it has run for a few months to list keys that were never or infrequently used? Use one of the new DMVs: sys.dm_db_index_usage_stats. To see the complete list of DMV objects, use the following query:SELECT * FROM sys.system_objects WHERE [name] LIKE 'dm[_]%'

Q: What is DMV again? Dynamic Management View. These are new objects which give information about in-memory objects and state information.

Q: What are the parameters and their usage – for sys.dm_db_index_physical_stats? (DatabaseID, ObjectID, IndexID, PartitionNumber, Mode)

DatabaseID = [ NULL | 'DatabaseID' ] NULL: returns information for ALL databases, if NULL is used no other options can be supplied. This returns ALL indexes for all objects in all databases. Easy but possibly slow.
DatabaseID: smallint type. Refers to the ID for a specific database. DB_ID() or DB_ID('DatabaseID') can be used. The latter allows you to run this from ANY database as long as you have access. However, 3-part naming must be used.
ObjectID = [ DEFAULT | NULL | 'ObjectID' ]DEFAULT/NULL: return ALL base data: CL, Heap, LOB for the specified database.
ObjectID: int type. Refers to the ID for a specific object. OBJECT_ID('TableName') can be used. When using OBJECT_ID, you can use 1/2/3-part naming. Be sure to use 3-part when executing outside of database.
IndexID = [ DEFAULT | NULL | 'IndexID' ]DEFAULT/NULL: All indexes
IndexID: tinyint type. Refers to the ID of a specific index.
PartitionNumber = [ DEFAULT | NULL | # ]DEFAULT/NULL/0: return ALL partitions
#: returns only the details about specific partition. When a PartitionNumber is specified then an IndexID must also be specified.
Mode = [ DEFAULT | NULL | 'SpecificMode' ]DEFAULT/NULL/LIMITED: return FAST scan and use only an IS (Intent Shared) Table-level lock. This lock blocks ONLY eXclusive TABLE-level locks and schema changes. Excellent, relatively unobtrusive way to get fragmentation details.
LIMITED: IS Lock. Same as SQL 2000 WITH FAST, only page counts and EXTERNAL fragmentation displayed. Does not detail INTERNAL fragmentation and page density.SAMPLED:  IS Lock. For tables less than 10,000 pages (~80MB), all details are produced. For tables of more than 80MB, two samples are done (1% and 2%) at every nth page. The samples are compared and if close, 2% sampling output returned. If not close, then up to 10% will be sampled.DETAILED: S Lock. Entire table analyzed for both internal and external fragmentation. Returns one row for each level of the index from the leaf level (level 0) all the way up to the root level. This can help you determine fragmentation in the non-leaf levels but at the expense of holding a shared table level lock.
Q: How often should you run DEFRAG on your SQL server box? Should this be a part a regular schedule? Taking down SQL is their any other consideration? First, the only thing that’s not available is the table being REBUILT. Defragging an index does not take that table/index offline. So, more than anything, it depends on what you’re trying to achieve. If you want achieve better availability on SQL Server 2000 then you might choose to defrag rather than rebuild – to keep your tables available.

Q: How often do you get such perfect tables in practice? A table is always completely clean and contiguous after a rebuild. To periodically fix a table, you should use consistent and automated rebuild strategies.

Q: Do you have suggestions for developers using MSDE when customer’s demands can vary? Vary from few transactions to a large customer with many transactions. The general best practices in database and table design scale from the low end all the way up to the high-end and in the end – helps your database scale!

Q: Can you touch on rules of thumb for "pad index"? If fragmentation in the leaf level is minimized through proper index maintenance and fillfactor – then fragmentation in the non-leaf levels should be low as well. You rarely need to specify padindex unless you have widely varying distribution of data and really want to leave larger gaps because of strange densities of data.

Q: Do most of these "Index Rules" apply to Indexed-Views? Yes! All indexes can become fragmented after data modifications... Your scripts should always look for fragmentation across all scripts.

Q: Can you discuss fragmentation WRT horizontal partitioning, especially range partitioning on the primary key? SQL Server 2005 offers more granular rebuild options –but not necessarily online. In many cases, you might want to design a read-only partitioned table and keep the volatile portion of the table (especially if only one partition), in its own separate table – possibly using a partition view (or an inline table valued function) over these two tables.

Q: If I'm selecting from a table with a where FirstName = ... and LastName = ... and I have 2 indexes, one on LastName and another on FirstName. Are they both used? With an AND – maybe. The optimizer will look at the Index statistics to determine if either of them selective enough to use only one index. If neither is selective alone and a better index does not exist (a better index for AND would be one that includes BOTH of the columns in the SAME index – as a composite index), then SQL Server may choose to join the indexes (index intersection).

Q: URLs on the Resources slide can't be read. Could you type then into the Q&A, please? When the session is available for download (which is what happens when MSDN put this online), then you can access the URLs there as well. Typically, I place all of the links at the beginning of the Q&A – resources section. I’ll make sure to do this consistently!

Q: How does an uniqueidentifier used as a clustered primary key effect performance? This is best answered by session 4. In short, a non-sequential GUID can cause a lot of fragmentation.

Q: What is ExtentFragmentation as reported by DBCC SHOWCONTIG and is it less important than Logical Fragmentation? Extent Fragmenation refers to how many extents are next to each other. This is a bit more important than Logical Fragmentation as logical fragmentation shows whether or not the pages are next to each other.

Q: How much danger is there in the defrag processes? What kind of backup procedures do you suggest when you defrag? More frequent transaction log backups. A defrag generates a lot of log information. However, it does so in mini transactions. As a result, transaction log backups can occur concurrently with the defrag process and even though the defrag is not complete, the transaction can still be cleared because the defrag process runs as small transactions instead of one long running transaction. This also improves concurrency because the locks are released throughout the process.

Q: Defraging a large index can cause the log file to grow quite large. Is there a way to minimize this other than frequently log backups? Yes, you’re correct – defraging a large index WILL grow the log file quite large! As for minimizing this activity in the log – no way to do that. But – you’re correct in increasing the frequency of log backups!

Q: With very large tables, how much available disk space (both transaction logs and data drive) do you need to have to rebuild? Does it take less space to defrag than to rebuild? Well, this is really a multipart question… First, log space for rebuilds is mostly dependent on the recovery model. If you are running the FULL recovery model then creating and/or rebuilding indexes will take enough log space for the entire rebuild to complete. If you are running in the BULK_LOGGED or SIMPLE recovery models then this operation will run as a bulk operation and will be minimally logged. While this will take less time and significantly less log space, you are giving up some features when switching recovery models. I would strongly suggest reviewing the second session to see if this is appropriate.Now, as for data space – a rebuild will always require at least the table size in free space and possibly as much as double (if an online rebuild is being performed). Typically, when space estimates are being done (when capacity planning the database) I always recommend taking the largest table size and multiplying it by 2 or 3 – in order to make sure you have enough space for rebuilds. There is space needed for sorting as well – this can come from the database OR from tempdb (using the SORT_IN_TEMPDB option).Defraging doesn’t move an object so it doesn’t take additional data space BUT it does require more overall log space because it runs as mini transactions instead of just one.

Q: Should we look at different fragmentation stats if there are multiple files in the same filegroup? No, you still want to review average fragmentation. However, you may have more “fragments” in a table that spans filegroups; this does not necessarily mean that your table is fragmented.

Q: Are there any good third party tools for checking fragmentation and performing maintenance? Unfortunately no revolutionary ones (that I know of and/or can recommend)...but I still have high hopes :)

Q: How do you determine the appropriate fill factor? Unfortunately, there isn't a magic number... but, you can test your guestimate by seeing how fragmented the table becomes between your regularly scheduled defragmentation routines.

Q: Does it matter if I build the clustered index before/after rebuilding the non-clustered indexes? You should always create the clustered index before creating non-clustered index but as for rebuilding - you can rebuild your indexes independently as a rebuild of the clustered does not (generally) cause a rebuild of the non-clustered. There is one exception to this rule in SQL Server 2000 – in 2000 ONLY, SQL Server will automatically rebuild the non-clustered indexes when a non-unique clustered index is rebuild. Why? Because the uniqueifiers are rebuilt.

Q: Will doing a defrag followed later by a rebuild decrease the work of the rebuild? Not really. A defrag doesn’t move the object – only a rebuild does. However, you might minimize the cost of the sort…

Q: How does cache map to table pages, i.e., does free space in table pages have a 1:1 correspondence to wasted cache? SQL Server reads the 8K page from disk into memory. The number of bytes that are wasted on disk are also wasted in memory. This is often the motivation for vertical partitioning! You might refer back to session three for more details on row/page structures!

Q: If switching a varchar cluster to a bigint and vice-versa in 2000, what would the best order of drop/create index? Actually, this is the reason that CREATE with DROP_EXISTING was created… so that you could “change” the definition of the clustered

DROP TABLE test
go
CREATE TABLE test
(
      testid      int               not null,
      col1       varchar(100)      not null
)
go
CREATE CLUSTERED INDEX testind ON test(col1)
go
CREATE CLUSTERED INDEX testind ON test(testid) WITH DROP_EXISTING
go
sp_rename 'test.testind', 'NewIndexName', 'INDEX'
go
sp_helpindex test
go
Q: What about instances of one name only? (like Madonna, Cher, etc. ;^) Well, this is a good question and this is something that you might need to plan for in design. In these cases, you might allow NULLs in the lastname column and then make sure to search both when a lookup is performed. To be honest, I probably won’t do all that much to find these special first names – unless you wanted to do searches across both columns without knowing whether or not this is a first or last name. You might do something like this in a lookup
SELECT * FROM NamesTable
WHERE LastName = @variable
      OR (FirstName = @variable AND LastName IS NULL)
Comment: Just wanted to say I appreciate the blog you have put together.

Thanks for the thanks! It's a lot of work but I think it's great as a reference!! Even for me! To be honest, I can't always remember where to find things either! J

Thanks! So – we’re half way there – 5 more to go! And, lots more questions coming I’m sure J For the next session, we’re going to cover Isolation and options in Isolation in SQL Server 2005. If you’re interested in hearing more isolation, locking/blocking – here’s the registration link:MSDN Webcast: A Primer to Proper SQL Server Development (Part 6 of 10): Mixed Workloads, Secondary Databases, Wait States, Locking and Isolation See you on Friday!

kt

Effectively Designing a Scalable and Reliable Database A Primer to Proper SQL Server Development
Indexing Best Practices, Part 4 of 10
Presented by Kimberly L. Tripp, SQLskills.com For a list of the non-technical Resources related to this webcast, please review the following blog entry:MSDN Webcast Q&A for Resources: Indexing Best Practices, Part 4 of 10  Technical Questions

Q: When a clustered index is added to a table what happens physically to the data pages? SQL Server moves the data to the new location defined by the ON clause of the index creation statement. The ON clause can specify a different filegroup or even a partition scheme and the heap will move to the new location.

 Q: I'm not getting the follow-up emails with the replay link. How do I get on this email list for replay?

You *must* register through MSDN. If you are using a partner site then that is probably why you’re not getting the email. IF you are registering through MSDN and still not receiving the email, please contact eventsup@microsoft.com with your information.

 Q: Does the copy affect the transaction log?

Yes. A create index or rebuild index statement does require the statement to be a single transaction. If you are running with your database set to the FULL recovery model then this single transaction will create a lot of log activity. You could switch to the BULK_LOGGED or SIMPLE recovery model and get less log activity but switching to the SIMPLE recovery model should be avoided as it will break the continuity of the transaction log. I would recommend two resources to get more information about Recovery Models and their impact on performance, logging and recovery.

MSDN Webcast Q&A: A Primer to Proper SQL Server Development, Creating a Reliable and Automated Backup Strategy, Part 2 of 10

Review the sample chapter: Chapter 9: Database Environment Basics for Recovery from the MSPress title: Microsoft SQL Server 2000 High AvailabilityQ: Can one switch back and forth between simple and full recovery mode?

Ideally, you would switch ONLY between the FULL Recovery model and BULK_LOGGED Recovery model but not with SIMPLE. The two resources listed above will also help to clarify this point.

 Q: What about re-building a clustered index? Will it be that slow too? Is it recommended to do it off-hours?

Rebuilding a clustered index takes a lot of the same requirements that creating the CL index does. It can also be slow and yes – as a result, you might want to do this off hours. You can speed up the process by changing recovery models but switching recovery models can also have an impact on recovery. See the two resources above for more details to help clarify this point. And, in part 5, I’ll cover more of the fragmentation details.

 Q: When is it desirable to move a table?

This is not necessarily an easy one to answer simply. But, I often recommend separating transaction processing data from read-only data as well as possibly placing a large table on its own filegroup (especially true with partitioned objects). But, I don’t often recommend separating indexes. Here’s a link to a SQL Server Magazine article I wrote – it does give some guidance. In general I don’t separate too many objects. However, if you’re building a database that you will put into a read-only state you might want to work a bit harder to “shift” objects to filegroups in such a way that each of those objects gets a nice and clean and contiguous chunk of space within the file. Another time that it’s useful to “move” an object is when you load data into a heap and then want to build the clustered index on another filegroup – where that heap will become a partition of a partitioned table. This is good when the clustered index will move the object into a filegroup that does not have other objects in it. Again, the clustered index will be nice and clean and contiguous in the destination filegroup.

 Q: So, if I understand correctly… a good approach to creating a table would be to have a clustered index independent of the primary key. So, ideally you could use a GUID for the PK and then have a clustered index in order to minimize the fragmentation.

The most ideal combination is when the clustered index is on an identity column – and even better if this is the Primary Key. However, if you do want to keep a GUID as the primary key then yes, you can create a clustered index which is independent of the primary key.

Q: In general, do you think folks have more problems setting up indexes for OLTP, OLAP/DWH, or Web apps?

In general, I think Decision Support environments can have more indexes with less of a negative… OLTP is a bit harder to find the right balance. The hardest is combination environments…

 Q: What was the diff between 0 and 1 on the third argument of DBCC IND?

The third parameter is for Index ID. 0 = Heap, 1 = Clustered Index, 2-250 are non-clustered index ids.

 Q: Should we keep the install defaults like Collation, and others? Or do you have recommendations?

Generally speaking, yes – I would recommend keeping the defaults. However, I do recommend that some development environments – especially those who are producing software for sale. When the database can be installed/setup on other servers that may have different code pages, etc., you might want to use binary case-sensitive sort. This will make your application more consistent – even with case-sensitive servers…and you never know if one of your customers may have one! It will also help to make you a better coder.

 Q: What design considerations should I take into account when planning my primary keys, indexes and clustered indexes for databases that will be used with replication?

There are no specific requirements for Replication other than the fact that you must have a Primary Key… but the same best practices discussed today apply and should be used in replication as well.

 Q: I am using SQL Server 2000. I have a table that is used a lot on the web server. The scan density is 97%. Should I adjust the indexing and/or Primary Key?

I tend to think more about how fast it drops rather than the actual number. But – 97% is pretty good in general. What you might want to do is a rebuild IF you have off hours time to rebuild (in SQL Server 2000 rebuilding can be done offline ONLY). Otherwise, you might just want to setup a defrag procedure. However, this is NOT

 Q: Why are so many SQL commands undocumented and how do you get to know how to use them if they are undocumented?

First and foremost, I want to strongly urge you to take your time with commands that are undocumented. If the commands are really useful for day-to-day operations, they’re probably already documented. Occasionally, development teams add additional functionality during development to test their functionality. These commands – while proving useful in some cases – do not generally go through the same quality assurance procedures that fully documented commands go through and as a result, I would be very cautious before using most undocumented commands.

 

Having said all of that, you’ll find a lot of useful tidbits in articles, on webcasts, in books, etc.  Just make sure you test the command thoroughly and don’t expect to get support on it. In general, undocumented commands are NOT supported.

 Q: What is the command for scan density on SQL Server 2000?

DBCC SHOWCONTIG. If you’re interested in learning more about defragmentation on SQL Server 2000, please review this MSDN Webcast: SQL Server 2000 Performance Tuning: Index Defragmentation Best Practices

 Q: In SQL Server 2000, DBCC SHOWCONTIG, can you explain extent scan fragmentation and if we should pay attention to it?

We’re going to go over this more with our next session on Index Defrag but Extent Scan Fragmentation describes whether or not the extents that an object owns are “next” to each other. If there are gaps then this implies that there are other objects “interleaved” with this one. This is not ideal, defragmenting the object will help to improve scans and should.

 Q: The size of the database doubles when a clustered index is added?

Once the index is created the space needed to build the index is released. However, while the index is being created you need roughly 2-3 times the TABLE size to build it. If you have a table which is 1GB then you should plan for at least 2-3GB of free space in order to build or rebuild a clustered index. However, it’s only the table’s size that doubles and it’s only during the index creation NOT after the index has been created.

 Q: Aren't GUIDs too large to use for an efficient index?

No, an index can be useful even if it’s a large percentage of the size of the table – mostly, because of the order of the columns and the criteria used in the query. Hard to give a fast answer to this but in general 16 bytes isn’t so bad – much larger may even be acceptable.

 Q: Does level 1 of a clustered index point to rows, or does it describe the minimum and maximum row on a page?

Basically, it’s a pointer to the page and it has a pointer to the lowest value on the page. Each pointer in the non-leaf always points to the “first” value on the page (meaning the lowest).

 

Q: Why would you be inserting using an index other than using the primary key? Assuming the primary key is appropriate, I wouldn’t. However, there are a lot of environments where the primary key is chosen and kept – for a variety of reasons. SQL Server creates the clustered index on the Primary Key by default but it is not required’ you can create the clustered index on something else. It’s the clustered index which defines the insert order – not the Primary Key; however, they are often one and the same.

 Q: Should all indexes be clustered if unique? Would this increase performance on search results from SP?

I think the question should really be should all clustered indexes be unique. Generally, the answer to this is yes. But to answer the second part – would this increase performance on an SP – the answer here is not necessarily. Stored procedures have many reasons for being – or not being – optimal. Plan choice, optimization and recompilation issues all play a major factor in whether or not a stored procedure is optimal. However, in session 7, I will talk about plan caching and stored procedure optimizations!

 Q: If there is no natural clustering key, what is the advantage of "coming up with one". Or, said another way, what is the basic reason why I want a cluster rather than a heap?A heap does not inherently have order to it (that’s the point) but when the records are not ordered then SQL Server needs to find a location in which to insert the row. This causes a lookup in something called a PFS (page free space) as well as a lookup in the table’s Allocation Map (actually called an Index Allocation Map in SQL Server 2000 and a Heap or B-Tree Map in SQL Server 2005). To help the insert performance you generally want to have the insert location defined. However, just having a clustered index is not the best answer as there are other internal dependencies on the log. Because of these internal dependencies you typically want the clustered index to be unique, narrow and static.  If you’re interested in reading more about the reasons behind having a clustered index which is unique, narrow and static:Ever-increasing clustering key - the Clustered Index Debate..........again! MSDN Webcast Q&A: Index Defrag Best Practices - Clustering Key Choice In some cases this means adding a column to a table solely for the purposes of clustering on it. Q: Is it a good practice to just have non-clustered indexes?Well, that’s a different way of answering the prior question…and while heaps can be useful for high performance loading they are generally not as useful for day to day OLTP or combination OLTP/DSS tables. So, my recommendation is almost always – the RIGHT clustered index + additional supporting non-clustered indexes. Q: In large OLTP databases is it recommended to move reporting to a separate database which has a lot of non-clustered indexes and no clustered indexes?No… Again, it’s not really a binary thing (should you always have a clustered, or not always have a clustered). Really, there are cases for both BUT in most databases – even DSS (Decision Support System) – a clustered index helps to define a narrow and unique value that all indexes have in common. This can help other operations…  So, even in a DSS environment I generally recommend a clustered, as well as additional supporting non-clustered. Having said that there are cases where a single clustered – setup for order by and range queries can be useful but that’s generally only when the large majority of queries all want * (all columns) and you typically want the exact search and order by in almost every case. Q: I've read that you shouldn't cluster a column that is an Identity column because it hurts performance for OLTP as all writes are on same page/area of disk? Your thoughts?

There are some cases where insert performance can be compromised – in very high volume OLTP systems (typically more than 400 inserts per second but this can vary) but in general, inserting into the same “hot” spot of a table improves performance because the activity is isolated and the needed pages are already in cache. Lots more on this on in other Q&As.

 Q: If I add an identity column to my table to have a good clustered key, it will never be used in a query. Would I naturally also include the primary key, even though it would make the clustered key less narrow?

You don’t need to include the primary key is the clustered key is already unique, narrow, static and ever-increasing.

 Q: Does this mean that vertical partitioning of a table is only helpful when the table is scanned?

No, there are still many benefits in vertical partitioning – such as with locking, scanning. I’d refer you to review the Webcast that was part 3 in this series, for more details.

 

Q: Can you use DTA against SQL 2000 databases?

Yes. You can use either the DTA or the ITW against SQL Server 2000. If you're looking for details about how to use the SQL Server 2000 Index Tuning Wizard you can review this whitepaper: Index Tuning Wizard for Microsoft SQL Server 2000.

 Q: Is DTA in the Express version, too?

DTA does not come with the Express Edition but you can certainly do tuning on a higher version and then use that tuned database in Express.

 Q: Can you explain briefly Index View?

It’s a materialized view – in the sense that the data as defined by the view – is actually materialized and stored on disk. This has both pro’s and con’s and needs to be evaluated fully before you implement these. Please check out these whitepapers for more details:

 Improving Performance with SQL Server 2005 Indexed Views by Author: Eric Hanson and Contributors: Gail Erickson, Lubor Kollar, Jason WardImproving Performance with SQL Server 2000 Indexed Views by Author: Gail Erickson and Contributors: Lubor Kollar, Jason Ward contributor Q: Do you think one can become expert in SQL Server and other aspects of Design/Programming or does one need to specialize?

Wow, I think it this is a tough question; more than anything I would recommend learning all of the options available so that you can better architect the right solution. Once you think you have a good solution, then move forward to really learn that area/topic/features. I’ve been using a phrase with SQL Server 2005 a lot – Jack of all trades, master of some...

 Q: Can you repeat the new function INCLUDE on index?

In SQL Server 2000 the maximum size of an index key is 900 bytes or 16 columns – whichever comes first. This key helps to keep an index’s b-tree smaller and more scalable. In SQL Serer 2005, the leaf-level of the index can include additional non-key columns. This continues to help keep the tree structure scalable but also allows you to cover more queries.

 Q: I have installed SQL Server 2005 CTP, but can only find SQL Configuration Manager. I don't see the management tools that you are using. Any suggestions?

Actually, no. This seems a bit strange?! I might try uninstalling and re-installing, as well as reviewing the setup logs to see if there are errors within them. For SQL Server 2005, all installation log files are stored in C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG.

 Q: if your non clustered index either includes all or part of the clustered index are the clustered index columns added again to the non clustered index?

NO! SQL Server only adds the column(s) of the clustering key – if they are missing.

 Q: Did you mention the cool timebound option in DTA?

Yes, but only briefly. Vipul Shah discusses DTA in greater detail in these two webcasts.

 TechNet Webcast: Performance Diagnosis in SQL Server 2005 (Level 300) by Vipul ShahTechNet Webcast: SQL Server 2005 Database Tuning Advisor (Level 300) by Vipul Shah Q: What is the fill factor and how is it used?

I think I’m going to defer you to the next session! Now, there’s some motivation. J

 

And here are a few comments that I’ve received! I just wanted to say thanks for your comments and thank yous… J It's your questions that keep me on my toes and current. This is still fun so keep it coming.

 

Comment: Thank you again, particularly for thoroughness and enthusiasm.

Comment: Let me add my voice to the chorus of "thank you's", Kim. You're  presentations and verbal Q&As are great, and your online ones are a truly generous gift to the community. No other webcaster has come close to doing what you do. Many thanks from all of us.  

Comment: I must compliment you on your Q&A in your blog - I've just checked it out - couldn't be more comprehensive!

 

So – 4 down, 6 to go! Lots more questions coming I’m sure J

 

If you’re interested in hearing more about fragmentation, how to view it, how to clean it up and finally, how to prevent it – here’s the registration link:

MSDN Webcast: A Primer to Proper SQL Server Development (Part 5 of 10): New Features in Indexing and Index Maintenance Best Practices (Level 200) Thanks!kt

Effectively Designing a Scalable and Reliable Database A Primer to Proper SQL Server Development

Indexing Best Practices, Part 4 of 10

Presented by Kimberly L. Tripp, SQLskills.com

Q: Can I view a recording of this webcast? The webcast will be available for viewing within 24 hours. All “on-demand” webcasts can be found here.

Q: Where can we get the demo scripts AND the sample database: Credit? The demo scripts are in this zip (20050826MSDNDemoScripts.zip (12.17 KB)); here in this blog entry. However, at the series completion, I will also create an entry under Past Event Resources for the entire webcast series.  To download the ZIP of the Credit Database Backup click here. Once unzipped, restore this backup to SQL Server 2000 or SQL Server 2005. The backup is a SQL Server 2000 backup and can be restored to either version! If restoring to SQL Server 2005, you might want to change the destination for the data and log file as the path will probably be different.

Q: Where are the links to all prior Webcast Q&As from this series?

Part 1: Creating a Recoverable Database
For the MSDN Download for Part 1, click here.
For the SQLskills Blog Entry for Part 1, click here.

Part 2: Creating a Reliable and Automated Backup Strategy
For the MSDN Download for Part 2, click here.
For the SQLskills Blog Entry for Part 2, click here.

Part 3: Designing Tables that Scale, Best Practices in Data Types and Initial Table Structures
For the MSDN Download for Part 3, click here.
For the SQLskills Blog Entry for Part 3, click here.

Q: How can I replay previous sessions? I thought we were going to get emails for replaying -- but I haven't received any replay emails. You will receive replay emails ONLY when you register for these sessions through MSDN. We’ve come to find out that there are other ways to register but it’s only through MSDN that we know for sure you will receive the replay information. Regardless, you can always find the “on-demand” version of the sessions here.

Resources mentioned in today’s Session:

If you’re interested in hearing more of the theory behind the indexing recommednations, as well as more of the reasoning behind unique, narrow and static criteria for the clustering key –  watch the TechNet "It's Showtime" recording, recorded at Tech Ed Europe titled: Index Creation Best Practices with SQL Server 2005 which covers Index Creation Best Practices on SQL Server 2005 and really gets into a lot of the "why's". If you’re interested in hearing a lot more about how to use Profiler effectively with ITW, watch the MSDN Webcast titled: SQL Server 2000: Performance Tuning - Finding the Right Balance of Indexes which covers Index Creation Best Practices on SQL Server 2000.

As for whitepapers:

SQL Server 2005 Beta II Whitepaper: Partitioned Tables and Indexes by Kimberly L. Tripp

SQL Server 2005 Beta II Whitepaper: Snapshot Isolation by Kimberly L. Tripp

PREVIEW: The Database Administrator’s Guide to the SQL Server Database Engine .NET Common Language Runtime Environment by Kimberly L. Tripp

Improving Performance with SQL Server 2005 Indexed Views by Author: Eric Hanson and Contributors: Gail Erickson, Lubor Kollar, Jason Ward

Improving Performance with SQL Server 2000 Indexed Views by Author: Gail Erickson and Contributors: Lubor Kollar, Jason Ward contributor 

Additional Webcasts:

TechNet Webcast: Performance Diagnosis in SQL Server 2005 (Level 300) by Vipul Shah

TechNet Webcast: SQL Server 2005 Database Tuning Advisor (Level 300) by Vipul Shah Support WebCast: SQL Server 2000 Profiler: What's New and How to Effectively Use It by Sri Kasam and Ajay Manchepalli TechNet Support WebCast: I/O performance problems and resolutions in Microsoft SQL Server 2000 by David G. Brown TechNet Webcast: SQL Server 2005 Series (Part 7 of 10): Indexing Enhancements (Level 200) By Bryan Von Axelson

Summary!

Finally, check out the Indexes category on my blog – for a variety of Q&A postings, other resources and lots of great links! The rest of the technical Q&A, I should be able to post soon. Stay tuned! Thanks,kt

MSDN Webcast: A Primer to Proper SQL Server Development
Designing Tables that Scale, Best Practices in Data Types and Initial Table Structures, Part 3 of 10

Presented by Kimberly L. Tripp, SQLskills.com

Q: Can I view a recording of this webcast? Here’s the specific replay link.

Q: Where can we get the demo scripts? The demo scripts are in this zip (20050819MSDNScripts.zip (3.67 KB)); here in this blog entry. However, at the series completion, I will also create an entry under Past Event Resources for the entire webcast series.

Q: Does Kimberly L. Tripp have a blog? Yes, www.SQLskills.com/blogs/Kimberly and if you’re reading this on my blog…well, then you already know! J

Q: Where are the links to all prior Webcast Q&As from this series?

Part 1: Creating a Recoverable Database
For the MSDN Download for Part 1, click here.
For the SQLskills Blog Entry for Part 1, click here.

Part 2: Creating a Reliable and Automated Backup Strategy
For the MSDN Download for Part 2, click here.
For the SQLskills Blog Entry for Part 2, click here.

Q: Is this Session 2 of the "Effectively Designing a Scalable and Reliable Database" webcast series? The Title bar here reads "MSDN Webcast: MSDN: A Primer to Proper SQL Server Development (Part...)". A bit misleading don't you think? So, I finally figured out the mystery here. When I originally submitted my proposal for this series, I titled it like this: Effectively Designing a Scalable and Reliable DatabaseA Database Developer’s Primer to Proper SQL Server Development (Level 200-300) But somehow, through a few edits and since each module also has a more detailed title shown in this following list, then I think the title just got edited down to fit. In fact, with subtitles like these, I can barely fit the entire title, subtitle and session title on a single slide. J

Part 1: Creating a Recoverable Database

Part 2: Creating a Reliable and Automated Backup Strategy

Part 3: Designing Tables that Scale – Best Practices in Data Types and Initial Table Structures

So, the mystery ends here… It’s all the same content and the general idea is that building a scalable and reliable database takes a variety of best practices that together make it happen; there is no “DBCC MAKEITGOFASTER(‘TRUE’)” option. I'll unveal a new title slide in part 4... oh, the excitement!

Q: How can I replay previous sessions? I thought we were going to get emails for replaying -- but I haven't received any replay emails. You will receive replay emails ONLY when you register for these sessions through MSDN. We’ve come to find out that there are other ways to register but it’s only through MSDN that we know for sure you will receive the replay information. Regardless, you can always find the “on-demand” version of the sessions here.

Q: What does CLR stand for? Common Language Runtime. Quick WHATIS.COM Definition here.

Q: If something uses sp_OA* now in SQL 2000, am I guaranteed it will use SQLCLR in SQL 2005? Not automatically; however, you should consider rewriting you sp_OA* is used to automate COM objects; these, by definition, are not framework objects. Moving to SQL Server 2005, you can continue to run with your previous sp_OA procedures to access COM objects but if you can write SAFE .NET Frameworks code to do the same thing then you should convert your sp_OA* code where possible. Not everything can be converted but the basic principle is that which you use sp_OA* is a good thing to target for conversion. For some great details on the differences and how to move forward – go here. Also helpful is this KB Article regarding COM and .NET in SQL Server 2000 Using extended stored procedures or SP_OA stored procedures to load CLR in SQL Server is not supported

Q: I loved how XML data is available interactively in SQL Server Management Studio (SSMS); can HTML data columns be displayed in the same way? Yes, if you store HTML in a column of type XML then the column data will appear as a link and when clicked, you can edit it in the XML Editor window in SSMS.

Q: Is the "XML" datatype really a "varchar(max)"? No. In previous releases the XML type really was just a “blob” stored in the database. There were no inherent optimizations and you were not able to define a scheme for an XML column. In SQL Server 2005, XML data is stored natively and offers the ability to put indexes on the columns as well as better access and manipulation. You *can* store XML as varchar(max) but it will be harder to access/code. There’s a very nice description of the tradeoffs in this whitepaper:XML Options in Microsoft SQL Server 2005 

Q: What is the difference between numeric and decimal? In SQL Server they are synonyms. I seem to remember hearing that there was a difference in the standards…somewhere but with a few references and looking around I can’t find any. Regardless, I’d make sure that you standardize on one or the other – just in case they do change, at least all of your data is consistent.

Q: I use decimal datatype for money values. Why it may be suboptimal to use decimal (instead of money datatype)? I wouldn’t say that it’s suboptimal – except maybe in storage length. However, decimal can be more precise.

Q: When should you use a float? A number of my clients use Float – i.e. Banks, Stocks, etc. What would be your argument against this? Just the lack of an absolute minimum precision/scale.

Q: We have a database where datatypes are set. But to use a simple percentage calculation we had to add .0 to the calculation to get the correct percentage. For example 100(5/9) is truncated to 0, but 100((5+.0)/9) yields correct result. Is there any standard procedure without massive conversions to assure a correct result? It’s a matter of data type. In the case of 100, 5 and 9 – these are all integers – without any decimal place. So – the division of 5 divided by 9 yields 0. However, 5 divided by 9.0 or 5.0/9 – both have a decimal and in that case the other data types are implicitly converted to the higher/more precise data value. A cool way to find out what the actual “base type” and/or the level of precision/scale is to ask these questions as if the data is of type sql_variant:

SELECT SQL_VARIANT_PROPERTY(5/9.0, 'BaseType')
SELECT SQL_VARIANT_PROPERTY(5/9.0, 'Precision')
SELECT SQL_VARIANT_PROPERTY(5/9.0, 'Scale')   -- numeric(8,6)
SELECT SQL_VARIANT_PROPERTY(5/9, 'BaseType')
SELECT SQL_VARIANT_PROPERTY(5/9, 'Precision')
SELECT SQL_VARIANT_PROPERTY(5/9, 'Scale')   -- int

Q: I am surprised by the nchar/nvarchar suggestions. I've always been told to NOT use these data types. Can you give the "reasoning" for this suggestion here? The primary reason that you want to use Unicode data is to avoid a cost in “translation” between the client and the server – when you’re using web-based applications that natively use Unicode. Additionally, with ever growing customer bases – you can more effectively store international data properly by supporting Unicode in the client and in the database. You are correct that these take 2 bytes per character rather than one – but that’s what gives them the larger range of characters to use/display/manipulate. Also, if you plan to use the SQLCLR against these columns, you will need to use Unicode as opposed to ASCii as the SQLCLR requires it.

Q: Does Unicode use twice as much space? Yes, Unicode columns are 2 bytes per character. If you defined a column of type nchar(100), that column will require 200 bytes of storage.

Q: Sometimes, and not always, it becomes a tradeoff in Storage Space v. Performance, doesn’t it? That’s an excellent point. You always need to look at the trade-offs associated with any decision and in fact, this is yet another. So, I always want the row size to be as optimal and as reasonable as possible. However, there are many compelling reasons – international and/or web-based applications as well as SQLCLR functionality.

Q: Is the N' keyword usable in MSSQL 2000? Yes, all Unicode character strings should be preceded with an upper case N in all releases where Unicode is supported (SQL Server 7.0 was the first release that supported Unicode).

Q: Is there any relationship between the OS's code page and nvarchar/varchar? Yes. When SQL Server is installed- it chooses the server’s default code page from the Operating System. A code page (or collation) defines the order in which characters are evaluated. The additional attributes a collation can define are: whether or not the characters are case-sensitive; the sensitivity of width and kana-type. This may not seem obvious but if these characters sets are native to you then the order to the data would probably make more sense.  Once the collation is set for the server, the database inherits the server’s collation by default, when tables are created – each character-based column inherits the database’s default collation by default. However, at ANY of these levels the collation can be changed. There are benefits to doing so – in that you can keep data within a specific table ordered for a specific character set. Wow, now that I’m getting started, this could be a session in and of itself. So, just to give you a few tips:

  • Be careful using temporary tables if your database’s collation is different than your server’s collation.
  • Changing collation on the fly (for example, if you want to do case-sensitive searching through a view – of which I’ve included a sample file) can be expensive.
  • Changing a database’s collation is supported but you might have troubles converting from a case-sensitive to a case insensitive database.

The files that can help you to get more familiar with verifying collations, changing collations, etc. are CaseSensitiveSearchingInView.sql and ChangingDatabaseCollation.sql.

Q: Another great use of adjunct tables is to isolate culture-specific data from the base (and usually indexed) data. So, this is a great point. If you change the collation for a column – and index it – then you can get better performance when accessing that column. Changing a collation on the fly (as in the script sample for the previous questions), is what’s expensive. However, you should make sure to do a lot of testing.

Q: I have a US code page OS but Latin general 850 collation set for SQL Server; I have no problems storing and retrieving data. Exactly! That’s the benefit of collation information being stored with each column. It allows the application to store many types of data – natively and then just deal with their display at the client.  For those of you who want more details on how to do this, see COLLATE in the books online.

Q: Somewhere along the line I was always told to define varchars in multiples of 8 characters. Something to do with space allocated with new rows.  Any ideas if it has any premise? Well, my first response to this was that I’ve never heard of this… So, I thought I’d check around and well, the couple of folks I’ve chatted with haven’t heard this either… IF anyone has – feel free to comment on this blog.

Q: How about images? Is it better to store them in the db or just the url? This is really an old debate. I still think there’s validity to both sides of it as well. I’d say that you should go through the pro’s and con’s and really choose what’s best for your application. If you use a lot of urls then you might be able to reference more without having to manage a copy of that data… If the url is your data then you might find that managing it in the database (from a backup/restore perspective) is easier – and you know that it’s always going to be there v. the url occasionally not working. Wow, I’d almost like to get a panel of folks together to have everyone hash out their opinions on this one. J SQL Server 2005 does make managing images easier but there are still points to both arguments!

Q: How would SQL Server support becoming a data server of video file? The only real format for this data would be binary. However, there is a 2GB limitation to a single value.

Q: Are you aware of a stored procedure to handle blob data effectively? i.e. for image storage? No. But there is an executable that comes with SQL Server 2000’s samples and I think it’s called textcopy.exe. That might be some good sample code to review.

Q: Does SQL Server support the SQL 2003 NULLS FIRST and NULLS LAST keywords of the ORDER BY clause? No support for NULLS FIRST or NULLS LAST but the order by can include ASC for ASCending or DESC for DESCending. NULL values are treated as the lowest value so NULLS FIRST would be ASC and NULLS LAST would be DESC. But – that’s the only support for specific placement of NULL values.

Q: Is there a "best practice" for the new C# nullable types? i.e. int? vs. int -- re:MSSQL2005? When programming with the SQLCLR, always use types that allows NULLs (assuming your data can contain nulls) and then make sure to always follow the appropriate behavior with NULL. For example, concatenating NULL should yield NULL.

Q: Can I change (refactor) the underlying type of a UDT (e.g., from int to bigint) in SQL Server 2005?  Unless I've missed something, I can't do it in SQL Server 2000 unless I drop the type from all uses. No, as the user-defined data type (UDDT) is only used at create time to map to the datatype; the connection is then broken (altering the UDDT definition will only impact new tables and variable definitions). User-defined Types (UDTs) are even more schema bound than UDDTs and it's a much tougher job to alter them (you need to write conversion from the old to the new). This is much more complex since they fall outside the conversion matrix and the only thing SQL knows is the binary representation. So if the data type was smart enough to add a signature to the serialized data, so that the next version can understand which data type is represents and determine if it can convert to it you might have a shot, otherwise you have to add a column, and convert columns and drop a column afterwards.

Q: Can I easily replace the SQL Server 2000 UDT with a SQL Server 2005 CLR Custom Type? I have a client who wants to -- eventually -- migrate to SQL Server 2005, but for the immediate future wants me to build their application with SQL Server 2000. I'm very interested in SQLCLR types, especially WRT to maintaining type consistency across T-SQL, CLR SP's and external data objects, so using CLR custom types. This is a typical thought for the SQLCLR – “I am going to build an object database.” Type fidelity between the client and the server exists today, but they want class fidelity, so they are going to create the customer object UDT, which is not really for what the SQLCLR was intended. You should be thinking scalar types with additional custom characteristics – not object types – as the scalar types will give you optimal and predictable performance.

Q: Can you find ANSI_NULL_DFLT_OFF setting as on or off? Does DBCC USEROPTIONS do that? Yes, DBCC USEROPTIONS is a nice, quick way of seeing the currently set LIST of session settings. However, there are better ways to check these settings programmatically. it’s not as nice to check it programmatically  The files that can help you to get more familiar with ANSI Null Issues is ANSINullIssues.sql.

Q: It would be great if you could show us a list of  classic fields "Account No", "Account Balance", "Description" etc… and tell us what type you would use inc. "Null/Not Null" etc. Well, this is a bit subjective and with a bank account there would probably be a lot of business rules that surround the account number. However, if this were a sales table and you needed an ordered then it would be a bit easier (again, business rules might change this). Typically, I’d go for a Primary Key which is simple and narrow and not necessarily a “natural” key. However, some of this reasoning comes from how tables are joined and how SQL Server internally stores data. Account Balance – I’d go with a precise numeric. Based on length of storage and the business rules behind required precision, you might choose a numeric(19,6). This will require 9 bytes of storage. For description, I would definitely go with a variable type – probably nvarchar(200) depending on what description really means. Here’s how the table would probably look on creation:

CREATE TABLE Account(
AccountID     int           IDENTITY  NOT NULL CONSTRAINT AccountPK          PRIMARY KEY,
Balance       numeric(19,6)           NOT NULL CONSTRAINT AccountBalanceDflt DEFAULT 0,
Description   nvarchar(200)           NULL)
go
INSERT Account (Description) VALUES ('The account of Kimberly Tripp.')
PRINT 'Kimberly Tripp''s account number is ' + convert(varchar(10), @@IDENTITY) + '.'
go

Q: In the Q&A you might want to tell your listeners that in SQL the string 'abc' is equal to 'abc ' when compared – unlike for C, C++ ,C# , VBNET, etc. This will help with the char v. varchar issue… It’s a good point! You should make sure that you do some testing to make sure that data behaves the way you expect. Trailing blanks may be trimmed with some data types and not others and once again, ANSI settings may impact this behavior.

Q: What are the NULL block and varchar block in the row layout? The NULL block helps to store and identify rows that are set to NULL. In the past, SQL Server stored this as a single space in a varchar column ad there was no real way to differentiate between a real space (as a value) from a NULL value (stored as a space). When the storage engine was re-architected for SQL Server 7.0 the NULL block was added to store a NULL value specifically.

Q: How would the table structures differ between the 32-bit and 64-bit in SQL Server 2005 and SQL Server 2000? They don’t! There is no difference in the on-disk format between any of the versions of SQL Server. In fact, this was even true of the MIPS/Alpha/Win32 releases when SQL Server 6.0 released on all of those platforms.

Q: do you have a script to calculate rows per pages (using the formula you showed? Ha! Seemed a bit painful I know… But that formula works.But, yes – it’s a heck of a lot easier to just do this yourself if you have a table already. In SQL Server 2000 you need to use DBCC SHOWCONTIG with TABLERESULTS and in SQL Server 2005, you use a new DMV called sys.dm_db_index_physical_stats. This takes 5 parameters and has some really nice new features to it.

Q: I might have missed this, but is the SQL Server Management Studio (SSMS) the replacement for Enterprise Manager? Yes, SSMS replaces the Enterprise Manager and the Query Analyzer. And, for a more scaled down query tool, you can use Express Manager (XM). To download XM, you can go here  If you want more details regarding the latest SQL Server CTP, you can go here.

Q: Is Management Studio backward compatible for use with SQL Server 2000? Yes! You can use SSMS to connect to SQL Server 2005 and SQL Server 2000.

Q: How do we write our queries for Insert/Update/Select on a database that we have no idea how it was setup - so we don’t experience the local issue that you talked about? Basically, the best way is for the developer to create an interface that masks the complexities of the database schema through stored procedures, functions and views.

Q: Do indexes point to the row level or the page level? There are two primary parts to an index – the leaf level and the non-leaf level. The leaf-level of a clustered index IS the data (a clustered index is an ordered table).

Q: What is the best practice or rule of thumb on using identity primary keys on tables in a database? And would that be all tables and/or some tables or not all? This is a great question. However, I might phrase it a bit differently because I *think* I know where you’re going with this… Generally, I recommend that large tables have a clustered index on an identity column. This can help to improve performance of inserts by not causing splits (inserts go to the end) and by helping non-clustered indexes look up the corresponding rows in the clustered table by using a clustering key which is unique, narrow and static. This is really quite a bit related to the session on indexing… I think you should definitely attend session!

Q: How could a LOB end up in the leaf level? This is really another great question for the indexing session – and a bit on internals as well. In SQL Server 2005, rows can span pages (so you can have a row greater than 8060 bytes) and rows can have LOB types (even in the leaf level of an index). I’ll explain how this can happen with the new INCLUDE option for indexes – in the next session!

Q: Are you talking at the Vegas launch of 2005? Yes! You can see the complete list of my upcoming events here.

Comment: By the way, thanks for the great Web cast. I don't know of many other companies that do this sort of thing -- take questions directly from their customers.

Thanks! This is part of what I love about these webcasts… it allows me to hear what you’re interested in and it allows me to keep learning. To be honest, often I have to look things up and/or hit the KB and/or ask some of the other SQL geeks I know – in order to answer some of these questions. It really helps to keep me on my toes! In fact, this week I bothered two of my favorite people, my partner Patrick and my good friend Gert.

 

Thanks again for another GREAT week of questions!

kt

See you later today – Friday, Aug 26th

MSDN Webcast: A Primer to Proper SQL Server Development (Part 4 of 10): Best Practices in Indexing (Level 200) 

Two days isn't enough time!

In fact, that's really true. There are so many interesting (well, I think they're interesting) aspects to Indexing that we could in fact do a whole 10-part Series on Indexing and it's likely that we'd still have other things to look at and talk about. So, in preparation for part 4 of my MSDN Webcast Series, I thought I'd take a few minutes to pull together a number of related resources. If you watch/read them prior to the webcast then you'll be a step ahead. To minimize overlap, part 4 is a myriad of best practices with more of the "what's" and "how's" to implement the right indexes - but not quite as much of the theory and/or the "why's" as I've done quite a bit of that before and even if you don't watch/read these resources BEFORE the session...you can use these resources to fill in the gaps and answers the "why's" AFTER the session!

So, let's talk about the session... I'm going to focus on SQL Server 2005 for all of my demos (and I'm going to try to be VERY demo heavy in this one) but quite a bit of the theory also applies to SQL Server 2000...

If you're interested in some of the "why's" and/or you want to hear the information for SQL Server 2000 and/or you just want to see what other folks have asked, here are your options:

Watch the MSDN Webcast titled: SQL Server 2000: Performance Tuning - Finding the Right Balance of Indexes which covers Index Creation Best Practices on SQL Server 2000
Watch the TechNet "It's Showtime" recording, recorded at Tech Ed Europe titled: Index Creation Best Practices with SQL Server 2005 which covers Index Creation Best Practices on SQL Server 2005 and really gets into a lot of the "why's"

Read many of the past blog entries - including A LOT of Q&As created from former webcasts by accessing my "Indexes" category on my blog entries here.

So, if you're not scared to spend quite a bit of time with indexes... I think there's lots of good stuff to review! At a minimum you should think about showing up tomorrow, I promise to give you Practical Indexing Strategies and a lot of demos to prove my point!

Effectively Designing a Scalable and Reliable Database
A Database Developer’s Primer to Proper SQL Server Development
Part 4: Best Practices in Indexing

Attend Part 4 LIVE on 26 August 2005 by registering at this link!

This text is STRAIGHT from SQL Server Magazine's September 2005 Issue: Reader's Choice Awards 

A First Look at SQL Server 2005 for Developers by Bob Beauchemin, Niels Berglund, and Dan Sullivan claimed best-book honors in a highly competitive Readers' Choice category. The book shows working developers how to take full advantage of SQL Server 2005's key innovations. Readers get practical explanations of SQL Server 2005's new data model, built-in .NET hosting, SQL-99 compliance, and more. Virtually every key concept is illuminated by sample code tested with Microsoft's public beta. Key coverage includes: SQL Server 2005 as .NET runtime host, security, reliability, performance, writing procedures, functions, and triggers in .NET languages, leveraging T-SQL enhancements, the XML data type and XML query languages, SQL Server 2005 as a Web Services platform, client-side coding, ADO and ADO.NET enhancements, SQLXML, mapping, ObjectSpaces, and using SQL Server 2005's built-in application server capabilities. 

Congratulations you guys... writing a book is hard enough. Writing a good one is quite an achievement! 

Categories:
Opinions | SQL Server 2005

MSDN Webcast Q&A: A Primer to Proper SQL Server Development

Creating a Reliable and Automated Backup Strategy, Part 2 of 10

Presented by Kimberly L. Tripp, SQLskills.com

Q: Can I view a recording of this webcast? Here’s the specific replay link.

Q: Where can we get the demo scripts? The demo scripts are in this zip (20050812MSDNWebcast.zip (4.86 KB)); here in this blog entry. However, I will also create an entry www.SQLskills.com under Past Event Resources for the entire webcast series.

Q: Does Kimberly L. Tripp have a blog? Yes, www.SQLskills.com/blogs/Kimberly and if you’re reading this on my blog…well, then you already know! J

Q: Where are the links to all prior Webcast Q&As from this series? Click here for the MSDN Download for Part 1, click here. For the SQLskills Blog Entry for Part 1, click here.

Q: Is this Session 2 of the "Effectively Designing a Scalable and Reliable Database" webcast series? The Title bar here reads "MSDN Webcast: MSDN: A Primer to Proper SQL Server Development (Part...)". A bit misleading don't you think? So, I finally figured out the mystery here. When I originally submitted my proposal for this series I titled it like this: Effectively Designing a Scalable and Reliable DatabaseA Database Developer’s Primer to Proper SQL Server Development (Level 200-300) But somehow, through a few edits and since each module also has a more detailed title like the following, then I think the title just got edited down to fit. In fact, with subtitles like the following… I can’t even fit the entire title, subtitle and session title on a single slide. JPart 1: Creating a Recoverable DatabasePart 2: Creating a Reliable and Automated Backup StrategyPart 3: Designing Tables that Scale – Best Practices in Data Types and Initial Table StructuresSo, the mystery ends here… It’s all the same content and the general idea is that building a scalable and reliable database takes a variety of best practices that together make it happen; there is no “DBCC MAKEITGOFASTER(‘TRUE’)” option.

Q: WRT "best practices," what do you think of SQLBPA? It’s a great (and free J) tool that can help you to recognize some of the most commonly overlooked best practices. Part of the intent is also to help you determine if you’re violating certain practices that might impact your upgrade strategy. For example, if you’re accessing the system tables you will want change that code to use information_schema views, stored procedures (sp_*) or system functions to gather the system information. If you’re looking solely for information about your upgrade path, the BPA has been updated/replaced with a new tool called the Upgrade Advisor.

Best Practices Analyzer Tool for Microsoft SQL Server 2000 1.0 

Upgrade advisor URL - Microsoft SQL Server 2005 upgrade Advisor - CTP June version

Q: In the SBS2003 environment the default backup does a live backup using a volume shadow snapshot of the MSDE and SQL databases on the server. How bad is this practice v. doing a SQL backup and backing up the backup and the logs? Well, there are really a couple of issues here… First, the positives of controlling backups within SQL Server:(1)   You can backup databases individually(2)   With certain database backup strategies (specifically, Database/Log strategies in the FULL Recovery Model), you can recover up to a specific point in time (P.I.T. Recovery) or to a marked transaction or even up-to-the-minute (if the transaction log is still accessible at the time of the disaster.(3)   The process of backing up the transaction log helps to keep the transaction log small and manageable.Second, the negatives:(1)   You have to manage each database backup strategy individually – however, tools like the Database Maintenance Wizard can help to minimize this.(2)   You’ll need to do backups more frequently if you want to take more advantage of these features – however, this would also give you better granularity on your backups AND result in less data loss because your snapshots probably aren’t very frequent.Now – as for volume shadow snapshot – this is GREAT for a complete image of the system and it does work (especially well) when you have to replace the complete system or when you want access to specific files on the system. Where you can run into troubles (and I’m not saying that this is always going to be the case, one way or the other) is when a SQL Server database becomes damaged and you revert back to just a file (or even all of the database files) from a shadow copy image. In almost all cases (and I really can’t think of too many where this would work except in some read only cases), you won’t be successful *just* restoring an old image of just a single database file. So, the damaged database *might* end up being lost. Instead, you could restore the entire database image from the shadow copy point in time and that *might* work; however, if the database structures have changed since you backed it up then the best choice would be to drop the database, copy over the files (the mdf and ldf) and then re-attach the database. That should work and all you’ll need to do from there is add any logins and/or changes since the last backup. Where things could be a lot worse is if one of your system databases becomes corrupt. This might limit your ability to recover. Now, having said all of this – I *know* that your SQL backup strategy WILL work and I know the complete set of options related to it. While it is a bit more administrative work, if your data is critical and minimizing data loss is *very* high on your list; I would use BOTH. Volume Shadow Copy for the system and file recovery strategies and SQL Server database and log backups for your databases. OK, so I decided that I wanted to check things out a bit more…especially, since I am not all that well-versed on VSS; I decided to look up as much as I could find and well, I’m going to stop with this. First, pre-Windows 2003 sp1 it looks like even backing up a volume that includes databases that are in the FULL or BULK_LOGGED Recovery models, generates an error. As of sp1 (or the hotfix associated with this KB), the error is no longer generated BUT I did find the following paragraph very useful:

This hotfix does not let Windows Backup perform a Volume Shadow Copy service backup of a SQL Server database with a recovery model that is set to Full or Bulk-Logged. You must use the SQL Server backup procedure to back up your SQL Server databases that have a recovery model that is set to Full or Bulk-Logged. And – if you do a backup with NTBackup, it will log a message in your backup log of the following: NtBackup does not support backing up SQL databases which are configured in non-simple recovery mode.If the database has been backed up, it is likely corrupted. Please do not restore this database.The recommended way of backing up non-simple SQL database is to use the backup solution that comes with SQL Server.

So, while I’d really like to dig into this deeper – I have to use my best judgement at this point and say that you should probably work to handle backups through SQL Server.

Q: What is the default recovery model in MSDE and SQL Express? I ended up demo’ing this in the presentation. I was sure about MSDE but I wanted to do a quick demo on SQL Express. The quick check is just to check the database properties of the model database… Another option, create a new database and see what the properties are. For both MSDE and SQL Express, the default database recovery model is SIMPLE.

Q: How would you compare the native backup to products like LiteSpeed from Imceda (which is used internally at Microsoft - I'll avoid the typical dog food comments) and SQL Backup from the chaps at Red-Gate? Third-party products can offer significant additional benefits over just the native backup types – for example, encryption and compression. This can result in improved security, faster backup times and smaller backup files.  

For more details about SQL Lite Speed, please go to: http://www.imceda.com/For more details about SQL Backup, please go to: http://www.red-gate.com/

However, I would also like to make a comment about the fact that Lite Speed is used internally at Microsoft. Backup products are an interesting special case in terms of how they work with regard to SQL Server. Most backup products use the VDI interface (the Virtual Device Interface provided for SQL Server) in order to get access to essentially the same pages that SQL Server Backup does… So, in fact, using a third-party product like SQL Lite Speed, is actually still testing the VDI interface and still working SQL Server pretty well.

Q: Can you use 3-rd party backup with log shipping? Depends on whether or not they support it and/or have automated software to help you configure it. Many products do offer ways to help automate log shipping with compressed versions of your backup files… See earlier links for references to other products.

Q: How do you fix a log file that won't truncate? The best way to fix a transaction log that won’t clear:(1)   Terminate all connections to the database (best to set the database to SINGLE_USER mode)(2)   Backup the transaction log with a normal BACKUP LOG command(3)   Shrink the transaction log to the smallest size possible (DBCC SHIRNKFILE WITH TRUNCATEONLY)(4)   Alter the database to increase the transaction log to a more appropriate size with ALTER DATABASE.(5)   Return the database to MULTI_USER mode.These steps are exactly the same as the steps necessary to remove fragmentation in VLFs. The file that can help you to get more familiar with this process is located in the scripts associated with this session and is named: VLF Optimization - Northwind Walkthrough.sql.

Q: If you do the steps recommended to clean up the fragmented VLFs, does it break the continuity of the transaction log backup chain? No! You can backup the transaction log before, as part of (step 2) and after you clear up the fragmentation – and no other backups are necessary. It is a complete chain that can be used in recovery.

Q: The transaction log portion of the tempdb seems to be growing and growing- even with the simple recovery model. In fact, it only seems to release space when CHECKPOINT is run manually? Well, I have to admit that this is a bit strange and should not otherwise be the case. I would suggest seeing if you have any open transactions in the database (DBCC OPENTRAN) but that shouldn’t make a difference if it DOES clear when you execute CHECKPOINT. So, I have to admit that this is a bit bizarre. I would try to do some profiling and see if you can find any consistencies with the types of commands that are running at the time when it doesn’t seem to clear. I would also wonder if this is ALL the time or just occasionally. You may want to contact PSS for more assistance with this.

Q: TRUNCATE TABLE does not benefit from BULK_LOGGED? When I say that something benefits from the change to the BULK_LOGGED recovery model, it means that the change must offer a performance gain or otherwise. In the case of TRUNCATE TABLE, it is always executed in a manner which is logged – efficiently. So, there is no gain in switching; it is always an optimal command to execute.

Q: We have a habit of switching to "bulk-logged" mode when we are doing index -rebuilding, etc. which seem to benefit from this setting. Then, when we are done, we "switch" back to full mode. Does this affect our ability to recover later on using the logs that we are creating because of this switching back and forth? and Q: If I switch between the FULL and BULK_LOGGED recovery model, do I need to take full database backup after the switch to BULK_LOGGED? No! In fact, it’s recommended as long as you’re aware of the options that you lose when you are running in the BULK_LOGGED recovery model. However, the transaction log continuity is not broken – given the fact that you can successfully back up the log after the bulk operation completes. To minimize the time of which you are vulnerable, make sure to backup your log right before the switch (to BULK_LOGGED) as well as right after the switch (to FULL).  The file that can help you to get more familiar with this process is located in the scripts associated with this session and is named: DBAlterForBatchOperation.sql.

Q: Is it advisable to use transactions for each and every SQL statement? Even if they are a single, simple, one-command transaction? If not, how can we use the marked transaction? Well, the answer to the first part of the statements are: NO, it is not necessary and it is not advisable to wrap every command in a BEGIN TRAN/COMMIT TRAN block. However, as per the second part – you cannot mark a transaction without using the following complete syntax: BEGIN TRANSACTION TransactionName      WITH MARK 'description of mark'SQL_StatementsCOMMIT TRANThe file that can help you to get more familiar with this process is located in the scripts associated with this session and is named: Restore Log and Marked Transactions.sql.

Q: Do you dynamically create Transaction MARKs to make the mark names unique? I would assume a restore would stop at the first mark it came across with the specified name? You can certainly use dynamic string execution to build your transaction mark names dynamically. However, you can also use a mark with a datetime parameter. From the BOL: The STOPATMARK and STOPBEFOREMARK options both support an AFTER datetime clause. When used with datetime, mark names need not be unique. In a RESTORE DATABASE statement, datetime is required to recover to a marked transaction. In a RESTORE LOG, however, datetime is optional. If AFTER datetime is omitted, roll forward stops at the first mark with the specified name. If AFTER datetime is specified, roll forward stops at the first mark having the specified name exactly at or after datetime.

Q: When configuring a database maintenance plan through the Database Maintenance Plan Wizard, there is no possibility to switch between recovery models…Correct. However, you can modify the jobs created by the Database Maintenance Plans to include changes to the recovery model. BUT – having said that, I’m not sure why you’d want to change it during the actual backups… Instead, you should consider changing the recovery model as part of your batch process that would benefit from the change.

Q: How do you clear the log in SQL Server 2000 through the Enterprise Manager? The only way to clear the transaction log through the SQL Server Enterprise Manager is to do a regular transaction log backup. The way you can do this is to right-click on databases, tasks, backup - then choose a transaction log backup. The default behavior is to clear the inactive portion when the backup completes.There is no way to just “clear the transaction log” as that would break the continuity of the log. You could execute a command to clear the log but if you find that you’re manually clearing the log and NOT keeping transaction log backups, I would recommend setting the recovery model to SIMPLE so that the transaction log is cleared automatically.

Q: Can you explain when you need to recover .mdf and .ldf files? Recovering directly from the .mdf and .ldf files (through sp_attach_db) is really only guaranteed when the files were closed properly – through either a sp_detach_db or the files were copied when the server was shutdown.

Q: How do you recover if the master database becomes corrupt? Is the only option uninstalling and reinstalling SQL Server? No, SQL Server 2000 includes a utility to rebuild the master database called rebuildm. SQL Server 2005 uses setup – but with special parameters that only rebuild the system databases as opposed to do a full installation.

Q: Any BP on restores on replicated database i.e. distribution matching database? Transaction Log marking can help to ensure that multiple servers are all restored and rolled forward to the same point in time. Additionally, there are options related to whether or not you KEEP_REPLICATION (this is for the publisher). However, outside of those couple of options, you should review the Books Online for best practices in this configuration.

Q: Is it possible to backup database in the standby mode? No. But, I wasn’t completely sure during the webcast so I decided to try it. On the backup, I received:Database 'test' is in warm-standby state (set by executing RESTORE WITH STANDBY) and cannot be backed up until the entire load sequence is completed. BACKUP DATABASE is terminating abnormally.

Q: With log shipping and 2000, and the destination database is exactly the same as the source, how about backing up the destination rather than the source? What happens to the shipped logs during the backup? You can’t backup a database that is in the NORECOVERY or the STANDBY recovery completion states. As a result, you can only backup the source, not the destination.

Q: What are my risks if I use Simple Recovery and what are the advantages of it? Risks are mostly in the potential for work-loss exposure and data loss in the event of a disk failure for your database. The advantages are in simplicity and ease of management because you don’t need to backup the transaction log. However, through simple automation procedures you can automate the backups of transaction logs while getting the most options for recovery. If you want more details, you should review the chapter that’s available for download on SQLskills. This chapter covers the differences in the Recovery Models as well their pros and cons in management and recovery! Here’s a direct link: www.sqlskills.com/resources/SQLServerHAChapter9.pdf 

Q: How come you have both the SQL Server 2005 and Express on the same machine? Is it because you don't have Visual Studio 2005 installed? The documentation of VS.Net 2005 states to uncheck the SQL Server 2005 Express if also installing SQL Server 2005 CTP. Part of the reason that Visual Studio doesn’t want you to install Express if you already have the CTP of SQL Server is that they may not be the same versions. However, if you have the appropriately matched builds of SQL Server and Visual Studio, you can install multiple versions of SQL Server on the same machine- with Visual Studio Whidbey. Having said that – of what’s available today- the correct combination would be SQL Server CTP June (IDW15) with Visual Studio Beta II. On my machine, I’m actually running SQL Server 2000 (as a default instance), two Developer editions and one Express edition. AFTER SQL Server was successfully installed, I installed the appropriately matched Visual Studio installation. There won’t be any need for this once the products release in November but while they’re both being developed – and while there are “later” releases that leap-frog over each-other, it’s a bit confusing.

Q: SQL 2000 Maintenance Plans for transaction logs (say, all user databases) indicate job failure if just one user db is in the SIMPLE recovery model. Any alternatives short of creating our own backup script to avoid this SQL Maintenance backup job failure? Yes, this is because the sqlmaint.txt does not verify the recovery model.

Q: Is this a SQL 2005 behavior too? Database Maintenance Plan Wizard has been changed significantly in SQL Server 2005 and it’s now a lot easier to create more interesting/more complex plans on one or more databases – I would work harder to create a class of jobs for your SIMPLE recovery model databases (just full backups and maybe differentials) and a different class of jobs for your FULL/BULK_LOGGED recovery model databases.

Q: Does the execution of a transaction log backup impact simultaneous database usage? If well optimized the cost of a transaction log backup should be relatively low. Might be a good time to make sure that the transaction log is optimized! Go figure, I wrote another blog entry to help you optimize the transaction log. Check out 8 Steps to Better Transaction Log Throughput.

Q: Why does the log backup restore slower than restoring a full database backup? In general, a transaction log is a more intense operation. When restored, the transaction log must be redone (“redo”) and then undone (“undo”). This process is what applies the changes to get you up to what the database looked like when the transaction log backup was executed. And, while each log row’s redo operation is generally much faster than the original execution (they do not have to do any calculations, etc. as the FINAL version of the changes for the row is what’s in the log), there might be a lot of them!

Q: Do I have to backup the ReportServerTempDB? I’m not a reporting services expert but my understanding is that yes, you do need to backup a ReportServerTempDB. The best thing to read is from the Books Online: Backup and Restore Operations for a Reporting Services Installation.

Q: Will the 2005 Backup wizard allow scripting of the 'scheduling'? The backup command itself will be in the scheduled job but the statement that is used to create the job can only be scripted by scripting the job – not the backup command.

Q: Did you talk about partial database availability? Yes! SQL Server 2000 does not offer this new feature, but SQL Server 2005 does. Partial database availability is a new feature that allows a damaged database to stay online – even while secondary data files are damaged and unavailable.

Q: What is an optional secondary file?An optional secondary file is a data file that is NOT the mdf. Secondary data files are created in larger databases and store user-defined data. They might be used to store different types of data and/or even possibly part of a large table (partitioning). Imagine having 100s of millions of rows – because you have three years of historical data – if year 2003 is in a secondary data file and becomes damaged, it will not require the database to be taken offline.

Q: I have not seen Management Studio. Is this new with SQL Server 2005? Yes, SQL Server Management Studio (SSMS) replaces both Query Analyzer and SQL Server Enterprise Manager – in a more robust UI that also include Solution/Project control and Source Integration. Check out the online labs to get some hands-on lab time in the Virtual Lab environment here.

Q: A FULL BACKUP has the option "remove inactive entries from transaction log" --- why doesn't it seem to do this? It would seem from this option that nightly full database backups would keep the log size small? This is a SQL Server Enterprise Manager bug. It should NOT even be an option on that tab. It should be grayed out. If you click around to differential and then back to full on the General Tab, before you go to the Options tab, then you will no longer see that option available. As far as this being something desired – not really… The reason why clearing the log is NOT performed after a full backup (or as part of a full backup) is so that if the full database backup becomes corrupt in some way, you can fall back on your transaction log sequence to successfully restore! Basically, this ensures that nothing ever breaks the continuity of the transaction log!

Q: You recommend Trace Flag 3231, what does it do? Trace Flag 3231 is an undocumented Trace Flag that disables the ability to clear the transaction log in a database that is running in the FULL or BULK_LOGGED recovery models. This significantly reduces the ability for someone to break the continuity of the transaction log… ah, so that would probably be the next question – what else could possibly break the continuity of the transaction log? Changes to the SIMPLE recovery model. It is highly recommended that ONLY changes between FULL and BULK_LOGGED be performed in databases where transaction log backups are relied upon for recovery.

Q: How can we test our backup strategy? How do we test our differential backup content? When we test the differential backup content, do we need to restore the full backup content at the same time?The best test of your backup strategy is a complete restore sequence to a secondary server – which you then follow up by testing your application. There *are* other dependencies outside of the database that could cause your application to fail. However, the restore to the secondary server will ensure that at least your data is accessible. Below are some KB articles that you can review for more details on some of these additional dependencies. To access them, go to http://support.microsoft.com.  Q240872 – INF: How to Resolve Permission Issues When a Database is Moved Between SQL ServersQ246133 – INF: How To Transfer Logins and Passwords Between SQL ServersQ307775 – INF: Disaster Recovery Articles for Microsoft SQL ServerQ224071 – INF: Moving SQL Server Databases to a New Location with Detach/Attach.

Q: I have a database server hosted by a 3rd party on the web ... is there a secure, automated way for me to make remote copies of the Database Backup? You'll need to talk to your ISP to see if they support that. More than anything it depends on the interface they give you, the services you've paid for and whether or not you have secure channels over which you can communicate with your ISP.

Q: Is there a way for you to answer these questions – that is – release the question from the question manager so that another question can be asked? I had a question in the "queue" for most of the webcast and because of that, I could not ask additional questions which I had wanted to ask... I ended up deleting my original question just so the question window would allow me to ask the next question... This is a LiveMeeting issue. I’m trying to put just a quick “a” in the response during the webcast but really, it’s an issue with the LiveMeeting Query Manager UI. I REALLY wish that questions just went into the queue and didn’t hold your window as well… I’ll see if we can pass your suggestion on to the LiveMeeting folks!

Thanks again for another GREAT week of questions. See you in the next session…which because this is posted so late, is session 4 - on Friday, August 26th. You can register for this session here: MSDN Webcast: A Primer to Proper SQL Server Development (Part 4 of 10): Best Practices in Indexing (Level 200)

Thanks for reading!

kt

Wow, I've been horribly bad at blogging these days and I owe you a few entries as well... But, good news, I have a new MSDN Webcast series that will help to answer a lot of best practices in database design strategies that can help you achieve better scalability, availability and reliability. All of my demos are on SQL Server 2005 but many of the concepts apply to both SQL Server 2000 as well as 2005.

The sessions are all targeted at best practices and as such we went with a base of 200 level. Knowing me I can't go through an entire session without some internals and/or technical depth so you should expect 200-300 overall. I'm very excited about this series because everything counts! What I mean is that each and every best practice you implement helps to bring about a more scalable, available and reliable solution - there are no magic bullets only great overall design can achieve these targets!

So, what are we going to talk about in 10 - 90 minute presentations..... a lot!

Here's a link to the first one and as we get rolling, I'll blog a lot more about additional resources, best practices and of course, I'll blog the webcast Q&A as I've done in the past.

Enjoy!

Today, SQLskills proudly annouces an exciting addition to our company... Bob Beauchemin - a database developer and SQL Server 2005 expert in the developer space - joins SQLskills as the Director of Developer Skills. Bob brings a tremendous amount of expertise with him in having worked with computers since 1977 and a variety of databases during this time. In the time that I have known Bob, I have been impressed with both his technical expertise as well as his ability to convey complex topics with clarity. We've got quite a few things in the works and we'll keep you posted of the changes on our blogs and our SQLskills website.

In the interim, make sure to update all of your references for Bob! If you've subscribed to his blog in the past, update your links to his new blog site:

If you want to read a bit about his past, present and future, check out our AboutBob page here: http://www.sqlskills.com/aboutbob.asp

And, if you're interested in hearing the PodCast he recorded with SQL Down Under host Greg Low, check out: http://www.sqldownunder.com/. The specific downloads for Bob's recordings are here: SDU4FullShow.mp3 or here: SDU4FullShow.wma.

Exciting times indeed. Stay tuned!

 

Categories:
Events | Opinions | SQL Server 2005

I think there are numerous reasons for why I love technology but at the top of the list: learning. It's amazing to me that not a day goes by where I don't dig deeper into something or clarify it further. Even learning something trivial, like a new keystroke, can make our work easier to do and make us more productive at it. Things are constantly changing; the one thing I do know about technology is that there's a lot to learn!

So, to start, thanks for everyone's feedback (esp. Steffen Krause here) on some of my recent posts regarding LEFT and RIGHT based partitioning functions here. There was a lot feedback regarding the simplicity in syntax and declaration of a RIGHT-based partition function and by making the first partition of a RIGHT-based partition function empty, YOU'RE RIGHT! We can eliminate the need for data movement. There's no performance difference and it doesn't matter internally which type you choose but - I've heard you all loud and clear! You don't like dealing with the imprecision of a datetime data type when specifying upper boundaries. So, having said that - I need to make a few changes. In my next revision of my presentation materials, whitepaper, scripts, etc. I'll work to give both perspectives. For some reason, I still like LEFT-based partition function but RIGHT is definitely easier to define.

So, keep it coming everyone. You've hit the nail on the head. The fun part about technology is... no one knows everything and we're all always learning!

Enjoy Tech*Ed!

Blog edits brought to you by Richard Campbell (long story)

NOTE: Generally, I recommend RIGHT-based partition function so that you don't have to deal with datetime timetick issues at all. However, this post can really help you if you still want to use LEFT-based partition functions. Enjoy!!! 

When creating partitioned tables in SQL Server 2005, a partition function requires a LEFT or RIGHT designation. In general, I recommend that you choose to create a LEFT-based partition. However, choosing LEFT always creates some confusion because a left-based partition function requires upper boundaries. This results in a more complicated partition function definition (than a RIGHT partition function).

Background First
In SQL Server 2005, you can create truly Partitioned Objects (objects are inclusive to Tables and Indexes) and to create a partitioned table you must base that table on a Partition Scheme (PS) and the PS must be based on a Partition Function (PF). Now, if you're new to the concept of partitioning, you should seriously consider reading the whitepaper I wrote for MSDN which is posted here.

Regardless of your having read the whitepaper, I want to add a few basic principles and make sure that some of the key points are clear. Simply put, a PF is the logical definition of your boundary cases (or possibly better described as the intervals over which each partition covers). A PF must cover the entire domain from negative infinity to positive infinity. If a value doesn't have a place to go then you'd have a problem with the logic of your partition function. Having said that, when you define the PF you are ONLY stating values, not where they will physically will reside. Futhermore, the expression on which the data will be based has to be somewhat straightforward and consistent. In other words, simple. If the PF were extremely complex it could compromise the usage and performance of the Partitioned Object. Finally, the definition of the PF is really just a way to easily define these boundaries. As a result, you can choose to enter a specific value (like June 1, 2005) or a function like the following to create a PF over 4 months with each month being computed from current:

CREATE PARTITION FUNCTION TestFunctionValues(datetime)
AS
RANGE RIGHT FOR VALUES (
   CONVERT(char(4), year(dateadd(m, 1, getdate()))) + right('0' + convert(char(1), month(dateadd(m, 1, getdate()))), 2) + '01'
    -- next month (executed in May, 2005 yields June 2005)
   , CONVERT(char(4), year(dateadd(m, 2, getdate()))) + right('0' + convert(char(1), month(dateadd(m, 2, getdate()))), 2) + '01'
    -- executed in May, 2005 yields July 2005
   , CONVERT(char(4), year(dateadd(m, 3, getdate()))) + right('0' + convert(char(1), month(dateadd(m, 3, getdate()))), 2) + '01'
    -- executed in May, 2005 yields August 2005
   , CONVERT(char(4), year(dateadd(m, 4, getdate()))) + right('0' + convert(char(1), month(dateadd(m, 4, getdate()))), 2) + '01'
    -- executed in May, 2005 yields September 2005
)
GO

Since all of these complex expressions are evaluated to a constant at creation, you can immediately verify your boundary cases by accessing some catalog views. Here is the catalog view query which helps you see what your partition boundaries are:

SELECT PF.[name], RV.boundary_id, RV.[value]
FROM sys.partition_range_values AS RV
 JOIN sys.partition_functions AS PF
  ON RV.function_id = PF.function_id
WHERE PF.[name] = 'TestFunctionValues'

In the above partition function, you'll notice that in addition to the values, I also have a RIGHT designation. The choice of using RIGHT, helps to target whether or not the value is a member of the first partition or the second partition. In the case of RIGHT, this says that the value is a lower boundary of the second partition. In an example that involves something like "sales" where data rows are ever increasing and their usage is for more recent rather than older data you migth want to always have the most recent four months of data available (in the real world you'd probably want more than 4 months but I'm going to simplify to four so that we have fewer boundaries with which to work). So, if we want 4 partitions - one for June, July, August and September - we need to define the partitions in the form of intervals (using boundaries). The first boundary point will either reside in the first partition (as an upper boundary) or the second partition (as a lower boundary). For four partitions, there are numerous ways in which you could define the partition function such that the logical/physical placement of the data is THE SAME however, the behavior of the partition function during later SPLIT and MERGE operations is also impacted by the choice of LEFT/RIGHT so understanding how they work is critical in making your decision to use one or the other. More than anything, will need to respect the full domain from negative infinity to positive infinity.

The following partition function is one of the easiest and most logical to understand. We need four partitions so let's just enter our boundary points and be done.

CREATE PARTITION FUNCTION RightPartitionFunction1(datetime)
AS
RANGE RIGHT FOR VALUES (
   '20040601', -- Jun 2004
   '20040701', -- Jul 2004
   '20040801', -- Aug 2004
   '20040901' -- Sep 2004
)

However, remember that we have the full domain to cover. As a result, the above partition function will create 5 partitions where the partitions will be:
 Partition 1: all data less than 20040601
 Partition 2: all data greater then/equal to 20040601 and less than 20040701
 Partition 3: all data greater then/equal to 20040701 and less than 20040801
 Partition 4: all data greater then/equal to 20040801 and less than 20040901
 Partition 5: all data greater then/equal to 20040901

This really isn't too bad but it seems to waste the first partition because it's completely empty... so, you could remove the first boundary point and just supply the second, third and forth. In that case, the first partition would cover June (instead of being empty) and you could get down to 4 partitions with only 3 boundary points.

CREATE PARTITION FUNCTION RightPartitionFunction2(datetime)
AS
RANGE RIGHT FOR VALUES (
   '20040701', -- Jul 2004
   '20040801', -- Aug 2004
   '20040901' -- Sep 2004
)

This will create 4 partitions where the partitions will be:
 Partition 1: all data less than 20040701
 Partition 2: all data greater then/equal to 20040701 and less than 20040801
 Partition 3: all data greater then/equal to 20040801 and less than 20040901
 Partition 4: all data greater then/equal to 20040901

OK, so this seems a bit more optimal. And by definition alone, I would agree with you. Where it becomes problematic is in the merge and split operations. When a partition function goes through a merge of a boundary point, that boundary point is essentially removed. This seems simple but also has an impact on the data which resides in that partition (the partition which includes that boundary point). So, if this partitioned table is NEVER going to be modified and you never need to plan for a merge or split, then you can choose whatever definition is easier for you to use. However, if you plan to do a merge or split, you need to understand the ramifications of the LEFT/RIGHT decision. If November 1 comes around and you want the data for October to be added to your partitioned table and you ONLY want to show the most four months, then you will need to add October and remove June. The process will essentially consist of the following:

  1. Create a place for new data (Oct) to reside
  2. Add this location to the partition scheme (more details on the scheme v. the function shortly)
  3. Split the last boundary to add the new month (Oct)
  4. Switch-in the staging table of October data so it becomes part of the partitioned table
  5. Create a staging table into which the old data (Jun) will go
  6. Switch-out the June data into the staging table
  7. Merge the boundary point to remove June from the table

More background first - Partition Schemes
You will want to have a place to put the new data (there are lots of things we could look at here... do you want to ALWAYS only use 4 filegroups and when you remove June's data do you want to reuse that space - or do you have a new location on which October can be placed). This is a whole different topic over which we should chat... but for simplicity let's just say that we want to use a new location. To tell the PF where this location is, you need to work with the physical definition of the partition function. OK, that might not be clear. So far, I've only told you about the PF and so far, I've only said that the PF described the logical boudaries. You may have been wondering how SQL Server maps these logical boundaries to the phyiscal files and the answer is through the Partition Scheme. The partition scheme (at initial creation) sets up the mapping of the boundaries to their phyiscal locations (through filegroups). For the first and second PFs created above the respective partition schemes would be:

CREATE PARTITION SCHEME [PartitionScheme1]
AS
PARTITION [RightPartitionFunction1] TO
  ( [Primary]    -- should be empty?
  , [PartitionedTablesFG1]  -- for June
  , [PartitionedTablesFG2]  -- for July
  , [PartitionedTablesFG3]  -- for Aug
  , [PartitionedTablesFG4]  -- for Sept
  )
and/or

CREATE PARTITION SCHEME [PartitionScheme2]
AS
PARTITION [RightPartitionFunction2] TO
  ( [PartitionedTablesFG1]  -- for June
  , [PartitionedTablesFG2]  -- for July
  , [PartitionedTablesFG3]  -- for Aug
  , [PartitionedTablesFG4]) -- for Sept

In both cases, it seems as though these do the same thing with the exception that the first PF ends up with an extra partition which is essentially empty. In this case (in a RIGHT-based PF), the second is a better choice (coming up there will be a reason for why we create an empty partition).

Back to SPLIT and MERGE
OK, so back to the issue of spliting and merging. Our goal is to remove June and add October so that the "current" data shows the four months of July, Aug, Sept and Oct. If we want to create a location on which October can be placed, we need to add another filegroup to the scheme. The way we do this is that we first need to add the filegroup/file to the database and then we need to add this new filegroup to our partition scheme.

Step 1 - Create a place for new data (Oct) to reside
Create a new filegroup and new file.

ALTER DATABASE CreditPT
 ADD FILEGROUP PartitionedTablesFG5
GO

ALTER DATABASE CreditPT
 ADD FILE
  (NAME = N'CreditPTFG5File1',
  FILENAME = N'C:\SQLTemp\CreditPTFG5File1.ndf',
  SIZE = 30, FILEGROWTH = 10, MAXSIZE = 50)
  TO FILEGROUP [PartitionedTablesFG5]
GO

Step 2 - Add this location to the partition scheme
The way to do this is to add a "NEXT USED" filegroup. Syntax:

ALTER PARTITION SCHEME PartitionScheme2 NEXT USED [PartitionedTablesFG5]
GO

Once the partition scheme has a new filegroup to use (and you can only have one at a time - note that if you add a second it solely replaces the first and does not warn you that you already had a next used), then the next time a partition is needed (which will be on the next split) then this "next used" partition will be used.

Step 3 - Split the last boundary to add the new month (Oct)
Now that we have a filegroup AND we have told the partition scheme to use it for the next split operation, we can split the PS to include this 5th filegroup

ALTER PARTITION FUNCTION RightPartitionFunction2()
SPLIT RANGE ('20041001')
GO

Step 4 - Switch-in the staging table of October data so it becomes part of the partitioned table
This step really needs more to it. To switch in your staging table you need to already have a staging table. To make this an optimal operation, you MUST create your staging table on the same fielgroup on which the partition will reside. What you're trying to do is make the switch optimal by making it a metadata only operation. This step really consists of creating an identical table with identical indexes and creating it with a constraint that matches the partitions' definition (so that SQL Server can trust that it only includes data within the interval for that partition).

CREATE TABLE ChargesPTRightOct
(  charge_no  int   NOT NULL IDENTITY,
 member_no  int   NOT NULL
      CONSTRAINT ChargesPTRightOctMemberNoFK
       REFERENCES dbo.Member(Member_No),
 provider_no  int   NOT NULL
      CONSTRAINT ChargesPTRightOctProviderNoFK
       REFERENCES dbo.Provider(Provider_No),
 category_no  int   NOT NULL
      CONSTRAINT ChargesPTRightOctCategoryNoFK
       REFERENCES dbo.Category(Category_No),
 charge_dt  datetime  NOT NULL
      CONSTRAINT ChargesPTRightOctChargeDtCK
       CHECK (Charge_dt >= '20041001'
         AND Charge_dt < '20041101'),
 charge_amt  money  NOT NULL,
 statement_no int   NOT NULL,
 charge_code  char(2)  NOT NULL
) ON [PartitionedTablesFG5]
GO

ALTER TABLE ChargesPTRightOct
ADD CONSTRAINT ChargesPTRightOctPK
  PRIMARY KEY CLUSTERED (charge_dt, charge_no)
--   ON Credit4MonthPFN (charge_dt)
GO

INSERT ChargesPTRightOct (member_no, provider_no, category_no
      , charge_dt, charge_amt
      , statement_no, charge_code)
 SELECT member_no, provider_no, category_no
   , dateadd(yy, 5, charge_dt), (charge_amt + charge_no)/10
   , statement_no, charge_code
 FROM CreditPT.dbo.Charge
 WHERE month(charge_dt) IN (10)
 ORDER BY charge_dt, charge_no
GO

ALTER TABLE ChargesPTRightOct
SWITCH TO ChargesPTRight PARTITION 5
GO

Step 5 - Create a staging table into which the old data (Jun) will go
Similarly, you will want to create a place for the June data to go - kind of like a “staging” table but more for the exit of the table.

CREATE TABLE ChargesPTRightJun
(  charge_no  int   NOT NULL IDENTITY,
 member_no  int   NOT NULL
      CONSTRAINT ChargesPTRightJunMemberNoFK
       REFERENCES dbo.Member(Member_No),
 provider_no  int   NOT NULL
      CONSTRAINT ChargesPTRightJunProviderNoFK
       REFERENCES dbo.Provider(Provider_No),
 category_no  int   NOT NULL
      CONSTRAINT ChargesPTRightJunCategoryNoFK
       REFERENCES dbo.Category(Category_No),
 charge_dt  datetime  NOT NULL,
 charge_amt  money  NOT NULL,
 statement_no int   NOT NULL,
 charge_code  char(2)  NOT NULL
) ON [PartitionedTablesFG1]
GO

ALTER TABLE ChargesPTRightJun
ADD CONSTRAINT ChargesPTRightJunPK
  PRIMARY KEY CLUSTERED (charge_dt, charge_no)
--   ON Credit4MonthPFN (charge_dt)
GO

Step 6 - Switch-out the June data into the staging table
This must also be a metadata only step. You need to make sure that it's created on the same filegroup as where the data currently resides (see the ON [ParitionedTablesFG1] clause. This is the current location of this partition. As a result, the switch out will be very fast.

ALTER TABLE ChargesPTRight
SWITCH PARTITION 1
TO ChargesPTRightJun
GO

Step 7 - Merge the boundary point to remove June from the table
This last step is the whole reason for this blog entry... yes, we're getting there ;).

ALTER PARTITION FUNCTION RightPartitionFunction2()
MERGE RANGE ('20040701')
GO

When you merge this boundary point, you will remove the value AND the parition in which it resides. Since this boundary point resides in the second partition (PartitionedTablesFG2) then it's actually PartitionedTablesFG2 that gets removed. As a result ALL of the data that resides on it, must be moved to the filegroup that now covers this boundary (which is PartitionedTablesFG1 and which was just emptied). As a result, MERGE does NOT optimally merge with a RIGHT-based partition on the rolling range scenario.

And - as a result, I recommend LEFT-based partition functions... Since the whitepaper tends to focus on using LEFT-based parition functions, please review that for more details. You can access the “Paritioned Tables” whitepaper on MSDN, here.

Hope this helps and thanks for reading!
kt

When I first heard about "stored procedures being written in C#, VB.Net, ... etc." I remember thinking; Is Transact-SQL dead? Am I out of a job? What does this mean for me and for DBAs in general? Luckily, I learned quickly that not only is Transact-SQL not dead but it's got some significant improvements that might make code written in client applications A LOT easier - for example:
    1.      New syntax to traverse hierarchical/recursive relationships,
    2.      Extended XML integration,
    3.      Much improved error handling,
    4.      Easier in-database pivoting of data

I was worried that the CLR would cause problems for the stability of the database, acting like badly written extended stored procedures which can leak memory, crashing the server or even worse, potentially become a vector for worms and viruses. However as I learned more about the technology, I think that Microsoft has put in the work to give the DBA control over this technology. SQL Server controls the memory used by the CLR, there is a fully integrated security model and the DBA has control over what the CLR code can do and can even turn off the CLR completely on systems that will not use it. In fact, I also learned that the SQLCLR was going to be "off by default" as well. And, for me, as well as many DBAs, I think these points have squashed many of our fears. But then I almost felt like I was missing something? If things didn't need to change then why are they changing?

So, then I just wanted to know more about why the extension was being made to begin with and when and why it might be useful. From considering that point alone came the most important thing I've learned about SQL Server 2005; there are many new ways of doing things: Transact-SQL; SQLCLR; XML; Full-Text; Integration Services; Analysis Services; and all of them have different benefits and implementation effects. DBAs need to learn when to choose each of the technologies and even more importantly when their use is not appropriate, even when they're not an expert in all of them (and I truly think it will be a challenge to be an expert in every area of SQL Server 2005).

SQL Server 2005's feature set has been expanded such that C# or VB.NET can natively run in the server. This gives you immediate access to the power of the .NET Framework and helps to allow better integration with existing utility components and web services already in place. Some of the greatest benefits are that it allows you to place complex computations on the server - close to the data - without having to send large amounts of data to a separate application server. And all of this is done while SQL Server manages memory and resources ensuring a more balanced server system. So, my main recommendation for DBAs who are skeptical is to not worry about knowing everything about SQLCLR but instead focus on why it might be proposed and determine if SQLCLR is really the best way to handle it!

I’m still convinced that Transact-SQL wins when interacting with SQL Server using set-based SQL commands, but if you have a stored procedure with a cursor it may be a candidate for the CLR, if you have code that loads all the data in a table to perform special aggregations it may be a candidate for the CLR, as might your table valued functions (as the CLR does a really optimized job here). To sum up: the DBA has to get a grip of these new technologies and not let the development community overturn years of best practices in database system design – the same principles apply, there is just a richer toolset that, when used wisely, should improve performance and developer productivity.

Categories:
Opinions | SQL Server 2005

About one month ago, I purchased two new laptops (don't even go there on why I purchased two; however, it's sufficient to say that I travel a lot and I believe in redundancy...my machines are usually setup so that either machine can do demos, etc. Yes, road warrior HA :) but heavy as heck and the airlines HATE me. Again, don't go there). Anyway, the machines are Compaq Presario R3000Z latops with AMD Athlon 3700 64bit chips, 2GB of memory (yeah!), widescreen and a few other gadgets (like DVDRW, 64B NVIDIA GeForce). Overall, I found the prices MUCH better than the equivalently equiped 32bit machines. But - before I get to why I chose 64bit let me say a few things about a couple of the features:

What about the widescreen... I'm kind of on the fence about this one. I certainly like the fullsize keyboard but when it comes to presentations (considering presentations are rarely widescreen), this means the far left and right edges are black bars and I can't really get the same large square 1600x1200 that I used to get on my old high res laptops. This runs 1680x1050 - which is very nice but again, a bit weird when I present. In fact, it took me a couple of presentations to really figure out how to configure the screen to not just display on a little tiny 1024x768 window centered on this big screen (wow did that stink). If you get one of these make sure you figure out how to project the primary image with “aspect ratio scaling” turned on. This makes a big difference if you're displaying 1024x768 on a high res monitor... Having said all of that however, MOVIES are great and certain applications can VERY nicely use this extra space like Outlook with preview pane, SQL Server Management Studio (SSMS), Visual Studio (VS), etc... So, not really completely on the fence but what I get used to doing with tools (like VS and SSMS) look very different when I switch to 1078x768 so that throws me off a bit when I'm presenting.

What about the touchpad... well, I was used to the track point (I think that's what it is called) which is the little tiny mouse that's the size of an eraser and sits in the middle of the keyboard (I think between HJ) and I have to admit... I'm still not used to the touchpad. On my old machine I used to always turn off the touchpad and now I'm finding that I'm almost useless without an external mouse. OK, this will just take me time. I'm told that I will eventually LOVE the touchpad but.... well, I'm just not there yet :)

And finally, what about the chip... am I running an x64 OS... Well, not yet. I have to admit that so far, I've stuck with the shipping OS which is Windows XP SP2 and so far things have been running well.

However, THIS week I'm going to install the RC2 x64 release of Windows and so far, I haven't decided if I should go with x64 XP or x64 2003. I'll certainly let you know. BUT - part of why I'm writing this blog is... Will your application run on x64? Well, that I don't know either. I'll be maily running SQL Server 2005 and in this press release SQL announced support of the AMD64 chip. What I plan to do is blog a bit about what I will gain in doing this on the desktop as well as why you might consider 64bit - either now or in the future for your machines - personal or enterprise server (and there are clearly quite a few reasons to do this). In the interim, you might want to check out the SQL Server site for 64bit computing here.

In the interim - you should DEFINITELY check out the new Windows Server Division Weblog and if you're an ISV, I want your application to work (and you should too :). For all ISVs, you should get involved and even let the Windows Server team help promote your app. They're looking for information about your application in this blog entry titled: Will your ISV App support Windows Server 2003 x64?

Feel free to post comments. What will I gain? What will I lose? Is there anything I should be aware of? Do any of you have this machine? Stay tuned!

Categories:
64bit Computing | Opinions

In the words of Stephen Forte, “Bid early, bid often.”

I am constantly amazed at the technical breadth and depth of the team of Microsoft Regional Directors... Roughly 150 people from every corner of the world who not only work in technology but also love it (well, most of the time :)...  I've been honored to be a member of this community for the past two years and they never cease to impress me. This blog entry only takes my ever-increasing respect for this group up a notch (especially for having thought about it and inspired us to join in)!

Fellow RD Stephen Forte, working with the Data Farm guru Julia Lerman, asked a group of RDs to donate 1 hour of their time - to be auctioned off on EBay - with all proceeds going to help fund relief efforts in the Aceh Province of Indonesia. 30 RDs have and I'm truly excited to be a part of it.

So, bid early, bid often. The auction runs through the 2nd of February 2005 at 6PM PST.

Categories:
Events | Opinions

This is an invitation given by Microsoft to leading edge application developers to participate in a research survey that seeks feedback on improving the application development experience working with data platforms. In the best interest of developers, the objectives of this survey are:

  • To understand the existing, as well as upcoming challenges that has the potential of hindering optimal performance and productivity of those who develop applications and software components.
  • To direct software manufacturers and its partners toward building powerful products that enable greater success among software development professionals.
  • Determine the levels of priorities around improvements in data handling from the view point of developers.

This survey should take 10-15 minutes at the maximum. We encourage you to be as honest and candid with your feedback, and to reflect the current state of your experience today on the primary database and development tools you use.

Click here to enter the survey site and see if you qualify for the profile of respondents we seek to complete this survey. Great prizes will be raffled off among the first 500 respondents who meet the target profile and complete the survey.

To forward this survey request to someone else, click here.

Categories:
Opinions

During the week of October 11, I delivered a four day “immersion” event where we focused on nothing but performance tuning - from design to indexing to optimizing procedural code. There were many states represented (CA, CT, FL, GA, IL, IN, KY, MA, MI, MS, NE, NY, OH, OR, TX, UT, WA, and WI) and it led to some great conversations - even outside of SQL! For the attendees, we had long days with lots of content (the comprehensive agenda is here) and evenings in a great city (yes, Chicago is my home town). And - to end the four-day event and kick off Bill's Friday event, I just couldn't let a group of so many visitors leave Chicago without having Chicago Pizza. On Thursday night we hit Pizzeria Due:

Not everyone was able to make it for dinner but we did happen to get a nice group shot here:

Finally, a few of the attendees were able to play “tourist” for a few minutes (I don't know how they had the energy!) and here are a couple of pictures I want to share with everyone. These next two Chicago pictures were taken by Ken (he's a New Yorker who was in town for the event - but we won't hold that against him :) :) :). In the first image, Ken took nine images facing North from the John Hancock building and in the second photograph there are five images facing South from the Hancock. Both of these are the result of “stitching” the images back together. Very cool Ken - THANKS!

Facing North from the John Hancock Building:

Facing Sorth from the John Hancock Building:

A fun event, a great group and a wonderful way to kickoff of our SQL Immersion event. Subscribe on SQLskills if you'd like to hear more about our upcoming SQL Immersion events planned for 2005 as well as those planned on SQL Server 2005.

Categories:
Events | Opinions

In a few past posts, a couple of other friends/RDs have given me geeky sql titles... such as sql hera (which Forte started here, referenced here and resulted in this) and the original culprit (from Clemens) started it all here....... major harrassment ensued and my irritation was subdued solely by being worn down.

Anyway, after months of avoiding the titles, I have decided to give up the fight and take advantage of a couple of birthday presents. Yes, I now own a couple of new domains. If you can guess what they are then you can get to this blog faster... they solely redirect you to here.

I hope all is well!

Categories:
Opinions

The week began with my playing shuttle bus, hotel rest stop and tour guide - and loving it... Friends from around the globe were landing in Seattle and so I was off to the airport (Monday) to greet them and get them over to the hotel. I live in Redmond and I own an SUV - a perfect combination for becoming Kimberly tour bus. I picked up Clemens Vasters, Hans Verbeeck and Andreas Eide and since the weather forecast said it was going to be a crummy week I insisted we head to Maggies Bluff for lunch. Maggies Bluff is the lowkey downstairs bar/restaurant which shares a kitchen with Palisades. Outdoor seating, plentiful parking, spectacular views of the city and a killer openface crab sandwich made it a must start to their stay in Seattle. And - to make it even more important, the weather was unexpectedly perfect... The forecast was for 60 and rain (of course) and it was 80 and sunny (go figure). But - this is about right for Seattle weather prediction. After lunch I dropped the guys off at the hotel and worked for a bit before I was off to the airport again - to pick up Goksin Bakir and Christian Weyer. Arriving just about dinner time we headed back to the hotel (without Christian's luggage as his luggage didn't make the connection ;) to hook up with a few folks for dinner. Off to the Bellevue Tap House for dinner and a relatively early night because tour guide Kimberly was soon to arrive.

However, tour guide Kimberly was foiled by the weather. Unfortunately, they were correct for the rest of the week and that messed up my plans for Tuesday... Originally, my plans were these:

Pick-up at the Hyatt at 8am sharp!

Quick breakfast stop at Gilbert's Bagels and grab bagels for the road...

Make the Edmonds/Kingston ferry at 9:20am

Drive out to Hurricane Ridge - should be there by 11:30-12. It's a beautiful stop with amazing views!

Go back through Sequim to the Three Crabs restaurant for lunch.

Head to Port Townsend to make the 2:15 the 3:00 or 3:45pm ferry to Keystone (on Whidbey Island) (this is where we can also decide to just head back via Edmonds - if we're running late?), stop at a Whidbey Winery if time permits and head up to Deception Pass

Drive back down I-5 from there to make the evening reception by 6:30ish

NOTE: If anyone ends up following this itinerary it will probably be a later return than 6:30 - I think this would be a great itinerary if you had a really full day and a full tank of gas... certainly let me know if the timings work! If you can extend the day a bit then you can probably stop at a winery after the ferry to Whidbey Island and then have dinner up in Anacortes, etc. It's a long day but a really great one (I would think) if the weather is good and you're not super rushed to be back early.

Anyway, with the Olypmics (the mountains!) socked in by clouds and the driving to be dulled by rain we decided to go a different route... We went for a pick up at 9, Gilbert's for a more relaxed breakfast and then we were off to the Museum of Flight (as suggested by Clemens, good idea CV). In my car for breakfast was Christian Weyer, Benjamin Mitchell, Clemens Vasters, Andreas Eide, Richard Campbell (aka Toy Boy) and the still blogless Goksin Bakir. We lost Benjamin and Andreas to Microsoft for the day but the rest of us were off to see Airforce One, Concorde, the Red Barn, and the new WWII exhibit. Here are a few interesting links for Concorde info:

Concorde on loan to Seattle Museum of Flight
Concorde Trivia from the BBC
British Airways Concorde Page

It was raining a bit so as we waited to get on Concorde, we stood underneath it. The photographer gets no glory but here's the a nice shot of the motley crew MotleyCrewUnderConcorde.JPG (261.05 KB). And here, well I'm glad Christian has a day job as he could never be a flight attendant ChristianFlightAttendant.JPG (101.28 KB)!

As we walked from Concorde back to the Museum of Flight I thought of the Boeing Plant tour in Everett Washington. I've always wanted to do this... so I called information to get the number for Boeing in Everett. After a few minutes of searching, the operator tells me there's nothing listed for BoWing in Everett. After spelling BoEing TWICE - we got the correct number.... hmmmmmm (that's all I'm going to say). Anyway, it was a go! This is the home to the 747, 767, and the 777. We drove up to Everett to make the 3pm tour (FYI - you cannot make reservations for same day tours so you might want to pre-order your tickets as they often sell out). I think we were all amazed at how large (and sadly, how empty) the facilities are. You can hear the statistics and sense that it's HUGE but it was another thing to be there. The doors were the size of 6 football fields.... I have many hopes that Boeing's 7E7 brings them back to filling every assembly spot!

After the tour we stopped for quick snack at the Keg and then headed back for registration and the first opening evening party at the Rock Bottom Brewery in Bellevue. With the intense three days that followed it was a nice relaxing way to start the event.

Thursday evening after the event ended a few of us ended up in the Hyatt's Hotel Bar. This photo (ThursNightCheers.JPG (93.35 KB)) is thanks to Richard Turner - who has challenged me to a game of chicken (this is from an email and well, I'm not going to explain). And, I'm looking forward to it. ;)

Since I'm leaving the technology stuff for later I'm going to skip to the closing event... Friday night boating on Lake Washington/Lake Union with DJ host Eric Schmidt EricSchmidtDJ.JPG (151.26 KB), formerly of SQL and now a Microsoft Technical Evangelist who hosted the Connected System's Track. Dancing got a bit out of control (mostly with Forte, Sanjay, and Goksin - Forte blogged about it here) but at least the boat didn't sink. Earlier, rumors spread that we were stopping at Bill's house for dinner (which is on Lake Washington) but, oh darn, we just passed by.

So, the event ended and it was an intense week of technology and socializing... Oh, and if you're wondering what the ECC stands for... it was the tla that was given to the event - as it was a bit intense (to say the least). Just to give you an idea of the hours - Wednesday ran from 8:30am until 9:30pm, Thursday ran from 8:00am until 9:00pm and Friday ran from 8:00 until 4:45 with bus pickups at 5:15 for the evening entertainment. We had 1 hr for lunch and 1hr for dinner and only a couple of breaks throughout the day. People who traveled to the event stayed in the hotel, ate in the hotel, and barely even saw the outside for 3 days..... hence the tla. You can probably figure it out here!

Thanks for a great week everyone!

Categories:
Events | Opinions

Well, I've not been 100% for the past couple of weeks - been a bit under the weather. I'm behind (go figure) and I have a lot of catching up to do! Thought I'd give you a heads up on what's to come and what I've done recently (well, before I seemingly dropped off the blogging planet).

On July 29, I did a .NET Rocks interview with Carl and Rory... I was nervous and unsure of the abuse that might be waiting for me on the show but it was nothing nearly as bad as my imagination had made it. I had a blast. In fact, we ended up running over and the show hit 2.5 hours. It was a great time! For those of you that were there you had GREAT questions and a very active chat. One person said that half the show was in the chat and well, there were some crazy/fun comments there for sure! Anyway, I'm looking forward to doing it again if they'll have me back. Listen to the show from Franklins.Net here or from MSDN here.  

On July 30, I delivered a lecture that was one of the shortest lectures I've ever given yet one of the most stressful. In fact, I started writing about it almost immediately after it finished but I never got to posting it. Here's a bit of inight into my 15 minute demo:

As I start to relax from what I would consider a very stressful day, I think about why it was so stressful?! I give presentations to sometimes extremely large audiences... but today was different! Today, I gave a presentation at Microsoft. Now this is something I do quite often as well (been delivering internal education courses for the past 9 years on campus) however, today was still different. First, I was the only external speaker and second, the audience was the Business Intelligence division of the SQL Server Development team. It all started about a month prior when I was asked to create a demo for the "All Hands" meeting. The BI All Hands meeting is an informal meeting where the BI team gets together and does a “demo fest” (which sounded harmless). It’s partially about learning how other groups are using SQL Server related tools as well as specific case studies of customers using BI. And – it’s also about team building. Between each demo the BI team played games, answered trivia questions and then the day ended with Karaoke. I have to admit – I’m glad everyone is technology focused in their careers! J J J

So, my demo was to be only 15 minutes and I was to demo my favorite features from the SQL Server 2005 tool set. This actually sounded like fun, I agreed.

For my "15 minutes of fame," I chose to demo: SQL Profiler (with Performance Monitor integration/correlation), SQLCMD, SQLiMail and SQL Management Studio. I had no slides and did nothing but one long, integrated demo using all of these tools. The demo actually went well – only one minor network glitch but even that only lasted a few seconds. Overall, it should not have been that stressful?! But there were two stressful factors: one, it was the SQL Server Development Team (an audience that’s already somewhat familiar with the product <g>) and two, there was a minor schedule adjustment only one week prior to the event. It wasn’t a huge schedule adjustment, just a simple flip-flop of a couple of sessions. However, the effect was that my demo would precede Paul Flessner’s chat. So, if you don’t know? Paul is a Senior Vice President of Microsoft and former General Manager of the SQL Team. Which, even this isn’t the worst thing that could happen BUT as luck would have it, we were already running late when I went on and then (of course) I had to make Paul wait. (I guess it was probably the wise thing for them to do - make the external speaker make Paul wait... I see how this works! J J). Seriously though, it was great fun. The day ended with some dancing (I was threatened not to show this evidence), some karaoke (and I don't really want to show this <G>) and some freedom. Finally, we’re able to start talking about SQL Server 2005; the NDA is gone, SQL Server 2005 beta II has shipped and now it’s time to have a lot of fun!!!

So – I’m writing this blog entry just to say hi... to tell you that I’m still around and that I’m still just as excited about SQL Server as I ever was. I’ve updated a few things on SQLskills and I plan to catch up on a lot of my outstanding Q&A within the next couple of days. Additionally, I plan to start blogging about SQL Server 2005. So, thanks for checking in with me and stay tuned there's a lot more good stuff yet to come! (well, I hope you like it!)

Hope you’re having a great summer (or winter J) and that you and your family are happy, healthy and together!

Categories:
Events | Opinions

These pictures are amazing so I thought everyone would enjoy this entry but it's mostly to say congrats to Kevin and Stephen for summitting another mountain...again. Stephen has blogged about climbing here and Everest here and as for Kevin - he is still blogless. However, since I was guilted into blogging by Clemens, I'm going to pass on the guilt to Kevin...

But a HUGE CONGRATS is definitely due!

CONGRATS guys!!!

The rest of this message is really from Kevin - with a bit of his climbing history (and pictures) as well.

July 2004
Bagged Rainier yesterday and brought the new SQL Mobile banner to the top! Below is the true summit of Mt. Rainer (14.4ft) with Mt. Adams, Mt. Baker and Mt. St. Helens in the background.

June 2004
In front of Denali (Mt. McKinley) at base camp!

September 2003
Kevin and Dennis Comfort at Everest Base Camp (17.5k ft) with Khumbu Ice Fall (far left) and Nuptse peak (far right) in the back ground.  Everest is hidden behind Nuptse peak.

Kevin with Banner at Kala Patthar.  Mt Everest is peak on the left.  Nuptse is peak to the right.

Kevin at the REAL top of Kala Patthar (not some whimpy second peak) at 18.2k FT!  Pumo Ri peak is to the right of him.

Altimeter reading at peak top of Kala Patthar..

Categories:
Opinions

So, this weekend my partner went a bid mad. He decided that we needed a holiday with the kids and his parents... So off we went leaving Thursday afternoon and back on Sunday. Most hotels had free high speed internet access so I was safe...

Day 1
Drive up to Anacortes via Whidbey Island. Take the Mulkiteo-Clinton Ferry over to Whidbey Island for a lazy island drive through Whidbey Island. Stop by Greenbank Farm or Whidbey Island Winery to try some of the local wines. I happen to love the local Loganberry Liquor especially over vanilla ice cream. Continue up Whidbey and stop for a walk across Deception Pass. End the day with dinner in Anacortes and then find a nice place to stay.

Day 2
Early rise for the 8:15 ferry to Victoria (you MUST have a reservation). The ferry lands in Victoria at 10:20 and by 11:15 we were in Victoria checking in to the Fairmont Empress. Short stay arranging cameras and unpacking and off to the harbor for a seaplane tour (the Mount Olympus Glacier Tour). Wow, what a view.

Back to the hotel RIGHT away to make High Tea at the Fairmont Empress (also should make reservations far in advance).

(Side note: Can you tell my partner works on Microsoft Project - we had a very tight schedule!)

After High Tea take a lazy afternoon nap and then a dip in the pool... Find a nice place for a light dinner in Victoria.

Day 3
Sleep in a bit, go for a swim and then have the breakfast buffet at the Fairmont. After checkout, drive to the Butchart Gardens and walk off the breakfast buffet. If you're lucky you might spot a hot-rod (94.27 KB) convention for a detour and see some amazing old (and impeccably maintained) cars. Take the ferry to Vancouver and if you're feeling a bit hungry I recommend the chips and gravy (i.e. greasy fries with gravy - yes, instant heart attack and on your way to having poutine which adds cheese curds to up your heart attack chances). Head off to your hotel in Vancouver and make sure to make a reservation at Sun Sui Wah - the BEST Chinese restaurant I've ever eaten at!!! Make sure to get the Alaskan King Crab (see this BEFORE image of the 9.5 pound King Crab (165.38 KB)). Sorry, I don't have the after image as it was gone too quickly (well, I actually forgot).

Day 4
Sleep in, go for a swim, have a lazy Sunday brunch and then see Vancouver. So much to do here (Stanley Park, etc.) but we went to the annual Molson Indy races (hint: this really isn't a great place for young kids or older adults - expensive and loud - but still fun to see). Head back to Seattle the old fashioned way - drive (and make sure to enjoy the delays at the border - on Sunday - use the BLAINE crossing if you want to avoid longer delays at the Peace Arch)...

Finish out the day by blogging about it......

For a complete list of the United “Three Perfect Days” articles check out http://www.hemispheresmagazine.com/home.htm. Just think you can tack on Three Perfect Days - Seattle and Three Perfect Days - Vancouver for a week+ long holiday!

Categories:
Opinions

Well, the person then asked if we could sitdown... I said sure...

We sat down at a table and they proceeded to rummage through their bag, pull out a bunch of paper (which turned out to be a single, very, large piece of plotter paper which was folded up) and proceeded to unfold it and smooth it out on the table...

It was their ERD - yes, all 500 tables (or whatever)...

They smoothed it all out (I was getting nervous as to the length of the still pending “question” :) and then here it came.....

How would you index this?

Literally, that was it. Here's my ERD - all 500 tables... what indexes do I need.......

How do I even begin? Database design (and indexing) are NOT something with an easy and single “right” answer. In fact, one of the reasons I love working with SQL Server is that there's almost always multiple ways to handle a problem - each having different pros and cons. So - understanding more than just the data is ciritical. For example, I could have noticed a customer table in their ERD and I could have said that an index to aid in customer lookups - by name - would be a good idea. However, if they responded with “well, we don't really use the customer name - this is our demographics analysis database,” then my recommendation would have been not only useless but a negative in terms of overhead, maintenance and therefore performance.

So - what's my point - there are hundreds of things that lead up to good design, optimal indexing, and effective procedural code... a design that gets better cache utilization and one that keeps the data pages compact, logically ordered or grouped, minimizes maintenance and has efficient lookups/modifications, etc... There isn't a single right answer in design.... and I think that's good! It means that you can design for the usage patterns you anticipate (within reason).

I promise - if there were a sqlserver /faster switch or a single silver bullet - I would tell you (well, I'd have you hire me as a consultant, surf the web and IM with friends for a week, and then set the switch on Friday :)....

Seriously, how does one design for performance?

You need to know the data and more important know how it's going to be used (which I realize might NOT be perfect - in fact, I've sometimes been surprised by usages patterns once something goes into production so you might not always be right). However, the more you know about usage patterns the better you will be able to design for them and proactively reduce the possibly negative impact of them (for example data modifications can cause fragmentation...which not only creates potentially excessive freespace but also makes the modification take longer to process - and well, there's so much more to this one...).

Having said that, over my next few blog entries I'll give you a bunch of ideas and best practices for which to look...

Categories:
Design | Indexes | Opinions | Tips

I'm flying through Chicago on my way back to Seattle from my past two weeks of conferences in Europe (first the SQL Server Summer Academy in Croatia and second Microsoft Tech*Ed Europe) both of which were great...and after all the craziness I'm in Chicago this week - to see family/friends and check on things for the SQLskills SQL Immersion Event - which will be held in downtown Chicago in October. Having grown up in Chicago (and been away for 12+ years) I quickly remember what summers are like. Yesterday it was 86 with 100% humidity (at least it felt like it) and today it was 70 and breezy... But - October, that's another story! October is typically an IDEAL time.

But - having complained about the weather a bit - I can't really complain about the day! My Father is the Chicago area representative for Hunter/Dufour and works through Racine Riverside exclusively selling these wonderful yachts in the Chicagoland area. This is a job change for Dad after many years...where sailing was his main hobby/passion but certainly not his career. Anyway, that's what led me to this post - I have sailed on many “boats” growing up but it's been many years... and I don't remember boats like these; these boats are NOT racers - these are cruisers - and they have every ammenity possible (and then some - for example, I never did get time to watch the flat panel TV w/DVD, etc...)

So - the “demo” boat we took out today was a Hunter 44 Aft Cockpit (2004) and it was MORE than comfortable. Easy to sail and filled with all sorts of gadgets so that even I looked like I knew what I was doing.... The features are just amazing. You just can't NOT love an electric winch, and roller furling for BOTH the jib and the main - gotta love not having to reef the main.... Wow, what a great boat!!!! Here are a few pics of the day...

My Dad at the Helm.JPG (21.52 KB)
Nice shot of the clean lines.JPG (20.69 KB)
The city is just ahead.JPG (20.36 KB)
Kimberly at the helm.JPG (24.29 KB)

So - if you're in the general area, would love to own a beautiful cruising yacht... talk to my Dad! He won't believe you read it here!! (well, neither will I :)

What a nice way to end the 4th of July weekend! Back to work!! Index recommendations are coming tomorrow!

Categories:
Opinions

Once again, Tech*Ed Amsterdam rates up there as one of my favorite conference weeks... the fun city, the social events, and especially - the speakers and the staff - all come together to make the week not only informational but just an absolutely sleepless blast! Last year I wrote up an “event log” about hardware nightmares and knights in shining armor (the hardware tech Allan and the local Dell guy Daniele) but this year I can safely say that BOTH of my laptops worked well... OK, so this is a story in and of itself. I'll put together a hardware tangent in another blog...

But back to TE - there is so much going on at this event that I know I won't do it justice in terms of highlights... but there were some great technical sessions (I really didn't get to as many as I would have liked) and some really fun sessions (Pat Helland's Bye, Bye Mr. CIO Guy and Clemens Vasters, Rafal Lukawiecki and Pat Helland as the Nerd, the Suit and the Fortune Teller, respectively). And the code highlights, demo scripts and full blown demo applications that result from this event (or that were discussed at this event) just give everyone an amazing amount of resources from which to learn (and leverage :). I'm looking forward to seeing more about Proseware (from Clemens Vasters' session CTS308: Building Proseware, Inc. - A Non-Trivial Service-Oriented System, which was in the top 10 highest sessions... and deserved it for sure!) I'm really impressed with how much functionality is included - and the complexity of the design; this is not for the faint at heart. This one isn't available yet (but hopefully it will be soon - and on msdn, once I know I'll let you know but it was written for Microsoft as an application to demonstrate the power of SOA - Service Oriented Architecture). I guess the one take away that I have with SOA is the breakdown of complex monolithic applications into more compartmentalized, autonomous pieces allows better overall manageability and modularization by designing smaller components - i.e. services. Ok, so not entirely my area and I still have a long way to go but I definitely “get it.” Benjamin Mitchell did a nice write-up of this sesssion for those of you that want more depth - click here. Another session I enjoyed was Pat Helland's ARC302 – Metropolis: Building Applications in a Service Oriented Architecture. I know he's posted these (and a few other goodies) on www.PatHelland.com you should really check them out. Don Box did a nice job giving a lot of insight in CTS200 Service-Orientation and the Windows/.NET Developer - unfortunately (and very sarcastically) he didn't hear one of the questions from the audience (Clemens was sitting next to me and asked - is COM dead?). Finally, another great session by Steve Riley (and I finally got to see Steve speak!) was SEC285 - Defending Layer 8: How to Recognize and Combat Social Engineering. Steve's a great presenter with great stories and really gets the point across... We have a lot to watch for in Layer 8 (referring to the 8th Layer of the OSI model - Humans!). Steve focuses on security, vulnerability and how hackers can learn the most about you, your network, your systems, etc. and gives some great insights in how to prevent these things. And I totally agree - I typically focus on system availability and the barriers related to it but we have the same problem - humans. People, process and architecture - and or designing a solution that doesn't solve the right problem - are usually the biggest factors in availability - not the technology...  The most likely reason for downtime is often something caused by a human - accidental or otherwise - and we all face tremendous barriers in managing, maintaining and securing our systems to keep them available. Steve focuses on intentional hacking and malicious intent and I often focus on the “accidental data deletion” or improper resource access which lead to damaged data... Regardless, the human factor is our biggest risk. It's amazing how much we all say some of the same things - with slightly different focuses - in terms of what impacts system availability (from both a performance and availability perspective)! Anyway, well done guys!! Really loved your sessions!!!!!

And - a few thank yous are certainly necessary as well! I CANNOT thank Heidi van de Zande, Sam Craig, Joanne Benny and Tim Sneath enough. I know there are so many other people that go into making Tech*Ed successful but these four are really at the helm of putting it all together. They truly make it seem easy. Once again it was tremendously organized and pulled off without any problems (at least not from my perspective and for that - I'm always in awe!). The only problem I really had was recovering from all of the great social events! This pic (RD Group Photo 8x6.JPG (242.63 KB)) is from the Friday night speaker party - in which quite a few RDs got up for a group shot. The RDs shown are: Goskin Bakir (crouching front), Thomas Lee (moving left), Christian Nagel (far left), Ingo Rammer (not a great shot - sorry Ingo), Juval Lowy (now moving right), me, Michele Leroux Bustamante, Christian Weyer, Benjamin Mitchell, Tomislav Bronzin and Clemens Vasters. Go here if you want to read more about the RD program!

So - thanks for a GREAT week. I'm in Chicago now.... lots of technical blogging to get done.

Categories:
Events | Opinions

The new Seattle Public Library opened on May 23, 2004 and it's an interesting structure that people seem to love or hate. I've always loved architecture (and I really wanted to be an architect when I was growing up :)) so more than anything I just wanted to learn more about the structure. I didn't expect to hear all about it last night; I thought we were just going to wander...but in fact, last night's event was great! A Principal from the LMN Architects firm discussed the ideas and principals behind the library's design and I was SOLD! The concepts behind the design make a ton of sense. They created “platforms,” each with a function and each with room to grow. And that's what surprised me the most; they started talking about “fillfactor.” Ok, they didn't use that term but there it was....

One of the biggest problems facing libraries is their ever expanding collection of books... and as we know they don't just place books at the “end” of the shelves. They have to put the books into the proper section, etc. And - as sections EXPLODE (think of the size of the computer section in 1960 and think of it now) and others shrink they need to “shift” books to make room. When a library is broken into defined floors it makes it very challenging to expand/shrink. Instead, libraries tend to create “forwarding pointers” (no, they didn't use this word either)... These forward references get you to where the “overflow” books are - on another floor, in another building or in the basement. This becomes confusing and eventually a ton of movement needs to be done and/or the library just doesn't work any longer and needs to expand. 

So - when they were designing the SPL (Seattle Public Library) they created a “books spiral” which cleverly creates a rotating set of floors as if they were one really long floor just spiraling down (about the equivalent of seven city blocks)... OK, I'm doing a horrible job explaining it but the bottom line is that they currently house somewhere around 900,000 books and with this new spiral they can expand selections fairly easily and still have a uniform feel to the library (they expect to be about to hold roughly 1.45 million books). They also (and this is where the fillfactor component comes in) kept free space on the top shelves as well as the bottom shelves so that they had room to grow without a lot of restructuring.

There are so many other interesting design techniques used I've love to spend hours learning more... In fact what they did with the glass to prevent heat is so simple yet effective (the building's entire exterior “net” is made of glass). They inserted thin slices of metal at a parallel to the ground (and so that you could still see through the glass) but at an angle to the glass (because the glass pieces are angled - you'll have to look at the pictures to really see what I mean) and what happens is that the sun reflects off the metal and the library doesn't get overly heated and they still allow in light without having to go with UV film, etc.

Check out the Seattle Times for their slide gallery here and the SPL's slide gallery here. If you're in Seattle you must make a stop at the SPL. There's a lot of info about the library on the web but the best place for details and lots of photos is the main site for SPL, specifically here. Here are a few others of interest:

Seattle Public Library - Main Site
Seattle Public Library: Design is fun on a grand scale
Projects of Rem Koolhaas

Here are a few pics I took:

NeonEscalator.JPG (24.16 KB) - the colors are used to help guide people where to go... I found some of the colors a bit overwhelming (especially the neon green elevators - neon green on the inside)
ReadingArea.JPG (30.67 KB) - lots of defined spaces for reading, relaxing and having a latte (yes, there are coffee stands within the library... of course!)
ReadingAreaTopFloor.JPG (35.13 KB) - this was my favorite area. Probably the equivalent of 12 stories high... you could see Eliott Bay and the colors were just stunning!

Enjoy.

Categories:
Opinions | FILLFACTOR

Thanks for being there! If you were there then you know - it's was quite fun and you had a lot of questions - questions that I didn't see until the session ended (whooops!). My Q&A dialog box was only showing a subset of questions so I didn't know..... Well, now I do (that's for sure)! I've got a huge list that I'm working through (I did get most of them in my 20 mins of text responses) but I will post the entire written-up Q&A over the weekend - to my blog - and then to my website and to MSDN shortly thereafter. The demo scripts are already on www.SQLskills.com under Current Events (which will move within a couple of days to Past Events).

And - I guess I just wanted to say THANKS?! The great comments you posted in the Q&A (in addition to the great questions) really make my job fun! I really wouldn't do what I do if it weren't for everyone's great support!!!

So - the entire audio presentation w/powerpoint presentation, demo scripts, Q&A and [not sure if they'll do a] transcript should be posted to MSDN by mid-week next week. I'll provide all of these links as soon as they're available. AND - I think I'm going to request another webcast on Index Maintenance. Seems like that's something everyone wants to spend more time on!

Categories:
Opinions

This is pretty exciting... Today we moved the blog over from the test site (where I added all of the other entries) to SQLskills... I guess I better start blogging. In fact, I think I'll try and blog most of the questions I get in email. In fact, I got two today. Stay tuned....

Categories:
Opinions

After Clemens blogged about my not having a blog... I guess I couldn't not blog any longer. And so I begin... In the past I've written content in the form of presentations, articles or classes, etc. They would go through editors, friends, anyone who would read it. Often they'd remove my sarcasm, bad spelling and/or polically incorrect 4-lettered phrases (that's all I'll say for now :) and following all of that there was often a delay in my opinions getting to anyone... Which is probably safer for me - but what the heck, time to have fun. Thanks Clemens?!

Categories:
Opinions

Theme design by Nukeation based on Jelle Druyts