OK, I first blogged about re-writing sp_helpindex here as sp_helpindex2 (April 2008). Shortly thereafter (Aug 2008), a reader found a bug and when I went digging, so did I (a couple of others). I did a few tweaks and the version that's stuck for the past couple of years is in this blog post: Updates (fixes) to sp_helpindex2.

However, as much as I've used that - I've always been frustrated by its output. And, so, over the months+, I wrote sp_helpindex3 (this solely added the index_id as the first column and then ordered the output; I've always hated the weird way in which sp_helpindex/2 ordered the output). Then, I started writing sp_helpindex4 which listed the clustering key columns with each index so that I could more easily identify columns on large tables without a lot of scrolling. Ultimately, I didn't like this because the clustering key (or some of the columns in a multi-column clustering key) might already be in the nonclustered index and so I still had to figure things out more manually. And, so I finally sat down and created what's jokingly being called sp_helpindex8.

Geeky note: Er, you might wonder with how many geeky SQL people I've joked about sp_helpindex8 but in fact, in my last few events (both onsite with a couple of customers as well as when teaching the Indexing Strategies and Performance Tuning portion of the Microsoft Certified Masters - SQL Server course (two weeks ago)), I've actually joked - and promised - that I'd have this blogged before this particular 3 week MCM rotation ends. And, well, that's this week. So, I've actually joked with quite a few folks on this and I'm close to the end of time on blogging it. So, here I am - sitting in a hotel room in Houston, after having presented at the Houston Area User Group (7-9pm) (yes, with Paul) and after having been onsite all day with a customer - blogging this darn thing. I just have to get it out there. ;-)

Having said that - I'm pretty happy with this version but it was pretty painful to write. And, before I get into what it does - and why I wanted this version - let me explain what I didn't do and why. There are a lot of other things that could be added. And, quite a few folks have said - why don't you do this or why don't you do that... Specific suggestions have been made such as adding index usage stats (from the current state shown by the DMV), adding index operational stats (to show if an index is suffering from latch IO waits, for example) or adding index physical stats (to show if an index is fragmented), etc. And while many of these suggestions would make for lots of interesting uses of this output, I've stayed away from these. Why? Some are expensive to run - such as physical stats - even a limited scan can be painful on a really large table. Or, the results might not be able to be truly trusted - such as with index_operational_stats. Why? Because the information from that DMV might only cover a short period of time. The information held by these DMVs is cleared whenever SQL Server is restarted (or if a database is taken offline or restored, etc.). So, when were those usage or operational stats last cleared? Moreover, is this single "look" at the DMV information really the best on which to base our decisions (even just in general)?
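One way to put a rough date on that question is to check when the instance last started, since tempdb is recreated at every startup. This is only a sketch: dbo.Member is a sample table name (substitute your own), and because the stats can also be cleared by events other than a restart, the start time only tells you the MOST history you could have.

```sql
-- tempdb is rebuilt at each SQL Server startup, so its create_date
-- approximates the time of the last restart.
SELECT create_date AS [Instance Start (approx)]
FROM sys.databases
WHERE name = 'tempdb';

-- Usage stats accumulated since then for one table (dbo.Member is a sample name).
-- Indexes that have not been touched at all show NULLs.
SELECT i.index_id, i.name AS index_name
        , us.user_seeks, us.user_scans, us.user_lookups, us.user_updates
FROM sys.indexes AS i
        LEFT JOIN sys.dm_db_index_usage_stats AS us
                ON us.object_id = i.object_id
                AND us.index_id = i.index_id
                AND us.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('dbo.Member')
ORDER BY i.index_id;
```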

In the end, I think that physical stats, operational stats and even usage stats - should be used wisely and carefully (definitely used but not by this). In fact, we often recommend that the best way to use these is by studying them over time and then analyzing their trends not just the point in time values. And, there are LOTS of posts that can help you with understanding this information; here's just a couple to get you started:

So, having said that - I've kept my sp_helpindex8 more pure ;-). What I want to see is EXACTLY what's stored in that index. In fact, this has always been my main goal. The first version (sp_helpindex2) added INCLUDED columns (2005+) as well as filter definitions (2008) to the output. So, what was missing? The added columns that SQL Server might have to add if a nonclustered index is nonunique and/or when the clustering key is nonunique. The reason this is so important to me is that there are many things coming together and recommending indexes (DTA, the missing index DMVs, application tools, etc.) and while many of these tools are solely there to "help" us get better performance and "help" us create more effective indexing strategies - these tools sometimes recommend the same index over and over again (see Paul's recent blog post: Missing index DMVs bug that could cost your sanity... about the bug in the missing index DMVs) OR the tools might recommend similar indexes to what we already have.

Additionally, the output from sp_helpindex/2 only shows the columns that are EXPLICITLY defined in the key and in the leaf (using INCLUDE) but it doesn't show what SQL Server adds for you. Most folks know that SQL Server adds the clustering key to the nonclustered indexes and when you have a simple single-column clustering key it's not difficult to know that this IS "there" in your index even though it doesn't show up. But, even with that, I think there are still a few things that are lesser known. For example, if your nonclustered index is nonunique - did you know that the clustering key is not only added to the leaf level of the index (like INCLUDE does) but that the clustering key is also added into the tree (as part of the key and ordering)? And, if you have one index that lists this clustering key explicitly and another that doesn't - are you actually going to be able to easily tell that these are in fact the SAME? Ultimately, I want you to be able to better recognize what's REALLY in that index. What you might end up seeing is that two indexes are more similar than you first thought and as a result - maybe you can drop one of them. Or, you might be able to combine them to reduce your overall indexes (I call this index consolidation). And, all of this becomes significantly more challenging when you have more complex clustering keys (and, yes, they do exist!). The other thing I did is delimit the columns and index names properly using brackets. This is also nice because you might have identifiers that make the output of sp_helpindex/2 hard to read. This should help. However, I do have to admit that you probably have other problems if your index/column names have commas, spaces or brackets in them. But, this should still help/work.
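To make that concrete, here's a minimal sketch. The table and index names are made up purely for illustration (they aren't part of the scripts in this post), and the comments describe what ends up in each index according to the rules above.

```sql
-- Hypothetical demo table; all names here are illustrative only.
CREATE TABLE dbo.DemoOrders
(
        OrderID         int             NOT NULL,
        CustomerID      int             NOT NULL,
        OrderDate       datetime        NOT NULL
);
go
-- Unique clustering key on OrderID
CREATE UNIQUE CLUSTERED INDEX DemoOrdersCL ON dbo.DemoOrders (OrderID);
go
-- Nonunique nonclustered index: sp_helpindex lists only CustomerID, but the
-- clustering key is added to the tree AND the leaf, so what's really stored
-- is keyed on (CustomerID, OrderID).
CREATE INDEX DemoOrdersCust ON dbo.DemoOrders (CustomerID);
go
-- Lists the clustering key explicitly: it looks different in sp_helpindex,
-- yet it stores the same columns in the same order as DemoOrdersCust.
CREATE INDEX DemoOrdersCustOrder ON dbo.DemoOrders (CustomerID, OrderID);
go
EXEC sp_helpindex 'dbo.DemoOrders';
go
```

In the standard sp_helpindex output these two nonclustered indexes look like different indexes, which is exactly the kind of redundancy that's easy to miss.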

And, that's why I wrote "sp_helpindex8" - it's a MUCH better version of sp_helpindex but it requires a few other components. And, it was a lot more challenging to write. Basically, this information is more internal to SQL Server and not directly exposed anywhere. So, we had to do a bit of digging. Paul started the process (thanks!) but I ended up spending about 5 hours getting it exactly the way I wanted it. And, I also ended up breaking it up into two parts: one sp that gets column definitions and the other that builds the final output. And, I've actually named it with a longer name (sorry!). But, I rarely type in the name anyway. What I typically do is go to SSMS, Tools | Options, then under Environment | Keyboard, I set the keyboard shortcut for Ctrl+F1 to my sp_SQLskills_SQL2008_helpindex.

Now, when I want to see the true definitions of indexes on a table - I highlight the tablename and hit Ctrl+F1. But, if you want to rename this to sp_helpindex8 - I won't be offended. ;-)

So, how do you create this?

  1. You need to create sp_SQLskills_ExposeColsInIndexLevels (sp_SQLskills_ExposeColsInIndexLevels.sql (6.73 kb)). This is what gives us the tree/leaf definitions. And, this works for both SQL Server 2005 and SQL Server 2008.
  2. You need to create the version-specific sp_helpindex8. These are named: sp_SQLskills_SQL2005_helpindex (sp_SQLskills_SQL2005_helpindex.sql (10.50 kb)) and sp_SQLskills_SQL2008_helpindex (sp_SQLskills_SQL2008_helpindex.sql (10.73 kb)). Again, I know they're long names but using the keyboard shortcuts can help. And, if you do rename these make sure you do a replace ALL. There are multiple places where there are comments and/or statements that need to execute to get everything to work (for example the sp_MS_marksystemobject execution).
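Once both procedures exist, a call might look like the sketch below. It assumes the scripts were created with their original names and that the table name is passed as the first (positional) parameter, which is what the Ctrl+F1 shortcut relies on; dbo.Member is only a sample table name.

```sql
-- Standard output, for comparison
EXEC sp_helpindex 'dbo.Member';
go
-- Expanded output (use sp_SQLskills_SQL2005_helpindex on SQL Server 2005)
EXEC sp_SQLskills_SQL2008_helpindex 'dbo.Member';
go
```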

NOTE: The very first version of the 2005 script had a bug in it (thanks Calvin!). Funnily one I already knew about and thought I had caught (it was fixed in the 2008 version). Anyway, as of 3:15pm CT on May 19, it's fixed in the scripts here. There could be others but so far - it's looking good!

Enjoy!
kt

OK, I've had a bit of a tangent going on optimizing stored procedures and it's all revolved around adhoc plan caching and potentially wasted cache due to the single-use plans created (see the most recent posts titled: Plan cache and optimizing for adhoc workloads and Plan cache, adhoc workloads and clearing the single-use plan cache bloat). To be more specific, these are statements that when generated (via adhoc or prepared statements) don't end up being parameterized (in general - only relatively simple statements can be effectively parameterized and when your statements are complex, SQL Server won't parameterize them - this can be good and in some systems this can be terribly bad). But, now you know how bad... you know how much of your cache you're wasting but where is it going and are there any similarities to these statements? Should you (or can you?) bring them together to see if there is a commonality that can be leveraged?

In general, the answer is "it depends" (you knew that was coming!). However, I do have good news: you can definitely gauge what's actually going on and how many of the statements are, in fact, similar. This might allow you to leverage another database option (forced parameterization) and/or determine that you really NEED stored procedures. However, that might end up being more complex as it will require rewriting the app (or at least some of the app). Again, the good news here is that you'll be able to tell which part of the app.

Take for example the following:

DBCC FREEPROCCACHE
go
SELECT * FROM dbo.Member WHERE Lastname = 'Tripp';
go
SELECT * FROM dbo.Member WHERE Lastname = 'Tripped';
go
SELECT * FROM dbo.Member WHERE Lastname = 'Tripper';
go
SELECT * FROM dbo.Member WHERE Lastname = 'Tripps';
go
SELECT * FROM dbo.Member WHERE Lastname = 'Falls';
go

SELECT cp.objtype, cp.cacheobjtype, cp.size_in_bytes, cp.refcounts, cp.usecounts, st.text --, *
FROM sys.dm_exec_cached_plans AS cp
        CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE cp.objtype IN ('Adhoc', 'Prepared')
        AND st.text LIKE '%from dbo.member%'
        AND st.text NOT LIKE '%SELECT cp.objtype%' -- exclude this query itself
ORDER BY cp.objtype
go

objtype       cacheobjtype          size_in_bytes   refcounts  usecounts     text
Adhoc         Compiled Plan         24576               2              1                 SELECT * FROM dbo.Member WHERE Lastname = 'Falls'; 
Adhoc         Compiled Plan         24576               2              1                 SELECT * FROM dbo.Member WHERE Lastname = 'Tripps'; 
Adhoc         Compiled Plan         24576               2              1                 SELECT * FROM dbo.Member WHERE Lastname = 'Tripper'; 
Adhoc         Compiled Plan         24576               2              1                 SELECT * FROM dbo.Member WHERE Lastname = 'Tripped'; 
Adhoc         Compiled Plan         24576               2              1                 SELECT * FROM dbo.Member WHERE Lastname = 'Tripp'; 

The first thing to notice is that each of these queries has a *different* statement in the adhoc plan cache. The second is to recognize that each of these statements takes 24K in the plan cache. If we were to change to "optimize for ad hoc workloads" then things might be a bit better:

-- 'optimize for ad hoc workloads' is an advanced option,
-- so advanced options need to be visible before it can be set
EXEC sp_configure 'show advanced options', 1
go
reconfigure
go
EXEC sp_configure 'optimize for ad hoc workloads', 1
go
reconfigure
go
DBCC FREEPROCCACHE
go
SELECT * FROM dbo.Member WHERE Lastname = 'Tripp';
go
SELECT * FROM dbo.Member WHERE Lastname = 'Tripped';
go
SELECT * FROM dbo.Member WHERE Lastname = 'Tripper';
go
SELECT * FROM dbo.Member WHERE Lastname = 'Tripps';
go
SELECT * FROM dbo.Member WHERE Lastname = 'Falls';
go

SELECT cp.objtype, cp.cacheobjtype, cp.size_in_bytes, cp.refcounts, cp.usecounts, st.text --, *
FROM sys.dm_exec_cached_plans AS cp
        CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE cp.objtype IN ('Adhoc', 'Prepared')
        AND st.text LIKE '%from dbo.member%'
        AND st.text NOT LIKE '%SELECT cp.objtype%' -- exclude this query itself
ORDER BY cp.objtype
go

objtype       cacheobjtype         size_in_bytes   refcounts  usecounts     text
Adhoc         Compiled Plan Stub  320                 2              1                 SELECT * FROM dbo.Member WHERE Lastname = 'Falls'; 
Adhoc         Compiled Plan Stub  320                 2              1                 SELECT * FROM dbo.Member WHERE Lastname = 'Tripps'; 
Adhoc         Compiled Plan Stub  320                 2              1                 SELECT * FROM dbo.Member WHERE Lastname = 'Tripper'; 
Adhoc         Compiled Plan Stub  320                 2              1                 SELECT * FROM dbo.Member WHERE Lastname = 'Tripped'; 
Adhoc         Compiled Plan Stub  320                 2              1                 SELECT * FROM dbo.Member WHERE Lastname = 'Tripp'; 

But, this is not the only issue. Even though each statement will only be in the cache once and will only take 320 bytes (instead of 24K), the cumulative effect of this might be significant. So, how can you see this? If you try to aggregate over these statements then you'll only be able to use the first 40-50 characters; while that works here, it definitely won't work for more complex statements. Until - SQL Server 2008. In 2008, you can use the query_hash to find similar statements and their overall effect on the query cache.

SELECT qs2.query_hash AS [Query Hash]
        , SUM(qs2.size_in_bytes) AS [Total Cache Size]
        , SUM(qs2.total_worker_time)/SUM(qs2.execution_count) AS [Avg CPU Time]
        , SUM(qs2.total_elapsed_time)/SUM(qs2.execution_count) AS [Avg Duration]
        , COUNT(*) AS [Number of plans]
        , MIN(qs2.statement_text) AS [Statement Text]
FROM (SELECT qs.*, cp.size_in_bytes
                -- offsets are in bytes of Unicode text, hence the divide by 2
                , SUBSTRING(st.text, (qs.statement_start_offset/2) + 1
                        , ((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(st.text)
                                ELSE qs.statement_end_offset END - qs.statement_start_offset)/2) + 1) AS statement_text
        FROM sys.dm_exec_cached_plans AS cp
                JOIN sys.dm_exec_query_stats AS qs ON cp.plan_handle = qs.plan_handle
                CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st) AS qs2
GROUP BY qs2.query_hash
ORDER BY [Avg Duration] DESC;

This is incredibly important because this will give you the information to determine what SIMILAR query (or queries) are consuming the cache and which ones have the highest cumulative effect. And, if you see a statement that is consistent and can be optimized using indexes (and consistently using the same indexes) what you might try first is "forced parameterization." This is a DATABASE-level option (available in 2005+) that significantly improves the chances that a statement will be parameterized but still doesn't guarantee it. See the Books Online for Forced Parameterization (SQL Server 2008 Forced Parameterization and SQL Server 2005 Forced Parameterization) for the rules on when SQL Server parameterizes. However, there are many statements that will be parameterized. If you have A LOT of plan cache bloat AND you think your plans might benefit from forced parameterization - this is definitely something you can try.
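Turning the option on is a single ALTER DATABASE statement. In this sketch the database name (credit) is taken from the sample plans shown in this post; substitute your own, and test on a non-production system first.

```sql
-- Check the current setting (0 = SIMPLE, the default; 1 = FORCED)
SELECT name, is_parameterization_forced
FROM sys.databases
WHERE name = 'credit';
go
ALTER DATABASE credit SET PARAMETERIZATION FORCED;
go
-- To go back to the default behavior:
-- ALTER DATABASE credit SET PARAMETERIZATION SIMPLE;
```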

Again, the good news here is that the QP will try harder to parameterize but there are still many cases where it won't. In this case, SQL Server parameterizes this statement (it becomes a Prepared statement) as:

(@0 varchar(8000))select * from dbo . Member where Lastname = @0

You can see this through the following query/results:

SELECT st.text, cp.objtype, cp.cacheobjtype, cp.size_in_bytes, cp.refcounts, cp.usecounts, qp.query_plan --, *
FROM sys.dm_exec_cached_plans AS cp
        CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
        CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE cp.objtype IN ('Adhoc', 'Prepared')
        AND st.text LIKE '%member%'
ORDER BY cp.objtype
go

text objtype cacheobjtype size_in_bytes refcounts usecounts query_plan
SELECT * FROM dbo.Member WHERE Lastname = 'Falls';   Adhoc Compiled Plan 24576 2 1 <ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan" Version="1.1" Build="10.0.2531.0"><BatchSequence><Batch><Statements><StmtSimple StatementText="SELECT * FROM dbo.Member WHERE Lastname = 'Falls';&#xD;&#xA;" StatementId="1" StatementCompId="1" StatementType="SELECT" ParameterizedPlanHandle="0x0600050088DDF510B8C0630A000000000000000000000000" ParameterizedText="(@0 varchar(8000))select * from dbo . Member where Lastname = @0" /></Statements></Batch></BatchSequence></ShowPlanXML>
SELECT * FROM dbo.Member WHERE Lastname = 'Tripps';   Adhoc Compiled Plan 24576 2 1 <ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan" Version="1.1" Build="10.0.2531.0"><BatchSequence><Batch><Statements><StmtSimple StatementText="SELECT * FROM dbo.Member WHERE Lastname = 'Tripps';&#xD;&#xA;" StatementId="1" StatementCompId="1" StatementType="SELECT" ParameterizedPlanHandle="0x0600050088DDF510B8C0630A000000000000000000000000" ParameterizedText="(@0 varchar(8000))select * from dbo . Member where Lastname = @0" /></Statements></Batch></BatchSequence></ShowPlanXML>
SELECT * FROM dbo.Member WHERE Lastname = 'Tripper';   Adhoc Compiled Plan 24576 2 1 <ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan" Version="1.1" Build="10.0.2531.0"><BatchSequence><Batch><Statements><StmtSimple StatementText="SELECT * FROM dbo.Member WHERE Lastname = 'Tripper';&#xD;&#xA;" StatementId="1" StatementCompId="1" StatementType="SELECT" ParameterizedPlanHandle="0x0600050088DDF510B8C0630A000000000000000000000000" ParameterizedText="(@0 varchar(8000))select * from dbo . Member where Lastname = @0" /></Statements></Batch></BatchSequence></ShowPlanXML>
SELECT * FROM dbo.Member WHERE Lastname = 'Tripped';   Adhoc Compiled Plan 24576 2 1 <ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan" Version="1.1" Build="10.0.2531.0"><BatchSequence><Batch><Statements><StmtSimple StatementText="SELECT * FROM dbo.Member WHERE Lastname = 'Tripped';&#xD;&#xA;" StatementId="1" StatementCompId="1" StatementType="SELECT" ParameterizedPlanHandle="0x0600050088DDF510B8C0630A000000000000000000000000" ParameterizedText="(@0 varchar(8000))select * from dbo . Member where Lastname = @0" /></Statements></Batch></BatchSequence></ShowPlanXML>
SELECT * FROM dbo.Member WHERE Lastname = 'Tripp';   Adhoc Compiled Plan 24576 2 1 <ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan" Version="1.1" Build="10.0.2531.0"><BatchSequence><Batch><Statements><StmtSimple StatementText="SELECT * FROM dbo.Member WHERE Lastname = 'Tripp';&#xD;&#xA;" StatementId="1" StatementCompId="1" StatementType="SELECT" ParameterizedPlanHandle="0x0600050088DDF510B8C0630A000000000000000000000000" ParameterizedText="(@0 varchar(8000))select * from dbo . Member where Lastname = @0" /></Statements></Batch></BatchSequence></ShowPlanXML>
(@0 varchar(8000))select * from dbo . Member where Lastname = @0 Prepared Compiled Plan 40960 7 10 <ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan" Version="1.1" Build="10.0.2531.0"><BatchSequence><Batch><Statements><StmtSimple StatementText="(@0 varchar(8000))select * from dbo . Member where Lastname = @0" StatementId="1" StatementCompId="1" StatementType="SELECT" StatementSubTreeCost="0.00657038" StatementEstRows="1" StatementOptmLevel="FULL" QueryHash="0xDEF0805C7B74A31E" QueryPlanHash="0x785C8E320D853B97" StatementOptmEarlyAbortReason="GoodEnoughPlanFound"><StatementSetOptions QUOTED_IDENTIFIER="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" NUMERIC_ROUNDABORT="false" /><QueryPlan CachedPlanSize="24" CompileTime="1" CompileCPU="1" CompileMemory="136"><RelOp NodeId="0" PhysicalOp="Nested Loops" LogicalOp="Inner Join" EstimateRows="1" EstimateIO="0" EstimateCPU="4.18e-006" AvgRowSize="173" EstimatedTotalSubtreeCost="0.00657038" Parallel="0" EstimateRebinds="0" EstimateRewinds="0"><OutputList><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="member_no" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="lastname" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="firstname" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="middleinitial" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="street" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="city" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="state_prov" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="country" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="mail_code" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="phone_no" 
/><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="photograph" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="issue_dt" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="expr_dt" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="region_no" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="corp_no" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="prev_balance" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="curr_balance" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="member_code" /></OutputList><NestedLoops Optimized="0"><OuterReferences><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="member_no" /></OuterReferences><RelOp NodeId="1" PhysicalOp="Index Seek" LogicalOp="Index Seek" EstimateRows="1" EstimateIO="0.003125" EstimateCPU="0.0001581" AvgRowSize="20" EstimatedTotalSubtreeCost="0.0032831" TableCardinality="10000" Parallel="0" EstimateRebinds="0" EstimateRewinds="0"><OutputList><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="member_no" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="lastname" /></OutputList><IndexScan Ordered="1" ScanDirection="FORWARD" ForcedIndex="0" ForceSeek="0" NoExpandHint="0"><DefinedValues><DefinedValue><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="member_no" /></DefinedValue><DefinedValue><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="lastname" /></DefinedValue></DefinedValues><Object Database="[credit]" Schema="[dbo]" Table="[member]" Index="[IX_Member_Lastname]" IndexKind="NonClustered" /><SeekPredicates><SeekPredicateNew><SeekKeys><Prefix ScanType="EQ"><RangeColumns><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="lastname" 
/></RangeColumns><RangeExpressions><ScalarOperator ScalarString="[@0]"><Identifier><ColumnReference Column="@0" /></Identifier></ScalarOperator></RangeExpressions></Prefix></SeekKeys></SeekPredicateNew></SeekPredicates></IndexScan></RelOp><RelOp NodeId="3" PhysicalOp="Clustered Index Seek" LogicalOp="Clustered Index Seek" EstimateRows="1" EstimateIO="0.003125" EstimateCPU="0.0001581" AvgRowSize="161" EstimatedTotalSubtreeCost="0.0032831" TableCardinality="10000" Parallel="0" EstimateRebinds="0" EstimateRewinds="0"><OutputList><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="firstname" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="middleinitial" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="street" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="city" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="state_prov" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="country" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="mail_code" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="phone_no" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="photograph" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="issue_dt" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="expr_dt" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="region_no" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="corp_no" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="prev_balance" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="curr_balance" /><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="member_code" /></OutputList><IndexScan 
Lookup="1" Ordered="1" ScanDirection="FORWARD" ForcedIndex="0" ForceSeek="0" NoExpandHint="0"><DefinedValues><DefinedValue><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="firstname" /></DefinedValue><DefinedValue><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="middleinitial" /></DefinedValue><DefinedValue><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="street" /></DefinedValue><DefinedValue><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="city" /></DefinedValue><DefinedValue><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="state_prov" /></DefinedValue><DefinedValue><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="country" /></DefinedValue><DefinedValue><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="mail_code" /></DefinedValue><DefinedValue><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="phone_no" /></DefinedValue><DefinedValue><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="photograph" /></DefinedValue><DefinedValue><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="issue_dt" /></DefinedValue><DefinedValue><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="expr_dt" /></DefinedValue><DefinedValue><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="region_no" /></DefinedValue><DefinedValue><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="corp_no" /></DefinedValue><DefinedValue><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="prev_balance" /></DefinedValue><DefinedValue><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="curr_balance" /></DefinedValue><DefinedValue><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="member_code" 
/></DefinedValue></DefinedValues><Object Database="[credit]" Schema="[dbo]" Table="[member]" Index="[member_ident]" TableReferenceId="-1" IndexKind="Clustered" /><SeekPredicates><SeekPredicateNew><SeekKeys><Prefix ScanType="EQ"><RangeColumns><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="member_no" /></RangeColumns><RangeExpressions><ScalarOperator ScalarString="[credit].[dbo].[member].[member_no]"><Identifier><ColumnReference Database="[credit]" Schema="[dbo]" Table="[member]" Column="member_no" /></Identifier></ScalarOperator></RangeExpressions></Prefix></SeekKeys></SeekPredicateNew></SeekPredicates></IndexScan></RelOp></NestedLoops></RelOp><ParameterList><ColumnReference Column="@0" ParameterCompiledValue="'Tripp'" /></ParameterList></QueryPlan></StmtSimple></Statements></Batch></BatchSequence></ShowPlanXML>

Ideally, you'll end up with less wasted cache because all adhoc statements from here will use the Prepared version of the statement. This also means that they'll run with a compiled plan (so you'll also save compile time) and - if the plans are consistent - then all of this is good. However, if other values (for example 'Smith') require a completely different plan - then this could actually be worse (this is the "It Depends" part). But, if you still have statements that do have consistent execution plans (because you know the data and/or you know exactly how this particular data is being used) then EITHER sp_executesql or a stored procedure would effectively force the parameterization as well and it would say to me that you really understand your data/application. But, if you're wrong... anytime you force parameterization (and increase re-use of a plan) you could end up forcing the re-use of a bad (and possibly REALLY, REALLY, REALLY, REALLY, REALLY, REALLY, REALLY, REALLY, REALLY bad) plan. :)
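As a rough sketch of the sp_executesql route, the same statement can be submitted with an explicit parameter so that every value shares one Prepared plan. The varchar(15) length is an assumption about the Lastname column; match it to your actual column definition.

```sql
DECLARE @sql nvarchar(200);
SET @sql = N'SELECT * FROM dbo.Member WHERE Lastname = @Lastname';

-- The second execution should re-use the plan compiled for the first,
-- whether or not that plan is a good fit for the new value.
EXEC sp_executesql @sql, N'@Lastname varchar(15)', @Lastname = 'Tripp';
EXEC sp_executesql @sql, N'@Lastname varchar(15)', @Lastname = 'Falls';
go
-- Look for a single Prepared entry whose usecounts goes up with each execution
SELECT cp.objtype, cp.usecounts, st.text
FROM sys.dm_exec_cached_plans AS cp
        CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE cp.objtype = 'Prepared'
        AND st.text LIKE '%@Lastname%'
        AND st.text NOT LIKE '%dm_exec_cached_plans%'; -- exclude this query itself
go
```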

So, while you solve one problem - you could create another one (see the blog post titled: Using the OPTION (RECOMPILE) option for a statement for more information). If the plans are NOT consistent then forcing a plan (through either forced parameterization, sp_executesql OR a regular stored procedure) can result in poor performance. When writing the stored procedure (and through good testing processes), you'll be able to create better and more effective code. This is truly my preferred method. And, I'm just about to wrap up my series with roughly 3 more posts (at least that's what I currently have planned but you guys have certainly been steering me in this direction for a bit :-)).
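For the case where plans are NOT consistent, here's a minimal sketch of a stored procedure that asks for a statement-level recompile. The procedure name is hypothetical and the varchar(15) parameter length is an assumption about the Lastname column. The trade-off is that you pay compile time on every execution in exchange for a plan built for the value actually passed in.

```sql
-- Hypothetical procedure name; parameter length is an assumption.
CREATE PROCEDURE dbo.GetMemberByLastname
        @Lastname varchar(15)
AS
SELECT * FROM dbo.Member
WHERE Lastname = @Lastname
OPTION (RECOMPILE); -- only this statement is recompiled on each execution
go
EXEC dbo.GetMemberByLastname @Lastname = 'Tripp';
go
```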

Thanks for reading,
kt

In my post describing the new SQL Server 2008 configuration option [optimize for ad hoc workloads] titled: Plan cache and optimizing for adhoc workloads, I asked for you to send me your plan cache numbers. In addition to the comments on the post, I received a lot of feedback in email (and even a related blog post). The general state of the cache for many was that only 10s of MB were wasted, for a couple it was 100s of MB and for a few it was GBs of memory - completely wasted. I've taken some of the worst numbers and compiled them below:

CacheType  Total Plans  Total MBs  Avg Use Count  Total MBs - USE Count 1  Total Plans - USE Count 1  % of Total
Prepared         1,541   7,925.85          1,245                 7,858.43                      1,448       99.15
Adhoc           86,624   8,592.48              5                 7,332.59                     76,145       85.34
Prepared       148,527   7,428.82            444                 5,074.25                     94,851       68.30
Adhoc           63,471   4,565.00          1,328                 4,161.54                     50,737       91.16
Adhoc            1,358   2,704.27          3,676                 2,673.25                        425       98.85
Adhoc           48,140   6,233.64            173                 2,106.05                     20,493       33.79
Prepared        18,639   2,590.82            127                 2,027.72                     13,614       78.27
Prepared        18,010   3,237.18            354                 1,944.48                      9,561       60.07
Adhoc           17,392   2,417.38            100                 1,787.33                      7,741       73.94
Prepared         6,276   1,875.71         59,370                 1,560.66                      4,020       83.20
Prepared        16,832   2,844.61         10,829                 1,507.63                      5,732       53.00
Prepared         3,075   1,681.56         46,262                 1,427.33                        604       84.88
Adhoc           29,047   1,923.84             39                 1,362.09                     22,827       70.80
Adhoc           17,028   1,255.60            185                 1,160.85                     15,845       92.45
Adhoc          118,838   2,073.25          2,813                 1,128.47                     41,212       54.43
Adhoc           13,895   1,300.72             11                   875.45                      9,351       67.31
Prepared         8,266     476.90             31                   462.82                      8,031       97.05
Adhoc            8,865     507.63              6                   293.98                      4,738       57.91
Adhoc           10,066     401.54              9                   282.87                      5,717       70.45
Adhoc           18,676     506.66         12,463                   260.48                      7,938       51.41
Adhoc            2,113     223.62            142                   207.36                      2,024       92.73
Adhoc           10,107     240.27             76                   157.51                      6,073       65.55
Prepared         1,888     154.91              4                   125.08                      1,438       80.74
Prepared         1,561     195.27         14,761                   123.70                      1,089       63.35
Prepared           742     125.30         14,517                   109.39                        593       87.30
Prepared        22,566     838.79            357                    99.87                        505       11.91
Adhoc              478     109.01              2                    92.62                        423       84.96
Adhoc            2,917     173.43         26,651                    82.73                      1,313       47.70
Adhoc            2,906     122.81             24                    77.08                      1,911       62.76
Prepared           609      98.06            106                    75.59                        360       77.09
Adhoc              638      75.74             24                    52.04                        414       68.71
Prepared           322      23.12              8                    17.82                        275       77.09
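
For reference, a summary in this shape can be pulled straight from sys.dm_exec_cached_plans. What follows is only a sketch, not necessarily the exact query behind the numbers above: the column aliases simply mirror the table headers, and the [% of Total] calculation (single-use MB divided by total MB) is inferred from the figures shown.

```sql
-- Sketch: per-cache-type plan counts and sizes, with single-use plans broken out.
-- The bigint/decimal casts avoid int overflow once a cache type exceeds 2GB.
SELECT objtype AS [CacheType]
        , count_big(*) AS [Total Plans]
        , sum(cast(size_in_bytes as decimal(18,2)))/1024/1024 AS [Total MBs]
        , avg(cast(usecounts as bigint)) AS [Avg Use Count]
        , sum(cast((CASE WHEN usecounts = 1 THEN size_in_bytes ELSE 0 END) as decimal(18,2)))/1024/1024 AS [Total MBs - USE Count 1]
        , sum(CASE WHEN usecounts = 1 THEN 1 ELSE 0 END) AS [Total Plans - USE Count 1]
        , cast(100.0 * sum(cast((CASE WHEN usecounts = 1 THEN size_in_bytes ELSE 0 END) as bigint))
                / sum(cast(size_in_bytes as bigint)) as decimal(5,2)) AS [% of Total]
FROM sys.dm_exec_cached_plans
GROUP BY objtype
ORDER BY [Total MBs - USE Count 1] DESC
go
```

Run it during normal load; right after a restart (or after clearing the cache) the numbers won't tell you much.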

First, some of these do have the "optimize for adhoc workloads" option set. However, not too many. And, for those that do have this set, the single-use plan cache number reflects the space taken by the compiled plan stubs - not actual execution plans. These stubs are significantly smaller (typically only a matter of bytes) than full plans, which usually take 10s-100s of kilobytes. So, while the amount of cache wasted is reduced to only a fraction of what it would have been without the option set, there's still the potential for a lot of cache wasted.

Another thing to notice is that the only interesting numbers came from these two types of cache: Adhoc and Prepared. This is to be expected - the numbers for single-use plan cache are worst for the scenarios where statements are auto parameterized and "prepared" but not often re-used (where the statements submitted vary wildly because of different client applications/executions). But, while it is expected, what can you do about this? Unfortunately, not much - without changing the client application interface to the data (meaning - use stored procedures).

However, what you might benefit from (even if you have the option set) is periodically clearing the cache. And, you don't necessarily need to clear all of the cache - you can clear just the Adhoc and Prepared plans (the 'SQL Plans' cache store).

To clear just the "SQL Plans" from the plan cache, use:

DBCC FREESYSTEMCACHE('SQL Plans')

If you want to clear all of the cache, you can use:

DBCC FREEPROCCACHE

And, in SQL Server 2005/2008 there are a host of combinations that will allow you to clear as much or as little as you'd like - down to just a single plan (using the plan_handle) in SQL Server 2008. For more information, check out the BOL topic: DBCC FREEPROCCACHE.
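
As an illustration of the single-plan case, here is a minimal sketch (SQL Server 2008 only) that picks one plan_handle out of sys.dm_exec_cached_plans and removes just that plan. Choosing "the largest single-use Adhoc plan" is an arbitrary choice made for the example; in practice you'd pick the handle of whatever plan you're actually troubleshooting.

```sql
-- Sketch: remove ONE plan from cache by its plan_handle (SQL Server 2008 only)
DECLARE @plan_handle varbinary(64)

-- Arbitrary pick for illustration: the largest single-use Adhoc plan
SELECT TOP (1) @plan_handle = plan_handle
FROM sys.dm_exec_cached_plans
WHERE usecounts = 1 AND objtype = 'Adhoc'
ORDER BY size_in_bytes DESC

-- Only call DBCC if a matching plan was actually found
IF @plan_handle IS NOT NULL
        DBCC FREEPROCCACHE (@plan_handle)
go
```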

But, if you want to be more clever about it, you might want to automate the process of clearing these 'SQL Plans' when they're wasting more than a certain percentage of the memory that SQL Server's using OR more simply, once the single-use plan cache reaches a certain size (depending on the amount of memory that you have you might choose something as low as 500MB or possibly as high as 2-3GB). To help you out, I decided to write the queries to do this and give you a few options from which to choose. And, it's definitely easier to write more interesting code in 2008 because of some of the new memory-related DMVs (specifically: sys.dm_os_sys_memory and sys.dm_os_process_memory). However, you can still do this fairly easily in both versions.

So, let me summarize:
1) If you're running SQL Server 2008 and you have cache being wasted by single-use plans, be sure to use the new "optimize for adhoc workloads."
2) If you still find that you're wasting 100s of MB or GB of cache, consider creating a job that programmatically checks cache and then clears the 'SQL Plans' from cache based on one of the options/code below.
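
For point 1, the option is set through sp_configure. A minimal sketch follows; note that the option name as sp_configure spells it is 'optimize for ad hoc workloads', and because it is an advanced option, 'show advanced options' has to be turned on first.

```sql
-- Advanced options must be visible before the setting can be changed
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
go

-- SQL Server 2008 only: cache a small stub on first execution instead of the full plan
EXEC sp_configure 'optimize for ad hoc workloads', 1
RECONFIGURE
go
```

The setting only affects plans cached after it's turned on; anything already in cache stays there until it's aged out or cleared.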

  1. This is probably the easiest and most simplified option.
  2. This is probably overkill for most folks but probably won't hurt either, especially if you have a lot of other types of cache being wasted by single-use plans.
  3. This is probably the most interesting and uses some of the new DMVs to see how much of the ACTUAL working cache is going to single-use plans. However, this is only for 2008. Could I write it for 2005? Yes, but it's not really worth it. I'd just go with option 1 (or possibly 2) for 2005.

1. Clearing *JUST* the 'SQL Plans' based on *just* the amount of Adhoc/Prepared single-use plans (2005/2008):

DECLARE @MB decimal(19,3)
        , @Count bigint
        , @StrMB nvarchar(20)

-- Total the size (in MB) and the number of single-use Adhoc/Prepared plans
SELECT @MB = sum(cast((CASE WHEN usecounts = 1 AND objtype IN ('Adhoc', 'Prepared') THEN size_in_bytes ELSE 0 END) as decimal(12,2)))/1024/1024 
        , @Count = sum(CASE WHEN usecounts = 1 AND objtype IN ('Adhoc', 'Prepared') THEN 1 ELSE 0 END)
FROM sys.dm_exec_cached_plans

-- Convert after the SELECT completes: the order of variable assignments within a single SELECT is not guaranteed
SET @StrMB = convert(nvarchar(20), @MB)

IF @MB > 10
        BEGIN
                DBCC FREESYSTEMCACHE('SQL Plans') 
                RAISERROR ('%s MB was allocated to single-use plan cache. Single-use plans have been cleared.', 10, 1, @StrMB)
        END
ELSE
        BEGIN
                -- Note: this is only a warning message and not an actual error.
                RAISERROR ('Only %s MB is allocated to single-use plan cache - no need to clear cache now.', 10, 1, @StrMB)
        END
go

2. Clearing *ALL* of your cache based on the total amount of wasted by single-use plans (2005/2008):

DECLARE @MB decimal(19,3)
        , @Count bigint
        , @StrMB nvarchar(20)

-- Total the size (in MB) and the number of ALL single-use plans, regardless of type
SELECT @MB = sum(cast((CASE WHEN usecounts = 1 THEN size_in_bytes ELSE 0 END) as decimal(12,2)))/1024/1024 
        , @Count = sum(CASE WHEN usecounts = 1 THEN 1 ELSE 0 END)
FROM sys.dm_exec_cached_plans

-- Convert after the SELECT completes: the order of variable assignments within a single SELECT is not guaranteed
SET @StrMB = convert(nvarchar(20), @MB)

IF @MB > 1000
        DBCC FREEPROCCACHE
ELSE
        RAISERROR ('Only %s MB is allocated to single-use plan cache - no need to clear cache now.', 10, 1, @StrMB)
go

3. Stored Procedure to report/track + logic to go into a job based on percentage OR MB of wasted cache (2008 only):

This I put into a script that you can play with here (sp_SQLskills_CheckPlanCache.sql (3.67 kb)).

Enjoy!
kt

 

OK, so, this is where we're at...

When a client application makes a request to SQL Server there are a few ways in which it can do so:

  1. Submit an adhoc transact-SQL statement
    • This statement can be parameterized and the parameterized statement can be deemed:
      • Safe: These are statements where the optimizer has evaluated the plan and feels that changes to values will not affect the execution plan. And, there are quite a few rules that you must meet for this to be the case. And, as a result, this is NOT something I really rely on and/or aim for (for optimization - I'd rather use stored procedures). You can monitor this through the Perfmon counter: SQLServer:SQL Statistics (Safe Auto-Params/sec).
      • Unsafe: These are statements where the optimizer feels that subsequent values will affect the execution plan. And, for adhoc statements many will be deemed unsafe. You can monitor this through the Perfmon counter: SQLServer:SQL Statistics (Unsafe Auto-Params/sec).
    • This statement might not end up being parameterized (this is not likely and the "failed parameterized statements" count should be low - you can monitor it through the Perfmon counter: SQLServer:SQL Statistics (Failed Auto-Params/sec)).
  2. Submit a statement via sp_executesql that parameterizes the statement through defined parameters - this creates a plan that is neither safe nor unsafe but is ALWAYS an exact match for subsequent executions (even when the parameter values change). This is where I generally have problems with it...
  3. Execute a stored procedure which has one or more of the above in it and/or DSE (dynamic string execution) in it. This is where I think things start to get interesting and this is really the reason why I started this series on optimizing procedural code.
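
To make option 2 concrete, here's a small sketch against sys.objects (chosen only because it exists in every database; the statement itself and the @id values are made up for illustration). Both calls match the same parameterized plan even though the value changes. The second query reads the auto-parameterization counters from option 1 through sys.dm_os_performance_counters instead of Perfmon.

```sql
DECLARE @sql nvarchar(200)
SET @sql = N'SELECT name, type_desc FROM sys.objects WHERE object_id = @id'

-- Same statement text + same parameter definition = same plan, whatever value is passed
EXEC sp_executesql @sql, N'@id int', @id = 3
EXEC sp_executesql @sql, N'@id int', @id = 5
go

-- Safe/Unsafe/Failed Auto-Params counters, read from T-SQL
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%SQL Statistics%'
        AND counter_name LIKE '%Auto-Param%'
go
```

Keep in mind that the "/sec" counters in this DMV are cumulative values since the instance started; to get a rate you need to sample twice and take the difference.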

First, and foremost, I believe that stored procedures are a SIGNIFICANTLY better way to develop high-end applications that absolutely must scale. Don't get me wrong, I understand the benefits of some of the tools out there that can generate your SQL and allow more rapid development of your application. And, if the application isn't trying to handle thousands of inserts/sec then this might work out really well. However, you still have to be careful. Many of these applications (that auto-generate SQL code) rely (very heavily in some cases) on sp_executesql and this can result in poor performance. Additionally, when there's A LOT of adhoc SQL then you can end up with a lot of your cache going to "single-plan use" where quite a bit of it is wasted (NOTE: there's a SQL Server 2008 configuration option [called "optimize for adhoc workloads"] that you really need to know about IF you have a lot of adhoc statements. This can significantly reduce the amount of cache that's wasted on plans that only execute once... I'll do a post on that feature NEXT). Again, while there are quite a few benefits the performance problems can become SEVERE. OK, I've definitely said this before but why am I rehashing this?

Because this is where we're at - we've looked at some of the problems with DSE in my post titled: Little Bobby Tables, SQL Injection and EXECUTE AS. Then, we started to look at EXEC and sp_executesql - how are they different? in the second post. Finally, I started to show *statement-level* recompilation using OPTION (RECOMPILE) in the post titled: Using the OPTION (RECOMPILE) option for a statement. The real reason for why I started this series is to prove: where, why and how stored procedures really shine (we're getting close!).

Reasons to use stored procedures:

  • Centralization - Share common code between applications and only need to make your changes ONCE! And, there's no possibility of different business rules being enforced in one application and not another and/or for incorrect business rules to be "out there" in your applications.
  • Simplification - Users should not need to know your schema. They shouldn't care that you've denormalized something and they should reap the benefits through YOUR creating an access method that is simple and works even if you end up changing back and normalizing. The database schema should be isolated from the users...
  • Change control - You should be able to change your schema without breaking applications. Think about this one a bit - this is EXACTLY the tactic that SQL Server takes with its own internal tables. We access them through stored procedures, views (aka catalog views) and functions. If they need to change the underlying schema then we don't need to worry - as long as they also make the sp, function or view return the same way it always has. This has complicated things over time. First, the SQL team has basically decided that (for backward compatibility) the sps will never handle new features (and, yes, this is HORRIBLY annoying and it's the reason why I've rewritten sp_helpindex here [and, fyi, I have an even better/newer/shinier version coming soon]). And, they've had to add functions like DATABASEPROPERTYEX for the "extended" properties that were added around recovery models in SQL Server 2000. However, as long as the functionality of the interface stays the same then our applications don't break. That's incredibly important and probably one of my favorite reasons to create some separation between the schema and the users (and/or interface).
  • Security - This is HUGE. And, it requires an understanding of object chaining and permissions. The general idea is that when objects are all within the same schema (and therefore have the same owner), then when execution rights are given on a stored procedure the caller does not need explicit permissions to the base objects (execution is essentially granted because the ownership chain is not broken). Another way to think about it is that you're granting rights to execute a "process" - one where you have control (and ownership) over all of the objects involved. So, SQL Server only requires execute rights on the sp. However, this ONLY goes for explicit statements and not statements that are built dynamically. For those, the CALLER needs explicit permission OR the procedure needs to be created with EXECUTE AS. So, if you plan to use EXECUTE AS be sure to read my post titled: Little Bobby Tables, SQL Injection and EXECUTE AS.

But, more than anything else - you CAN use them for better performance. No, it's not solely because they might have an already compiled (and therefore optimized) plan that's available at runtime (yes, that CAN help) but compilation (or recompilation) can often be a small portion of runtime compared to the execution of a bad plan. And, this is where optimizing the procedural code can help. The bad news is that you have to know what you're doing. And, I could make some arguments that SQL Server could do some things better here but what I've been doing with this series is getting you familiar with the options that will finally come together in the next post titled: Stored Procedure Performance. (NOTE: I'll update this with a corrected title/link once the post is completed. I will try to get to this within the next couple of days.)

Thanks for reading!!
kt

Last week we completed another SQL Server Magazine/SQLConnections conference and already, it's time to start planning for the Fall show. Once again, Paul and I will be managing the SQL side of the conference and we'd like to invite you to submit abstracts for sessions at the conference. The Fall 2010 SQL Connections conference will be held in Las Vegas from November 1st through 5th (2010).

The conference will focus heavily on SQL Server 2008. Abstracts are still welcome on best practices for SQL Server 2005 and how to upgrade and migrate applications from SQL Server 200x to SQL Server 2008. However, we will consider all topics - from design to performance to troubleshooting to architectures to programming. Also, we're likely to closely follow our past conference format with three primary themes/tracks: Infrastructure and Scalability, Programming, and Business Intelligence. You should aim for the intermediate to advanced audience for any SQL Server 2005 materials and for SQL Server 2008 we'll consider any level - especially depending on the newness of the release. We're also looking at creating a "Foundations" track. It's titled: .NET Foundations but I think there's much that can be said for SQL topics and/or best practices in SQL for .NET developers.

For submitting session abstracts, please use this URL: 
http://www.deeptraining.com/devconnections/abstracts

The tool will be open from now until April 26th (yes, just one week!), after which we won't accept any abstracts - no exceptions. If you have used this site before and have forgotten your password, you can have your secret word mailed to you. It's better to do this than to create a new account.

Please keep each abstract under 200 words and in one paragraph. Please do not use bulleted items or line breaks, and please use a spell-checker. If you have an issue with the site, please follow up with an email to paul@sqlskills.com. Comments are disabled for this post.

Please create a new speaker record (if you haven't submitted before) and submit at least 3 abstracts; however, it will help your chances of being selected if you submit 5 or 6 abstracts. Please do not submit abstracts for sessions that you have previously presented at the Fall show.

What you will get if selected:
- $500 per conference talk. (Additional compensation for pre/post conference workshops.)
- Coach airfare and hotel stay paid by the conference
- Free admission to all of the co-located conferences
- Speaker party
- The adoration of attendees
- etc.
 
Thanks for your continued support for our SQL Connections conference and we look forward to seeing your bright and shiny *new* abstracts!

Thanks,
Kimberly L. Tripp & Paul S. Randal
SQL Connections Conference Chairs

OK, for years I've been saying that SQL Server doesn't care about the order in which you define the columns of your table because internally SQL Server will re-arrange your columns to store all of the fixed-width columns first and the variable-width columns last. In both the fixed-width portion of the row as well as the variable-width portion of the row, the columns are stored in the order in which they are declared. So, what does matter?

It's all in the cost of the variable block array's offset values. If the columns that are usually NULL are defined at the end of the variable-width portion, then SQL Server doesn't need to completely populate the variable block array (which saves 2 bytes per column). If you have a table where 36 columns are NULLable and generally they are NULL, then defining those columns at the end of the row can save you space.

The following script will show you how the maximum length of the row changes based on whether or not a later column in the variable block has a value (is not NULL) - even when most/all of the prior columns are NULL!

CREATE TABLE RowSizeVariableBlock
(
ID int NOT NULL identity,
c01 char(10) NOT NULL default 'test',
c02 datetime2(7) NOT NULL default sysdatetime(),
c03 char(80) NOT NULL default 'junk',
c04 varchar(100) NULL,
c05 varchar(100) NULL,
c06 varchar(100) NULL,
c07 varchar(100) NULL,
c08 varchar(100) NULL,
c09 varchar(100) NULL,
c10 varchar(100) NULL,
c11 varchar(100) NULL,
c12 varchar(100) NULL,
c13 varchar(100) NULL,
c14 varchar(100) NULL,
c15 varchar(100) NULL,
c16 varchar(100) NULL,
c17 varchar(100) NULL,
c18 varchar(100) NULL,
c19 varchar(100) NULL,
c20 varchar(100) NULL,
c21 varchar(100) NULL,
c22 varchar(100) NULL,
c23 varchar(100) NULL,
c24 varchar(100) NULL,
c25 varchar(100) NULL,
c26 varchar(100) NULL,
c27 varchar(100) NULL,
c28 varchar(100) NULL,
c29 varchar(100) NULL,
c30 varchar(100) NULL,
c31 varchar(100) NULL,
c32 varchar(100) NULL,
c33 varchar(100) NULL,
c34 varchar(100) NULL,
c35 varchar(100) NULL,
c36 varchar(100) NULL,
c37 varchar(100) NULL,
c38 varchar(100) NULL,
c39 varchar(100) NULL,
c40 varchar(100) NULL
)
go

insert RowSizeVariableBlock DEFAULT VALUES
go

select * from RowSizeVariableBlock
go

select * from sys.dm_db_index_physical_stats
(db_id(), object_id('RowSizeVariableBlock'), null, null, 'detailed')
-- review "max" record size = 114
go

insert RowSizeVariableBlock (c01, c03, c20)
values ('med row', 'up to c20', 'test')
go

select * from RowSizeVariableBlock
go

select * from sys.dm_db_index_physical_stats
(db_id(), object_id('RowSizeVariableBlock'), null, null, 'detailed')
-- review "max" record size = 154
go

insert RowSizeVariableBlock (c01, c03, c30)
values ('med+ row', 'up to c30', 'test')
go

select * from RowSizeVariableBlock
go

select * from sys.dm_db_index_physical_stats
(db_id(), object_id('RowSizeVariableBlock'), null, null, 'detailed')
-- review "max" record size = 174
go

insert RowSizeVariableBlock (c01, c03, c40)
values ('large row', 'up to c40', 'test')
go

select * from RowSizeVariableBlock
go

select * from sys.dm_db_index_physical_stats
(db_id(), object_id('RowSizeVariableBlock'), null, null, 'detailed')
-- review "max" record size = 194
go
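
If you want to see where those four numbers come from, the arithmetic can be sketched as below. The layout it assumes is the commonly described one (a 4-byte row header, the fixed-width data, a 2-byte column count, a NULL bitmap with one bit per column and, only when at least one variable-width column has a value, a 2-byte variable column count plus a 2-byte offset for every variable-width column up to the last one that isn't NULL). Treat it as a worked example for this table, not a general row-size calculator.

```sql
DECLARE @Fixed int, @NullBitmap int, @Base int

SET @Fixed = 4 + 10 + 8 + 80              -- int + char(10) + datetime2(7) + char(80)
SET @NullBitmap = (41 + 7) / 8            -- one bit per column for 41 columns, rounded up to whole bytes
SET @Base = 4 + @Fixed + 2 + @NullBitmap  -- row header + fixed data + column count + NULL bitmap

-- OffsetsNeeded = number of variable-width columns from c04 through the last non-NULL one
SELECT LastVarColumn
        , CASE WHEN OffsetsNeeded = 0 THEN @Base
               ELSE @Base + 2 + (2 * OffsetsNeeded) + 4  -- + variable count + offsets + 4 bytes for 'test'
          END AS [Expected Max Record Size]
FROM (SELECT 'none' AS LastVarColumn, 0 AS OffsetsNeeded
      UNION ALL SELECT 'c20', 17
      UNION ALL SELECT 'c30', 27
      UNION ALL SELECT 'c40', 37) AS t
ORDER BY OffsetsNeeded
go
```

This returns 114, 154, 174 and 194, matching the record sizes in the script's comments: every extra 10 columns that the offset array has to reach across costs 20 bytes, even though all of those columns are NULL.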

While there are some other optimizations at this level, most tables cannot benefit from this as the data populations aren't as predictable nor are most tables filled with so many variable-width and NULLable columns. However, if you do have this data pattern, defining these columns at the end of your table's definition - MIGHT save a tremendous amount of space, especially when this table is very large!

Paul's blogged more on these structures as well as the NULL bitmap here: http://www.sqlskills.com/BLOGS/PAUL/post/Misconceptions-around-null-bitmap-size.aspx.

Enjoy! And, thanks for reading,
kt

I've always been concerned with security and I've always stressed the importance of auditing the REAL user context not just the current user (see this post on EXECUTE AS and auditing). So, I generally try to avoid using dynamic string execution and if necessary create well tested/protected parameters (fyi - using QUOTENAME can be a fantastic solution to protecting identifiers as input parameters but it can't protect more complex strings).

Having said that, what if I'm looking at a database for the first time... just poking around trying to see if there's anything that needs further attention? I've come up with a quick query... And, while it's not going to "solve" your problem (as that's going to take some re-writing of code) or even truly verify if you're vulnerable, it gives you a "quick list" of where you should look first! If your code uses dynamic strings AND it's elevated - then start there! 

SELECT OBJECT_NAME(sm.object_id) AS [Procedure Name],
  CASE
      WHEN sm.definition LIKE '%EXEC (%' OR sm.definition LIKE '%EXEC(%' THEN 'WARNING: code contains EXEC'
      WHEN sm.definition LIKE '%EXECUTE (%' OR sm.definition LIKE '%EXECUTE(%' THEN 'WARNING: code contains EXECUTE'
  END AS [Dynamic Strings],
  CASE
      -- execute_as_principal_id is -2 for EXECUTE AS OWNER; user_name(-2) returns NULL
      WHEN sm.execute_as_principal_id = -2 THEN N'WARNING: EXECUTE AS OWNER'
      WHEN sm.execute_as_principal_id IS NOT NULL THEN N'WARNING: EXECUTE AS ' + user_name(sm.execute_as_principal_id)
      ELSE 'Code to run as caller - check connection context'
  END AS [Execution Context Status]
FROM sys.sql_modules AS sm
ORDER BY [Procedure Name]
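
If you only want the "start there" list (dynamic strings AND an elevated context), one possible narrowing is sketched below. The extra check for sp_executesql is my own assumption about what else is worth flagging; it isn't part of the query above, and like the EXEC patterns it will also match text inside comments.

```sql
-- Sketch: only modules that BOTH build/execute dynamic strings AND run under EXECUTE AS
SELECT OBJECT_NAME(sm.object_id) AS [Procedure Name]
FROM sys.sql_modules AS sm
WHERE sm.execute_as_principal_id IS NOT NULL
        AND (sm.definition LIKE '%EXEC (%' OR sm.definition LIKE '%EXEC(%'
             OR sm.definition LIKE '%EXECUTE (%' OR sm.definition LIKE '%EXECUTE(%'
             OR sm.definition LIKE '%sp_executesql%')  -- assumption: also worth a look
ORDER BY [Procedure Name]
go
```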

Is this enough? Anything else you'd check? What do you think?

THANKS!
kt

OK, I'll definitely take a beating from all of you for having gone so long between my survey posts and now. I won't even go into the details but between some crazy work schedules, multiple sinus problems and even migraines... well, I've been a bit behind. Let's just say that April/May were rough at best. I'm feeling better and well, now I'm trying to catch up. I had really gotten the blogging bug in March but I completely lost it in April. But, this tipping point series is in dire need of lots of explaining so I'm really hoping to get a few posts done in this area for sure!

First, I started the discussion around this in a few surveys:

Survey/Question 1

Q1 was described as this: if a table has 1 million rows at 20 rows per page (50,000 pages), at what percentage (roughly) of the data would a nonclustered index no longer be used. Blogged here. Here's what the survey said as of today:

And, for Q1 the correct result (Between 0-2% of the rows) is actually the best result (but, by no means the overwhelming majority at only 28%). However, often people just "think" the answer is very small. So... I did a few more questions/surveys. 

Survey/Question 2

Q2 was described as this: if a table has 1 million rows at 100 rows per page (10,000 pages), at what percentage (roughly) of the data would a nonclustered index no longer be used. Blogged here. Here's what the survey said as of today:

And, for Q2 the correct result (Less than .5% of the rows) is actually at a tie for the best (but, again, even a small percentage at only 22%). Again, often people just "think" the answer is very small. So... I did one more question/survey. 

Survey/Question 3

Q3 was described as this: if a table has 1 million rows at 2 rows per page (500,000 pages), at what percentage (roughly) of the data would a nonclustered index no longer be used. Blogged here. Here's what the survey said as of today:

And, for Q3 the correct result (Between 10-20% of the rows) is actually NOT the highest answer. And, this is even more convincing that there's confusion around what's going on and why.

The Tipping Point

What is the tipping point?

It's the point where the number of rows returned is "no longer selective enough". SQL Server chooses NOT to use the nonclustered index to look up the corresponding data rows and instead performs a table scan.

When does the tipping point occur?

It depends... it's MOSTLY tied to the number of pages in the table. Generally, around 30% of the number of PAGES in the table is about where the tipping point occurs. However, parallelism, some server settings (processor affinity and I/O affinity), memory and table size - all can have an impact. And, since it can vary - I typically estimate somewhere between 25% and 33% as a rough tipping point (and, you'll see from a bunch of my examples, that number is not EXACT). Then, I translate that into rows.

Math for Tipping Point Query 3: If a table has 500,000 pages then 25% = 125,000 and 33% = 166,000. So, somewhere between 125,000 and 166,000 ROWS the query will tip. Turning that into a percentage 125,000/1million = 12.5% and 166,000/1million = 16.6%. So, if a table has 500,000 pages (and 1 million rows) then queries that return less than 12.5% of the data are likely to USE the nonclustered index to lookup the data and queries over 16.6% of the data are LIKELY to use a table scan. For this table, that percentage seems "reasonable". But, most of us say that the tipping point happens at a much lower percentage... why? Because row size - which determines table size (and therefore pages) is really what has the greatest impact. So, let's look at Tipping Point Query 2... 

Math for Tipping Point Query 2: If a table has 10,000 pages then 25% = 2,500 and 33% = 3,333. So, somewhere between 2,500 and 3,333 ROWS the query will tip. Turning that into a percentage 2,500/1million = .25% and 3,333/1million = .33% (not even 1%). So, if a table has only 10,000 pages (and 1 million rows) then queries that return less than a quarter of 1% of the data are likely to USE the nonclustered index to lookup the data and queries over one third of one percent are LIKELY to use a table scan. For this table, that percentage seems really low BUT, at the same time it makes sense (to a point) that a small table would be scanned... but at less than 1%? Here, even 1% is NOT selective enough. For small tables, it might not matter all that much (they're small, they fit in cache, etc.) but for bigger tables - it might be a big performance problem. 

Math for Tipping Point Query 1: If a table has 50,000 pages then 25% = 12,500 and 33% = 16,666. So, somewhere between 12,500 and 16,666 ROWS the query will tip. Turning that into a percentage 12,500/1million = 1.25% and 16,666/1million = 1.66% (under 2%). So, if a table has 50,000 pages (and 1 million rows) then queries that return less than 1.25% of the data are likely to USE the nonclustered index to lookup the data and queries over 1.66% are LIKELY to use a table scan. Again, this seems like a low number. Again, for small tables, it might not matter all that much (they're small, they fit in cache, etc.) but as tables get larger and larger - it CAN be a big performance problem. 
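
The same math for all three tables can be sketched in one query. It uses the rough 25%-33% of PAGES estimate from above (integer division by 4 and by 3), so the output is just that estimate restated as rows and as a percentage of the table. It is not what the optimizer actually computes.

```sql
-- Rough tipping-point range: 25% to ~33% of the PAGES, expressed as ROWS
SELECT Question, Pages, TotalRows
        , Pages / 4 AS [Low Estimate (rows)]
        , Pages / 3 AS [High Estimate (rows)]
        , cast(100.0 * (Pages / 4) / TotalRows as decimal(6,3)) AS [Low %]
        , cast(100.0 * (Pages / 3) / TotalRows as decimal(6,3)) AS [High %]
FROM (SELECT 'Q1' AS Question, 50000 AS Pages, 1000000 AS TotalRows
      UNION ALL SELECT 'Q2', 10000, 1000000
      UNION ALL SELECT 'Q3', 500000, 1000000) AS t
ORDER BY Question
go
```

Swap in your own page and row counts (for example, from sys.dm_db_index_physical_stats) to get a feel for where a given table is likely to tip.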

Why is the tipping point interesting?

  • It shows that narrow (non-covering) nonclustered indexes have fewer uses than often expected (just because a query has a column in the WHERE clause doesn't mean that SQL Server's going to use that index)
  • It happens at a point that's typically MUCH earlier than expected... and, in fact, sometimes this is a VERY bad thing!
  • Only nonclustered indexes that do not cover a query have a tipping point. Covering indexes don't have this same issue (which further proves why they're so important for performance tuning)
  • You might find larger tables/queries performing table scans when in fact, it might be better to use a nonclustered index. How do you know, how do you test, how do you hint and/or force... and, is that a good thing?

Real example of an interesting tipping point

Earlier today, I went on facebook and twitter and gave the following information - very vaguely - and I asked "why" is Q2 so much slower than Q1 if Q2 returns only 10 more rows. Same table and no hints (other than MAXDOP)...

Q1: SELECT * FROM table WHERE colx < 597420 OPTION (MAXDOP 1)

  • returns 197,419 rows
  • takes 116,031 ms (1 minute, 56 seconds)
  • 1,197,700 logical reads, 5 physical reads, 137,861 read-ahead reads
  • 7,562 ms CPU time

Q2: SELECT * FROM table WHERE colx < 597430 OPTION (MAXDOP 1)

  • returns 197,429 rows
  • takes 244,094 ms (4 minutes, 4 seconds)
  • 801,685 logical reads, 1410 physical reads, 801,678 read-ahead reads
  • 9,188 ms CPU time

There were lots of great guesses... but, it's the tipping point. SQL Server chose to "tip" the second query because it was "over the line". But, it's important to realize that there are cases when that's NOT a good idea. And, what are your options?

In SQL Server 2005 - the only option is to force the nonclustered index to be used:

Q2: SELECT * FROM table WITH (INDEX (NCInd)) WHERE colx < 597430 OPTION (MAXDOP 1)

But, this can be TERRIBLY bad on some machines where the IOs could be a lot faster (and where data might already be in cache). These specific numbers are exactly that - specific to this HARDWARE (and, I chose not-so-optimal HW in this case to highlight this problem). And, depending on what number you use (what if this is a parameter in sps?) you might force SQL Server to do WAY more IOs by forcing the index than allowing the tipping point to do its job. But, depending on your hardware (and/or what you know to be in cache at the time of execution), it might be better to force an index instead of letting SQL Server choose. So, should I force the index? Be careful, if you're wrong - it could take more time and actually be slower.

In SQL Server 2008 - there's a new hint - FORCESEEK:

Q2: SELECT * FROM table WITH (FORCESEEK) WHERE colx < 597430 OPTION (MAXDOP 1)

FORCESEEK is better because it doesn't tie you to a particular index directly but it also doesn't let SQL Server tip to a table scan. However, just like forcing an index - you can be wrong!

So, what should you do? It depends. If you know your data well and you do some extensive testing you might consider using a hint (there are some clever things you can do programmatically in sps, I'll try and dedicate a post to this soon). However, a much better choice (if at all possible) is to consider covering (that's really my main point :). In my queries, covering is unrealistic because my queries want all columns (the evil SELECT *) but, if your queries are narrower AND they are high-priority, you are better off with a covering index (in many cases) over a hint because an index which covers a query, never tips.
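
As a sketch of what covering looks like, here's a made-up example: the #Member table, its columns and the index name are all hypothetical and exist only to show the shape. The nonclustered index is keyed on the search column and INCLUDEs the other columns the query asks for, so the query never has to go back to the base table.

```sql
-- Hypothetical table: wide rows, so a lookup-based plan would tip to a scan early
CREATE TABLE #Member
(
        MemberID int NOT NULL identity PRIMARY KEY CLUSTERED
        , LastName varchar(30) NOT NULL
        , FirstName varchar(30) NOT NULL
        , Phone varchar(20) NULL
        , Filler char(400) NOT NULL default 'junk'
)

-- The key column supports the seek; the INCLUDEd columns make the index cover the query below
CREATE NONCLUSTERED INDEX MemberCoveringIX
ON #Member (LastName)
INCLUDE (FirstName, Phone)

-- Every column referenced is in the index, so no lookups into the table are required
SELECT LastName, FirstName, Phone
FROM #Member
WHERE LastName < 'M'

DROP TABLE #Member
go
```

The trade-off is the usual one: a wider index costs more space and more work on modifications, which is why this is for narrow, high-priority queries and not for SELECT *.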

That's the answer to the puzzle for now but there's definitely a lot more to dive into. The Tipping Point can be a very good thing - and it usually works well. But, if you're finding that you can force an index and get better performance you might want to do some investigating and see if it's this. Then consider how likely a hint is to help and now you know where you can focus.

Thanks for reading,
kt

A couple of weeks ago I wrote a blog post titled Whose job is it anyway? It's an interesting debate and something I've been hearing more and more - that SQL Server is a "set it and forget it" technology - a black box where you just don't need to know how it works to do well with it. In fact, I've even had a few folks comment that they think it would be better to "roll their own" database rather than have to learn how to work in a "general purpose" database. And, while there are certainly lots of different angles to this debate - one fact remains... if you don't know anything about the database on which you're developing (whether it's SQL Server, MySQL, Oracle, whatever), I *PROMISE* you won't have a truly scalable, optimal solution. Why do you think there are so many knobs? It's because there are so many different ways to work with data. There is more than one way to query, more than one way to design. This is also why every answer to a "how should I do this" question starts with "It depends". And, while that seems like a scary response it's actually a good one. It means that you have lots of options - options that can offer many different pros/cons. And, as a result of knowing these pros/cons, you can make better decisions - decisions that will ultimately determine how well you can scale.

So..... while I don't think this debate will EVER be finished (as to WHOSE job it is to know these things), I do think a lot of folks are seeing the effects of not knowing more about their store (and, again, this is NOT limited to SQL Server in any way, shape or form).

At a minimum, hear the discussion on RunAsRadio with Richard, Greg and me and let us know what you think!

Kim Tripp on the Roles of Developers and DBAs with the Database!

Cheers,
kt

I started the series here: http://www.sqlskills.com/BLOGS/KIMBERLY/post/Spring-cleaning-your-indexes-Part-I.aspx and I want to continue with Part II today by clarifying some great questions/comments that have come up on the series. In Part III, I'll give you a few more ways to get rid of (or consolidate) indexes. And, I think there's still a bit more that Paul and I will investigate further (wrt operational stats) but, I want to address a few comments and a few interesting things that both Paul and I have found.

In the Part I post, I talked about using sys.dm_db_index_usage_stats to see if there are any indexes that just aren't being used at all... A few comments asked why I didn't use operational_stats instead. To address that first, there are a few key differences:

  • dm_db_index_operational_stats is persisted only as long as an object is in cache (however, it's not cleared when objects are forced out of cache with DBCC DROPCLEANBUFFERS). If you want to clear ALL DMVs for a specific database, then a relatively easy way to do this (IN TESTING) is to take the database offline and then immediately bring it online again.
    • ALTER DATABASE <dbname> SET OFFLINE
    • ALTER DATABASE <dbname> SET ONLINE
      • NOTE: If there are any suspect files, you will NOT be able to bring your database back online without FIRST taking all suspect files OFFLINE. And, if you take a FILE offline then it's even more important to know that THERE IS NO WAY TO BRING A FILE ONLINE without restoring it from backups. So, it's VERY important to understand that OFFLINE/ONLINE for a database is really easy IF AND ONLY IF there are no other problems with the DB. You really need to resolve those problems first (or at least know that you're going to need to resolve those problems later through backup/restore) before you take a database offline.
  • dm_db_index_operational_stats is (from BOL) neither persistent nor transactionally consistent. This means you cannot use these counters to determine whether an index has been used or not, or when the index was last used. For information about this, see sys.dm_db_index_usage_stats.

Having said that though, none of these are really any guarantee of perfect information. And, they're not meant to be. I look at these DMVs as being a quick and easy way to get some relatively decent insight into what is or is not happening in my environment. However, even though sys.dm_db_index_operational_stats might give you insight that you have a problem, it still doesn't give you good insight into exactly what that problem might be. For example, it *might* be splits that cause some of your wait times to increase (column: page_io_latch_wait_in_ms) but, it could be something else too (some other system issue).

The main point, you can use these to get insight into which tables have the biggest problems (i.e. the biggest waits) and where they might have a lot of splits (column:nonleaf_allocation_count) but, in all honesty, that's not a guarantee. In fact, the reason I said "might" is that pages that are allocated at the end of the leaf level STILL allocate a page and require an entry to be made in the next level up in the index. So, a lot of nonleaf_allocations COULD be for a perfectly unfragmented index. So, it still doesn't tell you how fragmented the objects are or what the REAL problem is (or even if it is a problem yet).

Basically, these just tell you where you have the most activity and give you a starting point for problem solving. But, none of these (usage or operational) really tell you how to solve the problem. However, sometimes even knowing where to start IS the problem in and of itself. So, I'm not against these DMVs and I really do think you can get some good insight from them. Just use them as a tool to help focus your investigations. Use better tools like sys.dm_db_index_physical_stats to really see if you have fragmentation and where it's the worst.
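If you want to see the columns mentioned above side by side, here's a minimal sketch (not a definitive diagnostic). It assumes you run it in the database you care about; the ordering by latch wait time is just my choice of a starting point, and the numbers only cover the period since the metadata was last cached.

```sql
-- Sketch: where is the most activity in the CURRENT database?
-- A starting point for investigation only, not proof of fragmentation.
DECLARE @db_id int;
SET @db_id = DB_ID();

SELECT OBJECT_NAME(os.object_id)      AS ObjectName
     , i.name                         AS IndexName
     , os.index_id
     , os.partition_number
     , os.leaf_allocation_count       -- page allocations at the leaf level
     , os.nonleaf_allocation_count    -- page allocations above the leaf level
     , os.page_io_latch_wait_count
     , os.page_io_latch_wait_in_ms
FROM sys.dm_db_index_operational_stats(@db_id, NULL, NULL, NULL) AS os
JOIN sys.indexes AS i
    ON i.object_id = os.object_id
   AND i.index_id  = os.index_id
WHERE OBJECTPROPERTY(os.object_id, 'IsUserTable') = 1
ORDER BY os.page_io_latch_wait_in_ms DESC;
```

Anything that floats to the top here is a candidate for a closer look with sys.dm_db_index_physical_stats, not a verdict.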

hth,
kt

First and foremost, happy spring! I truly hope we're on the path to summer (although who would know it here - we're in Florida for SQL Connections and the weather is a bit chilly and it's been raining off/on today - I hope this is short term (no, I don't want to look at the forecast as I don't want to jinx it :)). But, wherever you are - I hope you're on your way to nicer weather and minimal cold weather (ok, I guess I only have warm wishes for the northern hem... for you southies - I hope your fall is lovely!!).

But, for everyone - now's a good time to start thinking about cleaning out the [non-literal index] closet... and getting rid of some of those dusty indexes?

Why/when should you get rid of some of your indexes?

  1. It's possible that some of them aren't being used at all...
  2. Especially when they're not being used but even when they're "redundant" (or minimally useful) they're still costing you in many ways:
    1. Wasting space on disk
    2. Wasting space in memory (well, if they are being used then they're cluttering up your cache)
    3. Wasting space/time in your maintenance routines (so, here they're cluttering up your cache for sure!)
    4. Wasting space in your backups
  3. You might be able to reduce your overall indexes with index consolidation...

So, for this post, I'm going to target #1 - are there any indexes that just aren't being used at all...

First, how do you know if your indexes are being used?
In SQL Server 2005 and higher, there's a DMV (dynamic management view) called sys.dm_db_index_usage_stats and it's there to track index usage patterns. However, it's not persisted since the beginning of time and as a result, if you look at this and believe that it's telling you ALL of the indexes that have been used in your database - then you might be mistaken. The index usage stats DMV is cleared when SQL Server is restarted as well as when you detach/attach the database or when you backup/restore the database.

Therefore, you don't want to just run the following query and drop all of the indexes that aren't being used. A better way to "trust" this information is to periodically persist the data from the DMV in your own table and then query it after you've completed a business cycle's worth of activity - logging all of the usage stats. Then you can trust this much more. Again, here are a couple of negatives:
1) it's not persisted
2) it only keeps the database_id and the index_id (which could change over time). You're right in thinking it probably shouldn't change but, a nonclustered index's ID is not permanent so, it's better to track the index name in addition to the index_id. And, when you run your queries to determine what to delete, you can easily verify indexes against the current indexes because your comparison is within the same db (more on this below).

If you want to persist this, then you have two ways to do this:

Store the index usage patterns in a table within the specific database you're tracking:

Pro: it goes with the database when you back it up, etc. and, it's easier to reverse engineer which actual indexes you're referring to (grabbing the names and not just the IDs).

Con: it's a bit more complex of a query to run and you'll need to run it for all of your databases (ok, it's really not all that bad - but, using something like sp_MSforeachdb will really help)

Store the index usage patterns from all databases in a table within master or your own "performance database":

Pro: you only need one job to handle all the index usage info AND object_name *does* support TWO parameters (object_id and database_id) so, as long as you trap the name at the time of insert then you'll be good.

Con: it doesn't go with the database (e.g. backup/restore - and if you're restoring to a test system and you want to see what the usage patterns were then you'll need to get this information as well...)

Here's a simple query that you can run that shows all the indexes used right now - and adds the databasename/objectname into the results - in a persisted table you'll also want to add the runtime:

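-- Snapshot of the current index usage stats, with names resolved
SELECT GETDATE() AS RunTime
, DB_NAME(i.database_id) AS DatabaseName
, OBJECT_NAME(i.object_id, i.database_id) AS ObjectName -- two-parameter form
, i.*
FROM sys.dm_db_index_usage_stats AS i
WHERE i.object_id > 100; -- skip the low-numbered system objects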
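To make the "persist it in the database you're tracking" option concrete, here's a rough sketch of what a scheduled job could run. The table name dbo.IndexUsageHistory and its column list are my own assumptions for illustration; adjust to whatever you actually want to keep.

```sql
-- Sketch: persist usage stats (with index NAMES) inside the database being tracked.
-- dbo.IndexUsageHistory is a hypothetical table name, not something that ships with SQL Server.
IF OBJECT_ID(N'dbo.IndexUsageHistory', N'U') IS NULL
BEGIN
    CREATE TABLE dbo.IndexUsageHistory
    (
        RunTime       datetime NOT NULL,
        DatabaseName  sysname  NOT NULL,
        ObjectName    sysname  NOT NULL,
        IndexName     sysname  NULL,      -- NULL for a heap (index_id = 0)
        index_id      int      NOT NULL,
        user_seeks    bigint   NOT NULL,
        user_scans    bigint   NOT NULL,
        user_lookups  bigint   NOT NULL,
        user_updates  bigint   NOT NULL
    );
END;
GO

-- Run this part on a schedule (e.g. from a SQL Agent job).
INSERT dbo.IndexUsageHistory
    (RunTime, DatabaseName, ObjectName, IndexName, index_id,
     user_seeks, user_scans, user_lookups, user_updates)
SELECT GETDATE()
     , DB_NAME(us.database_id)
     , OBJECT_NAME(us.object_id)
     , i.name                  -- capture the name, since the index_id can change
     , us.index_id
     , us.user_seeks
     , us.user_scans
     , us.user_lookups
     , us.user_updates
FROM sys.dm_db_index_usage_stats AS us
JOIN sys.indexes AS i
    ON i.object_id = us.object_id
   AND i.index_id  = us.index_id
WHERE us.database_id = DB_ID()
  AND OBJECTPROPERTY(us.object_id, 'IsUserTable') = 1;
```

Because the counters are cumulative since the last clear, the rows from successive runs are snapshots; compare the latest snapshot per index (or diff consecutive ones) after a full business cycle before deciding anything.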

And, if you want a few more insights into how to persist this, see Paul's blog here: http://www.sqlskills.com/BLOGS/PAUL/post/Indexes-From-Every-Angle-How-can-you-tell-if-an-index-is-being-used.aspx.

OK, so, you have a few options to think about and I have a few more parts to post!
kt

PS - The Tipping Point is coming soon too. I'm still adding a few things to that one!!! ;-)

This is a tough topic. It's a big topic and more than any other - I think there are a lot of misunderstandings about what the log is for, why it's so critical and ESPECIALLY when/why it gets extremely large. Simply put, it gets large when it's not managed correctly. OK, there are times when it can become large - even if it is well managed. But, more often than not, when a transaction log is wildly out of control (orders of magnitude larger than the data itself) it indicates a management/maintenance problem.

There are a lot of places where you can go to find out the technical details behind the transaction log but I'm going to target this blog post to the relatively straightforward easy (no, really easy!!) facts about transaction log maintenance.

What kind of transaction log management is right for YOUR database?

First and foremost, you MUST decide whether or not you need to do log backups. SQL Server *requires* you to make some form of decision. Well, I take that back. They don't tell you anywhere that you need to make this decision but the transaction log can get wildly out of control if you don't (see the next section for more details on this one :)).

Why? Transaction log backups will allow you better recovery options in the event of a disaster. If you create a good backup strategy, you should be able to recover from a disaster very close (possibly even up-to-the-minute) to the time of the disaster. However, you are not required to do log backups. Instead you can do only database-level backups and recover with those. That's fine. There's really nothing wrong with that strategy. However, it does mean that you have a greater potential for data loss. Basically, if you decide that you're going to do weekly full backups - then you need to be OK with losing everything that's happened since your last full backup. If that's OK, then performing full database backups (and never worrying about the log) is absolutely fine.

However, if you want more granular control and more recovery options (again, possibly even up-to-the-minute recovery - which is transactional recovery up to the time of the disaster), then you MUST add transaction log backups into your disaster recovery strategy.

So, make this decision FIRST:

  1. Am I OK with some data loss? (then you're probably OK with just database-level backups... but, you will need to do something else! be sure to keep reading!!!)
  2. Do I want to minimize data loss to the smallest amount possible? (then you're going to want to AUTOMATE transaction log backups)

But I didn't do anything - why is the log WAY out of control (in terms of size)?

OK, even if you consciously make the decision to ONLY do database-level backups, you are NOT DONE!!! In fact, this is actually what led me to do this post. I found these two (relatively dated but interesting nonetheless) MSDN forum discussions for TFS (Team Foundation Server) databases:

    MSDN Forum discussion "Recommended SQL Maintenance Plan": http://social.msdn.microsoft.com/forums/en-US/tfsadmin/thread/b23f7018-3eaa-4596-96e4-728b02cf6211/
    MSDN Forum discussion "Huge log files": http://social.msdn.microsoft.com/forums/en-US/tfsadmin/thread/605d51f7-23fd-470c-945e-53fa7ed5aa87/

And, I know EXACTLY what happened in ALL of these cases (and MANY more... Paul and I see this ALL the time, in fact). In the "Huge log files" thread, there's a database mentioned (TfsWareHouse) with a 124MB mdf and a transaction log of 61.8GB. It didn't mention whether or not there were other data files but my guess is that there weren't. My guess is that they were completely shocked that the log had grown to a size that's roughly 510 TIMES the size of the data (61.8GB is about 63,283MB and 63,283 / 124 is about 510)... The reason is actually somewhat simple (no pun intended). If you're not going to do transaction log maintenance (meaning transaction log backups), then you need to tell SQL Server that. (This is the part that's completely unexpected.)

When a database is created, SQL Server runs that database in a "pseudo simple recovery model". (Yes, I know - that didn't help.) What that means is that SQL Server automatically clears inactive records from the transaction log once it knows that it no longer needs them. It no longer needs them to be stored in the log because no one is using the log (i.e. you're not doing ANY backups). However, once you do start to do backups (and, people generally start by doing a full database backup), then SQL Server looks to your recovery model to determine what to do with log records. If the recovery model is set to full (and, yes, this is the default), then SQL Server gives you the "full feature set" with regard to backup/restore. SQL Server is expecting YOU to manage the transaction log by backing it up. Once it's backed up, SQL Server can remove the inactive records from the transaction log (and when you do a transaction log backup, it automatically clears the inactive records by default).

So, there are really two choices - and ONLY two choices here:

  1. Perform transaction log backups as part of your maintenance plan
  2. Change the recovery model to the SIMPLE recovery model so that SQL Server clears inactive transactions from the log automatically
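Here's a minimal sketch of what those two choices look like in T-SQL. The database name YourDatabase and the backup path are placeholders (my assumptions, not names from the forum threads), and you'd run only ONE of the two choices at the bottom.

```sql
-- 'YourDatabase' and the X:\Backups path are placeholders; substitute your own.

-- What recovery model is the database in, and what is holding up log reuse?
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'YourDatabase';

-- How large (and how full) is each transaction log right now?
DBCC SQLPERF(LOGSPACE);

-- Choice 1: stay in FULL and back up the log (automate this on a schedule).
BACKUP LOG [YourDatabase]
TO DISK = N'X:\Backups\YourDatabase_log.trn';

-- Choice 2: tell SQL Server you are NOT doing log backups.
ALTER DATABASE [YourDatabase] SET RECOVERY SIMPLE;
```

If log_reuse_wait_desc comes back as LOG_BACKUP, that's the classic sign of a database in the full recovery model where nobody is backing up the log.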

Is there anything else to do for the transaction log? 

Yes! If you decide that you want to do transaction log backups then I would recommend a few things. I'd first recommend reading 8 Steps to Better Transaction Log Throughput and when you decide how large your transaction log needs to be, then also read Transaction Log VLFs - too many or too few?. These two posts will help you to create a more appropriately sized log as well as one that won't be prone to performance problems (such as internal VLF fragmentation).

If you want to learn more about the transaction log, I'd suggest a few of Paul's resources (it's probably because he has such a fantastic tech editor... oh, I'm asking for trouble with this comment!! ;-):

  1. Read Paul’s blog post to his TechNet article on Logging & Recovery. It’s a great article that covers a lot of different aspects of logging. He also did a great short video on why the transaction log grows wildly out of control. Here’s a link to the blog post that pulls all of the TechNet resources together: http://www.sqlskills.com/BLOGS/PAUL/post/TechNet-Magazine-feature-article-on-understanding-logging-and-recovery.aspx.
  2. Read Paul’s blog post to his TechNet article on Database Maintenance. It’s a great overview of all of these maintenance tasks and will give you a good overview of what each one does. Here’s a link to the blog post that pulls all of the TechNet resources together: http://www.sqlskills.com/BLOGS/PAUL/post/TechNet-Magazine-Effective-Database-Maintenance-article-and-August-SQL-QA-column.aspx

OK, so, I think that sums up part III. I think that's the last one in the series for now. I'll go through and explain "The Tipping Point" next. However, I was hoping for more results to my brain teasers (in those two posts)!!

Cheers,
kt

OK, it seems as though there's A LOT of confusion about what steps are required for proper database maintenance. And, it seems as though some recommendations are being given as "quick fixes" without any real recommendation for root cause analysis. I'm not saying that the generalizations are horribly wrong but in many cases they're just too broad and/or unspecific to actually be useful (and, well, in all honesty, some of them are just really bad recommendations because they’re so ambiguous). And, in my random internet trolling for the day, I found 4 different references that I want to go through (which is why this is only Part I). For this post, I’ll focus just on Sharepoint.

First, what did I see that’s motivating this post?
I found the following KB article – which was referenced by numerous sites as recommended reading. And, without knowing a lot about SQL (and, that’s NOT meant as a dig at all – it’s just that most apps that sit on SQL don’t ever even recommend that you need to know SQL and I can argue certain aspects of that point as well BUT, wrt maintenance, it can really become a problem if you don't know a few things about these tasks), it does seem like good reading: Information about the Maintenance Plan Wizard in SQL Server 2005 and about tasks that administrators can perform against SharePoint databases

Here is the part that over-simplifies picking what maintenance tasks to run vs. what not to run:

DIRECTLY TAKEN FROM THIS KB ARTICLE IS THE FOLLOWING:

We have tested these tasks and the effects that these tasks have on database schema and performance. The following table summarizes the results of these tests.

Task                   Safe to perform this task?
Check database         Yes
Reduce a database      Yes
Reorganize an index    Yes
Clean up the history   Yes
Update statistics      Yes. However, this task is unnecessary because the SharePoint Timer service performs this task automatically.
Rebuild an index       No. The task does not restore existing index options before the rebuild operation. However, you can use scripts that restore index options.
                       Note: This problem was corrected in SQL Server 2005 Service Pack 2.

We used the following criteria to determine whether a task was safe to perform:

  • Whether the task modified the database schema from its default state
  • Whether the task decreased performance

Results may vary depending on the environment.

However, if you use the Maintenance Plan Wizard to perform the tasks that are listed in the table as "safe to perform," you are likely to experience increased performance in SQL Server 2005.

The big problem is: this is just too little information about too many VERY important tasks!

Let me break this down task by task and give you a few other places to go for more information.

Check database

The check database task refers to DBCC CHECKDB. This is definitely an important part of any maintenance plan. And, it really is a safe task to run as it’s NOT corrective by default. However, there is nothing mentioned about how this command may completely flush your buffer pool as it reads all of the pages of all of the objects it’s checking. So, this might impact performance but, of all of the tasks, this is the safest to run and it’s definitely a recommended task.
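For reference, this is roughly what that task boils down to if you run it yourself. It's a sketch only; YourDatabase is a placeholder name, and the two WITH options are my preference for keeping the output readable rather than anything the KB article prescribes.

```sql
-- Sketch: a non-corrective consistency check ('YourDatabase' is a placeholder).
-- No repair option is specified, so nothing is changed in the database.
DBCC CHECKDB (N'YourDatabase')
WITH NO_INFOMSGS      -- suppress informational messages
   , ALL_ERRORMSGS;   -- but report every error found, not just the first few per object
```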

If you want to learn more about DBCC CHECKDB, check out these things: 

  1. Read Paul’s blog post to his TechNet article on Database Maintenance. It’s a great overview of all of these maintenance tasks and will give you a good overview of what each one does. Here’s a link to the blog post that pulls all of the TechNet resources together: http://www.sqlskills.com/BLOGS/PAUL/post/TechNet-Magazine-Effective-Database-Maintenance-article-and-August-SQL-QA-column.aspx  
  2. Read Paul’s blog post on Myths around causing corruption – so that you can get better insight into where/why the actual corruptions are occurring. 
  3. Finally, if you’re really interested in the internals of CHECKDB and how it works – Paul’s written a ton about it in his CHECKDB from Every Angle category. FYI, 3 of his 9 years on the SQL Server Development Team were spent writing CHECKDB and repair for SQL Server 2005 (so, he definitely knows how it works :)). Here’s the link to the category: http://www.sqlskills.com/BLOGS/PAUL/category/CHECKDB-From-Every-Angle.aspx

Reduce a database 

OK, I’m sure I’ll get a lot of responses to this one but IMO, a database maintenance plan SHOULD NEVER INCLUDE A SHRINK.

Let me explain… :)
To be honest, I'm not even a fan of manually running database-level shrinks (DBCC SHRINKDATABASE) either. Don't get me wrong - there are ACCEPTABLE times to shrink parts of a database but, in general, I'd recommend only using DBCC SHRINKFILE for individual file-level shrinks. I wouldn't schedule shrinks nor would I EVER turn on [the database option] autoshrink. I don't think shrinks should EVER be automated - either through the database option OR through maintenance plans.
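If you want to check whether autoshrink has crept in anywhere, something like the following sketch will tell you. The database and file names are placeholders I made up, and the SHRINKFILE line is commented out on purpose; it's there only to show the shape of a one-off, file-level shrink.

```sql
-- Sketch: find databases that have the autoshrink option turned on.
SELECT name, is_auto_shrink_on
FROM sys.databases
WHERE is_auto_shrink_on = 1;

-- Turn it off ('YourDatabase' is a placeholder name).
ALTER DATABASE [YourDatabase] SET AUTO_SHRINK OFF;

-- A deliberate, ONE-OFF file-level shrink (never scheduled) would look like this.
-- 'YourDatabase_log' is a hypothetical logical file name; 1024 is the target size in MB.
-- DBCC SHRINKFILE (N'YourDatabase_log', 1024);
```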

If you need to do regular shrinks - then it's likely that you have some other problem. And, without DIRECTLY addressing this problem, you *might* be making things worse.

This is a bigger discussion and I’ve found a few other references that I want to pull together. I’ll post another post about this within the next day or so – and link to it from here BUT, for right now…Know this – free space is generally GOOD. Excessive free space has happened FOR A REASON. Maybe there’s a pattern to it but often shrinking is worse than just leaving the free space for the next data explosion (a bunch of data comes in, the database grows, the data is archived, the free space remains for the next set of data that comes in).

If you shrink the database you might make things worse by fragmenting everything. Paul’s video that goes with the TechNet article on Database Maintenance shows you the [shocking if you didn’t know this] effect of shrinking a database on indexes.

Reorganize an index, Update statistics and Rebuild an index

These need to be grouped together to start because this KB article does NOT address the impact of running these together. In fact, the problem, if you run these together – is that you MIGHT make things worse. First, let me give you an overview of each:

Reorganizing an index removes fragmentation in the largest part of an index (it’s called the leaf level of the index) and removing fragmentation in this level has the greatest (and positive) effect on range query scans and cache. So, this is really the most important type of fragmentation to remove. However, this is NOT the only way to do it…

Rebuilding an index completely and totally removes ALL forms of fragmentation in all levels of an index; however, this is the most expensive (yet most effective) way to do it. As a result of rebuilding an index, SQL Server also updates the statistics for the indexes that were rebuilt. Therefore you do not need to update statistics OR reorganize an index if it gets rebuilt.

Updating statistics is important for query processing and optimization. The query processor uses statistics on your data to help determine how many rows will be processed by your query/statement. If SQL Server can accurately estimate the rows, then it can choose a more effective plan. However, if it doesn’t have good statistics, then it may not do as good of a job at accurately estimating rows and therefore it might not come up with as optimal of a plan. So, this is an integral part to good database health. However, some of this might be done via the database option: auto update statistics which is ON by default in SQL Server (and, YES, you should leave this on). Check out this post on: Auto update statistics and auto create statistics - should you leave them on and/or turn them on??

However, if you use a maintenance plan then I really see two problems: 

  1. You’ll end up doing maintenance on things that may not need it. The default behavior for these tasks is just to run them on the selected objects. And, since many people will choose all objects (possibly even of all databases) then you’ll probably select objects that won’t really need this as frequently as you run this maintenance plan. 
  2. You might end up running a combination of things that wastes cycles/CPU and a MASSIVE amount of log space (which can translate into all sorts of concerns for DR technologies like database mirroring which will need to send all log rows to the secondary server). For example, if you run ALL three of these things then they’ll have to be run in a certain order (you can change this in a maintenance plan). However, the default order is: Reorganize Index(es), Rebuild Index(es), Update Statistics. This means that the work that’s done by reorganizing is effectively wasted as the rebuild would have taken care of it AND the work that’s done for updating statistics could mean that they update statistics TWICE (during the rebuild AND after) and the end result is WORSE because the update statistics command might use a sampling mechanism to generate statistics (which can lead to LESS EFFECTIVE statistics information). However, this is ONLY if you change the wizard’s default. The default is for the updating statistics command to do a “full scan”. So, even if the statistics will end up being the same – it’s still problematic because it means that for all indexes you’ve just rebuilt – you’ve now updated their statistics TWICE.
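To make the overlap concrete, here's a small sketch of doing this by hand for one index. The table dbo.ExampleTable and the index IX_ExampleTable_Name are hypothetical names; the point is just which statements are redundant with which.

```sql
-- dbo.ExampleTable and IX_ExampleTable_Name are hypothetical names for illustration.

-- EITHER rebuild: removes fragmentation at all levels AND updates this index's statistics...
ALTER INDEX IX_ExampleTable_Name ON dbo.ExampleTable REBUILD;

-- ...so afterwards there's no reason to reorganize it or to update its index statistics again.
-- If you still want fresh stats, limit the work to the column-level statistics,
-- which the rebuild did not touch:
UPDATE STATISTICS dbo.ExampleTable WITH FULLSCAN, COLUMNS;

-- OR reorganize: lighter weight, leaf level only, and it does NOT update statistics,
-- so in this case an explicit statistics update on the index is not wasted work:
-- ALTER INDEX IX_ExampleTable_Name ON dbo.ExampleTable REORGANIZE;
-- UPDATE STATISTICS dbo.ExampleTable IX_ExampleTable_Name;
```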

SUMMARY

A database maintenance plan is CRITICAL for best performance (especially for databases that are prone to some of the problems corrected by these maintenance tasks (yes, you can read SHAREPOINT into that statement)). SharePoint uses GUIDs as PRIMARY KEYs (read this post to hear more about the side effects of this choice) and as a result, as clustering keys. This means that many SharePoint tables are prone to [potentially a MASSIVE amount of] fragmentation.

You absolutely need to have a maintenance plan. But, what should it be?

My absolute preference is to NOT use the Database Maintenance Plan Wizard UNLESS you really know what you’re doing. It just doesn’t give enough prescriptive advice. And, if you just select the defaults, you will end up with a suboptimal maintenance plan.

A better approach would be to create your own maintenance plan. If you write the code yourself (or leverage one of the custom ones that are already out on the web) then you can strategically target ONLY the objects that have the warning signs and/or are out of date and you can set when to rebuild vs. when to reorganize (generally people rebuild if a table has more than 30% fragmentation and they reorganize when it's less than 30%). Fragmentation is something that can be detected programmatically using the DMV: dm_db_index_physical_stats (in SQL 2005/2008) or by using DBCC SHOWCONTIG (in SQL 7.0/2000). Here are a few places to go to see the more flexible and programmatic way of rebuilding/reorganizing indexes:

Smart Indexing Part II - Conditional Rebuilding a blog post (with conditional index rebuild code) from SQLMCA Bob Duffy (a good friend who is located in Dublin, Ireland and whose wife (Carmel) just had a baby last week – congrats again Bob!! You guys are seriously outnumbered now!!!) here: http://blogs.msdn.com/boduff/archive/2007/06/08/smart-indexing-part-ii-conditional-rebuilding.aspx

Custom Index Defrag / Rebuild Procedures - a blog post with some posted code as well. http://www.sqlstuff.info/post/2008/03/Custom-Index-Defrag--Rebuild-Procedures.aspx

Rebuild and Reorganize Indexes in SQL 2005 – an article (with conditional index rebuild code) from SQL Server Central here: http://www.sqlservercentral.com/scripts/31857/  (NOTE: You will need to become a subscriber to get to this article.)

Rebuild Only the Indexes that Need Help - an article by Andrew Kelly (SQL MVP) on SQL Server Magazine here: http://www.sqlmag.com/articles/index.cfm?articleid=99019&pg=1 (NOTE: You will need to become a subscriber to get to the full text of the article.)

Or, build your own! Check out the BOL topic for the sys.dm_db_index_physical_stats for SQL 2005 here: http://msdn.microsoft.com/en-us/library/ms188917(SQL.90).aspx, Example D has sample code to help you get started! For SQL 2008 it’s here: http://msdn.microsoft.com/en-us/library/ms188917.aspx. It’s still Example D for the sample code to leverage. :)
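As a starting point for the "build your own" route, here's a minimal sketch that only REPORTS what it would do; it doesn't rebuild or reorganize anything. The 30% split is the rule of thumb mentioned above. The 5% floor and the 1,000-page minimum are my own assumptions to keep tiny or barely-fragmented indexes out of the list, so tune them for your environment.

```sql
-- Sketch: list fragmented indexes in the CURRENT database with a suggested action.
-- Thresholds: 30% comes from the rule of thumb above; the 5% floor and the
-- 1,000-page minimum are assumptions, not hard rules.
DECLARE @db_id int;
SET @db_id = DB_ID();

SELECT s.name AS SchemaName
     , o.name AS TableName
     , i.name AS IndexName
     , ps.partition_number
     , ps.page_count
     , ps.avg_fragmentation_in_percent
     , CASE WHEN ps.avg_fragmentation_in_percent > 30
            THEN 'REBUILD'
            ELSE 'REORGANIZE'
       END AS SuggestedAction
FROM sys.dm_db_index_physical_stats(@db_id, NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id  = ps.index_id
JOIN sys.objects AS o
    ON o.object_id = ps.object_id
JOIN sys.schemas AS s
    ON s.schema_id = o.schema_id
WHERE ps.index_id > 0                            -- skip heaps
  AND ps.alloc_unit_type_desc = 'IN_ROW_DATA'
  AND ps.page_count >= 1000                      -- assumption: ignore small indexes
  AND ps.avg_fragmentation_in_percent >= 5       -- assumption: ignore trivial fragmentation
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

Even the LIMITED mode has to read a fair number of pages on a really large table, so run this off-hours the first time until you know what it costs you.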

The most important thing I can tell you is that a SMALL amount of time getting familiar with what’s really happening in SQL as well as WHY it’s happening to you WILL BE A LOT MORE PRODUCTIVE than just slamming in a maintenance plan that solves some problems but probably creates others.

Hope this helps!
kt

OK, so this is interesting. I've got a few answers to my last survey (Tipping Point Query #1) and well, there's a good mix of answers (and, yes, some are correct! ;)). Be sure to go back and review that last post so that you can evaluate it and these two tipping point questions completely. So, now I want to see if people really know the basis of "the tipping point".

Try these two:

Tipping Point Query #2

Table1 (t1) has 1 million rows at 100 rows per page. The table has 10,000 pages. A nonclustered index exists (on name) but it does not cover the query. At what percentage (of the table) is this nonclustered index no longer selective enough to use:

Tipping Point Query #3

Table2 (t2) has 1 million rows at 2 rows per page. The table has 500,000 pages. A nonclustered index exists (on name) but it does not cover the query. At what percentage (of the table) is this nonclustered index no longer selective enough to use:

OK, so I'd really love to see quite a few responses to these *3* "tipping point" questions. I PROMISE to do a nice long (and detailed) post for what is the actual tipping point AND the answers to all three of these questions. I'll explain the math as well as how you can generalize "what is selective enough" so that you can better create your nonclustered indexes!!!

Thanks for reading - and responding to these brain teasers!!

Cheers,
kt

PS - It's snowing here (ah...again)... maybe I'll spend the day creating brain teasers??! Do you guys like this kind of a post? (well, I suppose you won't really know until I post the answer part of it... but, just in general??). I think it's pretty cool. But, don't worry, I won't (nor will Paul) make all of my posts surveys. But, I think this is a really good one. I'm anxious to see if the answers come in correctly for these two as well! Have at it!

Along the same lines of improving database design and getting better performance on SQL Server (which [IMO] DOES take an experienced SQL Server database developer - but, we'll talk more about "whose job this really is" in many more posts and probably even a RunAs - which Richard and I just setup to record on Thursday (Mar 12)), I started thinking about how I could convince people of why they NEED a database developer. So, I thought I'd ask this VERY important question...

What percentage of data IS selective enough to use a nonclustered index which doesn't cover the query... in other words (just in case you're not entirely sure of what I mean :)), think of indexes in the back of a book... if you need to go to the back of the book to reference a bunch of data (this is called a [bookmark] lookup in SQL Server), there's a point where the randomness of the lookups (especially if you think in terms of many rows on a page) becomes too expensive. For example, imagine that the index is customer name and the data (the book) is customer orders - and, each page (of this rather weird book ;)), has 20 orders on it. Doing a query to lookup customer number 12's orders might be really easy (if they have only a few orders) BUT, what if the query is "show me all of the orders for people that have an 'e' in their name". First, the number of people who have an 'e' in their name is probably better than 50% (that's TOTALLY a guess) and, if there's 20 orders per page then a lookup from the index into the book would require SQL Server to touch every page roughly 10 times. If the table has 50,000 pages (therefore 1 million rows - at 20 rows per page), then to find the 500,000 rows (remember, I'm estimating half), SQL Server would have to do 500,000 bookmark lookups. For a table with only 50,000 pages that's terribly expensive.
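Just to spell out the arithmetic in that example (and ONLY that example; this isn't the tipping point math, which is still to come), here's a tiny sketch you can run anywhere. The variable names are mine and the 50% is the same total guess as above.

```sql
-- Sketch: the arithmetic from the 'e' in the name example above.
DECLARE @rows int, @rows_per_page int, @fraction_matching decimal(5,2);
SET @rows = 1000000;            -- 1 million order rows
SET @rows_per_page = 20;        -- 20 orders per page
SET @fraction_matching = 0.50;  -- the "TOTALLY a guess" 50%

SELECT @rows / @rows_per_page                                 AS table_pages       -- 50,000 pages
     , @rows * @fraction_matching                             AS bookmark_lookups  -- 500,000 lookups
     , (@rows * @fraction_matching) / (@rows / @rows_per_page) AS lookups_per_page; -- roughly 10 per page
```

Compare the second column to the first: 500,000 random lookups against a table that a scan could read in 50,000 page reads.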

So, here's the question - what's the tipping point? When is a nonclustered index on customer name NOT going to be used to lookup rows of sales orders? I'm going to use a survey to see what you think and then within a week, I'll give the specific SQL Server math AND a query you can run within your own DBs to see EVERY one of your table's "tipping points". It's really interesting and I think will really help you to understand why SQL Server might not be using those nonclustered indexes.............

Cheers,
kt

It's an exciting year for us for DevConnections! SQL Server 2008 has now been out for a few months and an SP is coming up soon. This is the sign that some customers wait for to migrate over to the new release feeling that an SP indicates a higher level of stability. But, this is also a time when some companies are shying away from upgrades because of the immediate and very quantifiable costs. And so Paul and I really struggled with what to focus on when we put together our Connections line-up.

What we decided to do is focus on your getting the most from the system that you have now - with best practices that apply to SQL Server 2000, SQL Server 2005 and SQL Server 2008. Personally, Paul and I are going to demo and focus on 2005/2008 but the concepts work on all 3 versions (and even most of the syntax as well - but, for index fragmentation analysis and maintenance the commands changed between 2000 and 2005/2008 so that's one minor difference). For the conference itself, we're focusing on upgrade, new features in 2008 and things to be aware of architecturally in all areas of Administration/Ops, Development and Business Intelligence. And, given that this isn't a "new product year" for SQL, other big conferences are likely to have fewer SQL sessions than usual (and most do...seriously).

So, with SQLConnections you get 46 *SQL* sessions and 3 full-day *SQL* workshops (and workshops on other technologies as well - all of which are spread over 2 pre-con days and 1 post-con day - with none of the SQL ones running concurrently so you could attend all 3). Top it all off with a more intimate event than many others, which means more interaction to get your tough questions answered! In fact, to help make sure we see as many people as possible, Paul and I usually schedule our sessions before and after lunch so that we can spend the entire lunch gap inside our session room answering even more questions! We even have a session called "Follow the Rabbit" where YOU drive the session with your questions. It's great fun and we've been doing this for the past few years with a lot of success!

If you want to see a bit of the personality and flair offered at Connections - check out MyConnections - it's our conference magazine (note: it's 9.80 MB to download but, it's 84 pages). It's something you get automatically after attending and it's filled with technical articles and all sorts of additional information that comes from Connections. And, here's a link to a fun and fast-paced video with highlights of the conference itself (nothing technical - just fun shots of the event). For example, did you know that EVERY year a Harley is given away at Connections? Here's the quick video: mms://bcast.sswug.org/sswugtv/DevConnectionsFall08.wmv.

Finally, did you know that EVERY attendee gets a FREE SQL Server 2008 Standard Edition license with one CAL? That can cover your attendance right there and get you started on development and learning with SQL Server 2008.

We really have a great time at Connections and we hope to see you there!! (And, Florida in March is a nice destination from the winter weather for many of us as well. :)

We hope to see you there!!
Kimberly and Paul

Something I learned while the SQL Server 2008 Internals book was in tech edit (thanks to our *awesome* tech editor Ben Nevarez - who, unfortunately, does not have a blog or anything...yet! (well, I'm hopeful)), was that you can use a FOREIGN KEY constraint to reference a UNIQUE index - one without a PRIMARY KEY or UNIQUE key constraint. At first glance this might seem like something relatively insignificant but in terms of reducing indexes and/or consolidating indexes it offers something that constraints do not. When you create a UNIQUE index you can use INCLUDE to reference (and include) non-key columns in the leaf level of an index. This offers more choices for covering and if you want to cover a query using INCLUDE but also have a UNIQUE column(s) as the key - you can do that with a regular index but not with a constraint-based index. So, that got me thinking even more - can I reference a UNIQUE index with INCLUDE and even a filter - from a FOREIGN KEY? My guess was that it probably wouldn't work because it would be too costly to have to verify it on every referencing row BUT, I did have hopes that a filter of IS NOT NULL would work. However, it does not. ;-(

So, you CAN reference a UNIQUE index with INCLUDEd columns but not filters. Even that's really cool!
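
Here's a minimal sketch of what that looks like. The table, column, index and constraint names are all made up for illustration; the only point is that the FOREIGN KEY references a column covered by a UNIQUE index (with an INCLUDEd column) rather than by a PRIMARY KEY or UNIQUE constraint.

```sql
-- Hypothetical tables for illustration only.
CREATE TABLE dbo.Customers
(
    CustomerID   int          NOT NULL,
    CustomerCode char(10)     NOT NULL,
    CustomerName varchar(100) NOT NULL
);
GO

-- A UNIQUE index (not a constraint) with a non-key column in the leaf level.
CREATE UNIQUE NONCLUSTERED INDEX CustomersCodeUQ
ON dbo.Customers (CustomerCode)
INCLUDE (CustomerName);
GO

-- The FOREIGN KEY references the column that the UNIQUE index makes unique.
CREATE TABLE dbo.Orders
(
    OrderID      int      NOT NULL,
    CustomerCode char(10) NOT NULL
        CONSTRAINT OrdersToCustomersFK
        FOREIGN KEY REFERENCES dbo.Customers (CustomerCode)
);
GO
```

As described above, if the only unique index on CustomerCode had a WHERE clause (a filtered index), the FOREIGN KEY could not be created against it.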

And, when you start your spring cleaning - try and cleanup and/or consolidate some of those redundant indexes!!

Cheers,
kt 

Given the general state of the economy...many companies are looking to cut back. Going back over what we've done and "optimizing" things -> budgets, expenses, etc. is the norm right now. And, scaling back is not always a bad thing - unless the wrong things are cut. Unless the wrong things are used to motivate you. Prioritizing and/or really assessing what gives you the biggest gains for your dollars is hard. In fact, one of the things that always seems to be first on the cutting block is training. Training is hard to quantify. And, the results of good training are also hard to quantify. Instead of fixing a problem (which you can often see the exact improvement) you might instead avoid a problem. Avoid downtime. Avoid data loss. Process more rows - with the same hardware. But, how do you know the cost of what could have happened. Ugh. To be honest, if I could do that - Paul and I would be on a beach. ;)

But, I do have a reason for this post... what should you be thinking? Where should you focus your attention? What can you cut - safely, temporarily, permanently and what might you help to prioritize?

Should you upgrade software?

  • Is there a feature that makes something easier? Some new features are really powerful "big" features. For example, Policy-Based Management (PBM) might help you to better centralize certain rules (in PBM-speak "policies") and then enforce them on many servers - even 2005 servers... so, you might be able to upgrade a smaller number of servers and still get some of the benefits. Many of the tools work against multiple versions so you might be able to minimize (and/or prioritize) which servers you upgrade and slowly migrate others. Potentially following an every-other-version upgrade strategy... upgrading some servers from 2000 to 2008 and leaving some of your 2005 servers to wait to upgrade until SQL11 (the next version after SQL10 - which is SQL Server 2008).
  • Are you starting a new project - architecting a new database? Wouldn't it be easier to start on the newer version and get better longevity (maybe?!)? For example, sparse columns might make a major difference in your base table's architecture...and be easier than if you were to architect (and write all of the code) for 2005 but then later need to do a major architectural change to move to 2008 (well, to *really* benefit from things like sparse columns). There are some really good features in 2008 and some *might* warrant upgrading... upgrading now. But, if you don't have a direct need then I'd argue that you could probably stay with 2005 (or even 2000) and then push this out a bit until you absolutely need to move forward.

Should you upgrade hardware?

  • Again, are there features that will directly impact: performance, availability, manageability?
  • Can you wait? I can't really answer this and - for everyone - the answer is going to be "it depends". There might be something that significantly reduces costs and/or minimizes downtime and as a result, you'll just have to do cost-benefit analysis. This is a tough one... but, maybe you can do rolling upgrades and let some of the lesser servers take the hand-me-downs. :)
  • Can you do rolling upgrades moving the most critical to a new server and then a less critical server to the one freed up by the last upgrade...

Is there anything you can do to get more out of what you already have??

In my opinion, this is probably even more important than the two above. Upgrading hardware and software is something you will ALWAYS need to consider but if you could get better performance, scalability and availability out of the hardware/software you have now, then you'll benefit *now* without additional funds spent (actual outgoing funds) and you'll still be able to leverage what you do today when you do upgrade. So, what this really translates to (IMO) is tweaking and tweaking a bit more - what you already have. How? What can you look for? What can you do to help??

  • Upgrade to the latest service packs/hotfixes (at least upgrade to the free stuff - you might see some gains and in some cases (like SQL Server 2005 SP2+) you might get some new features). (important note: test this on a non-production server FIRST!!)
  • Update your hardware's firmware? You might have missed an update that improves performance (important note: test this on a non-production server FIRST!!)
  • Bottleneck Analysis - Some good resources for this are: Performance Tuning Using Waits and Queues and the SQLCAT team.
  • Workload Analysis - Some good resources for this are: Troubleshooting Performance Problems in SQL Server 2005, Working with Tempdb in SQL Server 2005, Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005...well, there are lots of good whitepapers that are specific to certain types of workloads and/or perf problems...check out our whitepapers page here: http://www.sqlskills.com/whitepapers.asp and the CAT team's whitepapers pages here: http://sqlcat.com/whitepapers/default.aspx and the general SQL Server on microsoft.com pages here: http://www.microsoft.com/sqlserver/2008/en/us/white-papers.aspx and for 2005 here: http://www.microsoft.com/sqlserver/2005/en/us/white-papers.aspx
  • Maintenance - often overlooked and incredibly important. A database that has solid maintenance practices (fragmentation analysis and cleanup, VLF analysis and cleanup, transaction log management, finding corruption in its early stages through automated CHECKDB executions...) performs better, is easier to recover, might naturally stay smaller (more compact) and therefore require less hardware. In fact, analyzing indexes - to get rid of unused indexes and to consolidate redundant indexes can end up saving disk space, backup space, cache, maintenance costs, etc. Both Paul and I have blogged quite a bit about many of these!
  • Other tips and tricks
    • Blogs... which is why you're here and there are so many out there! Here's a link I recently found that lists a bunch of SQL-related blogs: http://technet.microsoft.com/en-us/sqlserver/bb671052.aspx and, of course, Paul's post on "So many blogs" and the PASS list of blogs here: http://www.sqlpass.org/Community/BlogDirectory.aspx.
    • Webcasts... there are lots out there and we now have a page which has most of ours listed on it (thanks to Paul for creating this!!) here: http://www.sqlskills.com/webcasts.asp and there are LOTS more on TechNet, MSDN, etc.
    • Conferences... OK, maybe a shameless plug for conferences like SQLConnections *but* in having put together the agenda (with Paul) where we specifically focused on best practices topics and performance tuning - I can tell you that some of the tips and tricks that we recommend can significantly improve performance, may minimize needed disk space (by creating more optimal and often fewer indexes), may improve availability with better design practices and/or maintenance and much more than that! And, in getting away from the office for a few days and focusing just on learning you might do two things. First, you might learn some tips and tricks that you never would have (or it would have taken *a lot* more time and/or been harder to really understand?). Second, you might come back with a whole new and renewed enthusiasm for doing things - and with an ordered/prioritized list of things to try. And, this might even help to motivate you because it also shows that your company really is committed to you/your job (having spent money specifically on your learning) - and you to them.
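
As one small, hedged example of the "analyzing indexes" idea from the Maintenance bullet above (SQL Server 2005/2008 only), here's a sketch that lists nonclustered indexes in the current database alongside whatever usage counters the server currently holds. Keep in mind that these counters are reset when SQL Server restarts, so a NULL or low number is a reason to investigate, not proof that an index is unused.

```sql
-- Run in the database you want to review (SQL Server 2005+).
SELECT
    OBJECT_NAME(i.object_id) AS table_name,
    i.name                   AS index_name,
    i.index_id,
    us.user_seeks,
    us.user_scans,
    us.user_lookups,
    us.user_updates          -- maintenance cost paid on data modifications
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON us.object_id   = i.object_id
      AND us.index_id    = i.index_id
      AND us.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND i.index_id > 1         -- nonclustered indexes only
ORDER BY table_name, i.index_id;
```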

So, I do think that there are SMARTER ways to save. A well trained employee is worth a lot more than a cheaper one. And, there are smarter things to cut. I hope this might help you think of things to do and/or places to look to get better performance with what you have! I think blanket "no training" or "no upgrades" statements are never good for anything - even the budget (the longer term effects can be much worse - but also much harder to quantify).

Really, the answer is always different. It depends............

kt

Paul and I started discussing a comment that came up regarding the many issues surrounding logging & recovery. It's one of our favorite topics and in fact was the title to an article that Paul recently wrote for TechNet here: http://technet.microsoft.com/en-us/magazine/2009.02.logging.aspx. And, as a sidenote, depending on how much you already know about the transaction log - you might want to review that article first!

The comment that came up was related to a common misunderstanding on what is and what is not required to make a backup transactionally consistent when restored. And, in my opinion, some of the confusion as to whether or not log backups are "required" is because many changes have occurred release to release. Also, a lot of us say "log backups are required for better recovery" and while restoring log backups is what allows features like up-to-the-minute recovery and point-in-time recovery, not all strategies or recovery procedures actually require additional and/or separate log backups (some backups actually back up part of the log during their backup - and this is actually something that has changed release to release). And so, this is the reason for this post: I want to try and clear up a few of the many misconceptions about what happens with regard to the log during backup and restore. What's really interesting is that some of the best features (seemingly minor) have been around exactly this - the behavior of the transaction log during other backups and the requirement during restore. So, I thought I'd give a play by play from 2000 to 2005/2008 to discuss the differences and what's changed and why those changes were significant. The biggest changes were between 2000 and 2005.

First and foremost, the log portion of a database is required to make that database transactionally consistent. The transaction log is the key to SQL Server's durability (data integrity even after power loss). Transaction log backups are the key to our being able to recover from more catastrophic failures (possibly even point-in-time recovery if the right backup strategy exists). Inside the database, SQL Server doesn't really need all of the transaction details after they've guaranteed a transaction's durability (or, more simply put, once the effect of the change has been reflected in the data portion of the database then the details of that change are no longer needed in the log portion of the database). As a result, you can have SQL Server clear the "inactive" portion of the log by setting the database's recovery model to the SIMPLE recovery model. Loosely translated the SIMPLE recovery model means "when SQL Server no longer requires the transaction information to guarantee durability - then the log information can be removed from the log". Setting the recovery model to SIMPLE limits your backup options and makes administration easier (i.e. simple :)); however, it does not offer any other protection in the event of a more catastrophic disaster (because the log is being regularly cleared then there's no transactional information to backup). For some development/test databases and databases where data loss is not a major concern, then this can be an easy choice because log management (i.e. backups) does not need to be performed. However, if you want to minimize data loss - you can't choose the SIMPLE recovery model; you must choose either the FULL (which is the default) or the BULK_LOGGED recovery model. However, the discussion on when/why to choose BULK_LOGGED is a lengthy one and it does NOT impact the rest of this blog post. 
However, I did write a chapter for a SQL Server 2000 HA book and I described in detail the best uses for the BULK_LOGGED recovery model as well as the benefits and concerns. While this was written for SQL Server 2000, most of it *still* applies (and there are a few timeline based examples as well). You can download a pdf of this chapter here: http://www.sqlskills.com/resources/SQLServerHAChapter9.pdf.
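
As a quick reference for the recovery models discussed above, here's a minimal sketch for checking and changing a database's recovery model. SalesDB is a made-up database name; sys.databases exists in SQL Server 2005 and later, while DATABASEPROPERTYEX also works on SQL Server 2000.

```sql
-- SQL Server 2005/2008: recovery model for every database.
SELECT name, recovery_model_desc
FROM sys.databases
ORDER BY name;

-- SQL Server 2000 (and later): recovery model for one database.
SELECT DATABASEPROPERTYEX(N'SalesDB', 'Recovery') AS recovery_model;

-- Changing the recovery model (SalesDB is hypothetical).
ALTER DATABASE SalesDB SET RECOVERY FULL;
-- ALTER DATABASE SalesDB SET RECOVERY BULK_LOGGED;
-- ALTER DATABASE SalesDB SET RECOVERY SIMPLE;
```

If you move from SIMPLE back to FULL or BULK_LOGGED, you'll need a database backup (or a differential) before log backups can start a new log chain.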

As for the main purpose of this post - there are basically a few key questions that I want to answer/clarify by version:

  • Is the log backed up as part of the other backups?
  • Is it cleared?
  • Is there anything else that's affected?
  • What happens to the log during other database, filegroup, file, database-differential, filegroup-differential and file-differential backups? And, since the behaviors and internals seem to be grouped into two groups, I will differentiate between these two different groups of backup strategies with the following types:   
    • Database-level backup strategies are backups that use database and optionally database-differential backups
    • Granular backup strategies are backup strategies that use file and/or filegroup backups and optionally file-differential/filegroup-differential backups
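
To put names to those two groups, here's a sketch of the backup commands involved. The database name (SalesDB), filegroup name (FG2008), logical file name (SalesDB_Data2) and backup paths are all hypothetical.

```sql
-- Database-level backup strategy
BACKUP DATABASE SalesDB
    TO DISK = N'X:\Backups\SalesDB_full.bak';

BACKUP DATABASE SalesDB
    TO DISK = N'X:\Backups\SalesDB_diff.bak'
    WITH DIFFERENTIAL;

-- Granular backup strategy (filegroup and file backups, plus their differentials)
BACKUP DATABASE SalesDB FILEGROUP = N'FG2008'
    TO DISK = N'X:\Backups\SalesDB_FG2008.bak';

BACKUP DATABASE SalesDB FILEGROUP = N'FG2008'
    TO DISK = N'X:\Backups\SalesDB_FG2008_diff.bak'
    WITH DIFFERENTIAL;

BACKUP DATABASE SalesDB FILE = N'SalesDB_Data2'
    TO DISK = N'X:\Backups\SalesDB_Data2.bak';

-- Transaction log backup (FULL or BULK_LOGGED recovery model)
BACKUP LOG SalesDB
    TO DISK = N'X:\Backups\SalesDB_log.trn';
```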

SQL Server 2000
Database-level backups cannot occur simultaneously with log backups. However, granular backups *CAN* occur concurrently with log backups.
If a log backup is attempted while a database-level backup is running, then the log backup is paused. This can have the following effects:

  • the transaction log may require auto-growth and become very large
  • the transaction log for a secondary server (i.e. through log shipping) can fall *very* far behind the primary server. And, this is a HUGE concern for high availability. If a full backup takes 4 hours to run, then logs cannot be shipped for 4 hours. As a result of this limitation, some chose to use a granular backup strategy. The reason why log backups CAN occur concurrently with granular backups in SQL Server 2000 is because, in implementation, SQL Server does NOT back up the log as part of these more granular backups. As a result, transactional integrity is not guaranteed until the appropriate log chain is rolled forward. This has the following effects:
    • Granular backups only support the BULK_LOGGED or FULL recovery models (somewhat negative but not really)
    • The transaction log backups could run and even clear the inactive portion of the log while these granular backups were running (this is a huge benefit because it limits the need to auto-grow during these backups)
    • Recovery during restore is required (for transactional integrity) which means that all logs need to be restored to cover the time of the granular backup (and then all of those up-to-the-minute or to the desired point-in-time). And, even if a filegroup is set to READ_ONLY - *all* transaction logs need to be restored (this is a big negative but there is a trick: perform periodic file/filegroup differentials (after setting the filegroup to READ_ONLY) so that you can avoid having to perform numerous transaction log restores).

So, to answer the questions for SQL Server 2000:

  • Is the log backed up as part of the other backups?
    • for Database-level backups: YES
    • for Granular backups: NO
  • Is it cleared?
    • When a transaction log backup is performed then the default behavior is to clear the inactive portion of the log
    • When a database-level backup is performed AND there's no corresponding log chain (meaning the first time you backup the database OR the first time you backup the database after the transaction log chain was broken), then the transaction log is cleared. NOTE: Breaking the continuity of the log is relatively easily done in SQL Server 2000 when someone executes a BACKUP LOG with NO_LOG or a BACKUP LOG with TRUNCATE_ONLY command. To prevent these from executing (for the FULL or BULK_LOGGED recovery model), use TRACE FLAG 3231. This is a VERY COOL and *SAFE* trace flag. I blogged about this trace flag in a "MSDN webcast Q&A" here. An important side note here is that in SQL Server 2000, log backups can be performed AFTER the continuity of the log has been broken. So, if someone manually cleared the log (using NO_LOG or TRUNCATE_ONLY) and did NOT follow that with a database-level (or appropriate granular backups), then scheduled log backups could continue to run without failure or errors. However, log backups performed AFTER the continuity of the log has been broken CANNOT be restored. So, during recovery you might receive an error that a log backup cannot be applied because it's too "late" to apply. Using Trace Flag 3231 reduces this possibility. However, SQL Server 2005 fixes some of these issues.
  • Is there anything else that's affected?
    • Log backups are paused during database-level backups
    • When restored, database-level backups are transactionally consistent (and can be recovered directly - without restoring additional logs)
    • When restored, granular backups require transaction log backups to guarantee transactional integrity (note: this can be complex to determine the "minimum effective log sequence" and I wrote a series of articles for SQL Server Magazine on how to determine the appropriate log sequence here)
  • What happens to the log during other database, filegroup, file, database-differential, filegroup-differential and file-differential backups?
    • for Database-level backups: log backups cannot occur concurrently 
    • for Granular backups: log backups can occur concurrently and are required for recovery
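
Since granular backups in SQL Server 2000 need the log chain rolled forward before they're transactionally consistent, a restore looks roughly like the sketch below. The database, filegroup and file names are hypothetical, and a real restore would use however many log backups cover the period (usually ending with a backup of the tail of the log).

```sql
-- Restore the filegroup backup but do not recover yet.
RESTORE DATABASE SalesDB FILEGROUP = N'FG2008'
    FROM DISK = N'X:\Backups\SalesDB_FG2008.bak'
    WITH NORECOVERY;

-- Roll forward every log backup taken since the filegroup backup, in order.
RESTORE LOG SalesDB
    FROM DISK = N'X:\Backups\SalesDB_log_01.trn'
    WITH NORECOVERY;

-- Recover on the last log restore to bring the data to a consistent point.
RESTORE LOG SalesDB
    FROM DISK = N'X:\Backups\SalesDB_log_02.trn'
    WITH RECOVERY;
```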

SQL Server 2005
The biggest improvement in SQL Server 2005 was that log backups are no longer paused by database-level backups - they *can* occur simultaneously; however, this change to database-level backups also applied to granular backups. While 2000 did allow log backups at the same time as a granular backup, they did so by NOT maintaining transactional integrity in the backup. In SQL Server 2000, you need to restore logs to make the granular backup transactionally consistent. In SQL Server 2005, they changed ALL backup strategies to follow the same behavior - database-level and granular backup strategies ALL backup the required log information needed to recover the backup to a transactionally consistent point in time which is essentially when the backup completes (this is a lot more complex than it sounds but Paul wrote a comprehensive post on exactly what this means here). Simply put, this requirement means that transaction log backups CAN occur concurrently; however, the log CANNOT be cleared until the backup completes. The primary negative effect is that the transaction log may require auto-growth and become very large. However, the positives are that you can do granular backups in any recovery model (although there are still some limitations to how this works in the SIMPLE recovery model but they added a new option during backup to allow a backup of ALL of the READ_WRITE_FILEGROUPS as a unit - separately from the read-only file groups which could be backed up at any time after they are set to READ_ONLY).
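
Here's a sketch of the READ_WRITE_FILEGROUPS option mentioned above (SQL Server 2005 and later). The database name, the read-only filegroup name (FG2005) and the paths are made up for illustration.

```sql
-- Back up all of the read-write filegroups as a unit (a "partial" backup).
BACKUP DATABASE SalesDB READ_WRITE_FILEGROUPS
    TO DISK = N'X:\Backups\SalesDB_readwrite.bak';

-- A read-only filegroup can be backed up separately, at any time
-- after it has been set to READ_ONLY.
BACKUP DATABASE SalesDB FILEGROUP = N'FG2005'
    TO DISK = N'X:\Backups\SalesDB_FG2005_readonly.bak';
```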

So, to answer the questions for SQL Server 2005:

  • Is the log backed up as part of the other backups?
    • for Database-level backups: YES
    • for Granular backups: *YES*
  • Is it cleared?
    • When a transaction log backup is performed then the default behavior is to clear the inactive portion of the log
    • When a database-level backup is performed AND there's no corresponding log chain (meaning the first time you backup the database), then yes, the inactive portion of the log is cleared. As far as breaking the continuity of the transaction log... In SQL Server 2005, they significantly reduced the problems that occur after the log chain is broken by NOT allowing log backups to continue. If a log backup is attempted after the continuity of the log is broken then you will receive error: 
        Msg 4214, Level 16, State 1, Line 1
        BACKUP LOG cannot be performed because there is no current database backup.

      So, this means that you don't necessarily need the trace flags. However, I still recommend using the trace flag because it would be better to not break the continuity of the log to begin with! And, in fact, in SQL Server 2005, there are two trace flags: 3231 and 3031. They are both safe and here's how the two differ:
      • Trace Flag 3231 (same as 2000): When set, BACKUP LOG with TRUNCATE_ONLY and BACKUP LOG with NO_LOG are not allowed to run if the database's recovery model is FULL or BULK_LOGGED.
      • Trace Flag 3031 (new in 2005): When set, BACKUP LOG with TRUNCATE_ONLY and BACKUP LOG with NO_LOG run as a CHECKPOINT - regardless of recovery model.
  • Is there anything else that's affected?
    • Log backups are *NOT* paused during database-level backups
    • When restored, database-level backups are transactionally consistent (and can be recovered directly - without restoring additional logs)
    • When restored, granular backups are transactionally consistent (and can be recovered directly - without restoring additional logs). However, you must always remember that the database cannot be brought online until the entire database is at a single transactionally consistent point in time. All read-write-filegroups must be restored as a unit (if in the SIMPLE recovery model) OR you must use transaction log backups to recover the entire database up to the SAME point in time.
  • What happens to the log during other database, filegroup, file, database-differential, filegroup-differential and file-differential backups?
    • for Database-level backups: log backups *can* occur concurrently (but the log will not be cleared until the backup completes) 
    • for Granular backups: log backups can occur concurrently (but the log will not be cleared until the backup completes) 
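
For reference, here's a minimal sketch of turning on one of the trace flags described above. The -1 argument makes the flag global for the running instance; to keep it across restarts you'd add it as a startup parameter (-T3231) instead. As always, try this on a non-production server first.

```sql
-- Enable trace flag 3231 globally for the running instance.
DBCC TRACEON (3231, -1);

-- List the trace flags that are currently enabled globally.
DBCC TRACESTATUS (-1);

-- Turn it back off if needed.
-- DBCC TRACEOFF (3231, -1);
```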

SQL Server 2008
Almost everything is the same in SQL Server 2008 as it was in 2005 - they made the largest number of improvements in 2005. However, one thing did change. In SQL Server 2008, the BACKUP LOG with NO_LOG and BACKUP LOG with TRUNCATE_ONLY options are not allowed at all. There is no need for the trace flags (3231/3031) because breaking the continuity of the log is not allowed (well, there is still a way... I'll get to that in a moment :)). In SQL Server 2008, if BACKUP LOG with NO_LOG or BACKUP LOG with TRUNCATE_ONLY are attempted, you will receive this error:
     Msg 3032, Level 16, State 2, Line 1
     One or more of the options (no_log) are not supported for this statement. Review the documentation for supported options.
But, what if you really don't want to backup the log? Why? Take this scenario (from a real customer!)... You have a 10GB database that's been around for quite some time AND you're doing regular full database backups... then, all of a sudden you run out of disk space. In looking around for large files (to investigate why you ran out of space), you find that this 10GB database's log is 987GB... so, you wonder - what happened? A database that is in the FULL recovery model (remember, this is the default) requires transaction log management. The easiest way to manage the log is with regular log backups; however, you're only doing full database backups (which do NOT clear the log). As a result, the transaction log grows and grows and grows and grows - until you're out of disk space (Paul demo'ed this in a TechNet Podcast here). At this point, how do you get rid of this 987GB transaction log? In prior releases, you could "clear" the log by using TRUNCATE_ONLY or NO_LOG but in 2008, what do you do? Switch to the SIMPLE recovery model. And, if you only want to do full database backups, stay there. And, if you want to physically shrink down the size of the transaction log file to a reasonable size - check out these two related blog posts: 8 Steps to Better Transaction Log Throughput and Transaction Log VLFs - Too many or too few?. And, in related news, Linchi Shea posted a good post on some tests he ran related to too many VLFs here and a second post that shows that some workloads don't see any issues wrt lots of VLFs here. But, the long story short is that you still want to be proactive about creating a reasonably sized transaction log (my two other previously mentioned posts). Significant auto-growth can cause problems and backup operations (and management in general) can be more difficult with lots of VLFs.
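
Here's a rough sketch of that scenario in commands, for SQL Server 2005/2008. SalesDB and the logical log file name SalesDB_log are hypothetical (check sys.database_files for your real logical name), and the 1024MB target is just a placeholder - pick a size that fits your workload.

```sql
-- How big is each log, and how much of it is in use?
DBCC SQLPERF (LOGSPACE);

-- What is the recovery model, and why can't the log be cleared?
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'SalesDB';

-- Only doing full database backups? Switch to SIMPLE (and stay there).
ALTER DATABASE SalesDB SET RECOVERY SIMPLE;
GO

USE SalesDB;
GO
-- Find the logical name of the log file.
SELECT name, size / 128 AS size_mb
FROM sys.database_files
WHERE type_desc = 'LOG';

-- Let the inactive portion of the log clear, then shrink the file.
CHECKPOINT;
DBCC SHRINKFILE (N'SalesDB_log', 1024);   -- target size in MB
GO
```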

Wow, that was much longer than I was expecting... and, in writing it all down - pretty complex (I had a hard time trying to section things but I think this works?!). Regardless, all the facts are there so this should help to clarify the what, when, where and why wrt the transaction log. Let me know if you have more questions!

Thanks for reading,
kt

Way back in June 2005, I blogged about '8 Steps to better transaction log throughput'. I did this blog post after seeing (again and again) overly fragmented transaction logs... Transaction logs can become *VERY* fragmented when they are not preallocated and instead they grow excessively through unmanaged (and probably the default settings for) auto-growth.

While having WAY too many VLFs because of auto-growth is still the most common form of problem within transaction logs, another problem has been creeping up more and more... too few VLFs. If you preallocate a very large transaction log (10s to 100s of GB), SQL Server may only allocate a few VLFs - as a result, log backups will be allowed to run normally but, SQL Server only clears the inactive VLFs when you've moved into a different VLF. If your VLFs are 8GB in size, then you need to accumulate 8GB of log information before the log can be cleared...so, many of your log backups will occur normally but then one (the one that finally hits > 8GB in used size) will take quite a bit more time AND possibly cause you performance problems because it's now clearing 8GB of log information.

First, here's how the log is divided into VLFs. Each "chunk" that is added is divided into VLFs at the time of the log growth (regardless of whether this is a manual or auto-grow addition) and it's all dependent on the size that is ADDED, not the size of the log itself. So, take a 10MB log that is extended to 50MB; here a 40MB chunk is being added. This 40MB chunk will be divided into 4 VLFs. Here's the breakdown for chunksize:

chunks less than 64MB = 4 VLFs

chunks of 64MB and less than 1GB = 8 VLFs

chunks of 1GB and larger = 16 VLFs

And, what this translates into is that a transaction log of 64GB would have 16 VLFs of 4GB each. As a result, the transaction log could only clear at more than 4GB of log information AND that only when it's completely inactive.
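
The breakdown above is easy to turn into a quick calculation. This sketch simply applies the three chunk-size rules from this post to a few sample chunk sizes (in MB); the sample sizes are just for illustration.

```sql
-- Applies the chunk-size rules listed above to a few sample chunk sizes.
SELECT
    chunk_mb,
    vlfs_added,
    chunk_mb / vlfs_added AS approx_vlf_size_mb
FROM (
    SELECT
        chunk_mb,
        CASE
            WHEN chunk_mb < 64   THEN 4    -- chunks less than 64MB
            WHEN chunk_mb < 1024 THEN 8    -- 64MB up to (but not including) 1GB
            ELSE 16                        -- 1GB and larger
        END AS vlfs_added
    FROM (
        SELECT 40 AS chunk_mb      -- the 10MB -> 50MB growth example
        UNION ALL SELECT 512
        UNION ALL SELECT 8192      -- an 8GB chunk
        UNION ALL SELECT 65536     -- a 64GB log created in one step
    ) AS chunks
) AS x
ORDER BY chunk_mb;
```

The 64GB row comes out at 16 VLFs of 4096MB each, and the 8GB row at 16 VLFs of 512MB each.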

To have a more ideally sized VLF, consider creating the transaction log in 8GB chunks (8GB, then extend it to 16GB, then extend it to 24GB and so forth) so that the number (and size) of your VLFs is more reasonable (in this case 512MB).
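
Here's a sketch of growing a log in steps like that and then checking the result. SalesDB and SalesDB_log are hypothetical names, and the sketch assumes the log file is currently smaller than the first target size. I've used 8000MB steps rather than exactly 8GB because of the note elsewhere in this post about a bug when growing in multiples of 4GB; with 8000MB chunks each VLF works out to roughly 500MB.

```sql
-- Grow the log one chunk at a time; each statement adds a new 8000MB chunk.
ALTER DATABASE SalesDB MODIFY FILE (NAME = N'SalesDB_log', SIZE = 8000MB);
ALTER DATABASE SalesDB MODIFY FILE (NAME = N'SalesDB_log', SIZE = 16000MB);
ALTER DATABASE SalesDB MODIFY FILE (NAME = N'SalesDB_log', SIZE = 24000MB);
GO

-- DBCC LOGINFO is undocumented; it returns one row per VLF,
-- so the row count is the number of VLFs in the log.
USE SalesDB;
GO
DBCC LOGINFO;
GO
```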

Have fun and thanks for reading!!
kt

BEWARE - there's a bug when growing the log in multiples of 4GB. Check out Paul's post for more information: Bug: log file growth broken for multiples of 4GB.

Well... I think I had had too much tea that morning ;-). But, as always, chatting with Richard and Greg was great. Here's the specific show link: http://www.runasradio.com/default.aspx?showNum=76.

Oh, and just for the record, I didn't come up with that title. But, I do hope that all your [high-priority and important] queries are indexed!

Enjoy!
kt

I first posted an update to sp_helpindex here. My version of sp_helpindex was solely to expand what sp_helpindex showed and adds 1 or 2 things based on version: for SQL2005+ it adds included columns and for SQL2008 it also adds the filter predicate. So, there were two versions of sp_helpindex2 depending on which version you're using. A lot of folks like the changes to this sp but, alas, it had a bug (or two :) and in fact, I found a few others when I went back over this as well. So, thanks to Josh (who commented here) and to a private email (thanks Vasco!), I have an updated version of sp_helpindex2:

For SQL Server 2005, here's your new sp_helpindex2 script: sp_helpindex2_2005.zip (2.89 KB)

And, here's a simple test script for 2005:

DROP TABLE tbl1
GO

CREATE TABLE tbl1( c1 int, c2 int, c3 int, c4 int)
GO
CREATE INDEX ix_1 ON tbl1(c1) INCLUDE (c2)
CREATE INDEX ix_2 ON tbl1(c1)
CREATE INDEX ix_3 ON tbl1(c1) INCLUDE (c2, c3)
CREATE INDEX ix_4 ON tbl1(c1, c3) INCLUDE (c2)
CREATE INDEX ix_5 ON tbl1(c3) INCLUDE (c1, c2, c4)
CREATE INDEX ix_6 ON tbl1(c1, c2) INCLUDE (c3, c4)
go

sp_helpindex2 tbl1
go

index_name  index_description            index_keys  included_columns
----------  ---------------------------  ----------  ----------------
ix_1        nonclustered located on fg1  c1          c2
ix_2        nonclustered located on fg1  c1          NULL
ix_3        nonclustered located on fg1  c1          c2, c3
ix_4        nonclustered located on fg1  c1, c3      c2
ix_5        nonclustered located on fg1  c3          c1, c2, c4
ix_6        nonclustered located on fg1  c1, c2      c3, c4

For SQL Server 2008, here's your new sp_helpindex2 script: sp_helpindex2_2008.zip (2.84 KB)

And, here's a simple test script for 2008:

DROP TABLE tbl1
GO

CREATE TABLE tbl1( c1 int, c2 int, c3 int, c4 int)
CREATE INDEX ix_1 ON tbl1(c1) INCLUDE (c2)
CREATE INDEX ix_2 ON tbl1(c1)
CREATE INDEX ix_3 ON tbl1(c1) INCLUDE (c2, c3)
CREATE INDEX ix_4 ON tbl1(c1, c3) INCLUDE (c2)
CREATE INDEX ix_5 ON tbl1(c3) INCLUDE (c1, c2, c4)
CREATE INDEX ix_6 ON tbl1(c1, c2) INCLUDE (c3, c4)

CREATE INDEX ix_1f ON tbl1(c1) INCLUDE (c2)
WHERE c3 IS NOT NULL

CREATE INDEX ix_2f ON tbl1(c1)
WHERE c4 > 2

CREATE INDEX ix_3f ON tbl1(c1) INCLUDE (c2, c3)
WHERE c4 > 2 AND c1 < 50 AND c2 = 12

CREATE INDEX ix_4f ON tbl1(c1, c3) INCLUDE (c2)
WHERE c4 IS NOT NULL AND c1 = 12

CREATE INDEX ix_5f ON tbl1(c3) INCLUDE (c1, c2, c4)
WHERE c1 > 5

CREATE INDEX ix_6f ON tbl1(c1, c2) INCLUDE (c3, c4)
WHERE c4 < 20
go

sp_helpindex2 tbl1
go

index_name  index_description                index_keys  included_columns  filter_definition
----------  -------------------------------  ----------  ----------------  --------------------------------------
ix_1        nonclustered located on PRIMARY  c1          c2                NULL
ix_1f       nonclustered located on PRIMARY  c1          c2                ([c3] IS NOT NULL)
ix_2        nonclustered located on PRIMARY  c1          c2                NULL
ix_2f       nonclustered located on PRIMARY  c1          c2                ([c4]>(2))
ix_3        nonclustered located on PRIMARY  c1          c2, c3            NULL
ix_3f       nonclustered located on PRIMARY  c1          c2, c3            ([c4]>(2) AND [c1]<(50) AND [c2]=(12))
ix_4        nonclustered located on PRIMARY  c1, c3      c2                NULL
ix_4f       nonclustered located on PRIMARY  c1, c3      c2                ([c4] IS NOT NULL AND [c1]=(12))
ix_5        nonclustered located on PRIMARY  c3          c1, c2, c4        NULL
ix_5f       nonclustered located on PRIMARY  c3          c1, c2, c4        ([c1]>(5))
ix_6        nonclustered located on PRIMARY  c1, c2      c3, c4            NULL
ix_6f       nonclustered located on PRIMARY  c1, c2      c3, c4            ([c4]<(20))

Have fun!
kt

YES!!!

OK, well, I guess I should be more specific because, as with most things in SQL Server, the real answer is "it depends". And for these two options, it depends mostly on your SQL Server version. Since SQL Server 7.0, the way that auto update works has changed (very much for the better!!). So, if you're on SQL Server 2005 or SQL Server 2008, I would say most definitely - leave these ON (or if you turned them off - turn them back on!!!)! If you still have problems with a specific index causing you grief, then turn off auto update at the index level NOT at the database level. To turn off auto update at the index level, use STATISTICS_NORECOMPUTE in the index definition (or NORECOMPUTE in the statistics definition).
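As a rough sketch of what that looks like in practice (the database, table, index and statistics names below are placeholders I made up, not objects from this post):

```sql
-- Placeholder names: YourDatabase, dbo.tbl1, ix_1 and st_c4 are illustrative only.

-- Keep auto create/auto update ON at the database level.
ALTER DATABASE YourDatabase SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE YourDatabase SET AUTO_UPDATE_STATISTICS ON;
GO

-- Turn off auto update for ONE index's statistics when the index is created...
CREATE INDEX ix_1 ON dbo.tbl1 (c1)
WITH (STATISTICS_NORECOMPUTE = ON);
GO

-- ...or for one statistic when it is created.
CREATE STATISTICS st_c4 ON dbo.tbl1 (c4)
WITH NORECOMPUTE;
GO

-- With auto update off for these, updating them becomes your job:
UPDATE STATISTICS dbo.tbl1 ix_1 WITH FULLSCAN, NORECOMPUTE;
GO

-- To hand the index's statistics back to auto update later:
ALTER INDEX ix_1 ON dbo.tbl1
SET (STATISTICS_NORECOMPUTE = OFF);
GO
```

The point of doing it this way is that everything else in the database keeps the default behavior; only the one problem index (or statistic) is taken out of auto update.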

Now, as for why (and how!) this has changed over the versions... here we go:

SQL Server 7.0

  • Invalidation: Statistics were *invalidated* when a row modification counter (sysindexes.rowmodctr) reached a threshold. This meant that SQL Server could not tell where the modifications were occurring and, even if modifications were isolated to a specific column, ALL of the statistics for the TABLE would be invalidated (so, statistics could be invalidated earlier than necessary).
  • Updating: Even worse, in SQL Server 7.0, when statistics were invalidated, they were immediately updated. This caused two problems: one, thrashing at the time of invalidation because all of the stats needed to be updated AND two, if the statistics were not used for a while then extra work was done to update them and, by the time they were used, they might already be somewhat out of date.

SQL Server 2000

  • Invalidation: Statistics were still invalidated based on a row modification counter.
  • Updating: SQL Server 2000 fixed the "updating-potentially-too-often" problem by only updating statistics when they were needed.

SQL Server 2005

  • Invalidation: The biggest changes were introduced in SQL Server 2005 where they decided to NO LONGER use sysindexes.rowmodctr and instead use an internal (and undocumented) column-specific modification counter. Now, statistics invalidation is more isolated to only those columns which are heavily modified. This internal/undoc'ed column is sysrowsetcolumns.rcmodified and can only be seen when connecting to SQL Server using the DAC (Dedicated Admin Connection).
  • Updating: Updating didn't really change but SQL Server 2005 added "Async Auto Update" for statistics so that when the QO (query optimizer) encounters an out-of-date (i.e. invalidated) statistic, it can "trigger" the update but not wait for the update (meaning that it'll optimize using the out-of-date statistic). This can be both positive (faster) and negative (might not be the best plan if the statistics have changed drastically). It is off by default and, IMO, I'd leave it off in most cases but if you find that auto update events (which can be Profiled) are causing you grief, then you can turn this on at the database level.
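If you do decide to try the async option, a minimal sketch looks like this (YourDatabase is a placeholder name):

```sql
-- Placeholder name: YourDatabase.
-- Async auto update only has an effect when auto update itself is ON.
ALTER DATABASE YourDatabase SET AUTO_UPDATE_STATISTICS ON;
ALTER DATABASE YourDatabase SET AUTO_UPDATE_STATISTICS_ASYNC ON;
GO

-- Check the current settings for the database.
SELECT name,
       is_auto_create_stats_on,
       is_auto_update_stats_on,
       is_auto_update_stats_async_on
FROM sys.databases
WHERE name = N'YourDatabase';
```

Setting AUTO_UPDATE_STATISTICS_ASYNC back to OFF returns the database to the default behavior.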

SQL Server 2008

Nothing new except "Filtered Statistics" and these are interesting as the density vector is still relative to the table (not the predicate) but the histogram is just over the predicate (OK, I know I'll have to blog a lot more about this one!). Anyway, I'm still playing/learning a lot more about these and they make the most sense with filtered indexes (as opposed to just a filtered statistic) but, just like statistics on secondary columns you will also potentially want statistics on the secondary columns of your indexes. The next question is should they have a filter or not. I've found that sp_createstats doesn't seem to create statistics with filters and I'm going to need to do some testing here but I think statistics with filters (filters that match the non-clustered index) should help to make the stats better (and even allow better usage of filtered indexes) but, I'm really going to need a bunch of time with this - and another post :). As for auto create/auto update - no changes there!
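For reference, the syntax for a filtered statistic looks like the sketch below. The table, column, index and statistic names are placeholders, and whether a filtered statistic actually gives better plans is exactly the part that still needs testing, so treat this as syntax only.

```sql
-- Placeholder names; assumes a table dbo.tbl1 (c1 int, c2 int, c3 int, c4 int).

-- A filtered index (new in SQL Server 2008).
CREATE INDEX ix_2f ON dbo.tbl1 (c1)
WHERE c4 > 2;
GO

-- A filtered statistic on a secondary column, using a predicate that
-- matches the filtered index above.
CREATE STATISTICS st_c2_f ON dbo.tbl1 (c2)
WHERE c4 > 2;
GO

-- has_filter / filter_definition show which statistics are filtered.
SELECT name, has_filter, filter_definition
FROM sys.stats
WHERE object_id = OBJECT_ID('dbo.tbl1');
```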

Long story short, if you're using SQL Server 2005 or SQL Server 2008, you should leave auto create/auto update ON.

Thanks for reading!
kt

PS - A few of you have mailed me about a bug in the sp_helpindex2 script(s). OK, that's my next post!!! Possibly with an sp_helpstats2 script as well!

OK, I first posted on some of the limitations to indexes in SQL Server 2005 and 2008 in part one here. Now, I want to dive into index internals for a post (or two). And, I often get the question “who is the best audience for your blog – or, for this post” and well, that’s a bit hard to answer. At SQL Connections in Orlando, I delivered a session titled: Index Internals & Usage and while we (fyi – Paul and I co-chair the SQL Connections portion of “DevConnections”) put it in the "developer-focused track," it was more of a Dev/DBA "hybrid" session with the emphasis on database development and best practices in creating and managing indexes (rather than management/maintenance/operational tuning - which is more for DBAs). Here at TechEd this week, I'm going to focus more on the management/maintenance/operational tuning side with a session called Are your Indexing Strategies Working? I'll also do a complementary blog post for that as well...

Having said that though, indexes are definitely in a group of topics - very much related to performance and scalability (index internals, indexing strategies, log maintenance, general database maintenance) - which really need to cross almost all database-related disciplines (dev, admin, ops, etc…). If you work with SQL Server in almost any capacity, you need to get a feel for at least some aspect of indexing for performance.

So, for this post, I’m continuing with some internals. In the first post (in this series), I wrote about limits. Limits/boundaries are interesting to discuss but it's also important to remember that good performance takes a lot more than just staying within the bounds of what’s possible. Creating indexes solely because you can – without reason and only with upper limits in mind – can be even worse than under indexing. So, if you find that you're wanting more about indexes (I have many blog posts that are solely Q&A posts), check out my Indexing category here. Now that you know how many indexes you can create, a better question would be when is it appropriate to create indexes at all?

So, what is “finding the right balance” in indexing? In my opinion, there are three requirements/pre-requisites:

  1. knowing the data
  2. knowing how the users use the data
  3. knowing how the underlying structures and database stores/manipulates and uses indexes

Bringing all of these things together is what I try to do in my workshops, seminars and lectures – in this post, I'll start with a smaller more digestible piece - internals.

Indexes have 2 components: a leaf level and a non-leaf level (or b-tree). The non-leaf level is interesting to understand and discuss (in terms of internals) but simply put, it’s used for navigation to the leaf level (more than anything else). So, we'll start with the leaf level (as does SQL Server - the leaf level is always built first). The leaf level of an index contains something (I’ll explain more coming up) for every row of the table in indexed order (note: I am focusing on traditional indexes in every release from SQL Server 2000 up to and including SQL Server 2008 – with the exception of filtered indexes which I will write about in a later post). Once the leaf level is built, non-leaf level(s) can be built to help navigate to the leaf level but the architecture is rather straightforward. The non-leaf level stores something for every page of the level below – and levels are added (each smaller than the previous because each level only contains the first entry from every page of the level below) until the index gets to a root of one page. While it sounds like this could result in a lot of levels (i.e. a tall tree), the limitation on the size of the key (which has a maximum of 900 bytes or 16 columns) helps to keep index trees relatively small. In fact, in the example I’ll show coming up – which has a fairly large (large meaning WIDE) index and has a key definition which is at the maximum size – the tree for this example index (at the time the index is created) is only 8 levels high/deep…

To see this tree (and the math used to create it – which is the same thing that SQL Server would go through to create it), we’ll use an example where the leaf level of the index contains 1,000,000 “rows.” I put quotes around “rows” because I don’t want to imply that these have to be data rows – these are really just leaf level rows and I’ll explain more on what leaf level rows can be... The leaf level rows are 4,000 bytes per row (therefore only 2 rows per page), so the leaf level is 500,000 pages. This is not ideal but at least the pages are almost full and we’re not wasting a lot of space – if we had two 3000 byte rows we’d still only fit 2 per page and then we’d have 2,000 bytes of wasted space. Now, as for why these are just “rows” and not specifically data rows: this leaf level could be the leaf level for a clustered index (therefore data rows) OR these leaf level rows could be rows in a non-clustered index that uses INCLUDE (which was new to SQL Server 2005) to add non-key columns to the leaf level of the index (which therefore creates wider leaf rows, wider than the 900 bytes or 16 column maximum). Again, while this doesn’t currently sound interesting, I’ll explain why this can be beneficial coming up (possibly in another post depending on how long this particular post becomes… :).

The leaf level of this index would result in a 4 GB structure (and this is only at the time it’s created – if a lot of rows are added and the key is not ever increasing then this structure could become heavily fragmented and therefore much larger/taller). In this case, it’s relatively large (again because of “row” width) and with an index key of 900 bytes you can even see that in this case, the tree would be relatively small and only result in 8 levels – as shown below.

Root page of non-leaf level (Level 7) = 2 rows = 1 page

Intermediate non-leaf level (Level 6) = 16 rows = 2 pages (8 rows per page at 900 bytes)

Intermediate non-leaf level (Level 5) = 123 rows = 16 pages (8 rows per page at 900 bytes)

Intermediate non-leaf level (Level 4) = 977 rows = 123 pages (8 rows per page at 900 bytes)

Intermediate non-leaf level (Level 3) = 7,813 rows = 977 pages (8 rows per page at 900 bytes)

Intermediate non-leaf level (Level 2) = 62,500 rows = 7,813 pages (8 rows per page at 900 bytes)

Intermediate non-leaf level (Level 1) = 500,000 rows = 62,500 pages (8 rows per page at 900 bytes)

Leaf level (Level 0) = 1,000,000 rows = 500,000 pages (2 rows per page)
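If you want to play with the numbers yourself, here's a small sketch of the same math. It assumes (as in the example) 8 non-leaf rows per page for a 900-byte key and a 500,000-page leaf level; the variable and column names are mine. It rounds any partial page up to a whole page, so an intermediate level can come out a page larger than a figure rounded by hand.

```sql
-- Sketch: walk up from the leaf level until a level fits on one page.
-- Assumes 8 index rows per non-leaf page (900-byte key), as in the example above.
DECLARE @pages bigint, @rows bigint, @level int, @rows_per_page int;
SET @pages = 500000;        -- leaf level: 1,000,000 rows at 2 rows per page
SET @level = 0;
SET @rows_per_page = 8;

DECLARE @levels TABLE (index_level int, row_count bigint, page_count bigint);
INSERT INTO @levels VALUES (0, 1000000, @pages);

WHILE @pages > 1
BEGIN
    SET @level = @level + 1;
    SET @rows  = @pages;    -- one non-leaf row per page in the level below
    SET @pages = (@rows + @rows_per_page - 1) / @rows_per_page;  -- round partial pages up
    INSERT INTO @levels VALUES (@level, @rows, @pages);
END

SELECT index_level, row_count, page_count
FROM @levels
ORDER BY index_level DESC;
```

With these inputs the loop stops at level 7 (a 1-page root), which is the 8 levels (0 through 7) described above. Change @rows_per_page to something more realistic for a narrow key (a few hundred rows per page) and you'll see why 3-4 levels is the norm.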

 

Having said that though, this is NOT a goal. :) In more realistic scenarios [where the key is much smaller and] even when there are more rows, there are fewer levels (3-4 is quite normal). Most importantly, the size of an index (and the number of levels) depends on two things – the width of the key (in terms of the number of bytes) and the number of pages in the leaf level of the index. The number of pages in the leaf level of an index depends on the number of rows and the size (again, in terms of bytes) of the rows in the leaf level.

You can see the size of your index by using one of the following commands:

In SQL Server 2000: DBCC SHOWCONTIG … WITH ALL_LEVELS

In SQL Server 2005/2008: querying the DMV: sys.dm_db_index_physical_stats

To see the syntax of these commands and their output, we’ll use some structures created in the credit sample database. Using credit, you can see exactly how these commands work and how they return the details about every level.

NOTE: you can download a zip of a SQL Server 2000 backup of this database here – and since this is a SQL Server 2000 backup, you can restore this to SQL Server 2000, SQL Server 2005 or SQL Server 2008.

USE credit
go

SELECT *
FROM sys.dm_db_index_physical_stats
    (db_id(), object_id('Charge'), 1, NULL, 'DETAILED')
go

DBCC SHOWCONTIG('charge', 1) WITH ALL_LEVELS, TABLERESULTS
go

Using the DMV or DBCC SHOWCONTIG you can get the same picture of the charge table. Using the detailed (or ALL_LEVELS) parameter, you get the entire structure (all levels) for the clustered index (index_id = 1 is always the clustered index, IF the table is clustered). The reason it returns all levels is that the 'DETAILED' mode has been specified.
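If you only want the per-level picture rather than every column, a trimmed-down version of the DMV query might look like the sketch below. The column list is my own selection from the DMV's output; the table and database are the same credit/charge example.

```sql
USE credit;
GO

-- Resolve the IDs into variables first; it keeps the DMV call easy to read.
DECLARE @db int, @obj int;
SET @db  = DB_ID();
SET @obj = OBJECT_ID('charge');

SELECT index_level,                       -- 0 = leaf, highest value = root
       page_count,
       record_count,
       avg_page_space_used_in_percent,
       avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(@db, @obj, 1, NULL, 'DETAILED')
ORDER BY index_level DESC;                -- root first, leaf last
```

Reading the output from the bottom up, the record_count of each level should match the page_count of the level below it.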

The clustered index in this table has 1,600,000 rows (DMV column: record_count or SHOWCONTIG column: rows) and these are stored on 9303 pages (DMV column: page_count or SHOWCONTIG column: pages). If you read to the next level, which is level 1 because the leaf level is level 0 (remember index levels always start with the leaf level 0 and then go up to the root), you can see that its number of "rows" is equal to the number of pages in the leaf level... and this keeps going until you get to a root of 1 page. In this case, the clustered index (which is the widest structure of the table) has a very narrow clustering key (the key is on charge_no which is an int) and only has a total of 3 levels even though the table has 1,600,000 rows.

Ideally, you should run this on a few of your production tables (in a development/test environment) and you can start to get some insight into how big your structures are. However, a BIG factor that you might see in production is fragmentation. If a particular level (or levels for that matter) is heavily fragmented then each level might be wider and less compact (and therefore less performant). Reviewing the DMV columns avg_fragmentation_in_percent and avg_page_space_used_in_percent, you can get a feel for how out of order each level is and how full each page is. Poor page density reflects that your pages are not as full as they could be but there are many factors for why this is the case: bad row size, splits due to inserts, splits due to updates of varchar columns or even a poorly chosen fillfactor that has left too much space on the pages. However, page density is only one piece of the puzzle and if your avg_fragmentation_in_percent is very low (0-5%) then I wouldn't be overly worried about your pages not being entirely full unless you have the time to possibly re-design tables (e.g. vertically partition them) and then rewrite your applications to direct your statements at only the appropriate base table.

But, another factor to consider is the rate at which your fragmentation occurs as well as when you can fix that fragmentation. This is a HUGE discussion that requires time... And, I want to get back to index structures for now. However, both Paul and I have blogged quite a bit about rebuilding v. defragging indexes and what those operations do/how, etc. In fact, just today, Paul has blogged a Q&A about myths and misconceptions about index rebuild operations. So, I'll get back to internals for now! :)

You can use LIMITED (which is the default mode), SAMPLED, or DETAILED. All three have excellent uses and all use IS locks (to minimize blocking). LIMITED gives you a quick overview of fragmentation and mostly describes how intact and in order the levels are. LIMITED is quite clever in that it only scans the first non-leaf level above the leaf to determine how much fragmentation there is... since the non-leaf level always tracks the first entry (and a pointer to the page), SQL Server knows EACH and EVERY page in the leaf level by ONLY reading the non-leaf level (which is [typically] a lot smaller and therefore faster). However, because it doesn't touch every page to determine page density, it only tracks how out of order the levels are and not how dense/full the pages are (which is also a form of fragmentation). So, if you want a bit more detail, you can use SAMPLED. The SAMPLED mode returns the fragmentation from reading every 100th page of the index (or heap). If the table has fewer than 10,000 pages (which is about 80MB), every page is read instead (which is a DETAILED scan). The DETAILED mode reads every page of every level to calculate the most accurate picture of your table's fragmentation. This is the best form of analysis but also takes the most time.
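To compare the modes side by side on the same index, something like the following sketch works (again against the credit/charge example; the scan_mode label column is just something I added to tell the result sets apart):

```sql
USE credit;
GO

DECLARE @db int, @obj int;
SET @db  = DB_ID();
SET @obj = OBJECT_ID('charge');

-- LIMITED does not read the leaf pages, so page density comes back as NULL.
SELECT 'LIMITED' AS scan_mode, index_level, page_count,
       avg_fragmentation_in_percent, avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(@db, @obj, 1, NULL, 'LIMITED')
UNION ALL
-- SAMPLED reads a sample of the pages (or every page if the index is small).
SELECT 'SAMPLED', index_level, page_count,
       avg_fragmentation_in_percent, avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(@db, @obj, 1, NULL, 'SAMPLED')
UNION ALL
-- DETAILED reads every page of every level.
SELECT 'DETAILED', index_level, page_count,
       avg_fragmentation_in_percent, avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(@db, @obj, 1, NULL, 'DETAILED')
ORDER BY scan_mode, index_level;
```

On a table as small as charge the three will run in about the same time; the differences in cost only really show up on large tables, so try it there (in test!) before deciding which mode to schedule.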

If you’re interested in learning a few more of the tips/tricks with using this DMV, check out the following script: Using dm_db_index_physical_stats.zip (2.23 KB)

A favorite tip is that the database in which you want to analyze tables does NOT have to be in 9.0 compatibility mode in order to use this DMV. Don’t get me wrong, you will get errors if you try to use this DMV from within a database that’s not in 9.0 compat mode; however, if you are in master (which is set appropriately and cannot be changed) and then use the first parameter to target a non-9.0 compat mode database, then this DMV works great. However, a second "gotcha" is parameter 2: object_id() resolves a one-part or two-part name in the CURRENT database, so those names only work as expected when you are in the database that holds the table. If you specify object_id('tablename') from master for a table that's in credit, then object_id will return NULL. The query will still run, but against all tables in credit rather than the one you thought you were targeting. If you want to use this DMV across databases, you will need to supply the database name in the first parameter and then make sure that you use 3-part naming for the second parameter.
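Here's a sketch of the cross-database pattern. It assumes the charge table lives in the dbo schema of credit (adjust the 3-part name if yours differs) and adds a NULL check, which is my own addition, so that a typo doesn't silently turn into "scan everything".

```sql
USE master;
GO

DECLARE @db int, @obj int;
SET @db  = DB_ID('credit');
SET @obj = OBJECT_ID('credit.dbo.charge');   -- 3-part name; dbo schema is an assumption

-- A NULL database or object ID means "all", so stop if either lookup failed.
IF @db IS NULL OR @obj IS NULL
    RAISERROR('Database or table not found - check the names.', 16, 1);
ELSE
    SELECT index_id, index_level, page_count, avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(@db, @obj, NULL, NULL, 'LIMITED');
```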

Now that you are getting to know some of the structures (in terms of seeing physical structures and internals), where do we go from here? The best route to start “finding the right balance” for performance is to know the data as well as get some general insight into usage patterns (this is probably the hardest component to know and sometimes you only know exactly what’s going on if you profile what’s actually happening in production). Is that too late? To a certain extent yes and to another extent no… there are still many things for which you can plan and other things you can confirm or test once the application is running (e.g. with Profiler). All of those things together are going to help to “find the right balance”.

Having said that, and having discussed the general internals of a b-tree (and therefore an index structure), what’s the difference between a clustered and non-clustered index? Well… stay tuned, that will be part 3 in this series. And, then (finally), we'll get to appropriate uses for INCLUDE (which was new for SQL Server 2005) and then appropriate uses for Filtered Indexes (a new feature in SQL Server 2008). Also, somewhere in there I'll post a few tips from my TechEd session so that you can start to determine if your indexing strategies are working??

Thanks for reading!
kt

Memorial Day weekend we were in Chicago to celebrate my Father's life. We did a "Celebration of Life" memorial and we had a few drinks (celebratory Meyers, Tonic and lime - which was my Father's favorite drink), we (7 of us) gave a few heartfelt speeches, and a few friends wrote a song (and passed out the words - to which we all sang along) and we grieved... but, in a refreshingly-not-overly-depressing way. I have to admit - it was exactly what I'd want as well. It was a wonderful day filled with memories and friends. After that, we visited with my Mom as well as my Grandmother. Paul blogged a couple of pics (yes, that chair is VERY big!). Then, we were back in Seattle for only one week...back to work...and preparing for TechEd 2008 ITPro week.

As for TechEd being spread over 2 weeks, well... I think it offers some excellent logistical options (smaller size means more possible venues AND/OR it means that they could possibly grow the size for each event). And, for some topics, I think there is a very strong separation between developer and ITPro (admin/ops) but for SQL - I think it's hard to get it perfectly right. I think there are a lot of developers who need to know more about admin/ops just so that they can develop more optimal (and even more manageable) solutions and I think that DBAs should have a really good architectural overview of a lot of features to better administer them. So, for SQL, I'd *love* to hear your comments on what you think............

For Paul and me, the decision is relatively simple: we came for this second week for ITPro/Ops. But, we've also spoken at the developer events (and sometimes we even write/present sessions specifically targeted at developers at our SQL Connections shows and/or at User Groups). We just did a local .NET user group in Redmond and the discussion around Indexes became so popular that we're going back in August (for Indexes) and again (tbd) for Disaster Recovery techniques. Basically, developers tend to say...oh, that's why I should x or y or z....... so, maybe next year we'll hit both? Regardless, I'd still like to know what you think. Were you at the Developers event? Do you wish you could be at both? Are you at both?

As for what we're doing - Paul's already blogged it here: http://www.sqlskills.com/blogs/paul/2008/06/05/OffToTechEdUSITProTomorrow.aspx.

But, I thought I'd do a quick recap so that you can get some insight into our week as well as where to find us to come and chat. We'd love to meet you and/or hear your success (or disaster) stories!

Monday

  • Full day pre-con seminar: SQL Server 2008 Overview for DBAs

This is ACTION packed (and a very full day!) and will include giving out the updated SQL Server 2008 HOLs DVD. We weren't really sure we were going to be able to do it... we didn't really burn too many of the CTP6 version of the DVDs, nor were we sure that CTP6 would still be the most relevant. But, it's still excellent to learn on and this time our DVD includes 17 labs:

HOL Lab Filename

Length

Lab Name

Lab Abstract/Description
Using Policy-based Management.doc 75 minutes SQL Server 2008 Policy-based Management Security, best-practices, proper configuration settings - how do you control these things on one or more server? These hands-on labs show you how to implement and leverage the new policy-based management framework to define and control your business rules and your server's compliance for one or more instances of SQL Server 2008. 
Data Recovery Preventative Techniques.doc 75 minutes SQL Server 2008 Data Recovery and Preventative Techniques Hands-on Lab Can you recover from a dropped table? Can you reconcile tables that have become out of sync due to human error? These hands-on exercises show you how to bring a database back online quickly after a table is dropped as well as how to reconcile the differences between a production environment and a recently restored version of your database - so that you can manually merge the recovered data back into your production database. Once all of the recovery techniques are shown, the last exercise shows how DDL triggers can prevent some of these human errors altogether.
Table and Index Partitioning.doc 75 minutes SQL Server 2008 Table Index & Partitioning Hands-On Lab Table and Index Partitioning allows large tables to be managed more granularly. These hands-on labs show you how to implement and leverage these key features: a partition function, a partition scheme and the sliding window scenario.
Database Mirroring Part I.doc 75 minutes SQL Server 2008 Database Mirroring, Part I Database Mirroring allows you to create a secondary (mirror) database to handle requests either automatically or manually, in the event of a diaster at the principal database. These hands-on labs show you how to implement and leverage as well as when and how to use Database Mirroring. You will setup database mirroring in a High Availability configuration (synchronous mirroring with a witness), see the affects of failover, and see how automatic page repair restores damaged pages in the principal or the mirror.
Peer to Peer Replication.doc 75 minutes SQL Server 2008 Peer-to-Peer Replication Hands-On Lab Replication gives you a scale-out configuration where multiple servers participate in bi-directional transaction replication. Setting up and configuring this topology has a few requirements - many of which are minimized by using the Replication Wizards - but all need to be understood to configure a peer topology correctly. These hands-on exercises show you how to implement a peer topology correctly.
Using Performance Data Collection.doc 75 minutes Performance Data Collection in SQL Server 2008 Performance Data Collection brings together many key tuning features into one cohesive toolset. These hands-on labs show you how to create a Management Data Warehouse, setup and control the collection intervals and analyze the results of system data collection sets.
Instant Initialization.doc 45 minutes SQL Server 2008 Instant Initialization Instant Initialization allows data files of any size to be created instantly - eleminating zero-initialization. These hands-on exercises show you how to configure your server's permissions to leverage instant as well as the security vulnerability created by enabling this feature.
Online Operations.doc 75 minutes SQL Server 2008 Online Operations Hands-On Lab Online Operations are critical to the success of any server that needs to be highly available. These hands-on labs show you how to implement and leverage these key features: online index operations, partial database availability and online piecemeal restore.
Database Development Clients Lab.doc 120 minutes SQL Server 2008: Database Development Hands-On Labs The goal of these hands-on lab materials is to get an understanding of when to use one or more of the advanced features of SQL Server 2008 Database Development. After completing these self-paced labs, you will be able to:
* Set up a Database Project using Visual Studio 2008 Team System Database Edition
* Make changes to the database schema and deploy those changes
* Create and edit a project that uses the LINQ to SQL Object Relational mapper
* Use LINQ to SQL to query and maintain a SQL Server database using the managed classes
* Use LINQ to SQL with stored procedures
* Create and edit a project that uses the ADO.NET Entity Data Model mapper.
* Use the ADO.NET Entity Data Model to model a many-to-many relationship in the database
* Use the ADO.NET Entity Framework classes and LINQ to Entities to query and update a database
* Use Visual Studio 2008 to quickly get an ADO.NET Sync Services application up and running.
* Set up SQL Server 2008 Change Tracking
* Use ADO.NET Sync Services with SQL Server 2008 Change Tracking
Snapshot Isolation.doc 75 minutes SQL Server 2008 Snapshot Isolation Hands-On Lab The goal of these hands-on lab materials is to get an understanding of the appropriate uses of transaction isolation levels as well as how snapshot isolation affects conflicting readers and writers.
Database Mirroring Part II.doc 120 minutes SQL Server 2008 Database Mirroring, Part II Part II of the Database Mirroring HOLs allows you to go through setup, implementation and numerous failover scenarios - step-by-step. While Part I offers quicker setup through SQLCMD scripts, Part II works through the setup process more slowly allowing you to see how things work together. This lab is longer but also goes through changing the mirroring configuration as well as forcing failover. Part I should be completed first and Part II should be completed only if time permits.
Service Oriented Database Architecture.doc 120 minutes SQL Server 2008 Service Oriented Database Architecture Hands-On Lab Manual The goal of these hands-on lab materials is to get an understanding of how and when to use Service Broker in deploying a service-oriented database application.
Database Snapshots.doc 75 minutes SQL Server 2008 Database Snapshots Hands-on Lab The goal of these hands-on lab materials is to get an understanding of how to use the Database Snapshot feature of SQL Server 2008. After completing these self-paced exercises, you will be able to:
* Understand how to create a database snapshot
* Understand how to investigate file sizes and sparse file configuration (using both T-SQL queries and Windows Explorer)
* Understand the benefits and challenges with creating multiple snapshots
* Understand how a database snapshot is created when transactions are in flight as well as when they're not
* Understand how to use database snapshots for testing and reverting databases
* Understand the requirements to drop database snapshots and drop databases that have database snapshots
* Understand how to create a database snapshot on a mirror database
Dynamic Management Views.doc 75 minutes Understanding and Using DMVs Hands-on Lab The goal of these hands-on lab materials is to get an understanding of the more advanced new features of SQL Server 2008 that give access to server information that can be used for performance tuning, server health monitoring, and problem diagnosis. After completing these self-paced labs, you will be able to:
* Determine what DMVs exist, what their input parameters are, how and where their data is stored, and be able to persist DMV data to your own tables.
* Access information from the query plan cache, including determining frequently executed queries and their query plans.
* Access physical statistics information about indexes (e.g. fragmentation).
* Access information about tempdb space utilization.
Resource Governor in Action.doc 45 minutes SQL Server 2008 Resource Governor Hands-on Lab The goal of these hands-on lab materials is to get an understanding of when to use one of the more advanced features of SQL Server 2008: Resource Governor.  After completing these self-paced labs, you will be able to:
* Understand appropriate uses for Resource Governor
* Create Resource Pools
* Create Workload Groups
* Monitor Resource Usage
Understanding Spatial Data.doc 120 minutes SQL Server 2008: Understanding Spatial Data Hands-on Lab The goal of these hands-on lab materials is to get an understanding of one of the newer features of SQL Server 2008: Spatial Data Support. After completing these self-paced labs, you will be able to:
* Understand what spatial data is
* Understand the different types of spatial data
* Create instances of spatial data
* Investigate the properties of spatial data
* Query the relationships between different instances of spatial data
* Integrate spatial data into a managed code application
* Move spatial data between managed code and the database
* Create spatial data graphically using WPF 
Using SQLCMD.doc 75 minutes Understanding Command-line Management with SQLCMD in SQL Server Hands-on Lab The goal of these hands-on lab materials is to get an understanding of command-line management with SQLCMD in SQL Server. After completing these self-paced labs, you will be able to:
* Use SQLCMD with an initialization file, system environment variables and parameters to create customized “master” scripts for automation and administration
* Use SQLCMD and SQL Server Management Studio with the Dedicated Admin Connection for troubleshooting
* Use SQL Server Management Studio to modify and execute SQLCMD mode scripts
* Upgrade databases from SQL Server 2000 using a parameterized script running with SQLCMD

This is a GREAT resource for playing with a lot of these new technologies and it's exciting that we have enough copies to give away to our pre-con attendees! A few of these are featured as HOLs at this year's TechEd as well and some of these can also be found with our prior Jumpstart resources.

Tuesday

  • 13.15 - 14.30 (Room N230) DAT354 Are Your Indexing Strategies Working? (featuring me as speaker/presenter)
  • 15.00 - 16.00 (TechEd Online Stage) Panel: Leveraging SQL Server Technologies to Build a Solid High-Availability Strategy
  • 16.00 - 18.00 DAT track booth (green)

Wednesday

  • 10.15 - 11.30 (Room N220D) DAT375 Corruption Survival Techniques: From Detection to Recovery (featuring Paul as speaker/presenter)
  • 11.30 - 14.45 DAT track booth
  • 15.00 - 16.00 Blogger's Lounge

Thursday

  • 10.15 - 11.30 (Room S230E) DAT363 Essential Database Maintenance (we're co-presenting this one)
  • 11.45 - 13.00 Speaker Idol judging (I'll do my best to play Paula and I have hopes that Paul won't play Simon but he does have that British thing going for him)
  • 14.30 - 18.00 DAT track booth

Friday, we fly home... then, we're going to test all of our dive gear and take a little dive vacation at the end of the month. Hopefully, I'll be able to post a couple of nice underwater shots!

Oh, and I've finally tweaked my Indexing post (the one that survived the drive corruption). Oh, but as an update to that post... Even though I got that drive repaired, almost all of the jpgs, some of the pdfs and even a few of the Office files were still corrupt. Office opened a few of them and "repaired" them on open (which was really cool) but I did lose the photos I had taken that weekend (well, all of the ones that I removed from my camera's SD card). Anyway, I plan to (well, hope to) post the Indexing post (part 2 to this one) tomorrow!

Cheers,
kt

PS - It's hot as hell here... and the humidity is NASTY!!! But, it beats the SOLID rain that we've been having in Seattle...

OK, we were in Iceland and then Florida for our Accidental DBA workshops and both went really well. People agree that there are quite a few involuntary/accidental DBAs out there and overall, we helped quite a few to see a lot of options for better performance, availability, recovery, and/or just manageability.

So, this is our "resources post". We waited until after the SQL Connections delivery to post these as we figured we might add a few more to the list (as is typical when you deliver content more than once - it's really never the same twice!).

Also, I used a few "interactive" (or build) slides in my presentation - specifically on transaction log backups and the concepts of "clearing the log" which really only clears the inactive portion of the log. To help you visualize this, I've added these slides here: TrippRandal_ClearingTheLog-BuildSlides.zip (647.2 KB).
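If you'd rather see the "clearing the log" idea in T-SQL than in slides, here's a minimal sketch. The database name (SalesDB) and the backup path are made up for illustration, and it assumes the database is in the FULL (or BULK_LOGGED) recovery model with at least one full backup already taken.

```sql
-- Hypothetical database name and backup path; adjust for your environment.

-- 1) How full is each transaction log right now?
DBCC SQLPERF(LOGSPACE);

-- 2) What, if anything, is preventing log space from being reused?
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'SalesDB';

-- 3) Back up the log. This marks only the INACTIVE portion of the log as
--    reusable; it does not shrink the file, and any portion still needed by
--    open transactions (or other log consumers) stays active.
BACKUP LOG SalesDB
TO DISK = N'D:\Backups\SalesDB_log.trn';

-- 4) Compare the "Log Space Used (%)" value with step 1.
DBCC SQLPERF(LOGSPACE);
```

If the percentage used barely moves after the backup, the log_reuse_wait_desc column from step 2 is usually the first place to look.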

Finally, we've taken all of the scripts that we demo'ed and placed them on SQLskills on our Past Events page here: http://www.sqlskills.com/pastConferences.asp.

And, if you were there and you think we missed something, feel free to ping me (or Paul!) with an email and we'll make sure to update this resources post (and/or [at least] help you find what you're looking for!!).

Next stop - Microsoft TechEd ITPro in June (we're back in Orlando again)!
kt

In my blog post on my new sp_helpindex proc (sp_helpindex2), I mentioned that the indexes in my sample were not necessarily a recommended set of indexes - just a test set of indexes. So... in this post, I thought I'd start a series on indexes, limitations and best practices/uses... Especially, why/how to best choose when to use INCLUDE v. having columns in your key. To start, I thought I'd give some background, as well as limitations that exist in various releases from 2005 to 2008 CTP6 (Feb CTP), plus what's expected in the SQL Server 2008 RTM (ah... I did say "expected" so don't come back and yell at me if/when I'm wrong ;-))

First, let's go through a few rules and limitations and background:

SQL Server 2005

  • 250 total indexes per table: 1 clustered index and up to 249 nonclustered indexes (no, this is not a goal!)
  • The index key can be up to 16 columns OR 900 bytes - whichever comes first...
  • The leaf level is limited just as a table is limited to 1024 columns (and, all column types are acceptable in the leaf level of an index - even LOB columns)
  • Statistics are kept for every index (so, up to 250 index-related stats) and there can also be statistics on columns or sets of columns in addition to the index-related stats. In earlier releases, statistics used index ids and as a result, the number of statistics was limited to 250 total minus the statistics used by indexes... in SQL Server 2005, they changed to having statistics kept/managed separately (use sys.stats to see them). As a result of using sys.stats, you can now have 2000 statistics on a table, in addition to the 250 (total) indexes and their statistics. If you want to test this out (and check it on various versions of SQL Server), use this script to set up a test database, a test table and then use dynamic string execution to loop through (until it errors) with creating nonclustered indexes and statistics: IndexMax.sql (853.00 bytes).
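To see where a given table stands against these limits, something like the following works as a rough sketch (this is not the IndexMax.sql script). The table name dbo.MyTable is a placeholder. It relies on the fact that an index-related statistic has a stats_id equal to its index's index_id.

```sql
-- Placeholder table name; substitute your own.
DECLARE @object_id int;
SET @object_id = OBJECT_ID(N'dbo.MyTable');

-- One row per statistic, flagged by where it came from.
SELECT
    s.name          AS stats_name,
    s.stats_id,
    s.auto_created,                 -- 1 = created automatically by the optimizer
    s.user_created,                 -- 1 = created with CREATE STATISTICS
    CASE WHEN i.index_id IS NOT NULL THEN 1 ELSE 0 END AS is_index_stat
FROM sys.stats AS s
    LEFT JOIN sys.indexes AS i
        ON i.object_id = s.object_id
       AND i.index_id  = s.stats_id
       AND i.index_id  > 0          -- ignore the heap row (index_id = 0)
WHERE s.object_id = @object_id
ORDER BY s.stats_id;

-- Totals to compare against the per-table maximums.
SELECT
    (SELECT COUNT(*) FROM sys.indexes
      WHERE object_id = @object_id AND index_id > 0) AS index_count,
    (SELECT COUNT(*) FROM sys.stats
      WHERE object_id = @object_id)                  AS stats_count;
```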

SQL Server 2008 CTP6

  • So far, it seems as though most of the maximums have not yet been lifted...
  • 250 total indexes per table: 1 clustered index and up to 249 nonclustered indexes (and this number  - for CTP6 - includes filtered indexes AND spatial indexes too!)
  • The index key limit hasn't changed (it can be up to 16 columns OR 900 bytes - whichever comes first)
  • The leaf level is still limited just as a table is limited to 1024 columns (and, all column types are still acceptable in the leaf level of an index)
  • Statistics in CTP6 seem to be limited to only 2000 stats per table...

SQL Server 2008 RTM (expected/target... no guarantees on this one :)

  • 30,000 columns per table (mostly to allow sparse columns)
  • 1,000 total indexes per table: 1 clustered index and up to 999 nonclustered indexes. This is also not a goal BUT, it makes sense because of both sparse columns and filtered indexes. Both Paul and I will try to post some entries about sparse columns and filtered indexes in the coming days...
  • The index key limit won't change
  • The leaf level will be limited just as a table is limited to 30,000 columns (and, all column types are still acceptable in the leaf level of an index)
  • Statistics are also said to be increasing and likely to 30,000... And, for having extra statistics just sitting around and possibly not being used - well, outside of a minimal amount of disk space taken by the stat blob (which does start to get interesting at 1,000s I suppose), even stats that don't get used don't really create much of a problem. So, I'm OK with this one increasing - even significantly - but I have to admit I'm somewhat nervous about the significant increase in indexes.........

So... you can have A LOT more indexes in SQL Server 2008 but just because you can - DOES it mean that you should?!

And on that - I'll leave you hanging for my next post where I start to talk about WHY they're increasing this (hint: sparse columns and filtered indexes = more columns/more indexes)....

Have fun,
kt

IMPORTANT NOTE: Use the updated version of sp_helpindex2... see this post: Updates (fixes) to sp_helpindex2

OK - so this has been frustrating me for many months... when you create indexes with included columns (which was a new feature of SQL Server 2005), they're not shown by sp_helpindex or by DBCC SHOW_STATISTICS. I understand this not showing for statistics because included columns are not factored into the histogram (that's only the high order element which is the first column in the index) OR the density vector (which only shows the densities (or averages) for the left-based subsets of the key). So, why doesn't sp_helpindex show it? Well... I guess it just didn't get updated for SQL 2005. So, in SQL 2008, I was hoping I'd not only see included columns but also filtered indexes... well, neither is there and sp_helpindex is still the same old proc. Don't get me wrong, you can use SSMS to see all of the index properties for a single index (pane, by pane for each property) OR you can run queries to find the included columns for a given index:

-- Key and included columns for ONE index on ONE table
SELECT
   (CASE ic.key_ordinal WHEN 0 THEN CAST(1 AS tinyint) ELSE ic.key_ordinal END) AS [ID],
   clmns.name AS [Name],
   CAST(COLUMNPROPERTY(ic.object_id, clmns.name, N'IsComputed') AS bit) AS [IsComputed],
   ic.is_descending_key AS [Descending],
   ic.is_included_column AS [IsIncluded]
FROM sys.tables AS tbl
   INNER JOIN sys.indexes AS i
      ON (i.index_id > 0 AND i.is_hypothetical = 0) AND (i.object_id = tbl.object_id)
   INNER JOIN sys.index_columns AS ic
      ON (ic.column_id > 0 AND (ic.key_ordinal > 0 OR ic.partition_ordinal = 0 OR ic.is_included_column != 0))
         AND (ic.index_id = CAST(i.index_id AS int) AND ic.object_id = i.object_id)
   INNER JOIN sys.columns AS clmns
      ON clmns.object_id = ic.object_id AND clmns.column_id = ic.column_id
-- Replace the bracketed placeholders with your own index, table and schema names
WHERE (i.name = N'[MyIndex]') AND ((tbl.name = N'[MyTable]' AND SCHEMA_NAME(tbl.schema_id) = N'[MySchema]'))
ORDER BY IsIncluded, [ID] ASC
but, there isn't a nice clean way to show all of the included columns for all indexes for a particular table... until now :)

A couple of weeks ago I sat down and rewrote sp_helpindex. I was actually on a plane from Hyderabad to Frankfurt or from Frankfurt to San Fran or from San Fran to Seattle (it was a long day :) and I was using (and well, forcing myself to learn how to use :) my new Vista laptop. OK, that's a HUGE story in and of itself and it definitely warrants its own post but I'll sum up the story with the fact that I had to purchase a new laptop while in Hyderabad because BOTH my primary laptop (T61p) AND my backup laptop (T60p) (yes, BOTH!!!) suffered catastrophic disk failures on their boot drives within 24 hours of each other. In the end, I really cannot believe the "coincidence" of two laptops crashing within 24 hours of each other. Yes, I thought MTBF too (at first) but the laptops were two Lenovos - one Lenovo (the T60p) was purchased in Feb 2007 and the second, a Lenovo T61p, was purchased in Oct 2007. And, it was the T61p that went first. The only thing I can even begin to speculate about and/or think to attribute it to (as I was in India for 17 days from Mar 3 through Mar 20 and this all started on Mar 17) was an overactive metal detector at the hotel at which I was staying (or something related to St. Patrick but I've since ruled that out - and no, I wasn't drinking green beer either...).

OK, I really need to do another post to give you all of the details about this trip BUT, I did get a new laptop... and, having bought it only shortly before I flew back, I felt like I really needed to get my money's worth so I just *had* to work on the flights home (ah, security with *3* laptops was NOT fun and I'm *VERY* glad that none of them asked me to "boot" my laptops to prove they were working... that could have been a VERY bad situation... lol).

OK - so back to the story... I was working on the flights and I was preparing to deliver some content on the Friday after I returned (yes, I taught a full day in India on Wednesday then flew back leaving India at 2:15am Thursday morning so that I could arrive back in Redmond at roughly 7pm Thursday night - about 30 hours later - and then teach Friday morning for an 8:30 start time... ah, I was *really* tired on Friday night :). Anyway, in preparing, I decided that I finally needed to re-write sp_helpindex. When I was first writing it, I was only thinking of SQL Server 2005. So, here's the 2005 version that I wrote: sp_helpindex2_2005.zip (2.71 KB).

So, I had wanted to blog that when I got back to Redmond but in preparing for the trip we're on now AND rebuilding my primary and backup laptops, well, it got tabled. So now, today, Paul and I are in Iceland (working with our great friends at Miracle Iceland) and we're teaching "the Accidental DBA" (this past Monday) and SQL Server 2008 New Features in Database Infrastructure and Scalability (Tue through Thursday)... I was giving a lecture on Filtered Indexes in SQL Server 2008 and I, once again, found myself needing a better sp_helpindex. So, when Paul got up to talk about Compression (which is no short lecture for him :), I had time to rewrite sp_helpindex... again. And, here's what I ended up with...

exec sp_helpindex2 'member'

index_name | index_description | index_keys | included_columns | filter_definition
member_corporation_link | nonclustered located on PRIMARY | corp_no | NULL | NULL
member_ident | clustered, unique, primary key located on PRIMARY | member_no | NULL | NULL
member_region_link | nonclustered located on PRIMARY | region_no | NULL | NULL
NCIndexCoveringLnFnMiIncludePhone | nonclustered located on PRIMARY | lastname, firstname, middleinitial | phone_no | NULL
NCIndexCoversAll4Cols | nonclustered located on PRIMARY | lastname, firstname, middleinitial, phone_no | NULL | NULL
NCIndexLNinKeyInclude3OtherCols | nonclustered located on PRIMARY | lastname | firstname, middleinitial, phone_no | NULL
NCIndexLNOnly | nonclustered located on PRIMARY | lastname | NULL | NULL
QuickFilterTest | nonclustered located on PRIMARY | lastname | phone | ([lastname]>'S' AND [lastname]<'T')

So, in the end, I can quickly see whether or not my index has included_columns and/or a filter_definition. Don't get me wrong, these indexes above are NOT necessarily a good combination of indexes (or recommendation of ANY kind) to have - these were just created to make sure that my code works. And, as my good friend Gunnar would say - "it's not my best code but it's not my worst code either" <G>. And, so, here it is: sp_helpindex2_2008.zip (2.75 KB).
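For anyone who wants to see the moving parts without downloading the zip, here's a rough sketch. It is NOT the code in sp_helpindex2. The first part shows index definitions that would plausibly produce two of the rows above (the dbo schema and the exact definitions are assumptions reconstructed from the output), and the second part is one common way to list keys, included columns and filters for every index on a table using the catalog views. The filter syntax and filter_definition column need SQL Server 2008.

```sql
-- Part 1: hypothetical definitions, reconstructed from the output above.
CREATE NONCLUSTERED INDEX NCIndexLNinKeyInclude3OtherCols
    ON dbo.member (lastname)
    INCLUDE (firstname, middleinitial, phone_no);    -- leaf-level only columns

CREATE NONCLUSTERED INDEX QuickFilterTest
    ON dbo.member (lastname)
    INCLUDE (phone_no)
    WHERE lastname > 'S' AND lastname < 'T';         -- filtered index (2008)
GO

-- Part 2: keys, included columns and filter for ALL indexes on one table.
-- FOR XML PATH('') concatenates the column names; names containing
-- characters such as & or < would come back XML-escaped in this simple form.
SELECT
    i.index_id,
    i.name      AS index_name,
    i.type_desc AS index_type,
    STUFF((SELECT N', ' + c.name
             FROM sys.index_columns AS ic
                INNER JOIN sys.columns AS c
                    ON c.object_id = ic.object_id AND c.column_id = ic.column_id
            WHERE ic.object_id = i.object_id
              AND ic.index_id  = i.index_id
              AND ic.is_included_column = 0
              AND ic.key_ordinal > 0
            ORDER BY ic.key_ordinal
              FOR XML PATH('')), 1, 2, N'') AS index_keys,
    STUFF((SELECT N', ' + c.name
             FROM sys.index_columns AS ic
                INNER JOIN sys.columns AS c
                    ON c.object_id = ic.object_id AND c.column_id = ic.column_id
            WHERE ic.object_id = i.object_id
              AND ic.index_id  = i.index_id
              AND ic.is_included_column = 1
            ORDER BY ic.index_column_id
              FOR XML PATH('')), 1, 2, N'') AS included_columns,
    i.filter_definition                              -- NULL when not filtered
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.member')
  AND i.index_id > 0                                 -- skip the heap row
ORDER BY i.index_id;
```

On SQL Server 2005, drop Part 1's WHERE clause and the filter_definition column and the rest should still apply.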

Pretty darn useful for sure! Oh, and I used the undoc'ed sp_MS_marksystemobject so that I could still create the sp_ in master but then execute it in all other databases. It's frustrating that this behavior (with sp_ named objects) no longer works in 2005/2008 but at least sp_MS_marksystemobject still sets the behavior so that we can create this one proc in master but use it in all other databases.
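Here's a minimal sketch of that pattern using a throwaway proc (sp_counttables is a made-up name, not part of sp_helpindex2). Keep in mind that sp_MS_marksystemobject is undocumented and unsupported, so treat this as something to try on a test server first.

```sql
USE master;
GO
-- Hypothetical example proc: counts user tables via a catalog view.
CREATE PROCEDURE dbo.sp_counttables
AS
    SELECT DB_NAME() AS calling_database,
           COUNT(*)  AS user_tables
    FROM sys.tables;
GO
-- Undocumented: flag the proc as a system object so that catalog views
-- referenced inside it resolve against the database it is called from.
EXEC sp_MS_marksystemobject N'sp_counttables';
GO
-- Call it from some other database (tempdb here only because it always exists).
USE tempdb;
GO
EXEC sp_counttables;
GO
```

Without the sp_MS_marksystemobject call, the proc can still be found from other databases because of the sp_ prefix, but the sys.tables reference would be expected to read master's tables rather than the calling database's.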

Have fun!
kt

OK, I still have a way to go in learning about data compression in SQL Server 2008 but one thing that I do know is that nothing is free. So, the trade-off will be performance (i.e. CPU) v. space. And, that's not really a new trade-off with respect to compression. Sometimes that trade-off has other benefits that still minimize the overall cost (for example, backup compression compresses in-memory and before it goes to disk... this actually makes the overall backup process faster because the actual backup written to disk is smaller). However, if we're talking about data and data access, then we need to think more about how the data is going to be used as well as the impact on performance. I can definitely think of many reasons to compress older (and read-mostly, if not read-only) data (mostly due to volume) but depending on the queries and the impact to uncompress it (based on the volume of data being accessed), I'm going to do a lot of testing before I compress high performance/OLTP data. To help estimate the savings on space, SQL Server 2008 offers a stored-proc: sp_estimate_data_compression_savings.
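As a quick sketch of what calling it looks like (the schema and table names are placeholders, and the parameter list shown is the one documented for the released product, so a CTP build may differ):

```sql
-- Placeholder schema/table names; NULL for @index_id and @partition_number
-- means "all indexes" and "all partitions".
EXEC sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'SalesHistory',
    @index_id         = NULL,
    @partition_number = NULL,
    @data_compression = N'PAGE';    -- or N'ROW' / N'NONE'
```

The proc samples the data into tempdb and compresses the sample, so the numbers are estimates - and they say nothing about the CPU side of the trade-off, which still needs real testing.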

Compression in SQL Server 2005
SQL Server 2005 offers the ability to have read-only data compressed using Windows NTFS file compression. File compression is only supported for secondary non-primary data files and only when they're set to read-only. If the entire database is set to read-only then all files (incl. the primary and log) can be on compressed drives. While supported, and it can make sense to do this when you have large amounts of historical data, it's still not very granular.

The other form of compression in SQL Server 2005 was introduced in SP2 as data compression for the decimal/numeric data types, called vardecimal. First, you enable compression at the database level and then you turn it on at the table level. The primary form of compression used by vardecimal is when your actual values are generally much smaller than the defined/declared decimal/numeric column. For example, if you've chosen to define a lot of columns as precision/scale (38,4) then as a decimal column each value (per column, per row) will take 17 bytes whether you use all of it or not. If you only store the value 87.5 (which would normally take only 5 bytes as a decimal(3,1)) then you're wasting 12 bytes. This form of compression will still be supported in SQL Server 2008 so if you're interested in how the vardecimal type works, check out this whitepaper. As for the new forms of compression... row-level compression is similar to vardecimal, but the other forms are quite different, and very interesting (especially the page-level dictionary compression)!
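To make the byte math concrete, and to show the two-step enablement, here's a small sketch. SalesDB and dbo.SalesHistory are placeholder names, and the vardecimal procs assume SQL Server 2005 SP2 or later on an edition that supports the feature.

```sql
-- Fixed-length decimal storage depends on the DECLARED precision,
-- not on the value stored. This should report 17 and 5, matching
-- the figures above.
SELECT DATALENGTH(CAST(87.5 AS decimal(38,4))) AS bytes_as_decimal_38_4,
       DATALENGTH(CAST(87.5 AS decimal(3,1)))  AS bytes_as_decimal_3_1;

-- Step 1: enable vardecimal storage at the database level.
EXEC sp_db_vardecimal_storage_format N'SalesDB', N'ON';

-- Step 2: turn it on for an individual table.
USE SalesDB;
EXEC sp_tableoption N'dbo.SalesHistory', 'vardecimal storage format', 1;
```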

Compression in SQL Server 2008
In addition to offering support for NTFS file compression and vardecimal, SQL Server 2008 offers row-level compression or page-level compression (which includes row-level compression) AND it offers the ability to turn these on at the partition-level or at the table-level for all partitions. While I think the per-partition option is excellent, you might still want to separate your OLTP and read-only data into separate tables for other benefits (like online index operations which I mentioned here) but, the "table-level only" options are certainly starting to decrease! And, more granular options always means better manageability.
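Here's roughly what the per-partition v. whole-table choice looks like in T-SQL. This is a sketch using the syntax from the released product, with dbo.SalesHistory as a placeholder partitioned table.

```sql
-- Compress just one (say, older and read-mostly) partition with PAGE compression.
ALTER TABLE dbo.SalesHistory
    REBUILD PARTITION = 1
    WITH (DATA_COMPRESSION = PAGE);

-- Or apply ROW compression to every partition of the table.
ALTER TABLE dbo.SalesHistory
    REBUILD PARTITION = ALL
    WITH (DATA_COMPRESSION = ROW);

-- Check what each partition of each index ended up with.
SELECT index_id, partition_number, data_compression_desc, rows
FROM sys.partitions
WHERE object_id = OBJECT_ID(N'dbo.SalesHistory')
ORDER BY index_id, partition_number;
```

ALTER TABLE ... REBUILD only touches the heap or clustered index; nonclustered indexes keep their own setting and are changed with ALTER INDEX ... REBUILD.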

So, how does compression work in SQL Server 2008:
   Paul wrote about backup compression here.
   Sunil wrote about data compression here and here.
   Chad Boyd wrote about both here.

Paul and I will post more on compression... I really want to get some numbers regarding performance and Paul will dive into all of the internals using DBCC PAGE (go figure! :).

Enjoy!
kt

OK, so thought I'd do a follow up to the post I did a couple of days ago titled: The perils of case-insensitive data (and our life in tangent-land). The reason I'd like to followup on it is that I received some excellent comments and I want to make sure that you're all aware of the tips/tricks and recommendations that there were (some of you may not have returned to see all of the comments). Really, I was impressed by the speed at which people responded as well as the great comments (and things I learned!). It just reminds me of the fact that none of us can know everything AND that our SQL community is awesome in its willingness to share and communicate.

As for the tips/tricks and "yes, duh!" realizations I came to... here are the interesting points from the comments:

First - why did my comparison work for a single character (e.g. '%A%') but not when I did a character range (e.g. '%[A-Z]%')? Well, it was because it was unicode! This was a "right! duh!" realization that I think I dreamed after I wrote this BUT, Hugo Kornelis is exactly right in his comment. Thanks Hugo! Here is a direct cut/paste of his comment:

The reason [A-Z] doesn't work, is that a collation doesn't just govern case sensitive vs case insensitive but also (amongst others) the sort order of letters. And most case sensitive collations sort like A - a - B - b - ... - Z - z. So [^A-Z] would include all letters except the lowercase z.

You can use [A-Z] to find uppercase characters in a binary collation (since all uppercase characters are in one range of ASCII, and all lowercase characters in another), but not in any other collation.

And, you can check out more from Hugo on his blog: http://sqlblog.com/blogs/hugo_kornelis/default.aspx
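To try Hugo's point about binary collations without touching real data, here's a small self-contained sketch. The sample names are made up, and Latin1_General_BIN is just one binary collation you could pick.

```sql
-- Made-up sample data in a case-insensitive column.
DECLARE @Names TABLE (LastName nvarchar(50) COLLATE Latin1_General_CI_AS);

INSERT INTO @Names (LastName)
SELECT N'Adams'    UNION ALL
SELECT N'adams'    UNION ALL
SELECT N'McDonald' UNION ALL
SELECT N'mcdonald';

-- Under a binary collation, [A-Z] is a range of code points, so it matches
-- uppercase letters only. NOT LIKE therefore keeps the names with no
-- uppercase letter at all; this should return only the all-lowercase rows.
SELECT LastName
FROM @Names
WHERE LastName COLLATE Latin1_General_BIN NOT LIKE N'%[A-Z]%';
```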

Second - the comparison query that I wrote all together (where I stated each letter individually in the WHERE clause) only took a few seconds to write (thanks to cut and paste :) AND it did work... And, sometimes getting something to work and moving on is all we can do (come on - you've ALL been there, eh? :). However, my main comment was that "it wasn't pretty". A much more elegant and unbelievably simple solution came from David R Buckingham (aren't the great answers always the really simple ones :)). Here is a direct cut/paste of his comment:

The following query will return any fully lower case names in the table:

SELECT LastName
FROM Person.Contact
WHERE LastName COLLATE Latin1_General_CS_AS_KS_WS = LOWER( LastName ) COLLATE Latin1_General_CS_AS_KS_WS

I don't believe that David has a blog... maybe he should :).

Third - a very cool and clever trick that came in from Denis Gobo is related to the performance of repeatedly doing case-sensitive searches on a case-insensitive column. I suggested that creating an additional column (preferably a computed column that uses the case sensitive collation) would be an easy and optimal solution. This is still definitely true when the case-insensitive values are NOT selective enough to warrant using an index and the case-sensitive values are... However, if both the case-sensitive AND the case-insensitive values are reasonably selective then the trick that helps is from his comment. Here is a direct cut/paste of his comment:

Kimberly, the way to force an index seek is to do this

SELECT *
FROM MyTestContacts
WHERE Lastname = N'adams'
AND Lastname COLLATE Latin1_General_CS_AS_KS_WS = N'Adams'

The WHERE might return more than one row but the AND will return only the case sensitive one

I wrote about that a while back here:
http://sqlservercode.blogspot.com/2007/05/make-your-case-sensitive-searches-1000.html 

And, you can check out more from Denis on his blog: http://sqlservercode.blogspot.com/
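For comparison, the computed-column approach I mentioned might look something like this. It's a sketch only: the column and index names are made up, and it reuses the MyTestContacts table from Denis's example.

```sql
-- Add a computed column that carries the case-sensitive collation.
ALTER TABLE dbo.MyTestContacts
    ADD LastNameCS AS (Lastname COLLATE Latin1_General_CS_AS_KS_WS);
GO

-- Index it so that case-sensitive lookups can seek directly.
CREATE NONCLUSTERED INDEX IX_MyTestContacts_LastNameCS
    ON dbo.MyTestContacts (LastNameCS);
GO

-- Case-sensitive search against the computed column.
SELECT *
FROM dbo.MyTestContacts
WHERE LastNameCS = N'Adams';
```

The trade-off v. Denis's trick is the extra index to store and maintain, so it tends to pay off only when the case-sensitive search is frequent and the case-insensitive value alone isn't selective.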

Now, as for the issues related to creating a view in a database that has a different collation from the server's collation... Here, I'm fairly certain that there's still a bug. However, I'm happy to say that I don't think that it's the most likely situation that exists for collations. I think the two most likely situations are:

  1. The server has one collation. The database inherits that collation. The database developer makes column level collation changes throughout the db. This seems to work well. OR
  2. The server has one collation. The database has a different collation. The database developer consistently uses that collation throughout their app. A good example of this is where people have case-sensitive databases on case-insensitive servers. This works fairly well (although there are some issues with respect to temp tables, etc. and COLLATE database_default is a good thing to know).

I guess there's even a third one where column level changes are made in a database whose collation is different from the server but where there aren't any views that also change the collation to yet a different collation (and this is where there seems to be a bug).
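The temp table issue in scenario 2 comes from tempdb using the server's collation. Here's a sketch of the usual workaround with COLLATE database_default (dbo.Contacts is a placeholder table in the user database):

```sql
-- Without the COLLATE clause, this column would pick up tempdb's collation
-- (the server's), and the join below could fail with a collation conflict
-- when the user database's collation differs.
CREATE TABLE #Lookup
(
    LastName nvarchar(50) COLLATE database_default NOT NULL
);

INSERT INTO #Lookup (LastName)
SELECT DISTINCT LastName
FROM dbo.Contacts;

SELECT c.*
FROM dbo.Contacts AS c
    INNER JOIN #Lookup AS l
        ON c.LastName = l.LastName;   -- both sides now share one collation

DROP TABLE #Lookup;
```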

So, this was an excellent (and reasonably fun :) :) exercise to go through wrt collations. And, this is how I (we?) learn! I really want to thank everyone for reading - and commenting/sharing! - the things they learned/knew. That's part of why I love the SQL Server community. And, speaking of which, I thought I'd end this entry with a few community links - as a reminder to everything that's out there:

Thanks for reading! Thanks for commenting!
kt



OK, so after SQL Connections in Las Vegas, Paul and I head off to Barcelona for the second week of TechEd's two week event (week one for developers and week two for IT professionals). November's definitely a busy month. So, if you're in the US - we hope to see you at Connections and if you're in Europe, we hope to see you at Microsoft TechEd ITForum 2007.

 

Here's what we're doing:

Sessions

  • (DAT205) The Next Release of Microsoft SQL Server: Manageability Overview

The next release of SQL Server will contain exciting new manageability features targeted at reducing total cost of ownership. Come learn more about what's in store in areas like policy-driven administration and performance data collection and analysis. The session focuses on the database engine.

  • (DAT301) SQL Server Indexing - Unravelling the Unknown

Knowing tips and tricks to indexing is extremely helpful and will help you to solve "known" query problems. But what's lurking in the unknown? Is SQL Server using your indexes? Or, do you have a bunch of indexes sitting around wasting space and negatively impacting performance? Finally, SQL Server 2005 has an answer! SQL Server 2005 DMVs (Dynamic Management Views) can provide you with valuable information about your current indexing strategies, what should be removed, and even what's missing. Do you know how to find this information, leverage it, and then programmatically respond to it? Come to this session to learn a few tips and tricks as well as how to figure out the unknown indexing problems!

  • (DAT305) Secrets to Fast Detection and Recovery from Database Corruptions

How can you tell whether your data is corrupt? If you have corruption, how do you work out what’s wrong with the database? How do you ensure you have a valid backup? If you don’t have a valid backup, how and what do you repair? If you do have a backup, how do you work out whether you should restore or repair? And at what granularity? How do you go about determining what went wrong in the first place? It’s all about limiting downtime and data-loss when a corruption occurs - from knowing the tools to understanding the choices to planning a successful strategy. Some of the features discussed:

  • Torn-page detection and page checksums
  • IO read-retry
  • Backup checksums
  • Consistency checks (DBCC CHECKDB and related commands)
  • Database repairs

Facing database corruption is almost inevitable in every DBA's career - make sure you're prepared when it happens to you.

Chalk-talks

  • (DAT01-IS) SQL Server Upgrade Best Practices, Tips, and Tricks

Even though SQL Server 2005 has been out for a while, many companies are only just getting ready to upgrade. Come to this session to learn best practices, tips, and tricks distilled from two years of customer experiences. We'll also touch on some issues you'll face going to SQL Server 2008 when it's released next year. Come to this session to learn and share - bring your questions and experiences!

  • (DAT02-IS) SQL Server 2005 Database Mirroring: Setup to Implementation to Monitoring

Database Mirroring is one of the most exciting technologies in SQL Server 2005. With more and more people including it in their Disaster Recovery Strategies, it's important to know when to implement Database Mirroring as well as the implications of the architecture you choose. In this session there will be no slides, just demos that will explore how Database Mirroring works in its various configurations and how that may affect your performance. Join us and see database mirroring in action and get your questions answered!

  • (DAT07-IS) DBCC Internals

All DBAs should have heard of (and used) DBCC – especially for consistency checking. Get down deeper than 400-level with this session on how some of the most important DBCC commands work. Topics covered include CHECKDB, SHRINKFILE, INDEXDEFRAG, and more.

Instructor-led labs

  • (DAT03-ILL) SQL Server Always On Technologies Instructor-Led Lab: Part 1 - Database Mirroring

See Database Mirroring in action! From implementation to monitoring to failover, database mirroring provides an ideal solution for many disaster recovery scenarios and this session will prepare you to handle them with minimal downtime or data loss.

  • (DAT04-ILL) SQL Server Always On Technologies Instructor-Led Lab: Part 2 - Database Snapshots

Database Snapshots are useful in many situations: database maintenance, data recovery, and point-in-time data access. You can even create a snapshot on a mirror database to get better return-on-investment (ROI) on your high-availability (HA) investments. In this session we will explain how database snapshots work as well as go through several exercises, including working with multiple database snapshots and creating database snapshots on a mirror database.

  • (DAT05-ILL) SQL Server Always On Technologies Instructor-Led Lab: Part 3 - Online Operations

The bane of any DBA's life is to have to take data offline to perform maintenance or recover from a disaster. The various Online Operations in SQL Server 2005 alleviate much of this frustration. This session will show you how to move a table online for better isolation and control, partition a table online, access a database that's partially damaged, and perform online piecemeal restore.

Lunchtime Demos

  • (DAT01-PD) Database Recovery Techniques

In this fast-paced demo session nasty things will be done repeatedly to a database. Then the methods and approaches to recovery will be shown. Not for the faint-hearted!

So, just like SQL Connections the week before, serious amounts of info with tips and tricks for you to take home and implement!

It's going to be a great week. We hope to see you there!!

Kimberly (and Paul)


OK, it's about that time again - the Fall conference season is here! Building on our co-presented Database Maintenance workshop at SQL Connections in Orlando, Paul and I are doing a *ton* of stuff at SQL Connections this Fall. The conference is back at the Mandalay Bay hotel and officially runs from November 5th to 8th, with pre-con workshops on the 5th. But, after Spring, we decided to add so much recovery content to our maintenance content that we're now running the maintenance content as a pre-pre-conference workshop on the 4th and then, on the 5th, we have all new content on Disaster Recovery and Lessons Learned. AND, after the week of sessions, we decided to add a HANDS-ON (bring your own laptop) post-con workshop on the 9th! Our day off is Tuesday the 6th as it's Microsoft day... and the session line-up looks great, with lots of juicy details about SQL Server 2008 - as well as some best-practices sessions for those of you who are happy with SQL Server 200x for now.

 

 

Here's what we're doing:

Workshops

  • November 4th - Pre-pre-con: SPR301: SQL Server Database Maintenance: From Planning to Practice to Post-Mortem

No matter how much effort you spend on the design of your database, if you don't maintain it in production then it will suffer from performance and manageability problems. The key to continued performance and smooth operations is a well thought-out and automated database maintenance plan. This full-day workshop has three sections: planning, practice, and post-mortem. Planning for database maintenance actually starts with database design, so one of the things covered will be how to avoid design choices that limit database maintenance or contribute to maintenance problems. We'll discuss a laundry-list of maintenance problems and then explore how to tell if you need to mitigate them, strategies and best-practices for doing so, and how to avoid having your mitigation choices cause unforeseen and undesirable side-effects. Topics covered will include database files (shrink, grow, virtual log files, log size/management), consistency checks and corruption detection, fragmentation, statistics, backup/restore (options, granularity, strategies) and recovery models. The workshop will vary between 200-400 level covering ALL the key concepts of maintenance features. Finally, we'll spotlight some real-world examples where people made good and bad choices and discuss how you can repeat or avoid them, respectively. If you're wondering how to bring your database back under control, and keep it there, then this full-day workshop will help you tame maintenance problems whether you're a full-time system administrator or a reluctant DBA.

  • November 5th - Pre-con: SPR303: SQL Server Disaster Recovery: From Planning to Practice to Post-Mortem

Every DBA's nightmare is having down time and data loss and not knowing how to recover. However, designing and implementing a successful disaster recovery strategy is easier said than done. It's about asking all the right questions and figuring out all the best answers for your situation. This full-day workshop has three sections: planning, practice and post-mortem. Planning is a critical part of disaster recovery, but the most-often disregarded. Topics we'll cover here include: How do you choose technologies to fit requirements and effectively use key features of SQL Server 2005? How do technology choices affect workload performance? Putting a well-thought-out plan into practice requires even more planning and in this section we'll discuss technology implementation, building step-by-step recovery/operation guides for when disasters happen, and, most importantly, testing your plan by simulating real problems. In the final section, we'll spotlight some real-world examples where people made costly mistakes and show you how they could have been avoided with a little planning and practice. If you've ever had nightmares about disaster recovery (or actually had a disaster!) and been at a loss for what to do, then this full-day workshop will give you the direction and technical details you need for success!

  • November 9th - Post-con: SPS302: SQL Server - Put Your Knowledge Into Action (Bring Your Own Laptop)

After a week of learning and watching demos - spend your last conference day putting your knowledge into action and diving deeper into the implementation details. Bring your own laptop to install our VPC environment setup with hands-on lab exercises to walk you through some of our most important features in Database Maintenance and Disaster Recovery. All labs will be ILLs (instructor-led labs) with supporting hands-on lab content *and* you will walk away with your own copy of the DVD to continue the exercises back at your office. You can attend without a laptop but your experience will be significantly better with one! This is meant as an advanced workshop and you're expected to bring a reasonable laptop configuration in order to participate:

* Virtual Server or Virtual PC - already installed
* At least 1GB of physical memory w/512MB dedicated to the VPC environment (2GB is preferred w/1GB dedicated to VPC)
* 12 GB of physical disk space (20+ GB is preferred)
* DVD Drive

Sessions

  • SDB351: Follow the Rabbit - Interactive Q&A on Availability

In this session, Kimberly Tripp and Paul Randal will have only 5-10 slides. The focus of this session is on mixing availability technologies to create the best overall architecture to minimize downtime and data loss. In general, we're going to focus on best practices and then open up to your questions so that you can drive the discussion! This session might not seem as structured as other sessions, but you'll be surprised at how informative and fun it is! Grab your lunch and come back - we'll probably still be hanging out!

  • SDB350: SQL Server Table Strategies - Designing for Performance and Availability

Often tables are designed based solely on the data that needs to be tracked (here's a column name, here's a data type - done!). Unfortunately, design does not usually take into account how the data is going to be used OR how SQL Server uses the data. Knowing the internals of table structures as well as the optimizations that come with good design will make your database truly scalable. Come to this session to learn some internals as well as various design strategies such as vertical and horizontal partitioning. Additionally, are there any other features that require changes in your design and thinking? For example, online index operations impact design because of the limitations that exist with partitioning and LOB columns. If you want to scale, you need to be here!

  • SDB347: SQL Server Indexing for Performance - Finding the Right Balance

In terms of performance tuning, there are few silver bullets. If I had to choose ONE area that improves performance the most (when designed appropriately!), it's indexing. However, indexing strategies depend on the data and even more so, the usage of the data. Come to this session to see what indexing strategies help the base table the most as well as how to optimize your worst performing queries.

  • SDB348: SQL Server Indexing Strategies - Are You Sure?

Knowing tips and tricks to indexing is extremely helpful and will help you to solve "known" problems. But what's lurking in the unknown? Is SQL Server using your indexes? Or, do you have a bunch of useless indexes? Finally, SQL Server 2005 has an answer! SQL Server 2005 DMVs (Dynamic Management Views) can provide you with valuable information about your current indexing strategies, what should be removed, and even what's missing. Do you know how to find this information, leverage it, and then programmatically respond to it? Come to this session to figure it out!

  • SDB349: Follow the Rabbit - Interactive Q&A on the Storage Engine and the Relational Engine

In this session, Kimberly Tripp and Paul Randal will have only 5-10 slides. Each slide covers topics for discussion as well as the reason(s) why something might be behaving badly and/or things to try to solve your problems. In general we're going to focus on best practices and then open up to your questions so that you can drive the discussion! Paul will focus on the SE (Storage Engine) and internals and Kimberly will focus on the RE (Relational Engine) and query tuning/performance. This session might not seem as structured as other sessions, but you'll be surprised at how informative and fun it is!

At this event, you'll be able to get what most conferences offer (and that's breadth - in terms of session choices, etc.) but with the large number of workshops and the detailed planning that went into sequencing the conference sessions, you'll also get depth that no other conference offers. Serious amounts of tips and tricks for you to geek-out on with us and take home to immediately apply.

It's going to be a great week. We hope to see you there!!

Kimberly (and Paul)

OK, let me start by saying that I absolutely love when a feature improves in granularity options. Better granularity in locks means that contention is reduced and concurrency improved. And even though the overhead to manage smaller locks (and typically more of them) is usually higher - the improved concurrency benefits often significantly outweigh the costs. Additionally, design is often simplified as more granular locks typically mean you don't have to work as hard to minimize contention. Let me give you some history...

In the old days (ok, remember, I started working with SQL Server when I was 12 :) ), SQL Server used to have page-level locking (all releases prior to SQL Server 6.5 sp3). In SQL Server 6.5 sp3 they made an internal change to allow "insert row locks" but that was very targeted in what it improved (in terms of locking). However, in SQL Server 7.0 the locking architecture completely changed (as did the SE and most of the RE) and that's where they introduced true row-level locking. This significantly reduced the complexity of table design. No longer did we have to choose clustered indexes to remove page-level locking (and therefore insert hotspots). And, in fact, some designs improved their performance solely by upgrading. The nice thing about internal changes like these is that you can get away without knowing all of the internals, not worry as much about design and yet still get gains in performance. All of which is good.

However, if you do know the internals and you leverage this knowledge then you might be able to see even greater gains. With the change in locking from page to row (as well as other changes to the internal dependencies of non-clustered indexes on the clustering key), databases whose indexing strategies changed between 6.5 and 7.0 made the greatest gains in performance. How did they change? I've blogged about "the clustered index debate" a few times so I'll stay away from that one here... but, the key point is that while these changes might allow you to do more with less work - a bit more work to truly leverage the new features/changes might result in the best combination!

And so, that's what brings me to partition-level lock escalation. This is an absolutely necessary step to truly allowing SQL Server to treat partitions like mini-tables. Here are a few of the concerns I've had with regard to SQL Server 2005 table and index partitioning:

* lock escalation can still occur between the read-only and read-write portions of your partitioned table if the read-only portion is accessed by large queries that escalate (in SQL Server 2005 escalation is either row to table or page to table)
* indexes must be exactly the same for all partitions (not related to escalation but it does have bearing on my solution)
* index rebuilds are supported at the partition level; however, partition-level rebuilds must be performed OFFLINE. Only table-level index rebuilds can be performed online (again, not directly related to escalation but it's another problem around blocking)

And, this last one is very frustrating to me in general as I'm finding more and more environments moving to "real-time analysis" where they want to do queries on as-close-to-current data as possible. In fact, real-time data warehousing in a relational database is one of the primary areas of improvements for SQL Server 2008 with features such as partition-level lock escalation, improvements in indexed views, grouping sets, and star join optimizations - just to name a few.
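
For reference, here is a minimal sketch of how partition-level escalation is switched on in the released SQL Server 2008 product. This post was written against pre-release builds, so treat the syntax as something to verify on your own build. The table name dbo.EscalationDemo is hypothetical and exists only to make the script self-contained.

```sql
-- Hypothetical demo table; any existing table could be used instead
CREATE TABLE dbo.EscalationDemo (ID int NOT NULL);
go

-- AUTO lets a partitioned table escalate to the partition level;
-- on a non-partitioned table, AUTO still escalates to the table
ALTER TABLE dbo.EscalationDemo SET (LOCK_ESCALATION = AUTO);
go

-- Check the current setting (TABLE is the default, DISABLE is the third option)
SELECT name, lock_escalation_desc
FROM sys.tables
WHERE name = N'EscalationDemo';
go

DROP TABLE dbo.EscalationDemo;
go
```

The setting is per table, so the default behavior (escalate to the table) stays in place until you opt in.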

So, in terms of partition-level lock escalation: am I happy that it's going to be there? For sure! However, the other two issues mentioned above might not change. Having different indexes at the partition level is likely to come through a feature called "Filtered Indexes" which has not yet appeared in any CTPs but has been discussed at conferences/events. So, we might solve 2 out of 3 but what about online index rebuilds at the partition level? At this point, I'm pretty sure that they won't be able to solve that for SQL Server 2008... As a result, I would suggest a slightly different architecture. Instead of using only a single partitioned table for both read-only and read-write data, use at least two tables: one table for read-only data and another for read-write data. If you think this might be defeating the purpose of partitioning... then look at these benefits:

* the read-only portion of the table (which is typically the *much* larger portion of the table) can still be managed with partitioning
* the read-only portion - once separated from the read-write - can have additional indexes for better [range] query performance
* the read-only portion of the table can actually be partitioned into multiple partitioned tables - to give better per-table statistics (statistics are still at the table-level only so even if your partitioning scheme is "monthly" you might want to have tables that represent a year's worth of data...especially if your trends seem to change year to year)
* large range queries against the read-only portion of the data will only escalate to the "table" (which is now separated from the read-write data)
* the read-write portion of the data can have fewer indexes
* the read-write portion of the data can be placed on different disks (MORE fault tolerant disks) due to the importance/volatility of the data
* finally, and most importantly, the read-write portion of the data can be maintained completely separately from the read-only portion with regard to index rebuilds

So, then how do you make it appear as one table? Use partitioned views over partitioned tables and consider using a synonym for the hot/insert table. At the end of each month (or whatever your partitioning strategy uses - daily, weekly, monthly, etc.), "switch" the read-write portion of the table into the read-only portion of the table. You should be able to do all of this with no data movement and the synonym used for inserts will mean that your applications don't need to change either.
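
The pieces described above could be wired together roughly as follows. This is a sketch under stated assumptions, not a production script: every object name (pfMonthly, psMonthly, OrdersHistory, OrdersCurrent, Orders, OrdersInsert) is hypothetical, everything is placed on the PRIMARY filegroup for simplicity, and table partitioning needs an edition that supports it.

```sql
-- Monthly partitions for the read-only data (hypothetical boundaries)
CREATE PARTITION FUNCTION pfMonthly (datetime)
AS RANGE RIGHT FOR VALUES ('20071001', '20071101', '20071201');
go
CREATE PARTITION SCHEME psMonthly
AS PARTITION pfMonthly ALL TO ([PRIMARY]);
go

-- Read-only history: partitioned by month
CREATE TABLE dbo.OrdersHistory
(   OrderID    int      NOT NULL,
    OrderDate  datetime NOT NULL,
    Amount     money    NOT NULL,
    CONSTRAINT OrdersHistoryPK PRIMARY KEY CLUSTERED (OrderDate, OrderID)
) ON psMonthly (OrderDate);
go

-- Read-write "hot" table: same structure and filegroup as the target
-- partition, plus a CHECK constraint that matches the partition's range
CREATE TABLE dbo.OrdersCurrent
(   OrderID    int      NOT NULL,
    OrderDate  datetime NOT NULL
        CONSTRAINT OrdersCurrentCK
            CHECK (OrderDate >= '20071101' AND OrderDate < '20071201'),
    Amount     money    NOT NULL,
    CONSTRAINT OrdersCurrentPK PRIMARY KEY CLUSTERED (OrderDate, OrderID)
) ON [PRIMARY];
go

-- One logical table for readers
CREATE VIEW dbo.Orders
AS
SELECT OrderID, OrderDate, Amount FROM dbo.OrdersHistory
UNION ALL
SELECT OrderID, OrderDate, Amount FROM dbo.OrdersCurrent;
go

-- One stable name for writers
CREATE SYNONYM dbo.OrdersInsert FOR dbo.OrdersCurrent;
go

INSERT dbo.OrdersInsert (OrderID, OrderDate, Amount)
VALUES (1, '20071115', 10.00);
go

-- Month end: a metadata-only move into the (empty) November partition
ALTER TABLE dbo.OrdersCurrent
SWITCH TO dbo.OrdersHistory PARTITION 3;
go
```

After the switch, dbo.OrdersCurrent is empty, so preparing for the next month comes down to adding the next boundary to the partition function and replacing the CHECK constraint (or creating a new hot table and re-pointing the synonym).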

In summary, I do like the partition-level lock escalation feature especially as it doesn't require rearchitecting your solution/design. However, by creating two or more tables where read-only data is isolated from read-write, you can leverage many other features (like online index rebuilds).

If you're interested in hearing more about lock escalation at the partition level, check out Paul's recent blog entry on it: SQL Server 2008: Lock escalation changes.

Have fun and thanks for reading!
kt

OK, have you ever been working on one thing...that led you to another (and another and another) and then you seem to have lost hours? OK, I know. That's our life [in the computer industry and I'm sure others!] - putting out fires and chasing strange behaviors that we eventually call "gremlins" when we really can't figure them out (especially when they don't repro). And, I know that we all want to be incredibly prepared but, sometimes bugs happen. And, sometimes bugs lead to serious problems possibly even data corruption/loss (which I've seen) and NO, I'm not directly relating this to anything about SQL Server. I'm just wanting to stress the necessity of a backup strategy (ah, a *tested* backup strategy) but, the bugs I've run into today are really not all that serious (they do NOT corrupt data). But, they do lead me to believe that far fewer changes are made to collations than I had thought? Or, that many of you change collations at the most granular level (probably at the column level?) and that database collation changes are done but without additional column level changes later.  

And, that's really the point of this blog post... for now, I'm going to recommend that you make changes at the column-level OR you don't make additional changes AFTER you've changed a particular database's collation. In other words, if you have a case insensitive server and a case sensitive database then things will probably work well. You can even leverage things like database_default for temp tables. However, if you try to make additional changes to collations in other objects - such as views - it doesn't seem to work. Basically, I've run into problems creating views with different collations only when the database's collation is different than the server's collation. So far, that's the only thing that I've found that's wrong with what I've been doing. And, I didn't even figure this one out on my own - I did a live search on the error and found this: http://cc.msnscache.com/cache.aspx?q=72171562874629&mkt=en-US&lang=en-US&w=286a60c3&FORM=CVRE which seems like the same problem I'm having (and sorry for the cached page, I couldn't seem to get to the live page).
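
As a quick illustration of the temp table point: temp tables live in tempdb, so a character column declared without a COLLATE clause picks up tempdb's collation, while COLLATE database_default picks up the collation of the database you are currently in. The sketch below uses a hypothetical temp table, #ContactNames, and only shows a difference when the two collations actually differ.

```sql
-- #ContactNames and its column names are hypothetical
CREATE TABLE #ContactNames
(   LastNameTempdb  nvarchar(50),                          -- tempdb's default collation
    LastNameDb      nvarchar(50) COLLATE database_default  -- current database's collation
);
go

-- Compare the collation each column actually received
SELECT name, collation_name
FROM tempdb.sys.columns
WHERE object_id = OBJECT_ID(N'tempdb..#ContactNames');
go

DROP TABLE #ContactNames;
go
```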

Regardless of this issue (is it a bug?), the real reason for this blog post is that a great discussion came up on the Regional Director tech alias (it's an internal thing we use to leverage each other's skills). The original question led to a few discussions and in the end, I think there are really two questions that I thought I'd discuss here:

#1) do you want a ONE-TIME way of checking a bunch of data to find rows that are lower-case (and shouldn't be)
#2) do you want to REPEATEDLY find rows based on a case-sensitive search (where the data is stored case-insensitive).

In my first response, I answered #2. And, I'm going to start with that one here too. If you want to query a case-insensitive column with a case-sensitive search then changing the collation on the fly (with a where clause) works (although there are some performance issues related to this). So, I took an old example of mine (which was against pubs) and I decided that it needed a refresh (meaning, I wanted to update this to work against AdventureWorks). And, that's where half of my fun today started since this is where I've run into what I think is a bug. Anyway, let's start with what works:

-- First, I'll create a test database. Without a collation specified,
-- it will use the server's default collation.

CREATE DATABASE TestAdventureWorks
go

-- Verify the database collation
SELECT DATABASEPROPERTYEX('TestAdventureWorks', 'Collation')
go

-- database is set to SQL_Latin1_General_CP1_CI_AS as expected
-- this is a case-insensitive database

USE TestAdventureWorks
go

SELECT LastName collate database_default AS LastName
, FirstName collate database_default AS FirstName
, MiddleName collate database_default AS MiddleName
INTO MyTestContacts
FROM Adventureworks.Person.Contact
go

SELECT *
FROM MyTestContacts
WHERE Lastname = N'Adams'
go -- (86 row(s) affected)

SELECT *
FROM MyTestContacts
WHERE Lastname = N'adams'
go -- (86 row(s) affected)

SELECT *
FROM MyTestContacts
WHERE Lastname COLLATE Latin1_General_CS_AS_KS_WS = N'Adams'
go -- (86 row(s) affected)

SELECT *
FROM MyTestContacts
WHERE Lastname COLLATE Latin1_General_CS_AS_KS_WS = N'adams'
go -- (0 row(s) affected)

-- Next, create a view:
CREATE VIEW ContactLastNameCaseSensitive
AS
SELECT
LastName COLLATE Latin1_General_CS_AS_KS_WS AS CSName
FROM MyTestContacts
go

SELECT *
FROM ContactLastNameCaseSensitive
WHERE CSName = N'Adams'
go -- (86 row(s) affected)

SELECT *
FROM ContactLastNameCaseSensitive
WHERE CSName = N'adams'
go -- (0 row(s) affected)

And, everything works... in TestAdventureworks. In the *real* AdventureWorks, I get an error when I try to create the view:
Msg 2791, Level 16, State 5, Procedure ContactLastNameCaseSensitive, Line 3
Could not resolve expression for schemabound object or constraint.

So, this is the first issue. It seems as though you can't create the view if your database has a different collation than the server collation. Well, (again), I haven't spent all that much time on this one but I did repro what the chain on the forum seemed to have found.

Now, as for the second issue... the query can be EXTREMELY painful and slow if you run this against a large set of data. See, changing the collation on the fly will need to perform a row-by-row comparison of the data. So, to minimize that HUGE hit on performance - you have two options.

1) actually consider changing the column's collation so that it matches your queries AND then create an index (note: the actual use of the index will be determined by the selectivity of the data).

SELECT LastName collate database_default AS LastName
, FirstName collate database_default AS FirstName
, MiddleName collate database_default AS MiddleName
INTO MyTestContacts2
FROM Adventureworks.Person.Contact
go

ALTER TABLE MyTestContacts2
ALTER COLUMN LastName nvarchar(100) COLLATE Latin1_General_CS_AS_KS_WS
go

CREATE INDEX CSNameInd on MyTestContacts2 (LastName)
go

-- let's use a query that's highly selective (selective enough to use the index):

SELECT *
FROM MyTestContacts2
WHERE LastName = N'Barlow'
go -- (1 row(s) affected)

SELECT *
FROM MyTestContacts2
WHERE LastName = N'barlow'
go -- (0 row(s) affected)

2) create an index with a different collation... but this is harder than it sounds as the CREATE INDEX statement doesn't directly allow changing collation (however, it should!). But, you can do this by either creating another column (real or computed) with the case-sensitive collation and then indexing it OR you could do this through an indexed view (but that adds a few complexities as well). I think the computed column that's indexed is GREAT if the searches are generally highly selective. If they are not, then it is probably better to create a real column - as a computed copy of the inserted value - that is case sensitive. However, at that point, I'm not entirely sure why you're keeping the case-insensitive version around...unless it's to keep the actual inserted value (maybe for printing and/or display?). Regardless, here's how you can create an indexed computed column.

ALTER TABLE MyTestContacts
ADD
CSName
AS LastName COLLATE Latin1_General_CS_AS_KS_WS
go

SELECT *
FROM MyTestContacts
WHERE CSName = N'Adams'
go -- (86 row(s) affected)

SELECT *
FROM MyTestContacts
WHERE CSName = N'adams'
go -- (0 row(s) affected)

CREATE INDEX CSNameInd ON MyTestContacts (CSName)
go

SELECT *
FROM MyTestContacts
WHERE CSName = 'Barlow'
go -- (1 row(s) affected)

SELECT *
FROM MyTestContacts
WHERE CSName = 'barlow'
go -- (0 row(s) affected)

And, the index will be used if the query is highly selective.

OK, so that ends the answer to part 2 of the question (see how tangents can take us a bit off track :)....

Now, let's get back to question #1.

What if you want to do a one-time search through your data to find all of the lower case data? Well, there are a few thoughts here.... First, let's modify the ONE Barlow row to be lowercase barlow so that we have something to find:

UPDATE MyTestContacts2
SET LastName = N'barlow'
WHERE LastName = N'Barlow'
go -- (1 row(s) affected)

NOTE: This is an ABSOLUTELY horrible query as I didn't use any key to point to the exact row I wanted to modify. Had there been a lot of Barlows I would have modified them all. This worked here because I knew there was only one row. But, all of your tables should have a primary key, etc. (not even going to begin this tangent :).

Now, having said that... let's see if we can find this row easily? You should be able to do this using Transact-SQL and using some type of wildcard pattern matching such as:

SELECT * FROM MyTestContacts2
WHERE Lastname like N'b%'
go -- (1 row(s) affected)

And, that works without any problems.

So, what about rows that do NOT start with an upper case B?

SELECT * FROM MyTestContacts2
WHERE Lastname NOT LIKE N'B%'
go -- (18768 row(s) affected)

SELECT * FROM MyTestContacts2
WHERE Lastname NOT LIKE N'%B%'
go -- (18765 row(s) affected)

tangent number 87 <g>: if you're wondering what the 3 rows are (as was I :)... they are 1 row of O'Brien and 2 rows of Smith-Bates. Here's that query:

SELECT * FROM (SELECT * FROM MyTestContacts2
WHERE Lastname NOT LIKE N'B%') AS Bs
WHERE Lastname like '%B%'
go

OK, so, I thought we were there... I thought we could go to what I thought was the next logical step.....

SELECT * FROM MyTestContacts2
WHERE Lastname NOT LIKE N'%[A-B]%'
go

And... well, we lose barlow from the result set. For some reason...when you do ranges of characters it seems to lose the case??? I remember that [A-Z] and [a-z] were different in some release? Is this a regression? Someone help me out with this one as I'm without a clue. In the end, the ONLY way I could get this to work is to do this:

SELECT * FROM MyTestContacts2
WHERE Lastname not like N'%A%'
AND Lastname not like N'%B%'
AND Lastname not like N'%C%'
AND Lastname not like N'%D%'
AND Lastname not like N'%E%'
AND Lastname not like N'%F%'
AND Lastname not like N'%G%'
AND Lastname not like N'%H%'
AND Lastname not like N'%I%'
AND Lastname not like N'%J%'
AND Lastname not like N'%K%'
AND Lastname not like N'%L%'
AND Lastname not like N'%M%'
AND Lastname not like N'%N%'
AND Lastname not like N'%O%'
AND Lastname not like N'%P%'
AND Lastname not like N'%Q%'
AND Lastname not like N'%R%'
AND Lastname not like N'%S%'
AND Lastname not like N'%T%'
AND Lastname not like N'%U%'
AND Lastname not like N'%V%'
AND Lastname not like N'%W%'
AND Lastname not like N'%X%'
AND Lastname not like N'%Y%'
AND Lastname not like N'%Z%'
go

And, well, that works. But, it is NOT pretty! The query's going to require a table scan anyway AND it is a one-time query. I'm OK with this as a solution to this problem BUT, am I missing something here? Please tell me there's something more clever here? Is this a bug?
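
One likely explanation, offered here as background rather than as something tested in this post: with a dictionary-order collation such as Latin1_General_CS_AS_KS_WS, a LIKE range follows the collation's sort order, and that order interleaves the cases (a, A, b, B, ...). So [A-B] also covers lowercase b, which is why barlow drops out. Under a binary collation the range follows code point order, where A through Z are contiguous and uppercase only. The sketch below uses a table variable with made-up sample rows so that it runs on its own.

```sql
-- Sample data in a table variable; the values are illustrative only
DECLARE @Names TABLE
(   LastName nvarchar(50) COLLATE Latin1_General_CS_AS_KS_WS NOT NULL );

INSERT @Names VALUES (N'Adams');
INSERT @Names VALUES (N'barlow');
INSERT @Names VALUES (N'O''Brien');
INSERT @Names VALUES (N'smith-bates');

-- With a binary collation, [A-Z] means the code points for A through Z only,
-- so this finds the rows that contain no upper case letter at all.
-- For the sample rows above it should return barlow and smith-bates.
SELECT LastName
FROM @Names
WHERE LastName COLLATE Latin1_General_BIN2 NOT LIKE N'%[A-Z]%';
```

Against the table used above, the same predicate would be applied to MyTestContacts2.LastName. It still scans the table, which is fine for a one-time check.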

I'm definitely interested in feedback on this one!
kt


DDL Triggers were a new feature of SQL Server 2005 and while seemingly simple, they are very powerful. DDL Triggers allow you to trap an attempted DDL operation to audit it, prevent it, or do anything you want to validate/verify/”authorize”/etc – you write the code. And, since a trigger fires as part of the transaction, you can roll it back. In many conference demos/webcasts, etc., I have provided a sample script that prevents ddl within a [production] database. That script has been really helpful/useful but recently I thought about an update to it… SQL Server 2005 has another new feature "execute as". While I definitely see many benefits, I’m also a bit concerned. To a certain extent, I feel that the potential for SQL Injection is actually higher. If a developer creates a poorly written/tested stored procedure (ok, therein lies the problem, really!) that includes dynamic string execution AND then uses "execute as" to essentially elevate a user with minimal privileges to a higher level (so that they don’t need to give the base object rights to the user), a malicious user could “inject” code in and actually succeed if the “execute as” user has rights to the injected code. In prior releases, and with the default behavior (execute as caller), this is not possible (which is good for security but bad for dynamically executed strings within stored procedures as base object rights are necessary).
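
To make the injection concern concrete, here is a sketch of the pattern described above next to a safer variant. The procedure names are hypothetical and dbo.MyTestContacts stands in for any base table the caller has no rights on.

```sql
-- Risky: the parameter is concatenated into the string, and the string
-- runs with the owner's rights rather than the caller's
CREATE PROCEDURE dbo.FindContactUnsafe
    @LastName nvarchar(100)
WITH EXECUTE AS OWNER
AS
DECLARE @sql nvarchar(500)
SET @sql = N'SELECT * FROM dbo.MyTestContacts WHERE LastName = N'''
         + @LastName + N''''
EXEC (@sql)
go

-- Safer: same context switch, but the value travels as a parameter
-- and is never parsed as part of the statement
CREATE PROCEDURE dbo.FindContactSafer
    @LastName nvarchar(100)
WITH EXECUTE AS OWNER
AS
EXEC sp_executesql
    N'SELECT * FROM dbo.MyTestContacts WHERE LastName = @ln',
    N'@ln nvarchar(100)',
    @ln = @LastName
go
```

In the first procedure, anything a caller appends after a closing quote executes with the owner's permissions. In the second, the input is only ever compared as a value.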

Having said that, and since security is always a concern, my DDL Trigger only audited for the login of the user who executed the statement, not for the actual user that’s logged in. In other words, if EXECUTE AS is used (or SETUSER is used), then the context of the user executing is actually different than the logged in user. To see this shift in context, SQL Server 2005 added a new function: ORIGINAL_LOGIN().

(reading between the lines is even more frightening in that prior to SQL Server 2005, the original user could not be tracked from SETUSER. The good news is that SETUSER is ONLY allowed to be used by DBOs so it’s not as widespread as the potential for “execute as”).

OK, so how can we put all of this together? We’ll want to add the ORIGINAL_LOGIN function into our audit table in our DDL Trigger. Even if you choose NOT to rollback, at least you’ll know who performed the operation (even if from a dynamically executed string!).
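
Before wiring it into the trigger, the shift in context can be seen in isolation with a sketch like the one below. The login name and password are hypothetical, and the script runs in master only because a login with no database user can still get in there.

```sql
USE master;
go

-- Hypothetical login, created just for this demonstration
CREATE LOGIN DemoContextLogin WITH PASSWORD = 'Zq7!mRt29#xL';
go

EXECUTE AS LOGIN = 'DemoContextLogin';
go

-- ORIGINAL_LOGIN() still reports who really connected;
-- SYSTEM_USER reports the impersonated login
SELECT ORIGINAL_LOGIN() AS OriginalLogin,
       SYSTEM_USER      AS CurrentLogin;
go

REVERT;
go

DROP LOGIN DemoContextLogin;
go
```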

USE AdventureWorks;
go

--Create a login/user - just for this exercise

CREATE LOGIN Paul WITH PASSWORD = 'PxKoJ29!07';
go

CREATE USER Paul FOR LOGIN Paul;
go

sp_addrolemember 'db_ddladmin', 'Paul'
go 

CREATE SCHEMA SecurityAdministration
go

CREATE TABLE SecurityAdministration.AuditDDLOperations
(   OpID                int             NOT NULL identity
                                        CONSTRAINT AuditDDLOperationsPK
                                            PRIMARY KEY CLUSTERED,
    OriginalLoginName   sysname         NOT NULL,
    LoginName           sysname         NOT NULL,
    UserName            sysname         NOT NULL,
    PostTime            datetime        NOT NULL,
    EventType           nvarchar(100)   NOT NULL,
    DDLOp               nvarchar(2000)  NOT NULL
);
go

GRANT INSERT ON SecurityAdministration.AuditDDLOperations TO public;
go

CREATE TRIGGER PreventAllDDL
ON DATABASE
WITH ENCRYPTION
FOR DDL_DATABASE_LEVEL_EVENTS
AS
-- Capture the event details before doing anything else
DECLARE @data XML
SET @data = EVENTDATA()

-- Tell the caller why the statement failed, then undo the DDL
RAISERROR ('DDL Operations are prohibited on this production database. Please contact ITOperations for proper policies and change control procedures.', 16, -1)
ROLLBACK

-- Insert AFTER the rollback so that the audit row is not rolled back too
INSERT SecurityAdministration.AuditDDLOperations
        (OriginalLoginName,
         LoginName,
         UserName,
         PostTime,
         EventType,
         DDLOp)
VALUES  (ORIGINAL_LOGIN(), SYSTEM_USER, CURRENT_USER, GETDATE(),
         @data.value('(/EVENT_INSTANCE/EventType)[1]', 'nvarchar(100)'),
         @data.value('(/EVENT_INSTANCE/TSQLCommand)[1]', 'nvarchar(2000)') )
RETURN;
go

--Test the trigger.

CREATE TABLE TestTable (col1 int);
go

DROP TABLE SecurityAdministration.AuditDDLOperations;
go

EXECUTE AS LOGIN = 'Paul' -- note: Remember, Paul is a DDL_admin
go

DROP TABLE SecurityAdministration.AuditDDLOperations;
go

REVERT;
go 

SELECT * FROM SecurityAdministration.AuditDDLOperations;
go

DROP TRIGGER PreventAllDDL ON DATABASE;
go

DROP TABLE SecurityAdministration.AuditDDLOperations;
go

DROP SCHEMA SecurityAdministration;
go

DROP USER Paul;
go 

DROP LOGIN Paul;
go
 

So, have fun testing with this one. 

Thanks for reading!
kt

OK...SP2, the SP2 refresh and then the parallel/subsequent GDRs have seemingly (and rightly so) confused some of us... However, thanks to the PSS Engineers blog (and specifically Bob Ward - Senior Escalation Engineer, Microsoft PSS), this blog entry clears up a lot of that confusion. The end result is that you should be at 9.00.3054 or 9.00.3159. 3054 is the correct one if you haven't had any special hotfix/GDRs directly from Microsoft PSS and 3159 is for those of you that have. For me, I think the best part was the reiteration of the fact that "Microsoft Update will notify you of this" and the comments made that "Microsoft Update is smart enough to recognize you need this specific version of the GDR2 fix...". The most interesting part of all of this is the reminder that SQL Server IS included in Microsoft Update. What's even more interesting is that most people are still using Windows Update and Microsoft Update is DIFFERENT. You need to (essentially) replace Windows Update with Microsoft Update (although it's not that simple - of course...). Basically, you need to install Microsoft Update and then remove Windows Update. So.... if you haven't done this - you should. At least on your main desktop/laptop machine (at first) and then on other machines from there. I can't remember when this originally came into place but a few folks asked me about the difference, etc. and how SQL Server fits in and well... it's all about Microsoft Update now, not Windows Update (however/fyi, Microsoft Update looks and feels exactly like Windows Update but it includes Windows, Office, SQL and Exchange). If you want to find out more, check out the Microsoft Update FAQ here.
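
If you're not sure which build an instance is on, a quick check along these lines will tell you. The column aliases are just labels chosen for this example.

```sql
-- ProductVersion comes back as a string such as 9.00.3054.00
SELECT SERVERPROPERTY('ProductVersion') AS ProductVersion,
       SERVERPROPERTY('ProductLevel')   AS ProductLevel,
       SERVERPROPERTY('Edition')        AS Edition;
```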

And, along the lines of maintenance... Paul Randal (of the SQL Server Storage Engine blog) would like to know if you have time to fill in a survey on YOUR VLDB maintenance practices. This is pretty important for them to know. He explains what they'll use it for and why it's useful to them. Be sure to check out his blog entry here.

Finally........... lots of final session writing/planning going on for TechEd. Bob Beauchemin and I are delivering a pre-conference workshop titled: Leveraging SQL Server Always-On Technologies to Achieve High Availability and Scalability. It's on the Sunday prior to TechEd and it's a new session for us. Here's the abstract:

PRCN06 Leveraging SQL Server Always-On Technologies to Achieve High Availability and Scalability 
System down time and lack of scalability for mission critical applications can result in loss of revenue and business credibility. Planned downtime is typically caused by hardware upgrades, application or OS upgrades, applying a service pack, or performing routine maintenance tasks. Examples of unplanned downtime are hardware or software failure, natural disasters, and human error. In fact, human error has been identified as the number one cause of downtime. SQL Server 2005 Always-On Technologies provide a full range of options for achieving and maintaining appropriate levels of availability. Because the product offers so many choices, it is difficult to choose features that provide the best availability solution for a given application. In this session, we provide an in-depth description of these technologies and delve into scenarios and best practices in deployment of the availability technologies. The high availability technologies covered include Database Mirroring, Database Snapshots, Peer-to-Peer Replication, Clustering, Online Indexing, Online Restore, Piecemeal Backup & Restore, Partial Database Availability, Table and Index Partitioning, Snapshot Isolation, DDL Triggers, and others. The second part of this session focuses on scalability and building systems that scale-out to multiple servers. Building a scale-out application with SQL Server 2005 may entail using techniques and features that are unfamiliar, or are new. This session provides in-depth information about the internal implementation of scale-out features such as Service Broker, Query Notifications, Distributed Partitioned Views, Scalable Shared Databases, and Peer-to-Peer Replication. The session also includes troubleshooting techniques using Profiler and the new dynamic management views.

As for content, we'll have our lecture content available to all attendees, we're going to give away AlwaysOn DVDs (more info coming up) AND Bob, Paul and I are going to hang out after the workshop to answer even more questions... So, if you're looking to burn budget for FY'07 AND you want to attend an information-packed (and fun ;) pre-con workshop AND a great conference for breadth/futures (a bunch of Katmai sessions at the event too), then you should sign up for TechEd before it sells out......again. Also, there are a bunch of sessions at the conference that might interest you - Paul and I are doing a Chalk/Talk Q&A on VLDB Maintenance, I'm doing a demo fest on AlwaysOn, Paul's doing a session on Corruption Detection and Recovery, Bob's doing a session on Windows PowerShell and SMO Together (oh, and he's listed as Robert Beachemin...not sure why???) ...and that's just to name a few!

Oh, and the AlwaysOn DVDs are cool because:

  1. they have a setup.exe that runs to create vhd/vmc files that allow you to access a predefined VPC image.
  2. Virtual PC is free and Virtual Server is free... you can use EITHER for the Virtual Environment.
  3. the VPC is a Windows 2003 Server setup with SQL Server 2000 and SQL Server 2005 (multiple instances) and allows you to access an environment that's excellent for learning and testing and...self-paced labs
  4. the DVD includes 9 lab manuals for roughly 16 hours of self-paced lab time AND they're really good labs with multiple parts, excellent links and even useful undoc'ed commands too (if I might say so myself as I wrote most of them :)
    1. Database Snapshots - 4 exercises, 75-90 minutes
      • Exercise 1: Repartition the SalesDB Database
      • Exercise 2: Create and Examine a Database Snapshot
      • Exercise 3: Working with Multiple Snapshots
      • Exercise 4: Creating a Database Snapshot on a Mirror Database
    2. Data Recovery & Preventative Techniques - 4 exercises, 75-90 minutes
      • Exercise 1: Examining Foreign Key Relationships between Tables
      • Exercise 2: Point-In-Time Recovery
      • Exercise 3: Using the tablediff.exe Command-Line Utility to Compare ALL Data Modifications
      • Exercise 4: Using DDL Triggers to Prevent Tables Being Dropped
    3. Instant Initialization - 2 exercises, 30-45 minutes
      • Exercise 1: Enabling Instant Initialization
      • Exercise 2: Security Vulnerabilities Created by Instant Initialization
    4. Peer to Peer Replication - 5 exercises, 75-90 minutes
      • Exercise 1: Implementing a Replication-Ready Schema
      • Exercise 2: Configuring and Implementing Peer-to-Peer Replication Configuration Using the Replication Wizards in SQL Server Management Studio
      • Exercise 3: Using the Dual Database Monitor
      • Exercise 4: Adding a new Peer Server
      • Exercise 5: Monitoring Peer-to-Peer Data Flow after a Fault
    5. Table and Index Partitioning - 4 exercises, 75-90 minutes
      • Exercise 1: Range Partition Function
      • Exercise 2: Partition Scheme
      • Exercise 3: Partitioned Table
      • Exercise 4: The Sliding Window Scenario
    6. Snapshot Isolation - 5 exercises, 75-90 minutes
      • Exercise 1: Pessimistic Locking
      • Exercise 2: Activating Snapshot Isolation & Read Committed with Snapshot Isolation 
      • Exercise 3: Using Snapshot Isolation (SI)
      • Exercise 4: Using Read Committed with Snapshot Isolation (RCSI)
      • Exercise 5: Monitoring Snapshot Isolation & Read Committed with Snapshot Isolation 
    7. Online Operations - 2 Parts, 75-90 minutes
      • Part 1: Online Index Operations
        • Exercise 1: ONLINE Index Move (for better isolation)
        • Exercise 2: Partition an Active Table ONLINE
      • Part 2: Partial Database Availability and Online Piecemeal Restore
    8. Database Mirroring - 2 large sessions with TONS of exercises, 4+ hours
      • Part 1: Database Mirroring in Action
        • Exercise 1: Configuring and Implementing the High Availability Database Mirroring Configuration – using Transact-SQL through a SQLCMD master script
        • Exercise 2: Using the Dual Database Monitor and Transparent Client Redirect
        • Exercise 3: Initiating Failover in the High Availability Configuration
      • Part 2: Understanding and Implementing Database Mirroring
        • Exercise 1: Configuring and Implementing Database Mirroring using the SQL Server Management Studio
        • Exercise 2: Configuring the Database Mirroring Monitor, Mirroring Threshold Alerts and WMI Event Alerts
        • Exercise 3: Converting to the High Protection Configuration and Comparing Performance between Synchronous and Asynchronous forms of Database Mirroring 
        • Exercise 4: Configuring and Implementing the High Availability Database Mirroring Configuration – using Transact-SQL through a SQLCMD master script 
        • Exercise 5: Initiating Failover 
          1. Part I: Manual and Automatic Failover in the Synchronous forms of Database Mirroring Configuration 
          2. Part II: Preventing “split brain” in the High Availability configuration 
        • Exercise 6: Converting to the High Performance Configuration and Forcing Failover with Potential Data Loss 
    9. Service Oriented Database Architecture - 5 exercises, 3+ hours
      • Exercise 1: Setting up simple Service Broker messaging
      • Exercise 2: Setting up Inter-instance Services
      • Exercise 3: Setting up dialog security and encryption
      • Exercise 4: Setting up application-specific functions
      • Exercise 5: Using Query Notifications

And........ if that doesn't motivate you - we might also give away a Manageability DVD that's packed with Tools demos/labs and some SP2-specific stuff such as customized reports (which we'll talk about in the last part of our pre-conference workshop). OK, so I hope to see you at TechEd.......... the pre-conference alone is worth it!

THANKS,
kt

Categories:
Events | Resources | SQL Server 2005

While at SQL Server Connections in Orlando, Stephen Wynkoop of SSWUG stole some morning time for an interview (morning time is not my best but we did get a lovely "I got my mug on SSWUG tv" mug so that made it OK :) :). We (Paul and I) had a great time chatting about Disaster Recovery, Backup/Restore, general best practices and well - games (specifically - the VERY addictive game of Blokus). Here's the interview link: http://www.sswug.org/columnists/editorial.asp?id=1135.

Enjoy!
kt

PS - If any of you pick up (and become completely addicted to) Blokus, let us know! It's great for 2 to 4 players and extremely fun when a 5 year old "wild card" sits in and throws moves that you just can't understand (but later come to really frustrate you :) :).

Categories:
Events | Opinions | SQL Server 2005

In the quest for more (and more and more ;) information, I've been told about a new link - from the SQL Server Books Online team... it's called the "SQL Server 2005 Books Online Scoped Search" and it gives you a "live" search format for accessing content in the SQL Server Books Online. And - because they're online - they are the most up-to-date. I'm not sure how frequently they update the online BOL (vs. the downloads that we get) but my guess is that they update the online version frequently and then do a BOL refresh every now and then which includes that which is online.

Anyway, I always like to have the latest version on my laptop BUT it's nice to have a quick/easy way to find content online. Oh - and please make sure you give these guys feedback as this is a new site, still with the ability to make tweaks where necessary!

Have fun: http://search.live.com/macros/sql_server_user_education/booksonline

Cheers,
kt

PS- Check out Buck Woody's Blog here: http://blogs.msdn.com/buckwoody/ and subscribe here. He's a Technical Content Developer for the SQL Server Documentation Team and he's blogged items about the BOL, how to search, updates, etc.

Categories:
Resources | SQL Server 2005

OK, I've been complaining about finding resources - for a long time... AND, I've been complaining about how I can never tell if a whitepaper is on MSDN or on TechNet or on Microsoft.com or on x, y, or z. Well........ finally, I've done something about it. I've *started* to put together (and verify) a list of what I think are the top whitepapers out there. This is by no means a complete list AND I haven't read every one of the papers I've referenced. However, I've only linked to whitepapers written/published by reputable sources and I've checked every link to make sure it works. Primarily, this is the list of whitepapers that I reference the most - in classes, seminars, workshops, etc. And, it's not a blog entry - it's an actual webpage so it should [hopefully] be easy to find. I plan/hope to do this for blogs and other useful stuff BUT, it takes quite a bit of time verifying each of the links (and of course, searching/finding the darn thing when someone has broken the link :).

So.... after the first 5+ hours of work - here it is: http://www.sqlskills.com/whitepapers.asp.

Thanks for reading (and wow - you have a lot more reading to do now :) :) :),
kt

Categories:
Resources | SQL Server 2005

Last week while Paul and I were in the UK delivering a one day seminar on Crucial Database Maintenance Techniques, we met David McMahon from the Next Generation User Group. They're doing some exciting things in the UK and even for the wider community - for example - podcasts. Paul and I were interviewed for one and it's ready for download here.

Enjoy!
kt

Categories:
Opinions | SQL Server 2005

Another great DotNetRocks interview has been completed. It's Paul Randal's session on Disaster Recovery, DBCC, Index fragmentation (and defrag) and [unfortunately for me] a lot more. All I can say is that I was ambushed...

thanks Richard
   thanks Carl...

Enjoy: http://www.dotnetrocks.com/default.aspx?showNum=217
kt

Categories:
Events | Resources | SQL Server 2005

I had a discussion earlier today (with Paul Randal) about many misconceptions that exist about upgrading databases and more importantly, about "downgrading" databases. Really, the issue is that I've heard people get frustrated when they find that things like backup/restore work FROM SQL 7.0/2000 TO SQL Server 2005 but not the other way around - even if the database is in SQL Server 2000 (80) compatibility mode. First and foremost, compatibility mode only affects parsing, query processing, and general data manipulation; it does not affect physical storage (well, there's more to it than that but that's a general overview). When you upgrade a database to SQL Server 2005, you WILL benefit immediately from changes in the storage engine, etc. regardless of compatibility mode. Compatibility modes are there to give you time in updating/upgrading your code - if/when necessary. Most code will work when upgrading but some code may not be supported because of changes to keywords, syntax changes, etc... The best thing to do is check your application compatibility with the Upgrade Advisor. I did a couple of webcasts on Installation/Upgrade as part of my 11-part series on TechNet. See the blog entry for the entire series here. Part 3 and part 4 are focused on Installation and Upgrade and their associated blog entries have a lot of additional links (including links to the Upgrade Advisor as well as a series of things you might want to do before you upgrade). Also, be sure to check out the upgrade site off of the main Microsoft SQL Server site.
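
To make the compatibility-mode point a bit more concrete, here is a minimal sketch of checking and changing the level on a SQL Server 2005 instance. The database name SalesDB is only a placeholder for illustration.

```sql
-- List each database with its compatibility level
-- (80 = SQL Server 2000 behavior, 90 = SQL Server 2005 behavior).
SELECT name, compatibility_level
FROM sys.databases
ORDER BY name;

-- Move one database to 90 once its code has been verified.
-- 'SalesDB' is a placeholder database name.
EXEC sp_dbcmptlevel 'SalesDB', 90;
```

Whichever level is set, the physical format of the database is already the SQL Server 2005 format, which is why the level has no bearing on whether the database can go back to an earlier version.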

How to move USER databases around - a quick list of what's supported between versions

Backup/Restore from 7.0 to SQL Server 2000
Detach from 7.0, copy the files, then attach to SQL Server 2000
Backup/Restore from 7.0 to SQL Server 2005
Detach from 7.0, copy the files, then attach to SQL Server 2005
Backup/Restore from 2000 to SQL Server 2005
Detach from 2000, copy the files, then attach to SQL Server 2005
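
As a rough sketch of the backup/restore route listed above (the database name, logical file names and paths are all placeholders, so check yours with RESTORE FILELISTONLY first):

```sql
-- On the source instance (7.0 or 2000): take a full backup.
BACKUP DATABASE SalesDB
TO DISK = 'D:\Backups\SalesDB.bak'
WITH INIT;

-- On the SQL Server 2005 instance: inspect the logical file names in the backup...
RESTORE FILELISTONLY
FROM DISK = 'D:\Backups\SalesDB.bak';

-- ...then restore, relocating the files as needed.
-- The database is upgraded to the 2005 format as part of the restore.
RESTORE DATABASE SalesDB
FROM DISK = 'D:\Backups\SalesDB.bak'
WITH MOVE 'SalesDB_Data' TO 'E:\SQLData\SalesDB.mdf',
     MOVE 'SalesDB_Log'  TO 'F:\SQLLogs\SalesDB_log.ldf',
     RECOVERY;
```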

Why use Backup/Restore?

PROs

  1. Because you have a backup! This will allow you to go back to the version from which you came, although without any of the changes made on the uplevel version.
  2. Because it doesn't require the database to be taken "offline" when the backup is performed (note that this is both good and bad - bad because you don't really know the exact point in time to which the database reconciles...which may not matter if you're just testing).
  3. Because the backup will be the size of data only and will not include database free space. Free space is not backed up (e.g. a database with a 100GB data file with only 20GB of data should yield a file that's roughly 20GB in size). I say "roughly 20 GB" because the internals of a backup require that the transaction log records for the activity that occurred during the backup process are also backed up with the full database (or differential) backup. This is actually the basis for why transaction log backups are not supported during a full/differential backup in SQL Server 2000 (they are in SQL Server 2005). It is also the reason why the transaction log cannot be cleared while a full or differential backup is ALSO running in SQL Server 2005.

CONs

  1. You don't know the exact point in time to which the database reconciles (it will be the time that the backup completed) AND logs CAN be restored uplevel as well. NOTE: If you're interested in creating an exact point in time version of the database - consider putting the database into "restricted user" mode or "single user" mode (so that user transactions are not allowed during the backup). Again, this may not be a concern.
  2. It takes time to complete the backup and the restore (there are four phases of a restore: create/initialization, copy, redo, undo). Make the create/initialization phase *much* faster by enabling Instant Initialization. See my Instant Initialization blog post for more details.
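
If the exact point in time does matter, the "restricted user" suggestion above might look something like this. It's a sketch only: the database name and path are placeholders, and the ALTER DATABASE syntax shown is for SQL Server 2000 and later.

```sql
-- Roll back open transactions and limit new connections to
-- db_owner, dbcreator and sysadmin members.
ALTER DATABASE SalesDB SET RESTRICTED_USER WITH ROLLBACK IMMEDIATE;

-- Take the "final" backup while regular users are locked out.
BACKUP DATABASE SalesDB
TO DISK = 'D:\Backups\SalesDB_final.bak'
WITH INIT;

-- Reopen the database only if the source should stay in service.
ALTER DATABASE SalesDB SET MULTI_USER;
```

Keep in mind that privileged logins can still connect in restricted user mode, so this only helps if those accounts stay quiet during the backup.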

Why use detach/attach?

PROs

  1. It's simple, it's fast... but once detached then the database is OFFLINE.
  2. You know the exact point in time to which it reconciles because no transactions are allowed into the database once it is offline. Again, this may not be a concern.

CONs

  1. You must copy the entire file - including the free space to the other location and the network copy might be the most expensive (meaning time consuming) part of the entire process. However, once copied, the files do NOT need to be created on the destination because on attach, these files will be used.
  2. The database is offline once detached and during copy.
  3. If you don't COPY the files and instead you attach the detached files, you will have ABSOLUTELY NO WAY of getting back to the version from which you detached. (ah, this is probably the single most important reason for why I prefer backup/restore!)
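
Here's a sketch of the detach/copy/attach sequence that keeps a way back, per the last point above. Database name and file paths are placeholders; the file copy itself happens outside of SQL Server.

```sql
-- On the source instance: detach (the database goes offline here).
EXEC sp_detach_db 'SalesDB';

-- COPY (do not move) the .mdf/.ldf files to the destination server, then
-- reattach the originals on the source so the old version is still available.
EXEC sp_attach_db 'SalesDB',
     'D:\SQLData\SalesDB.mdf',
     'D:\SQLLogs\SalesDB_log.ldf';

-- On the SQL Server 2005 instance: attach the copied files.
-- The copies are upgraded on attach and can no longer be used downlevel.
CREATE DATABASE SalesDB
ON (FILENAME = 'E:\SQLData\SalesDB.mdf'),
   (FILENAME = 'F:\SQLLogs\SalesDB_log.ldf')
FOR ATTACH;
```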

Summary for "How to move USER databases around"
Between these versions "upgrades" are supported ONLY to the uplevel version. There is NO single (or simple) feature that can be used to get back to the version from which you started (without exporting/importing all of the data). There is also no undocumented back-door to do this (no trace flags, no DBCC commands, NA DA!!! as per Paul).

What about System Databases?
This is a whole other can of worms to open and the easiest thing I can say here is that you generally should not move/upgrade system databases across machines. These are upgraded through "in-place" upgrades of SQL Server (on the same machine) or through manual migrations (to different machines) of the users/objects (SQL Agent Jobs, user-defined system procedures in master, logins in master, etc.). This is not an easy process (manual migration) but may prove to be a better choice over an upgrade in place if something were to go horribly wrong (which is unlikely but I'm a "what's the worst case scenario" person when it comes to availability :). The other benefit of NOT upgrading in place - and instead MOVING databases from one version to another on an upgrade - is that you get to complete some basic "spring cleaning". New hardware, freshly formatted, freshly installed/configured OS, clean disks, etc. This can often alleviate some of the strangest, hard-to-determine-problems, that have plagued you for weeks/months. Like I said, this is a whole other can of worms to open!

But - if you're interested in moving system databases around on the SAME machine, here's a great KB that covers the required options, syntax, rules and restrictions: How to move SQL Server databases to a new location by using Detach and Attach functions in SQL Server

And - if you're interested in transferring logins and passwords between instances (for upgrade or for sync'ing two servers used to create a standby partnership - with Database Mirroring and/or Log Shipping), here's a great KB article that includes links to other articles, even ones covering uplevel transferring of logins (like 2000 to 2005): How to transfer logins and passwords between instances of SQL Server

And - that's it for this week (probably)... two in a row is not likely to become three in a row (just setting expectations :) :) :),
kt

Categories:
Resources | SQL Server 2005 | Tips

Instant Initialization is a new feature of SQL Server 2005 that is based on an NTFS feature that was added to Windows XP (and therefore is also available in Windows 2003 Server). It's a feature that's seemingly simple; it allows file allocation requests to skip zero initialization on creation. As a result, file allocation requests can occur instantly - no matter what the file size. You might wonder why this is interesting or why this makes a difference? Most file allocation requests are small requests, with small incremental changes (like .doc files, .xls files, etc.) but database files can be rather large. In fact, they should be rather large as pre-allocation of a reasonable file size is a best practice to reduce file fragmentation. Additionally, autogrowth causes performance delays (more so in 2000 than 2005) and it's generally something that you want to avoid when possible. As a result, database creation can take minutes to hours to days, depending on the size of the file allocation request. But - it's not just for database creation. ALL file requests can leverage this feature: file creation for a new database, adding a file to an existing database, manually or automatically growing a file and (IMO - the best) restoring a database where the file (or files) being restored does not already exist. The reason I think the last feature is the best is that it can reduce downtime if a database is damaged and allow you to get back up and running more quickly. This is especially important for databases that cannot leverage partial database availability, which is an Enterprise Engine feature. So, to give you some motivation, here is a test that I performed just to have some interesting and comparable numbers.

Performance Test with Zero Initialization
Hardware: Dell Precision 670 Dual Proc (x64) with Dual Core, 4 GB Memory, RAID 1+0 array w/4-142 GB, 15000rpm disks
   CREATE DATABASE with 20 GB Data file = 14:02 minutes
   ALTER DATABASE BY 10 GB = 7:01 minutes
   RESTORE 30 GB DATABASE (EMPTY Backup) = 21:07 minutes
   RESTORE 30 GB DATABASE (11GB Backup) = 38:28 minutes

Performance Test with Instant Initialization
Hardware: Dell Precision 670 Dual Proc (x64) with Dual Core, 4 GB Memory, RAID 1+0 array w/4-142 GB, 15000rpm disks
   CREATE DATABASE with 20 GB Data file = 1.3 seconds
   ALTER DATABASE BY 10 GB = .4 seconds
   RESTORE 30 GB DATABASE (EMPTY Backup) = 5 seconds
   RESTORE 30 GB DATABASE (11GB Backup) = 19:42 minutes
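
For anyone who wants to repeat the first two measurements, the statements could look roughly like the sketch below. The database name, logical names, paths and log size are my placeholders here; only the 20 GB and +10 GB data file sizes come from the test above.

```sql
-- Create a database with a 20 GB data file.
CREATE DATABASE InitTest
ON PRIMARY
   (NAME = InitTest_Data,
    FILENAME = 'E:\SQLData\InitTest.mdf',
    SIZE = 20GB)
LOG ON
   (NAME = InitTest_Log,
    FILENAME = 'F:\SQLLogs\InitTest_log.ldf',
    SIZE = 512MB);
GO

-- Grow the data file by 10 GB.
-- Note: SIZE is the new total size, not the increment.
ALTER DATABASE InitTest
MODIFY FILE (NAME = InitTest_Data, SIZE = 30GB);
GO
```

Run each batch once with and once without instant initialization and compare the elapsed times reported by your query tool. The log file is always zeroed, so keep it small if you only want to compare the data file cost.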

SQL Server can leverage this feature for DATA file requests ONLY; the transaction log must be zero initialized because of its circular nature... which brings me to why this is not "on by default" or more specifically - HOW do you get this feature. First, there's absolutely no syntax change required - SQL Server will use it if it has access to it (so what does that mean?). The SQL Server service must have been granted the Windows permission - "Perform Volume Maintenance Tasks". By default, Windows Administrators have this permission but as yet-another-best-practice, we recommend that your SQL Server run under an account that is a "lower privileged" account (i.e. NOT an administrator). Other ideal options include running as "network service" or running as a dedicated user/domain account that has very few permissions except to SQL Server and its required resources. A lot of folks recommend using network service for its simplicity (it doesn't have a password and it has limited network/local rights) and I agree with this as long as it's truly dedicated to SQL Server. If network service is used by other services on the same machine then you could compromise security of your SQL Server (or the other services) with the elevated permissions that SQL Server grants or vice versa by the permissions that other applications may have granted to network service. Again, I'm not against network service BUT I would check your local permissions to see if there's anything that jumps out at you. If you've installed other applications/services/etc. you may have already compromised the security of the network service account. I would love to know if anyone has a quick/easy way to check Windows permissions to see what may have been granted in addition to the default permissions OR even a link to where the defaults are listed online... I've had trouble doing exactly this when searching, etc... feel free to post links/comments in your comments!
Anyway, with a dedicated user account, you can make sure that it's not compromised because you only use it for SQL Server. But, even these have negative issues - like required passwords that networks invalidate after n days and that you must change on a server/service basis. From a management perspective, this can be difficult.

SIDE NOTE: Managing the service account password is a lot easier in SQL Server 2005 with the SQL Server Configuration Manager (SQL-CM). The SQL-CM allows you to change the password of a service without an active connection (meaning even if the service isn't started) and it invalidates the login token so that password changes don't require a restart of the service. SQL-CM also has a command-line interface and is scriptable with WMI. The WMI Provider allows server settings, client and server network protocols, and aliases to be scripted by means of simple VBScript code (or by using the command-line tool). What you could end up doing is creating a script that changes the password of your services on all of your servers (for example, when a password policy is enforced that requires that the passwords of service accounts be changed). I've recently completed a whitepaper that highlights the Management tools (it's really just an overview but even then it turned out to be quite large as we looked at the tools in many different ways). I'll certainly let you know when the whitepaper is published (which should be within the next couple of weeks).

Granting the permission "Perform Volume Maintenance Tasks"
To use instant initialization, your SQL Server service must be running with an account that has the required privilege. If your SQL Server service is running as a local administrator this permission already exists. For a service account which is not a local administrator (again, recommended!), the necessary privilege to grant is Perform Volume Maintenance Tasks. This permission can be granted by an administrator through the Local Security Policy tool (Start, All Programs, Administrative Tools) and once granted, SQL Server automatically uses instant initialization. IMPORTANT NOTE: If this permission is given while SQL Server is running, then SQL Server must be stopped and restarted. However, once the server is restarted, no other syntax or permissions are needed. All data file creation/extension options will use instant initialization for data files created on NTFS volumes when SQL Server 2005 is running on Windows XP or Windows 2003 Server.
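
One way to check whether instant initialization is actually being used after the restart is to watch for file-zeroing messages. This is a sketch that relies on undocumented trace flags (3004 and 3605) and on sp_readerrorlog, so treat it as a test-instance technique rather than anything official; the database name is a placeholder.

```sql
-- Undocumented trace flags; use on a test instance only.
-- 3004 reports file-zeroing activity, 3605 sends that output to the error log.
DBCC TRACEON (3004, 3605, -1);
GO

-- Placeholder database; default file locations and sizes are used.
CREATE DATABASE IFITest;
GO

-- Look for "Zeroing" entries in the current error log.
-- With instant initialization active, only the log file (.ldf) should be listed;
-- an entry for the data file (.mdf) suggests the permission is not in effect.
EXEC sp_readerrorlog 0, 1, 'Zeroing';
GO

DROP DATABASE IFITest;
DBCC TRACEOFF (3004, 3605, -1);
GO
```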

Why isn't this on by default?
OK, so after all of this... the gains that you see and the lack of changes to syntax, etc. You're probably wondering why this isn't on by default? It's a security issue. The biggest vulnerability is with SQL Server Administrators who are NOT also Windows Administrators. Windows Administrators have access to local files and can easily see all files stored on the local server. For files that are not encrypted (and are not already open to another process), an Administrator can open and/or modify these files using an appropriate editor. For files that are encrypted, an Administrator can at least view the encrypted information using a hex editor. By granting “Perform Volume Maintenance Tasks” to a SQL Server instance, you are giving administrators of the instance the ability to read the encrypted contents of a recently deleted file (ONLY IF the file system decides to use this newly freed space on the creation of a new database - created with instant initialization) with the undocumented DBCC PAGE command.

SIDE NOTE: The format for DBCC PAGE is undocumented in the Books Online but you will find tips and tricks on many “official” Microsoft blogs. The SQL Server Storage Engine blog (http://blogs.msdn.com/sqlserverstorageengine/) has some very good blog posts on internals and often describes undocumented commands. Specifically, check out the blog entry titled: How to use DBCC PAGE. The first three components are fairly straightforward: database id (or name), file id, and page id. The fourth component is the tricky one: printopt (or print option). The print options for DBCC PAGE are as follows (taken almost verbatim from the SQL Server Storage Engine blog - and Paul said I could :):

0 - print just the page header
1 - page header plus per-row hex dumps and a dump of the page slot array (unless it’s a page that doesn't have one, like allocation bitmaps)
2 - page header plus whole page hex dump
3 - page header plus detailed per-row interpretation (in this case, this option is not available, even if the data you are trying to read is from a previously deleted database, because the metadata is not accessible; only the raw page data is)
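
Putting the four arguments together, a call looks like the sketch below. DBCC PAGE sends its output to the error log unless trace flag 3604 is on for the session, so that comes first. The database name is the lab database used later in this post and the page number is arbitrary.

```sql
-- Route DBCC output to the current connection instead of the error log.
DBCC TRACEON (3604);

-- Arguments: database name (or id), file id, page id, print option.
DBCC PAGE ('TestSecurityExposure', 1, 200, 0);  -- page header only
DBCC PAGE ('TestSecurityExposure', 1, 200, 2);  -- header plus whole-page hex dump
```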

Bear in mind, even if you can access this [encrypted] information, making sense out of this data will be challenging if not incredibly difficult.

In production environments, database files should NOT be located on file server drives - especially those where restricted and/or sensitive files are stored. As a result of prudent security measures, the true impact of using instant initialization is low. However, because this vulnerability exists, this feature is off by default.

Want to try this?
I've written a lab on Instant Initialization AND it has an interesting sequence of exercises (using multiple instances):

  • You create a database on instance: SQLDev01
  • Populate a large chunk of pages with very contrived (and easy to find) "junk" data

USE TestWithZeroInitialization
go

-- One row per 8 KB page, each filled with an easy-to-spot pattern
CREATE TABLE JunkData
(
   JunkDataID int identity,
   JunkDataValue char(8000)
      DEFAULT REPLICATE('Junk', 2000)
)
go

SET NOCOUNT ON
go

-- Insert 20,000 rows (roughly 160 MB of 'Junk' pages)
DECLARE @Counter int
SELECT @Counter = 0
WHILE @Counter < 20000
BEGIN
   INSERT JunkData DEFAULT VALUES
   SELECT @Counter = @Counter + 1
END
go

  • Drop the database from instance: SQLDev01

  • Create a new database on a different instance: SQLDev02 (which has been configured to use Instant Initialization) and hope that it uses the freed space by having dropped the first database
  • Start walking various pages (using DBCC PAGE) to see if you can view the "junk" data from the dropped database.

-- Send DBCC output to this connection rather than the error log
DBCC TRACEON (3604)

DBCC PAGE ('TestSecurityExposure', 1, 200, 2)
DBCC PAGE ('TestSecurityExposure', 1, 400, 2)
DBCC PAGE ('TestSecurityExposure', 1, 600, 2)
DBCC PAGE ('TestSecurityExposure', 1, 800, 2)
DBCC PAGE ('TestSecurityExposure', 1, 1000, 2)
DBCC PAGE ('TestSecurityExposure', 1, 1500, 2)
DBCC PAGE ('TestSecurityExposure', 1, 2000, 2)
DBCC PAGE ('TestSecurityExposure', 1, 2500, 2)
DBCC PAGE ('TestSecurityExposure', 1, 3000, 2)
DBCC PAGE ('TestSecurityExposure', 1, 3500, 2)
DBCC PAGE ('TestSecurityExposure', 1, 4000, 2)
DBCC PAGE ('TestSecurityExposure', 1, 4500, 2)

 

In most cases the very first output (for page 200) will return Junk data. If this is not the case, simply drop the TestSecurityExposure database and recreate it again. Sometimes it’s a timing issue and sometimes it could be a background process (like Windows Update) that uses the expected pages. Regardless, if you do get the same pages again - our contrived data should be easy to find.
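
If you're building the environment yourself, the create/drop/create steps around those two scripts might be sketched like this. The two database names and the instance names come from the lab; the logical file names, paths and sizes are assumptions, and both databases need to land on the same volume for the freed space to have a chance of being reused.

```sql
-- On SQLDev01 (zero initialization): create the first database.
CREATE DATABASE TestWithZeroInitialization
ON PRIMARY
   (NAME = TestZI_Data,
    FILENAME = 'E:\SQLData\TestWithZeroInitialization.mdf',
    SIZE = 1GB)
LOG ON
   (NAME = TestZI_Log,
    FILENAME = 'E:\SQLData\TestWithZeroInitialization_log.ldf',
    SIZE = 100MB);
GO

-- ...run the JunkData script here, then drop the database to free the disk space.
USE master;
GO
DROP DATABASE TestWithZeroInitialization;
GO

-- On SQLDev02 (granted Perform Volume Maintenance Tasks): create a database
-- on the same volume and hope the file system hands back the just-freed space.
CREATE DATABASE TestSecurityExposure
ON PRIMARY
   (NAME = TestSE_Data,
    FILENAME = 'E:\SQLData\TestSecurityExposure.mdf',
    SIZE = 1GB)
LOG ON
   (NAME = TestSE_Log,
    FILENAME = 'E:\SQLData\TestSecurityExposure_log.ldf',
    SIZE = 100MB);
GO
```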

You can certainly create the environment on your own and see if you can get it to work. OR, you can get a copy of our AlwaysOn DVD that has the appropriate lab environment. I tend to give away the AlwaysOn DVD at events I speak at (on Availability/Disaster Recovery) but I'm also happy to send a few out over snail mail (it needs 2GB of memory and 10GB of disk space for the virtual environment AND you need to have Virtual PC or Virtual Server installed - which are freely downloadable from Microsoft). Paul asked for DBCC CHECKDB information here (to get a free DVD sent to you) and I'm going to ask for instant initialization numbers and how this has helped. Post a comment here and then send me an email with your snail mail address information as well. I'm willing to do this for the first 10 responses... go!

Have fun...and thanks for reading!
kt

PS - For those of you in our UK (London) Event tomorrow, we're giving away the AlwaysOn DVD (and an even cooler SQLskills pen... lol). Now there's motivation if the content doesn't (NOT!).

OK - I feel like I know a fair amount about SQL Server but sometimes I also feel like I don't :) :) I'm continuously amazed at how big a product SQL Server is... today was one of those days when I felt "I don't"!

I've been wanting to know more and more about the new features in the tools and the direction in which the tools are headed so... I set up a meeting with Paul Mestemaker (it helps that I live in Redmond and I'm working on some SP2 resources for the team :); it was a great meeting. Some exciting new features and some great new directions in which the tools are headed. I like the way they're thinking and I especially like the options that are now in place to discover, use and customize the "reports" feature within Management Studio (just to name one!). What I learned (that was the highest on my "I didn't know that" list) is that quite a few gems are released as part of the Feature Pack for SQL Server. The Feature Pack "is a collection of standalone install packages that provide additional value for SQL Server 2005. It has been updated for SP2." From that description, it doesn't leap out at me as exciting AND I often know about many of these tools through other channels - but usually it's just "a tool at a time". The thing that's nice about the FP page is that it seems to be a nice and central, single location for ALL of the "add-ons" for SQL Server. It includes things like the Upgrade Advisor (which I typically point people to individually on its main page) and will include (once it ships) things like the BPA (SQL Server 2005 Best Practices Analyzer) BUT it also includes things like a standalone download for SQLCMD so that you don't need to install all of the tools if you just want this lightweight client for automation. Additionally, it includes the SQL Server 2000 DTS Designer Components if you want to edit/modify/maintain DTS packages in 2005 before you rewrite/convert them to SSIS.

So - the point is that there's lots of great stuff out there and sometimes it just takes another person, a blog entry, or a few minutes hitting a company site to see what's new. I'd strongly suggest that you and your team pick a morning - maybe once or twice a month (and round robin who brings the coffee/doughnuts :)) to just browse around and see if there's anything new on your hardware, software and other supporting sites - especially those that don't already have a blog/rss feed (or other form for notifications). No one needs to know everything but knowing where to look can really make all the difference when you do need something (or when you have a concern/problem). And browsing a few sites (occasionally) might make the difference in applying a hotfix/patch before something becomes a big problem. Staying current with hotfixes, service packs, bios updates, firmware updates, etc. is difficult so make it a team effort.

Speaking of service packs, here's the primary page for SQL Server 2005 SP2: http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/servicepacks/sp2.mspx
And don't forget the Books Online Update as it is NOT installed when you update an instance to SP2: http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
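
If you're not sure whether a given instance has actually picked up SP2, you can ask the server itself. This is just a minimal sketch using the built-in SERVERPROPERTY function (the column aliases are mine):

```sql
-- Report the build number and service pack level of the current instance.
-- ProductLevel comes back as 'RTM', 'SP1', 'SP2', and so on.
SELECT SERVERPROPERTY('ProductVersion') AS ProductVersion,
       SERVERPROPERTY('ProductLevel')   AS ProductLevel;
```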

Also, be sure to watch the SQL Server Manageability Blog (aka Paul Mestemaker's blog) moving forward as he'll have the first news about BPA and many other tips/tricks with regard to the tools.

Finally, (and this is great timing too), Paul Randal - prolific author at the SQL Server Storage Engine Blog - blogged about all of the active "SQL Server Product Team"-related blogs here.

And........ if that doesn't keep you busy, I'm not sure what will! :)

Instant Initialization technical details are next and then I'll get back to the Clustered Index Debate. Thanks for reading!
kt

Enterprise only.

OK - I really need to blog more and well - I'm starting today by blogging a "quickie" blog entry on something that I just learned recently and that most of us (who speak/write/whatever on SQL Server) have been saying incorrectly...even marketing :). What's been said is that the new SQL Server 2005 feature "Instant Initialization" is an Enterprise Only feature (remember that Enterprise Only includes ALL skus that have the enterprise engine (EE) - the EE is in Enterprise Edition, Enterprise Eval and Developer Edition). Well... that's not the case. And - personally, I never tried it on standard edition as most of my customers are enterprise customers OR we're doing development work on the Developer Edition. All I have to say here is COOL! Now - I'll post another entry (shortly) that tells you all about Instant Initialization as I think it's a very simple and important feature to allow (no, it's not necessarily on by default - this is part of why I need another blog entry).
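
If you want to see which engine an instance is running before you test a feature like this, SERVERPROPERTY can tell you. A small sketch (the column aliases are mine):

```sql
-- Edition is the descriptive name of the SKU.
-- EngineEdition is a numeric code: 2 = Standard, 3 = Enterprise
-- (3 also covers Enterprise Evaluation and Developer, since they
-- share the enterprise engine), 4 = Express.
SELECT SERVERPROPERTY('Edition')       AS Edition,
       SERVERPROPERTY('EngineEdition') AS EngineEdition;
```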

As for upcoming events - there are 3 in March.

March 6 - Reading, UK
   
One day workshop on Crucial Database Maintenance Techniques hosted by Tony Rogerson of sqlserverfaq.com.

March 8-10 - Lalandia, Denmark
   
SQL Server OpenWorld hosted by Miracle Denmark.

March 25-29 - Orlando, Florida - USA
   
SQLConnections hosted by Penton Publishing and SQL Server Magazine.

Many of the above events are focusing on HA/DR and Database Maintenance and are copresented with Paul Randal. He blogged about these events here.

And - I'll be back soon! I promise!! (notice that the time between blogging is decreasing - in general :)
kt

Categories:
Events | SQL Server 2005

Well, I've promised to blog more and I'm really going to try to do so. This morning I got the perfect question/comment (in email) to respond to and after working through a response that was taking me upwards of 3 hours (you'll learn later why I have 3 "spare" hours :)......... I figured that it was time to turn the response into a blog post. ;)

Background: The Clustered Index Debate
In the years since the storage engine was re-architected (SQL Server 7.0+) there's been constant debate on how to appropriately choose the clustered index for your tables. I've generally recommended an ever-increasing key to use as a clustered index and many find that counterintuitive. The primary reason people feel it's counterintuitive is that it creates a hotspot of activity. [If "hotspot" is not a familiar term - a hotspot is solely an active place within your table.] Hotspots were something that we greatly tried to avoid PRIOR to SQL Server 7.0 because of page level locking (and this is where the term hotspot became a negative term). In fact, it doesn't have to be a negative term. However, since the storage engine was rearchitected/redesigned (in SQL Server 7.0) and now includes true row level locking, this motivation (to avoid hotspots) is no longer there. In fact (and probably even more counterintuitive), the opposite is true. Hotspots (specifically hot PAGES not hot ROWS) can be very beneficial because they minimize the number of pages needed in cache, improve the likelihood of the required page already being in cache and, in general, minimize the overall amount of cache required. So, this is why many of us have changed our recommendation on where to create the clustering key in 7.0+. Instead of focusing on range queries we now focus on placing the clustering key on an ever-increasing key. In earlier releases, focusing on range queries for the clustered index reduced hotspots for insert/update and this in fact was the PRIMARY motivation to choose them, NOT range query performance! But - there are even MORE reasons to choose an ever-increasing key and they are based on internals as well. These internals are based on the significant changes made in the storage engine for 7.0+. For a quick start on these, I went through them in the Blog entry here.

And, today's email is not uncommon. This is the basis for the title clustered index debate. In general, there are still a lot of questions related to creating clustered indexes to improve "range query" performance. Don't get me wrong, there's definitely a benefit in performance for some range queries but the first thing to remember is that you get only one CL index per table (therefore only one type of range query can benefit). In the real world, it's not likely that you want to see your data exactly in the same way all the time. Therefore it's very challenging to come up with the "right clustered" index if you're using range queries as your strategy. Even worse, the effect of choosing the clustering key to improve range queries causes problems for modifications against that table (INSERTs/DELETEs and UPDATEs). So.............. this is what started my day today. A great email from a reader that brought up these points. The question/comment (modified to hit only the highlights and to protect their identity :) was this:

The most important characteristic for a Clustered Index key is to satisfy range queries. More often than not, if a sufficient range of data will be scanned, the Optimizer will choose the Clustered Index over all others due to the excessive cost of Bookmark Lookup operations. As such, the table KEY is a more suitable clustered index candidate than any surrogate (few ever query a database by range of surrogate keys). [kt note: this second sentence is not entirely true... SQL Server will certainly choose a clustered index over non-clustered indexes that require bookmark lookups but there are A LOT of algorithms that SQL Server can use instead of either of these and my examples later show this... non-clustered covering seekable indexes, non-clustered scannable indexes, index-intersection, etc.]

Now, when the default behavior for SQL Server was designed such that the PRIMARY KEY was chosen as the default clustered index, it was exactly for this reason. It is the business key. It would satisfy uniqueness (by definition of logical KEY). And, it is well suited for a wide variety of range scans. However, this is when the PRIMARY KEY is defined on the Business Key of the data.

But, when you introduce the usage of surrogate keys (i.e., IDENTITY) as a physical implementation, and thus transfer the PRIMARY KEY definition to it, two things must be considered. First, the Business Key this IDENTITY will be a proxy for must still exist as it is still a part of the logical design. As part of the physical design, the logical key needs to be implemented as a physical constraint to maintain logical uniqueness. Second, just because a proxy has been defined does not make it a natural candidate for the clustered index. The business key still maintains this distinction.

What is often cited as the “reason” for IDENTITY PRIMARY KEY clustered index definitions is its monotonic nature, thus minimizing page splits. However, I argue that this is the only “reason” for defining the clustered index as such, and is the poorest reason in the list. Page Splits are managed by proper FILLFACTOR not increasing INSERTS. Range Scans are the most important “reason” when evaluating clustered index key definitions and IDENTITies do not solve this problem.

Moreover, although clustering the IDENTITY surrogate key will minimize page splits and logical fragmentation due to its monotonic nature, it will not reduce EXTENT FRAGMENTATION, which can cause just as problematic query performance as page splitting.

In short, the argument runs shallow.

Luckily, this email arrived with perfect timing for me as I'm sitting in a "bootcamp" event on Always On technologies and I'm not speaking this morning (my colleague Bob Beauchemin is doing lectures on Scale Out technologies: Scalable Shared Databases, Service Broker, DPVs, etc.). Anyway, in addition to listening to Bob, I've decided to continue the blog series on "the clustered index debate". The first and most important point to stress is that minimizing page splits is NOT the only reason nor is it the most important. In fact, the most important factors in choosing a clustered index are that it's unique, narrow and static (ever-increasing has other benefits to minimizing splits).
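
To make "unique, narrow and static" concrete, here's a minimal sketch of the pattern. The table, column and constraint names are hypothetical (they're not from the reader's system). The surrogate IDENTITY column carries the clustered primary key and the business key is still enforced, just with a nonclustered unique constraint:

```sql
-- Hypothetical table: every name and data type here is illustrative only.
CREATE TABLE dbo.Member
(
    MemberID   int IDENTITY(1,1) NOT NULL,  -- surrogate: unique, narrow, static, ever-increasing
    MemberCode char(10)          NOT NULL,  -- the business key
    LastName   nvarchar(50)      NOT NULL,
    CONSTRAINT MemberPK PRIMARY KEY CLUSTERED (MemberID),
    -- The business key keeps its logical uniqueness, enforced by a
    -- nonclustered unique constraint rather than by the clustering key.
    CONSTRAINT MemberCodeUK UNIQUE NONCLUSTERED (MemberCode)
);
```

The reader's first point (the business key must still exist as a physical constraint) is satisfied by the unique constraint; it just doesn't have to be the clustering key to do that job.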

The Clustered Index Debate Continued
First, there are many angles to look at wrt "the clustered index debate" and it's not until all of the issues are reviewed, that this strategy (a monotonically increasing key) becomes obvious. So, I think it will probably take a couple of blog posts to really prove this. I'll start up this debate again here...... When you look at a general purpose table (which is most) where the table has ALL DML (S/I/D/U) then you are best off with an ever-increasing key (again, you have to look at the overall impact of all operations against the table - not just select... because I/D/U will also impact select in the long term). So, I'll break this down into each DML operation here. If you don't look at the overall impact, then large tables can end up having a tremendous number of problems once they're put into production. I've certainly heard this concern/debate before (and most people are skeptical at first glance) but when you look at the situation overall, you'll find that "finding the right balance" includes not just looking at range queries. In fact, here's a quick list of the things/tests/numbers/scenarios that help to prove my strategy:

  • Inserts are faster in a clustered table (but only in the "right" clustered table) than compared to a heap. The primary problem here is that lookups in the IAM/PFS to determine the insert location in a heap are slower than in a clustered table (where insert location is known, defined by the clustered key). Inserts are faster when inserted into a table where order is defined (CL) and where that order is ever-increasing. I have some simple numbers but I'm thinking about creating a much larger/complex scenario and publishing those. Simple/quick tests on a laptop are not always as "exciting". But - this is a well documented issue (IAM/PFS lookups) and poor performance on a heap is also referenced in this KB: PRB: Poor Performance on a Heap. note: this KB is quite dated and I don't actually agree with everything in this article however, the general concern of poor performance for inserts is still true on SQL Server 2005.
  • Updates are often faster (when the row needs to be relocated) and for the same reason (IAM/PFS lookups) BUT there are many types of updates and not all updates cause records to be relocated. Here are a few things to think about wrt updates:
    • Updates that are completely in-place (some examples are where the update is updating a fixed-width column OR to variable-width columns where the row size doesn't change, etc.). These types of updates don't really care.
    • Updates that cause record relocation (where the row size changes) are definitely better by having a clustering key because the record relocation (which will be handled by a split) is defined by the clustering key
    • Updates to the clustering key are the WORST (in this case) which is one of the key reasons for having a cl key that is static (so we have to keep this in mind when we choose a clustering key).
  • Deletes aren't nearly as big of a concern BUT deletes in heaps create more gaps and more gaps create more work in PFS/IAM lookups and while this helps to reduce wasted space, it still requires the time to find the space........ hence the slowed performance of Inserts/Updates. I've also written some blog entries that cover very interesting test cases for large scale deletes and why you'd want to consider partitioning to optimize for the "sliding window scenario" in this blog entry: MSDN Webcast Q&A: Index Defrag Best Practices - Fragmentation, Deletes and the “Sliding Window” Scenario and it's the LAST one!
  • Selects.............. now this is the hardest one to go through in just a couple of bullets (ah, I guess this will lead to another one or two posts :) BUT I'll start by saying that the best way to tune the vast majority of range queries is through non-clustered [covering] indexes. But, it's also important for me to stress that I do NOT advocate covering every query (it's impossible to do). What's important to realize in terms of covering is that SQL Server 7.0 and up continues to include internal algorithms to improve performance when you don't have the "perfect" non-clustered covering seekable index and instead still gives better performance than going to the base table (or performing bookmark lookups - as mentioned in the mail...and I completely agree that these [bookmark lookups] can be evil!). To start this discussion, I'll give one of my favorite examples of a large-scale aggregate. The absolute best way to improve the performance is through an indexed view but the data can be gathered through many other algorithms - ideally through a non-clustered covering index that is in order by the group by and that includes the column(s) being aggregated. For example, take this query:

SELECT c.member_no AS MemberNo,
 sum(c.charge_amt) AS TotalSales
FROM dbo.charge AS c
GROUP BY c.member_no

On a charge table of 1.6 million rows here are the performance numbers to handle this aggregation:

  • Clustered table scan (CL PK on Charge_no) with a hash aggregate = 2.813 seconds
  • Index scan (non-clustered covering but NOT in order of the group by) with a hash aggregate = 1.436 seconds
  • Index scan (non-clustered covering in order of the group by) with a hash aggregate = .966 seconds
  • Indexed view = .406 seconds
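
For reference, here's roughly what the last two options could look like. This is a sketch, not the exact script behind the numbers above: the index and view names are hypothetical, and the indexed view assumes charge_amt is declared NOT NULL (an indexed view can't SUM a nullable expression).

```sql
-- Nonclustered covering index in the order of the GROUP BY.
-- INCLUDE requires SQL Server 2005 or later; the index name is hypothetical.
CREATE NONCLUSTERED INDEX ChargeMemberNoCovering
ON dbo.charge (member_no)
INCLUDE (charge_amt);
GO

-- Indexed view (hypothetical name). SCHEMABINDING and two-part table names
-- are required, and COUNT_BIG(*) is required whenever the view uses GROUP BY.
CREATE VIEW dbo.MemberChargeTotals
WITH SCHEMABINDING
AS
SELECT c.member_no       AS MemberNo,
       SUM(c.charge_amt) AS TotalSales,
       COUNT_BIG(*)      AS ChargeCount
FROM dbo.charge AS c
GROUP BY c.member_no;
GO

-- Creating the unique clustered index is what materializes the view.
CREATE UNIQUE CLUSTERED INDEX MemberChargeTotalsCL
ON dbo.MemberChargeTotals (MemberNo);
GO
```

Keep in mind that the session creating the view and its index needs the usual SET options for indexed views (ANSI_NULLS and QUOTED_IDENTIFIER ON, for example), and that outside of the enterprise engine you have to query the view directly with the NOEXPAND hint for its index to be used.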

Now this was a pretty small table (narrow rows and only 1.6 million rows) AND I didn't have any concurrent activity. The concurrent activity would have caused this to be even slower for hash aggregates, etc. Regardless, it proves the point (at least generally). Now, if I wanted to improve this range query then I'd have to cluster on the member_no column (and this is an ideal example because I often hear people say that clustering on a foreign key column helps to improve range/join queries - which can be true as well)......... But - this strategy has a few problems in addition to a few benefits (and we have to look at everything to be sure of our choice/decision). First, member_no is not unique (in the charge table) so SQL Server has to "uniquify" the rows. The process of "uniquification" impacts both time (on insert) and space (the rows will be wider to store each duplicate row's uniquifier). Also, theoretically member_no could change (in this case that's not true). Anyway, the time it takes for the clustered index is 2.406 seconds which is better than the clustered on the PK (of course) but if I were to also start modifying the rows (which creates splits) or even just insert 15% more rows........ then my table would become fragmented. At that point, the query performance should get worse in the table clustered by member_no and it will continue to get even worse in the table clustered by charge_no (because of the worktable created in tempdb by the hash aggregate) BUT it won't be all that much worse in the non-clustered index examples (especially the covering index that's in the order of the group by - because this doesn't require a worktable).........

  • CL on member_no = 4.906 seconds
  • CL on charge_no = 6.173 seconds
  • Index scan (non-clustered covering but NOT in order of the group by) with a hash aggregate = 3.906 seconds
  • Index scan (non-clustered covering in order of the group by) with a hash aggregate = 1.250 seconds
  • Indexed view = .516 seconds
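
If you want to see how fragmented each index on the table has become after modifications like these, SQL Server 2005 exposes that through sys.dm_db_index_physical_stats. A minimal sketch against the charge table (this is not the script I used for the timings above):

```sql
-- Resolve the IDs first; passing variables avoids problems with function
-- calls as arguments in databases running at an older compatibility level.
DECLARE @db_id int, @object_id int;
SET @db_id     = DB_ID();
SET @object_id = OBJECT_ID(N'dbo.charge');

-- One row per index level scanned. 'LIMITED' is the cheapest scan mode,
-- but it can still take a while on a really large table.
SELECT i.name AS IndexName,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(@db_id, @object_id, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id  = ps.index_id
ORDER BY ps.index_id;
```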

This is a great start to furthering the clustered index debate but I do have to admit that it's a counterintuitive and difficult issue to tackle because often isolated tests lead you to different conclusions. In this case though, the non-clustered indexes are better for this range query and the indexed view is the best (but I wouldn't consider the indexed view unless this were more of a read focused database rather than read/write). [and - of course, that statement warrants yet another blog post :)]

So, depending on the tests that you do - especially if you focus only on selects and you don't have modifications (i.e. fragmentation) - then they will make "creating the clustered index for range queries" appear to be best. Again, I'm not just saying this to prevent fragmentation, I'm saying this because I wouldn't use the clustered index OR a non-clustered index with bookmark lookups to handle this query. I'd consider a non-clustered covering that's seekable OR even a non-clustered covering that's scannable before I'd even choose the clustered (and that's what the optimizer would prefer as well). In the end it's really a bit of an art and a science to "finding the right balance" of indexing.

Oh - and if you arbitrarily add a column to use for clustering (maybe not as the primary key) that can help but many would prefer to use actual data... which means [potentially] creating your primary key with a new identity [or similar] column and this can impact your business logic (absolutely). I'm certain that certain tests can show that range queries are faster and it's absolutely correct that business application/usage can be a concern but when you look at the big picture (and the impact on I/D/U) then the benefits of the monotonically increasing key significantly outweigh these concerns. Simply put, a small/narrow key can help join performance and an ever increasing key can also help lookups for rows! (yes, definitely more coming)

Happy Friday! Have a great weekend. I'll try to continue more threads on this debate shortly!
kt

Ok - a strange title indeed but it's been a strange couple of months. It all started with a "much needed" vacation and I thought that would get me through the persistent "cold" that I was having all through my travels. Anyway, vacation didn't help and I came back to find that what I had was a sinus infection. January has been filled with antibiotics, sleep and well......still a lingering cough even though the month is over today. The long story short is that I'm starting to come out of it and I promise to start blogging a lot more frequently starting now. In fact, I have 3 or 4 entries in the queue that I'm plotting for upcoming posts.

To get you started - there are some great and NEW resources that were posted just this week by some of my SQLskills colleagues. If you read their blogs then you've probably already seen this but for completeness, I'm going to post them here:

Bob Beauchemin's Blog Entry: http://www.sqlskills.com/blogs/bobb/2007/01/30/TheFirstOfMyScaleoutWhitepapersIsAvailable.aspx
Bob's Whitepaper: Planning, Implementing, and Administering Scaleout Solutions with SQL Server 2005

Liz Vitt's Blog Entry: http://www.sqlskills.com/blogs/liz/2007/01/30/AnalysisServicesPerformanceTuningWhitepaperHasArrived.aspx  
Liz's Whitepaper: Analysis Services 2005 Performance Guide

Ok - there's my first post of 2007 and NOT my last. More to come. Thanks for reading and HAPPY NEW YEAR (at least I got that in January :),
Kimberly

OK, it's been a heck of a long time since I blogged... and for that I apologize. I'm also WAY overdue in posting my demo scripts from a TON of conferences BUT... now everything has been posted. Check out the past events page on SQLskills and you can find the demo scripts that you're looking for......lots of fun stuff and TONS of scripts to play with and test.

Now - as for the reason(s).... many are business and for that I blame the following (yes, 17 flights [yes, one boarding pass is missing] over ONE 5 week trip with 7 events and 5 continent changes):

The other reason(s) are personal...suffice it to say that the last 6 months have been some life changing times for me and what I'm finding (or trying to find) is that ever important balance between work and life. During this holiday season (and always), I wish you and your loved ones well and I hope that you too can find (and cherish) what's most important to you.

So, you won't see anything else from me for this year but I do hope to be better (and more frequent) with blogging in the New Year and I especially hope to see you again at an upcoming conference. Let me leave you with the most exciting picture I've witnessed this year...it was during my one day of sightseeing in Cape Town - where I went cage diving with Great White Sharks (and got horribly sea sick - which is rare for me) but where I was able to witness these amazing and powerful creatures....

Have a happy and safe holiday season!
kt

Hey there everyone - Been a LONG time since I last blogged (sorry!)... key reason (fyi) is that I'm trying to find the ever-challenging work/life balance during the best months of the year (here in Seattle July/Aug are GREAT! months - September is almost always good too). Anyway, it's been a few weeks and I thought I'd catch you up... It all started with some travel (of course) and I was in Chicago for a SQLskills Immersion event and then off to London for another event with SQL Server FAQ (aka Tony Rogerson). I returned during the CRAZY travel restrictions and had to check two laptops (sigh) BUT they both made it back without damage after I purchased kitchen towels, bed sheets, a blanket and a duvet (and two new [cheap] suitcases) in which to pack them. When I got back, I relaxed! I've been up to see the Athabasca Glacier and Johnston Canyon and just this week I took off a day to do the "Lotus Experience" Advanced Driving Course at Pacific Raceways/ProFormance Racing School. I've always LOVED driving so driving 700 miles to Johnston Canyon didn't bother me at all (especially with great company and fantastic tunes - have you heard the latest Snow Patrol?) and the driving course in the Lotus was not only great fun but VERY informational. I've taken the Advanced Driving course before and I think EVERYONE should consider continuing their education in driving - things like Collision Avoidance, Advanced Braking (understanding proper driving under ABS), "high eyes" and so many other things...just make you safer on the road. OK, enough about all that fun/practical stuff.....let's get back to SQL.

Some great new resources are out:

And - it's now time to start enjoying Labor Day weekend, there ain't no labor going on here this weekend. Enjoy - and check out those links next week. ;-)

Cheers everyone,
kt

Been thinking a lot about something that was mentioned in a few of my most recent posts... Especially when I get comments like "that's another item to add to our checklist" or "that's a good trick to add to our arsenal" and well, I thought in this blog entry I'd ask for your tricks that fall under the umbrella of designing for performance.

For example - do you change collations? I had a recommendation here.
For example - do you have a view that you want ordered? I had a recommendation (with caution) here. But - Adam Machanic came back and said that he's used that trick to improve performance... and, I'm sure that's the case as well!
For example - do you have stored procedure parameters that are giving you grief? I had a series of recommendations in my Optimizing Procedural Code category here.

In fact, sometimes the best form of "hint" to SQL Server is NOT an optimizer hint but instead a more subtle change to the join (derived tables for example) or the infamous subquery -> join rewrite or the join -> subquery rewrite. I'm always asked "which is better - a subquery or a join" and I always answer YES. ;-)  OR taking a complex process and breaking it down into temp tables (I'd try to create views instead of temp tables first and see if the optimizer figures it out but there are cases when sometimes they just don't). Remember, it's not the optimizer's job to find the absolutely BEST plan; it's their job to find a good plan fast. And - they typically do. Really, no general "tricks" work ALL of the time and often they don't help at all but there are LOTS of things that I'm sure you've done and you really want to tell someone about it. How about here? I'm going to try to compile these tips/tricks into a best of...
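
To show what I mean by the subquery -> join rewrite, here's a small sketch against Northwind's Customers and Orders tables (the query itself is just an illustration). Both forms return the customers that have at least one order, and it's worth comparing the plans for both on your own data:

```sql
-- Customers with at least one order, written with a subquery...
SELECT c.CustomerID, c.ContactName
FROM dbo.Customers AS c
WHERE c.CustomerID IN (SELECT o.CustomerID FROM dbo.Orders AS o);

-- ...and the same result written as a join. DISTINCT keeps one row per
-- customer, because the join alone would return one row per order.
SELECT DISTINCT c.CustomerID, c.ContactName
FROM dbo.Customers AS c
JOIN dbo.Orders AS o ON o.CustomerID = c.CustomerID;
```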

Categories:
Design | SQL Server 2005 | Tips

Hey there everyone - If you're into Analysis Services, Integration Services, Reporting Services and BI in general - you'll want to check out Elizabeth Vitt's new blog. Liz has been specializing in BI since SQL Server started adding BI-centric components. She's got a wealth of information to share and many great insights into performance tuning as she's working on a BI Performance Tuning resource that will probably hit 100 pages (from current guesstimates).

And - no surprise from Liz, she's out the gates running with her first entry on Influencing Aggregation Candidates.

Subscribe now!

And a big welcome to blogging for Liz!
kt

Categories:
Resources | SQL Server 2005 | Tips

Well, it's been a GREAT week here in Switzerland while working with my partner Trivadis. Today, we wrapped up a two-day course on Designing for Performance (in Geneva) while on Monday/Tuesday we did a two-day course on Indexing for Performance in Zurich. The food, the wine, the cheese, the butter, yum! Oh... and the questions/comments/technical focus, etc. has all been great. :) I'm flying home today (Sat) so wish me luck on having internet access at 36,000 feet again (probably not...I'm flying United instead of Lufthansa - and it's only Lufthansa that has FlyNet). Wow - can you imagine where we're going to be in only a couple more years? Internet access everywhere! (hmmm.. how do we escape? well, that's another blog entry for another day :)

Anyway, one of the great things about teaching is that I get to meet all sorts of people and work through all sorts of interesting problems... And - this blog entry is based on a discussion I had with [a very blogless ;-] Meinrad Weiss - a Trivadis employee/consultant AND a fellow RD. (I was bullied into blogging by CV so now I do my part and do the same to others)

I can't remember how it started but somehow a discussion started on Top 100 PERCENT being used in views. I mentioned that while this was a good trick in SQL Server 2000, it has been REMOVED from SQL Server 2005 (meaning that TOP 100 PERCENT does NOT order the data within a view). Theoretically, I agree with this decision - data within a view should NOT be ordered. A view should SOLELY define a tabular set. It is up to the query which is accessing the view to define the presentation of the view. Using TOP within a view should be limited to ONLY when it is used to further define the data set (i.e. TOP 10 PERCENT... ORDER BY TotalSales DESC makes perfect sense).
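
Here's a sketch of the kind of view where TOP does make sense, because it defines WHICH rows are in the set. The view name is hypothetical and the subtotal expression is my own simplification over Northwind's [Order Details] table:

```sql
-- TOP ... ORDER BY here decides which rows belong to the set (the top 10
-- percent of orders by subtotal). It does not promise the order in which
-- a query against the view gets them back.
CREATE VIEW dbo.TopOrderSubTotals
AS
SELECT TOP 10 PERCENT
       od.OrderID,
       SUM(od.UnitPrice * od.Quantity * (1 - od.Discount)) AS SubTotal
FROM dbo.[Order Details] AS od
GROUP BY od.OrderID
ORDER BY SubTotal DESC;
GO

-- Presentation order belongs in the query that reads the view.
SELECT OrderID, SubTotal
FROM dbo.TopOrderSubTotals
ORDER BY SubTotal DESC;
```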

Now, having said that - it was a cool trick - but with Pros/Cons. The obvious Pro is simplicity in access. While adding the ORDER BY to the query against the view really isn't all that difficult, it does make it a bit easier for quick/simple query access. BUT - there's a HUGE con here too. If the view starts getting used for other purposes (like in joins to other tables), then the data being ordered before the joins, etc. can cause you an additional step that is NOT necessary. As a result, performance is compromised.

Long story short, I generally recommended against it but it was still cool. So - then Meinrad started playing and came up with - what about 99.9 on a table that has < 100 rows OR 99.99% on a table that has < 1000 rows, etc. And - yes - that DOES work, because SQL Server rounds to 100%. So, you are back to getting 100% of your data, ORDERED within a view. But - you need to set your percentage to an appropriate percentage based on rows - but what if you don't know the row count? How about TOP n where n = the max value for a bigint (9,223,372,036,854,775,807)?? That should always work...and it does.
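
As a sketch, the bigint version of the trick looks something like this. The view name is hypothetical and this is not the script linked from this post; also keep in mind that the ordering you get back is observed behavior, not something the product promises:

```sql
-- TOP (n) with n = the maximum bigint value returns every row, and the
-- ORDER BY is legal inside the view only because TOP is present.
CREATE VIEW dbo.OrderSubTotalsOrderedSketch
AS
SELECT TOP (9223372036854775807)
       od.OrderID,
       SUM(od.UnitPrice * od.Quantity * (1 - od.Discount)) AS SubTotal
FROM dbo.[Order Details] AS od
GROUP BY od.OrderID
ORDER BY SubTotal DESC;
GO
```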

OK - so what's the point? Yes, we DO have a workaround for the removal of TOP 100 PERCENT in SQL Server 2005 - but be CAREFUL - you are potentially shooting yourself in the foot. If this view is NEVER used for anything but SELECT * FROM View, you're OK. If you start adding joins, etc. then you might get into trouble. In the showplan below - the data returned is EXACTLY the same.

SELECT C.ContactName, Sub.*
FROM OrderSubTotalsViewOrdered2 AS Sub
   JOIN Orders AS O ON O.orderid = Sub.Orderid
   JOIN Customers AS c ON c.customerid = o.customerid
WHERE C.City = 'Madrid'
ORDER BY SubTotal DESC
go

SELECT C.ContactName, Sub.*
FROM OrderSubTotalsViewNOTOrdered AS Sub
   JOIN Orders AS O ON O.orderid = Sub.Orderid
   JOIN Customers AS c ON c.customerid = o.customerid
WHERE C.City = 'Madrid'
ORDER BY SubTotal DESC
go

BUT - the first plan of execution queries against the ORDERED set and the second against the un-ordered. Check out the showplans below:

This is a VERY COMPELLING reason to BE CAREFUL ordering data within a view. While the trick does work, please use it sparingly.

If you want to play with the views created above, you'll need a copy of the Northwind Database and you'll need this script: Top 100 Percent in SQL Server 2005.sql (4.46 KB).

Have fun!
kt

PS - I'm adding my blog to Technorati so I need to post their link to start generating my profile... here we go: Technorati Profile

Categories:
SQL Server 2005 | Tips

So, I've now spent the last couple of hours playing with Database Mail and HTML formatted messages being sent to the SQLskills subscribers. It's been a fun learning experience as I think I've found a bug with the email account name length...let's just put it this way - don't be too descriptive with your account names.

Outside of that - it's amazing how well queue based email works. The old mapi based mail would take a LONG time to complete the batch mail processing but now - with queue based mail it's done in seconds.
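
For anyone who hasn't tried it yet, sending an HTML formatted message through Database Mail looks something like this. It's a minimal sketch: the profile name, recipient, subject and body are all made up, and the profile has to exist already in your Database Mail configuration:

```sql
-- sp_send_dbmail only queues the message and returns; the mail itself
-- is delivered asynchronously, outside of your batch.
EXEC msdb.dbo.sp_send_dbmail
     @profile_name = N'NewsletterProfile',       -- hypothetical profile name
     @recipients   = N'subscriber@example.com',  -- hypothetical recipient
     @subject      = N'Newsletter',
     @body         = N'<html><body><h1>Hello</h1><p>This message is HTML formatted.</p></body></html>',
     @body_format  = 'HTML';
```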

The best part is that I'm on a flight over the Atlantic right now...on my way to Frankfurt. I think this is the MOST productive flight I've ever had!

Have a great weekend,
kt

Categories:
Events | Opinions | SQL Server 2005

I've posted all of my demo content from TechEd 2006 and wow - it was a lot of fun! I created many new and fun demos as I tried to keep everyone awake through the sessions (cause it seems like there were way too many evening events - didn't it? ;-)). All of the content is posted here.

Finally, can I just say that Paul (and team) has been on a roll. They're blogging machines. If you're not reading this blog... you're DEFINITELY missing out.

Have a great weekend... I'm off to Switzerland today.
kt

Categories:
Events | Resources | SQL Server 2005

Well, if you're wondering why I've been so quiet this week... it's a myriad of events all coming together and/or being finalized right in time for TechEd. In working really hard (especially crazy was today) for some final TechEd content, I realized that a lot of people don't really know what goes on behind the scenes of some of these really huge events. Brian Marble has been blogging about this and you can learn some interesting things by checking out his blog. And for some fun statistics related to TechEd, here is an idea of the quantity of food and drink that will be consumed:

  • 1,250,000 pieces of "Mikes & Ikes" will be consumed over the course of the Tech Ed 2006 week
  • 18,750 pounds of salad will be prepared and offered at meals
  • 83,700 ice cream novelty/fruit and yogurt bars have been ordered for this function
  • The total amount of fruit ordered will fill 3/4 of a full-size tractor-trailer
  • 60,000 eggs will be eaten by attendees at breakfast (this is equal to 5,000 dozen-egg cartons)
  • It will take 4 semis to transport the 150,000 bottles of water consumed
  • 1.6 million ounces of coffee will be poured and consumed (conservative estimate)
  • More than 50,000 pounds of carbohydrates will be consumed at Tech*Ed (Atkins who?)
  • 1,500 table cloths will be used and re-set on a daily basis (7,500 for the week)
  • A minimum of 2,000 antacid tablets are likely to be consumed at this event

As for the technical content, well that's not too shabby either. There are over 900 breakout sessions, chalk talks, ILLs (Instructor-led labs), HOLs (Hands-on labs) and general/keynote sessions. There's a lot of technology that comes together for a show like this and there's even a DVD that's available after the show with all of the breakout sessions on it. The key point is that there's a lot going on and I'd have a hard time believing that you couldn't find something to do during every timeslot (for me there are multiple time slots where I'm torn between delivering my own session and attending another...but, I have a feeling I know where I'll end up :).

One thing that you can do in almost every timeslot is an HOL (Hands-on Lab) and for SQL Server there are more than 10 of them. Each HOL is focused and technical and each covers a specific technology or topic. For TechEd 2006, I've written two of the HOLs: DAT007 and DAT010. Specifically, DAT007 is Database Mirroring in SQL Server 2005 SP1 and DAT010 is Table and Index Partitioning. The Database Mirroring lab covers everything from design to implementation to failover to monitoring, and the Partitioning lab goes from design to implementation to performance to the sliding window scenario. They were a lot of fun to write and I hope a lot of fun to go through. If you're interested in hearing more about them, Mark Penaroza did a couple of interviews about them. He blogged about it here and mentioned that the interviews are available on Commnet (the Microsoft TechEd attendee website). I've also posted the interviews here (DAT007 Interview (4MB mp3 file) and DAT010 Interview (2.75MB mp3 file)) so that you can get some insight into the things we're doing to help get you started and ready with these new technologies.

Finally, since TechEd is sold out, I know that not all of you will be there. As a result, there's "Virtual TechEd". Virtual TechEd is a site dedicated to getting some of the content and resources out to folks that just couldn't attend. The Virtual TechEd site is here: http://virtualteched.com/default.aspx

So, I think that's it for now. Still enjoying the comments you're making on the last blog entry about the version you're running and why. Seems like we all have the same problem - time and money ;). Keep those comments coming!

Thanks for reading,
kt

Categories:
Events | Resources | SQL Server 2005

Ha... did that get your attention? Well, what I really hope to do is make everyone aware of what's made the Developer Community rounds this week. On Wednesday, Microsoft announced "Data Dude" (aka the Visual Studio sku for Database Developers). This was an announcement that may be glossed over by many DBAs thinking it's just another tool for developers...what can it offer me? And, well, that's where I think there are some VERY cool things to point out. I've been following Data Dude for a couple of months now (ah... a little birdy told me :) and at first I wasn't sure how much it would impact me. However, after starting to get a better feel for their future directions, I've realized that even though I'm not their initial and/or direct target audience, I'll definitely find some great uses for their product. In fact, in getting ready for their announcement and in chatting with a few press folks, I wrote up a small amount of text. Some of this was quoted in the eWeek article here but there are a few more things that I really think you'll (yes, even DBAs) be interested in. This is the second half of the content that was quoted:

For Administration and Operations, I especially like their direction with regard to unit testing and sample data generation. I work with a myriad of customers who do not let development/operations perform testing/tuning on real production data (even a copy) due to data sensitivity requirements/policies. As a result, performance testing can be horribly flawed. With the ability to generate large sample volumes of statistically "real" data, real-world tuning will be possible without compromising data sensitivity. This is the area that I'm most interested in initially but refactoring and schema comparisons are very interesting as well. One of my favorite sayings is "The sooner you start to code, the longer the program will take." (Roy Carlson) as schema changes can be challenging at best and often things can be missed (data types, column names, etc.). Often the changes are made on alternative systems and then they need to be integrated in - often through comparing schemas and with hand-created alteration scripts. With the ability to have intelligent refactoring, application and database logic can be fixed through a straightforward and flexible interface. This will help to minimize later errors or even harder to recognize performance problems caused by implicit conversions.

So, in the end, I'm not their primary target but I think I'll probably get really into it and try to consider a variety of ways to leverage it for Operations/Administration teams for "after the fact" tuning cases. However, I do think teams will be even more productive if they adopt it earlier in their production lifecycle.
Now, if you're even slightly interested, you can get a lot more information about Data Dude already. Here's a beginning list:

And, if you're going to be at Microsoft TechEd 2006 - in Boston in a couple of weeks, there's a LOT more information coming. In fact, my pre-conference workshop co-presenter Brian Randell has authored some Hands-on Labs for Data Dude and those will be available in the HOL area. He's on DotNetRocks this week (to be released on DNR's site next Tuesday) and he's chatted with them in the past about Virtual Server/Virtual PC (for hours...now you know how I feel ;). Honestly though, we use VPC/VS a lot in our HOLs and Brian is REALLY knowledgeable about how to optimize them, compact drives, etc. Anyway, here's the link to his last show on DNR and here's Brian's blog entry on Data Dude. And, just as a small hint... you should consider making a Sched+ for the "Live from TechEd" show from DNR. All I have to say is that it might have some great guests on it (maybe even related to Data Dude, hint, hint)!

Now, the last thing that's the most exciting for me to announce is that there are some new bloggers as a result of the Data Dude announcement. FINALLY, one of my best friends - Gert Drapers - has started blogging (don't forget his already awesome content site: http://www.SQLDev.Net). If you're at all interested in geeky database development stuff, subscribe now! And - many of his team members are great friends too (Richard and Matt!) and I'm very excited to see them blogging as well (it's just that I've been begging Gert to do it for the past couple of years ;)). Anyway, it will be great fun watching this team grow and watching this product evolve.

Here are the Data Dude team blogs:

Here's the official Visual Studio Team Edition for Database Professionals site.

The times are changing............. for the better!
kt

Categories:
Resources | SQL Server 2005 | Tips

But - it was a lot more laid back this time... Once again, it was fun! Thanks Carl. Thanks Richard.

Here's the link for the show: http://www.dotnetrocks.com/default.aspx?showID=181 and of course, the general link to DNR is http://www.dotnetrocks.com.

Enjoy,
kt

Categories:
Events | SQL Server 2005

In part 11 of the TechNet webcast series for the ITPro, I spoke briefly about the Oracle Migration Assistant and the recent release of the Community Tech Preview versions of both the Access and Sybase Migration Assistants... A few of you asked for their download location and there were even a few replies that folks had found them... However, after looking around, I figured out that what was found were the OLD and very outdated Access Upsizing Wizards (and that's not this!). The new SQL Server Migration Assistant tools are truly Migration tools - tools that can help convert code, change data types, etc. More than anything they're targeted at being more complete and feature rich than just an "upsizing tool" which is excellent for what it is but still requires a lot of additional work.

Having said all of that, I have the details about the TRUE Migration Assistants.

SSMA for Access download instructions:

Download instructions

  1. Open the download page here.
  2. Select ‘Receive File from Microsoft’.
  3. Enter the Password: w$%dIcKP_TZrf
  4. Download and run ssma-for-access-xxx.msi

SSMA for Sybase download instructions:

Download instructions

  1. Open the ftp download site page here with the following username and password:
         username: SSMA4Syb2
         password: i456$Lk
  2. Download the msi for Sybase or the msi for the Sybase Enterprise Portal (ep).

    IMPORTANT: If you access the files from Internet Explorer, please verify IE Browser Settings using Tools, Internet Options, Advanced, under the Browsing section:

CHECK - Enable folder view for FTP sites
UNCHECK - Use Passive FTP (for firewall and DSL modem compatibility)

So... have fun with those downloads and if you run into any issues/concerns - be sure to post in the SQL Server Migration Assistant newsgroup.

Cheers,
kt

OK, so... I don't know how many of you use different collations but if you do then you know that there are two truths:

1) They're very flexible
2) They can cause you a bit of grief (changing collations and tempdb)

Flexibility

As of SQL Server 2000 (or heck, maybe it was 7.0?), collations can be set at installation OR set/changed later. You can set the collation when a database is created (if not set, the database will use the server's default). You can set the collation for a column when a table is created (if not set, the column will use the database's default). You can set the collation when a query is executed (which doesn't really make sense unless it's in a WHERE clause or ORDER BY clause). And - you can set the collation in a view or stored procedure to do things like case sensitive searching - on the fly. However, neither of these on-the-fly approaches will perform well over large result sets (at least not without indexes) so, I'd be careful about making any ad hoc changes to collations (even in views/sps - without appropriate indexes)!
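To make the "on the fly" case concrete, here's a small sketch of a case sensitive search against a case insensitive column. The table name and data are made up for illustration, and it assumes the SQL_Latin1_General_CP1 collations (which ship with SQL Server) are what you want to compare with:

```sql
-- Hypothetical demo table with a case INsensitive column
CREATE TABLE dbo.CollationDemo
(
   LastName varchar(30) COLLATE SQL_Latin1_General_CP1_CI_AS
)
go

INSERT dbo.CollationDemo VALUES ('Smith')
INSERT dbo.CollationDemo VALUES ('SMITH')
INSERT dbo.CollationDemo VALUES ('smith')
go

-- Uses the column's collation: all three rows match
SELECT LastName
FROM dbo.CollationDemo
WHERE LastName = 'smith'
go

-- Overrides the collation for this one comparison: only 'smith' matches
SELECT LastName
FROM dbo.CollationDemo
WHERE LastName = 'smith' COLLATE SQL_Latin1_General_CP1_CS_AS
go
```

The second query is the kind that can hurt over large sets: the comparison no longer uses the column's own collation, so an index on LastName may not be usable for a seek.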

Anyway, the key point is that they're very flexible. In many international databases/localized databases, column collation differs by table (in order to do efficient sorting, etc.) and different language data may be separated (either with a column that describes which language/country code is used OR in different tables).

Grief in Changing Database Collations

Actually, changing database collation is *very* simple. Literally, it only takes an ALTER DATABASE to do. For example, the following code runs flawlessly:

USE master
go

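-- Only drop the database if it already exists (so the first run doesn't error)
IF DB_ID('TestCollation') IS NOT NULL
   DROP DATABASE TestCollation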
go

CREATE DATABASE TestCollation
COLLATE
SQL_Latin1_General_CP1_CI_AS
go

sp_helpdb TestCollation
go

ALTER DATABASE TestCollation
COLLATE Latin1_General_CS_AS_KS_WS
go

sp_helpdb TestCollation
go

BUT... if you go from case sensitive to case insensitive... be careful! It is important to realize that ALL of your tables AND data will need to be checked against the new collation. In fact, changing database collation will not be allowed if the objects/data would no longer adhere to your unique constraints, etc. Check out this more complete script (ChangingDatabaseCollation.sql (2.85 KB)), if you want to see what happens.

Grief with temporary objects

So... the other area (and this seems to be the one where everyone has trouble), is with temporary objects. If you create a temp table and your database has a different collation than TempDB (which has the same collation as the system - based on installation), then comparisons/lookups/joins may have problems. A simple trick to get around this is to use database_default. Check out this sample and you'll see how it works:

CREATE DATABASE Test
COLLATE Icelandic_BIN
go

USE Test
go

CREATE TABLE #test1
(
   col1 varchar(12)
)
go

CREATE TABLE #test2
(
   col1 varchar(12) COLLATE database_default
)
go

USE Tempdb
go

CREATE TABLE #test3
(
   col1 varchar(12) COLLATE database_default
)
go

exec sp_help 'tempdb..#test1' -- Will use TempDB's collation
exec sp_help 'tempdb..#test2' -- Will use Test's collation (Icelandic BIN)
exec sp_help 'tempdb..#test3' -- Will use TempDB's collation
go

So simple, so obvious... and, well - I just found out about that one?! I used to recommend that you explicitly set the collation for every column. Now, that still works - but, it doesn't offer you any flexibility. So, you could get around that with dynamic string execution but that can also get very complicated, very quickly. So... database_default is a VERY simple and clean way of doing this.
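If you want to see the actual problem (and not just the sp_help output), here's a rough sketch of the join that typically trips people up. It assumes the Test database (Icelandic_BIN) from the sample above still exists and that tempdb uses some other collation; the table names dbo.Names and #lookup are made up for illustration:

```sql
USE Test   -- the Icelandic_BIN database from the sample above
go

CREATE TABLE dbo.Names      -- hypothetical permanent table; col1 inherits Icelandic_BIN
(
   col1 varchar(12)
)
go

CREATE TABLE #lookup        -- col1 inherits TempDB's collation
(
   col1 varchar(12)
)
go

-- Fails with a "cannot resolve the collation conflict" error
-- whenever TempDB's collation is not Icelandic_BIN
SELECT n.col1
FROM dbo.Names AS n
   JOIN #lookup AS l ON n.col1 = l.col1
go

-- Works: the temp table's column is compared using the current database's collation
SELECT n.col1
FROM dbo.Names AS n
   JOIN #lookup AS l ON n.col1 = l.col1 COLLATE database_default
go
```

Fixing it in the join works, but declaring the temp table column with COLLATE database_default (as in #test2) fixes it once, at creation, for every query that follows.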

Have fun,
kt

Categories:
SQL Server 2005 | Tips | tempdb

The SQL Server team has a few *very* interesting blogs and the Engine Team just started blogging - check it out here: http://blogs.msdn.com/sqlserverstorageengine/ (thanks for the heads up Sunil).

For completeness, here are the bulk of the other SQL team blogs - which I leveraged (aka stole - thanks Euan!) from Euan Garden's EXCELLENT list (his blog roll) of SQL Server Team Blogs.

SQL Server Team Blogs

Excellent CORE/Related SQL Server Team Blogs

Now there's some entertainment for the [holiday] weekend ;). Hope that all of you enjoyed a bit of rest and relaxation this weekend.... now, back to work!

Cheers,
kt

Categories:
Resources | SQL Server 2005

Hey there everyone - The series has completed and I know that many of you struggled to get access to the surveys... Microsoft has asked me to post links to the surveys...so, for completeness, I decided to create this blog entry to have links for every session, every blog link (resources, demo scripts, etc.) and the survey links. I really did have a lot of fun on the series and I hope we can do this again!

TechNet Webcast Series

Session 1: A Fast-Paced Feature Overview and Series Introduction (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry is here
   Session's survey is here.

Session 2: Security (Level 200)
   Presenter: Bob Beauchemin, SQLskills.com, 
   Session's corresponding blog entry is here
   Session's survey is here.

Session 3: Understanding Installation Options and Initial Configuration (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry is here
   Session's survey is here.

Session 4: Upgrade Considerations and Migration Paths (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry is here
   Session's survey is here.

Session 5: Effective Use of the New Management Tools (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry is here
   Session's survey is here.

Session 6: New Application Design Patterns for Scalability and Availability and the Operational Implications of Service Broker (Level 200)   
   Presenter: Bob Beauchemin, SQLskills.com, 
   Session's corresponding blog entry is here
   Session's survey is here.

Session 7: Technologies and Features to Improve Availability (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry is here
   Session's survey is here.

Session 8: Implementing Database Mirroring, Part 1 of 2 (Level 200)
   Presenter: Mark Wistrom, SQL Server Program Manager - Microsoft Corp., 
   Session's corresponding blog entry is here
   Session's survey is here.

Session 9: Implementing Database Mirroring, Part 2 of 2 (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry is here
   Session's survey is here.

Session 10: Recovering from Isolated Disasters and Human Error (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry here. And a second blog entry here.
   Session's survey is here.

Session 11: Best Practices in Building Robust, Recoverable, and Reliable Systems (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry here.
   Session's survey is here.

And that's about it! I hope you really enjoy the series... and if you like that one, you might want to check out the entire 10-part series on MSDN. The link to the blog entry that has all the links (like this one) is here.

Have fun,
kt

Well... 11 of 11 has completed. Friday was our last chat - until next time ;). It was a summary event where I took a slightly different spin on things focusing on grouping technologies by the amount of effort that's needed to implement them. Simply put, we looked at the technologies in order of what gives you the biggest bang for the buck. We ended the session with a ton of great questions (as always!) and there was even a question on the origin of foo (make sure to also see fubar).

First, there were a few links that I wanted to provide from the session, I'll start with those:

And, we also talked about Migrations:

Finally, capacity planning:

  1. Calculate the amount of space needed for your tables (calculate this as rows per page and then required pages as MB)
  2. Calculate the amount of space needed for your indexes (you can use sp_spaceused to get a current ratio of index to data and then use that OR you can estimate 1-3 times your current data in indexes...yes, if you have 10GB of tables - you should estimate 10-30GB for indexes)
  3. Factor in your estimate of future growth
  4. Take your single largest table and multiply by 1.5 for free space (use 2.5 IF you're going to use ONLINE index operations). So, if the single largest table is 3GB then I'd add 4.5GB for free space (or 7-8GB with ONLINE index operations)
  5. Add a "just in case" extra 10-20%
  6. And, I didn't mention this BUT you should also include alerts to help you monitor space usage and significant changes to your free space!
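The arithmetic in the steps above can be sketched as a quick T-SQL calculation. Every input value here is a made-up example (10GB of tables, an index ratio of 2, 5GB of growth, a 3GB largest table, a 15% cushion), and adding the pieces together this way is just my reading of the steps, so adjust it to match your own estimates:

```sql
-- All input values are hypothetical; replace them with your own measurements
DECLARE @TableGB         decimal(9,2)   -- step 1: space for tables
DECLARE @IndexRatio      decimal(9,2)   -- step 2: indexes as a multiple of data (1-3 times)
DECLARE @GrowthGB        decimal(9,2)   -- step 3: estimated future growth
DECLARE @LargestTableGB  decimal(9,2)   -- step 4: single largest table
DECLARE @FreeSpaceFactor decimal(9,2)   -- step 4: 1.5, or 2.5 with ONLINE index operations
DECLARE @JustInCasePct   decimal(9,2)   -- step 5: extra 10-20%

SET @TableGB         = 10.0
SET @IndexRatio      = 2.0
SET @GrowthGB        = 5.0
SET @LargestTableGB  = 3.0
SET @FreeSpaceFactor = 2.5
SET @JustInCasePct   = 0.15

DECLARE @SubtotalGB decimal(19,4)
SET @SubtotalGB = @TableGB
                + (@TableGB * @IndexRatio)
                + @GrowthGB
                + (@LargestTableGB * @FreeSpaceFactor)

SELECT @TableGB * @IndexRatio              AS IndexGB,
       @LargestTableGB * @FreeSpaceFactor  AS FreeSpaceGB,
       @SubtotalGB                         AS SubtotalGB,
       @SubtotalGB * (1 + @JustInCasePct)  AS EstimatedTotalGB
go
```

With these example numbers that's 20GB for indexes and 7.5GB of free space, for a subtotal of 42.5GB and roughly 48.9GB once the "just in case" cushion is added.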

And that wraps up the series. Wow - I can't believe how many of you joined in for questions as well as stayed on until the end. It's really great that so many of you are still having fun with SQL Server as well. I look forward to another series with you...at some point! In the interim, here are a few places where I'll be:

SQLskills Immersion Events - in the US... will be announced shortly. The BEST place to be when we announce the dates for these events is a subscriber on SQLskills. Subscribing is FREE and the announcements are going to be later this month. Here's a link to directly subscribe on SQLskills: http://www.sqlskills.com/login.aspx.

Thanks again for attending the series! It was great fun. I'll post a final blog entry with ALL of the links as well as all of the survey links. I know that they're going to send me these so that you can get easier access to them.

See you next time,
kt

In part 9 of our webcast series titled: Implementing Database Mirroring, we covered the steps from setup to failover to monitoring. There were lots of great questions and I think we could easily go back and do a couple more hours on database mirroring, failover combinations - including manual failover and client application questions. Having said that, there were a few interesting scenarios that came up that I thought I'd add a bit more details about here. For simplicity I created sections...

Where to go for more information on Database Mirroring and SQL Server SP1

Database Mirroring between Editions

Database Mirroring is supported in both the Standard Edition (SE) and the Enterprise Engine (EE) Edition(s): Enterprise, Enterprise Eval and Developer. In the EE Editions all configurations (synchronous and asynchronous) are supported: High Availability (sync), High Protection (sync) and High Performance (async). In the SE, only the synchronous forms of Database Mirroring are supported: High Availability and High Protection. One thing that is true, however (and I learned this as well - during the webcast in Part 9 - thanks to the question submitted and Mark being present...thanks Mark!), is that even while synchronous mirroring is supported in both SE and EE, you can only create a mirroring partnership between servers of the same edition.

Database Mirroring between Platforms


Combining Database Mirroring with Other Technologies

The Books Online has a section targeting exactly this discussion. Review this section in the SQL Server 2005 Books Online (April Update): Database Mirroring and Other Features and Components. Additionally, I've provided a few comments for you to review as well as links to some of the specific BOL topics that exist on these combinations.

Database Mirroring with Failover Clustering

These two technologies CAN be combined but there are multiple things of which you should be aware. First, a failover of a cluster is SLOWER than a failover of a Mirror pair... as a result, it is likely that your secondary server will come online as the new principal in the time that it takes your principal (which is on a cluster) to recover. In a lot of cases, this is good because this keeps you online longer and results in less downtime but it may also be undesirable when your primary is now running at your alternate operations site - which is unstaffed. So, in some cases you may want to prevent automatic failover and instead only use the secondary mirror when you absolutely have to (i.e. NOT just when the cluster fails). If this is the case then you might prefer running with the High Protection configuration of Database Mirroring instead of the High Availability configuration.

This will allow you to manually failover when desired.
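As a rough sketch of what that looks like (assuming a mirroring session already exists for a database called AdventureWorks, which is just a stand-in name here), High Protection is synchronous mirroring without a witness, and the manual failover is a single statement run on the principal:

```sql
USE master
go

-- Run on the principal.
-- Keep the session synchronous and remove the witness so that no automatic failover occurs
ALTER DATABASE AdventureWorks SET PARTNER SAFETY FULL
go
ALTER DATABASE AdventureWorks SET WITNESS OFF
go

-- Later, when YOU decide to fail over (the session must be synchronized):
ALTER DATABASE AdventureWorks SET PARTNER FAILOVER
go
```

If no witness was ever configured then the WITNESS OFF statement isn't needed; it's included here only to show how a High Availability session could be moved to High Protection.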

As another option - you can increase the timeout for Database Mirroring failover (to 90 seconds, for example). With that setting, if the cluster comes back online (as the principal) within 90 seconds, the automatic detection/failover of the High Availability configuration will not occur; failover happens only if the cluster does not come back within the timeout. You can configure the Database Mirroring failover timeout (x, in seconds) by using ALTER DATABASE.

ALTER DATABASE dbname SET PARTNER TIMEOUT x

Please note, this is only one timeout of many. There are many different types of timeouts in the system that can cause a failover. However, a hard error code generally starts the failure procedure sooner.  Mark pointed this out in his failure detection slides in our TechNet webcast series, Part 8.
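Here's a filled-in sketch of the statement above using the 90 second value, followed by a quick check of the setting. AdventureWorks is just a stand-in for your mirrored database's name, and the statement is run on the principal:

```sql
USE master
go

-- Run on the principal: wait 90 seconds before considering the partner unavailable
ALTER DATABASE AdventureWorks SET PARTNER TIMEOUT 90
go

-- Verify the current role, safety level and timeout (in seconds) for mirrored databases
SELECT DB_NAME(database_id) AS database_name,
       mirroring_role_desc,
       mirroring_safety_level_desc,
       mirroring_connection_timeout
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL
go
```

The right value depends on how long your cluster typically takes to fail over, so time a few cluster failovers before settling on a number.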

Review this section in the SQL Server 2005 Books Online (April Update): Database Mirroring and Failover Clustering.

Database Mirroring with Replication

These two technologies CAN be combined together but not all configurations are supported and where supported, there are specific setup requirements. From the BOL: Replication supports mirroring the publication database for merge replication and for transactional replication with read-only Subscribers or queued updating Subscribers. Immediate updating Subscribers, Oracle Publishers, Publishers in a peer-to-peer topology, and republishing are not supported.

 
Review this section in the SQL Server 2005 Books Online (April Update): Replication and Database Mirroring

Database Mirroring with Log Shipping

These two technologies CAN be combined together but it will require a bit of manual configuration to continue log shipping when a mirror becomes the new principal.

Review this section in the SQL Server 2005 Books Online (April Update): Database Mirroring and Log Shipping.

And - there are others in the BOL. Please reference the sections listed above for more details.

And - with that - we're caught up with our resources and references for this series. Part 11 - the LAST one - is this Friday, May 19. I look forward to your being there LIVE. Register here and come ready with your questions, this one is going to be VERY focused on best practices, ideas/architectures and your questions. Those of you that are there LIVE will help to direct the session.

Thanks!
kt

In the last few minutes of the webcast (part 10), I goofed up one line of code and didn't realize it until today. As my very last demo (and there were at least 10 different scenarios/concepts/demos yesterday) in my webcast, I decided to show a Database Snapshot on a Mirror database. It was the second database snapshot that I had created so my first database snapshot demo was just fine. However, when I went to create the database snapshot on the mirror, I inadvertently left off the most important part "AS SNAPSHOT OF AdventureWorks". The irony is that I tried to query some tables and just ended up (because we were right at the end of the webcast ;)) saying that I probably wasn't getting the table names right. Ha - there were no tables... I hadn't created a database snapshot, I had created just another database - so the only tables I was seeing were the catalog views.

Anyway, just for clarity, I corrected the "Demo Scripts" zip that's associated with Part 10 BUT if you've already downloaded it then you'll have the old (and incorrect) version of this script (SnapshotOnMirror.sql). And, for completeness, I'll put the code that I executed during the webcast here:

USE AdventureWorks
go

USE master
go

CREATE DATABASE AdventureWorksSnap
ON
( NAME = N'AdventureWorks_Data',
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\AdventureWorksSnap_Data.mdfss')
-- , SIZE = 167872KB , MAXSIZE = UNLIMITED, FILEGROWTH = 16384KB )
go

and the code that I should have executed here:

USE AdventureWorks
go

USE master
go

CREATE DATABASE AdventureWorksSnap
ON
( NAME = N'AdventureWorks_Data',
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\AdventureWorksSnap_Data.mdfss')
-- , SIZE = 167872KB , MAXSIZE = UNLIMITED, FILEGROWTH = 16384KB )
AS SNAPSHOT OF AdventureWorks  -- <<<< this is the line I left off
go

USE AdventureWorksSnap
go

SELECT * FROM person.contact
go
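One quick check that would have caught the mistake on the spot (this wasn't part of the webcast scripts; it's just a sketch using the sys.databases catalog view) is to list the database snapshots along with the database each one was created from:

```sql
-- A database snapshot has a non-NULL source_database_id; a regular database does not
SELECT name AS snapshot_name,
       DB_NAME(source_database_id) AS source_database
FROM sys.databases
WHERE source_database_id IS NOT NULL
go
```

If AdventureWorksSnap doesn't show up in that list, then it's "just another database" and not a snapshot.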

So, quick demos right at the end of the webcast might not have been my best idea ;). But - I'm surprised none of you called me on it?! I'll blame it on this for now.

Have a great weekend,
kt

OK - today's session was quite fun... lots of demos and quite a few "tie-ins" where I tried to bring together many things that we've touched on in our series. And - that's really the point of the series - creating a reliable, robust, scalable and available environment takes MANY different features. You really need to architect a complete solution in order to handle the many potential problems that may occur. And, unfortunately, it's a never ending process; you're never done and you're never going to get everything (sorry!). You will need to re-evaluate, monitor, and manage your system as long as it runs to keep it reliable, available and fast. Something will come up...someday...that you didn't think about, evaluate and/or prevent. But, then you'll know and then you'll put something into place to keep it from happening again.

So - to tie back into some of the other sessions and resources, here is a list of everything to date in the series as well as a few specific references I made during the session.

Demo Scripts are here: 20060512_TechNetWebcast-Part10.zip (25.46 KB) (updated on Sat, May 13 at 2:55 PDT)
Credit Database zip is here. NOTE: This is a 48MB zip which expands to a 175MB backup and restores to a 700 MB database (with a lot of free space for testing, etc.).

TechNet Webcast Series

Session 1: A Fast-Paced Feature Overview and Series Introduction (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 2: Security (Level 200)
   Presenter: Bob Beauchemin, SQLskills.com, 
   Session's corresponding blog entry, here

Session 3: Understanding Installation Options and Initial Configuration (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 4: Upgrade Considerations and Migration Paths (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 5: Effective Use of the New Management Tools (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 6: New Application Design Patterns for Scalability and Availability and the Operational Implications of Service Broker (Level 200)   
   Presenter: Bob Beauchemin, SQLskills.com, 
   Session's corresponding blog entry, here

Session 7: Technologies and Features to Improve Availability (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 8: Implementing Database Mirroring, Part 1 of 2 (Level 200)
   Presenter: Mark Wistrom, SQL Server Program Manager - Microsoft Corp., 
   Session's corresponding blog entry, here

Session 9: Implementing Database Mirroring, Part 2 of 2 (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here.

Session 10: Recovering from Isolated Disasters and Human Error (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   You're reading it! :-)

Recovery Models and Backup/Restore

  • MSDN Webcast Parts I and II cover Recovery Models and some issues/best practices related to changing recovery models. Check out the blog entry here which has links to the sessions and their associated blog entries.
  • MSPress Title: SQL Server 2000 High Availability, Chapter 9: Database Environment Basics for Recovery is here. The MSPress page for this title is here.
  • SQL Server Magazine Article on Isolated Disasters and Recovery (using RESTORE with STANDBY/STOPAT to investigate when a database became damaged) is here. Check out a consolidated list of all of my SQL Server Magazine Articles here and SQL Server Magazine here.

Table and Index Partitioning

RAID 0+1 and RAID 1+0

There was a question that came up on this and the question basically asked - which is better? Well, this is a hard question to answer because they both have pros and cons BUT before I get to the pros/cons there's also another [more important] issue; these two get confused and swapped all the time. In fact, many vendors USED to refer to these interchangeably and even just lumped them together as RAID 10. Today, most people don't do this and most people also try to refer to the underlying technology instead of the numbers. Having said all of that, RAID 1+0 is Striped Mirrors and is my general recommendation because it tends to be more reliable than 0+1 and can tolerate more drive failures than 0+1. RAID 0+1 is Mirrored Stripes - which generally outperforms RAID 1+0 but cannot tolerate the loss of more than one drive and because of that it's more vulnerable. In the end, I'd suggest a simple "educational" site here (it's on a commercial site but it has a nice - and short - description of the different types of RAID arrays).

See you next week - for our LAST part in this series - Part 11: Best Practices in Building Robust, Recoverable, and Reliable Systems (Level 200).

Thanks for reading, listening and continuing to ask great questions!
kt

I completely spaced in blogging about a recent interview I did...poolside, in Orlando, FL when I was at SQL Connections back in April. I had the pleasure of meeting Chuck Boyce (a DBA from Philly, PA) who feverishly works in his spare time to spread the word about technology and specifically about all things SQL. His blog is here and he does a great job of summarizing good links and useful resources - almost daily (just so you don't have to!) on his "WHERE Clause" resource blog posts. You should check that out while you have your morning coffee. A great way to quickly find some useful stuff.

Additionally, Chuck has a radio program (What's Happening in SQL Server) that he does for SSWUG (Stephen Wynkoop's excellent SQL Server Worldwide User Group). The entire list of SSWUG Broadcasts is here and specifically, the chat that we did poolside is here.

So - sorry that took me so long to remember... I wish I could blame it on too much sun (and/or drinks) poolside but......... sadly, I can't.

See you Friday on our 10th part of our TechNet Webcast series. Wow, we're on the home stretch!

Thanks,
kt

Last week Mark Wistrom (Program Manager in the SQL Server Team at Microsoft), delivered part 8 of our TechNet webcast series. Most of the resources needed to prepare for this session - as well as learn more about Database Mirroring - have already been posted in the blog entry for part 7 (as homework!). However, there were two things that we wanted to post from Mark's session:

(1) The case study that was presented during the session is here.
(2) The Q&A that was created by a few of Mark's team who were answering during the session (and then Mark did a scrub of it as well to clean it up - THANKS Mark) is here (29.1 KB).

Enjoy!
kt

Well, Part 7 has completed and we're on the home stretch... focusing on part of the new Always On technologies of SQL Server 2005. We've made our way through quite a few discussions and my main point for the sequence - as defined - was to make clear that keeping a system available takes a myriad of choices, features, configurations - and more. In fact, even once you think you've done it you still need to monitor, manage and re-evaluate your configuration if unexpected events occur and bring your system offline and/or unavailable in any way. And - well, that's also a big part of my focus... what does "availability" mean to you? Do you believe that only unplanned downtime counts or that *any* impact to the system's availability counts as "downtime"? (btw - I'd really like to know!)

Regardless, that's been our primary focus for the series... I believe that the Enterprise Edition of SQL Server 2005 can keep your system available through a very wide number of system hiccups, damage and even more catastrophic disasters. In the previous sessions we looked at migration and installation (ensuring a proper configuration - right from the start), we covered creating a secure environment (which also impacts availability), we looked at "finding the right tool for the job" and then we started looking into alternative designs that may help to improve availability by scaling out our design. If you missed any of the sessions you might want to go back and see what's what! Here's the list of sessions at a quick glance:

Session 1: A Fast-Paced Feature Overview and Series Introduction (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 2: Security (Level 200)
   Presenter: Bob Beauchemin, SQLskills.com, 
   Session's corresponding blog entry, here

Session 3: Understanding Installation Options and Initial Configuration (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 4: Upgrade Considerations and Migration Paths (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 5: Effective Use of the New Management Tools (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 6: New Application Design Patterns for Scalability and Availability and the Operational Implications of Service Broker (Level 200)   
   Presenter: Bob Beauchemin, SQLskills.com, 
   Session's corresponding blog entry, here

Session 7: Technologies and Features to Improve Availability (Level 200)

Session 7 was a great deal of fun - we covered 11 different technologies (12 if you count partitioning) and discussed the architectural reasons to choose each technology - as well as the barriers it provides protection against. We talked about a lot of technologies and a lot of resources:

  • Remote Mirroring - Always consult your hardware vendor and make sure they support block size preservation and write-order preservation. Ideally, RM should be combined with Failover Clustering - when that's the case you have a "Geographically dispersed failover cluster" which removes the single point of failure in Failover Clustering. See the Windows Server Catalog, specifically for the Geographically Dispersed Cluster Solution category.
  • Failover Clustering - A combination of hardware and software to provide protection against server failure. Only use solutions listed in the Windows Server Catalog, specifically in the Cluster Solution category, which lists the supported failover clustering solutions.
  • Database Mirroring - See the homework references below as well as come back for the next two upcoming sessions where we cover DBM for two weeks.
  • Log Shipping - While this is still supported *and* while there are still some excellent uses for Log Shipping, this is not a "favorite" solely for failover. If you're looking for a "warm" failover solution (warm = no automatic detection, no automatic failover) with less potential for data loss - you should consider the "High Performance" configuration of Database Mirroring. If you would like to continue using Log Shipping for a more latent secondary (a log load delay) for managing disasters (either investigation or recovery) of data from an older "version" of the database then LS is an option but Database Snapshots can also help in *some* cases. This technology is well documented as well as written about.
  • Peer to Peer Replication - I demo'ed and discussed this in session 1 as well as referenced a few helpful links for TechNet sessions, etc. See the session and "blog" links as listed above.
  • RAID - Redundant Array of Independent Disks
  • Partial Database Availability, Online Piecemeal Restore and Database Snapshots - come back for Session 10 where I'll cover these and demo these!
  • Raid.edu - a short - but interesting - overview of all the different RAID types.
  • MSPress title: Microsoft SQL Server 2000 High Availability, Chapter 9: Database Environment Basics for Recovery
  • SQL Server 2000 and SQL Server 2005 support for mounted volumes
  • MSDN "Developer/Design" Webcast Series: Blog entry with all of the links
    • Online Index Operations, Part 5
    • Snapshot Isolation, Part 6
    • Partitioning, Part 8
  • Scalable shared databases are supported by SQL Server 2005
  • Oracle Real Application Clusters and Industry Trends in Cluster Parallelism and Availability

Finally, be ready to watch Mark's session on Friday, April 28. Here's your homework for Session 7:

  1. Review: Release notes and information for SQL Server 2005 Service Pack 1 
  2. Watch: TechNet Webcast: How to Increase Availability Using Database Mirroring in SQL Server 2005 (Level 200) 
  3. Read: Database Mirroring in SQL Server 2005 

And the details for Session 8:

TechNet Webcast: SQL Server 2005 for the IT Professional (Part 8 of 11): Implementing Database Mirroring in SQL Server 2005 (Part 1 of 2) (Level 200)
Presenter: Mark Wistrom, Program Manager, Microsoft Corporation

Database mirroring was released for testing when Microsoft SQL Server 2005 shipped in November. As the first service pack has shipped, it's time to get prepared for database mirroring in production! In this session, understand the barriers of what database mirroring will protect against, what constitutes a "failover", what the performance criteria are and how the monitoring has been brought together for release. Attend this first part of two - as the eighth webcast in the SQL Server 2005 for the IT Professional series to obtain better insight for when database mirroring should be implemented as well as what to expect moving forward in service pack 1 (SP1). Part 9 will cover implementation from start to finish - as an end to end demo.

Start Time:   Friday, April 28, 2006 9:30 AM (GMT-08:00) Pacific Time (US & Canada) 
End Time:   Friday, April 28, 2006 11:00 AM (GMT-08:00) Pacific Time (US & Canada) 

See you in Part 9: TechNet Webcast: SQL Server 2005 for the IT Professional (Part 9 of 11): Implementing Database Mirroring in SQL Server 2005 (Part 2 of 2) (Level 200) on May 5th.
kt

In doing my final preparations for part 7 of my TechNet webcast series on Building Robust, Reliable and Recoverable Systems, I decided to (once again) review my abstract. I do this as a last step to make sure I cover everything I said I would cover. Here's the abstract:

TechNet Webcast: SQL Server 2005 for the IT Professional (Part 7 of 11): Technologies and Features to Improve Availability

Find the right technology for the job in this seventh webcast of the SQL Server 2005 for the IT Professional series. Join us to learn which technologies provide the right solution for a specific problem, as well as the pros and cons of each technology. Designing a system to protect you against the faults most likely to occur is the first and most important strategy, but finding the right combination to minimize both downtime and data loss is critical. This webcast covers many of the “AlwaysOn” technologies at a glance: remote mirroring, failover clustering, database mirroring, log shipping, [peer to peer] replication, RAID, partial database availability, piecemeal online restore, database snapshots, snapshot isolation, and online index operations.

Start Time: Friday, April 21, 2006 9:30 AM (GMT-08:00) Pacific Time (US & Canada) 
End Time: Friday, April 21, 2006 11:00 AM (GMT-08:00) Pacific Time (US & Canada)  

So, in re-reading this it certainly sounds like a lot to cover. But - rest assured, this session is what we're going to use to lead into the rest of the series. Parts 8-11 go into more detail on some of the new and more complex topics covered in that list. For example, parts 8 and 9 cover Database Mirroring and part 10 covers Partial Database Availability, Online Piecemeal Restore and Database Snapshots. Also, for a few topics, I'll point you to some great resources to keep you going in learning these other technologies. In the end, my goal for Friday is to make sure you understand the best use case for each of these technologies. Once you know when it's best to use them, you can really begin to architect the *right* solution for your system! Parts 8-11 will focus more on implementation and demos!

If you're wondering what your options are and how to get better direction on the architecture to implement, join us on Friday: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?EventID=1032290562&EventCategory=4&culture=en-US&CountryCode=US

Oh, and in the actual abstract, there's a typo...not sure if we'll have time to cover log hipping. ;) ;)

Talk to you on Friday!
kt

Well, Friday brought another flood of great questions from everyone as we moved our way through many of the new 2005 tools. The one thing that I really wanted to stress was that *many* SQL Server 2005 tools (SQLCMD, SSMS and SQL Profiler) offer important features that can be leveraged today, even if your primary production servers are still SQL Server 2000. I did move through the tools quickly and showed quite a few new features; there are a lot of excellent resources to help you dive in deeper now that you're interested, ready and know some of the rewards of starting now. Here are a few of those resources:

For deleting old database backup history, there are a few stored procedures in msdb that can be used:

  • sp_delete_backup_and_restore_history
  • sp_delete_backuphistory
  • sp_delete_database_backuphistory

For cycling errorlogs, use: sp_cycle_errorlog.

And - lots of other questions that I primarily answered online in the last 40+ minutes. We had a great group and I hope everyone had fun. For the second half+ of the series we're going to focus on architectures and solutions - mostly related to disaster recovery and avoidance. However, the next part of the series is going to branch into a new (and *very* interesting) area of SQL Server 2005 - Service Broker. There are many impacts of Service Broker on the SQL Server system AND you might find a few applications of the technology within your own application as well. Have a great time with Bob for part 6 and I'll be back for part 7 next Friday.

See you soon!
kt

And another one bites the dust! Wow - what a great group today... soooooo many questions! For those of you that weren't there - the lecture was 80 minutes and the additional Q&A went on for another 45 minutes. So - as a result, there were *a lot* of additional resources needed. Let me get started with all of those right away.

To prepare for moving to SQL Server 2005 there are a few EXCELLENT resources with which you should start:

Phase 1 - Prepare to Upgrade/Migrate

Phase 2 - Database-level Testing

  • Copy Database Wizard or
  • Backup/Restore or
  • Detach/Attach

Phase 3 - Server-level Testing

  • Consider upgrade in-place or
  • Make sure that you manually migrate all EXTERNAL objects, logins, jobs, error messages, etc.

Phase 4 - Testing/Updating after the upgrade/migration

  • Update statistics immediately
  • Test application code, database compatibility modes, session settings
  • Check for "broken code" in terms of system table changes
  • MOST of this should have already been done and assessed in Phase 1 but better to be safe!

And - finally - the other things we talked about and the rest of the links are here:

And - that's it for this week. See you next Friday when we chat about the new Management Tools and how to effectively use them!

Thanks for listening/watching and asking GREAT questions,
kt

OK - so Bob Beauchemin delivered Part 2 and I was back for Part 3. We had lots of folks on board with this session (more than 400) and as a result, I had a lot of questions. More than anything it seems like a lot of you wanted to know which editions came with which features and which could be installed where (in terms of OS)... so, just getting started - and probably installing at home to play around ;). I was expecting tons of questions on the technical tidbits of installation options so you sure kept me on my toes!! Here are two very useful MSDN links to SQL Server BOL topics:

Hardware and Software Requirements for Installing SQL Server 2005 - In fact, this has a GREAT matrix of all the different platforms and which versions can be installed where!

Features Supported by the Editions of SQL Server 2005 - The BOL topic is very detailed.

And - in addition to those, there were quite a few more topics discussed during the webcast. The rest of the blog entry focuses on those questions! I hope this helps... enjoy!

Resource Links for all On-demand TechNet Sessions in our series for the ITPro

Part 1 - A Fast-Paced Feature Overview and Series Introduction
   On-demand link
   My blog entry for the session

Part 2 - Security
   On-demand link
   Bob's Blog Entry for the session

Part 3 - Understanding Installation Options and Initial Configuration (Level 200)
   On-demand link
   Blog entry link (well, you're already here ;)

Session 3 Resource Links as discussed during the session:

Submitting Product Feedback

  • MSDN Product Feedback Center: http://lab.msdn.microsoft.com/productfeedback/
  • Tips for Submitting Feedback on the Feedback Center (tips were related to Visual Studio but there are some great general tips about how to file useful feedback!): http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=138235&SiteID=1
  • Add the Help Toolbar and connect to Product Feedback from within the SQL Server 2005 Tools. In SQL Server Management Studio, select View, Toolbars, add Help. Once the Help toolbar is visible, select the Send Feedback button which should be in the "Ask a Question" section at the end of the toolbar.

If you're thinking about downloading - check out the newly bundled SP1 downloads:

SQL Server 2005 RTM Enterprise Eval Edition
SQL Server 2005 RTM Express Edition

SQL Server Express Edition with SP1 (SQLEXPR.EXE)
SQL Server 2005 Express Edition with SP1

SQL Server Express Edition with Advanced Services (SQLEXPR_ADV.EXE)
SQL Server 2005 Express Edition with SP1 + Advanced Services includes SQL Server Management Studio Express (SSMSE), support for full-text catalogs, and support for viewing reports via report server.

SQL Server Express Edition Toolkit (SQLEXPR_TOOLKIT.EXE)
SQL Server 2005 Express Edition Toolkit (SQL Server Express Toolkit) provides tools and resources to manage SQL Server Express and SQL Server Express Edition with Advanced Services. It also allows creating reports by using SQL Server 2005 Reporting Services (SSRS).

SQL Server Management Studio Express (SQLServer2005_SSMSEE.msi)
SQL Server Management Studio Express (SSMSE) provides a graphical management tool for managing SQL Server 2005 Express Edition and SQL Server 2005 Express Edition with Advanced Services instances. SSMSE can also manage relational engine instances created by any edition of SQL Server 2005. SSMSE cannot manage Analysis Services, Integration Services, SQL Server 2005 Mobile Edition, Notification Services, Reporting Services, or SQL Server Agent.

And - that's it for this week. See you on Friday, March 31 when we'll chat more about Upgrade and Migration. Here's the link to register for this upcoming session: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?EventID=1032290477&EventCategory=4&culture=en-US&CountryCode=US

See you soon,
kt

Hey there everyone - Well there was lots of excitement around our first session...so much so that apparently a Live Meeting server went down and caused MANY of you to get booted-out or even blocked-from attending (figures, right!).... Ugh (talk about the irony here - a series on high availability that isn't available because a server crashes...hhmmm, I think I know where to go for my next potential customers ;) ;). Regardless, I'm glad that at least a couple hundred of you did get in. For the more than 1000 others that were registered but unable to get in - I truly want to apologize!

The good news is that we now have the on-demand link available and for all of you who registered, it should have been sent to you via email. Also, as promised, I've attached the resources and demo scripts we talked about today.

Partial Database Availability Demo Scripts: PartialDBAvail-DemoScripts.zip (4.19 KB)
Database Mirroring Demo Scripts: DatabaseMirroring-DemoScripts.zip (3.74 KB)
Replication Demo Scripts - Since this demo was completed through the UI, here are some useful references on Replication:

Other Resources:

SQLCMD Resources: My blog entry after Michiel Wories' Webcast (includes links to webcast, etc.)
TechNet Resource Center: SQL Server 2005 Mission Critical High Availability
Demo: Windows Server System Reference Architecture Design Considerations for SQL Server 2005 High Availability
Whitepaper: Choosing a Database for High Availability: An Analysis of SQL Server and Oracle

Also, to get you ready for SQL Server 2005 - check out the Upgrade/Migration Resource Center: Upgrading to SQL Server 2005

And... that should keep you busy between now and next week!

Have fun,
kt

Hey there everyone - Sorry for the delay in blogging. Lots of great stuff to chat about but right now I'm in the throes of a lot of event planning! I hope that some of you will be able to attend one or more of these GREAT upcoming events:

Webcasts

A TechNet, 11-part Series starts on March 10. Read more about it here.

Workshops

  • Tuesday, 14 March in Reading, UK - SQL Server: Indexes from Every Angle. Read more about it here.
  • Thursday, 16 March in Edinburgh, Scotland - SQL Server 2005: Practical Guide to Recovery and Availability. Read more about it here.
  • Thursday, 6 April in Orlando, FL - SQL Server 2005 Availability Strategies: Building a Reliable VLDB in Depth. Read more about the SQL Connections conference here.
  • Sunday, 11 June in Boston, MA - Making the Most of SQL Server 2005: Developing World Class Database Applications. Presenting with Brian Randell. Read more about the Microsoft TechEd conference here.

And - we're (SQLskills) planning other events too. The webcast series is a great place to start and hopefully, I'll see you at one of the other events!

Categories:
Events | SQL Server 2005

First - for what is logging needed?

This seems like an easy question - with possibly an easy answer... it's to aid in transaction durability and help in recovery - when the system loses power. Simply put, the transaction log is a way for SQL Server to ensure that a transaction "survives" a power failure. While a transaction is processing, information about that transaction exists within memory. Once that transaction is complete, log rows are written to the log portion of the database on disk. In the event of a power failure - and when SQL Server restarts, SQL Server performs restart recovery (two phases - REDO and then UNDO). Restart recovery happens every time SQL Server starts; this ensures that completed transactions are persisted into the data portion and that no incomplete transactions end up within the database. For this discussion the specifics about log rows are not important - just that they are enough to "redo" the operations from *just* log information...in the event of a power failure.

The information that's needed to recover the database in the event of a power failure - is really just - what's on disk. The data portion is probably out of date (how much so?) and the information kept in the log is used to bring the data up to date. A good question at this point is - how out of date is my data? The answer depends on a background process that runs almost solely to minimize this restart recovery process; it is called CHECKPOINT. A checkpoint occurs to make the data and log more current (but not necessarily transactionally consistent). What this means is that periodically what is in memory is "synchronized" to disk. Since users do NOT directly read from disk, the data portion of the database (on disk) does not need to be up to the minute. Users accessing data ONLY read from cache - which is current - so only the data in memory needs to be accurate. It is VERY possible that at any given time not only is your disk out of date but it's not even transactionally consistent. This is NOT a problem.

If memory were to be lost (i.e. a power failure) then SQL Server would perform recovery on restart. Restart Recovery runs every time SQL Server starts. In fact, if you think ONLY about restart recovery needing to bring a database "forward" after a power failure then you could argue that SQL Server would not need information to stay within the log after it's been "synchronized" with the data portion of the database - as long as the transaction(s) had been completed. And - YES - that's true. You can choose to clear the information from the log by changing your recovery model. Where you might have a problem is when you have a more significant failure - such as the loss of a hard drive (and even more interesting - which hard drive: a data drive or a log drive).

Key points:

  • The Log is a "write-ahead" log
  • The data on disk is NOT guaranteed to be accurate without the transaction information in the log
  • The Transaction Log (on disk) ensures transaction durability
  • Restart Recovery happens every time SQL Server starts
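To see why the log alone is enough to fix up an out-of-date (and even transactionally inconsistent) data portion, here is a small Python sketch of a REDO-then-UNDO pass. This is a toy model only, not SQL Server's actual recovery algorithm: the record layout (`txn`, `op`, `key`, `old`, `new`) and the function name `restart_recovery` are assumptions made up for the illustration, and real recovery works with pages, LSNs and checkpoints rather than a simple dictionary.

```python
def restart_recovery(data_on_disk, log):
    """Toy restart recovery: REDO every logged change in order, then UNDO
    (in reverse order) the changes of transactions that never committed."""
    data = dict(data_on_disk)
    committed = {rec["txn"] for rec in log if rec["op"] == "commit"}

    # REDO phase: reapply all logged updates so the data catches up to the log.
    for rec in log:
        if rec["op"] == "update":
            data[rec["key"]] = rec["new"]

    # UNDO phase: roll back updates made by incomplete transactions.
    for rec in reversed(log):
        if rec["op"] == "update" and rec["txn"] not in committed:
            data[rec["key"]] = rec["old"]
    return data

# The log on disk at the moment of the (simulated) power failure.
log = [
    {"txn": "T1", "op": "update", "key": "A", "old": 100, "new": 90},
    {"txn": "T1", "op": "update", "key": "B", "old": 50, "new": 60},
    {"txn": "T1", "op": "commit"},
    {"txn": "T2", "op": "update", "key": "A", "old": 90, "new": 0},  # never committed
]

# The data portion is out of date AND inconsistent: a checkpoint wrote T2's
# uncommitted change to A, while T1's committed change to B never reached disk.
data_on_disk = {"A": 0, "B": 50}

recovered = restart_recovery(data_on_disk, log)
print(recovered)
```

After the pass, the committed work of T1 is present (A = 90, B = 60) and the in-flight change from T2 is gone, even though the data portion started out with neither property.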

OK - so that's it for now... In the next blog entry, I'll tackle "what affects logging."

Thanks for reading,
kt

Categories:
Opinions | SQL Server 2005

If you're interested in scale-out improvements for reporting and read-only scenarios...check this out: KB 910378. This KB is actually a feature release KB and describes a new feature of SQL Server 2005 which allows multiple servers to simultaneously share the same database files on a SAN. This is NOT possible for read/write databases, only read-only databases; however, it does allow you to leverage multiple servers' hardware to perform complex reporting locally - using that node's memory, tempdb, etc.

If you implement this - report back (no pun intended) as I'd love to hear your good/bad experiences!

Have fun,
kt

Categories:
SQL Server 2005

Many of you have probably already downloaded the refreshed Books Online but if not - you should! LOTS AND LOTS of updates/good stuff in there.

Check it out: http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx

That should keep you busy for a bit! ;-)

Happy New Year!
kt

Categories:
SQL Server 2005

If you're reading *my* blog then you're probably just as interested in the upcoming SQL Server 2005 launch as I am. There are many launch events scheduled around the world and I hope you'll find time to attend one - even if remotely. IT's ShowTime - from TechNet - will be broadcasting Steve Ballmer's Launch Keynote on Monday - LIVE - at 5pm GMT. Get all of the details about the launch broadcast by clicking the picture above.

To complement the LIVE broadcast, IT's Showtime has also dedicated a whole SQL Server section of presentations here: http://www.microsoft.com/emea/itsshowtime/sqlserver.aspx

Enjoy!

Categories:
SQL Server 2005

Hey there everyone! I know I still owe you a few Q&A entries (for sessions 7, 8 and 9) but I wanted to get this blog entry out there so that you can play a bit with some of the resources. This series was targeted at developers but really helps to "bridge the gap" between development and administration by always remembering the impacts of what you implement (and techniques to help you to see if you do). More specifically, everything you do and/or design, has the potential for a negative impact to something else - there's no free lunch, eh?

So, this series focused more on Scalability but always remembered the impact to availability and/or reliability. This last session brought together the three primary areas to remember while developing a scalable and reliable architecture:

  1. Know your data
    • Design for Performance - Sessions 1, 2, 3, 6, and 8
  2. Know your users
    • Indexing for Performance - Sessions 4, 5 and 9
    • Optimizing Procedural Code - Session 7
    • Controlling Mixed Workloads and Concurrency - Session 6
  3. Users lie
    • Profile - to make sure that you're tuning what's really happening as opposed to what you think was going to happen! - Session 9

This last session had some great questions and as a result, a few new resources were used. Here are a few of the things we talked about:

Event Notifications and DDL Triggers

DMVs

Webcast links for the entire series!

Part 1: Creating a Recoverable Database
For the MSDN Download for Part 1, click here.
For the SQLskills Blog Entry for Part 1, click here.

Part 2: Creating a Reliable and Automated Backup Strategy
For the MSDN Download for Part 2, click here.
For the SQLskills Blog Entry for Part 2, click here.

Part 3: Designing Tables that Scale, Best Practices in Data Types and Initial Table Structures
For the MSDN Download for Part 3, click here.
For the SQLskills Blog Entry for Part 3, click here.

Part 4: SQL Server Indexing Best Practices
For the MSDN Download for Part 4, click here.
For the SQLskills Blog Entries for Part 4
Resource links blog entry, click here.
Q&A blog entry, click here.

Part 5: SQL Server Index Defrag Best Practices
For the MSDN Download for Part 5, click here.
For the SQLskills Blog entry, click here.

Part 6: Mixed Workloads, Secondary Databases, Locking and Isolation
For the MSDN Download for Part 6, click here.
For the SQLskills Blog Entry for Part 6, click here.

Part 7: Understanding Plan Caching and Optimizing Procedure Performance
For the MSDN Download for Part 7, click here.

Part 8: Data Loading and Aging Strategies - Table and Index Partitioning
For the MSDN Download for Part 8, click here.

Part 9: Profiling for Better Performance
For the MSDN Download for Part 9, click here.

Part 10: Session Summary - Common Roadblocks to Scalability
For the MSDN Download for Part 10, click here.
Transcript can be found here.

So, the series comes to an end (even though I still have more work to do). I have to say that it was a lot of fun and I enjoyed everyone's questions. And /start shameless plug here/ starting in March, SQLskills will begin a 10-12 part series on TechNet. The series will include sessions from my colleague Bob Beauchemin as well as me. This will definitely be more Operations and DBA focused but for all of you developers - it may help you better understand the system, High Availability and a myriad of New Features in SQL Server 2005.

I hope to see you there - or at least your DBA... ;-)

Thanks again everyone,

Kimberly

Effectively Designing a Scalable and Reliable Database

A Primer to Proper SQL Server Development

SQL Server Mixed Workloads, Secondary Databases, Locking and Isolation, Part 6 of 10

Presented by Kimberly L. Tripp, SQLskills.com

Q: Can I view a recording of this webcast? Part 6 can be replayed by clicking here.

Q: Where can we get the demo scripts AND the sample database: Credit? The demo scripts are in this zip (20050916MSDNDemoScripts.zip (6.11 KB)); here in this blog entry. However, at the series completion, I will also create an entry under Past Event Resources for the entire webcast series.  To download the ZIP of the Credit Database Backup click here. Once unzipped, restore this backup to SQL Server 2000 or SQL Server 2005. The backup is a SQL Server 2000 backup and can be restored to either version! If restoring to SQL Server 2005, you might want to change the destination for the data and log file as the path will probably be different.

Q: Where are the links to all prior Webcast Q&As from this series?

Part 1: Creating a Recoverable Database
For the MSDN Download for Part 1, click here.
For the SQLskills Blog Entry for Part 1, click here.

Part 2: Creating a Reliable and Automated Backup Strategy
For the MSDN Download for Part 2, click here.
For the SQLskills Blog Entry for Part 2, click here.

Part 3: Designing Tables that Scale, Best Practices in Data Types and Initial Table Structures
For the MSDN Download for Part 3, click here.
For the SQLskills Blog Entry for Part 3, click here.

Part 4: SQL Server Indexing Best Practices
For the MSDN Download for Part 4, click here.
For the SQLskills Blog Entries for Part 4
Resource links blog entry, click here.
Q&A blog entry, click here.

Part 5: SQL Server Index Defrag Best Practices
For the MSDN Download for Part 5, click here.
For the SQLskills Blog entry, click here.

Q: How can I replay previous sessions? I thought we were going to get emails for replaying -- but I haven't received any replay emails. You will receive replay emails ONLY when you register for these sessions through MSDN. We’ve come to find out that there are other ways to register but it’s only through MSDN that we know for sure you will receive the replay information.

Technical Questions

Q: I know you have covered indexes and backups in other webcasts, but here is my question: I use temporary tables heavily and my TempDB grows up to 5 GB. Should I back up or truncate the transaction log in order to bring it back to its normal size? No, there is no need to specifically maintain the transaction log of the TempDB database. If the transaction log (and subsequently, the database) grows large – there could be multiple reasons for that and instead of thinking in terms of trying to manage the log, I’d look at long running transactions and/or large transactions. You can use Profiler to help you see long running and/or large transactions.

Q: Can I perform a database snapshot to another server? No, database snapshots must be created on the same server as the database on which the snapshot is being based.

Q: Can I snapshot by filegroup? No, however if what you want to do is create a snapshot which does NOT include certain files – you can take those filegroups offline and then create the snapshot. In the snapshot the only file/filegroups available will be those which were online when the snapshot was created…even if those files/filegroups are brought online after the snapshot was created.

Q: Could a reader be blocked on the snapshot DB while SQL updates the changed page? No. The copy on write mechanism is really a copy before write mechanism and the pages will be copied before the write and essentially before the locks, etc. The only possible “blocking” could be caused by the excess I/Os that need to be performed. However, the I/Os are performed only on the FIRST change to the page after the snapshot is created – so it’s minimal!!
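For reference, creating a database snapshot on the same server looks roughly like the sketch below. The logical file name (CreditData), the snapshot name and the file path are assumptions for illustration; check the real logical names with SELECT name FROM Credit.sys.database_files first. In SQL Server 2005, database snapshots also require Enterprise Edition.

```sql
-- Sketch only: CreditData, Credit_Snapshot and the path are hypothetical names.
-- Every data file of the source database must be listed in the ON clause.
CREATE DATABASE Credit_Snapshot
ON ( NAME = CreditData,
     FILENAME = N'C:\SQLData\Credit_Snapshot.ss' )
AS SNAPSHOT OF Credit;
GO

-- List the snapshots that exist for the Credit database on this server.
SELECT name, source_database_id, create_date
FROM sys.databases
WHERE source_database_id = DB_ID(N'Credit');
GO
```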

Q: Are DMVs in SQL Server 2005 only? Yes, DMVs = Dynamic Management Views and these are a feature of SQL Server 2005.

Q: Is read uncommitted the lowest/least in terms of data consistency? Yes, read uncommitted is also known as “dirty read.” A dirty read is a read against an “in-flight” transaction; this transaction could be rolled back. As a result, the query that read that data would be inaccurate.

Q: What is the effect of versioning on fragmentation and performance (I'm assuming I create a split if the page is full)? Actually, I’m not sure I’m following this one… But – I think I can answer it by just giving you some insight into how things work. Versioning – in terms of the data overhead added to the data row – does add a 14-byte value to help store the offset. This overhead is added ONLY once, to each row, after one of the snapshot isolation options is turned on (either or both – the READ_COMMITTED_SNAPSHOT or ALLOW_SNAPSHOT_ISOLATION). When this 14-byte value is added to each of the rows, the additional 14 bytes might cause the page to split. Again, this is only a one-time addition. The trick to optimizing this structural change is to change the database option and then rebuild your indexes. This will make the data contiguous and versioning will have no additional effects on the data row. Now, if what you were thinking is that the versions were stored in the data row – then this is NOT the case. The version store comes from the TempDB and as a result, there is no additional overhead (over the 14 bytes) needed within the data row.
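As a rough sketch of "change the database option and then rebuild your indexes": the database name comes from the Credit sample, while dbo.member is assumed here to be one of its tables. Adjust both for your own system.

```sql
-- Turn on one or both of the row versioning options.
ALTER DATABASE Credit SET ALLOW_SNAPSHOT_ISOLATION ON;
-- READ_COMMITTED_SNAPSHOT needs the database to have no other active connections.
ALTER DATABASE Credit SET READ_COMMITTED_SNAPSHOT ON;
GO

USE Credit;
GO

-- Rebuild the indexes afterwards so the pages are contiguous again.
-- dbo.member is an assumed table name; repeat for each large table.
ALTER INDEX ALL ON dbo.member REBUILD;
GO
```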

Q: What if we're not using transactions? Will repeatable read still lock the table during the read (particularly if the select is long)? Repeatable reads locks – and holds – the resources as they are read. So, YES, in the case of a select statement, you will acquire and hold the read locks for the life of the transaction.
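To see this for yourself, something like the following sketch can help. It assumes the Credit sample database with a dbo.member table and member_no/lastname columns (adjust to your own schema). Without an explicit BEGIN TRANSACTION, each statement is its own transaction, so the locks would be released as soon as the SELECT completes.

```sql
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

BEGIN TRANSACTION;

-- Assumed table and column names from the Credit sample database.
SELECT member_no, lastname
FROM dbo.member
WHERE member_no BETWEEN 1 AND 100;

-- The shared locks acquired by the SELECT are still held at this point.
SELECT resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID;

COMMIT TRANSACTION;  -- the read locks are released here
```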

Q: How much additional overhead does versioning require from the SQL Server engine? Most of the overhead comes from TempDB but there’s also a bit of overhead in looking up the version. There are no direct numbers associated with the overhead but in a lot of cases you should think in terms of roughly 10% additional costs for your transaction… So, I guess the best point is that you will have slower overall performance when implementing row versioning; however, you might solve a lot of your blocking problems. Slower but not blocked is better than not running at all – even if it would run quickly when it does run. :) In all seriousness though, if blocking is NOT your primary problem, you will add overhead without a possible benefit.

Q: So, is it the new transaction data or the old transaction data held in the snapshot store (seems like it might be different for statement vs. transaction level snapshotting)? It’s always the BEFORE image. The general process of the write is called “copy on write” but I think of it better as copy before write.

Q: Can we optimize the snapshot store (different physical device, file group(s), etc.)? No. However, you should look at optimizing TempDB. There are multiple things that you might want to consider. I discuss those things in this blog entry here.

Q: If I don’t need locking why shouldn’t I use read-uncommitted? Hmm, you can… you just need to be aware of the fact that the data is “dirty” and is not guaranteed to persist.

Q: Where does SQL store all the row versions (with snapshot isolation turned on)? The version store is in TempDB.

Q: How do I view all of the Report options from the summary page? I am looking at Adventureworks (compatibility level = 90), but all that I see is the General report. Ah, ha! The new summary windows were added after the beta II April CTP. So, what this tells me is that you’re running a build lower than 9.00.1187.07. At this point, I’d go for the September CTP which is build 9.00.1314.06.

Q: This question is from previous webcast... Is there anything new with SQL 2005 that does datetime support data types? Time datatype or Date datatype only? No, SQL Server 2005 only includes the datetime datatype for date/time data. However, by using “custom types” you can create your own types which are date only or time only (just for one example). There were separate SQLCLR types of date only and time only in SQL Server 2005; however these were non-native types and subsequently removed. Instead, they will be shipped as examples in a resource kit which ships after RTM.

Q: Kimberly, the downloadable zip file from your blog for at least the first session will only unzip to a "C:" drive (which my system doesn't have ;^) Could you please re-zip it to allow election of the drive to which it should unzip? This one still perplexes me. I didn’t set any options that would restrict this…  

For the next session, we’re going to cover how SQL Server keeps plans, where you can look to see what's in cache AND how you can know better if the stored procedure's plan should be kept...or not? If you’re interested in hearing more - here’s the registration link:

MSDN Webcast: A Primer to Proper SQL Server Development (Part 7 of 10): Understanding Plan Caching and Optimizing Procedure Performance

 See you on Friday!

kt

Brian A. Randell's Blog: http://www.mcwtech.com/CS/blogs/brianr/default.aspx
MCWTechnologies Website: http://www.mcwtech.com/

Kimberly L. Tripp's Blog: http://www.SQLskills.com/blogs/Kimberly
SQLskills Website: http://www.SQLskills.com

Presentation Resources
Presentation in PDF form
Kimberly's Demo Scripts
Brian's Demo Scripts and Code

Running SQL Server 2000 tools and SQL Server 2005 tools side-by-side
We talked about re-registering all of your COM components and I didn't have a slide for this. So, if SQL Server 2000 Enterprise Manager crashes when you try to access database properties OR SQL Server 2000 Query Analyzer doesn't seem to do the color coding correctly, then you need to re-register your COM components in BOTH of the following directories:
   c:\program files\microsoft sql server\80\tools\binn 
   c:\program files\microsoft sql server\mssql\binn

To re-register the components, execute: FOR %i IN (*.dll) DO regsvr32 /s %i

Resources and Presentations on Indexing Best Practices
First, start by reviewing the blog entries listed in the Indexes category here.
As for the Webcasts - there are 6 from which to choose! Each webcast has an associated Q&A posted to my blog - make sure to look for the Q&As. Usually they are posted within 1 week (give or take :) from the actual webcast.
   MSDN Webcast: Indexing for Performance - Finding the Right Balance (SQL Server 2000), recorded 11 June 2004
   MSDN Webcast: Indexing for Performance - Index Maintenance Best Practices (SQL Server 2000), recorded 19 July 2004
   TechNet It’s Sh0wtime Webcast: Index Creation Best Practices with SQL Server 2005, recorded at Tech Ed Amsterdam, July 2005
   TechNet It’s Sh0wtime Webcast: Index Defragmentation Best Practices with SQL Server 2005, recorded at Tech Ed Amsterdam, July 2005
   MSDN Webcast Series: Part 4 of 10, Best Practices in Indexing, recorded 26 August 2005
   MSDN Webcast Series: Part 5 of 10, New Features in Indexing and Index Maintenance Best Practices, recorded 2 September 2005

MSDN Webcast Series: Building Highly Reliable and Available Systems with SQL Server 2005
Watch one on-demand and/or sign up to attend one of the remaining!

Part 1: Creating a Recoverable Database
For the MSDN Download for Part 1, click here.
For the SQLskills Blog Entry for Part 1, click here.

Part 2: Creating a Reliable and Automated Backup Strategy
For the MSDN Download for Part 2, click here.
For the SQLskills Blog Entry for Part 2, click here.

Part 3: Designing Tables that Scale, Best Practices in Data Types and Initial Table Structures
For the MSDN Download for Part 3, click here.
For the SQLskills Blog Entry for Part 3, click here.

Part 4: SQL Server Indexing Best Practices
For the MSDN Download for Part 4, click here.
For the SQLskills Blog Entries for Part 4
Resource links blog entry, click here.
Q&A blog entry, click here.

Part 5: SQL Server Index Defrag Best Practices
For the MSDN Download for Part 5, click here.
For the SQLskills Blog entry, click here.

Part 6: SQL Server Mixed Workloads, Secondary Databases, Locking and Isolation
For the MSDN Download for Part 6, click here.

Part 7: Understanding Plan Caching and Optimizing Procedure Performance
To register to attend, click here.

Part 8: Data Loading and Aging Strategies
For the MSDN Download for Part 6, click here.

Part 9: Profiling for Better Performance
For the MSDN Download for Part 6, click here.

Part 10: Most Common Roadblocks to Scalability and Reliability
For the MSDN Download for Part 6, click here.

Profiling SQL Server and Creating a Server-side Trace
INF: How to Create a SQL Server 2000 Trace (283790)
HOW TO: Programmatically Load Trace Files into Tables (270599)
How To: Stop a Server-Side Trace in SQL Server 2000 (822853)
INF: How to Monitor SQL Server 2000 Traces (283786)
INF: Stored Procedure to Create a SQL Server 2000 Blackbox Trace (281671)
BUG: BOL Incorrectly States That Users Do Not Need to Be Sysadmin to Use Profiler or SQL Profiler SPs (310175) 
   NOTE: This is ONLY a SQL Server 2000 limitation.
INF: Job to Monitor SQL Server 2000 Performance and Activity (283696)
Support WebCast: SQL Server 2000 Profiler: What's New and How to Effectively Use It

Great KB to Start with for Troubleshooting
HOW TO: Troubleshoot Application Performance with SQL Server

What about Whitepapers - we referenced quite a few!
Get a Lean, Mean Dev Machine with the Express Editions of Visual Basic and SQL Server 2005 by Brian A. Randell 
SQL Server 2005 Beta 2 Transact-SQL Enhancements by Itzik Ben-Gan 
SQL Server 2005 Partitioned Tables and Indexes by Kimberly L. Tripp
SQL Server 2005 Snapshot Isolation by Kimberly L. Tripp
SQL Server 2005: the Database Administrator’s Guide to the SQL Server Database Engine .NET Common Language Runtime Environment by Kimberly L. Tripp

Other Whitepapers, Websites, and Webcasts
Blog Entry: 8 Steps to Better Transaction Log Throughput
MSDN Whitepaper: An Overview of SQL Server 2005 for the Database Developer
MSDN Whitepaper: Processing XML Showplans Using SQLCLR in SQL Server 2005
MSDN Whitepaper: Using CLR Integration in SQL Server 2005
MSDN Whitepaper: XML Support in Microsoft SQL Server 2005
MSDN Whitepaper: XML Options in Microsoft SQL Server 2005
MSDN Whitepaper: What's New in FOR XML in Microsoft SQL Server 2005
MSDN Whitepaper: XML Best Practices for Microsoft SQL Server 2005
MSDN Whitepaper: Usage Scenarios for SQL Server 2005 Native Web Services
MSDN Whitepaper: Managed Data Access Inside SQL Server with ADO.NET and SQLCLR
MSDN On-demand Webcasts 
MSDN Live Webcasts 
SQL Server 2005 Hands-On Labs
         
SQLCLR Hands-On Lab Manual
Microsoft SQL Server TechCenter on TechNet
Sample Book Chapters for SQL Server 2005 is a list of chapters posted from a variety of authors for books related to SQL Server 2005.
Hosting the .NET Runtime in Microsoft SQL Server on the Association for Computing Machinery (www.ACM.org). To access this article you need membership in SIGMOD, the ACM, or you can purchase just this article for download.
Service Oriented Database Architecture by David Campbell, also on the Association for Computing Machinery (www.ACM.org). To access this article you need membership in SIGMOD, the ACM, or you can purchase just this article for download.

Gert E.R. Drapers' website
Microsoft SQL Server Development Customer Advisory Team
PDC Information Site

Well, if that doesn't keep you busy, I don't know what will!

Enjoy!
Kimberly

Categories:
Events | Resources | SQL Server 2005

This is a much needed and much overdue blog entry... In 8 Steps to Better Transaction Log throughput, I mentioned a customer that was helped by TWO typical optimization problems I see. In that blog entry, I said I would write two blog entries - that one on transaction log optimization and another on common tempdb optimizations. Well, I forgot...until I was reminded with an email this morning (thanks Marcus!).

First - a bit of understanding of TempDB - what goes there?

  • Internal temporary objects needed by SQL Server in the midst of other complex operations. For example, worktables created by a hash aggregate will be stored in TempDB, as will interim tables used in hash joins (almost anything that shows as "hash" something in your query plan output is likely to go to tempdb).
  • User objects created with either # (for local temporary objects), ## (global temporary objects) or @ (table variables)
    • # = Local temporary object
      Local temp objects are objects accessible ONLY in the session that created them. These objects are also removed automatically when the session that created them ends (unless manually dropped).
    • ## = Global temporary object
      Global temporary objects are accessible to ANYONE who can log in to your SQL Server. They persist only as long as the session that created them lasts (unless manually dropped), but anyone who logs in during that time can directly query, modify or drop these temporary objects. These objects are removed automatically when the session that created them ends (unless manually dropped) OR, if another session is using one when the creating session ends, when that session finishes using it (and that's only as long as any locks are held). If other sessions need more permanent use of a temporary object you should consider creating a permanent object and dropping it manually.
    • @ = User-defined Table Variable
      User-defined Table Variables were introduced in SQL Server 2000 (or, wow - was it 7.0?) and provide an alternative to temporary tables by allowing you to create a variable defined as type TABLE, which you can then populate and use in a variety of ways. There has been A LOT of debate over whether or not you should always use table variables or always use temp tables. My response is that I ALWAYS avoid the word always! My point is that table variables are NOT always better nor are temp tables always better. There are key uses for each. I tend to like temp tables in scenarios where the object is used over a longer period of time - I can create non-key indexes on it and it's more flexible to create to begin with (SELECT INTO can be used to create the temp table). I also have the ability to use the temporary table in nested subprocedures because it's not local to the procedure in which it was created. However, if you don't need any of those things then a table variable might be better. When is it likely to be better? When you have smaller objects that don't need to be accessed outside of the procedure in which they were created and when you only need KEY indexes (a table variable ONLY supports the indexes created by a create table statement - meaning PRIMARY KEY and UNIQUE KEY).
  • Objects created by client applications - this is possibly a large part of your problem... Profiling can help you to determine if there's a lot of TempDB usage from your client applications.
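To make the temp table vs. table variable difference concrete, here is a small sketch. The object and column names (#RecentCharges, charge_no, charge_dt) are made up for illustration.

```sql
-- Local temp table: visible to this session (and nested procedures),
-- and it can have non-key indexes added after creation.
CREATE TABLE #RecentCharges
(
    charge_no   int       NOT NULL PRIMARY KEY,
    charge_dt   datetime  NOT NULL
);
CREATE NONCLUSTERED INDEX IX_RecentCharges_dt ON #RecentCharges (charge_dt);

-- Table variable: scoped to this batch/procedure; the only indexes
-- are those created by PRIMARY KEY and UNIQUE constraints in the declaration.
DECLARE @RecentCharges TABLE
(
    charge_no   int       NOT NULL PRIMARY KEY,
    charge_dt   datetime  NOT NULL,
    UNIQUE (charge_dt, charge_no)
);

INSERT INTO #RecentCharges (charge_no, charge_dt) VALUES (1, GETDATE());
INSERT INTO @RecentCharges (charge_no, charge_dt) VALUES (1, GETDATE());

SELECT charge_no, charge_dt FROM #RecentCharges;
SELECT charge_no, charge_dt FROM @RecentCharges;

DROP TABLE #RecentCharges;
```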

OK, so now that you know what goes there - how do you make it optimal?

First and foremost, TempDB is in cache just as any other database is in cache. TempDB does not spill to disk unless you are low on cache and/or if you have a lot of inflight transactions in TempDB. Although TempDB is not persisted from shutdown to restart - it still needs to do some logging and therefore you should consider its optimization a lot like other databases.

Things you should do for TempDB (that are a lot like what you should do for every database):

  1. Isolate the data and log portion of TempDB.
  2. Place them on clean, defragmented disks.
  3. Pre-allocate them so they don't need to do a lot of autogrowth.
  4. Make sure you have sufficient memory to support active objects (check for disk activity to the disks that contain TempDB files).
  5. Make sure that transactions are written efficiently so that there are no unusually long running transactions that are unnecessarily holding resources (and therefore locks and therefore log activity).
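As a sketch of item 3 (pre-allocation): tempdev and templog are the default logical file names for TempDB, but the sizes and growth increments below are placeholders and not recommendations. Size them from what you actually observe on your system. Note that SIZE must be larger than the file's current size.

```sql
-- Pre-size the TempDB data and log files so they rarely need to autogrow.
-- The sizes here are assumptions for illustration only.
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 2048MB, FILEGROWTH = 256MB);

ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, SIZE = 512MB, FILEGROWTH = 128MB);
GO

-- Check the current file sizes (size is reported in 8KB pages).
SELECT name, filename, size / 128 AS size_mb
FROM tempdb.dbo.sysfiles;
GO
```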

And - if you need to move TempDB, you should review this KB Article: Moving SQL Server databases to a new location with Detach/Attach

Things you should do SPECIFICALLY for TempDB (especially if you're running on a multiproc machine):

Before I say what... let me tell you why. TempDB has a large number of objects being created all the time. For an object to be created, space must be allocated to it. Space allocation is determined by looking at some of the internal system pages (the GAM and SGAM). In the end, it is these pages that start to have significant contention (with just one file) in a VERY active TempDB. To minimize that contention you can create multiple files.

  1. Consider creating multiple files for TempDB (even if on the same physical disks) so that there is less of a bottleneck when objects are being allocated. Make sure to read the associated KB.
  2. Consider setting a trace flag to have object allocation grab extents rather than pages. Make sure to read the associated KB.

BOTH of these last two are described in detail by a KB article: FIX: Concurrency enhancements for the tempdb database.
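A minimal sketch of both steps follows. The file name, path and sizes are assumptions, and the trace flag shown (1118) is the one I understand that KB article to describe; read the KB before enabling it, and consider adding it as a -T1118 startup parameter so that it survives a restart.

```sql
-- 1. Add a second, equally sized data file to TempDB.
--    tempdev2 and the T:\ path are hypothetical; use your own naming and drive.
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2,
          FILENAME = N'T:\TempDB\tempdev2.ndf',
          SIZE = 2048MB,
          FILEGROWTH = 256MB);
GO

-- 2. Turn on the allocation trace flag for all connections (until the next restart).
DBCC TRACEON (1118, -1);
GO

-- Confirm which trace flags are currently active.
DBCC TRACESTATUS (-1);
GO
```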

OK - so that should really help! Moving forward (meaning SQL Server 2005), having multiple files can still help for TempDB.

Effectively Designing a Scalable and Reliable Database

A Primer to Proper SQL Server Development

New Features in Indexing and Index Maintenance Best Practices, Part 5 of 10

Presented by Kimberly L. Tripp, SQLskills.com

Q: Can I view a recording of this webcast? Part 5 can be replayed by clicking here.

Q: Where can we get the demo scripts AND the sample database: Credit? The demo scripts are in this zip (20050902MSDNDemoScripts.zip (8.52 KB)); here in this blog entry. However, at the series completion, I will also create an entry under Past Event Resources for the entire webcast series.  To download the ZIP of the Credit Database Backup click here. Once unzipped, restore this backup to SQL Server 2000 or SQL Server 2005. The backup is a SQL Server 2000 backup and can be restored to either version! If restoring to SQL Server 2005, you might want to change the destination for the data and log file as the path will probably be different.

Q: Where are the links to all prior Webcast Q&As from this series?

Part 1: Creating a Recoverable Database
For the MSDN Download for Part 1, click here.
For the SQLskills Blog Entry for Part 1, click here.

Part 2: Creating a Reliable and Automated Backup Strategy
For the MSDN Download for Part 2, click here.
For the SQLskills Blog Entry for Part 2, click here.

Part 3: Designing Tables that Scale, Best Practices in Data Types and Initial Table Structures
For the MSDN Download for Part 3, click here.
For the SQLskills Blog Entry for Part 3, click here.

Part 4: SQL Server Indexing Best Practices
For the MSDN Download for Part 4, click here.
For the SQLskills Blog Entries for Part 4
Resource links blog entry, click here.
Q&A blog entry, click here.

Q: How can I replay previous sessions? I thought we were going to get emails for replaying -- but I haven't received any replay emails. You will receive replay emails ONLY when you register for these sessions through MSDN. We’ve come to find out that there are other ways to register but it’s only through MSDN that we know for sure you will receive the replay information. Regardless, you can always find the “on-demand” version of the sessions here.

Related Resources

MSDN Webcast: Indexing for Performance – Proper Index Maintenance
MSDN Whitepaper: Microsoft SQL Server 2000 Index Defragmentation Best Practices
TechNet It’s ShOwtime Webcast: Index Defragmentation with SQL Server 2005

Technical Questions

Q: In your script, what is "HA Requirements"? HA = High Availability. This is the requirement that your table stay online and available. Some companies are trying to achieve 99.999% uptime, this is especially challenging when even maintenance operations take a table offline.

Q: If you create extra indexes, is there an easy-to-configure utility that you can run across an application after it has run for a few months to list keys that were never or infrequently used? Use one of the new DMVs: sys.dm_db_index_usage_stats. To see the complete list of DMV objects, use the following query: SELECT * FROM sys.system_objects WHERE [name] LIKE 'dm[_]%'
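As a starting point, a query along these lines lists every index on user tables in the current database together with its usage counters. It is only a sketch: an index that has never been touched has no row in the DMV (hence the LEFT JOIN), and the counters only cover the time since SQL Server was last restarted (or the database was last brought online).

```sql
SELECT  OBJECT_NAME(i.object_id)  AS table_name,
        i.name                    AS index_name,
        i.index_id,
        us.user_seeks,
        us.user_scans,
        us.user_lookups,
        us.user_updates
FROM    sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
        ON  us.object_id   = i.object_id
        AND us.index_id    = i.index_id
        AND us.database_id = DB_ID()
WHERE   OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY table_name, i.index_id;
```

Rows where the seek, scan and lookup counters are NULL or zero while user_updates keeps climbing are the candidates worth a closer look.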

Q: What is DMV again? Dynamic Management View. These are new objects which give information about in-memory objects and state information.

Q: What are the parameters and their usage – for sys.dm_db_index_physical_stats? (DatabaseID, ObjectID, IndexID, PartitionNumber, Mode)

DatabaseID = [ NULL | 'DatabaseID' ]
   NULL: returns information for ALL databases; if NULL is used, no other options can be supplied. This returns ALL indexes for all objects in all databases. Easy but possibly slow.
   DatabaseID: smallint type. Refers to the ID for a specific database. DB_ID() or DB_ID('DatabaseName') can be used. The latter allows you to run this from ANY database as long as you have access. However, 3-part naming must be used.
ObjectID = [ DEFAULT | NULL | 'ObjectID' ]
   DEFAULT/NULL: returns ALL base data: CL, Heap, LOB for the specified database.
   ObjectID: int type. Refers to the ID for a specific object. OBJECT_ID('TableName') can be used. When using OBJECT_ID, you can use 1/2/3-part naming. Be sure to use 3-part naming when executing outside of the database.
IndexID = [ DEFAULT | NULL | 'IndexID' ]
   DEFAULT/NULL: All indexes.
   IndexID: tinyint type. Refers to the ID of a specific index.
PartitionNumber = [ DEFAULT | NULL | # ]
   DEFAULT/NULL/0: returns ALL partitions.
   #: returns only the details about a specific partition. When a PartitionNumber is specified then an IndexID must also be specified.
Mode = [ DEFAULT | NULL | 'SpecificMode' ]
   DEFAULT/NULL/LIMITED: returns a FAST scan and uses only an IS (Intent Shared) Table-level lock. This lock blocks ONLY eXclusive TABLE-level locks and schema changes. Excellent, relatively unobtrusive way to get fragmentation details.
   LIMITED: IS Lock. Same as SQL 2000 WITH FAST, only page counts and EXTERNAL fragmentation displayed. Does not detail INTERNAL fragmentation and page density.
   SAMPLED: IS Lock. For tables less than 10,000 pages (~80MB), all details are produced. For tables of more than 80MB, two samples are done (1% and 2%) at every nth page. The samples are compared and if close, the 2% sampling output is returned. If not close, then up to 10% will be sampled.
   DETAILED: S Lock. Entire table analyzed for both internal and external fragmentation. Returns one row for each level of the index from the leaf level (level 0) all the way up to the root level. This can help you determine fragmentation in the non-leaf levels but at the expense of holding a shared table level lock.

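Putting those five parameters together, a call might look like the sketch below. dbo.member is assumed to be a table in the Credit sample database; the IDs are placed in variables first because passing functions directly as arguments can fail when the database is at compatibility level 80.

```sql
DECLARE @db_id smallint, @object_id int;

SET @db_id     = DB_ID();                   -- current database
SET @object_id = OBJECT_ID(N'dbo.member');  -- assumed table name

-- All indexes, all partitions, LIMITED (fast, IS lock only) mode.
SELECT  OBJECT_NAME(ps.object_id)  AS table_name,
        ps.index_id,
        ps.index_type_desc,
        ps.partition_number,
        ps.avg_fragmentation_in_percent,
        ps.page_count
FROM    sys.dm_db_index_physical_stats(@db_id, @object_id, NULL, NULL, 'LIMITED') AS ps
ORDER BY ps.index_id, ps.partition_number;
```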
Q: How often should you run DEFRAG on your SQL server box? Should this be a part of a regular schedule? Taking down SQL, is there any other consideration? First, the only thing that’s not available is the table being REBUILT. Defragging an index does not take that table/index offline. So, more than anything, it depends on what you’re trying to achieve. If you want to achieve better availability on SQL Server 2000 then you might choose to defrag rather than rebuild – to keep your tables available.
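For comparison, here is roughly what the two choices look like in each version. The table name (member), database name (Credit) and index ID are assumptions based on the sample database.

```sql
-- SQL Server 2005 syntax
ALTER INDEX ALL ON dbo.member REORGANIZE;   -- defrag: table stays available
ALTER INDEX ALL ON dbo.member REBUILD;      -- rebuild: offline unless ONLINE = ON is used
GO

-- SQL Server 2000 equivalents
DBCC INDEXDEFRAG (Credit, 'member', 1);     -- defrag index ID 1 (the clustered index)
DBCC DBREINDEX ('member');                  -- rebuild all indexes on the table
GO
```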

Q: How often do you get such perfect tables in practice? A table is always completely clean and contiguous after a rebuild. To periodically fix a table, you should use consistent and automated rebuild strategies.

Q: Do you have suggestions for developers using MSDE when customers' demands can vary? Vary from few transactions to a large customer with many transactions. The general best practices in database and table design scale from the low end all the way up to the high end and, in the end, help your database scale!

Q: Can you touch on rules of thumb for "pad index"? If fragmentation in the leaf level is minimized through proper index maintenance and fillfactor – then fragmentation in the non-leaf levels should be low as well. You rarely need to specify padindex unless you have widely varying distribution of data and really want to leave larger gaps because of strange densities of data.

Q: Do most of these "Index Rules" apply to Indexed-Views? Yes! All indexes can become fragmented after data modifications... Your scripts should always look for fragmentation across all indexes.

Q: Can you discuss fragmentation WRT horizontal partitioning, especially range partitioning on the primary key? SQL Server 2005 offers more granular rebuild options – but not necessarily online. In many cases, you might want to design a read-only partitioned table and keep the volatile portion of the table (especially if only one partition) in its own separate table – possibly using a partitioned view (or an inline table-valued function) over these two tables.

Q: If I'm selecting from a table with a where FirstName = ... and LastName = ... and I have 2 indexes, one on LastName and another on FirstName. Are they both used? With an AND – maybe. The optimizer will look at the index statistics to determine if either of them is selective enough to use only one index. If neither is selective alone and a better index does not exist (a better index for AND would be one that includes BOTH of the columns in the SAME index – as a composite index), then SQL Server may choose to join the indexes (index intersection).
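A composite index for this kind of AND query could be sketched as follows. The table and column names (dbo.member, lastname, firstname, member_no) are assumed from the Credit sample database and the search values are made up.

```sql
-- One index that contains BOTH columns, rather than one index per column.
CREATE NONCLUSTERED INDEX IX_member_lastname_firstname
ON dbo.member (lastname, firstname);
GO

-- Both predicates can now be resolved by seeking into the single composite index.
SELECT member_no, lastname, firstname
FROM dbo.member
WHERE lastname = 'Tripp'
  AND firstname = 'Kimberly';
GO
```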

Q: URLs on the Resources slide can't be read. Could you type then into the Q&A, please? When the session is available for download (which is what happens when MSDN put this online), then you can access the URLs there as well. Typically, I place all of the links at the beginning of the Q&A – resources section. I’ll make sure to do this consistently!

Q: How does a uniqueidentifier used as a clustered primary key affect performance? This is best answered by session 4. In short, a non-sequential GUID can cause a lot of fragmentation.

Q: What is ExtentFragmentation as reported by DBCC SHOWCONTIG and is it less important than Logical Fragmentation? Extent Fragmentation refers to how many extents are next to each other. This is a bit more important than Logical Fragmentation as logical fragmentation shows whether or not the pages are next to each other.

Q: How much danger is there in the defrag processes? What kind of backup procedures do you suggest when you defrag? More frequent transaction log backups. A defrag generates a lot of log information. However, it does so in mini transactions. As a result, transaction log backups can occur concurrently with the defrag process and even though the defrag is not complete, the transaction log can still be cleared because the defrag process runs as small transactions instead of one long running transaction. This also improves concurrency because the locks are released throughout the process.

Q: Defragging a large index can cause the log file to grow quite large. Is there a way to minimize this other than frequent log backups? Yes, you’re correct – defragging a large index WILL grow the log file quite large! As for minimizing this activity in the log – no way to do that. But – you’re correct in increasing the frequency of log backups!
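A log backup that you could schedule more frequently while a defrag is running might look like the sketch below. The backup path is an assumption, and the database must be in the FULL or BULK_LOGGED recovery model with at least one full database backup already taken.

```sql
-- Back up (and so allow reuse of) the inactive portion of the log.
-- The path is hypothetical; in practice use a unique file name per backup.
BACKUP LOG Credit
TO DISK = N'C:\SQLBackups\Credit_log.trn';
GO

-- Watch how full each database's log is while the defrag runs.
DBCC SQLPERF (LOGSPACE);
GO
```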

Q: With very large tables, how much available disk space (both transaction logs and data drive) do you need to have to rebuild? Does it take less space to defrag than to rebuild? Well, this is really a multipart question… First, log space for rebuilds is mostly dependent on the recovery model. If you are running the FULL recovery model then creating and/or rebuilding indexes will take enough log space for the entire rebuild to complete. If you are running in the BULK_LOGGED or SIMPLE recovery models then this operation will run as a bulk operation and will be minimally logged. While this will take less time and significantly less log space, you are giving up some features when switching recovery models. I would strongly suggest reviewing the second session to see if this is appropriate.

Now, as for data space – a rebuild will always require at least the table size in free space and possibly as much as double (if an online rebuild is being performed). Typically, when space estimates are being done (when capacity planning the database) I always recommend taking the largest table size and multiplying it by 2 or 3 – in order to make sure you have enough space for rebuilds. There is space needed for sorting as well – this can come from the database OR from tempdb (using the SORT_IN_TEMPDB option).

Defragging doesn’t move an object so it doesn’t take additional data space BUT it does require more overall log space because it runs as mini transactions instead of just one.
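To get a feel for the numbers on your own tables, a sketch like this can help. dbo.member and the index name member_ident are assumptions; ONLINE = ON is only available in Enterprise Edition of SQL Server 2005.

```sql
-- How big is the table (and its indexes) today? Plan for 2-3x the largest table.
EXEC sp_spaceused N'dbo.member';
GO

-- Rebuild with the sort space taken from tempdb instead of the user database.
ALTER INDEX ALL ON dbo.member
REBUILD WITH (SORT_IN_TEMPDB = ON);
GO

-- Online rebuild of a single index (hypothetical index name);
-- expect up to double the space while both copies of the index exist.
ALTER INDEX member_ident ON dbo.member
REBUILD WITH (ONLINE = ON, SORT_IN_TEMPDB = ON);
GO
```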

Q: Should we look at different fragmentation stats if there are multiple files in the same filegroup? No, you still want to review average fragmentation. However, you may have more “fragments” in a table that spans multiple files; this does not necessarily mean that your table is fragmented.

Q: Are there any good third party tools for checking fragmentation and performing maintenance? Unfortunately no revolutionary ones (that I know of and/or can recommend)...but I still have high hopes :)

Q: How do you determine the appropriate fill factor? Unfortunately, there isn't a magic number... but, you can test your guesstimate by seeing how fragmented the table becomes between your regularly scheduled defragmentation routines.
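One way to run that test, as a sketch: rebuild with your guess, let the normal workload run until the next maintenance window, then measure. The table name and the 90% starting value are assumptions, not recommendations.

```sql
-- Step 1: rebuild with the fill factor you want to try.
ALTER INDEX ALL ON dbo.member
REBUILD WITH (FILLFACTOR = 90);
GO

-- Step 2: just before the next scheduled maintenance, see how fragmented it became.
DECLARE @db_id smallint, @object_id int;
SET @db_id     = DB_ID();
SET @object_id = OBJECT_ID(N'dbo.member');

SELECT  index_id,
        avg_fragmentation_in_percent,
        avg_page_space_used_in_percent,   -- page density; populated in SAMPLED/DETAILED mode
        page_count
FROM    sys.dm_db_index_physical_stats(@db_id, @object_id, NULL, NULL, 'SAMPLED');
GO
```

If fragmentation is still high at the end of the interval, try a lower fill factor; if the pages are barely fragmented, you may be able to raise it and waste less space.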

Q: Does it matter if I build the clustered index before/after rebuilding the non-clustered indexes? You should always create the clustered index before creating non-clustered indexes but as for rebuilding - you can rebuild your indexes independently as a rebuild of the clustered does not (generally) cause a rebuild of the non-clustered. There is one exception to this rule in SQL Server 2000 – in 2000 ONLY, SQL Server will automatically rebuild the non-clustered indexes when a non-unique clustered index is rebuilt. Why? Because the uniqueifiers are rebuilt.

Q: Will doing a defrag followed later by a rebuild decrease the work of the rebuild? Not really. A defrag doesn’t move the object – only a rebuild does. However, you might minimize the cost of the sort…

Q: How does cache map to table pages, i.e., does free space in table pages have a 1:1 correspondence to wasted cache? SQL Server reads the 8K page from disk into memory. The number of bytes that are wasted on disk are also wasted in memory. This is often the motivation for vertical partitioning! You might refer back to session three for more details on row/page structures!

Q: If switching a varchar cluster to a bigint and vice-versa in 2000, what would the best order of drop/create index? Actually, this is the reason that CREATE with DROP_EXISTING was created… so that you could “change” the definition of the clustered index:

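-- Start clean; skip the drop if the table does not exist yet
IF OBJECT_ID('test') IS NOT NULL
      DROP TABLE test
go
CREATE TABLE test
(
      testid      int               not null,
      col1       varchar(100)      not null
)
go
-- Initial clustered index on the varchar column
CREATE CLUSTERED INDEX testind ON test(col1)
go
-- Redefine the clustered index on testid in a single step
CREATE CLUSTERED INDEX testind ON test(testid) WITH DROP_EXISTING
go
-- Optionally rename the index, then confirm the new definition
EXEC sp_rename 'test.testind', 'NewIndexName', 'INDEX'
go
EXEC sp_helpindex test
go
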
Q: What about instances of one name only? (like Madonna, Cher, etc. ;^) Well, this is a good question and this is something that you might need to plan for in design. In these cases, you might allow NULLs in the lastname column and then make sure to search both when a lookup is performed. To be honest, I probably wouldn’t do all that much to find these special first names – unless you wanted to do searches across both columns without knowing whether or not this is a first or last name. You might do something like this in a lookup:
SELECT * FROM NamesTable
WHERE LastName = @variable
      OR (FirstName = @variable AND LastName IS NULL)
Comment: Just wanted to say I appreciate the blog you have put together.

Thanks for the thanks! It's a lot of work but I think it's great as a reference!! Even for me! To be honest, I can't always remember where to find things either! :)

Thanks! So – we’re half way there – 5 more to go! And, lots more questions coming I’m sure :) For the next session, we’re going to cover Isolation and options in Isolation in SQL Server 2005. If you’re interested in hearing more about isolation, locking/blocking – here’s the registration link: MSDN Webcast: A Primer to Proper SQL Server Development (Part 6 of 10): Mixed Workloads, Secondary Databases, Wait States, Locking and Isolation. See you on Friday!

kt

Well...another custom Immersion Event ends........ four days, 12 modules, nothing but SQL Server 2005: internals, availability and tuning. This week I delivered a four-day course titled: Building Highly Scalable, Available and Reliable Systems with SQL Server 2005. The course was delivered for the SQL Server customer team - to a few of their top TAP (Technology Adoption Program) customers who are really pounding on SQL Server 2005. These customers have some of the largest implementations and/or some of the more interesting architectures - with high volume activity and/or large data warehouses. The course was quite intense, running from 9-5 each day with guest speakers starting at 5 and then evening events that started at 6:30/7 on Monday/Wednesday evening. We covered High Availability Architectures, Recovery, Transaction Processing and Logging, Piecemeal Partial Database Recovery, Partial Database Availability, Partitioning, Online Index Operations, DMVs and much, much more! Everyone is in transit today...class was Monday through Thursday.

And - speaking of "evening events," Wednesday night was quite fun (except when I crashed the Kart). Despite the crash...I'll be back. It was great fun! Turned out that for the evening Ron took first, James took second and my good friend Gert took third. I'm pretty sure that Gert and I will be going back again soon as I want to creep up that scale and knock him off that stack. Despite the crash, I did have a few good lap times :). We'll be back!

Check out a picture of our group at the end of the night of Karting: ChampsKarting.JPG (243.31 KB)

So, all in all - it was a great week! Even more amazing, I think this group will stay in touch! We all learned a lot from each other.

So - this blog entry is more of a "thanks" entry! A thanks to the excellent questions and all of the excitement around SQL Server 2005...it's getting close! And the architectures are real, working and in production...in fact, one of the customers in the class this week went live at 2am on Wed and a few others are already live.

Stay tuned! More SQL Server 2005 features to highlight and details about the upcoming launch and maybe I'll even blog a bit about Upgrade issues.


Effectively Designing a Scalable and Reliable Database A Primer to Proper SQL Server Development
Indexing Best Practices, Part 4 of 10
Presented by Kimberly L. Tripp, SQLskills.com

For a list of the non-technical Resources related to this webcast, please review the following blog entry: MSDN Webcast Q&A for Resources: Indexing Best Practices, Part 4 of 10

Technical Questions

Q: When a clustered index is added to a table what happens physically to the data pages? SQL Server moves the data to the new location defined by the ON clause of the index creation statement. The ON clause can specify a different filegroup or even a partition scheme and the heap will move to the new location.

 Q: I'm not getting the follow-up emails with the replay link. How do I get on this email list for replay?

You *must* register through MSDN. If you are using a partner site then that is probably why you’re not getting the email. If you are registering through MSDN and still not receiving the email, please contact eventsup@microsoft.com with your information.

 Q: Does the copy affect the transaction log?

Yes. A create index or rebuild index statement does require the statement to be a single transaction. If you are running with your database set to the FULL recovery model then this single transaction will create a lot of log activity. You could switch to the BULK_LOGGED or SIMPLE recovery model and get less log activity but switching to the SIMPLE recovery model should be avoided as it will break the continuity of the transaction log. I would recommend two resources to get more information about Recovery Models and their impact on performance, logging and recovery.

MSDN Webcast Q&A: A Primer to Proper SQL Server Development, Creating a Reliable and Automated Backup Strategy, Part 2 of 10

Review the sample chapter: Chapter 9: Database Environment Basics for Recovery from the MSPress title: Microsoft SQL Server 2000 High Availability

 Q: Can one switch back and forth between simple and full recovery mode?

Ideally, you would switch ONLY between the FULL Recovery model and BULK_LOGGED Recovery model but not with SIMPLE. The two resources listed above will also help to clarify this point.
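
A minimal sketch of that FULL to BULK_LOGGED and back pattern around an index rebuild might look like the following. Credit is the sample database from this series; the table name and the backup path are hypothetical, and the log backup at the end is my assumption about how you would close out the bulk-logged window.

```sql
-- Switch to BULK_LOGGED only for the duration of the index operation.
ALTER DATABASE Credit SET RECOVERY BULK_LOGGED;
GO

-- Hypothetical table name; rebuilds every index on the table (SQL Server 2005 syntax).
ALTER INDEX ALL ON dbo.member REBUILD;
GO

-- Switch back to FULL as soon as the operation completes.
ALTER DATABASE Credit SET RECOVERY FULL;
GO

-- Back up the log afterwards so the log backup chain continues from here.
-- The path is a placeholder.
BACKUP LOG Credit TO DISK = N'C:\Backups\Credit_log.trn';
GO
```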

 Q: What about re-building a clustered index? Will it be that slow too? Is it recommended to do it off-hours?

Rebuilding a clustered index has a lot of the same requirements that creating the CL index does. It can also be slow and yes – as a result, you might want to do this off hours. You can speed up the process by changing recovery models but switching recovery models can also have an impact on recovery. See the two resources above for more details to help clarify this point. And, in part 5, I’ll cover more of the fragmentation details.

 Q: When is it desirable to move a table?

This is not necessarily an easy one to answer simply. But, I often recommend separating transaction processing data from read-only data as well as possibly placing a large table on its own filegroup (especially true with partitioned objects). But, I don’t often recommend separating indexes. Here’s a link to a SQL Server Magazine article I wrote – it does give some guidance. In general I don’t separate too many objects. However, if you’re building a database that you will put into a read-only state you might want to work a bit harder to “shift” objects to filegroups in such a way that each of those objects gets a nice and clean and contiguous chunk of space within the file. Another time that it’s useful to “move” an object is when you load data into a heap and then want to build the clustered index on another filegroup – where that heap will become a partition of a partitioned table. This is good when the clustered index will move the object into a filegroup that does not have other objects in it. Again, the clustered index will be nice and clean and contiguous in the destination filegroup.

 Q: So, if I understand correctly… a good approach to creating a table would be to have a clustered index independent of the primary key. So, ideally you could use a GUID for the PK and then have a clustered index in order to minimize the fragmentation.

The most ideal combination is when the clustered index is on an identity column – and even better if this is the Primary Key. However, if you do want to keep a GUID as the primary key then yes, you can create a clustered index which is independent of the primary key.
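
As an illustration of keeping a GUID primary key while clustering on an identity column, here is a small sketch. The table, column and constraint names are made up for the example.

```sql
-- Hypothetical table: the GUID stays the primary key, but it is enforced
-- with a NONCLUSTERED index so it does not define the physical order.
CREATE TABLE dbo.Orders
(
    OrderGuid  uniqueidentifier  NOT NULL
        CONSTRAINT OrdersPK PRIMARY KEY NONCLUSTERED,
    OrderID    int IDENTITY(1,1) NOT NULL,
    OrderDate  datetime          NOT NULL
);
GO

-- The clustered index goes on the identity column: unique, narrow,
-- static and ever-increasing.
CREATE UNIQUE CLUSTERED INDEX OrdersCL ON dbo.Orders (OrderID);
GO
```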

Q: In general, do you think folks have more problems setting up indexes for OLTP, OLAP/DWH, or Web apps?

In general, I think Decision Support environments can have more indexes with less of a negative… OLTP is a bit harder to find the right balance. The hardest is combination environments…

 Q: What was the diff between 0 and 1 on the third argument of DBCC IND?

The third parameter is for Index ID. 0 = Heap, 1 = Clustered Index, 2-250 are non-clustered index ids.
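
For example, assuming the Credit sample database and a hypothetical table named member, the calls might look like this. DBCC IND is undocumented, so treat this as a sketch for a test system rather than something to rely on.

```sql
-- Arguments: database, table, index id.
DBCC IND ('Credit', 'member', 0);   -- 0 = heap
DBCC IND ('Credit', 'member', 1);   -- 1 = clustered index
DBCC IND ('Credit', 'member', 2);   -- 2 and up = a non-clustered index id
```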

 Q: Should we keep the install defaults like Collation, and others? Or do you have recommendations?

Generally speaking, yes – I would recommend keeping the defaults. However, I do recommend something different for some development environments – especially those that are producing software for sale. When the database can be installed/setup on other servers that may have different code pages, etc., you might want to use binary case-sensitive sort. This will make your application more consistent – even with case-sensitive servers…and you never know if one of your customers may have one! It will also help to make you a better coder.

 Q: What design considerations should I take into account when planning my primary keys, indexes and clustered indexes for databases that will be used with replication?

There are no specific requirements for Replication other than the fact that you must have a Primary Key… but the same best practices discussed today apply and should be used in replication as well.

 Q: I am using SQL Server 2000. I have a table that is used a lot on the web server. The scan density is 97%. Should I adjust the indexing and/or Primary Key?

I tend to think more about how fast it drops rather than the actual number. But – 97% is pretty good in general. What you might want to do is a rebuild IF you have off hours time to rebuild (in SQL Server 2000 rebuilding can be done offline ONLY). Otherwise, you might just want to setup a defrag procedure. However, this is NOT

 Q: Why are so many SQL commands undocumented and how do you get to know how to use them if they are undocumented?

First and foremost, I want to strongly urge you to take your time with commands that are undocumented. If the commands are really useful for day-to-day operations, they’re probably already documented. Occasionally, development teams add additional functionality during development to test their functionality. These commands – while proving useful in some cases – do not generally go through the same quality assurance procedures that fully documented commands go through and as a result, I would be very cautious before using most undocumented commands.

 

Having said all of that, you’ll find a lot of useful tidbits in articles, on webcasts, in books, etc.  Just make sure you test the command thoroughly and don’t expect to get support on it. In general, undocumented commands are NOT supported.

 Q: What is the command for scan density on SQL Server 2000?

DBCC SHOWCONTIG. If you’re interested in learning more about defragmentation on SQL Server 2000, please review this MSDN Webcast: SQL Server 2000 Performance Tuning: Index Defragmentation Best Practices
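
A quick sketch of the command, using a hypothetical table named member:

```sql
-- Text report for the table's clustered index (or heap); look for "Scan Density".
DBCC SHOWCONTIG ('member');
GO

-- Same information for every index on the table, returned as a rowset
-- so it can be captured into a table and tracked over time.
DBCC SHOWCONTIG ('member') WITH TABLERESULTS, ALL_INDEXES;
GO
```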

 Q: In SQL Server 2000, DBCC SHOWCONTIG, can you explain extent scan fragmentation and if we should pay attention to it?

We’re going to go over this more with our next session on Index Defrag but Extent Scan Fragmentation describes whether or not the extents that an object owns are “next” to each other. If there are gaps then this implies that there are other objects “interleaved” with this one. This is not ideal; defragmenting the object will help to improve scans.

 Q: The size of the database doubles when a clustered index is added?

Once the index is created the space needed to build the index is released. However, while the index is being created you need roughly 2-3 times the TABLE size to build it. If you have a table which is 1GB then you should plan for at least 2-3GB of free space in order to build or rebuild a clustered index. However, it’s only the table’s size that doubles and it’s only during the index creation NOT after the index has been created.
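
To plan for that space, you can compare the table's current size against the free space in the database before you start. A small sketch, with a hypothetical table name:

```sql
-- Size of the table (data and index_size columns) that is about to be rebuilt.
EXEC sp_spaceused N'dbo.member';
GO

-- Database-level numbers, including unallocated space, for the current database.
EXEC sp_spaceused;
GO
```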

 Q: Aren't GUIDs too large to use for an efficient index?

No, an index can be useful even if it’s a large percentage of the size of the table – mostly, because of the order of the columns and the criteria used in the query. Hard to give a fast answer to this but in general 16 bytes isn’t so bad – much larger may even be acceptable.

 Q: Does level 1 of a clustered index point to rows, or does it describe the minimum and maximum row on a page?

Basically, it’s a pointer to the page and it has a pointer to the lowest value on the page. Each pointer in the non-leaf always points to the “first” value on the page (meaning the lowest).

 

Q: Why would you be inserting using an index other than using the primary key? Assuming the primary key is appropriate, I wouldn’t. However, there are a lot of environments where the primary key is chosen and kept – for a variety of reasons. SQL Server creates the clustered index on the Primary Key by default but it is not required; you can create the clustered index on something else. It’s the clustered index which defines the insert order – not the Primary Key; however, they are often one and the same.

 Q: Should all indexes be clustered if unique? Would this increase performance on search results from SP?

I think the question should really be should all clustered indexes be unique. Generally, the answer to this is yes. But to answer the second part – would this increase performance on an SP – the answer here is not necessarily. Stored procedures have many reasons for being – or not being – optimal. Plan choice, optimization and recompilation issues all play a major factor in whether or not a stored procedure is optimal. However, in session 7, I will talk about plan caching and stored procedure optimizations!

 Q: If there is no natural clustering key, what is the advantage of "coming up with one". Or, said another way, what is the basic reason why I want a cluster rather than a heap?

A heap does not inherently have order to it (that’s the point) but when the records are not ordered then SQL Server needs to find a location in which to insert the row. This causes a lookup in something called a PFS (page free space) as well as a lookup in the table’s Allocation Map (actually called an Index Allocation Map in SQL Server 2000 and a Heap or B-Tree Map in SQL Server 2005). To help the insert performance you generally want to have the insert location defined. However, just having a clustered index is not the best answer as there are other internal dependencies on the log. Because of these internal dependencies you typically want the clustered index to be unique, narrow and static. If you’re interested in reading more about the reasons behind having a clustered index which is unique, narrow and static:

Ever-increasing clustering key - the Clustered Index Debate..........again!

MSDN Webcast Q&A: Index Defrag Best Practices - Clustering Key Choice

In some cases this means adding a column to a table solely for the purposes of clustering on it.

 Q: Is it a good practice to just have non-clustered indexes?

Well, that’s a different way of answering the prior question…and while heaps can be useful for high performance loading they are generally not as useful for day to day OLTP or combination OLTP/DSS tables. So, my recommendation is almost always – the RIGHT clustered index + additional supporting non-clustered indexes.

 Q: In large OLTP databases is it recommended to move reporting to a separate database which has a lot of non-clustered indexes and no clustered indexes?

No… Again, it’s not really a binary thing (should you always have a clustered, or not always have a clustered). Really, there are cases for both BUT in most databases – even DSS (Decision Support System) – a clustered index helps to define a narrow and unique value that all indexes have in common. This can help other operations… So, even in a DSS environment I generally recommend a clustered, as well as additional supporting non-clustered. Having said that there are cases where a single clustered – setup for order by and range queries can be useful but that’s generally only when the large majority of queries all want * (all columns) and you typically want the exact search and order by in almost every case.

 Q: I've read that you shouldn't cluster a column that is an Identity column because it hurts performance for OLTP as all writes are on same page/area of disk? Your thoughts?

There are some cases where insert performance can be compromised – in very high volume OLTP systems (typically more than 400 inserts per second but this can vary) but in general, inserting into the same “hot” spot of a table improves performance because the activity is isolated and the needed pages are already in cache. Lots more on this in other Q&As.

 Q: If I add an identity column to my table to have a good clustered key, it will never be used in a query. Would I naturally also include the primary key, even though it would make the clustered key less narrow?

You don’t need to include the primary key if the clustered key is already unique, narrow, static and ever-increasing.

 Q: Does this mean that vertical partitioning of a table is only helpful when the table is scanned?

No, there are still many benefits in vertical partitioning – such as with locking, scanning. I’d refer you to review the Webcast that was part 3 in this series, for more details.

 

Q: Can you use DTA against SQL 2000 databases?

Yes. You can use either the DTA or the ITW against SQL Server 2000. If you're looking for details about how to use the SQL Server 2000 Index Tuning Wizard you can review this whitepaper: Index Tuning Wizard for Microsoft SQL Server 2000.

 Q: Is DTA in the Express version, too?

DTA does not come with the Express Edition but you can certainly do tuning on a higher version and then use that tuned database in Express.

 Q: Can you explain briefly Index View?

It’s a materialized view – in the sense that the data as defined by the view – is actually materialized and stored on disk. This has both pro’s and con’s and needs to be evaluated fully before you implement these. Please check out these whitepapers for more details:

Improving Performance with SQL Server 2005 Indexed Views by Author: Eric Hanson and Contributors: Gail Erickson, Lubor Kollar, Jason Ward

Improving Performance with SQL Server 2000 Indexed Views by Author: Gail Erickson and Contributors: Lubor Kollar, Jason Ward

 Q: Do you think one can become expert in SQL Server and other aspects of Design/Programming or does one need to specialize?

Wow, I think this is a tough question; more than anything I would recommend learning all of the options available so that you can better architect the right solution. Once you think you have a good solution, then move forward to really learn that area/topic/features. I’ve been using a phrase with SQL Server 2005 a lot – Jack of all trades, master of some...

 Q: Can you repeat the new function INCLUDE on index?

In SQL Server 2000 the maximum size of an index key is 900 bytes or 16 columns – whichever comes first. This key helps to keep an index’s b-tree smaller and more scalable. In SQL Server 2005, the leaf-level of the index can include additional non-key columns. This continues to help keep the tree structure scalable but also allows you to cover more queries.
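
Here is a short sketch of what that looks like in SQL Server 2005. The table and column names are hypothetical.

```sql
-- Key columns (lastname, firstname) define the b-tree and count toward the
-- 900-byte / 16-column limit; the INCLUDE columns live only at the leaf level.
CREATE NONCLUSTERED INDEX MemberNameCovering
ON dbo.member (lastname, firstname)
INCLUDE (phone_no, city);
GO

-- A query like this can be answered entirely from the index above,
-- because every referenced column is either a key or an included column.
SELECT lastname, firstname, phone_no, city
FROM dbo.member
WHERE lastname = 'Tripp';
GO
```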

 Q: I have installed SQL Server 2005 CTP, but can only find SQL Configuration Manager. I don't see the management tools that you are using. Any suggestions?

Actually, no. This seems a bit strange?! I might try uninstalling and re-installing, as well as reviewing the setup logs to see if there are errors within them. For SQL Server 2005, all installation log files are stored in C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG.

 Q: if your non clustered index either includes all or part of the clustered index are the clustered index columns added again to the non clustered index?

NO! SQL Server only adds the column(s) of the clustering key – if they are missing.

 Q: Did you mention the cool timebound option in DTA?

Yes, but only briefly. Vipul Shah discusses DTA in greater detail in these two webcasts.

TechNet Webcast: Performance Diagnosis in SQL Server 2005 (Level 300) by Vipul Shah

TechNet Webcast: SQL Server 2005 Database Tuning Advisor (Level 300) by Vipul Shah

 Q: What is the fill factor and how is it used?

I think I’m going to defer you to the next session! Now, there’s some motivation. :)

 

And here are a few comments that I’ve received! I just wanted to say thanks for your comments and thank yous… :) It's your questions that keep me on my toes and current. This is still fun so keep it coming.

 

Comment: Thank you again, particularly for thoroughness and enthusiasm.

Comment: Let me add my voice to the chorus of "thank you's", Kim. You're  presentations and verbal Q&As are great, and your online ones are a truly generous gift to the community. No other webcaster has come close to doing what you do. Many thanks from all of us.  

Comment: I must compliment you on your Q&A in your blog - I've just checked it out - couldn't be more comprehensive!

 

So – 4 down, 6 to go! Lots more questions coming I’m sure :)

 

If you’re interested in hearing more about fragmentation, how to view it, how to clean it up and finally, how to prevent it – here’s the registration link:

MSDN Webcast: A Primer to Proper SQL Server Development (Part 5 of 10): New Features in Indexing and Index Maintenance Best Practices (Level 200)

Thanks!

kt

Effectively Designing a Scalable and Reliable Database A Primer to Proper SQL Server Development

Indexing Best Practices, Part 4 of 10

Presented by Kimberly L. Tripp, SQLskills.com

Q: Can I view a recording of this webcast? The webcast will be available for viewing within 24 hours. All “on-demand” webcasts can be found here.

Q: Where can we get the demo scripts AND the sample database: Credit? The demo scripts are in this zip (20050826MSDNDemoScripts.zip (12.17 KB)); here in this blog entry. However, at the series completion, I will also create an entry under Past Event Resources for the entire webcast series.  To download the ZIP of the Credit Database Backup click here. Once unzipped, restore this backup to SQL Server 2000 or SQL Server 2005. The backup is a SQL Server 2000 backup and can be restored to either version! If restoring to SQL Server 2005, you might want to change the destination for the data and log file as the path will probably be different.

Q: Where are the links to all prior Webcast Q&As from this series?

Part 1: Creating a Recoverable Database
For the MSDN Download for Part 1, click here.
For the SQLskills Blog Entry for Part 1, click here.

Part 2: Creating a Reliable and Automated Backup Strategy
For the MSDN Download for Part 2, click here.
For the SQLskills Blog Entry for Part 2, click here.

Part 3: Designing Tables that Scale, Best Practices in Data Types and Initial Table Structures
For the MSDN Download for Part 3, click here.
For the SQLskills Blog Entry for Part 3, click here.

Q: How can I replay previous sessions? I thought we were going to get emails for replaying -- but I haven't received any replay emails. You will receive replay emails ONLY when you register for these sessions through MSDN. We’ve come to find out that there are other ways to register but it’s only through MSDN that we know for sure you will receive the replay information. Regardless, you can always find the “on-demand” version of the sessions here.

Resources mentioned in today’s Session:

If you’re interested in hearing more of the theory behind the indexing recommendations, as well as more of the reasoning behind unique, narrow and static criteria for the clustering key – watch the TechNet "It's Showtime" recording, recorded at Tech Ed Europe titled: Index Creation Best Practices with SQL Server 2005 which covers Index Creation Best Practices on SQL Server 2005 and really gets into a lot of the "why's". If you’re interested in hearing a lot more about how to use Profiler effectively with ITW, watch the MSDN Webcast titled: SQL Server 2000: Performance Tuning - Finding the Right Balance of Indexes which covers Index Creation Best Practices on SQL Server 2000.

As for whitepapers:

SQL Server 2005 Beta II Whitepaper: Partitioned Tables and Indexes by Kimberly L. Tripp

SQL Server 2005 Beta II Whitepaper: Snapshot Isolation by Kimberly L. Tripp

PREVIEW: The Database Administrator’s Guide to the SQL Server Database Engine .NET Common Language Runtime Environment by Kimberly L. Tripp

Improving Performance with SQL Server 2005 Indexed Views by Author: Eric Hanson and Contributors: Gail Erickson, Lubor Kollar, Jason Ward

Improving Performance with SQL Server 2000 Indexed Views by Author: Gail Erickson and Contributors: Lubor Kollar, Jason Ward

Additional Webcasts:

TechNet Webcast: Performance Diagnosis in SQL Server 2005 (Level 300) by Vipul Shah

TechNet Webcast: SQL Server 2005 Database Tuning Advisor (Level 300) by Vipul Shah

Support WebCast: SQL Server 2000 Profiler: What's New and How to Effectively Use It by Sri Kasam and Ajay Manchepalli

TechNet Support WebCast: I/O performance problems and resolutions in Microsoft SQL Server 2000 by David G. Brown

TechNet Webcast: SQL Server 2005 Series (Part 7 of 10): Indexing Enhancements (Level 200) by Bryan Von Axelson

Summary!

Finally, check out the Indexes category on my blog – for a variety of Q&A postings, other resources and lots of great links! The rest of the technical Q&A, I should be able to post soon. Stay tuned!

Thanks,

kt

MSDN Webcast: A Primer to Proper SQL Server Development
Designing Tables that Scale, Best Practices in Data Types and Initial Table Structures, Part 3 of 10

Presented by Kimberly L. Tripp, SQLskills.com

Q: Can I view a recording of this webcast? Here’s the specific replay link.

Q: Where can we get the demo scripts? The demo scripts are in this zip (20050819MSDNScripts.zip (3.67 KB)); here in this blog entry. However, at the series completion, I will also create an entry under Past Event Resources for the entire webcast series.

Q: Does Kimberly L. Tripp have a blog? Yes, www.SQLskills.com/blogs/Kimberly and if you’re reading this on my blog…well, then you already know! :)

Q: Where are the links to all prior Webcast Q&As from this series?

Part 1: Creating a Recoverable Database
For the MSDN Download for Part 1, click here.
For the SQLskills Blog Entry for Part 1, click here.

Part 2: Creating a Reliable and Automated Backup Strategy
For the MSDN Download for Part 2, click here.
For the SQLskills Blog Entry for Part 2, click here.

Q: Is this Session 2 of the "Effectively Designing a Scalable and Reliable Database" webcast series? The Title bar here reads "MSDN Webcast: MSDN: A Primer to Proper SQL Server Development (Part...)". A bit misleading don't you think? So, I finally figured out the mystery here. When I originally submitted my proposal for this series, I titled it like this: Effectively Designing a Scalable and Reliable Database: A Database Developer’s Primer to Proper SQL Server Development (Level 200-300). But somehow, through a few edits, and since each module also has a more detailed title (shown in the following list), I think the title just got edited down to fit. In fact, with subtitles like these, I can barely fit the entire title, subtitle and session title on a single slide. :)

Part 1: Creating a Recoverable Database

Part 2: Creating a Reliable and Automated Backup Strategy

Part 3: Designing Tables that Scale – Best Practices in Data Types and Initial Table Structures

So, the mystery ends here… It’s all the same content and the general idea is that building a scalable and reliable database takes a variety of best practices that together make it happen; there is no “DBCC MAKEITGOFASTER(‘TRUE’)” option. I'll unveil a new title slide in part 4... oh, the excitement!

Q: How can I replay previous sessions? I thought we were going to get emails for replaying -- but I haven't received any replay emails. You will receive replay emails ONLY when you register for these sessions through MSDN. We’ve come to find out that there are other ways to register but it’s only through MSDN that we know for sure you will receive the replay information. Regardless, you can always find the “on-demand” version of the sessions here.

Q: What does CLR stand for? Common Language Runtime. Quick WHATIS.COM Definition here.

Q: If something uses sp_OA* now in SQL 2000, am I guaranteed it will use SQLCLR in SQL 2005? Not automatically; however, you should consider rewriting your sp_OA* code. sp_OA* is used to automate COM objects; these, by definition, are not Framework objects. Moving to SQL Server 2005, you can continue to run your previous sp_OA procedures to access COM objects, but if you can write SAFE .NET Framework code to do the same thing then you should convert your sp_OA* code where possible. Not everything can be converted, but the basic principle is that anything for which you use sp_OA* is a good target for conversion. For some great details on the differences and how to move forward – go here. Also helpful is this KB Article regarding COM and .NET in SQL Server 2000: Using extended stored procedures or SP_OA stored procedures to load CLR in SQL Server is not supported

Q: I loved how XML data is available interactively in SQL Server Management Studio (SSMS); can HTML data columns be displayed in the same way? Yes, if you store HTML in a column of type XML then the column data will appear as a link and when clicked, you can edit it in the XML Editor window in SSMS.

Q: Is the "XML" datatype really a "varchar(max)"? No. In previous releases the XML type really was just a “blob” stored in the database. There were no inherent optimizations and you were not able to define a schema for an XML column. In SQL Server 2005, XML data is stored natively and offers the ability to put indexes on the columns as well as better access and manipulation. You *can* store XML as varchar(max) but it will be harder to access/code. There’s a very nice description of the tradeoffs in this whitepaper: XML Options in Microsoft SQL Server 2005
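
To make the difference concrete, here is a minimal sketch of a native xml column with an XML index and an XQuery-based lookup. The table, the index name and the /doc/title document shape are all invented for the example.

```sql
-- A primary XML index requires a clustered primary key on the base table.
CREATE TABLE dbo.Documents
(
    DocID    int NOT NULL CONSTRAINT DocumentsPK PRIMARY KEY CLUSTERED,
    DocBody  xml NOT NULL
);
GO

CREATE PRIMARY XML INDEX DocumentsBodyXML ON dbo.Documents (DocBody);
GO

-- The xml type exposes methods such as value() and exist();
-- a varchar(max) column would need string parsing instead.
SELECT DocID,
       DocBody.value('(/doc/title)[1]', 'nvarchar(100)') AS Title
FROM dbo.Documents
WHERE DocBody.exist('/doc/title') = 1;
GO
```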

Q: What is the difference between numeric and decimal? In SQL Server they are synonyms. I seem to remember hearing that there was a difference in the standards…somewhere but with a few references and looking around I can’t find any. Regardless, I’d make sure that you standardize on one or the other – just in case they do change, at least all of your data is consistent.

Q: I use decimal datatype for money values. Why it may be suboptimal to use decimal (instead of money datatype)? I wouldn’t say that it’s suboptimal – except maybe in storage length. However, decimal can be more precise.

Q: When should you use a float? A number of my clients use Float – i.e. Banks, Stocks, etc. What would be your argument against this? Just the lack of an absolute minimum precision/scale.

Q: We have a database where datatypes are set. But to use a simple percentage calculation we had to add .0 to the calculation to get the correct percentage. For example 100(5/9) is truncated to 0, but 100((5+.0)/9) yields correct result. Is there any standard procedure without massive conversions to assure a correct result? It’s a matter of data type. In the case of 100, 5 and 9 – these are all integers – without any decimal place. So – the division of 5 divided by 9 yields 0. However, 5 divided by 9.0 or 5.0/9 – both have a decimal and in that case the other data types are implicitly converted to the higher/more precise data type. A cool way to find out the actual “base type” and/or the level of precision/scale is to ask these questions as if the data is of type sql_variant:

SELECT SQL_VARIANT_PROPERTY(5/9.0, 'BaseType')
SELECT SQL_VARIANT_PROPERTY(5/9.0, 'Precision')
SELECT SQL_VARIANT_PROPERTY(5/9.0, 'Scale')   -- numeric(8,6)
SELECT SQL_VARIANT_PROPERTY(5/9, 'BaseType')
SELECT SQL_VARIANT_PROPERTY(5/9, 'Precision')
SELECT SQL_VARIANT_PROPERTY(5/9, 'Scale')   -- int
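
If the goal is to avoid sprinkling ".0" through expressions, one option is to convert a single operand explicitly so that the division is done as a numeric. The following is a minimal sketch; the variable names and the numeric(9,2) choice are assumptions made for illustration and are not from the webcast:

```sql
-- Hypothetical integer inputs; the names are for illustration only.
DECLARE @part int, @total int
SET @part = 5
SET @total = 9

-- Integer division happens first, so the fraction is lost: this returns 0.
SELECT 100 * (@part / @total) AS TruncatedPct

-- Converting one operand to a precise numeric type forces numeric division.
SELECT 100 * (CAST(@part AS numeric(9,2)) / @total) AS PrecisePct

-- Multiplying by a numeric constant first has the same effect,
-- because the integer operands are converted before the division.
SELECT 100.0 * @part / @total AS PrecisePctAlt
```

Either form keeps the conversion in one visible place rather than relying on a literal that happens to carry a decimal point.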

Q: I am surprised by the nchar/nvarchar suggestions. I've always been told to NOT use these data types. Can you give the "reasoning" for this suggestion here? The primary reason that you want to use Unicode data is to avoid a cost in “translation” between the client and the server – when you’re using web-based applications that natively use Unicode. Additionally, with ever-growing customer bases – you can more effectively store international data properly by supporting Unicode in the client and in the database. You are correct that these take 2 bytes per character rather than one – but that’s what gives them the larger range of characters to use/display/manipulate. Also, if you plan to use the SQLCLR against these columns, you will need to use Unicode as opposed to ASCII as the SQLCLR requires it.

Q: Does Unicode use twice as much space? Yes, Unicode columns are 2 bytes per character. If you defined a column of type nchar(100), that column will require 200 bytes of storage.
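
A quick way to see the storage difference for yourself is DATALENGTH, which returns the number of bytes used. This is just a small sketch with made-up variable names:

```sql
-- Hypothetical variables; fixed-width types are padded to their declared length.
DECLARE @fixedUnicode nchar(100), @fixedNonUnicode char(100)
SET @fixedUnicode = N'abc'
SET @fixedNonUnicode = 'abc'

SELECT DATALENGTH(@fixedUnicode)    AS UnicodeBytes,     -- 200 (100 characters x 2 bytes)
       DATALENGTH(@fixedNonUnicode) AS NonUnicodeBytes   -- 100 (100 characters x 1 byte)
```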

Q: Sometimes, and not always, it becomes a tradeoff in Storage Space v. Performance, doesn’t it? That’s an excellent point. You always need to look at the trade-offs associated with any decision and in fact, this is yet another. So, I always want the row size to be as optimal and as reasonable as possible. However, there are many compelling reasons – international and/or web-based applications as well as SQLCLR functionality.

Q: Is the N' keyword usable in MSSQL 2000? Yes, all Unicode character strings should be preceded with an upper case N in all releases where Unicode is supported (SQL Server 7.0 was the first release that supported Unicode).

Q: Is there any relationship between the OS's code page and nvarchar/varchar? Yes. When SQL Server is installed, it chooses the server’s default code page from the Operating System. A code page (or collation) defines the order in which characters are evaluated. The additional attributes a collation can define are: whether or not the characters are case-sensitive; the sensitivity of width and kana-type. This may not seem obvious but if these character sets are native to you then the order of the data would probably make more sense. Once the collation is set for the server, the database inherits the server’s collation by default; when tables are created, each character-based column inherits the database’s default collation by default. However, at ANY of these levels the collation can be changed. There are benefits to doing so – in that you can keep data within a specific table ordered for a specific character set. Wow, now that I’m getting started, this could be a session in and of itself. So, just to give you a few tips:

  • Be careful using temporary tables if your database’s collation is different than your server’s collation.
  • Changing collation on the fly (for example, if you want to do case-sensitive searching through a view – of which I’ve included a sample file) can be expensive.
  • Changing a database’s collation is supported but you might have troubles converting from a case-sensitive to a case insensitive database.

The files that can help you to get more familiar with verifying collations, changing collations, etc. are CaseSensitiveSearchingInView.sql and ChangingDatabaseCollation.sql.
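
To illustrate the first tip: temporary tables are created in tempdb, so their character columns pick up tempdb's collation unless you say otherwise. A minimal sketch, assuming a hypothetical temp table (the table and column names are mine, not from the sample files), is to declare the column with COLLATE database_default so that it follows the current database instead:

```sql
-- Compare the current database's collation with the server's default.
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS DatabaseCollation,
       SERVERPROPERTY('Collation')                AS ServerCollation

-- Hypothetical temp table: database_default makes the column use the
-- collation of the current database rather than tempdb's.
CREATE TABLE #CustomerNames
(
    CustomerName nvarchar(100) COLLATE database_default NOT NULL
)

DROP TABLE #CustomerNames
```

When the two collations differ and the temp table column is left at tempdb's default, joins between the temp table and your user tables can fail with a collation conflict, which is why the tip matters.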

Q: Another great use of adjunct tables is to isolate culture-specific data from the base (and usually indexed) data. So, this is a great point. If you change the collation for a column – and index it – then you can get better performance when accessing that column. Changing a collation on the fly (as in the script sample for the previous questions), is what’s expensive. However, you should make sure to do a lot of testing.

Q: I have a US code page OS but Latin general 850 collation set for SQL Server; I have no problems storing and retrieving data. Exactly! That’s the benefit of collation information being stored with each column. It allows the application to store many types of data – natively and then just deal with their display at the client.  For those of you who want more details on how to do this, see COLLATE in the books online.

Q: Somewhere along the line I was always told to define varchars in multiples of 8 characters. Something to do with space allocated with new rows.  Any ideas if it has any premise? Well, my first response to this was that I’ve never heard of this… So, I thought I’d check around and well, the couple of folks I’ve chatted with haven’t heard this either… IF anyone has – feel free to comment on this blog.

Q: How about images? Is it better to store them in the db or just the url? This is really an old debate. I still think there’s validity to both sides of it as well. I’d say that you should go through the pros and cons and really choose what’s best for your application. If you use a lot of URLs then you might be able to reference more without having to manage a copy of that data… If the URL is your data then you might find that managing it in the database (from a backup/restore perspective) is easier – and you know that it’s always going to be there v. the URL occasionally not working. Wow, I’d almost like to get a panel of folks together to have everyone hash out their opinions on this one. :-) SQL Server 2005 does make managing images easier but there are still points to both arguments!

Q: How would SQL Server support becoming a data server of video file? The only real format for this data would be binary. However, there is a 2GB limitation to a single value.

Q: Are you aware of a stored procedure to handle blob data effectively? i.e. for image storage? No. But there is an executable that comes with SQL Server 2000’s samples and I think it’s called textcopy.exe. That might be some good sample code to review.

Q: Does SQL Server support the SQL 2003 NULLS FIRST and NULLS LAST keywords of the ORDER BY clause? No support for NULLS FIRST or NULLS LAST but the order by can include ASC for ASCending or DESC for DESCending. NULL values are treated as the lowest value so NULLS FIRST would be ASC and NULLS LAST would be DESC. But – that’s the only support for specific placement of NULL values.
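
If you need the combination that isn't directly available (for example, ascending order with the NULLs at the end), a common workaround is to sort on a CASE expression first. This is a minimal sketch using a made-up temp table:

```sql
-- Hypothetical sample data for illustration.
CREATE TABLE #Scores (PlayerName varchar(20) NOT NULL, Score int NULL)
INSERT #Scores VALUES ('Ann', 10)
INSERT #Scores VALUES ('Bob', NULL)
INSERT #Scores VALUES ('Cy', 5)

-- Ascending by Score, but with NULLs last: the CASE expression sorts
-- non-NULL rows (0) ahead of NULL rows (1) before Score is considered.
SELECT PlayerName, Score
FROM #Scores
ORDER BY CASE WHEN Score IS NULL THEN 1 ELSE 0 END, Score ASC
-- Returns Cy (5), Ann (10), Bob (NULL)

DROP TABLE #Scores
```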

Q: Is there a "best practice" for the new C# nullable types? i.e. int? vs. int -- re:MSSQL2005? When programming with the SQLCLR, always use types that allows NULLs (assuming your data can contain nulls) and then make sure to always follow the appropriate behavior with NULL. For example, concatenating NULL should yield NULL.

Q: Can I change (refactor) the underlying type of a UDT (e.g., from int to bigint) in SQL Server 2005?  Unless I've missed something, I can't do it in SQL Server 2000 unless I drop the type from all uses. No, as the user-defined data type (UDDT) is only used at create time to map to the datatype; the connection is then broken (altering the UDDT definition will only impact new tables and variable definitions). User-defined Types (UDTs) are even more schema bound than UDDTs and it's a much tougher job to alter them (you need to write conversion from the old to the new). This is much more complex since they fall outside the conversion matrix and the only thing SQL knows is the binary representation. So if the data type was smart enough to add a signature to the serialized data, so that the next version can understand which data type it represents and determine if it can convert to it, you might have a shot; otherwise you have to add a column, convert the data and drop the old column afterwards.

Q: Can I easily replace the SQL Server 2000 UDT with a SQL Server 2005 CLR Custom Type? I have a client who wants to -- eventually -- migrate to SQL Server 2005, but for the immediate future wants me to build their application with SQL Server 2000. I'm very interested in SQLCLR types, especially WRT to maintaining type consistency across T-SQL, CLR SP's and external data objects, so using CLR custom types. This is a typical thought for the SQLCLR – “I am going to build an object database.” Type fidelity between the client and the server exists today, but they want class fidelity, so they are going to create the customer object UDT, which is not really for what the SQLCLR was intended. You should be thinking scalar types with additional custom characteristics – not object types – as the scalar types will give you optimal and predictable performance.

Q: Can you find ANSI_NULL_DFLT_OFF setting as on or off? Does DBCC USEROPTIONS do that? Yes, DBCC USEROPTIONS is a nice, quick way of seeing the list of currently set session options. However, it’s not as nice to check programmatically; there are better ways to do that. The file that can help you to get more familiar with ANSI NULL issues is ANSINullIssues.sql.
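
One programmatic option is the @@OPTIONS bitmask, where (per the documented "user options" values) 1024 represents ANSI_NULL_DFLT_ON and 2048 represents ANSI_NULL_DFLT_OFF. This is a sketch of that approach and is not taken from ANSINullIssues.sql:

```sql
-- Test the documented bit values in @@OPTIONS for the current session.
SELECT CASE WHEN @@OPTIONS & 2048 = 2048 THEN 'ON' ELSE 'OFF' END AS AnsiNullDfltOff,
       CASE WHEN @@OPTIONS & 1024 = 1024 THEN 'ON' ELSE 'OFF' END AS AnsiNullDfltOn
```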

Q: It would be great if you could show us a list of classic fields "Account No", "Account Balance", "Description" etc… and tell us what type you would use inc. "Null/Not Null" etc. Well, this is a bit subjective and with a bank account there would probably be a lot of business rules that surround the account number. However, if this were a sales table and you needed an order ID then it would be a bit easier (again, business rules might change this). Typically, I’d go for a Primary Key which is simple and narrow and not necessarily a “natural” key. However, some of this reasoning comes from how tables are joined and how SQL Server internally stores data. Account Balance – I’d go with a precise numeric. Based on length of storage and the business rules behind required precision, you might choose a numeric(19,6). This will require 9 bytes of storage. For description, I would definitely go with a variable type – probably nvarchar(200) depending on what description really means. Here’s how the table would probably look on creation:

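CREATE TABLE Account(
AccountID     int           IDENTITY  NOT NULL CONSTRAINT AccountPK          PRIMARY KEY,
Balance       numeric(19,6)           NOT NULL CONSTRAINT AccountBalanceDflt DEFAULT 0,
Description   nvarchar(200)           NULL)
go
-- The N prefix keeps the literal as Unicode, matching the nvarchar column.
INSERT Account (Description) VALUES (N'The account of Kimberly Tripp.')
-- SCOPE_IDENTITY() returns the identity value generated in this scope,
-- so a trigger inserting into another table cannot change it (as can happen with @@IDENTITY).
PRINT 'Kimberly Tripp''s account number is ' + convert(varchar(10), SCOPE_IDENTITY()) + '.'
go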

Q: In the Q&A you might want to tell your listeners that in SQL the string 'abc' is equal to 'abc ' when compared – unlike for C, C++ ,C# , VBNET, etc. This will help with the char v. varchar issue… It’s a good point! You should make sure that you do some testing to make sure that data behaves the way you expect. Trailing blanks may be trimmed with some data types and not others and once again, ANSI settings may impact this behavior.
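
A small test along those lines, comparing literals with and without a trailing blank (a sketch only; your own columns and ANSI settings are what you should really test):

```sql
SELECT CASE WHEN 'abc' = 'abc ' THEN 'equal' ELSE 'not equal' END AS EqualsComparison, -- 'equal': trailing blanks are ignored by =
       LEN('abc ')        AS LenResult,        -- 3: LEN does not count trailing blanks
       DATALENGTH('abc ') AS DataLengthResult  -- 4: the blank is still part of the value
```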

Q: What are the NULL block and varchar block in the row layout? The NULL block helps to store and identify columns that are set to NULL. In the past, SQL Server stored this as a single space in a varchar column and there was no real way to differentiate a real space (as a value) from a NULL value (stored as a space). When the storage engine was re-architected for SQL Server 7.0 the NULL block was added to store a NULL value specifically.

Q: How would the table structures differ between the 32-bit and 64-bit in SQL Server 2005 and SQL Server 2000? They don’t! There is no difference in the on-disk format between any of the versions of SQL Server. In fact, this was even true of the MIPS/Alpha/Win32 releases when SQL Server 6.0 released on all of those platforms.

Q: Do you have a script to calculate rows per page (using the formula you showed)? Ha! Seemed a bit painful I know… But that formula works. But, yes – it’s a heck of a lot easier to just do this yourself if you have a table already. In SQL Server 2000 you need to use DBCC SHOWCONTIG with TABLERESULTS and in SQL Server 2005, you use a new DMV called sys.dm_db_index_physical_stats. This takes 5 parameters and has some really nice new features to it.
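
As a rough sketch of the SQL Server 2005 approach, the query below derives average rows per page from the DMV's page and record counts. The table name dbo.Account is only an assumption (substitute your own table), and note that the DETAILED mode reads every page, so it can be expensive on a large table:

```sql
-- The five parameters are: database_id, object_id, index_id, partition_number, mode.
DECLARE @dbId int, @objId int
SET @dbId = DB_ID()
SET @objId = OBJECT_ID(N'dbo.Account')   -- hypothetical table name

SELECT index_id,
       index_level,                      -- 0 = leaf level
       page_count,
       record_count,
       avg_record_size_in_bytes,
       CASE WHEN page_count > 0
            THEN record_count * 1.0 / page_count
       END AS AvgRowsPerPage
FROM sys.dm_db_index_physical_stats(@dbId, @objId, NULL, NULL, 'DETAILED')
```

The record counts are not populated in the default LIMITED mode, which is why the sketch asks for DETAILED.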

Q: I might have missed this, but is the SQL Server Management Studio (SSMS) the replacement for Enterprise Manager? Yes, SSMS replaces the Enterprise Manager and the Query Analyzer. And, for a more scaled down query tool, you can use Express Manager (XM). To download XM, you can go here. If you want more details regarding the latest SQL Server CTP, you can go here.

Q: Is Management Studio backward compatible for use with SQL Server 2000? Yes! You can use SSMS to connect to SQL Server 2005 and SQL Server 2000.

Q: How do we write our queries for Insert/Update/Select on a database that we have no idea how it was setup - so we don’t experience the local issue that you talked about? Basically, the best way is for the developer to create an interface that masks the complexities of the database schema through stored procedures, functions and views.

Q: Do indexes point to the row level or the page level? There are two primary parts to an index – the leaf level and the non-leaf level. The leaf-level of a clustered index IS the data (a clustered index is an ordered table).

Q: What is the best practice or rule of thumb on using identity primary keys on tables in a database? And would that be all tables and/or some tables or not all? This is a great question. However, I might phrase it a bit differently because I *think* I know where you’re going with this… Generally, I recommend that large tables have a clustered index on an identity column. This can help to improve performance of inserts by not causing splits (inserts go to the end) and by helping non-clustered indexes look up the corresponding rows in the clustered table by using a clustering key which is unique, narrow and static. This is really quite a bit related to the session on indexing… I think you should definitely attend that session!

Q: How could a LOB end up in the leaf level? This is really another great question for the indexing session – and a bit on internals as well. In SQL Server 2005, rows can span pages (so you can have a row greater than 8060 bytes) and rows can have LOB types (even in the leaf level of an index). I’ll explain how this can happen with the new INCLUDE option for indexes – in the next session!
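
As a preview, here is a minimal sketch of how a LOB column can reach the leaf level of a nonclustered index in SQL Server 2005. The table, column and index names are hypothetical and chosen only for illustration:

```sql
-- Hypothetical table with a LOB column (nvarchar(max)).
CREATE TABLE dbo.SupportTicket(
TicketID     int            IDENTITY  NOT NULL CONSTRAINT SupportTicketPK PRIMARY KEY CLUSTERED,
CustomerID   int                      NOT NULL,
Notes        nvarchar(max)            NULL)
go
-- Notes cannot be an index key column, but INCLUDE adds it to the
-- leaf level of the nonclustered index so the index can cover queries on it.
CREATE NONCLUSTERED INDEX SupportTicketCustomerIX
ON dbo.SupportTicket (CustomerID)
INCLUDE (Notes)
go
```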

Q: Are you talking at the Vegas launch of 2005? Yes! You can see the complete list of my upcoming events here.

Comment: By the way, thanks for the great Web cast. I don't know of many other companies that do this sort of thing -- take questions directly from their customers.

Thanks! This is part of what I love about these webcasts… it allows me to hear what you’re interested in and it allows me to keep learning. To be honest, often I have to look things up and/or hit the KB and/or ask some of the other SQL geeks I know – in order to answer some of these questions. It really helps to keep me on my toes! In fact, this week I bothered two of my favorite people, my partner Patrick and my good friend Gert.

 

Thanks again for another GREAT week of questions!

kt

See you later today – Friday, Aug 26th

MSDN Webcast: A Primer to Proper SQL Server Development (Part 4 of 10): Best Practices in Indexing (Level 200) 

Two days isn't enough time!

In fact, that's really true. There are so many interesting (well, I think they're interesting) aspects to Indexing that we could in fact do a whole 10-part Series on Indexing and it's likely that we'd still have other things to look at and talk about. So, in preparation for part 4 of my MSDN Webcast Series, I thought I'd take a few minutes to pull together a number of related resources. If you watch/read them prior to the webcast then you'll be a step ahead. To minimize overlap, part 4 is a myriad of best practices with more of the "what's" and "how's" to implement the right indexes - but not quite as much of the theory and/or the "why's" as I've done quite a bit of that before. And even if you don't watch/read these resources BEFORE the session...you can use them to fill in the gaps and answer the "why's" AFTER the session!

So, let's talk about the session... I'm going to focus on SQL Server 2005 for all of my demos (and I'm going to try to be VERY demo heavy in this one) but quite a bit of the theory also applies to SQL Server 2000...

If you're interested in some of the "why's" and/or you want to hear the information for SQL Server 2000 and/or you just want to see what other folks have asked, here are your options:

Watch the MSDN Webcast titled: SQL Server 2000: Performance Tuning - Finding the Right Balance of Indexes which covers Index Creation Best Practices on SQL Server 2000
Watch the TechNet "It's Showtime" recording, recorded at Tech Ed Europe titled: Index Creation Best Practices with SQL Server 2005 which covers Index Creation Best Practices on SQL Server 2005 and really gets into a lot of the "why's"

Read many of the past blog entries - including A LOT of Q&As created from former webcasts by accessing my "Indexes" category on my blog entries here.

So, if you're not scared to spend quite a bit of time with indexes... I think there's lots of good stuff to review! At a minimum you should think about showing up tomorrow, I promise to give you Practical Indexing Strategies and a lot of demos to prove my point!

Effectively Designing a Scalable and Reliable Database
A Database Developer’s Primer to Proper SQL Server Development
Part 4: Best Practices in Indexing

Attend Part 4 LIVE on 26 August 2005 by registering at this link!

This text is STRAIGHT from SQL Server Magazine's September 2005 Issue: Reader's Choice Awards 

A First Look at SQL Server 2005 for Developers by Bob Beauchemin, Niels Berglund, and Dan Sullivan claimed best-book honors in a highly competitive Readers' Choice category. The book shows working developers how to take full advantage of SQL Server 2005's key innovations. Readers get practical explanations of SQL Server 2005's new data model, built-in .NET hosting, SQL-99 compliance, and more. Virtually every key concept is illuminated by sample code tested with Microsoft's public beta. Key coverage includes: SQL Server 2005 as .NET runtime host, security, reliability, performance, writing procedures, functions, and triggers in .NET languages, leveraging T-SQL enhancements, the XML data type and XML query languages, SQL Server 2005 as a Web Services platform, client-side coding, ADO and ADO.NET enhancements, SQLXML, mapping, ObjectSpaces, and using SQL Server 2005's built-in application server capabilities. 

Congratulations you guys... writing a book is hard enough. Writing a good one is quite an achievement! 


MSDN Webcast Q&A: A Primer to Proper SQL Server Development

Creating a Reliable and Automated Backup Strategy, Part 2 of 10

Presented by Kimberly L. Tripp, SQLskills.com

Q: Can I view a recording of this webcast? Here’s the specific replay link.

Q: Where can we get the demo scripts? The demo scripts are in this zip (20050812MSDNWebcast.zip (4.86 KB)), here in this blog entry. However, I will also create an entry on www.SQLskills.com under Past Event Resources for the entire webcast series.

Q: Does Kimberly L. Tripp have a blog? Yes, www.SQLskills.com/blogs/Kimberly and if you’re reading this on my blog…well, then you already know! :-)

Q: Where are the links to all prior Webcast Q&As from this series? For the MSDN Download for Part 1, click here. For the SQLskills Blog Entry for Part 1, click here.

Q: Is this Session 2 of the "Effectively Designing a Scalable and Reliable Database" webcast series? The Title bar here reads "MSDN Webcast: MSDN: A Primer to Proper SQL Server Development (Part...)". A bit misleading don't you think? So, I finally figured out the mystery here. When I originally submitted my proposal for this series I titled it like this:

Effectively Designing a Scalable and Reliable Database
A Database Developer’s Primer to Proper SQL Server Development (Level 200-300)

But somehow, through a few edits and since each module also has a more detailed title like the following, then I think the title just got edited down to fit. In fact, with subtitles like the following… I can’t even fit the entire title, subtitle and session title on a single slide. :-)

Part 1: Creating a Recoverable Database
Part 2: Creating a Reliable and Automated Backup Strategy
Part 3: Designing Tables that Scale – Best Practices in Data Types and Initial Table Structures

So, the mystery ends here… It’s all the same content and the general idea is that building a scalable and reliable database takes a variety of best practices that together make it happen; there is no “DBCC MAKEITGOFASTER(‘TRUE’)” option.

Q: WRT "best practices," what do you think of SQLBPA? It’s a great (and free :-)) tool that can help you to recognize some of the most commonly overlooked best practices. Part of the intent is also to help you determine if you’re violating certain practices that might impact your upgrade strategy. For example, if you’re accessing the system tables you will want to change that code to use information_schema views, stored procedures (sp_*) or system functions to gather the system information. If you’re looking solely for information about your upgrade path, the BPA has been updated/replaced with a new tool called the Upgrade Advisor.

Best Practices Analyzer Tool for Microsoft SQL Server 2000 1.0 

Upgrade advisor URL - Microsoft SQL Server 2005 upgrade Advisor - CTP June version

Q: In the SBS2003 environment the default backup does a live backup using a volume shadow snapshot of the MSDE and SQL databases on the server. How bad is this practice v. doing a SQL backup and backing up the backup and the logs? Well, there are really a couple of issues here…

First, the positives of controlling backups within SQL Server:
(1)   You can back up databases individually.
(2)   With certain database backup strategies (specifically, Database/Log strategies in the FULL Recovery Model), you can recover up to a specific point in time (P.I.T. Recovery) or to a marked transaction or even up-to-the-minute (if the transaction log is still accessible at the time of the disaster).
(3)   The process of backing up the transaction log helps to keep the transaction log small and manageable.

Second, the negatives:
(1)   You have to manage each database backup strategy individually – however, tools like the Database Maintenance Wizard can help to minimize this.
(2)   You’ll need to do backups more frequently if you want to take more advantage of these features – however, this would also give you better granularity on your backups AND result in less data loss because your snapshots probably aren’t very frequent.

Now – as for volume shadow snapshot – this is GREAT for a complete image of the system and it does work (especially well) when you have to replace the complete system or when you want access to specific files on the system. Where you can run into troubles (and I’m not saying that this is always going to be the case, one way or the other) is when a SQL Server database becomes damaged and you revert back to just a file (or even all of the database files) from a shadow copy image. In almost all cases (and I really can’t think of too many where this would work except in some read only cases), you won’t be successful *just* restoring an old image of just a single database file. So, the damaged database *might* end up being lost.

Instead, you could restore the entire database image from the shadow copy point in time and that *might* work; however, if the database structures have changed since you backed it up then the best choice would be to drop the database, copy over the files (the mdf and ldf) and then re-attach the database. That should work and all you’ll need to do from there is add any logins and/or changes since the last backup. Where things could be a lot worse is if one of your system databases becomes corrupt. This might limit your ability to recover. Now, having said all of this – I *know* that your SQL backup strategy WILL work and I know the complete set of options related to it. While it is a bit more administrative work, if your data is critical and minimizing data loss is *very* high on your list, I would use BOTH: Volume Shadow Copy for the system and file recovery strategies and SQL Server database and log backups for your databases.

OK, so I decided that I wanted to check things out a bit more…especially, since I am not all that well-versed on VSS; I decided to look up as much as I could find and well, I’m going to stop with this. First, pre-Windows 2003 sp1 it looks like even backing up a volume that includes databases that are in the FULL or BULK_LOGGED Recovery models generates an error. As of sp1 (or the hotfix associated with this KB), the error is no longer generated BUT I did find the following paragraph very useful:

This hotfix does not let Windows Backup perform a Volume Shadow Copy service backup of a SQL Server database with a recovery model that is set to Full or Bulk-Logged. You must use the SQL Server backup procedure to back up your SQL Server databases that have a recovery model that is set to Full or Bulk-Logged.

And – if you do a backup with NTBackup, it will log a message in your backup log of the following:

NtBackup does not support backing up SQL databases which are configured in non-simple recovery mode. If the database has been backed up, it is likely corrupted. Please do not restore this database. The recommended way of backing up non-simple SQL database is to use the backup solution that comes with SQL Server.

So, while I’d really like to dig into this deeper – I have to use my best judgement at this point and say that you should probably work to handle backups through SQL Server.

Q: What is the default recovery model in MSDE and SQL Express? I ended up demo’ing this in the presentation. I was sure about MSDE but I wanted to do a quick demo on SQL Express. The quick check is just to check the database properties of the model database… Another option, create a new database and see what the properties are. For both MSDE and SQL Express, the default database recovery model is SIMPLE.
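
If you'd rather check with a query than through the properties dialog, DATABASEPROPERTYEX reports the recovery model. This is a one-line sketch of the same "look at model" check:

```sql
-- New databases are created from model, so its recovery model is the default.
SELECT DATABASEPROPERTYEX('model', 'Recovery') AS ModelRecoveryModel
```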

Q: How would you compare the native backup to products like LiteSpeed from Imceda (which is used internally at Microsoft - I'll avoid the typical dog food comments) and SQL Backup from the chaps at Red-Gate? Third-party products can offer significant additional benefits over just the native backup types – for example, encryption and compression. This can result in improved security, faster backup times and smaller backup files.  

For more details about SQL Lite Speed, please go to: http://www.imceda.com/
For more details about SQL Backup, please go to: http://www.red-gate.com/

However, I would also like to make a comment about the fact that Lite Speed is used internally at Microsoft. Backup products are an interesting special case in terms of how they work with regard to SQL Server. Most backup products use the VDI interface (the Virtual Device Interface provided for SQL Server) in order to get access to essentially the same pages that SQL Server Backup does… So, in fact, using a third-party product like SQL Lite Speed, is actually still testing the VDI interface and still working SQL Server pretty well.

Q: Can you use third-party backup with log shipping? It depends on whether or not they support it and/or have automated software to help you configure it. Many products do offer ways to help automate log shipping with compressed versions of your backup files… See earlier links for references to other products.

Q: How do you fix a log file that won't truncate? The best way to fix a transaction log that won’t clear:
(1)   Terminate all connections to the database (best to set the database to SINGLE_USER mode).
(2)   Back up the transaction log with a normal BACKUP LOG command.
(3)   Shrink the transaction log to the smallest size possible (DBCC SHRINKFILE with TRUNCATEONLY).
(4)   Alter the database to increase the transaction log to a more appropriate size with ALTER DATABASE.
(5)   Return the database to MULTI_USER mode.

These steps are exactly the same as the steps necessary to remove fragmentation in VLFs. The file that can help you to get more familiar with this process is located in the scripts associated with this session and is named: VLF Optimization - Northwind Walkthrough.sql.
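
The steps above can be sketched roughly as follows. This is not the walkthrough script itself: the backup path, the 100MB target size and the WITH ROLLBACK IMMEDIATE option are my assumptions, and Northwind_log is assumed to be the logical name of the log file (check sysfiles / sys.database_files for yours). It also assumes the database is in the FULL or BULK_LOGGED recovery model with an existing full backup:

```sql
USE Northwind
go
-- (1) Take the database to single-user mode from this connection.
--     ROLLBACK IMMEDIATE (an assumption) rolls back other sessions' open work.
ALTER DATABASE Northwind SET SINGLE_USER WITH ROLLBACK IMMEDIATE
go
-- (2) Normal log backup; the path is hypothetical.
BACKUP LOG Northwind TO DISK = N'C:\Backups\Northwind_log.trn'
go
-- (3) Shrink the log file as far as it will go.
DBCC SHRINKFILE (Northwind_log, TRUNCATEONLY)
go
-- (4) Grow the log to a more appropriate size in one step (size is an assumption).
ALTER DATABASE Northwind MODIFY FILE (NAME = Northwind_log, SIZE = 100MB)
go
-- (5) Back to normal operation.
ALTER DATABASE Northwind SET MULTI_USER
go
```

If the log does not shrink on the first pass, the active portion may be at the end of the file; repeating the log backup and shrink usually gets past that.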

Q: If you do the steps recommended to clean up the fragmented VLFs, does it break the continuity of the transaction log backup chain? No! You can backup the transaction log before, as part of (step 2) and after you clear up the fragmentation – and no other backups are necessary. It is a complete chain that can be used in recovery.

Q: The transaction log portion of the tempdb seems to be growing and growing- even with the simple recovery model. In fact, it only seems to release space when CHECKPOINT is run manually? Well, I have to admit that this is a bit strange and should not otherwise be the case. I would suggest seeing if you have any open transactions in the database (DBCC OPENTRAN) but that shouldn’t make a difference if it DOES clear when you execute CHECKPOINT. So, I have to admit that this is a bit bizarre. I would try to do some profiling and see if you can find any consistencies with the types of commands that are running at the time when it doesn’t seem to clear. I would also wonder if this is ALL the time or just occasionally. You may want to contact PSS for more assistance with this.

Q: TRUNCATE TABLE does not benefit from BULK_LOGGED? When I say that something benefits from the change to the BULK_LOGGED recovery model, it means that the change must offer a performance gain or otherwise. In the case of TRUNCATE TABLE, it is always executed in a manner which is logged – efficiently. So, there is no gain in switching; it is always an optimal command to execute.

Q: We have a habit of switching to "bulk-logged" mode when we are doing index -rebuilding, etc. which seem to benefit from this setting. Then, when we are done, we "switch" back to full mode. Does this affect our ability to recover later on using the logs that we are creating because of this switching back and forth? and Q: If I switch between the FULL and BULK_LOGGED recovery model, do I need to take full database backup after the switch to BULK_LOGGED? No! In fact, it’s recommended as long as you’re aware of the options that you lose when you are running in the BULK_LOGGED recovery model. However, the transaction log continuity is not broken – given the fact that you can successfully back up the log after the bulk operation completes. To minimize the time of which you are vulnerable, make sure to backup your log right before the switch (to BULK_LOGGED) as well as right after the switch (to FULL).  The file that can help you to get more familiar with this process is located in the scripts associated with this session and is named: DBAlterForBatchOperation.sql.
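
The pattern described above looks roughly like this. It is a sketch and not the contents of DBAlterForBatchOperation.sql; the database name and backup paths are hypothetical:

```sql
-- Log backup right before the switch, to minimize the window of exposure.
BACKUP LOG SalesDB TO DISK = N'C:\Backups\SalesDB_log_before.trn'
go
ALTER DATABASE SalesDB SET RECOVERY BULK_LOGGED
go
-- ... run the batch operation here (for example, the index rebuilds) ...
go
ALTER DATABASE SalesDB SET RECOVERY FULL
go
-- Log backup right after the switch back; the log chain stays intact.
BACKUP LOG SalesDB TO DISK = N'C:\Backups\SalesDB_log_after.trn'
go
```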

Q: Is it advisable to use transactions for each and every SQL statement? Even if they are a single, simple, one-command transaction? If not, how can we use the marked transaction? Well, the answer to the first part is: NO, it is not necessary and it is not advisable to wrap every command in a BEGIN TRAN/COMMIT TRAN block. However, as per the second part – you cannot mark a transaction without using the following complete syntax:

BEGIN TRANSACTION TransactionName WITH MARK 'description of mark'
SQL_Statements
COMMIT TRAN

The file that can help you to get more familiar with this process is located in the scripts associated with this session and is named: Restore Log and Marked Transactions.sql.
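
For a concrete (and purely illustrative) instance of that syntax, here is a sketch against Northwind's Products table; the transaction name, mark description and the update itself are made up for the example:

```sql
USE Northwind
go
-- The transaction name (PriceUpdate) is what you later reference when restoring to the mark.
BEGIN TRANSACTION PriceUpdate WITH MARK 'Before the 5 percent price update'
    UPDATE dbo.Products
    SET UnitPrice = UnitPrice * 1.05
    WHERE CategoryID = 1
COMMIT TRANSACTION PriceUpdate
go
```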

Q: Do you dynamically create Transaction MARKs to make the mark names unique? I would assume a restore would stop at the first mark it came across with the specified name? You can certainly use dynamic string execution to build your transaction mark names dynamically. However, you can also use a mark with a datetime parameter. From the BOL: The STOPATMARK and STOPBEFOREMARK options both support an AFTER datetime clause. When used with datetime, mark names need not be unique. In a RESTORE DATABASE statement, datetime is required to recover to a marked transaction. In a RESTORE LOG, however, datetime is optional. If AFTER datetime is omitted, roll forward stops at the first mark with the specified name. If AFTER datetime is specified, roll forward stops at the first mark having the specified name exactly at or after datetime.
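
A sketch of what the restore side might look like, assuming a hypothetical database (SalesDB), hypothetical backup files and a mark named PriceUpdate; the datetime value is likewise only an example:

```sql
-- Restore the full backup without recovering, so log backups can be applied.
RESTORE DATABASE SalesDB
FROM DISK = N'C:\Backups\SalesDB_full.bak'
WITH NORECOVERY
go
-- Roll forward and stop at the first mark named PriceUpdate at or after the given time.
-- Use STOPBEFOREMARK instead to stop just before the marked transaction.
RESTORE LOG SalesDB
FROM DISK = N'C:\Backups\SalesDB_log.trn'
WITH STOPATMARK = 'PriceUpdate' AFTER '2005-08-12T02:00:00', RECOVERY
go
```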

Q: When configuring a database maintenance plan through the Database Maintenance Plan Wizard, there is no possibility to switch between recovery models… Correct. However, you can modify the jobs created by the Database Maintenance Plans to include changes to the recovery model. BUT – having said that, I’m not sure why you’d want to change it during the actual backups… Instead, you should consider changing the recovery model as part of your batch process that would benefit from the change.

Q: How do you clear the log in SQL Server 2000 through the Enterprise Manager? The only way to clear the transaction log through the SQL Server Enterprise Manager is to do a regular transaction log backup. The way you can do this is to right-click on databases, tasks, backup - then choose a transaction log backup. The default behavior is to clear the inactive portion when the backup completes. There is no way to just “clear the transaction log” as that would break the continuity of the log. You could execute a command to clear the log but if you find that you’re manually clearing the log and NOT keeping transaction log backups, I would recommend setting the recovery model to SIMPLE so that the transaction log is cleared automatically.

Q: Can you explain when you need to recover .mdf and .ldf files? Recovering directly from the .mdf and .ldf files (through sp_attach_db) is really only guaranteed when the files were closed properly – either through sp_detach_db or because the files were copied while the server was shut down.

Q: How do you recover if the master database becomes corrupt? Is the only option uninstalling and reinstalling SQL Server? No, SQL Server 2000 includes a utility to rebuild the master database called rebuildm. SQL Server 2005 uses setup – but with special parameters that only rebuild the system databases as opposed to doing a full installation.

Q: Any BP on restores on replicated database i.e. distribution matching database? Transaction Log marking can help to ensure that multiple servers are all restored and rolled forward to the same point in time. Additionally, there are options related to whether or not you KEEP_REPLICATION (this is for the publisher). However, outside of those couple of options, you should review the Books Online for best practices in this configuration.

Q: Is it possible to backup database in the standby mode? No. But, I wasn’t completely sure during the webcast so I decided to try it. On the backup, I received: Database 'test' is in warm-standby state (set by executing RESTORE WITH STANDBY) and cannot be backed up until the entire load sequence is completed. BACKUP DATABASE is terminating abnormally.

Q: With log shipping and 2000, and the destination database is exactly the same as the source, how about backing up the destination rather than the source? What happens to the shipped logs during the backup? You can’t backup a database that is in the NORECOVERY or the STANDBY recovery completion states. As a result, you can only backup the source, not the destination.

Q: What are my risks if I use Simple Recovery and what are the advantages of it? Risks are mostly in the potential for work-loss exposure and data loss in the event of a disk failure for your database. The advantages are in simplicity and ease of management because you don’t need to back up the transaction log. However, through simple automation procedures you can automate the backups of transaction logs while getting the most options for recovery. If you want more details, you should review the chapter that’s available for download on SQLskills. This chapter covers the differences in the Recovery Models as well as their pros and cons in management and recovery! Here’s a direct link: www.sqlskills.com/resources/SQLServerHAChapter9.pdf

Q: How come you have both the SQL Server 2005 and Express on the same machine? Is it because you don't have Visual Studio 2005 installed? The documentation of VS.Net 2005 states to uncheck the SQL Server 2005 Express if also installing SQL Server 2005 CTP. Part of the reason that Visual Studio doesn’t want you to install Express if you already have the CTP of SQL Server is that they may not be the same versions. However, if you have the appropriately matched builds of SQL Server and Visual Studio, you can install multiple versions of SQL Server on the same machine, along with Visual Studio Whidbey. Having said that – of what’s available today, the correct combination would be SQL Server CTP June (IDW15) with Visual Studio Beta II. On my machine, I’m actually running SQL Server 2000 (as a default instance), two Developer editions and one Express edition. AFTER SQL Server was successfully installed, I installed the appropriately matched Visual Studio installation. There won’t be any need for this once the products release in November but while they’re both being developed – and while there are “later” releases that leapfrog over each other, it’s a bit confusing.

Q: SQL 2000 Maintenance Plans for transaction logs (say, all user databases) indicate job failure if just one user db is in the SIMPLE recovery model. Any alternatives short of creating our own backup script to avoid this SQL Maintenance backup job failure? Yes, this is because the sqlmaint utility (sqlmaint.exe) does not verify the recovery model.

Q: Is this a SQL 2005 behavior too? Database Maintenance Plan Wizard has been changed significantly in SQL Server 2005 and it’s now a lot easier to create more interesting/more complex plans on one or more databases – I would work harder to create a class of jobs for your SIMPLE recovery model databases (just full backups and maybe differentials) and a different class of jobs for your FULL/BULK_LOGGED recovery model databases.
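
If you do end up writing your own script, a minimal sketch of the "two classes of databases" idea might look like the following. It assumes a backup folder D:\Backups (hypothetical) and uses master.dbo.sysdatabases, which exists on SQL Server 2000 and remains available as a compatibility view on SQL Server 2005:

```sql
-- Back up the log only for online user databases that are NOT in SIMPLE,
-- so one SIMPLE database cannot fail the whole job.
-- Hypothetical assumption: backups go to D:\Backups.
DECLARE @db sysname, @sql nvarchar(4000), @stamp varchar(20);

SET @stamp = CONVERT(char(8), GETDATE(), 112)
           + REPLACE(CONVERT(char(8), GETDATE(), 108), ':', '');

DECLARE db_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name
    FROM master.dbo.sysdatabases
    WHERE name NOT IN ('master', 'model', 'msdb', 'tempdb')
      AND CONVERT(varchar(20), DATABASEPROPERTYEX(name, 'Status')) = 'ONLINE'
      AND CONVERT(varchar(20), DATABASEPROPERTYEX(name, 'Recovery')) <> 'SIMPLE';

OPEN db_cursor;
FETCH NEXT FROM db_cursor INTO @db;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'BACKUP LOG ' + QUOTENAME(@db)
             + N' TO DISK = N''D:\Backups\' + @db + N'_log_' + @stamp + N'.trn'';';
    EXEC (@sql);
    FETCH NEXT FROM db_cursor INTO @db;
END

CLOSE db_cursor;
DEALLOCATE db_cursor;
```

The SIMPLE recovery model databases would then get their own job containing only full (and maybe differential) backups.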

Q: Does the execution of a transaction log backup impact simultaneous database usage? If the transaction log is well optimized, the cost of a transaction log backup should be relatively low. Might be a good time to make sure that the transaction log is optimized! Go figure, I wrote another blog entry to help you optimize the transaction log. Check out 8 Steps to Better Transaction Log Throughput.

Q: Why does the log backup restore slower than restoring a full database backup? In general, restoring a transaction log is a more intense operation. When restored, the transaction log must be redone (“redo”) and then undone (“undo”). This process is what applies the changes to get you up to what the database looked like when the transaction log backup was executed. And, while each log row’s redo operation is generally much faster than the original execution (they do not have to do any calculations, etc. as the FINAL version of the changes for the row is what’s in the log), there might be a lot of them!

Q: Do I have to backup the ReportServerTempDB? I’m not a reporting services expert but my understanding is that yes, you do need to backup a ReportServerTempDB. The best thing to read is from the Books Online: Backup and Restore Operations for a Reporting Services Installation.

Q: Will the 2005 Backup wizard allow scripting of the 'scheduling'? The backup command itself will be in the scheduled job but the statement that is used to create the job can only be scripted by scripting the job – not the backup command.

Q: Did you talk about partial database availability? Yes! SQL Server 2000 does not offer this new feature, but SQL Server 2005 does. Partial database availability is a new feature that allows a damaged database to stay online – even while secondary data files are damaged and unavailable.

Q: What is an optional secondary file? An optional secondary file is a data file that is NOT the mdf. Secondary data files are created in larger databases and store user-defined data. They might be used to store different types of data and/or even possibly part of a large table (partitioning). Imagine having 100s of millions of rows – because you have three years of historical data – if year 2003 is in a secondary data file and becomes damaged, it will not require the database to be taken offline.

Q: I have not seen Management Studio. Is this new with SQL Server 2005? Yes, SQL Server Management Studio (SSMS) replaces both Query Analyzer and SQL Server Enterprise Manager – in a more robust UI that also includes Solution/Project control and Source Integration. Check out the online labs to get some hands-on lab time in the Virtual Lab environment here.

Q: A FULL BACKUP has the option "remove inactive entries from transaction log" --- why doesn't it seem to do this? It would seem from this option that nightly full database backups would keep the log size small? This is a SQL Server Enterprise Manager bug. It should NOT even be an option on that tab. It should be grayed out. If you click around to differential and then back to full on the General Tab, before you go to the Options tab, then you will no longer see that option available. As far as this being something desired – not really… The reason why clearing the log is NOT performed after a full backup (or as part of a full backup) is so that if the full database backup becomes corrupt in some way, you can fall back on your transaction log sequence to successfully restore! Basically, this ensures that nothing ever breaks the continuity of the transaction log!

Q: You recommend Trace Flag 3231, what does it do? Trace Flag 3231 is an undocumented Trace Flag that disables the ability to clear the transaction log in a database that is running in the FULL or BULK_LOGGED recovery models. This significantly reduces the ability for someone to break the continuity of the transaction log… ah, so that would probably be the next question – what else could possibly break the continuity of the transaction log? Changes to the SIMPLE recovery model. It is highly recommended that ONLY changes between FULL and BULK_LOGGED be performed in databases where transaction log backups are relied upon for recovery.

Q: How can we test our backup strategy? How do we test our differential backup content? When we test the differential backup content, do we need to restore the full backup content at the same time? The best test of your backup strategy is a complete restore sequence to a secondary server – which you then follow up by testing your application. There *are* other dependencies outside of the database that could cause your application to fail. However, the restore to the secondary server will ensure that at least your data is accessible. Below are some KB articles that you can review for more details on some of these additional dependencies. To access them, go to http://support.microsoft.com.

  • Q240872 – INF: How to Resolve Permission Issues When a Database is Moved Between SQL Servers
  • Q246133 – INF: How To Transfer Logins and Passwords Between SQL Servers
  • Q307775 – INF: Disaster Recovery Articles for Microsoft SQL Server
  • Q224071 – INF: Moving SQL Server Databases to a New Location with Detach/Attach

Q: I have a database server hosted by a 3rd party on the web ... is there a secure, automated way for me to make remote copies of the Database Backup? You'll need to talk to your ISP to see if they support that. More than anything it depends on the interface they give you, the services you've paid for and whether or not you have secure channels over which you can communicate with your ISP.

Q: Is there a way for you to answer these questions – that is – release the question from the question manager so that another question can be asked? I had a question in the "queue" for most of the webcast and because of that, I could not ask additional questions which I had wanted to ask... I ended up deleting my original question just so the question window would allow me to ask the next question... This is a LiveMeeting issue. I’m trying to put just a quick “a” in the response during the webcast but really, it’s an issue with the LiveMeeting Query Manager UI. I REALLY wish that questions just went into the queue and didn’t hold your window as well… I’ll see if we can pass your suggestion on to the LiveMeeting folks!

Thanks again for another GREAT week of questions. See you in the next session…which because this is posted so late, is session 4 - on Friday, August 26th. You can register for this session here: MSDN Webcast: A Primer to Proper SQL Server Development (Part 4 of 10): Best Practices in Indexing (Level 200)

Thanks for reading!

kt

MSDN Webcast Q&A: A Primer to Proper SQL Server Development

Creating a Recoverable Database, Part 1 of 10

Presented by Kimberly L. Tripp, SQLskills.com

Q: Can I view a recording of this webcast? Here’s the specific Replay Link: http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032278585&Culture=en-US

Q: Where can we get the demo scripts? The demo scripts are in this zip (20050805 MSDN Webcast ScalableSys01.zip (5.47 KB)), here in this blog entry. However, I will also create an entry on www.SQLskills.com under Past Event Resources for the entire webcast series.

 

Q: Does Kimberly L. Tripp have a blog? Yes, www.SQLskills.com/blogs/Kimberly and if you’re reading this on my blog…well, then you already know! :)

Q: How about running on a VMware Virtual Machine? Sure.  

Q: Is the placement of data and log important when using a disk array or SAN? It’s important to make sure that you don’t have bottlenecks at the physical disk. You should see if your SAN software supports monitoring at the disk level and if so, consider reconfiguring if bottlenecks occur. It’s harder to say that a SAN will have the same problems that you would have with direct attached storage since SANs often place larger logical disks on more than one physical disk (by using 9, 18 or 36 GB chunks from a variety of physical drives). The main point is that SANs offer a lot of caching and other optimizations, so you don’t often have to worry as much. However, I would strongly suggest getting to know your SAN config as well as working with your SAN vendor to really get an understanding of how the resources are allocated as well as used. Also, make sure you’re current with all BIOS, firmware, drivers, etc.

Q: Where will be the demo code available? What version of SQL Server will you be using? All demos were done on SQL Server 2005; however, for this (and even for many others), a lot of the demo code works on both SQL Server 2000 as well as SQL Server 2005. At a minimum, many of the concepts apply and where a feature is new or only supported on one version, it will be pointed out. For example, fast file initialization is on SQL Server 2005 only.

Q: Does SQL Server support log mirroring (i.e. dual logging)? No. A very old version did (I think 4.2 on OS/2 did) but the key problem was performance. You are a lot better off letting the hardware handle the mirroring rather than having software talk to the OS, which then talks to the hardware.

Q: Do Secondary files allow partial backups? Not really sure I follow this question but…if you’re asking whether or not you can perform a backup of just a part of the database – yes! In fact, any file can be backed up independently of the filegroup of which it’s a member (there was a restriction in 7.0 that didn’t allow this but as of SQL Server 2000, files can be backed up at any time). If you’re really interested in file and filegroup backups, I wrote a couple of articles for SQL Server Magazine and you can find the complete list here: http://www.sqlskills.com/articles.asp

Q: What is a page? Generally speaking, a page is a unit of storage. Specifically, with regard to SQL Server, a page is 8K in size and is the smallest unit of I/O in terms of data. When a table is stored on disk, it is stored in 8K chunks (and most of the time SQL Server allocates 8 – 8K chunks to objects). A 64KB block of the database is called an extent. SQL Server allocates extents once an object reaches a minimum size (which is also 64KB) to try to keep an object more contiguous.

Q: Is caching user specific or server specific. If two users are accessing same table, does it create two separate pages in cache or just one? Well, data cache is not user specific (at least not with regard to your question). However, there are a few things that would be local in scope – like a user-defined temporary table… However, a database table being accessed by many users, would only have one set of pages in cache.

Q: Why would a user of the database do a checkpoint? Users cannot perform checkpoints; only database owners and administrators can force a checkpoint. Generally speaking, it is not often that a checkpoint needs to be forced. SQL Server controls and handles the checkpoint automatically.

Q: Wouldn't the engine know the best time to do a checkpoint? Yep! And it does. The default setting for the recovery interval (the server option that drives automatic checkpoints) is “0” minutes. Meaning – SQL Server decides!
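
If you want to see that setting for yourself, a small sketch follows. The recovery interval is an advanced option, so it only shows up once advanced options are made visible; nothing here changes the value:

```sql
-- Make advanced options visible (this only affects what sp_configure shows).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Display the current recovery interval; config_value and run_value are 0
-- by default, which means SQL Server decides when to checkpoint.
EXEC sp_configure 'recovery interval';

-- For the rare case where a checkpoint does need to be forced in the
-- current database (requires appropriate permissions):
CHECKPOINT;
```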

Q: How does page caching function during the 'redo' process? Same as it would normally. Data being “redone” is loaded into cache and changed – based on what’s held in the transaction log. (Keep reading, more in the next question/answer.)

Q: So does the transaction log contain all the information about say an update… i.e. what columns, what data? Yes, for the most part what is contained in the transaction log is the “after” version of the modifications so that log rows do not need to execute any functions or other code. The idea is that log rows can be processed extremely quickly – but have sufficient information to make sure that the data is modified properly. (And keep reading, more in the next question/answer.)

Q: When you say transaction is re-done, what exactly happens? Does SQL Server automatically take values again from cache and copy to the disk? What if we lose the cache also during the process? Then redo will begin again when the system restarts… Generally, the process is – go to the log, redo, undo, checkpoint – and the information is not deemed inactive until after it checkpoints. So, even if you had the equivalent to “truncate log on checkpoint” this information would not be lost in the midst of restart recovery (redo) because it’s still active until it’s checkpointed!

Q: Is Roll back notification done asynchronously or is another client request required, e.g. long running client with long SQL call intervals… I’m not sure if I follow this exactly BUT I think you’re wondering how you would know that your transaction has been rolled back? The key way that many applications know – is that they never got actual confirmation and maybe they’ve lost their connection and just timed out. But, if this doesn’t answer your question…send me mail!

Q: Can you pre-grow a database at non-peak times, or grow the database when space starts getting low? Sure, there are a few options really. One way: create a SQL Agent Job that checks space allocation at 2am and, if it’s getting close to full, executes a manual increase in space. With a bit of dynamic string execution I think you could get this to be a very flexible and easily automated process!
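
Here is one way the body of such a job step could be sketched. The database name (SalesDB), the 80% threshold and the 1 GB growth increment are arbitrary assumptions to adjust for your environment; dbo.sysfiles is used because it works on SQL Server 2000 and is still available as a compatibility view on SQL Server 2005:

```sql
-- Hypothetical job step: grow the primary data file of SalesDB by 1 GB
-- when it is more than 80% full. Schedule it for a non-peak time.
USE SalesDB;

DECLARE @LogicalName sysname,
        @SizeMB      int,
        @UsedMB      int,
        @NewSizeMB   int,
        @sql         nvarchar(4000);

-- size and SpaceUsed are reported in 8 KB pages; 128 pages = 1 MB.
SELECT @LogicalName = RTRIM(name),
       @SizeMB      = size / 128,
       @UsedMB      = FILEPROPERTY(name, 'SpaceUsed') / 128
FROM dbo.sysfiles
WHERE fileid = 1;   -- the primary data file

IF @UsedMB > @SizeMB * 0.8
BEGIN
    SET @NewSizeMB = @SizeMB + 1024;

    -- Dynamic string execution, because ALTER DATABASE does not accept
    -- variables for the file name or the size.
    SET @sql = N'ALTER DATABASE ' + QUOTENAME(DB_NAME())
             + N' MODIFY FILE (NAME = ' + QUOTENAME(@LogicalName, '''')
             + N', SIZE = ' + CAST(@NewSizeMB AS nvarchar(20)) + N'MB);';
    EXEC (@sql);
END
```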

Q: Best drive configuration for data, logs, indexes? Well, typically, I like to separate data by access pattern and type rather than data from indexes. Typically, I like RO v. RW v. a single large table v. LOB Data.

Q: I have a 24/7 SQL Server production environment with a database that's well over 300 GB - when do I defrag it? You really have a couple of options to defrag a table – truly defrag (and only defrag it) or rebuild it (which does a lot more than just defrag). To make the answer even more clear – defrag often as it doesn’t take the table offline (in 2000). Rebuild the table when/if you can afford downtime against that table (in SQL Server 2000, a rebuild requires that the table be either read-only [when rebuilding a non-clustered index] or completely inaccessible [when rebuilding the clustered index]). So, now this gets a bit harder to answer! Check out the webcast on SQL Server 2000 – Index Fragmentation Best Practices here.
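
For reference, the two options map to different commands in SQL Server 2000. The following is only a sketch with hypothetical names (database SalesDB, table dbo.Orders, index IX_Orders_CustomerID) and an arbitrary fill factor:

```sql
-- Hypothetical names: SalesDB, dbo.Orders, IX_Orders_CustomerID.
USE SalesDB;

-- Check fragmentation first (one row per index).
DBCC SHOWCONTIG ('dbo.Orders') WITH TABLERESULTS, ALL_INDEXES;

-- Option 1: defrag only. This is an online operation, so the table
-- stays available while it runs.
DBCC INDEXDEFRAG (SalesDB, 'dbo.Orders', 'IX_Orders_CustomerID');

-- Option 2: rebuild. This does more than defrag but, in SQL Server 2000,
-- restricts access to the table while it runs, so save it for a window
-- where downtime against that table is acceptable.
DBCC DBREINDEX ('dbo.Orders', 'IX_Orders_CustomerID', 90);
```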

Q: If you have multiple data files, can you merge back to a single data file? Only when the files are members of the same filegroup. If/when you want to do this you have two steps. First, run:

DBCC SHRINKFILE(file_to_remove_logical_name, EMPTYFILE)

This will empty the contents of the file into the other files within the same filegroup. Once emptied, use:

ALTER DATABASE dbname
REMOVE FILE file_to_remove_logical_name

Q: What is zero initialization? It’s where the entire contents of the file are zeroed out. This is done for security reasons.

Q: Doesn’t truncateonly make us lose continuity between transaction backups? First – this was with regard to my recommendation to use DBCC SHRINKFILE with the TRUNCATEONLY option NOT using with TRUNCATE_ONLY on a transaction log backup… So, specifically, NO. DBCC SHRINKFILE with the TRUNCATEONLY option does not break the continuity of the transaction log. Now – just to add a bit of irony here… BACKUP LOG with TRUNCATE_ONLY no longer breaks the continuity of the transaction log in SQL Server 2005. In fact, BOTH the TRUNCATE_ONLY and the NO_LOG options have been changed to ONLY perform a CHECKPOINT. In a database running in the FULL or BULK_LOGGED Recovery Model, this will have NO real impact on the transaction log. In a database running in the SIMPLE Recovery Model, this will execute a checkpoint and the database setting of simple truncates the inactive portion of the transaction log when a checkpoint occurs.
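
Since the two options are so easy to confuse, here is a side-by-side sketch. The names SalesDB and SalesDB_log (the logical name of the log file) are hypothetical:

```sql
-- Hypothetical names: database SalesDB, log file logical name SalesDB_log.
USE SalesDB;

-- This is the one I was recommending: it releases unused space at the END
-- of the file back to the operating system. It does not discard any log
-- records, so the continuity of the transaction log is untouched.
DBCC SHRINKFILE (SalesDB_log, TRUNCATEONLY);

-- This is the one people worry about. In SQL Server 2000 it discards the
-- inactive log WITHOUT backing it up, which breaks the log backup chain.
-- Shown commented out on purpose.
-- BACKUP LOG SalesDB WITH TRUNCATE_ONLY;
```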

Q: I've read the BOLs about faster performance when putting files on raw partitions? Hmm… I’d love to see the reference. They might have said that raw partitions may offer performance benefits but I would generally doubt it. More importantly, you’d lose other key features if you didn’t use NTFS – like Database Snapshots. So, my main point – don’t use raw partitions! Even if they did offer a performance gain, what you’d lose isn’t generally worth it AND I can get better gains elsewhere (indexing, optimizing procedural code, etc.).

Q: Could you please tell us quickly why raid 1+0 is better? Basically, RAID 1+0 offers better availability than RAID 0+1 because it can tolerate the loss of more than one drive. If a drive in a RAID 1 array is lost, all other drives still function. If a drive in a RAID 0 array is lost, all other drives in the RAID 0 array stop functioning. Here’s a good link to review more about different RAID configurations: http://www.raidarray.com/04_00.html

Q: I recently came across a 200meg db with a 50GB log. We tried everything to truncate it but finally could only fix it by changing to a SIMPLE recovery model. What is the most likely culprit for such an outrageous log growth? Well, I wish I could say I hadn’t seen this… But, it’s due to autogrowth and it’s due to running in the full recovery model without performing transaction log backups (but you are performing database backups). Now, the reason why you had so much trouble with it – was because it was horribly fragmented (because of the autogrowth defaults).  So, the main point here – if you follow the steps of the VLF Optimization content from the session, you would have also solved the problem. I’d at least check your current fragmentation and make sure that everything is fixed!
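
A few commands can help confirm this diagnosis before you change anything. This is only a sketch; SalesDB and the backup path are hypothetical, and DBCC LOGINFO is an undocumented command, so treat its output format as subject to change:

```sql
-- Hypothetical names: SalesDB, D:\Backups.

-- How big is each log, and how much of it is actually in use?
DBCC SQLPERF (LOGSPACE);

USE SalesDB;

-- Undocumented: returns one row per virtual log file (VLF). Thousands of
-- rows indicate a log that grew through many small autogrowths.
DBCC LOGINFO;

-- Is an old open transaction keeping the log active?
DBCC OPENTRAN;

-- In the FULL recovery model, it is a log backup (not a database backup)
-- that allows the inactive portion of the log to be cleared.
BACKUP LOG SalesDB
    TO DISK = N'D:\Backups\SalesDB_log.trn';
```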

Q: What is a "long" transaction (seconds, minutes, or hours?) There is no specific time, but the longer a transaction runs – especially relative to the interval between transaction log backups – the more the log is likely to grow and the less control you have over its size.

Q: Will *you* be the one doing parts 2-10? Yep… I’m a glutton for punishment. Just kidding! To be honest, I really enjoy these webcasts!!! See you in the next one.

Q: Should SQL Server developers invest the time to learn Microsoft's Enterprise Library (the Application Blocks thing)? OK, I checked around a bit on this and the general consensus is that developers should be aware of all of the resources that exist. It’s by no means an absolute but there are quite a few *very* useful resources there. Review things lightly to see if anything seems useful to your application and then dig deeper if/when they do.

Q: Should the SQL Server Disks be defragged at the os layer with disk tools to defrag? Sure. It doesn’t hurt and can help. The only negative is that you must shutdown SQL Server in order to do so. The good news, once defragged you don’t really need to do it again (unless you have a lot of autogrowth and other files on the disks).

Q: I had a lot of problems convincing my clients to create a new instance of SQL in their servers. The objective was to isolate our systems from theirs. What are the pros and cons of creating an additional instance against adding more DBs to an existing one? Well, you’re correct that it’s mostly an isolation issue. A big concern in some SQL Server environments is the access to metadata that everyone has. In SQL Server 2000, logins can see that other databases and other logins exist – even if they can’t use them. This presents a security concern. In SQL Server 2005, metadata is restricted so that this doesn’t happen.

Finally, here is the link to the Part 2 Registration page.

See you in Part 2,

kt

Wow, I've been horribly bad at blogging these days and I owe you a few entries as well... But, good news, I have a new MSDN Webcast series that will help to answer a lot of best practices in database design strategies that can help you achieve better scalability, availability and reliability. All of my demos are on SQL Server 2005 but many of the concepts apply to both SQL Server 2000 as well as 2005.

The sessions are all targeted at best practices and as such we went with a base of 200 level. Knowing me I can't go through an entire session without some internals and/or technical depth so you should expect 200-300 overall. I'm very excited about this series because everything counts! What I mean is that each and every best practice you implement helps to bring about a more scalable, available and reliable solution - there are no magic bullets only great overall design can achieve these targets!

So, what are we going to talk about in 10 - 90 minute presentations..... a lot!

Here's a link to the first one and as we get rolling, I'll blog a lot more about additional resources, best practices and of course, I'll blog the webcast Q&A as I've done in the past.

Enjoy!

Today, SQLskills proudly announces an exciting addition to our company... Bob Beauchemin - a database developer and SQL Server 2005 expert in the developer space - joins SQLskills as the Director of Developer Skills. Bob brings a tremendous amount of expertise with him, having worked with computers since 1977 and with a variety of databases during this time. In the time that I have known Bob, I have been impressed with both his technical expertise as well as his ability to convey complex topics with clarity. We've got quite a few things in the works and we'll keep you posted of the changes on our blogs and our SQLskills website.

In the interim, make sure to update all of your references for Bob! If you've subscribed to his blog in the past, update your links to his new blog site:

If you want to read a bit about his past, present and future, check out our AboutBob page here: http://www.sqlskills.com/aboutbob.asp

And, if you're interested in hearing the PodCast he recorded with SQL Down Under host Greg Low, check out: http://www.sqldownunder.com/. The specific downloads for Bob's recordings are here: SDU4FullShow.mp3 or here: SDU4FullShow.wma.

Exciting times indeed. Stay tuned!

 


If you're not aware, there are two SQL Server 2005 contests running:

Connected Systems Developer Competition, Prize $50,000 USD
Click on the graphic for the link to the competition site:

 

Best Migration from Oracle to SQL Server, Prize - Custom Chopper (approx. $50,000 USD)
Click on the graphic for the link to the competition site:

GOOD LUCK!


Scott Stanfield, a Microsoft Regional Director, had a great idea for Tech*Ed US (well, it was a great idea before he knew exactly how much work was involved :)...

He decided that he would get other Microsoft Regional Directors together to discuss (and record) interesting topics for just 10 minutes (of course, some of us went over... SORRY!). We could pick anything we wanted and then he would - with the help of a few other RDs such as Scott Golightly, Patrick "Beach Master" Hynds, Kate Gregory, and J. Michael Palermo - record our sessions to later edit and post on the GrokTalk site.

So, after A LOT OF HARD WORK - they're done! All 35 of the GrokTalks are posted and you can find them on the GrokTalk site: http://www.groktalk.net/

Finally, since I delivered a GrokTalk on Stored Procedures, I'm going to make sure that you have a comprehensive set of resources related to stored procedure optimizations here:

Well, now that should keep you busy...

Enjoy!!

NOTE/UPDATE: Be sure to read this post as well: Transaction Log VLFs too many or too few? after reading this one. Not only can you have too many small VLFs but if incorrectly sized, you can have too few! Enjoy!!

On a few of my last trips, I've had the pleasure of doing a lot of customer visits - visiting some of the larger implementations of SQL Server. Many of these implementations are at banks where both performance and recovery are critical. After my trip to Turkey (where I did customer visits prior to presenting at the Microsoft Professional Developers Summit), I received this email:

As you might remember we talked at xxxbank, Turkey about performance problems in transaction log backups and splitting tempdb data file into equal size chunks equal to number of CPUs. I have implemented both optimizations in my troubled server, and the results are great! Transaction log backups do not impact the server at all and with lowered congestion in tempdb, overall system performance has gone up as well.

So - this is my first of two blog entries targeting these two VERY typical problems:
* Transaction log fragmentation (both internal and external - this customer was having problems with internal fragmentation)
* Optimizing TempDB

We'll start with 8 Steps to Optimizing your Transaction Log - not necessarily in any order. It's best to review all of these to make sure you have a comprehensive view of how to improve both performance and recovery of your transaction logs!

1) Try to dedicate the transaction log portion of your database to its own physical disk. In high volume OLTP systems, isolating the transaction log can allow the disk head to be ready for the next write by not having other files contend for the physical disk resource. If your database already exists, the best way to "move" your transaction log is to detach your database (sp_detach_db) and then reattach it (sp_attach_db). Here's a great KB article titled: Moving SQL Server databases to a new location with Detach/Attach.
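
A minimal sketch of that move follows. The database name and all paths are hypothetical, and the database has to be free of active connections before it can be detached:

```sql
-- Hypothetical names and paths: SalesDB, D:\Data, L:\Logs.
USE master;

-- 1) Detach the database (this closes the files cleanly).
EXEC sp_detach_db @dbname = N'SalesDB';

-- 2) Move SalesDB_log.ldf to the dedicated log drive at the operating
--    system level (for example, with Windows Explorer or the move command).

-- 3) Reattach, pointing at the data file in its original location and
--    the log file in its new location.
EXEC sp_attach_db
     @dbname    = N'SalesDB',
     @filename1 = N'D:\Data\SalesDB.mdf',
     @filename2 = N'L:\Logs\SalesDB_log.ldf';
```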

2) Defrag the disk(s) on which your transaction logs reside. This will get rid of external fragmentation of the transaction log - better known as disk file fragmentation. This will require that your server be taken offline but if your files have had a lot of autogrowth and/or they reside on a disk with a lot of other files that have been modified, then all of your files (incl. the transaction log file) are likely to be interleaved and fragmented. The good news is that this is a one time operation (unless the files continue to grow and shrink excessively - but we'll fix that as well). The bad news is that you'll need to shutdown SQL Server and then use some form of system tool - Microsoft or third party - to defrag. On Windows XP and Windows Server 2003 machines, the disk defragmenter tool can be launched by running: dfrg.msc.

3) Create only ONE transaction log file. Even though you can create multiple transaction log files, you only need one... SQL Server DOES not "stripe" across multiple transaction log files. Instead, SQL Server uses the transaction log files sequentially. While this might sound bad - it's not. If you want to get better performance out of the transaction log, place it on faster disks and/or a more performant disk (RAID) configuration. You could place the transaction log on a RAID 0 array but you'll likely want fault tolerance as well as performance. Isolation, as per number 1 is a good start but if you also need increased capacity then using hardware based RAID to combine disks for both better performance and redundancy is your next choice. See 4.

4) Not only should you try to isolate the transaction log to its own physical disk but you should make sure that the logical/physical disk configuration is as efficient as possible. Try to use an isolated RAID 1 mirroring set if you don't need significant capacity. If you need a greater capacity OR you want better performance, consider a combination of RAID 0 and RAID 1 (either RAID 0 + 1 or RAID 1 + 0). While RAID 0 + 1 can often offer better performance, RAID 1 + 0 offers better reliability. If you're new to RAID and are interested in learning more - check out the RAID Tutorial here: http://www.raidarray.com/04_01_00.html.

5) Don't get caught up in nothing but transaction log speed; you'll also want to make sure that your transaction log is always available as this can help you in times of disaster. Even if the data is damaged, if the transaction log is available and you have a series of backups up to and including the last transaction log then you can make a final backup of the transaction log that will represent all of the changes since your last transaction log backup. If this backup is possible (it's called backing up the "tail" of the log), then you can achieve up-to-the-minute recovery. This is only possible when the log is available. By placing the log on mirrored disks you can increase your chances of recovering data and minimize data loss!

6) Create transaction log files with a reasonable initial size. When you create a database it's ideal to PRE-ALLOCATE both your data files and your transaction log file. A little bit of capacity planning goes a long way... Now, if you think that you've got absolutely no idea how to size your transaction log, well - here are the things that have the greatest impact:

  • Type of activity - transaction processing or decision support
  • Frequency of that activity - the more frequent the changes, the faster the transaction log will grow
  • Recovery Model - the recovery model of the database
  • Frequency of transaction log backups
  • Whether or not replication is used (since the log reader relies on the transaction log)

I wish I could give you a rough idea on sizing but if you look only at database size and none of these other factors, you could end up with a transaction log that's seriously oversized or seriously undersized. I've seen recommendations of 10-25% of the size of the data and you can use that but I would also add a bit of common sense. A larger database with very frequent transaction log backups may not need a transaction log that's even 1% of the size of the data... The best way is to set up your development environment similar to that of your production environment (including backup jobs) and then see how the transaction log grows. If you have a lot of autogrowth (because your guess was wrong), you can later clean up the fragmentation that has occurred and get back to a reasonable, intact, and optimal transaction log. If you want to know more about Recovery Models and their impact on the transaction log, review a sample book chapter from an MSPress book (Microsoft® SQL Server™ 2000 High Availability) that I helped to co-author. I only helped out on a couple of chapters but this specific chapter explains a lot about the general database environment settings that impact recovery. You can download Chapter 9: Database Environment Basics for Recovery here: http://www.sqlskills.com/resources/SQLServerHAChapter9.pdf

7) Don't let autogrowth get out of control. As important as capacity planning is, you're not likely to be spot-on in your estimates. I don't recommend completely turning off autogrowth but in general I also don't like SQL Server 2000's default growth rate (or max size). In general, I would recommend setting ALL of the transaction log file's attributes: initial size, growth rate AND maximum size. For the growth rate, I recommend something that can be allocated somewhat quickly and something of a fixed size. In general, I recommend a value which is less than or equal to 1GB (based on total size) but something that doesn't mean that you're going to autogrow again soon. So, for databases whose transaction logs are under 1GB then you might set autogrow to somewhere between 20 and 100 MB. For those of you who have transaction logs measured in GB, then I'd set the autogrowth to 500MB or 1GB. In general, I don't like the percentage because it needs to be calculated (which I realize isn't really all that big of a deal) but the larger the file the larger the autogrowth and the longer it takes. In SQL Server 2000, autogrowth can create blocking...so, it's best to minimize this in general.
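As a rough sketch of those three settings, assume a hypothetical database named Sales whose transaction log has the logical name Sales_log (both names, and all of the sizes, are placeholders rather than recommendations for your system). Separate statements are used because the SQL Server 2000 documentation for MODIFY FILE states that only one property can be changed at a time; later versions accept them together.

```sql
-- Hypothetical names: Sales (database), Sales_log (logical log file name).
-- Pre-allocate the log; SIZE must be larger than the file's current size.
ALTER DATABASE Sales
MODIFY FILE (NAME = Sales_log, SIZE = 2GB)
GO

-- Use a fixed-size growth increment rather than a percentage.
ALTER DATABASE Sales
MODIFY FILE (NAME = Sales_log, FILEGROWTH = 500MB)
GO

-- Cap the file so that a runaway transaction cannot fill the disk.
ALTER DATABASE Sales
MODIFY FILE (NAME = Sales_log, MAXSIZE = 10GB)
GO
```

Running sp_helpfile in the database afterwards lists the size, growth and maxsize values so you can confirm that all three took effect.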

8) Check/fix your internal fragmentation. OK, so this is the one that really helped the customer in Turkey. Often, when transaction logs are not pre-allocated and/or when there have been a lot of autogrowths, the transaction log can become internally fragmented. Internally your transaction logs are broken down into smaller more granular chunks called VLFs (Virtual Log Files). The size and number of VLFs you'll have depends largely on the size that the chunk is when it's added to your transaction log. If you add a new chunk to the transaction log which is 20MB (through autogrowth or through manual growth) then the number of VLFs that are added is 4. If you add a chunk which is greater than 64MB but less than or equal to 1GB, you'll add 8 VLFs. If you add more than 1GB then you'll add 16 VLFs. In general, most transaction logs will only have 20 or 30 VLFs - even 50 could be reasonable depending on the total size of the transaction log. However, in many cases what happens is that excessive autogrowths can cause an excessive number of VLFs to be added - sometimes resulting in hundreds of VLFs. Having an excessive number of VLFs can negatively impact all transaction log related activities and you may even see degradation in performance when transaction log backups occur. To see how many VLFs you have, simply look at the number of rows returned by DBCC LOGINFO. The number of rows returned equals the number of VLFs your transaction log file has. If you have more than 50, I would recommend fixing it and adjusting your autogrowth so that it doesn't occur as frequently. To get rid of all of the excessive VLFs, follow these easy steps to shrink off the fragmented chunk and add a new, clean chunk to your transaction log:

1. Wait for an inactive time of day (ideally, it would be best to put the database into single user mode first) and then clear all transaction log activity through a regular transaction log backup. If you're using the simple recovery model then you don't need to do a log backup... Instead, just clear the transaction log by running a checkpoint.

 BACKUP LOG databasename TO devicename

2. Shrink the log to as small a size as possible (truncateonly)

DBCC SHRINKFILE(transactionloglogicalfilename, TRUNCATEONLY)

NOTE: if you don't know the logical filename of your transaction log use sp_helpfile to list all of your database files.

3. Alter the database to modify the transaction log file to the appropriate size - in one step

ALTER DATABASE databasename
MODIFY FILE
(
      NAME = transactionloglogicalfilename
    , SIZE = newtotalsize
)

NOTE: Depending on the total size desired, you might want to break this into multiple chunks. Be sure to read this post as well: Transaction Log VLFs too many or too few? after reading this one. Not only can you have too many small VLFs but if incorrectly sized, you can have too few! And... there's a bug referenced/mentioned there. You'll want to read that post as well!
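Put together, the three steps might look like the following sketch. It assumes a hypothetical database named Sales, a log file with the logical name Sales_log, an illustrative backup path, and an 8GB target reached in two chunks; none of these values come from a real system, so substitute your own.

```sql
-- Hypothetical names and sizes throughout; adjust for your environment.
USE Sales
GO

DBCC LOGINFO   -- note the number of rows (one row per VLF) before the fix
GO

-- Step 1: clear the log with a regular transaction log backup.
BACKUP LOG Sales TO DISK = N'D:\Backups\Sales_log.trn'
GO

-- Step 2: shrink the log to as small a size as possible.
DBCC SHRINKFILE (Sales_log, TRUNCATEONLY)
GO

-- Step 3: grow back to the target size, here in two chunks rather than one.
ALTER DATABASE Sales
MODIFY FILE (NAME = Sales_log, SIZE = 4GB)
GO
ALTER DATABASE Sales
MODIFY FILE (NAME = Sales_log, SIZE = 8GB)
GO

DBCC LOGINFO   -- compare the row count with the first run
GO
```

If the shrink in step 2 doesn't reduce the file much, the active portion of the log is probably near the end of the file; repeating steps 1 and 2 once more usually lets it wrap around.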

And, that should do it!

Many of you in the .NET Community are already familiar with theServerSide.NET but many of you in the SQL Server community are not...well, here's your chance to bridge the gap over to .NET and hear a bunch of great .NET interviews! My interview is more SQL-centric but there are lots of great ones on Development in general and even one on SQL Server 2005 Reporting Services by Jason Carlson.

My interview was recorded with them back in February when I was speaking at VSLive in San Francisco and they've been saving them up, editing them and then posting an interview roughly each week (mine is the 29th posted!). The interview was great fun and Paul Ballard asked some really interesting questions about tuning, indexes, stored procedures and SQL Server 2005. Mostly it's an interview about optimization tips in both SQL Server 2000 and 2005 and just in general - things to look forward to in SQL Server 2005.

Here's a link to all of theServerSide.NET interviews: http://www.theserverside.net/talks/index.tss 
Here's a link specifically to mine: http://www.theserverside.net/talks/videos/KimberlyTripp/interview.tss?bandwidth=dsl
Here's the link to Jason Carlson's Reporting Services interview: http://www.theserverside.net/talks/videos/JasonCarlson/interview.tss?bandwidth=dsl and finally, here's a TechNet Webcast that Jason did titled: Authoring Reports in SQL Server 2000 Reporting Services (Level 200)

Enjoy!

On Tuesday, June 21, I had the pleasure of assisting on a TechNet webcast delivered by Michiel Wories - a Program Manager from the SQL Server Development Team at Microsoft. My assistance was limited as I really just listened and helped answer some of the questions that came up during the session. In fact, there were a few of us fielding questions and in general, there seems to be a lot of excitement around this new tool. The focus of the webcast was a new utility called SQLCMD. SQLCMD is a command-line tool that replaces OSQL and allows a lot of new functionality, such as: parameterization, access to environment variables, complex scripts that connect and execute against multiple servers, MUCH better code page support, input scripts, output scripts and error handling within scripts. Well, you'll probably learn more if you just watch the webcast! Below are a bunch of resources related to SQLCMD. I hope you enjoy.

Michiel Wories webcast titled: Scripting in Microsoft SQL Server 2005 (Level 300)
Michiel Wories blog entry with all of his webcast demo scripts: http://blogs.msdn.com/mwories/archive/2005/06/21/scripting_webcast.aspx

As for additional resources:
I've written a couple of blog entries that relate to new SQL Server 2005 tools and specifically SQLCMD here:
     SQL Server 2005 - Management Studio Scripting Options and SQLCMD (loving it!)
Click here for the full group of my SQL Server 2005 blog entries.

Click here for the full list of all TechNet on-demand webcasts.

Click here for the list of “Get Ready for SQL Server 2005” resources!

Last but not least, here are the blogs of the other MVPs who helped answer questions during the SQLCMD session - lots of great stuff to keep reading!
Louis Davidson, Steve Kass and Kent Tegels

As I'm preparing for my Tech*Ed session on Indexing Best Practices in SQL Server 2005, I'm reminded that there are a ton of best practices that really apply to both SQL Server 2000 as well as SQL Server 2005. When it comes to indexing, there are many dependencies on the storage structures. These dependencies are the basis for why I recommend a very specific type of clustering key - for all versions of SQL Server, 7.0 and higher!

I'm going to start with my recommendation for the Clustering Key - for a couple of reasons. First, it's an easy decision to make and second, making this decision early helps to proactively prevent some types of fragmentation. If you can prevent certain types of base-table fragmentation then you can minimize some maintenance activities (some of which in SQL Server 2000, and fewer of which in SQL Server 2005, require that your table be offline). OK, I'll get to the rebuild stuff later.....

Let's start with the key things that I look for in a clustering key:

  • Unique
  • Narrow
  • Static

Why Unique?
A clustering key should be unique because a clustering key (when one exists) is used as the lookup key from all non-clustered indexes. Take for example an index in the back of a book - if you need to find the data that an index entry points to - that entry (the index entry) must be unique otherwise, which index entry would be the one you're looking for? So, when you create the clustered index - it must be unique. But, SQL Server doesn't require that your clustering key is created on a unique column. You can create it on any column(s) you'd like. Internally, if the clustering key is not unique then SQL Server will “uniquify” it by adding a 4-byte integer to the data. So if the clustered index is created on something which is not unique then not only is there additional overhead at index creation, there's wasted disk space, additional costs on INSERTs and UPDATEs, and in SQL Server 2000, there's an added cost on a clustered index rebuild (which because of the poor choice for the clustering key is now more likely).

Why Narrow?
A clustering key should be narrow for some of the same reasons it should be unique. If the clustering key is used as the lookup key from all non-clustered indexes, then the clustering key is duplicated in all non-clustered indexes. If the clustering key is really wide, then all of the non-clustered indexes will be [unnecessarily] wide. This will waste disk space, create additional costs on INSERTs and UPDATEs, and require more time (because of size) when rebuilding these index structures. So, what does narrow mean - as few bytes as possible to help uniquely define your rows. A narrow numeric when possible.

Why Static?
A clustering key should be static for some of the same reasons it should be unique and narrow. If the clustering key is used as the lookup key from all non-clustered indexes, then the clustering key is duplicated in all non-clustered indexes. In fact, for a given table the clustering key will be the most duplicated data. If this data changes then SQL Server will need to update the value in the base table as well as in EVERY non-clustered index. And, if the key changes, it will cause the record to move. When a record moves, it creates fragmentation. This will waste disk space, create additional costs on INSERTs and UPDATEs, and require more time (because of record relocation and [the likely] subsequent splits) and require more maintenance.
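To make those three criteria concrete, here's a minimal sketch using a hypothetical Customer table (the table, column and constraint names are invented for illustration). An int IDENTITY column is one way to get a key that is unique, only 4 bytes wide, and never updated.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE dbo.Customer
(
    CustomerID  int          NOT NULL IDENTITY(1, 1),
    LastName    varchar(50)  NOT NULL,
    FirstName   varchar(50)  NOT NULL,
    CONSTRAINT CustomerPK PRIMARY KEY CLUSTERED (CustomerID)
)
GO

-- Every row in this nonclustered index carries the 4-byte CustomerID as its
-- lookup value, rather than a wide or changeable key such as the name.
CREATE NONCLUSTERED INDEX CustomerLastNameIX
    ON dbo.Customer (LastName, FirstName)
GO
```

Had the table been clustered on (LastName, FirstName) instead, every nonclustered index would carry up to 100 bytes of name data per row, plus a uniquifier wherever two customers share a name, and a name change would touch every one of those indexes.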

OK, so it sounds like I want a narrow, unique and static value... What about a guid?
Typically, I recommend a numeric IDENTITY column as the clustering key but I always get this question. In fact, I often wait to see how long it's going to take before I get this question ;). Anyway, a guid does meet the criteria fairly well - it's certainly unique, it's usually static and it's relatively narrow. So, what's wrong with it? In SQL Server 2000, the guid function (newid()) is built using a value that does not create an ever increasing pattern (an IDENTITY column would). But wait, I didn't say that you needed to have an ever-increasing pattern.....

OK, so the final criteria I look for in a clustering key is: an ever-increasing pattern!
If the clustering key is ever-increasing then new rows have a specific location where they can be placed. If that location is at the end of the table then the new row needs space allocated to it but it doesn't have to make space in the middle of the table. If a row is inserted to a location that doesn't have any room then room needs to be made (e.g. you insert based on last name then as rows come in space will need to be made where that name should be placed). If room needs to be made, it's made by SQL Server doing something called a split. Splits in SQL Server are 50/50 splits - simply put - 50% of the data stays and 50% of the data is moved. This keeps the index logically intact (the lowest level of an index - called the leaf level - is a doubly-linked list) but not physically intact. When an index has a lot of splits then the index is said to be fragmented. Good examples of an index that is ever-increasing are IDENTITY columns (and they're also naturally unique, naturally static and naturally narrow) or something that follows as many of these things as possible - like a datetime column (or, since that's NOT very likely to be unique by itself, a composite of datetime, identity). But wait, what about that guid?

Well, in SQL Server 2000 the only SQL Server function for guids is newid - that does not create an ever increasing pattern. In SQL Server 2005, you can use a new guid function called newsequentialid() to populate your uniqueidentifier column. Here's an example of how you can use it:

CREATE TABLE Test
(
TestID uniqueidentifier CONSTRAINT Test_TestID_Default DEFAULT newsequentialid(),
Inserted datetime CONSTRAINT Test_Inserted_Default DEFAULT getdate()
)
go

INSERT Test DEFAULT VALUES
go

SELECT * FROM Test
go

Is there a way to create a sequential guid in SQL Server 2000?
YES, Gert Drapers wrote an xp to generate sequential guids! Check it out here. He just published this recently (May 2005) and it's a good change to make in your SQL Server 2000 databases/applications.
Full title with link: XPGUID.DLL - Sequential GUID generation and GUID helper functions XP

OK, so I've tackled a few things here today and I've answered a few questions related to indexes before. Check out my Indexes Category of blog entries here and if you're at Tech*Ed today, I hope to see you at 5PM.

Thanks for reading,
kt

I think there are numerous reasons for why I love technology but at the top of the list: learning. It's amazing to me that not a day goes by where I don't dig deeper into something or clarify it further. Even learning something trivial, like a new keystroke, can make our work easier to do and make us more productive at it. Things are constantly changing; the one thing I do know about technology is that there's a lot to learn!

So, to start, thanks for everyone's feedback (esp. Steffen Krause here) on some of my recent posts regarding LEFT and RIGHT based partitioning functions here. There was a lot of feedback regarding the simplicity in syntax and declaration of a RIGHT-based partition function and by making the first partition of a RIGHT-based partition function empty, YOU'RE RIGHT! We can eliminate the need for data movement. There's no performance difference and it doesn't matter internally which type you choose but - I've heard you all loud and clear! You don't like dealing with the imprecision of a datetime data type when specifying upper boundaries. So, having said that - I need to make a few changes. In my next revision of my presentation materials, whitepaper, scripts, etc. I'll work to give both perspectives. For some reason, I still like LEFT-based partition functions but RIGHT is definitely easier to define.

So, keep it coming everyone. You've hit the nail on the head. The fun part about technology is... no one knows everything and we're all always learning!

Enjoy Tech*Ed!

Blog edits brought to you by Richard Campbell (long story)

NOTE: Generally, I recommend RIGHT-based partition functions so that you don't have to deal with datetime timetick issues at all. However, this post can really help you if you still want to use LEFT-based partition functions. Enjoy!!!

When creating partitioned tables in SQL Server 2005, a partition function requires a LEFT or RIGHT designation. In general, I recommend that you choose to create a LEFT-based partition. However, choosing LEFT always creates some confusion because a left-based partition function requires upper boundaries. This results in a more complicated partition function definition (than a RIGHT partition function).
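To show why upper boundaries are more awkward with datetime, here's a minimal sketch of a LEFT-based function over monthly partitions for mid-2004. The function name is hypothetical. Because datetime is accurate to roughly 3 milliseconds, the last value that still belongs to a given day ends in .997, so that's what each upper boundary has to be.

```sql
-- Hypothetical function name, for illustration only.
-- Each value is the LAST datetime tick that belongs to the partition on its left.
CREATE PARTITION FUNCTION LeftPartitionFunctionSketch(datetime)
AS
RANGE LEFT FOR VALUES (
   '20040630 23:59:59.997', -- partition 1: Jun 2004 and earlier
   '20040731 23:59:59.997', -- partition 2: Jul 2004
   '20040831 23:59:59.997'  -- partition 3: Aug 2004; partition 4 holds everything later
)
GO
```

Writing '20040630 23:59:59.999' instead would not work as intended: datetime rounds that value up to midnight on July 1, which would put the first instant of July into the June partition.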

Background First
In SQL Server 2005, you can create truly Partitioned Objects (objects meaning both Tables and Indexes) and to create a partitioned table you must base that table on a Partition Scheme (PS) and the PS must be based on a Partition Function (PF). Now, if you're new to the concept of partitioning, you should seriously consider reading the whitepaper I wrote for MSDN which is posted here.

Regardless of your having read the whitepaper, I want to add a few basic principles and make sure that some of the key points are clear. Simply put, a PF is the logical definition of your boundary cases (or possibly better described as the intervals which each partition covers). A PF must cover the entire domain from negative infinity to positive infinity. If a value doesn't have a place to go then you'd have a problem with the logic of your partition function. Having said that, when you define the PF you are ONLY stating values, not where they will physically reside. Furthermore, the expression on which the data will be based has to be somewhat straightforward and consistent. In other words, simple. If the PF were extremely complex it could compromise the usage and performance of the Partitioned Object. Finally, the definition of the PF is really just a way to easily define these boundaries. As a result, you can choose to enter a specific value (like June 1, 2005) or a function like the following to create a PF over 4 months with each month being computed from current:

CREATE PARTITION FUNCTION TestFunctionValues(datetime)
AS
RANGE RIGHT FOR VALUES (
   -- varchar(2) rather than char(1) so that two-digit months (10-12) are not truncated
   CONVERT(char(4), year(dateadd(m, 1, getdate()))) + right('0' + convert(varchar(2), month(dateadd(m, 1, getdate()))), 2) + '01'
    -- next month (executed in May, 2005 yields June 2005)
   , CONVERT(char(4), year(dateadd(m, 2, getdate()))) + right('0' + convert(varchar(2), month(dateadd(m, 2, getdate()))), 2) + '01'
    -- executed in May, 2005 yields July 2005
   , CONVERT(char(4), year(dateadd(m, 3, getdate()))) + right('0' + convert(varchar(2), month(dateadd(m, 3, getdate()))), 2) + '01'
    -- executed in May, 2005 yields August 2005
   , CONVERT(char(4), year(dateadd(m, 4, getdate()))) + right('0' + convert(varchar(2), month(dateadd(m, 4, getdate()))), 2) + '01'
    -- executed in May, 2005 yields September 2005
)
GO

Since all of these complex expressions are evaluated to a constant at creation, you can immediately verify your boundary cases by accessing some catalog views. Here is the catalog view query which helps you see what your partition boundaries are:

SELECT PF.[name], RV.boundary_id, RV.[value]
FROM sys.partition_range_values AS RV
 JOIN sys.partition_functions AS PF
  ON RV.function_id = PF.function_id
WHERE PF.[name] = 'TestFunctionValues'

In the above partition function, you'll notice that in addition to the values, I also have a RIGHT designation. The choice of RIGHT determines whether the boundary value is a member of the first partition or the second partition. In the case of RIGHT, this says that the value is a lower boundary of the second partition. In an example that involves something like "sales" where data rows are ever increasing and their usage is for more recent rather than older data you might want to always have the most recent four months of data available (in the real world you'd probably want more than 4 months but I'm going to simplify to four so that we have fewer boundaries with which to work). So, if we want 4 partitions - one for June, July, August and September - we need to define the partitions in the form of intervals (using boundaries). The first boundary point will either reside in the first partition (as an upper boundary) or the second partition (as a lower boundary). For four partitions, there are numerous ways in which you could define the partition function such that the logical/physical placement of the data is THE SAME; however, the behavior of the partition function during later SPLIT and MERGE operations is also impacted by the choice of LEFT/RIGHT so understanding how they work is critical in making your decision to use one or the other. More than anything, you will need to respect the full domain from negative infinity to positive infinity.

The following partition function is one of the easiest and most logical to understand. We need four partitions so let's just enter our boundary points and be done.

CREATE PARTITION FUNCTION RightPartitionFunction1(datetime)
AS
RANGE RIGHT FOR VALUES (
   '20040601', -- Jun 2004
   '20040701', -- Jul 2004
   '20040801', -- Aug 2004
   '20040901' -- Sep 2004
)

However, remember that we have the full domain to cover. As a result, the above partition function will create 5 partitions where the partitions will be:
 Partition 1: all data less than 20040601
 Partition 2: all data greater than/equal to 20040601 and less than 20040701
 Partition 3: all data greater than/equal to 20040701 and less than 20040801
 Partition 4: all data greater than/equal to 20040801 and less than 20040901
 Partition 5: all data greater than/equal to 20040901
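You can check this mapping yourself with the $PARTITION function, which returns the partition number that a given value would land in. The sketch below assumes RightPartitionFunction1 has been created in the current database; the sample dates and column aliases are just for illustration.

```sql
-- Assumes RightPartitionFunction1 exists in the current database.
SELECT $PARTITION.RightPartitionFunction1(CAST('20040531' AS datetime)) AS May31  -- partition 1
     , $PARTITION.RightPartitionFunction1(CAST('20040601' AS datetime)) AS Jun01  -- partition 2
     , $PARTITION.RightPartitionFunction1(CAST('20040630' AS datetime)) AS Jun30  -- partition 2
     , $PARTITION.RightPartitionFunction1(CAST('20040901' AS datetime)) AS Sep01  -- partition 5
GO
```

Note that the boundary value itself (June 1) lands in partition 2, not partition 1; that is exactly what the RIGHT designation means.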

This really isn't too bad but it seems to waste the first partition because it's completely empty... so, you could remove the first boundary point and just supply the second, third and fourth. In that case, the first partition would cover June (instead of being empty) and you could get down to 4 partitions with only 3 boundary points.

CREATE PARTITION FUNCTION RightPartitionFunction2(datetime)
AS
RANGE RIGHT FOR VALUES (
   '20040701', -- Jul 2004
   '20040801', -- Aug 2004
   '20040901' -- Sep 2004
)

This will create 4 partitions where the partitions will be:
 Partition 1: all data less than 20040701
 Partition 2: all data greater than/equal to 20040701 and less than 20040801
 Partition 3: all data greater than/equal to 20040801 and less than 20040901
 Partition 4: all data greater than/equal to 20040901

OK, so this seems a bit more optimal. And by definition alone, I would agree with you. Where it becomes problematic is in the merge and split operations. When a partition function goes through a merge of a boundary point, that boundary point is essentially removed. This seems simple but also has an impact on the data which resides in that partition (the partition which includes that boundary point). So, if this partitioned table is NEVER going to be modified and you never need to plan for a merge or split, then you can choose whatever definition is easier for you to use. However, if you plan to do a merge or split, you need to understand the ramifications of the LEFT/RIGHT decision. If November 1 comes around and you want the data for October to be added to your partitioned table and you ONLY want to show the most recent four months, then you will need to add October and remove June. The process will essentially consist of the following:

  1. Create a place for new data (Oct) to reside
  2. Add this location to the partition scheme (more details on the scheme v. the function shortly)
  3. Split the last boundary to add the new month (Oct)
  4. Switch-in the staging table of October data so it becomes part of the partitioned table
  5. Create a staging table into which the old data (Jun) will go
  6. Switch-out the June data into the staging table
  7. Merge the boundary point to remove June from the table

More background first - Partition Schemes
You will want to have a place to put the new data (there are lots of things we could look at here... do you want to ALWAYS only use 4 filegroups and when you remove June's data do you want to reuse that space - or do you have a new location on which October can be placed). This is a whole different topic over which we should chat... but for simplicity let's just say that we want to use a new location. To tell the PF where this location is, you need to work with the physical definition of the partition function. OK, that might not be clear. So far, I've only told you about the PF and so far, I've only said that the PF described the logical boundaries. You may have been wondering how SQL Server maps these logical boundaries to the physical files and the answer is through the Partition Scheme. The partition scheme (at initial creation) sets up the mapping of the boundaries to their physical locations (through filegroups). For the first and second PFs created above the respective partition schemes would be:

CREATE PARTITION SCHEME [PartitionScheme1]
AS
PARTITION [RightPartitionFunction1] TO
  ( [Primary]    -- should be empty?
  , [PartitionedTablesFG1]  -- for June
  , [PartitionedTablesFG2]  -- for July
  , [PartitionedTablesFG3]  -- for Aug
  , [PartitionedTablesFG4]  -- for Sept
  )
and/or

CREATE PARTITION SCHEME [PartitionScheme2]
AS
PARTITION [RightPartitionFunction2] TO
  ( [PartitionedTablesFG1]  -- for June
  , [PartitionedTablesFG2]  -- for July
  , [PartitionedTablesFG3]  -- for Aug
  , [PartitionedTablesFG4]) -- for Sept

In both cases, it seems as though these do the same thing with the exception that the first PF ends up with an extra partition which is essentially empty. In this case (in a RIGHT-based PF), the second is a better choice (coming up there will be a reason for why we create an empty partition).

Back to SPLIT and MERGE
OK, so back to the issue of splitting and merging. Our goal is to remove June and add October so that the "current" data shows the four months of July, Aug, Sept and Oct. If we want to create a location on which October can be placed, we need to add another filegroup to the scheme. The way we do this is that we first need to add the filegroup/file to the database and then we need to add this new filegroup to our partition scheme.

Step 1 - Create a place for new data (Oct) to reside
Create a new filegroup and new file.

ALTER DATABASE CreditPT
 ADD FILEGROUP PartitionedTablesFG5
GO

ALTER DATABASE CreditPT
 ADD FILE
  (NAME = N'CreditPTFG5File1',
  FILENAME = N'C:\SQLTemp\CreditPTFG5File1.ndf',
  SIZE = 30, FILEGROWTH = 10, MAXSIZE = 50)
  TO FILEGROUP [PartitionedTablesFG5]
GO

Step 2 - Add this location to the partition scheme
The way to do this is to add a "NEXT USED" filegroup. Syntax:

ALTER PARTITION SCHEME PartitionScheme2 NEXT USED [PartitionedTablesFG5]
GO

Once the partition scheme has a new filegroup to use (and you can only have one at a time - note that if you add a second it simply replaces the first and does not warn you that you already had a next used), then the next time a partition is needed (which will be on the next split) this "next used" filegroup will be used.

Step 3 - Split the last boundary to add the new month (Oct)
Now that we have a filegroup AND we have told the partition scheme to use it for the next split operation, we can split the PF so that the new partition is placed on this 5th filegroup.

ALTER PARTITION FUNCTION RightPartitionFunction2()
SPLIT RANGE ('20041001')
GO

Step 4 - Switch-in the staging table of October data so it becomes part of the partitioned table
This step really needs more to it. To switch in your staging table you need to already have a staging table. To make this an optimal operation, you MUST create your staging table on the same filegroup on which the partition will reside. What you're trying to do is make the switch optimal by making it a metadata only operation. This step really consists of creating an identical table with identical indexes and creating it with a constraint that matches the partition's definition (so that SQL Server can trust that it only includes data within the interval for that partition).

CREATE TABLE ChargesPTRightOct
(  charge_no  int   NOT NULL IDENTITY,
 member_no  int   NOT NULL
      CONSTRAINT ChargesPTRightOctMemberNoFK
       REFERENCES dbo.Member(Member_No),
 provider_no  int   NOT NULL
      CONSTRAINT ChargesPTRightOctProviderNoFK
       REFERENCES dbo.Provider(Provider_No),
 category_no  int   NOT NULL
      CONSTRAINT ChargesPTRightOctCategoryNoFK
       REFERENCES dbo.Category(Category_No),
 charge_dt  datetime  NOT NULL
      CONSTRAINT ChargesPTRightOctChargeDtCK
       CHECK (Charge_dt >= '20041001'
         AND Charge_dt < '20041101'),
 charge_amt  money  NOT NULL,
 statement_no int   NOT NULL,
 charge_code  char(2)  NOT NULL
) ON [PartitionedTablesFG5]
GO

ALTER TABLE ChargesPTRightOct
ADD CONSTRAINT ChargesPTRightOctPK
  PRIMARY KEY CLUSTERED (charge_dt, charge_no)
--   ON Credit4MonthPFN (charge_dt)
GO

INSERT ChargesPTRightOct (member_no, provider_no, category_no
      , charge_dt, charge_amt
      , statement_no, charge_code)
 SELECT member_no, provider_no, category_no
   , dateadd(yy, 5, charge_dt), (charge_amt + charge_no)/10
   , statement_no, charge_code
 FROM CreditPT.dbo.Charge
 WHERE month(charge_dt) IN (10)
 ORDER BY charge_dt, charge_no
GO

ALTER TABLE ChargesPTRightOct
SWITCH TO ChargesPTRight PARTITION 5
GO
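
A quick way to sanity-check the switch is to ask SQL Server which partition each row maps to. The sketch below assumes that ChargesPTRight is partitioned on charge_dt by RightPartitionFunction2 (that is how the steps above read, but check your own scheme).

```sql
-- Row counts and date ranges per partition after the switch-in.
SELECT $PARTITION.RightPartitionFunction2(charge_dt) AS partition_number
     , MIN(charge_dt) AS min_charge_dt
     , MAX(charge_dt) AS max_charge_dt
     , COUNT(*) AS row_count
FROM ChargesPTRight
GROUP BY $PARTITION.RightPartitionFunction2(charge_dt)
ORDER BY partition_number
GO
```

If the switch worked, the October rows appear under partition 5 and the staging table is empty.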

Step 5 - Create a staging table into which the old data (Jun) will go
Similarly, you will want to create a place for the June data to go - kind of like a “staging” table but more for the exit of the table.

CREATE TABLE ChargesPTRightJun
(  charge_no  int   NOT NULL IDENTITY,
 member_no  int   NOT NULL
      CONSTRAINT ChargesPTRightJunMemberNoFK
       REFERENCES dbo.Member(Member_No),
 provider_no  int   NOT NULL
      CONSTRAINT ChargesPTRightJunProviderNoFK
       REFERENCES dbo.Provider(Provider_No),
 category_no  int   NOT NULL
      CONSTRAINT ChargesPTRightJunCategoryNoFK
       REFERENCES dbo.Category(Category_No),
 charge_dt  datetime  NOT NULL,
 charge_amt  money  NOT NULL,
 statement_no int   NOT NULL,
 charge_code  char(2)  NOT NULL
) ON [PartitionedTablesFG1]
GO

ALTER TABLE ChargesPTRightJun
ADD CONSTRAINT ChargesPTRightJunPK
  PRIMARY KEY CLUSTERED (charge_dt, charge_no)
--   ON Credit4MonthPFN (charge_dt)
GO

Step 6 - Switch-out the June data into the staging table
This must also be a metadata-only step. You need to make sure that the staging table is created on the same filegroup as where the data currently resides (see the ON [PartitionedTablesFG1] clause). This is the current location of this partition. As a result, the switch out will be very fast.

ALTER TABLE ChargesPTRight
SWITCH PARTITION 1
TO ChargesPTRightJun
GO

Step 7 - Merge the boundary point to remove June from the table
This last step is the whole reason for this blog entry... yes, we're getting there ;).

ALTER PARTITION FUNCTION RightPartitionFunction2()
MERGE RANGE ('20040701')
GO

When you merge this boundary point, you will remove the value AND the partition in which it resides. Since this boundary point resides in the second partition (PartitionedTablesFG2), it's actually PartitionedTablesFG2 that gets removed. As a result, ALL of the data that resides on it must be moved to the filegroup that now covers this boundary (which is PartitionedTablesFG1 and which was just emptied). So MERGE does NOT optimally merge with a RIGHT-based partition function in the rolling range scenario.

And - as a result, I recommend LEFT-based partition functions... Since the whitepaper tends to focus on using LEFT-based partition functions, please review that for more details. You can access the “Partitioned Tables” whitepaper on MSDN, here.
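
For comparison, here is a rough sketch of what a LEFT-based function for the same months could look like. The function name is hypothetical and the boundaries are an assumption (the last datetime instant of each month); see the whitepaper for the real scripts.

```sql
-- Hypothetical LEFT-based function: each boundary is the LAST instant of a month,
-- so the boundary value belongs to the partition on its left.
CREATE PARTITION FUNCTION LeftPartitionFunctionSketch (datetime)
AS RANGE LEFT FOR VALUES
(  '20040630 23:59:59.997'   -- partition 1: June (and anything earlier)
 , '20040731 23:59:59.997'   -- partition 2: July
 , '20040831 23:59:59.997'   -- partition 3: August
 , '20040930 23:59:59.997'   -- partition 4: September; partition 5: anything later
)
GO

-- Once June has been switched out, merging the June boundary removes partition 1,
-- which is the partition that was just emptied.
ALTER PARTITION FUNCTION LeftPartitionFunctionSketch()
MERGE RANGE ('20040630 23:59:59.997')
GO
```

Because the June boundary lives in the June partition, the partition that gets dropped is the empty one and the July data should not need to move.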

Hope this helps and thanks for reading!
kt

For the past couple of years, the early betas of SQL Server 2005 were very limited and even the events were somewhat limited - limited primarily to large customers under NDA, MVPs, RDs, and very dedicated SQL Server specialists. So far, only the SQL Express 2005 Edition made it on microsoft.com for download (click here if you're interested in SQL Express and the SQL Express Manager downloads)...

SQL Server Beta 2 and soon SQL Server Beta 3 have had slightly wider distribution, even being handed out to SQL Server conference attendees (such as at the now PAST SQL Connections conference in Orlando), etc. But in most cases you need to have your own clean machine to install on and play with, and you really need to have a completely spare machine as you don't want to impact any of your production and/or otherwise critical applications. For some, this meant not installing it at all.

If you're still waiting to get your hands on it, you have a new option! The SQL team has created a virtual lab environment.

Below is the promotional blurb I received. And, I decided to check it out for a second. When you click you'll need to install an ActiveX control, register and then log in. Once logged in, you'll select your lab environment and then they will “build it”. Once selected, you'll have 90 minutes to complete the lab (and you can only enter one lab at a time) and you can download/print your lab manual (pdf format) to help you along. The feel is very similar to using Virtual PC, and the build that was used in the labs I entered was the December CTP of SQL Server (IDW 11). The current beta on Beta Place (for SQL Server 2005 Beta Subscribers) is the February CTP, which is IDW 13. Performance in my quick look around (of menu items, SSMS, etc.) seemed pretty good but I also have a decent internet connection. I started this process but didn't actually complete a lab yet... it's certainly worth checking out! Now you don't even need to install it yourself, you can use a Virtual Lab environment. If you do check it out - good or bad - let me know!

*************************************************

Are you ready to experience SQL Server 2005?

Announcing the launch of the SQL Server 2005 Virtual Hands on labs. In these labs, you will get to experience many of the new features in SQL Server 2005 including CLR integration, XML support and deep business intelligence integration.

Just follow the link and experience SQL Server 2005 for yourself!

Registration link:
http://msdn.demoservers.com/login.aspx?group=sql2005

*************************************************

When I first heard about "stored procedures being written in C#, VB.Net, ... etc." I remember thinking: Is Transact-SQL dead? Am I out of a job? What does this mean for me and for DBAs in general? Luckily, I learned quickly that not only is Transact-SQL not dead but it's got some significant improvements that might make code written in client applications A LOT easier - for example:
    1.      New syntax to traverse hierarchical/recursive relationships,
    2.      Extended XML integration,
    3.      Much improved error handling,
    4.      Easier in-database pivoting of data
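
To make a couple of those concrete, here is a small sketch of the recursive syntax (a common table expression) and the new TRY...CATCH error handling. The table variable and its rows are made up purely for illustration.

```sql
-- Hypothetical employee/manager rows held in a table variable
DECLARE @Employee TABLE
(  emp_id   int         NOT NULL PRIMARY KEY
 , mgr_id   int         NULL
 , emp_name varchar(20) NOT NULL
)

INSERT @Employee VALUES (1, NULL, 'Ann')
INSERT @Employee VALUES (2, 1, 'Bob')
INSERT @Employee VALUES (3, 2, 'Cat');   -- the statement before WITH must end with a semicolon

-- Recursive CTE: start at the top of the hierarchy and walk down one level at a time
WITH OrgChart (emp_id, emp_name, depth) AS
(  SELECT emp_id, emp_name, 0
   FROM @Employee
   WHERE mgr_id IS NULL
   UNION ALL
   SELECT e.emp_id, e.emp_name, oc.depth + 1
   FROM @Employee AS e
    JOIN OrgChart AS oc ON e.mgr_id = oc.emp_id
)
SELECT emp_id, emp_name, depth
FROM OrgChart
ORDER BY depth
GO

-- TRY...CATCH: the divide-by-zero error is caught instead of being returned to the client
BEGIN TRY
 SELECT 1 / 0 AS will_fail
END TRY
BEGIN CATCH
 SELECT ERROR_NUMBER() AS error_number, ERROR_MESSAGE() AS error_message
END CATCH
GO
```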

I was worried that the CLR would cause problems for the stability of the database, acting like badly written extended stored procedures which can leak memory, crash the server or, even worse, potentially become a vector for worms and viruses. However, as I learned more about the technology, I think that Microsoft has put in the work to give the DBA control over this technology. SQL Server controls the memory used by the CLR, there is a fully integrated security model and the DBA has control over what the CLR code can do and can even turn off the CLR completely on systems that will not use it. In fact, I also learned that the SQLCLR was going to be "off by default" as well. And, for me, as well as many DBAs, I think these points have squashed many of our fears. But then I almost felt like I was missing something. If things didn't need to change then why are they changing?
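
As a sketch of what that DBA control looks like in practice, the server-level 'clr enabled' option can be checked and changed with sp_configure. Whether you turn it on at all is, of course, your call.

```sql
-- Check the current setting (0 = off)
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = N'clr enabled'
GO

-- Turn SQLCLR on only where it is actually needed...
EXEC sp_configure 'clr enabled', 1
RECONFIGURE
GO

-- ...and turn it back off on systems that will not use it
EXEC sp_configure 'clr enabled', 0
RECONFIGURE
GO
```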

So, then I just wanted to know more about why the extension was being made to begin with and when and why it might be useful. From considering that point alone came the most important thing I've learned about SQL Server 2005; there are many new ways of doing things: Transact-SQL; SQLCLR; XML; Full-Text; Integration Services; Analysis Services; and all of them have different benefits and implementation effects. DBAs need to learn when to choose each of the technologies and even more importantly when their use is not appropriate, even when they're not an expert in all of them (and I truly think it will be a challenge to be an expert in every area of SQL Server 2005).

SQL Server 2005's feature set has been expanded such that C# or VB.NET can natively run in the server. This gives you immediate access to the power of the .NET Framework and helps to allow better integration with existing utility components and web services already in place. Some of the greatest benefits are that it allows you to place complex computations on the server - close to the data - without having to send large amounts of data to a separate application server. And all of this is done while SQL Server manages memory and resources ensuring a more balanced server system. So, my main recommendation for DBAs who are skeptical is to not worry about knowing everything about SQLCLR but instead focus on why it might be proposed and determine if SQLCLR is really the best way to handle it!

I’m still convinced that Transact-SQL wins when interacting with SQL Server using set-based SQL commands, but if you have a stored procedure with a cursor it may be a candidate for the CLR, if you have code that loads all the data in a table to perform special aggregations it may be a candidate for the CLR, as might your table valued functions (as the CLR does a really optimized job here). To sum up: the DBA has to get a grip of these new technologies and not let the development community overturn years of best practices in database system design – the same principles apply, there is just a richer toolset that, when used wisely, should improve performance and developer productivity.

Categories:
Opinions | SQL Server 2005

OK, so I've been bad (at best) at blogging this year... but I'll blame some of that on the fact that I've done more travel in these first three months of the year than I've ever done. Yes, I've been to Charlotte, NC (for a SQL Server 2005 Ascend Workshop) then Dallas, TX (for yet another Ascend Workshop) then Cairo, Egypt (for the MiddleEast Developers Conference) then Brussels, Belgium (for Dev and ITPro Days) then San Francisco, CA (for SQLLive) then London, UK (for another Ascend Workshop) then Zurich/Interlaken, Switzerland (for TechDays) then Orlando, FL (for SQLConnections) and now I'm home for a couple of weeks (well, kind of... that's another story for another day :)). But, yes, I did come back to Redmond (home) a few times in there but - it's been busy to say the least! It's really been an amazing year! From a technical perspective, I can't tell you how many exciting projects I've seen and how many interesting customer scenarios I've been presented with. Some of the planned implementations of SQL Server 2005 and even some of the changes still occurring to SQL Server 2000 implementations are incredibly interesting, challenging and innovative! And on the learning side, I've been completely buried in Yukon, x64 and Developer stuff too (like SQLCLR and C#). I've been working on new courses, new presentations, two already posted whitepapers (see “Online Resources” for links) and two more are coming... And - even more exciting, I'm also working on a couple of pre-conference seminars that will be a blast!! And, that's part of why I'm writing. There are three events coming up where the seating is limited. I thought I'd give you some of the details/highlights so if you're on the fence you can sign up before they sell out. So, let me tell you about the three coming up:

PASS 2005 European Summit
Munich, Germany - May 10, 2005 - 9:00 am to 5:00 pm
The Next Generation DBA
Critical New SQL Server 2005 Features YOU Need to Learn!
This is a full-day lecture/demo packed session where I'll also provide demo scripts (posted on my website) and I'll spend the entire day drilling into the new features of SQL Server 2005 that offer you better data availability. But what does that translate to: Database Mirroring, new Piecemeal Restore/Recovery, fast file initialization, database snapshots, table/index partitioning, snapshot isolation and even some details regarding index-related changes. However, I have two sessions at the event that will cover Best Practices and Changes to Indexing so I'll save those details for the breakout sessions and pack more details into the availability topics. In fact, THE ABSOLUTELY MOST GEEKY demo you will ever see - will be done during this session! (Wow, I've certainly set the bar high for that demo, I hope I can live up to that expectation :).
Check out this link for more details: http://www.sqlpass.org/events/Europe/2005/Pre_Con.cfm.

Microsoft Tech*Ed US
Orlando, FL - June 5, 2005  - 1:00 pm to 6:00 pm
Taming the Tools
Effectively Using and Understanding SQL Server 2005 Tools
This is a first! We’ve decided to have a room packed full of PCs – loaded with Yukon – and we’re going to hammer out a TON of exercises for a full 5 hours. I’ll be lecturing, demo’ing and doing “instructor-led walkthrough” exercises while you get real hands-on experience with Yukon. We'll be focusing on using the tools to make sure that you'll effectively use the right tool for the job! There are all sorts of little hidden treasures in SSMS (SQL Server Management Studio), Profiler, Computer Management, etc. and I've been working with the Tools team to make sure I know about them and I will certainly let you know about them! There will even be a few additional folks present (SQL folks) to make sure that everyone stays on track, focused and that all your exercises go smoothly. And - to make things even better, I’ll be providing all of the exercises and demo scripts (and of course, printed slides) so that you can take everything home and continue to play! This is an exciting one (and definitely limited in the number of seats, hint, hint :)) and the early bird discounts end on April 15 (fyi).
Check out this link for more details: http://www.msteched.com/content/precons.aspx#DBA001.

Microsoft Tech*Ed Europe
Amsterdam, the Netherlands - July 4, 2005 - 10:00 am to 5:30 pm
Taming the Tools
Effectively Using and Understanding SQL Server 2005 Tools
OK, so this might be a tad misleading… it appears to be an exact copy of the US Tech*Ed session (at least in title) but in fact, it's not! First, the format is very different. The US session will be 5 hours – and hands-on. The European session will be lecture and demo and over 6+ hours so I’ll be adding more content, more demos and more stuff to work with. And, just like the US session, I’ll be providing all of the exercises and demo scripts (and of course, printed slides) so that you can take everything home and continue to play!  
Check out this link for more details: http://www.mseventseurope.com/TechEd/05/Pre/Content/PCTools.aspx.

And, if you're at any of these events, I look forward to seeing you there! I'm certain we're going to have some fun. Make sure to say hi and tell me you read my blog. It's always fun to meet people who read this! Oh speaking of reading this, here's a strange challenge (and probably very unlikely given the locations). If any of you attend all three... We can continue the discussions of the pre-conference seminars over dinner - in Amsterdam - on July 4th. My treat!! And - in the unlikely event that dozens of you attend all three (yeah, right) well, then beers and a not-quite-so-decadent-dinner-in-Amsterdam is on me! OK, I'm in a weird mood today... What can I say!

Finally, if you're wondering why I no longer allow comments.... ugh, blog spammers from h*ll. I'll bring back comments again after I do a bit of tweaking so I can get the blog spammers to go away!

Thanks for reading!

Categories:
Events | SQL Server 2005

Starting the week of December 6th, Microsoft will launch a series of webcasts focused on SQL Server 2005 Development. During the week of December 6th, Microsoft/MSDN will host 15 webcasts introducing all the new features of SQL Server 2005 for Developers. Then, starting in January, there will be at least one webcast per week focused on specific developer topics for SQL Server 2005.

There are a bunch of benefits to watching this series (in addition to the wealth of information provided ;):

  • Everyone who attends a webcast will receive a copy of the Beta 2 Resource Kit and the Beta 3 Resource Kit when it ships. 
  • The first 1500 people that watch 5 or more webcasts will also receive a special, limited edition SQL Server 2005 Webcast T-shirt.
  • All webcast viewers will be entered into a competition to win an XBOX, one for each day of the week. The official rules are here.

The official site with session information, dates, times and abstracts is: http://msdn.microsoft.com/SQL/2005Webcasts and if I hear of any updates I will also keep you posted on further details, etc...

Categories:
Events | Resources | SQL Server 2005

While getting ready for the PASS Community Summit in Orlando this week (where I plan to talk about SQL Server 2005 Partitioned Tables/Indexes), I realized we were close but not quite ready to release the post-beta II whitepaper on MSDN.

So, in preparation for PASS we decided to post a pre-release of the partitioning paper here.

Enjoy!

See you in Orlando... eventually?!

Categories:
Events | Resources | SQL Server 2005

I began setting up for Profiler and Performance Monitor by first launching Profiler and then launching Performance Monitor within it (there's an icon for PerfMon second from the Right or you can select Performance Monitor from the Tools Menu). I had already created a Performance Monitor Counter Log so all I needed to do was start it (more details on PerfMon Counter Logs later).

 

Tips:

  • If you want to correlate PerfMon events with Profiler you will need to do this from SAVED Profiler Traces and PerfMon logs. Real-time analysis is not supported.
  • The correlation is done based on the time of the events so it is important that the times are correct. You should make sure to either:
      • Run these on the same machine
      • Make sure that the machines are in sync in terms of time.
  • The correlation is a bit tough if there are TONs of events as PerfMon’s granularity for sampling is 1 sec and Profiler can return a large number of events in a single 1 sec range… SO – filtering (and you can do this later) will be a very good idea. Regardless, this is a GREAT feature to see how certain counters (memory, disk, CPU, etc.) look over the course of long running activities.

 

If you're not familiar with creating a Performance Monitor Counter Log click here.

 

Ok, so that was mainly setup. For my first real demo I wanted to show Profiler and a few of my favorite new features. Here’s a quick list:

 

When creating a trace there are ONLY two dialogs: General and Events Selection. The General dialog is similar to the old General dialog where you choose the template, the save to file and/or table options, etc. The Events Selection dialog however, is ALL NEW. The Events Selection Dialog has a ton of new and incredibly useful changes:

  • Only data columns that produce values for each of the specific events are shown and only those data columns can be selected (check boxes). See this dialog here: EventSelectionDialog.jpg (56.51 KB). Notice that there are blank spots where an event doesn't produce a data column.. yeah!
  • Filters apply at the column level. See this dialog here: EditFilterDialog.jpg (16.74 KB). And you can state whether or not events that do NOT generate a value for the filter are sent (yeah!!!). However, I do remember having a bit of trouble with this one...
  • You can pause an active profiler trace, change the events and/or data columns and restart it… all of the additional events go to the same trace file and you do not lose the previously started trace, etc.
  • Deadlock graphing – this is an Event – and it produces a graphic display of the spids that were involved in the deadlock. You can “extract” these events (right click on the event “Deadlock Graph“ and choose “Extract Event Data“) and then open this xdl file within SQL Management Studio so you can spend more time analyzing it.

So, once set up I created a deadlock scenario, showed the profiler output for deadlock graphing, paused the trace, removed the deadlock related events (really only because I could :)) and then opened another window to generate some simple activity. So – speaking of the simple activity... I created a “demotable” that had defaults for all columns and then I created an insert statement using the DEFAULT VALUES clause. To get this to execute repetitively I could have written a complex (not!) loop such as WHILE 1=1 but instead I chose to use GO N (mostly because people don’t tend to know about this one). Here’s the full extent of my “code”:

 

            INSERT DEMOTABLE DEFAULT VALUES

            GO 100000

 

This will cause that statement/batch to execute 100000 times.
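
If you want to try this yourself, here is a minimal sketch of such a table. The column names and defaults are made up (the demo table itself wasn't shown); the only requirement for DEFAULT VALUES is that every column can supply its own value.

```sql
-- Hypothetical demo table: every column has an identity or a default,
-- so INSERT ... DEFAULT VALUES needs no column list.
CREATE TABLE dbo.DemoTableSketch
(  id      int         NOT NULL IDENTITY
 , created datetime    NOT NULL DEFAULT (GETDATE())
 , note    varchar(20) NOT NULL DEFAULT ('demo row')
)
GO

-- GO N repeats the preceding batch N times (supported by SQLCMD and Management Studio)
INSERT dbo.DemoTableSketch DEFAULT VALUES
GO 10

SELECT COUNT(*) AS rows_inserted FROM dbo.DemoTableSketch
GO
```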

 

Once that was running, I wanted to generate more activity to profiler so I decided to go back to show another favorite from Management Studio. In Management Studio, I had two registered servers waiting: one is a SQL Server 2000 server and the other SQL Server 2005. I connected to both. One of the databases on my SQL Server 2000 system is a schema from a Microsoft.com data warehouse. The reason why this is interesting is that this database holds 300+ tables and 1300+ views (not to mention many other objects). While this isn’t the largest database out there (at 450GB) it certainly has issues in terms of finding objects and managing your “view” of what you’re trying to manage. So – in the SQL Server 2005 Management studio you’re able to create “Filtered” Views. By right clicking on Views, you can create a name filter. For example, all of their user objects have a certain naming convention which includes “WMU”, all of their rollup counts have the word “count” and the counts refer to a dimension of time (weekly, monthly, etc.)… So, if I want to see all of their user related views with counts by week, I can create this filter WMU%Count%Week. Using this filter, the object list drops from 1300 views down to only 12. And – this demo was using the SQL Server 2005 Management Studio against a SQL 2000 server. Nice.
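
If you'd rather get a similar list from a query window, something like the following is a rough equivalent. It assumes the name filter behaves like a LIKE pattern, and it reuses the WMU%Count%Week pattern from above; it runs against both SQL Server 2000 and 2005.

```sql
-- List views whose names match the same pattern used in the Management Studio filter
SELECT TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.VIEWS
WHERE TABLE_NAME LIKE 'WMU%Count%Week'
ORDER BY TABLE_NAME
GO
```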

 

So - now with a bunch of activity generated I can go back over and get the data to correlate. I stop the trace. I stop the PerfMon counter log...

 

I re-open the profiler trace (only because this would likely be done later... I'm not sure if I've ever just stopped it and then integrated perfmon... that should work?). Anyway, only AFTER the entire trace file is loaded does the menu option to integrate PerfMon counters become available. Choose “Import Performance Data” from the File dropdown and wham - IF they have corresponding times you will see how the times match (remember, if this is on two machines and the times aren't in sync then the correlation could be virtually meaningless). Also, if they don't intersect you'll get an error: Correlation is not possible because there is no intersection between trace and performance data time ranges. Not bad!

 

If the two DO correlate in terms of time, then you'll get a new dialog which will show all of the performance monitor counters that are IN the perfmon log. The idea here is to “limit” the number of counters to correlate. In fact, the dialog is called “Performance Counters Limit Dialog.” OK, so my log has only a few counters so I'll just select them all. Click Ok. Once clicked your performance monitor dialog will show both - profiler data and perfmon data. Here you can select points in time and see what performance events occurred or choose profiler events and see what the perfmon counters look like. Here's what it looks like: ProfilerPerfmon.jpg (167.51 KB). You can also highlight chunks of the graph and expand them to show only a specific time range. You do this by clicking in the graph and then dragging the mouse to create a rectangle.

 

OK, so in summary - I REALLY like this BUT I think you'll need to play with this a bit. I think it's a great feature but the range of values for correlation for Performance Monitor only being 1 second makes it a bit tough to see exactly what event caused what spike (if you have thousands of events per second in Profiler) BUT - it's good to get you close!!! And much closer than anything we've had before!
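
To get a feel for how crowded each 1 second PerfMon sample is, you can load the saved trace file with fn_trace_gettable and count events per second. This is just a sketch; the trace file path below is hypothetical.

```sql
-- Count Profiler events per second from a SAVED trace file (hypothetical path).
-- Seconds with thousands of events are the ones where correlation gets fuzzy.
SELECT CONVERT(char(19), StartTime, 120) AS second_bucket   -- yyyy-mm-dd hh:mi:ss
     , COUNT(*) AS events_in_second
FROM sys.fn_trace_gettable(N'D:\SQLDemo\DemoTrace.trc', DEFAULT)
WHERE StartTime IS NOT NULL
GROUP BY CONVERT(char(19), StartTime, 120)
ORDER BY second_bucket
GO
```

From there it's easy to add a WHERE clause on EventClass or a time range to trim the trace down before correlating.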

 

Have fun.

kt

Categories:
Events | Resources | SQL Server 2005

One of my favorite features of Management Studio is the ability to “script” a step rather than execute it. Within SQL Management Studio, each dialog has 4 different options from which to choose: the typical OK/Cancel at the bottom, plus a couple more at the top of the dialog:

 

At the top of the dialog

  • Schedule – which will create an Agent job.
  • Script – which will send the command(s) to a new Query Window in Management Studio

At the bottom of the dialog (the norm):

  • Run it right now – with the OK button
  • Cancel it – fairly obvious, eh?

The “Script” option has to be one of my favorites. You can learn from this dialog, you can cut/paste and help to build more complex scripts, you can just see what the heck the UI was going to do, etc… and well, that’s what led to SQLCMD. Something I wanted to do (from just a practical perspective) was create a simple/fast way to connect to my existing SQL Server 2000 machine, back up a database and then restore it to my SQL Server 2005 machine. One of the benefits of backup/restore is that SQL Server will upgrade the database on restore (no special settings required - this is default behavior if you restore a SQL Server 2000 backup to SQL Server 2005). Although finding the correct syntax for each step is easy through the UI, I wanted to take the overall process one step further and create a SQLCMD-specific script to automate it (yes, I kind of like the idea of a command-line tool for batch/automation, etc... and yes, I could have also done this with DTS or the SQL Agent... good idea for another blog entry! Hey guys (and you know who you are!) you should do the comparable DTS/Agent jobs?! I'll be waiting. :) :) :)).

 

Anyway, the primary goal is to leverage some of the new capabilities of SQLCMD (which I have to admit – if you’re a command line/automation person – is going to quickly become one of your favorite new features). Here’s a very quick summary of SQLCMD:

 

  • Replaces OSQL (and then some!)
  • Can set a variety of environment variables
  • Offers a -A parameter for the dedicated admin connection (only one window is available for this at a time, and the usage here is when you can’t otherwise connect to your server – for whatever reason). To ensure that you can always use this, they have kept resources available so that this should always work. I use “should” because, well, I rarely use always (or never)… :)
  • Adds -v for variable input from the command line but also supports variables set as environment variables (via SET) or variables set within the SQLCMD script (SETVAR).
  • Allows connections to be specified within the script via:
        
    :CONNECT servername
    NOTE: the leading colon is required
  • Has a few new error handling options such as
        
    :ON ERROR EXIT
    NICE: this uses RAISERROR and when a severity of 11 (I think) or higher is raised you can define what happens (EXIT, IGNORE or RETRY n) to control script flow and retry behavior (yeah!)
  • And with the exception of a few new (yet VERY cool things like this) SQLCMD is really just TSQL plus some very nice automation stuff. (I can hear TSQL geeks cheering...)

So – I created a script that does this and here’s the play by play of it:

 

I don't want to support system databases for backup/restore. You could certainly do a backup and then restore to a new name... but that will have to be in version 2.

:ON ERROR EXIT
go

 

IF '$(DB)' IN ('master', 'model', 'msdb', 'ReportServer', 'distribution', 'tempdb')
BEGIN
 RAISERROR ('System Databases are NOT supported. Script Terminated.', 16, -1)
END
go

Next, I wanted to create a connection to my 2000 server (this could have easily been a parameter!)

:CONNECT servername

 

BACKUP DATABASE $(DB)
 TO DISK = '\\servername\sharename\path\$(db).bak' -- NOTICE YOU CAN USE VARIABLES WITHIN STRINGS!
 WITH INIT
PRINT ' '
go

NOTE: Be sure to set a GO after each batch that includes changes in connections, etc. The idea is that much of the SQLCMD code (per batch) is evaluated and executed first. Without the go I would have ended up in the next connection - executing the backup. Here's a quick example of what I mean:

-- Example 1

:CONNECT SERVER1
EXEC PROC1

:CONNECT SERVER2
EXEC PROC2

-- In Example 1, both procedures execute on SERVER2

-- Example 2

:CONNECT SERVER1
EXEC PROC1
go

:CONNECT SERVER2
EXEC PROC2
go

-- In Example 2, PROC1 executes on SERVER1 then PROC2 executes on SERVER2 - as desired.

After the database is backed up I connect to the SQL 2005 server and drop the existing database... you may or may not want to do this so blindly.

:CONNECT servername

SET NOCOUNT ON

 

IF DATABASEPROPERTYEX('$(DB)', 'COLLATION') IS NOT NULL
 DROP DATABASE [$(DB)]

 

WAITFOR DELAY '$(DELAY)' -- this was just for testing...

Next, I need to determine the SQL 2005's Data Path as I didn't want to assume C:\Program Files\...

To do this I'm using an undocumented xp (cause I wanted to see if it still works? it does!)

DECLARE @SQLPath nvarchar(512)
EXECUTE master.dbo.xp_regread 'HKEY_LOCAL_MACHINE',
            'SOFTWARE\Microsoft\MSSQLServer\Setup',
            'SQLPath', @SQLPath OUTPUT

Next, I do some cool stuff to create the correct locations for the new data and log file names. This was the hardest part really... What if the database has multiple files - and the paths are NOT the same as the server from which the database was backed up? Well, now I need to do a restore with MOVE. Have you seen the syntax for RESTORE with MOVE... it's not overly pretty. Anyway, I first create a table into which I will store the output of LOAD FILELISTONLY (so I can interrogate the backup device and see the filelist of the database which was backed up):

 

CREATE TABLE #BackupFileList
( LogicalName sysname NULL
 , PhysicalName nvarchar(260) NULL -- full paths can be longer than sysname (128 characters)
 , [Type] char(1)
 , FileGroupName sysname NULL
 , Size bigint
 , MaxSize bigint
 , FileId smallint
 , CreateLSN numeric(25,0)
 , DropLSN numeric(25,0)
 , UniqueId uniqueidentifier
 , ReadOnlyLSN numeric(25,0)
 , ReadWriteLSN numeric(25,0)
 , BackupSizeInBytes bigint
 , SourceBlockSize bigint
 , FileGroupId  smallint
 , LogGroupGUID uniqueidentifier
 , DifferentialBaseLSN numeric(25,0)
 , DifferentialBaseGUID uniqueidentifier
 , IsReadOnly bit
 , IsPresent bit
)

Next, I get the info from the backup device (again, notice the parameters being used even in Dynamic String Execution!)

INSERT #BackupFileList
 EXEC('LOAD FILELISTONLY FROM DISK = ''D:\SQLDemo\$(db).bak''')

Now I reverse the physical name to find only the “file“ part of the name... reversing it to find the first string up to a backslash (from the end). Then I extract that string and reverse it back... and - at the beginning of the string, I add the local server's data path.

UPDATE #BackupFileList
 SET PhysicalName = @SQLPath + N'\Data\' + REVERSE(SUBSTRING(REVERSE(PhysicalName), 1, PATINDEX('%\%', REVERSE(PhysicalName)) -1))
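
To see the reverse trick on its own, here is a small worked example with a made-up path. The second expression is a slightly shorter alternative that gives the same result.

```sql
-- Hypothetical physical file name, just to show the string handling
DECLARE @PhysicalName nvarchar(260)
SET @PhysicalName = N'C:\Program Files\Microsoft SQL Server\MSSQL\Data\pubs_log.ldf'

SELECT
   -- reverse, take everything up to the first backslash, reverse back
   REVERSE(SUBSTRING(REVERSE(@PhysicalName), 1
       , PATINDEX('%\%', REVERSE(@PhysicalName)) - 1)) AS FileNameOnly
   -- alternative: count back from the end with RIGHT and CHARINDEX
 , RIGHT(@PhysicalName, CHARINDEX(N'\', REVERSE(@PhysicalName)) - 1) AS FileNameOnlyAlt
-- both columns return pubs_log.ldf
GO
```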

Now I'm ready to walk the filelist and build the RESTORE with MOVE database string. This completes the script!

DECLARE @LogicalName sysname
 , @PhysicalName sysname
 , @ExecStr   nvarchar(max)

 

DECLARE FileListCursor CURSOR FAST_FORWARD FOR
 SELECT LogicalName, PhysicalName
 FROM #BackupFileList

 

OPEN FileListCursor

 

FETCH NEXT FROM FileListCursor INTO @LogicalName, @PhysicalName

 

SELECT @ExecStr = N'RESTORE DATABASE $(DB)' +
       N' FROM DISK = ''D:\SQLDemo\$(db).bak''' +
       N' WITH MOVE ''' + @LogicalName + N''' TO ''' + @PhysicalName + N''''
 
FETCH NEXT FROM FileListCursor INTO @LogicalName, @PhysicalName

WHILE @@FETCH_STATUS <> -1
BEGIN
 SELECT @ExecStr = @ExecStr + N', MOVE ''' + @LogicalName + ''' TO ''' + @PhysicalName + ''''
 FETCH NEXT FROM FileListCursor INTO @LogicalName, @PhysicalName
END

 

-- SELECT @ExecStr

EXEC (@ExecStr)

 

CLOSE FileListCursor
DEALLOCATE FileListCursor
go

I set my parameters on the call to the script.
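
As a sketch of what that call can look like: the script file name and the values below are hypothetical, but DB and DELAY are the two variables the script expects. You can pass them with -v on the command line, or set them with :SETVAR in a small wrapper script.

```sql
-- Option 1 (command prompt), shown here as a comment:
--   sqlcmd -E -S servername -i BackupRestore.sql -v DB="SalesDemo" DELAY="00:00:05"

-- Option 2: a wrapper SQLCMD script that sets the variables and then
-- includes the main script with :r (BackupRestore.sql is a hypothetical file name)
:SETVAR DB "SalesDemo"
:SETVAR DELAY "00:00:05"
:r BackupRestore.sql
```

DELAY is used in WAITFOR DELAY, so it needs to be a time string such as 00:00:05.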

If you're interested in reading more about somewhat stressful demos for the BI Team read here.

Categories:
Events | Tips | SQL Server 2005
