In the last posting, I talked about the differences between DACFx 3.0 and previous versions from an API point of view. This time, I'll look at the files in a DACPAC and see how those differ.

DACPAC really stores a database model, that can be used to recreate the database and server (3.0 only) objects in the model, in an XML format. Note also that I've seen it written the a DACPAC is a simple zip file and you can unzip it to produce the same files that the "Unpack" Windows explorer shell extention (i.e. right-click on a DACPAC on a machine where DAC is installed and you'll see an "Unpack" context menu entry. This is not exactly true, as unzipping a V1.1 DACPAC on my system produces the following files:
 DacMetadata.xml
 LogicalObjectStream.xml
 PhysicalObjectStream.xml
 [Content Types].xml

And using the Unpack extention produces:
 DacMetadata.xml
 LogicalObjectStream.xml
 PhysicalObjectStream.xml
 model.sql

Note that the "unpacker" uses the object definitions in the XML files to generate a SQL script for the model. To stay on the V1.1 DACPAC for a minute, if the DACPAC has been produced using the VS2010 DAC Project, additional files may be present. These include:
 Script.PreDeployment.sql
 Script.PostDeployment.sql
 TargetSelection.xml

Pre and post deployment scripts are intuitively named. TargetSelection refers to the file the VS2010 DAC Project produces if you use a SQL target selection policy. With this project type, you can specify (editable using the ServerSelection.sqlpolicy node in the Solution Explorer for the project) which servers this DAC should be installed on. Choices include Collation, IsCaseSensitive, OSVersion, and more. Note: TargetSelection seems to be missing from the SSDT projects.

With an DACFx 3.0 DACPAC, different files appear. The "unpacker" will produce:
 DacMetadata.xml
 Origin.xml
 model.xml
 model.sql 

Note that, just with pre-3.0 versions, no .sql file appears in the unzipped version, so the .sql script is actually produced by the "unpacker". The model is strictly XML-based until unpacked. As interesting aside is that in DACFx 3.0, unpacking is supported by the API, using the Unpack method in the DacPackage or BacPackage classes. These methods support unpacking 3.0 or pre-3.0 DACPAC/BACPAC formats. The equivalent code in the pre-3.0 unpacker couldn't possibly know about future versions.

So what do these files contain?
 DacMetadata.xml - The name and version of the "application" that this DACPAC represents. Note that in 3.0, an XML namespace is added.
 Origin.xml (3.0 only) - Some metadata about where this DACPAC came from (which version, which tool, etc).

So the difference really amounts to the distinction between LogicalObjectStream.xml and PhysicalObjectStream.xml (pre-3.0) and model.xml (3.0). These files contain an XML model of a database. In appears that, in DACFx 3.0, the two XML files were consolidated into one. And all the additional database and server objects that DACFx 3.0 represents were added.

Finally, let's take a look at the model files. DACPAC pre-3.0 refers to two XML schema, in both the "Logical" and "Physical" model files. These are:
 http://schemas.microsoft.com/sqlserver/ManagementModel/Serialization/2010/11
 http://schemas.microsoft.com/sqlserver/RelationalEngine/Serialization/2010/11

I was not able to find a current version of the XSD files at the location implied by the namespace name (that's not a requirement of the XML schema spec, but sometimes it works), but I was able to find an older version of the ManagementModel schema at the schema.microsoft.com website.

In DACFx 3.0, there is reference to a single XML schema in the model.xml and DacMetadata.xml files:
 http://schemas.microsoft.com/sqlserver/dac/Serialization/2012/02

This schema appears at the location that matches its namespace name. And two additional XML schemas also exist related to DACFx 3.0. They are:
 http://schemas.microsoft.com/sqlserver/dac/DeployReport/2012/02
 http://schemas.microsoft.com/sqlserver/dac/DriftReport/2012/02

So, we've just taken a little spin through the items in a pre-3.0 and 3.0 DACPACs. The XML files and the information contained in those files have evolved greatly in DACFx 3.0. This is likely due to the integration of SQL Server Database Projects from VS2010 and the DAC Project in VS2010 SP1 into a single project type and format, that of SQL Server Data Tools, which subsumes both types of project. As for the concept of using DAC to manage "Deployment target" systems (i.e. the TargetSelection.xml file) depending on its utility and acceptence for use in database/application management, maybe this will return one day.

@bobbeauch

Looking at the API and at the serialized form (i.e. the DACPAC), it turns out that DACFx 3.0 is not just "DAC V-next". It's an entire quantum change from all other versions, including DAC 2.0, the version that introduced the BACPAC (serialized schema and data). This has some interesting repercussions with compatibility. Let's start with the "native" API (meaning, programming over the DLL directly, I'll leave the REST API used by SQL Azure Portal's Import/Export for another time).

The DACFx 3.0 API lives in a different DLL than previous versions of DAC. The old DAC (2.0 and below) lives in Microsoft.SqlServer.Management.Dac; DACFx 3.0 lives in Microsoft.SqlServer.Dac. The API is quite different with respect to classes and methods then the one I wrote about in SQL Server Magazine's Jan 2012 issue. Instead of using the DacStore and DacExtractionUnit classes from previous versions, most everything has been refactored into the all-new DacServices controller class. This new controller class contains most of the methods from the original DAC (albeit with different names like Deploy, Extract, Register, Unregister) along with new methods that generate a drift report and methods to Export and Import the BACPAC (schema and data) format. The one method that's been dropped in the new API is the Uninstall (unregister and drop database) method.

One other difference is that DACFx 3.0 can work with DACPACs and BACPACs from previous DAC versions, but can only emit DAC 3.0 formats. The upshot of this is you can't use a DACPAC from SQL Server Data Tools or SSMS 2012 with earlier versions or tools.

The nice improvement we get from the DACFx 3.0 format is support for almost every object in SQL Server 2005-2012. In addition, when using the 3.0 APIs directly, or with SSDT, there are infinitely more options for working with DACPAC, encapsulated in the DacDeployOptions or DacExtractOptions classes. These options are exposed by SSDT as project properties, and its command-line utility, SqlPackage as command-line parameters as well. Some random examples would be the deploy option to "BlockWhenDriftDetected" or the extract option to "IgnoreUserLoginMappings".

You can read the documentation for the new classes in Microsoft.SqlServer.Dac here.

@bobbeauch

I can script out a database schema (and optionally data) from SQL Server databases. Any version from 2005 and beyond, as well as SQL Azure Database. And most all instance level objects.  And I'm not the SMO scripter object. Or a DBA. Who am I?

I would be DACFx 3.0. And I can do some things (unlike SMO scripter, or a busy DBA) like script out deltas (schema only), given any two databases or DACPACs. Or any combination of SSDT project, database, and DACPAC. 

When DAC (that's data-tier applications, not dedicated administrator connection) was introduced back around SQL Server 2008 R2, there were a subset of "most likely candidate" database objects it could script. This increased over time to include all SQL Azure database objects. But in DACFx 3.0, which appears in SSMS 2012, SSDT (along with its command-line brother sqlpackage.exe), and soon the Azure portal, there is almost full-fidelity up to SQL Server 2012. There are still some rough edges being worked on (along with enhancements), both in the functionality and in the integration with SSMS, as you might expect.

SSMS and SQL Server's PowerShell provider are both built with SMO, and SMO's functionality roughly follows what SSMS supports (e.g. before SMO 2012 there was no SSMS/SMO support for extended events; now there is, in both places). SMO is much, much more granular in its API... think "almost every SSMS dialog is built over SMO". DACFx providers a discrete set of (less than 10) utility functions. But in SSMS 2008 R2, 2008 R2 SP1, and 2012, DACFx makes its presence felt. For example in SSMS 2012, we have, in SSMS Object Explorer:
 
Databases Tab:
    Deploy (from DACPAC)
    Import (from BACPAC)
Individual Databases/Tasks:
    Extract (to DACPAC)
    Deploy Database to Azure (Migrate via BACPAC)
    Export (to BACPAC)
    Register (to MSDB, to master on SQL Azure)
    Upgrade (existing registration and schema)
    Unregister

In SSMS, these functions are fairly rigid, i.e. there aren't many options, failure to comply with the "default options" can cause failure. But with SQLPackage.exe and SSDT and the DACFx API, a plethora of options exist (albeit not as many as SMO scripter, but some of scripter's options are for code formatting). These options live in the DacDeployOptions and DacExtractOptions classes in the API (that is Microsoft.SqlServer.Dac.dll) and are exposed as command-line switches in SQLPackage.exe or properties in the SSDT project properties page. And documented in the SQL Server documentation. 

If you don't want to install SSDT on your machine, you can download DACFx 3.0 as part of the SQL Server 2012 Feature Pack. It's called "Microsoft® SQL Server® 2012 Data-Tier Application Framework".

More to come...

@bobbeauch

After my last adventure, I thought it would be interesting to try the "upgrade path" with SSDT and older "Database Projects" (i.e. Visual Studio for Database Professionals, or whatever the last name was before we went to the nice SSDT acronym, I always called them "Data Dude" projects after Gert Drapers, the original Data Dude).

Note first-off that when you have both VS2010 and SSDT projects installed into Visual Studio, there are two main menus, one called "SQL" and the other called "Data". In case you can't guess, "Data" is the one that goes with the "older" projects (it contains an entry for "Data Compare", a feature that SSDT doesn't have yet). And "SQL" is the SSDT project menu. There's a few "dualing toolbars" too; luckily for me, I'm not a big toolbar user in VS.

At least in VS2010 the Database Projects were always version-specific in name so I know that I can't target SQL Server 2012 in a "SQL Server 2005 or 2008 database project". Easy. Opening one of these projects brings up a "Convert to SSDT" dialog. This works as advertised, just realize that there are some "artifacts" that are not converted because the corresponding feature doesn't exist in SSDT yet (e.g. Data Generation Plans). Backing up the old project is a good idea in any case. See this page for a list.

Finally, there are Data Tier Application projects. These support SQL Azure and the subset of database objects that SQL Azure supports. You can create a DAC Project in Visual Studio after the SSDT tools are installed. Interestingly (but expected), you can't import database objects from a SQL Server 2012 database, even if the database only contains the subset of objects that DAC supported in DAC V2.0, you're presented with an "Unable to connect to database" message. Surprisingly, if you set your SQL Server 2012 database to Version 90 or 100 compatibility mode (thats SQL2005 or SQL2008 compatibility), creating a DAC project and importing the database now works! (if it contains the right subset of objects). And of course the project type would work (at least at this point in time) with a SQL Azure database or importing a V2.0 (but not V3.0) DACPAC. Bringing up an existing VS DAC project brings the (at this point) familiar "Convert" dialog. Bear in mind that DAC projects could have some of the same unsupported artifacts as Database Projects. 

Got it? So now you can co-exist with both/all sets of SQL database projects, database-version willing. There are some nice new features in SSDT that make it compelling to convert, and the "catch up" features may be added soon, because SSDT will be updated "on a SQL Azure cadance". That's once every 3-6 months. And, come Visual Studio 11 (available in CTP/beta now), there will ONLY be SSDT projects. So...Enjoy!

@bobbeauch

I'm the kind of person that doesn't often/always use Visual Studio "autodeploy" (that is, the "Deploy Solution" option) for SQLCLR projects. It's always been missing things (like ALTER ASSEMBLY) and never quite kept up with the newer things you could do with SQLCLR (e.g. ordered table-valued functions, multiparameter aggregates). But every once-in-a-while, especially while I'm teaching and writing example assemblies on-the-fly, its a useful way to get things cataloged fast without writing DDL.

So when I installed SQL Server 2012 and the SQL Server Data Tools on top of an existing VS2010 installation, I was not entirely surprised that, when opening a SQLCLR project, I was prompted to "convert to an SSDT project or leave it alone". Either choice promised me that "I wouldn't be prompted again" but I didn't want that. So I simply cancelled out of the dialog.

However, VS2010 SQLCLR projects autodeploy *doesn't work* against a SQL2012 database. Ever. Even with the simplest projects. A profiler trace of the proceedings looks like some of the DDL it generates is incompatible with SQL Server 2012. Hrmph.

So now I choose the "convert" option, and expect to see a full-fledged SSDT project in Solution Explorer after the conversion. I don't. Looks like I have approximately the same project. Not so. Looking at the project properties gives me all the SSDT properties, and using "Add/New Item" on the project gives the entire list of (85 or so) items that you can add to an SSDT project. So I DO have a full-fledged SSDT project. Does not mean I'll have to import the rest of the database objects to get "autodeploy" (there's no "Deploy Solution" but there is a Publish... option) to work??

Let's try this on the pubs database (I can always re-create the pubs database if something goes wrong). And "Publish" simply publishes the assembly, leaving the rest of the database undisturbed. Turns out that is what I want, so...great.

Now, I've made the moral equivalent of a SQL Server 2012/VS2012 SQLCLR project by conversion. How do I make a new one? There no choice with SSDT but "Other Languages/SQL Server/SQL Server Database Project". Made an empty one of those. So let's make an assembly. Add/New Item...and the SQLCLR items don't appear in the list of all new items. But there is a branch off the tree for "SQL CLR" and "SQL CLR C#". The "SQL CLR" is VB.NET. So first, I need to add an assembly info file, if I want one of these. So far, so good, but there are no referenced assemblies other than .NET 4.0's System.Dll (the properties window, SQLCLR tab, show Assembly Info and fill in the dialog does the same thing). It looks like it's smart enough to build a .NET 4.0 target because my target database in project properties is SQL Server 2012. Now let's add a SQLCLR UDF (the default template for this no longer returns "Hello", but String.Empty, hmmm...). And I do get System.Data and System.Xml added to the references. And Build and Publish works. BTW, during the "Publish" process, you also have the ability the generate a publishing script rather than publish in real-time. That's nice.

So, I'll go through of compare the generated DDL later and see if I can do anything in SQLCLR 2012 that SQLCLR 2008 autodeploy (Publish/Deploy) can't do. For now, I at least have a way to replicate the functionality of SQLCLR projects. That's works with a SQL Server 2012 (and 2008 and 2005, though I didn't try each one, I'll take them at their word, for now) database. Whew! But, once you've converted the original VS2010 project (unless you choose, "save old project" during conversion) you can't go "home" anymore... You're in SSDT-project-land now.

@bobbeauch

I recently completed another update to the whitepaper, now entitled "SQL Server 2012 Security Best Practices - Operational and Administrative Tasks" to cover best usage of the security features in SQL Server 2012. And, concurrently with the launch of SQL Server 2012, it was available on the web a few days ago. Many thanks to the reviewers. Enjoy!

@bobbeauch

Categories:
Security | SQL Server 2012

Hope that you'll find time in your schedule to attend DevWeek 2012 in London, March 26-30. I'll be doing a preconference talk, an entire day of SQL Server 2012 for Developers. In addition, I present some breakouts on SQL Server security best practices, Extended Events and the new profiler, Mission-critical SQLCLR and other relevent topics. Here's hoping to see you there, drop by and say hi!

@bobbeauch

Categories:
SQL Server 2012

Back when I was saw SQL Azure was producing an @@version string that mentioned version 11 (that's SQL Server 2012's major version number), I'd started looking at the version number every week or so. And looking for SQL Server 2012 functionality in SQL Azure. See the blog post "SQL Denali T-SQL features in SQL Azure now" for a list of the first functions supported. I'm not going to list them all again here. And Ed Katibah's (a.k.a Spatial Ed) latest list of SQL Server 2012 spatial functionality in SQL Azure.

This week, I'd thought to look at the SQL Azure version again. I get:

Microsoft SQL Azure (RTM) - 11.0.1831.30
 Jan 27 2012 23:11:55
 Copyright (c) Microsoft Corporation

This is different (newer) than when I previously looked, so its time to look for new T-SQL 2012 functions again. And to find some. 13 of the 14 new T-SQL scalar functions (all of them except TRY_CONVERT, I wonder why that one didn't make it) are in. That would be: FORMAT, PARSE, TRY_PARSE, IIF, CHOOSE, CONCAT, EOMONTH, and the new "date-from-parts" series.

In addition, the T-SQL 2012 updated version of the T-SQL LOG function that allows a logarithm's base as a second argument works in SQL Azure as well.

There's also one more new function that isn't mentioned in the books online (I hit it by mistake when I thought I was somehow mistaken about there being a TRY_CONVERT and it works in SQL Server 2012 RC0 as well), that is TRY_CAST. It's analogous to TRY_CONVERT, as CAST is analogous to CONVERT, modulo CONVERT's expression and style arguments.

This isn't a complete implementation of SQL Server 2012's T-SQL improvements in SQL Azure, though. SQL Azure is still waiting for:

1. Sequences
2. UTF-16 collations with supplimentary characters
3. Windowing (OVER clause enhancements, LAG/LEAD and others, new analytical functions)

I didn't notice any change in spatial support moving more towards to SQL Server 2012 feature set this time, maybe Ed noticed something.

Well, a bit at a time, I guess. Welcome the new T-SQL functions to SQL Azure.

@bobbeauch

I've said before how thrilled I am will the new extended event UI in SSMS for SQL Server 2012. However, you might be one of the early adopters who made up their own scripts to define extended event sessions, and use hardcoded scripts to harvest the results. So, you may run into this problem if you're using what I call "lazy XML" in the event harvesting script. Take, as an example, an extended event session defined as follows with 3 actions:

create event session errorsession on server
add event sqlserver.error_reported
(
action
(
package0.callstack,   
sqlserver.session_id,  
sqlserver.sql_text    
)
where error = 547 and package0.counter <= 3 
)
add target package0.ring_buffer
with (max_dispatch_latency = 1 seconds)
go

In previous versions, you could pretty much depend on the XML presenting the actions in order. So the following XML would return a subset of the event information in rows and columns.

SELECT
  Data2.Results.value ('(data/.)[1]', 'int') AS ErrorNumber,
  Data2.Results.value ('(action/.)[2]', 'int') AS Session,
  Data2.Results.value ('(action/.)[3]', 'nvarchar(max)') AS SQLStatement
from
(
select CAST(xet.target_data as xml) as data
from sys.dm_xe_session_targets xet
join sys.dm_xe_sessions xe
on (xe.address = xet.event_session_address)
where xe.name = 'errorsession') events
cross apply Data.nodes ('//RingBufferTarget/event')
  AS Data2 (Results)

The XML above is based on the assumption that action 2 is the sqlserver.session_id and action 3 is the sqlserver.sql_text, because it doesn't check names, just blindly uses the ordinal number in XPath. It makes the XML easier to write and a bit faster to execute, but its "lazy" XML. This order of actions was a valid assumption in SQL Server 2008; its not valid in SQL Server 2012. The data fragment containing the action data appears like this:

    <action name="sql_text" package="sqlserver">
      <type name="unicode_string" package="package0" />
      <value> 
        DELETE pubs.dbo.jobs
      </value>
    </action>
    <action name="session_id" package="sqlserver">
      <type name="uint16" package="package0" />
      <value>53</value>
    </action>
    <action name="callstack" package="package0">
      <type name="callstack" package="package0" />
      <value>..callstack elided...</value>
    </action>

So now, sql_text (the third action defined) is the first action presented. The fragile harvesting script will break. So, to be one the safer side, if you have any such scripts change them to actually look for the element for want using a named XPath predicate, like this:

SELECT
  Data2.Results.value ('(data[@name="error_number"]/.)[1]', 'int') AS ErrorNumber,
  Data2.Results.value ('(action[@name="session_id"]/.)[1]', 'int') AS Session,
  Data2.Results.value ('(action[@name="sql_text"]/.)[1]', 'nvarchar(max)') AS SQLStatement
from
(
select CAST(xet.target_data as xml) as data
from sys.dm_xe_session_targets xet
join sys.dm_xe_sessions xe
on (xe.address = xet.event_session_address)
where xe.name = 'errorsession') events
cross apply Data.nodes ('//RingBufferTarget/event')
  AS Data2 (Results);

Note that it's probably better to do this with the "data" items too (ErrorNumber in the query above), although you may be a little safer with these, as they have a defined schema per-event. But BOL does point out that events have a "versioned" schema. The actions can be defined in any order, so make your scripts more robust. Use XPath predicates. And if you're not on SQL Server 2012 yet, change your scripts now.

@bobbeauch

OK, back to PathName() with AlwaysOn, which I started on in the previous blog post. PathName() in SQL Server 2012 doesn't return the computer name by default ("AlwaysOn1" or "AlwaysOn2" in my example) but returns the VNN name (virtual network name). That is, it returns the availability group "listener share" name. In my case, the PathName() would start with \\AlwaysOnAG1. There's an additional option in SQL Server 2012 PathName(), that allows you to return the current replica name. So, when "AlwaysOn1" is the current primary replica, it returns that; when we fail over to "AlwaysOn2", that's what name is returned. In addition, GetFileNamespacePath() and FileTableRootPath() always return the availability group listener name, they don't have an option to return the current replica name.

Finally, I'll discuss using AlwaysOn, FilePath(), T-SQL access and secondary replicas.In theory, ADO.NET 4.02 (and, of course, SQLNativeClient 11ODBC/OLE DB and Microsoft JDBC 4.0 driver) allows you to declare your ApplicationIntent as ReadOnly or ReadWrite. ReadWrite is the default. Adding either ApplicationIntent=ReadOnly or ApplicationIntent=ReadWrite in the SSMS Connection dialog is accepted, but either one allows connection to the secondary replica. This is because, by default, secondary replicas are configured with "Readable Secondary=Yes" rather than "Readable Secondary=Read-intent". Configuring a secondary as "Readable secondary=Read-Intent" disallows connections that do not specify ApplicationIntent=ReadOnly in the connection string. Obviously, connecting as "ReadWrite" to a ReadOnly replica errors out if you try and do something other than read. Check out Sunil's excellent blog post on the motivations for these secondary replica settings.

Now that we're connected (in T-SQL) to the secondary replica, let's try and get the PathName() from here:

SELECT Name, file_stream.PathName() FROM NWTab;

Msg 33447, Level 16, State 1, Line 1
Cannot access file_stream column in FileTable 'NWTab', because FileTable doesn't support row versioning. Either set transaction level to something other than READ COMMITTED SNAPSHOT or SNAPSHOT, or use READCOMMITTEDLOCK table hint.

Now, this was surprising...I checked the snapshot info in sys.databases and also DBCC USEROPTIONS. Neither snapshot isolation level was turned on or being used. And setting the iso level to, say, repeatable read or even the dreaded read uncommitted didn't help. But using the READCOMMITTEDLOCK did. Hmmm...

But of course, Sunil had the answer for this one too..."To avoid this blocking completely, all transaction isolation levels used in an application on the secondary replica are mapped transparently to Snapshot Isolation..." And this behavior is more obliquely documented in the BOL, here. But Sunil's blog post does go on to say "You may ask how about locking hints? Well, all locking hints are ignored." But the READCOMMITTEDLOCK locking hint does work in this case (modulo the effects it might have on performance, as Sunil mentions). So at this point I wonder if the READCOMMITTEDLOCK locking hint on a FileTable has a special dispensation (other locking hints, do, in fact, fail to have the desired effect), but I'll need to revisit this in 2012 RTM. BTW, it's only using the file_stream column of the FileTable that causes this, other columns work fine. But PathName() and GetFileNamespacePath() are methods on the column, so you can't use these, or access the column in T-SQL, on readonly secondaries without the hint.

Back to BOL comment, changing the NON_TRANSACTED_ACCESS to READ_ONLY (on the primary replica) changes the behavior on the secondary as promised, but now I can't write to the FileTable using the file system. As expected.

Two more things to mention. Firstly, Msg 33447 isn't a mistaken error message. You can, of course, attempt to access the file_stream column in a FileTable from the *primary* replica using Snapshot isolation. Then you'll receive the error, and changing the iso level WILL help. When enabling read-committed snapshot or snapshot you DO get this useful (but slightly misleading, IMHO) message on the ALTER DATABASE DDL statement: "When the FILESTREAM database option NON_TRANSACTED_ACCESS is set to FULL and the READ_COMMITTED_SNAPSHOT or the ALLOW_SNAPSHOT_ISOLATION options are on, T-SQL and transactional read access to FILESTREAM data in the context of a FILETABLE is blocked.". That's not quite true; if the options are on and you *USE* snapshot iso levels, you'll get the error. Changing the iso level to a non-snapshot iso level or using the READCOMMITTEDLOCK hint WILL succeed on the primary.

Finally, remember that "ordinary" (i.e. non-FileTable) tables containing filestream columns DO support snapshot isolation levels as of SQL Server 2008 R2. It should be only the FileTable, with its non-transacted access, that has this behavior with snapshot isolation on the primary and readonly secondaries. And, in fact, this does test out to be the case, an "ordinary" filestream column works fine, even in the same database.

Enough? ;-) I even turned moderated blog comments back on (against my better judgement, the amount of spam comments is amazing), if this is a useful discussion vehicle. Or, you know where to find me... Cheers.

@bobbeauch

When I was covering the FileTable-specific functions and methods, I didn't mention PathName(), a filestream-related function, that (naturally) can be used with FileTables. This posting is about PathName(), but, mostly about what happens with FileTable in an AlwaysOn availability group configuration. There were some surprises there. None of the FileTable-specific functions and methods mention AlwaysOn, but PathName() mentions it. There is an additional parameter you can specify that matters if you use PathName() in an AlwaysOn availability group environment.

Remember that, in addition to FileTable, there have been enhancements to filestream in SQL Server 2012. Namely, you can have multiple filestream containers (filespecs) per-filestream filegroup. This has an effect on the paths returned by PathName(). In SQL Server 2008/R2, PathName() returns a path that looks something like this:

\\ZMV08\MSSQLSERVER\v1\Northwind\dbo\Employees2\Photo\DF707B77-9FA4-4837-91C6-363E23BCEBDD.

In SQL Server 2012, an analogous request would produce a path that looks like this:

\\ZMV04\MSSQLSERVER\v02-A60EC2F8-2B24-11DF-9CC3-AF2E56D89593\FTTest\dbo\Documents\file_stream\D46814E2-633D-E111-982F-000C2959648F\VolumeHint-HarddiskVolume1.

So, for at least one reason, there's a new "version" of filestream namespace names (I'm just looking at the v01 vs. v02 part of the pathname).

So let create an AlwaysOn Availability Group setup to try it out with FileTable and PathName(). Availability group is a new feature in SQL Server 2012 that appears, at first glance, to be an big extension to the database mirroring feature. An availability group can contain multiple databases, and up to 4 secondary replicas, in addition to the primary replica. The secondary replicas can be (optionally) read-only. Using availability groups requires using Windows Failover Clustering. And, perhaps because of this, availability groups support filestream storage (and by extension, FileTables). Availability groups also can have "listeners", separate endpoints that abstract connecting to an availability group.

So I've defined two SQL Server instances, each a default instance. On machines named "AlwaysOn1" and "AlwaysOn2" (disclaimer: I never proported to have an imagination for making up names). My availability group is named "AlwaysOnAG1" and encompasses the pubs and northwind database. 'Cause we all know sample databases need to be highly available. ;-) For reference, the availability group is using synchronous-commit mode and is set up for automatic failover (which means it could do automatic, manual, or even forced failover, if need be. Both nodes are configured to be read-only when they are the secondary node. I also have a listener set up named "AlwaysOnAG1" that has a hardcoded IP and listens on port 7001.

I've decided to put a FileTable on Northwind. And, from a previous experiement, I have a FileTable on a database named "FTTest". This database does NOT participate in the availability group and only exists on the "AlwaysOn1" instance, not on "AlwaysOn2". But it produced one of the surprises.

After enabling Northwind for FileTables (AlwaysOn1 is the current primary), I define a FileTable named "NWTest". With this configuration:
 --When SSMS connected to "AlwaysOn1"
    --The share opens from SSMS Object Explorer
    --Machine "AlwaysOn2" has an MSSQLServer share, but nothing is visible underneath.
    --Using the share \\AlwaysOn1\mssqlserver, I see both Northwind's filetable and FTTest's filetable.
 

--When SSMS connected to "AlwaysOn2"
    --Machine "AlwaysOn2" has an MSSQLServer share, but nothing is visible underneath.
    --Attempting to open the share from ObjectExplorer AlwaysOn2 node produces error: "The file location cannot be opened. Either access is not enabled or your do not have permission for the same."

--When Connected to the "listener share" (i.e. \\AlwaysOnAG1\mssqlserver) I see both Northwind's filetable and FTTest's filetable. This was surprising to me because FTTest is not a member of the availability group and I'm using it's "listener share" name. I can see both FileTables even when I open the listener share from a different machine on the network (like the "AlwaysOn2" machine). Just an observation...

The file system and SMB are more tolerant of temporary outages than SQL Server connections. But that's a subject for another blog post. So, using the "listener share" I create open/edit close Somefile.txt in Notepad. Now, open the file again. Make some changes in the text. Now failover the availability group from "AlwaysOn1" to "AlwaysOn2"...

During failover, I get the "Network resource unavailable" message for my Notepad. Then, as failover completes, the message goes away. Now I can save my changes. Without incident.

So, after failover, what do things look like?
--When SSMS connected to "AlwaysOn1" (now secondary)
   --The share receives an error from SSMS Object Explorer.
   --Machine "AlwaysOn2" has an MSSQLServer share, with Northwind FileTable subdirectory.
   --Using the share \\AlwaysOn1\mssqlserver, I see FTTest's filetable (this database is not in the AG, remember), but not Northwind's FileTable subdirectories.
 

--When SSMS connected to "AlwaysOn2"(now primary)
   --The share opens from SSMS Object Explorer
   --Using the share \\AlwaysOn2\mssqlserver, I see Northwind's FileTable subdirectories/files.
 

--When connected using the listener share
   --I see only Northwind's FileTable subdirectories/files.

Notice that this is only observations with one specific configuration of the availability group. No guarentees as to what happens with other configurations.

OK, long-ish post again. But this post began with discussion of PathName() method on the filestream column. I'll conclude with that next.

@bobbeauch

So finally, we get to using FileTable's path_locator column with the methods of hierarchyid. To pick up where we left off, remember that hierarchyid has a method named GetLevel(). Given the following directory structure in the FileTable's share:

File1.txt
File2.txt
SQLFiles
  SubdirFile1.txt
  SubdirFile2.txt

I have a few more levels of subdirectory here, but hopefully, you get the idea. Issuing the query

SELECT path_locator.GetLevel() as Level, Name
FROM dbo.Documents;

show, unsurprisingly, File1.txt, File2.txt, and SQLFiles directory at level 1, and the subdirectory files at level 2. So to be sure we're getting the "right" SQLFiles directory, we could change that query in the previous blog entry to:

SELECT @pathstring = path_locator.ToString() FROM dbo.Documents WHERE Name = 'SQLFiles' AND path_locator.GetLevel() = 1;

Note also that, the way the FileTable namespace works, the dbo.Documents directory is the root of the hierarchy. We can prove it using hierarchyids.

-- Level Number of table is 0. Table PathLocator is root
SELECT FileTableRootPath('dbo.Documents') as RootPath,
       GetPathLocator(FileTableRootPath('dbo.Documents')) as Path,
       GetPathLocator(FileTableRootPath('dbo.Documents')).GetLevel() as Level;

As long are we're on that subject, let's use the rest of the hierarchyid methods on the FileTable as well.

-- an interesting way to get all direct children of the top-level SQLFiles directory node
SELECT p.name as ParentName, c.name as ChildName, c.is_directory,
 c.file_stream.GetFileNamespacePath() as ChildPath
FROM dbo.Documents p
JOIN dbo.Documents c
ON p.name = 'SQLFiles'
AND p.path_locator.GetLevel() = 1
AND c.path_locator.GetAncestor(1) = p.path_locator;

-- but we have a persisted computed column for that...
SELECT p.name as ParentName, c.name as ChildName, c.is_directory,
 c.file_stream.GetFileNamespacePath() as ChildPath
FROM dbo.Documents p
JOIN dbo.Documents c
ON p.name = 'SQLFiles'
AND p.path_locator.GetLevel() = 1
AND c.parent_path_locator = p.path_locator;

How about all children of that node, at any subdirectory level?

-- child of the top-level SQLFiles node, any
SELECT p.name as ParentName, c.name as ChildName, c.is_directory,
 c.file_stream.GetFileNamespacePath() as ChildPath
FROM dbo.Documents p
JOIN dbo.Documents c
ON p.name = 'SQLFiles'
AND p.path_locator.GetLevel() = 1
AND c.path_locator.IsDescendantOf(p.path_locator) = 1;

And of course, wind our way up the tree:

-- parents of a specific node
SELECT p.name as ParentName, c.name as ChildName, p.is_directory,
 p.file_stream.GetFileNamespacePath() as ParentPath
FROM dbo.Documents p
JOIN dbo.Documents c
ON c.name = 'SQLSubdirSomeText.txt'
AND c.path_locator.IsDescendantOf(p.path_locator) = 1;

And, we can see the obvious use of GetReparentedValue. It moves a file from one subdirectory to another. You can't just set parent_path_locator because that's a computed column.

DECLARE @oldpath hierarchyid, @newpath hierarchyid;
SELECT @oldpath = path_locator from documents where name = 'SQLServerSubdir' AND path_locator.GetLevel() = 2;
SELECT @newpath = path_locator from documents where name = 'SQL Server' AND path_locator.GetLevel() = 1;
UPDATE dbo.Documents SET path_locator = path_locator.GetReparentedValue(@oldpath, @newpath)
WHERE name = 'SubSyntheticFile2.txt';

Enjoy!

@bobbeauch

So the functions/methods that I wrote about in previous post are needed because FileTables don't store the UNC path name of the file, they store the path_locator as a hierarchyid data type. Wonder what encoding scheme they're using. Let's see, by doing

SELECT path_locator.ToString(), Name
FROM dbo.Documents

We get hierarchyid strings that look like this: "/192992825631153.73945086322524.2119705196/" Turns out that the encoding scheme involves newid() as you can see by looking at the definition for the default constraint for the path_locator column. It looks like this:

convert(hierarchyid, '/' +    
convert(varchar(20), convert(bigint, substring(convert(binary(16), newid()), 1, 6))) + '.' +
convert(varchar(20), convert(bigint, substring(convert(binary(16), newid()), 7, 6))) + '.' +
convert(varchar(20), convert(bigint, substring(convert(binary(16), newid()), 13, 4))) + '/')

In fact, almost all of the columns in a FileTable have defaults or are computed columns. So, to create a row in a FileTable, let's say, a file named "Testfile1.txt" in the root of the file share, all that's required is:

INSERT INTO dbo.Documents(Name, file_stream) VALUES('Testfile1.txt', 0x);

But, if I have a directory named "SQLFiles" at the root? How do I create a file in that directory? You might think parent_path_locator, but that's a computed column. Well, how about...

SELECT @pathstring = path_locator.ToString() from documents where name = 'SQLFiles';
SET @newpath = @pathstring +  convert(varchar(20), convert(bigint, substring(convert(binary(16), newid()), 1, 6))) + '.'
                           +  convert(varchar(20), convert(bigint, substring(convert(binary(16), newid()), 7, 6))) + '.'
                           +  convert(varchar(20), convert(bigint, substring(convert(binary(16), newid()), 13, 4))) + '/';

INSERT INTO dbo.Documents(Name, path_locator, file_stream) VALUES('SQLFilesTest.txt', @newpath, 0x);

You could also just use "@pathstring + '1/'" but this sticks to the "native" encoding scheme. And no, newid() can't be used in a function, if that's what you're thinking.

Now that I've got a zero-length file, can I open it and edit it with Notepad.exe? Well, you can if its empty. But once there is data in the file, attempting to edit with Notepad.exe returns an error, "This request is not supported". This is because Notepad.exe uses memory-mapped files, a win32 feature which isn't supported by FileTable. Paint.exe uses them too. So you'd need to open it from a remote location. Actually, connecting the share as a network drive will work as well.

But what if there are directories named "SQLFiles" at different subdirectory levels? How can we distinguish between them? That's where the hierarchyid comes in. I'll continue with that next.

@bobbeauch

I've been working with the SQL Server 2012 FileTable feature lately. Besides learning to appreciate the esoteric features of the NTFS file system and SMB protocol, only some of which are supported by FileTables, I've been trying to work with FileTables in SQL Server using T-SQL. This turns out to be an interesting exercise, especially if you're trying to brush up on your skills with the hierarchyid data type.

It turns out that a FileTable is just like a normal SQL Server table with a filestream varbinary(max) column (named, unsurprisingly, file_stream). It uses computed columns and constraints rather extensively. Behind the scenes, SQL Server functions as a Win32 namespace owner and exposes a virtual file share. Although the namespace management and fitting non-transactional access into a transaction-based system, and its easy to think of the share as a real file share, SQL Server manages everything. And you can manage the FileTable data completely with T-SQL. This one's for the folks that think *everything* ought to be managed in T-SQL. But before trying to create and manage files with T-SQL, I looked into the FileTable specific functions, GetFileNamespacePath(), FileTableRootPath(), and GetPathLocator(). These will be useful and they have some interesting options.

GetFileNamespacePath() is exposed as a method on the file_stream column.Besides being used without options, it has two options. The first option allows you to get a full namespace path instead of a relative path. Relative path is the default. So if, for example, you have a FileTable named "Documents" on a database named FTTest on a machine named ZMV04, your full path for a file at the root named Testfile1.txt will be "\\ZMV04\MSSQLSERVER\FTTest\Documents\Testfile1.txt", and relative path would be "\Documents\Testfile1.txt". A second option lets you decide whether you want a NETBIOS name, Machine name, or Domain name in your path.

FileTableRootPath() is a global function. It produces slightly different names for the root depending on whether or not you use the first parameter, the name of the filetable. Without this parameter, FileTableRootPath() doesn't include the part of the path that includes the filetable. So, in our case its "\\ZMV04\MSSQLSERVER\FTTest". When the filetable name is specified this function returns "\\ZMV04\MSSQLSERVER\FTTest\Documents. This option makes a difference when using this method to construct full paths, as we'll soon see. There's a second option for path using NETBIOS name, Machine name, FQDN is analogous to GetFileNamespacePath().

Finally there is another global function, GetPathLocator(). This one was kind of strange, even though the use case provided by the BOL was clear enough. You'd use it if you have existing (pre-FileTable) SQL Server tables that use path locators. Path locators, if you're unfamiliar with the term, refer to storing a path name as a column value in a table. You'd use that name, stored in the database, to locate the file to open and read/write. The drawback to traditional path locators is that its up to you to keep the file location in the database in sync with the filesystem. And, of course, the files that file locators point to have no intergrated backup/restore with the database. With FileTables, management is taken care of by the database engine, along with coordinated backup and restore and other nice database features.

The twist is that FileTables don't directly expose file paths, although you can derive them using our first two functions. FileTables use the hierarchyid data type, stored in a column path_locator. The GetPathLocator() function can get you a path_locator (hierarchyid) given a valid string data type FileTable-produced path name. So, if I have a file "TestFile1.txt" in the root of a FileTable named "Documents", I can get its path locator hierarchyid by using GetPathLocator(file_stream.GetFileNamespacePath(1)). Of course, you can simply use the column value in the path_locator column, but this function may come in handy later on.

This posting has gone a little long, so I'll continue it in a subsequent posting.

@bobbeauch

... Continued from previous blog entry ... 

The point of using a Server Audit Specification with database events in SQL Server 2012 is this. In SQL Server 2008, audit is an Enterprise-only feature. Only available in Enterprise, Evaluation, and Developer Edition. In SQL Server 2012, one of the new features (to quote BOL) is: "Support for server auditing is expanded to include all editions of SQL Server. Database audits are limited to Enterprise, Datacenter, Developer, and Evaluation editions." (Note: there's no DataCenter edition any more after the recent licensing change).

So there IS a SQL Express 2012 RC0 available. Let's download it and try it out. [Starts download. Waits 5 minutes. Starts install. Done in 5-10 minutes. Nice].

There is no GUI for Server Audit and Server Audit Specification, even when you load up an Express edition into a Enterprise SSMS Object Explorer. But who needs a GUI...we can use DDL and use sys.fn_get_audit_file to read the audit log.

USE master
GO
CREATE SERVER AUDIT TestAudit2 TO file (filepath = 'c:\temp')
WHERE database_name ='AdventureWorks' AND schema_name ='HumanResources'
  AND object_name ='EmployeePayHistory' AND database_principal_name ='dbo';

CREATE SERVER AUDIT SPECIFICATION TestServerSpec2
FOR SERVER AUDIT TestAudit2
    -- ADD SELECT ON HumanResources.EmployeePayHistory BY dbo) -- note: you can't do this in SERVER AUDIT SPEC
    ADD (SCHEMA_OBJECT_ACCESS_GROUP)
    WITH (STATE = ON);
GO

As my friends in the UK would say: "Works a treat!". And Books Online is correct, attempting to define a Database Audit

Specification in SQL Express returns:

Msg 33075, Level 16, State 3, Line 1
Granular auditing is not available in this edition of SQL Server. For more information about feature support in the editions of SQL Server, see SQL Server Books Online.

So you can't do without Database Audit Specs is audit Database Level audit actions. Like INSERT, SELECT, or DELETE. So maybe you're not *supposed to* be able to specify action_id in a Server Audit Specifcation filter predicate. Although you can't specify these (action_ids in predicate filters) for Server-level objects either. Even in Enterprise edition.

Still, having seen auditing with SQL Express and only SERVER AUDIT filters, it's a MUCH more powerful and compelling feature than I'd imaged when I first heard of it. Especially after seeing folks try to shoehorn auditing using CDC. Or change tracking. Or use triggers on every action. And BTW, you are limited to 3000 characters in a filter predicate, although they seem to allow AND/OR/NOT and the standard comparison operators. No "LIKE" capability, but you don't get that it SQL Server auditing anyhow. 

In any case... enjoy auditing in SQL Express! And Standard Edition. And BI Edition. And no, I haven't tried it on LocalDB yet.

@bobbeauch

Categories:
Security | SQL Server 2012

I've always been pretty "standard" in my approach to SQL Server's auditing feature. That is, Server Audit Specifications are for auditing server-level objects and Database Audit Specifications are for auditing database-level objects. There have always been a few "Audit Action Groups" that pertain to database objects that could be specified in Server Audit Specifications. An example of this would be the Audit Action Group SCHEMA_OBJECT_ACCESS_GROUP that will audit a schema-level object use of an object's permission. Or SCHEMA_OBJECT_CHANGE_GROUP that will audit a CREATE/ALTER/DROP against a schema-level object. This has been the case back to SQL Server 2008, when the auditing feature was first introduced.

The drawback of using Audit Action Groups like SCHEMA_OBJECT_ACCESS_GROUP in a Server Audit Specification is that they audit access to ANY schema object in ANY database. Server-level actions do not allow for detailed filtering on database-level actions. This is going to generate a LOT of audit records.

Enter SQL Server 2012. One of the new auditing features allows predicates to be specified on the CREATE SERVER AUDIT level. This allows much more granular auditing using Server Audit Specifications. The way that this works is to use predicates on the SERVER AUDIT object associated with a SERVER AUDIT SPECIFICATION.

Here's an example. Here's a SQL Server 2008 way to audit SELECT on the HumanResources.EmployeePayHistory table in AdventureWorks by dbo:

USE master
GO
CREATE SERVER AUDIT TestAudit1 TO file (filepath = 'c:\temp');

USE AdventureWorks
GO
CREATE DATABASE AUDIT SPECIFICATION TestDBSpec1
FOR SERVER AUDIT TestAudit1
    ADD (SELECT ON HumanResources.EmployeePayHistory BY dbo) WITH (STATE = ON);
GO

Fire up these objects (run ALTER SERVER AUDIT TestAudit1 WITH (STATE = ON); ) and you're auditing. But in SQL Server 2012, you can do (approximately) the same thing with this:

USE master
GO
CREATE SERVER AUDIT TestAudit2 TO file (filepath = 'c:\temp')
WHERE database_name ='AdventureWorks' AND schema_name ='HumanResources'
  AND object_name ='EmployeePayHistory' AND database_principal_name ='dbo';

CREATE SERVER AUDIT SPECIFICATION TestServerSpec2
FOR SERVER AUDIT TestAudit2
    -- ADD SELECT ON HumanResources.EmployeePayHistory BY dbo) -- note: you can't do this in SERVER AUDIT SPEC
    ADD (SCHEMA_OBJECT_ACCESS_GROUP)
    WITH (STATE = ON);
GO

Note that they're not EXACTLY the same. The Database Audit Spec is auditing only SELECT, while the filtered predicate doesn't filter on action_id (the action_id for SELECT is 'SL'). This *may* be a bug, reported on Connect currently. Although the action_id is defined as CHAR(4), specifying a string causes an error: "Msg 25713, Level 16, State 23, Line 1 The value specified for event attribute or predicate source, "action_id", event, "audit_event", is invalid.". Using a number for action_id works (and using action_id is supposed to work according to BOL) but there are no specifications of numeric action_ids. Even in sys.dm_audit_actions.

Although maybe you're not *supposed to* be able to filter on action_id, and the error message text phrasing is just a red herring. Because...

(This entry is getting a bit long, but I'll tell you what the VERY interesting point of this exercise is in the next blog entry. Stay tuned.)

@bobbeauch

Categories:
Security | SQL Server 2012

When I installed CTP3 of SQL Server 2012 (on Windows Server 2008 R2 OS), I noticed that the "Service SID account" (known as the Managed Service Account) was directly available in the setup dropdown box, selected it, wrote a blog entry mentioning it, and went on. Lately, I've been looking at the local Windows groups (or lack of them).

Remember when, starting in SQL Server 2005, SQL Server would create local Windows Groups for service account and plunk the appropriate user into the group? And you assign permissions to the group? Well, in SQL Server 2012, it (almost) never does that any more.

I've done a few sample setups of various services to check, and the only local Windows Groups for service account that are even created are the Windows Groups for SQL Browser service and Analysis Services. No other services (and I've installed Database Engine, FDHost, Reporting Services, VSS Writer) use the "local Windows group created at startup" any more. And, thanks to the fact that SQL Server 2012 only runs on OSes where Service SIDs are allowed, the SSAS and Browser groups contain only the Service SID.

As far as I can see (I've not tried every configuration possible) when running on OSes where Service SIDs are available (e.g. Windows Server 2008 SP2/Windows Vista SP2 and above), SQL Server has used (enabled) them and plunked them into the Windows Group rather than using the group membership of the service account user you select at startup. It's only in Windows Server 2008 R2 and Windows7 w/SQL Server 2012 that Managed Service Accounts can be specified as the service account in SQL Server. And SQL Server 2012 can use Virtual Accounts as service accounts too. This is all doc'd here in the SQL Server 2012 Books Online.

And, we're almost rid of those local groups. Check here for the well-known service SIDs to assign permissions to.

@bobbeauch

Categories:
Security | SQL Server 2012

As I mentioned in a few previous blog entries, SQL Server 2012 runs the .NET 4 version inside of SQL Server. Your SQLCLR code may be affected by some behavior differences between .NET versions 2.0/3.5/3.5 SP1 and .NET 4.0. I like the way SQL Server chose to deal with these differences.

Although the documentation on these differences indicate that they can be handled in the configuration file, there is no sqlservr.exe.config file, and I'd suggest that you keep it that way, at least for now, until using/changing a config file with SQL Server is officially supported. See an old blog entry here for an official response to this. I've seen nothing indicating this information is changed so far.

Instead, how you handle these changes is difference-specific and requires no config file (or changes to machine.config; SQL Server doesn't change this either). Here's the differences and documented ways to handle them.

1. In .NET 4.0, user code does not catch Access Violation (and other corrupted state) exceptions by default. You could catch these exceptions in previous versions of .NET. To continue to catch these in SQL Server 2012, use the System.Runtime.ExceptionServices.HandleProcessCorruptedStateExceptionsAttribute attribute on the method that contains the exceptions catch block.

2. In .NET versions previous to 4.0 Code Access Security (CAS) policies are always enabled. In .NET 4.0, CAS is not enabled by default. SQL Server uses CAS as part of its permission_set implementation, and so SQL Server always enables CAS.

3. .NET 4.0 introduces differences for .NET's internal sorting (because it uses Unicode version 5.1), and .NET 4.0 throws an Invalid Format Exception on format errors in System.TimeSpan previous versions didn't. Although you probably won't use .NET's internal sort ordering in SQLCLR enough to notice the difference, System.TimeSpan is the .NET type that corresponds to SQL Server's TIME data type. So watch out for format error of you use TimeSpan in your SQLCLR code and update the code appropriately.

Although these last two (Unicode sorting and TimeSpan) can be overriden in a config file, SQL Server 2012 has a different, nicer, implementation. If the Database Compatibility Level is 110 (SQL Server 2012), the new behaviors are in effect. Setting Database Compatibility Level lower than 110 reverts to the old compatibility level.

There's example code in the .NET books online that illustrates these last two differences. I used these to verify that setting the database compatibility level works as advertised and that it works whether your TargetFramework version (your code) is .NET 3.5 or 4.0 (i.e. your TargetFramework has no effect in this behavior within SQLCLR code).

Now, on to investigate .NET 4.0 features that can be used to improve SQLCLR code. I've already noticed that .NET 4 supports ETW events, so you could combine XEvent ETW target trace with a .NET 4 ETW trace to determine when garbage collections are occuring, for example. More later perhaps...

@bobbeauch

Categories:
SQL Server 2012 | SQLCLR

In SQL Server 2012, SQLCLR now loads .NET 4.0 code rather than .NET 2.0 as in previous versions of SQL Server.

Note that the "main system" assemblies (e.g. mscorlib.dll, System.Data.dll) in .NET did not appear as a separate directory, between version 2.0 and 4.0; instead, the .NET 2.0 librares were updated. As an example, SQL Server 2008 loads the .NET 3.5 SP1 versions of the "main system" assemblies, but these are numbered as version 2.0.50727.xxxx. On my machine the 2.0 version of mscorlib.dll is actually version 2.0.50727.5420. The "original" version 2.0 mscorlib.dll (that shipped with SQL Server 2005 RTM) was 2.0.50727.42. See one of my previous blog posts for more information on .NET versioning in general. Even through this post references clients, the same versioning (in .NET 2.0-3.5) applies to SQLCLR as well.

Moving SQLCLR up to version 4.0 allows collection of some additional per-appdomain diagnostic information. In SQL Server 2012, this information appears in 3 additional columns in the DMV sys.dm_clr_appdomains. You can now monitor, on a per-appdomain basis, total_processor_time_ms, total_allocated_memory_kb (all memory allocations made since the appdomain was started, without subtracting garbage-collected memory) and survived_memory_kb (memory that survived the last full blocking garbage collection, referenced by the current appdomain).

I've been looking at these and gotten one particularly interesting result, so far. The master.sys[runtime].1 appdomain (the one where spatial data types and other system CLR items run) shows total_allocated_memory_kb = 0 and survived_memory_kb = 356 immediately after SQL Server 2012 starts up. It looks odd to have "surviving" memory for an appdomain that hasn't reported any allocated; I'll have to look more closely at the allocations to figure out how that could be.

Running the following code:
declare @a geometry = 'POINT(1 1)';

loads the spatial library (Microsoft.SqlServer.Types.dll) and results in a more believable total_allocated_memory_kb = 543, survived_memory_kb = 356.

This additional information should be helpful for DBAs and developers troubleshooting SQLCLR behaviors.

@bobbeauch

Categories:
SQL Server 2012 | SQLCLR

Just had the occasion to look at SSIS Sources and Destinations in SQL Server 2012 RC0. After reading about the stated direction to use ODBC for SQL Server/relational data access over OLE DB and writing up my own blog post about it, I wondered how long it would be before a native ODBC source and destination appeared in SSIS. Didn't have to wait long...its in RC0.

Following the dialogs for these components (and the new ODBC Connection Manager), they seem to use native ODBC, not the OdbcDataProvider in ADO.NET as in previous versions. The SSIS team blog mentions that they are third-party components (from Attunity). I wonder if this will be the case post-SQL Server 2012.

Although the Source Assistant and Destination Assistant appear to route SQL Server connectivity through the SQL Native Client OLE DB provider for now, and the SQL Server Destination still exists, folks that are using SQL Server with SSIS might give the ODBC Source and Destination a try. Check out the speed and functionality vs. the OLE DB Source/Destination. ODBC has been named the supported "native" API for relational data in the future.

@bobbeauch

Last week, while teaching a class based around SQL Server 2008 R2, I happened to "brag up" the new tools in SQL Server 2012, mainly the Extended Event graphic user interface in SSMS and the ability of Database Tuning Advisor to tune from the plan cache. This, of course, immediately brought up the question: "Do the new tools/features work with pre-SQL Server 2012 instances". So, I had to try it.

Database Turning Advisor's "tune from plan cache" feature works just fine with a SQL Server 2008 R2 database. I'd assume that this means it works against SQL Server 2008, not so sure about SQL Server 2005, but the plan cache hasn't changed that much since 2005. It's worth a shot. Don't have a 2005 instance handy.

The Extended Event graphic user interface, however, was another story. This node just didn't appear when connecting SQL Server 2012 RC0 SSMS to a 2008 R2 instance. Since I knew that the Extended Event GUI just used new SQL Server 2012 SMO XEvent classes, I tried writing a simple PowerShell script to get a XEStore class against a SQL Server 2008 R2 instance. The error message "Operation not supported on version 10.50.2500" appeared. And, of course, a similar error message appeared when attempting to use the PowerShell SQL Server provider XEvent "directory". Strange, as most SMO classes are supported against older versions of SQL Server, some of them back to SQL Server 2000.

Well, one out of two ain't bad, I guess. Enjoy tuning from the plan cache.

@bobbeauch

This post is about the changes in the client stack for SQL Server 2012. It's a very short post. The last two major releases of SQL Server (2005 and 2008) corresponded to .NET 2.0 and .NET 3.5 SP1 respectively. In addition, each one corresponded to a new version of SNAC (SQL Native Client, that is, ODBC and OLE DB). The JDBC driver followed, but trailed a bit behind.

Newer clients have needed to be installed to support newer features. Table-valued parameters, new data types (including XML and UDTs), MARS, and filestream file access are a few features from previous (2005,2008) versions that come to mind.

SQL Server 2012 ships with a new SNAC provider/driver, named "SQL Native Client 11.0" according to the ODBC control panel applet. The new feature changes are documented in the SQL Server 2012 Books Online. In addition, .NET version 4.02 shipped just recently to add SQL Server 2012 support. There are three changes, according to the corresponding knowledge base article. The main features correspond to the Always-On functionality in SQL Server.

There are two new connection string keywords:

ApplicationIntent
This can be ReadOnly or ReadWrite(default). When ReadOnly is used, applications can directly connect to a read-only secondary server, or can be automatically redirected to secondary servers in an availability group.

MultiSubnetFailover
This keyword can be Yes or No(default). When Yes is specified, the client supports fast failover connectivity for geographically-distributed topologies through faster detection of and connection to the (currently) active server.

In addition, the .NET 4.02 supports connection to LocalDB (the standalone, local version of SQL Server used by the Visual Studio Data Tools) by the developer. Can't find any reference to this for SNAC 11, but I'd suppose LocalDB is supported with these APIs too.

One new SQL Server 2012 feature I wrote about before that I didn't find in ADO.NET 4.02 was support for the correlated server/client trace using BID tracing and XEvents. Jerome Halmans showed this in his SQLPASS summit talk and this will be supported in ADO.NET 4.5 (the version of ADO.NET that shipped as part of the Windows 8 beta). And correlated trace IS currently supported with SNAC 11.

@bobbeauch

Since SQL Server Denali adds support for index compression of spatial indexes, using the same options as data and index compression in SQL Server 2008 and above. As opposed to what I expected at first (because spatial indexes consist of only a few small fixed-length columns), using compression with spatial indexes appears to be quite effective. With my cursory tests, I obtained between 25-40% reduction in size with ROW compression and 50-90% reduction in size with PAGE compression. Queries that use a compressed spatial index execute in about the same (in some cases, less) CPU and Elapsed time as the same queries using uncompressed spatial indexes. As spatial indexes can be quite large depending on the MAX_CELLS_PER_OBJECT and density settings, and the makeup and size of the spatial data set being indexed, compression of spatial indexes (which is restricted to SQL Server Enterprise and above editions) is a good space-saving choice.

@bobbeauch

After the last post on new warnings in SQL Server Denali Showplan, just for the fun of it, I had a look at the XML version of the showplan. And saw some fairly amazing new things I hadn't noticed before.

Everyone's been looking into the new "Batch Execution Mode" with the new ColumnStore index feature. The the new "WindowSpool" operator for use the new Windowing enhancements (listed under "Enhancements to the OVER clause" in BOL's What's New). But somehow I'd missed these (raw XML included even though these do show up in the Query Plan Properties Windows when using the appropriate iterator)

<QueryPlan DegreeOfParallelism="0" NonParallelPlanReason="MaxDOPSetToOne" MemoryGrant="1344" CachedPlanSize="24" CompileTime="2" CompileCPU="2" CompileMemory="264">
  <MemoryGrantInfo SerialRequiredMemory="512" SerialDesiredMemory="1344" RequiredMemory="512" DesiredMemory="1344" RequestedMemory="1344" GrantWaitTime="0" GrantedMemory="1344" MaxUsedMemory="1344" />
  <OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="39308" EstimatedPagesCached="3416" EstimatedAvailableDegreeOfParallelism="1" />

Optimizer Hardware Dependent Properties? Really? I'm sure I'd never seen that before. Nor thought that SQL Server even considered the "EstimatedPagesCached" when creating a plan. NonParallelPlanReason would also be a big hit, perhaps. In fact, I do want to see why some of my spatial query plans don't use parallelism when spatial indexes are involved. Just what I needed (and no, it wasn't that I set MaxDOP to 1).

Intruiged, I took a look at the XSD schema for Showplan in Denali (did you know there is an XSD schema for Showplan? It's under the SQL Server directory under "...110\Tools\Binn\Schemas"). And did a quick diff with the XSD schema for Showplan in SQL Server 2008 R2. All the new elements and attributes I've mentioned here (and more) are there in the new schema (which retains the same XML targetNamespace but does add an attribute "version=1.2". Nice. There's too many interesting items to list here, and some of them don't seem to be implemented, at least in the plans I'm looking at. Some of these look like they're for special hardware configurations.

I'm impressed by just those two new elements in the XML above. And the new "NonParallelPlanReason" attribute. Maybe I'm just easy to please...

@bobbeauch

Sort and hash warnings can be found using SQL Server profiler, but they don't appear in query plans like "Columns With No Statstics" warnings in SQL Server 2008 R2 and below. In SQL Server Denali, they DO appear in query plans. No profiler needed. I'm not sure if all the pertainant events from the trace events "Errors and Warnings" (in Denali, I'd actually look at Extended Events Category Errors and Category Warnings), but I have seen some of them.

SORT iterator warning
 Operator used tempdb to spill data during execution with spill level 1

HASH MATCH (Aggregate) warning
 Operator used tempdb to spill data during execution with spill level 1

The ever-popular Index Seek Iterator warning
 Columns With No Statistics: [database].[schema].[table].Column

JOIN Iterator (appears pre-Denali too)
 No join predicate

And I've also seen some warnings in Denali query plans that don't appear in Extended Events Errors/Warnings. These appear on the SELECT Iterator:

Type conversion in expression (CONVERT_IMPLICIT(nchar(13),[database].[schema].[table].Column,0)) may affect "CardinalityEstimate" in query plan choice,
Type conversion in expression (CONVERT_IMPLICIT(nchar(13),[database].[schema].[table].Column,0)=[@1]) may affect "SeekPlan" in query plan choice

And indeed it did affect the plan. Drastically.

Also in the SELECT iterator:
 SpatialGuess - True

I've been troubleshooting a spatial query plan, and this one has me baffled, so far. I assumed that it meant that the QP couldn't sniff the value of a spatial parameter, but I pass it in trying sproc parameter and using sp_executesql and still the warning persists. Hmmm...

I'm not sure you can get 'em all, so if you happen to see some that I haven't yet seen (especially if it DOESN'T appear as a trace/extended event error/warning), just drop me a line.

@bobbeauch

Earlier this month I noticed that SQL Azure, with the latest upgrade has a version of 11...something (actually 11.0.1467.26, to be exact). And I wondered, aside from the long-touted new spatial library, if there were any Denali T-SQL enhancements that were now available. That is, what version of "version 11" is this? I did do some cursory tests of spatial functionality, but Ed Katibah (aka SpatialEd) will be posting the long list of spatial feature tests directly, I don't want to steal his thunder on those. But here's the tests for Denali T-SQL features, tested against SQL Azure, today.

Working in SQL Azure, now:
1. ORDER BY with OFFSET and FETCH
2. THROW (adjunct to TRY-CATCH)
3. Metadata discovery system procedures (sp_describe_first_result_set and friends)
4. EXECUTE ... WITH RESULT SETS
5. FORCESCAN and FORCESEEK hints with extensions

Not working in SQL Azure yet:
1. Sequences
2. UTF-16 collations with supplimentary characters
3. New functions (CONCAT, IIF, EOMONTH, and friends)
4. Windowing (OVER clause enhancements, LAG/LEAD and others, new analytical functions)

So it appears as if T-SQL and Spatial in SQL Azure is now approximately SQL Denali CTP1-- (that's one-minus-minus). Interesting. Wonder when the rest of the functionality will appear. And will it beat the SQL Denali release to RTM?

Bear in mind, of course, that most of the SQL Denali funtionality doesn't apply to SQL Azure (e.g. Always On, security enhancements) because SQL Azure doesn't support this level of control in the first place. But you could almost (if not exactly) state that SQL Azure databases were "contained" databases all along.

Followup from a friend on the SQL Azure team: "The features from Denali that are enabled in SQL Azure are fully supported and can be safely used. There aren't any changes to terms of use of the SLAs we provide."

@bobbeauch

With the release of SQL Server Denali approaching, (some Denali features are already available in SQL Azure Database), I've decided its time to sift a few Denali features into my 5-day SQLskills Developer Skills Immersion Event on the week of October 24 in Chicago.  This will be the first time I've presented with the expanded canon, but some of Denali features are game-changers. Expanded windowing functions, for example, will change the way you look at solving common problems like running sums and balances. And expanded full-text search functionality (including semantic search) along with the filetable concept might change how you think of storage and search in your database application.

This does NOT mean that the event has changed its focus from being a hardcore performance and best-practice-oriented, developer-oriented look at SQL Server; it will still include all the topics that are required to write high-performance, robust, maintainable code on the server. I'm just moving the familar things forward a bit, to further enhance the education experience.

There should be an expanded outline up soon on the website, and there are still some seats left for this one. See you in Chicago at October's end.

@bobbeauch

Categories:
SQL Server 2012

The new Semantic Search feature of SQL Server Denali enhances full-text search by letting you do a search on the meaning of documents. To use this feature, its required that you install the SemanticsDB. The SemanticsDB ships with SQL Server or can be downloaded as part of the SQL Server Denali CTP3 Feature Pack. One you have attached the SemanticsDB to your SQL Server instance and provided appropriate access to it, you can use the STATISTICAL_SEMANTICS keyword of CREATE FULLTEXT INDEX or ALTER FULLTEXT INDEX to build semantic indices, and use them with the three table-valued functions I mentioned in an earlier blog post. So what does the SemanticsDB provide?

The SemanticsDB contains Statistical Language Models. A starter description of this can be found in Wikipedia, but these models are used to build the semantic indexes that are used with the semantic search table-valued functions. There is actually one language model internal table for each supported language. The SemanticDB is static and you can't customize it or provide your own. It's not updated during semantic index processing, in fact, you can make the SemanticDB read-only with no ill effects to the feature. It's used to make the index-building scalable, as the speed of semantic index building is not dependant on the size of the corpus of documents.

Hope this gets you intrigued about this interesting new feature.

@bobbeauch

I've been looking at full-text query plans in Denali lately, and looking at the SQL query plan for the full-text part, there isn't much to see (modulo docid lookup or lack of it, see a couple of posts ago). Everything seemed to be hidden behind the TableValuedFunction iterator for FulltextMatch function, and although there's the usual iterator info on this one, there's nothing about how they do the full-text query. Not even in the parameters passed to the TVF. So what does FulltextMatch do?

Notice I said "seemed" to be hidden. Just ran across some new extended events for FTS that are quite interesting.These are fulltext_exec_query_stats and fulltext_query_recompile. Note: I don't find these events pre-Denali. Sounds promising. Started up an Extended Event session with these, set on "Watch Live Data" (did I already say a few times how much I liked the new "extended event trace" GUI?). Along with the action for sql_text.

For a full-text query that had a predicate that read like this: "WHERE CONTAINS(file_stream, 'NEAR(("data", "SQL"), 5, FALSE)');", fulltext_exec_query_stats retrieved this lovely chunk of XML. Hope it doesn't get munged in the poster or your reader.

<Root FragmentSelectionTimestamp="0x000001ec34" QueryHandle="1241413008" IsParallel="false">
 <Scalar Name="ContainsTableSSERankForNear" InclusiveTime="0ms" RowCount="92">
  <AND Name="AND" InclusiveTime="0ms" RowCount="92">
   <Filter Name="SingleFragmentDocidFilter" InclusiveTime="0ms" RowCount="124" FilterTableStrategy="Scan" SetRangeCalls="2">
    <Filter Name="SingleFragmentSeekFilter" InclusiveTime="0ms" RowCount="124">
     <Fragment Keyword="SQL" InclusiveTime="0ms" RowCount="124">
      <PhysicalRead InclusiveTime="0ms" RowCount="19" Scans="1" LogicalReads="6" PhysicalReads="0" ReadAheads="0" LobLogicalReads="0" LobPhysicalReads="0" LobReadAheads="0" />
      <Decompression InclusiveTime="0ms" RowCount="124" />
     </Fragment>
    </Filter>
  </Filter>
  <Filter Name="SingleFragmentDocidFilter" InclusiveTime="0ms" RowCount="124" FilterTableStrategy="Scan" SetRangeCalls="2">
   <Filter Name="SingleFragmentSeekFilter" InclusiveTime="0ms" RowCount="124">
    <Fragment Keyword="data" InclusiveTime="0ms" RowCount="124">
     <PhysicalRead InclusiveTime="0ms" RowCount="18" Scans="1" LogicalReads="6" PhysicalReads="0" ReadAheads="0" LobLogicalReads="0" LobPhysicalReads="0" LobReadAheads="0" />
     <Decompression InclusiveTime="0ms" RowCount="124" />
    </Fragment>
   </Filter>
   </Filter>
  </AND>
 </Scalar>
</Root>

VERY cool. So, although there's no magic decoder ring for this one (yet, I'm working on it), you can deduce (OK, guess) the following just by reading:
   FTS in Denali has the ability to execute this function in Parallel (although I haven't made it do this yet)
   The CONTAINS predicate appears to be implemented using CONTAINSTABLE to begin with. They seem to be using a "rank for NEAR" rather than the CONTAINSTABLE rank.
   I think for "SSE" may have something to do with FTS using streaming SIMD Extensions
   The FTS portion of the query has its own cache (haven't hit the full_text_recompile event yet).
   Each FTS "Filter iterator" (for lack of a better term) has its own strategy and records its own I/O info, timing, and row count.
   BTW, There are a lot more topmost-level iterators than in this relatively simple query. There's one (sometimes multiple) for FREETEXT, TOP N, etc.

Nice. If you're tired of wondering, "what are those FTS predicates and TVFs doing", check out this mechanism for looking into it.

@bobbeauch

I've been working lately with SQL Server Denali's Semantic Search feature. There's a more complete description of this feature (as well as how to set it up, its an addition to full-text indexing) in BOL, but I just wanted to summarize my one sentence "raison d'etre" for it on a function by function basis. And mention a couple of the more obvious use cases.

First, the three functions:

SEMANTICKEYPHRASETABLE - Without a source_key argument, this is "show me all the keyphrases that appear in my entire corpus of documents". With a source_key, just substitute "...in this particular document". You have to know what's in your document collection, after all.

SEMANTICSIMILARITYTABLE - Given an exemplar document, this one is "show me the (usually TOP N) documents that have similar content to this one" on a scale of 0 to 1.0.

SEMANTICSIMILARITYDETAILSTABLE - Given two examplar documents, this one is "show me why you think these are the same" in terms of specific semantic keyphrases.
 
Here's a use case. Want to hire the perfect job candidate? Make up an exemplar resume in a standard format, filling in the qualifications of your perfect candidate. Then find, in your existing resume pile, the ones that are most like your perfect candidate. Check your work by checking the matching keyphrases for each of the TOP N candidates. Be careful though, candidates have ways of knowing what keyphrases you might want to hear... Do the interview anyway. And keep track of how semantic search's predication was.

Here's a couple more. Reading a book? Buying a product? The library or the store doesn't have *exactly* what you want? Find the closest match. And then the answer to why its the closest match.

And finally, a friend of mine, Andrew Fryer, did a blog post and set of videos about semantic search. Sounds like (from the name of the table in his blog post) that he might be matching up similar PowerPoint decks. Or at least the table is called MyDecks.

Hope this gets you going with semantic search. Can't wait till they can find similar songs and pictures, a la their presentation at PDC about it a few years ago. Maybe next release?

@bobbeauch

SQL Server Denali's new Filetable feature and full-text search are made for each other. Set up database for non-transactional filestream access with a directory name, create a table "AS FILETABLE" with a subdirectory name, drag over a bunch of files and turn on full-text search. Voila, instant search with two predicates and two TVFs. Even better, if you add the keyword SEMANTIC_STATISTICS after your column name in the CREATE FULLTEXT INDEX statement you have three new semantic search table-valued functions as well (provided that they're supported in your language). Add a property list and you can search on documents' extended properties. What could be easier? Looks quite like the functionality of the long-forgotten WinFS, if you ask me (although I don't remember any Semantic Search in WinFS, just property search).

There's just one thing. Creating a fulltext index on using a FILETABLE uses an additional internal table the fulltext docidmap.The reason for this table is to map full-text's docid (an integer) to the the FTS index's key column. In the FILETABLE's case this would be the stream_id column, which as marked as a ROWGUIDCOL. CREATE FULLTEXT INDEX will use the ROWGUIDCOL column as a key column if you don't specify a key column. FTS uses the docid to do its lookup.

But...

In SQL Server 2008, an optimization was introduced if and only if the FTS index key column was an INT or BIGINT. In the case where FTS index key column is an INT/BIGINT, no docidmap table is created when the FTS index is. Better yet, this optimization does away with the docidmap lookup step in the query plan. This step which basically adds another JOIN to every FTS and every Semantic Search query, is simply removed, improving the plan.

Because the FILETABLE's FTS key column is a UNIQUEIDENTIFIER, it can't take advantage of this optimization. You can't add columns to a FILETABLE either, so this isn't an option. Darn. One last question...because a UNIQUEIDENTIFIER (GUID) has a  bigger value space than BIGINT, what happen's when you get more than VALUESPACE_OF(BIGINT) documents in your FTS/Semantic corpus? Guess I'd need to have over 18,446,744,073,709,551,616 documents to find out...wait, how many are on that C drive again?

@bobbeauch

I've been trying this out every day or so since I'd heard about the update (originally named the July 2011 Service Release), but I'd forgotten about it for about a week. So I don't exactly know when this happened on my SQL Azure server. But tonight, around midnight, SELECT @@version returns:

Microsoft SQL Azure (RTM) - 11.0.1465.26 Aug 10 2011 22:54:49 Copyright (c) Microsoft Corporation

So cool, we're at Denali on SQL Azure. Or are we just on Denali spatial (which was the new feature that was mentioned in the announcement). Or, where are we?

First to try some T-SQL Denali features.
  The new THROW statement works, as does OFFSET and FETCH, but...
  Creating a sequence fails, as does LAG/LEAD and EXECUTE WITH RESULT SETS
 
How about spatial? Not trying to be tremendously rigorous for now, I tested some obvious things.
  The new HasM and HasZ properties work, as does the new ShortestLineTo method. Probably some other methods (like the new ST-methods on the geography type) do too.
  The items that were caught by sys.dm_db_objects_impacted_on_version_change WERE affected by the change.
  A test of the new spatial precision indicates that the Denali improved precision is being used but...
  Cannot create a FULLGLOBE type or a CIRCULARSTRING type
  The new AUTO_GRID indexes are not supported
  Attempting to use geometry::UnionAggregate produces "A severe error occurred on the current command. The results, if any, should be discarded."

And BTW, both SSMS from SQL Server Denali and SQL Server 2008 R2 SP1 connect to the updated SQL Azure version just fine. I don't have any earlier SSMS versions to try right now.

Hmmm...well that's kind of a mixed bag now, isn't it? I know we're not in Kansas anymore, but we're not all in Denali either. I'm sure someone will come up with a feature matrix for this SQL Azure version (or I will), but there's enough of a subset of new features to make it interesting. More info as it arrives.

@bobbeauch

Once upon a time (well, now, currently), there were two memory allocators in SQL Server, the single-page allocator and the multi-page allocator. The single-page allocator was used for almost everything (data buffers, caches, etc); the multi-page allocator was used for somewhat more esoteric things. Like most of SQLCLR (that's CLR code that runs inside SQL Server). Well, as the SQLOS team blog says (and I've confirmed with experiments), in the Denali release of SQL Server, there is only one allocator. And that allocator covers SQLCLR memory. The fields in some of the DMVs pertaining to memory need to change, naturally, so watch out for this if you write you own monitoring queries (and who doesn't?). 

Why is this interesting? Before Denali, on 32-bit systems, you were always futzing with the -g startup switch if you wanted to give SQLCLR more room. And on 64-bit systems (where -g is meaningless), max_server_memory might have to be reduced, because SQLCLR memory wasn't covered by max_server_memory and you needed to leave some. Well, now it is covered. The "any size memory allocator" handles it all. Nice.

Now BTW, in case this needs to be said, just because SQLCLR is using the same allocator and being coordinated with the other buffer pools, doesn't mean you should go wild and, for example, use DataSets (or other memory hogs) in SQLCLR stored procedures. Every byte that you allocate, every garbage-collected heap, is taking memory from *the rest* of the buffer pools. Remember that. If you don't, you can still be rewarded with: "Msg 6532, Level 16, State 49, Procedure your_proc, Line 0 .NET Framework execution was aborted by escalation policy because of out of memory." The SQL Server memory manager looks out for its own.

@bobbeauch

Categories:
SQL Server 2012 | SQLCLR

I'm not usually one for repeating stuff you can find in BOL. Usually, a waste of your time and mine. But this feature, for SQL Server Denali, is pretty well buried; well at least from me, and I knew what I was looking for.

I've always been interested in (well, wrote a whitepaper on) using client-side/network trace (known as BID trace) with server-side trace and coordinating/merging the traces using ETW. Extended Events and even, although its not recommended, even trace events (but DO use Extended Events), support ETW.The biggest drawback to this is that, in order to see the correlation you almost have to be running your traces in a vacuum. There's no client/server correlation ID.

Well, in SQL Server Denali, at least for some clients currently, there IS. Only the SNAC11 provider (that's ODBC and OLE DB) supports this. All the information you need to get you going (if you've done this before, if not, read the whitepaper), is doc'd in BOL here. Two cavaets: the ETW trace for clients contains one (two in some cases) columns related to IDs and correlation. One's called ActivityID, the other RelatedActivityID. The correlation ID is NOT in these, but is in the field after the UserData field. You'll see it as "ActivityID m_id:[some guid]". In the Extended Event (server) trace, it shows up as an "Activity ID Transfer Event". The other cavaet is that its not on every client-side event, only some of them.

I'll be showing client-server tracing at the SQLskills immersion event in Bellevue next week.

@bobbeauch

In the previous post, I insinuated that allowing LAG/LEAD to be sensitive to value RANGEs might help with series of data with missing values. It won't do that unless there's some logic that allows the offset that LAG/LEAD uses to be calculated on a row-by-row basis (i.e. LAG/LEAD uses an expression based on the window values). That's a little too much to expect; notice I've updated the original post to back off a bit on that one. I'll need my densified data for that.

However, a common scenario that would be helped by an expansion of use of RANGE in the window specification is rolling totals/rolling balances. Here's an example.

Suppose I wanted to calculate a 3-month rolling average (or rolling total) of sales on a per-employee basis. Looking at an employee who doesn't make at least one sale every month, I get a skewed total/average. The totals don't reflect the last three calendar months, but instead they reflect the last three months in which the employee had a nonzero sales total. Skews the figures upward a little....

SELECT  Employee,
        [Year] ,
        [Month] ,
        EmployeeTotal AS SalesThisMonth,
        COUNT(*) OVER (PARTITION BY Employee ORDER BY [Year], [Month] ROWS 2 PRECEDING) AS NumberOfMonths,
        SUM(EmployeeTotal) OVER (PARTITION BY Employee ORDER BY [Year], [Month] ROWS 2 PRECEDING) AS ThreeMonthTotal,
        AVG(EmployeeTotal) OVER (PARTITION BY Employee ORDER BY [Year], [Month] ROWS 2 PRECEDING) AS ThreeMonthAverage
FROM EmployeeSalesByMonth
WHERE Employee = 272
ORDER BY Employee, [Year], [Month];

                        SalesThisMonth Number   ThreeMonthTotal ThreeMonthAverage
272     2005    8       20544.7015      1       20544.7015      20544.7015
272     2005    9       2039.994          2       22584.6955      11292.3477  
272     2005    11     6341.551          3       28926.2465      9642.0821
272     2006    2       61206.4782      3       69588.0232      23196.0077
272     2006    3       18307.746        3       85855.7752      28618.5917
272     2006    4       33406.7043      3       112920.9285     37640.3095

Suppose I had a way to specify a time interval and use RANGE instead of ROWS. Something like: "RANGE BETWEEN '2 MONTHS' PRECEEDING AND CURRENT ROW" where '2 MONTHS' represents a time-interval data type. Then I'd get the "non-skewed" answer. Currently SQL Server Denali doesn't even allow that form the RANGE specification (RANGE BETWEEN <unsigned value specification> PRECEDING) with any data type currently. Instead, I can use the densified data (see previous post) with the rows window and get that "non-skewed" answer.

WITH SalesByMonth AS
(
SELECT Employee,
       DATEPART(yyyy,dt) as Year,
    DATEPART(MONTH,dt) as Month,
    ISNULL(EmployeeTotal, 0) as EmployeeTotal
FROM #allMonths m
OUTER APPLY (
SELECT  EmployeeTotal
FROM    EmployeeSalesByMonth e
WHERE m.Employee = e.Employee  
 AND    DATEPART(yyyy,dt) = [Year]
 AND    DATEPART(mm,dt) = [Month]
) AS t
)
SELECT  Employee,
        [Year] ,
        [Month] ,
        EmployeeTotal AS SalesThisMonth,
        COUNT(*) OVER (PARTITION BY Employee ORDER BY [Year], [Month] ROWS 2 PRECEDING) AS NumberOfMonths,
 SUM(EmployeeTotal) OVER (PARTITION BY Employee ORDER BY [Year], [Month] ROWS 2 PRECEDING) AS ThreeMonthTotal,
        AVG(EmployeeTotal) OVER (PARTITION BY Employee ORDER BY [Year], [Month] ROWS 2 PRECEDING) AS ThreeMonthAverage
FROM SalesByMonth
WHERE Employee = 272
ORDER BY Employee,
        [Year],
        [Month];
GO

                        SalesThisMonth Number   ThreeMonthTotal ThreeMonthAverage  
272     2005    7       0.00                 1       0.00                 0.00
272     2005    8       20544.7015      2       20544.7015      10272.3507
272     2005    9       2039.994          3       22584.6955      7528.2318
272     2005    10     0.00                 3       22584.6955      7528.2318
272     2005    11     6341.551          3       8381.545          2793.8483

Notice, however, that there's another subtle change in the answer. In choosing to start the time series exactly on 2005-07 (rather than the month when each employee had his/her first sale), the averages for the first couple of months for this employee are a little low. We can account for this, if we want, by basing the #allmonths table on each employee's hire date, for example. Yet another instance of "always pay attention to what you're actually measuring".

@bobbeauch
 

You've heard my rant before "measure what you think you are measuring". If not, follow the link. Here's an example using the LAG function, new in SQL Server Denali, to measure sales trends. We'll start with a view from AdventureWorksDW2008R2 (the double cast is for clarity).

CREATE VIEW dbo.EmployeeSalesByMonth
AS
SELECT  EmployeeKey as [Employee],
        DATEPART(yyyy, CAST(CAST(OrderDateKey as varchar(8)) as DATE)) AS [Year] ,
        DATEPART(mm, CAST(CAST(OrderDateKey as varchar(8)) as DATE)) AS [Month] ,
        SUM(SalesAmount) AS [EmployeeTotal]
FROM    dbo.FactResellerSales
WHERE    EmployeeKey IS NOT NULL
GROUP BY EmployeeKey,
        DATEPART(yyyy, CAST(CAST(OrderDateKey as varchar(8)) as DATE)),
        DATEPART(mm, CAST(CAST(OrderDateKey as varchar(8)) as DATE))
GO

Let's use LAG to get sales for this month, previous month, and three months ago. I'm choosing a specific employee, just leave the WHERE clause out to get everyone.

SELECT  Employee,
        [Year] ,
        [Month] ,
        EmployeeTotal AS SalesThisMonth,
        LAG(EmployeeTotal, 1, 0.00) OVER (PARTITION BY Employee ORDER BY [Year], [Month]) AS SalesLastMonth ,
        LAG(EmployeeTotal, 3, 0.00) OVER (PARTITION BY Employee ORDER BY [Year], [Month]) AS SalesThreeMonthsAgo
FROM dbo.EmployeeSalesByMonth
WHERE Employee = 272
ORDER BY Employee, [Year], [Month];
GO

The LAG function works perfectly. Or does it...? Let's look at the first few rows.

                               This Month      Last Month      Three Months Ago
272     2005    8       20544.7015      0.00               0.00
272     2005    9       2039.994         20544.7015     0.00
272     2005    11      6341.551        2039.994         0.00
272     2006    2       61206.4782      6341.551        20544.7015
272     2006    3       18307.746       61206.4782     2039.994

Although this sounded like a good idea, and the function is working as advertised, the answer isn't what we want. Look at the 2005-11 line, for example. Last month was 2005-10, but there was no sales that month, so we get 2005-9 sales. Three months ago (2005-08) there were sales, but we get zero. What gives?

We're only producing rows for months where there were sales for employee 272. Probably not what you'd want. Although we think we're counting months with this LAG function, we're actually counting ROWS. SQL Server BOL implementation of LAG states that the offset (first parameter) is "The number of rows back from the current row from which to obtain a value."

So, can we fix things to get "reasonable" values? Because LAG uses rows, we'd need to come up with some blank (zero total) rows for months where there are no sales. Some databases have a special data densification syntax, like "partition-by joins" to fill in the gaps. Using Itzik Ben-Gan's dbo.GetNums TVF, we can make a table of allMonths and all Employees between a date range. (Note: this code is a quick hack and pretty fragile, but we will end up with the right answer. It also assumes you have dbo.GetNums in tempdb, see the SQLMag article for that function).

DECLARE
 @startdt AS DATE = '20050701',  -- first date we care about
 @enddt   AS DATE = '20080731'   -- last date we care about
 
SELECT DATEADD(month, n-1, @startdt) AS dt, EmployeeKey AS Employee
INTO #allMonths
FROM tempdb.dbo.GetNums(DATEDIFF(month, @startdt, @enddt) + 1) AS Nums
CROSS JOIN
(
SELECT DISTINCT EmployeeKey
FROM dbo.FactResellerSales
) AS A; 
GO

With this table in hand, we can use an OUTER APPLY to "add in" the months with no sales, and our sales trend query becomes:

WITH SalesByMonth AS
(
SELECT Employee,
       DATEPART(yyyy,dt) as Year,
    DATEPART(MONTH,dt) as Month,
    ISNULL(EmployeeTotal, 0) as EmployeeTotal
FROM #allMonths m
OUTER APPLY (
SELECT  EmployeeTotal
FROM    EmployeeSalesByMonth e
WHERE m.Employee = e.Employee  
 AND    DATEPART(yyyy,dt) = [Year]
 AND    DATEPART(mm,dt) = [Month]
) AS t
)
SELECT  Employee,
        [Year] ,
        [Month] ,
        EmployeeTotal AS SalesThisMonth,
        LAG(EmployeeTotal, 1, 0.00) OVER (PARTITION BY Employee ORDER BY [Year], [Month]) AS SalesLastMonth ,
        LAG(EmployeeTotal, 3, 0.00) OVER (PARTITION BY Employee ORDER BY [Year], [Month]) AS SalesThreeMonthsAgo
FROM SalesByMonth
WHERE Employee = 272
ORDER BY Employee,
        [Year],
        [Month];
GO

A few corresponding rows...the sales figures now agree with the column headings

                               This Month      Last Month      Three Months Ago
272     2005    8       20544.7015      0.00               0.00
272     2005    9       2039.994         20544.7015     0.00
272     2005    10     0.00                2039.994         0.00
272     2005    11     6341.551         0.00               20544.7015
272     2005    12     0.00                6341.551        2039.994
272     2006    1       0.00                0.00                0.00
272     2006    2       61206.4782     0.00                 6341.551
272     2006    3       18307.746       61206.4782      0.00

So remember...although the default window for the OVER clause really is RANGE UNBOUNDED PRECEDING AND CURRENT ROW, the LAG function only counts ROWs, not RANGEs.

@bobbeauch  

A (couple of days late) thanks to all the folks that attended my "What's New in Denali" demo fest at Portland SQL Server User Group last Thursday. I hope the code-based explanation of the "zen" of Columnstore Index, Semantic Search/Fulltext Enhancements/FileTable, and Extended Event Graphic Session support was clear.

Modulo the heat in the room ;-), I hope you liked it.

@bobbeauch

Categories:
SQL Server 2012

So, quickly on the heels of the first window clause and last_value() question, came a followup:

OK smartie, why does last_value work fine here? I didn't have to change from the default window. What gives?

select SalesPersonID, SalesOrderID,
 first_value(SalesOrderID) over(partition by SalesPersonID order by SalesPersonID) as FirstOrderForSalesPerson,
 last_value(SalesOrderID) over(partition by SalesPersonID order by SalesPersonID) as LastOrderForSalesPerson
from Sales.SalesOrderHeader
where SalesPersonID IS NOT NULL
order by SalesPersonID, SalesOrderID

It's close to the last example, except that we're ordering by SalesPersonID instead of SalesOrderID. And, partitioning by (works like group by except that we keep all the detail rows in each group) SalesPersonID. So why DOES it produce the "right" answer for each SalesPersonID?

Remember that the query above is equivalent to:

select SalesPersonID, SalesOrderID,
 first_value(SalesOrderID) over(partition by SalesPersonID order by SalesPersonID
                                range between unbounded preceding and current row) as FirstOrderForSalesPerson,
 last_value(SalesOrderID) over(partition by SalesPersonID order by SalesPersonID
                               range between unbounded preceding and current row) as LastOrderForSalesPerson
from Sales.SalesOrderHeader
where SalesPersonID IS NOT NULL
order by SalesPersonID, SalesOrderID

And it gets the intuitive "right" answer because of the way the "range" windowing spec handles ties. Since the order by clause specifies "SalesPersonID", all the rows for the same SalesPersonID are ties. The window frame with range includes all rows with the same value (all ties) as the part of the frame. So, in the query above, SalesPersonID X has multiple (tied) rows in the window, range considers all of them.

Contrast to this, using the "rows" windowing spec, which *doesn't* consider multiple tied rows as part of the window, we're back to our "weird" answer.

select SalesPersonID, SalesOrderID,
 first_value(SalesOrderID) over(partition by SalesPersonID order by SalesPersonID
                                rows between unbounded preceding and current row) as FirstOrderForSalesPerson,
 last_value(SalesOrderID) over(partition by SalesPersonID order by SalesPersonID
                               rows between unbounded preceding and current row) as LastOrderForSalesPerson
from Sales.SalesOrderHeader
where SalesPersonID IS NOT NULL
order by SalesPersonID, SalesOrderID

A different way to prove to yourself that RANGE considers ties part of the frame and ROWS doesn't is to use the use the SUM aggregate instead of FIRST/LAST_VALUE().

-- all of SumOfTotal for a specific SalesPersonID are equal with RANGE
-- SumOfTotal is equal for each row, each SalesPersonID
select SalesPersonID, SalesOrderID, TotalDue,
 sum(TotalDue) over(partition by SalesPersonID order by SalesPersonID) as SumOfTotal
from Sales.SalesOrderHeader
where SalesPersonID IS NOT NULL

-- you get a "running total" when using ROWS
select SalesPersonID, SalesOrderID, TotalDue,
 sum(TotalDue) over(partition by SalesPersonID order by SalesPersonID
                    rows between unbounded preceding and current row) as SumOfTotal
from Sales.SalesOrderHeader
where SalesPersonID IS NOT NULL

However, notice that the running total isn't necessarily by SalesOrderID becuase without an "order by" in the OVER clause, the spec doesn't guarentee ordering by both columns. Note for example, that in the query above, for SalesPersonID 281, SalesOrderID 48327 comes after 48370 (at least it does on my machine, without additional indexes on SalesOrderHeader table).

If you truly want running total by SalesOrderID, in SalesOrderID order, add SalesOrderID to the "ORDER BY" in the OVER clause. When you add SalesOrderID to ORDER BY, now there are no ties in ordering. So either ROWS or RANGE will do the trick.

-- same answer as next query
select SalesPersonID, SalesOrderID, TotalDue,
 sum(TotalDue) over(partition by SalesPersonID order by SalesPersonID, SalesOrderID
                    rows between unbounded preceding and current row) as SumOfTotal
from Sales.SalesOrderHeader
where SalesPersonID IS NOT NULL

-- same answer as previous query, uses RANGE by default
select SalesPersonID, SalesOrderID, TotalDue,
 sum(TotalDue) over(partition by SalesPersonID order by SalesPersonID, SalesOrderID) as SumOfTotal
from Sales.SalesOrderHeader
where SalesPersonID IS NOT NULL

Hope this helps, Bob

@bobbeauch

I've been working with the new Denali T-SQL windowing functionality and ran into someone who asked about this "problem". It's almost sure to become an FAQ.

Why does last_value not always "work right"?

Take, as an example, the following query (against Adventureworks2008R2)

select  AccountNumber,  SalesPersonID, SalesOrderID,
 first_value(SalesOrderID) over(partition by AccountNumber order by SalesOrderID) as FirstOrderForAcct,
 last_value(SalesOrderID) over(partition by AccountNumber  order by SalesOrderID) as LastOrderForAcct
from Sales.SalesOrderHeader
where SalesPersonID IS NOT NULL

When you look as the results, FirstOrderForAcct seems OK, but LastOrderForAcct always returns the same number as the current account for each row. Not the last number. Why?

The reason is that, when using first/last_value with a window of rows, the default window is "range between unbounded preceding and current row". So the query above with the "weird" result is equivalent to this one.

select  AccountNumber,  SalesPersonID, SalesOrderID,
 first_value(SalesOrderID) over(partition by AccountNumber order by SalesOrderID
                                range between unbounded preceding and current row) as FirstOrderForAcct,
 last_value(SalesOrderID) over(partition by AccountNumber  order by SalesOrderID
                               range between unbounded preceding and current row) as LastOrderForAcct
from Sales.SalesOrderHeader
where SalesPersonID IS NOT NULL

If you want the "intuitive" answer for last_value, just change the window clause to:


select  AccountNumber,  SalesPersonID, SalesOrderID,
 first_value(SalesOrderID) over(partition by AccountNumber order by SalesOrderID
                                range between unbounded preceding and unbounded following) as FirstOrderForAcct,
 last_value(SalesOrderID) over(partition by AccountNumber  order by SalesOrderID
                               range between unbounded preceding and unbounded following) as LastOrderForAcct
from Sales.SalesOrderHeader
where SalesPersonID IS NOT NULL

@bobbeauch

I've been trying out the windowing extensions (the OVER clause extensions and friends) with different kinds of functions (aggregate functions, first_value, last_value() etc), and thought I'd use a SQLCLR aggregate. After all, they're just "regular" aggregates, right? You can't build a user-defined ranking aggregate in SQLCLR, so they can't be confused with one of those. And you can use 'em (user-defined aggregates) with the OVER() clause in previous versions of SQL Server. Unlike the CHECKSUM function, that the BOL points out which is the only aggregate that can't be with OVER() at all, though CHECKSUM_AGG doesn't seem to have this problem.

-- these worked before
select *,
dbo.covar_pop(i1, i2) over()
from test_covar

select *,
dbo.covar_pop(i1, i2) over(partition by i1)
from test_covar

Hope I was hopeful. Well it didn't quite come out the way I'd hoped...

select *,
dbo.covar_pop(i1, i2) over(partition by i1 order by i2)
from test_covar

Msg 156, Level 15, State 1, Line 2
Incorrect syntax near the keyword 'order'.

A built-in aggregate worked fine.

select *,
avg(i2) over(partition by i1 order by i2)
from test_covar

So it appears that SQLCLR user-defined aggregates are specifically being "called out", ie, don't work with the new Denali functionality. Perhaps the state machine in the SQLCLR UDA architecture doesn't support/scale with the intermediate ordered results. We've got some nice new built-in distribution functions (like , but I'd like to use my covar_pop (and others) with the full functionality of windowing, rather than having to wait. I've put in a connect item for it, we'll have to wait and see.

Update: Feedback from my connect item (that was fast...) is that windowing over SQLCLR UDAs "didn't make the cut in Denali. Maybe next time".

@bobbeauch

Categories:
SQL Server 2012 | SQLCLR

I'm always leery when I hear people say "statistics show that..." followed by whatever their opinion is. Scientists do it. And your users probably do it too. I worked with a product called SAS once, on statistics for response time. Got some lovely reports and statistics, *from which other folks draw conclusions*. It's important to step back every once in a while, and make sure you're measuring what you think you're measuring. And that it really does add up. I've made that into kind of a koan "I don't want to hear your conclusion, I want to see your raw data". And, if I'm really interested, I'll do the math "by hand". Amazingly (well maybe not so amazingly) the conclusion doesn't always jibe with the raw data and/or what folks *think* they're measuring.

End of rant.

So SQL Server Denali CTP3 added some new analytical functions: PERCENT_RANK, CUME_DIST, PERCENTILE_DISC and PERCENTILE_CONT. The first three, I have no problem with. The Denali BOL lists how it works and I can confirm this using sample data. Good. The PERCENTILE_CONT function though...that's a little different.

BTW, There is a slight typo in BOL for PERCENT_RANK. It says "The range of values returned by PERCENT_RANK is greater than 0 and less than or equal to 1." Actually, PERCENT_RANK can, and is, in their example, sometimes *equal to 0*. It's CUME_DIST that is greater than 0 and less than or equal to 1. Fine.

For PERCENTILE_CONT, the SQL Server BOL says: "Calculates a percentile based on a continuous distribution of the column value... The result is interpolated and might not be equal to any of the specific values in the column." That's sufficiently vague, interpolated how? And "might not be equal to any of the specific values in the column"? That's as opposed to PERCENTILE_DISC which is always equal to one of the specific values of the column, by definition.

I've also read that "PERCENTILE_CONT(X) examines the percent_rank of values in a group until it finds one greater than or equal to X." In two or three places on the web. Nope. Not true. Doesn't jibe with "might not be equal to any of the specific values in the column".

So, I'm intrigued . And, last night, tried to figure it out by running a variation of BOL sample query.

USE AdventureWorks2008R2;

SELECT Name AS DepartmentName
      , ph.Rate
   ,PERCENT_RANK() OVER(PARTITION BY Name ORDER BY ph.Rate) as Percent_rank
      ,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ph.Rate)
                            OVER (PARTITION BY Name) AS MedianCont
   ,CUME_DIST() OVER(PARTITION BY Name ORDER BY ph.Rate) as Cume_Dist
      ,PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY ph.Rate)
                            OVER (PARTITION BY Name) AS MedianDisc
FROM HumanResources.Department AS d
INNER JOIN HumanResources.EmployeeDepartmentHistory AS dh
    ON dh.DepartmentID = d.DepartmentID
INNER JOIN HumanResources.EmployeePayHistory AS ph
    ON ph.BusinessEntityID = dh.BusinessEntityID
WHERE dh.EndDate IS NULL;

Let's look at a few groups in the raw data (the three columns that matter have headers):

                Rate     Percent Rank            MedianCont
-----------------------------------------------------------------------------------------------
Executive 39.06    0                              54.32695              0.25   48.5577
Executive 48.5577 0.333333333333333 54.32695                0.5   48.5577
Executive 60.0962 0.666666666666667 54.32695              0.75   48.5577
Executive 125.50   1                             54.32695                   1   48.5577

Tool Design 8.62       0                          25                             0.166666666666667 25.00
Tool Design 23.72     0.2                       25                             0.333333333333333 25.00
Tool Design 25.00     0.4                       25                             0.666666666666667 25.00
Tool Design 25.00     0.4                       25                             0.666666666666667 25.00
Tool Design 28.8462 0.8                       25                             0.833333333333333 25.00
Tool Design 29.8462 1                          25                                                       1   25.00

For "Executive", the value is interpolated, like the BOL says. Not a value in the set. For "Tool Design", the MedianCont (PERCENTILE_CONT(0.5)) is 25, a value in the set, but not a value where percent rank is > 0.5 (its 0.4). Oh. So I ripped out the rows that produced the resultset and played with these for a while, adding new rows to each set and watching how the results changed. No joy. After yet another web search (I DID use web search, I swear, but missed this one the first 10 times) I came across this in Oracle Database SQL Reference:

"The result of PERCENTILE_CONT is computed by linear interpolation between values after ordering them. Using the percentile value (P) and the number of rows (N) in the aggregation group, we compute the row number we are interested in after ordering the rows with respect to the sort specification. This row number (RN) is computed according to the formula RN = (1+ (P*(N-1)). The final result of the aggregate function is computed by linear interpolation between the values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN).
 
The final result will be:
 If (CRN = FRN = RN) then the result is
    (value of expression from row at RN)
  Otherwise the result is
    (CRN - RN) * (value of expression for row at FRN) +
    (RN - FRN) * (value of expression for row at CRN)
".

SQL Server's numbers do agree with Oracle's for the "demo" (that's scott/tiger) database examples. I tried them (yes, I really have a "demo" database on Denali, it's lamer than even pubs. But they've replaced it now, with more robust samples DBs). So let's work these two out.

For "Tool Design":
declare @p float = 0.5
declare @n int = 6
select 1+ (@p*(@n-1)) --3.5
select ceiling(1+ (@p*(@n-1))) --4
select floor(1+ (@p*(@n-1))) --3
-- value at row 3 = 25,value at row 4 = 25
select .5*25 + .5*25 -- 25

25 is the right answer

For "Executive":
declare @p float = 0.5
declare @n int = 4
select 1+ (@p*(@n-1)) --2.5
select ceiling(1+ (@p*(@n-1))) --3
select floor(1+ (@p*(@n-1))) --2
-- value at row 2 = ,value at row 3 =
select .5*48.5577 + .5*60.0962 -- 54.32695

54.32695 is the right answer

Got it? If you think you have it, run the sample query and work out dept "Finance". Then (if you're up late like I was, and I worked out most/all of the groups in the sample query), go to sleep. You've earned it. And...you may not ever use PERCENTILE_CONT in your day-to-day work.

[Rant on] To reiterate, when looking at reports that contain statistical calculations, make sure you know what the statistic means. And check it to make sure it DOES mean that, especially if you think the numbers look "weird" (ie, don't jibe with the description). We aren't all rocket scientists. And rocket scientists may not have a background in statistics anyhow. But I'll bet they (like everyone) have opinions. And are hankerin' to "prove" them.

@bobbeauch

After downloading CTP3, I immediately (well, almost immediately) went to set the options to view Books Online from local storage. You'all do know how to do that, Help and Community off the start menu, Manage Help Settings. But downloading BOL gets CTP1 also.  Online BOL is still CTP1, last time I tried. Bummer.

So I was about to suggest a game of "find the feature", when I found the books online. It's off the feature pack site here. Direct link is here. Ignore the fact that the page says CTP1. The file says CTP3. And it is....

Just in case you haven't done this before:
1. Download the books online exe
2. Run it, choose a location to unzip the contents. They unzip into a directory called "DenaliCTP3Layout".
3. You should now fire up Help and Community/Manage Help Settings
4. You should already be configured for local help. After doing this..
5. Choose "Install Content From Disk"
6. Navigate the to "DenaliCTP3Layout" directory (you did put it where you could find it, right?)
7. Select the .msha file (HelpContentSetup.msha) and you're good to go.
8. A few minutes later, open up your new Denali CTP3 Books Online. Voila!

So *now* lets play "find the feature". With the aid of the Books Online. Much easier that way. The feature list (or most of it) is here. What's not in CTP1 IS in CTP3. Post up if you find something not in the list (Aaron Nelson's already mentioned PowerShell features and PowerShell for SSAS). Enjoy!

Update: CTP3 Books Online in web mode is live. Here. Hooray.

@bobbeauch

Categories:
SQL Server 2012

So I was awake late last night and, lo and behold, there was a web link for Denali CTP3 posted on Twitter. Downloaded and installed it. Brought it up and looked for a few new items (features) announced at SQLPASS last year and TechEd US 2011 this year that should be there. They were. Maybe it was late, but I did notice a few things along the way. So maybe they're only interesting to me.

I installed on Windows Server 2008 R2 SP1 (as the release notes suggested) and, on the "choose your service account" page, the well-known Service SIDs appeared. No need to change 'em, unless you're planning to turn off service SIDs. See my posting on Service SIDs and SQL Server here.

From there, after typing in "declare @g geometry = 'POINT(0 0)';" in a query window for good measure, I proceeded to the log viewer window. And was rewarded with the following messages:

CLR Version v4.0.30319 loaded.
Commmon language runtime (CLR) functionality initialized using CLR version v4.0.30319 from C:\Windows\Microsoft.NET\Framework64\v4.0.30319\

So SQLCLR is using .NET 4.0 now, eh? That was a feature on no one's radar. Not even my list... That seems to be a long sought-after feature, judging from the complaints on the SQLCLR forum (called .NET Framework Inside SQL Server).

From there, I meandered up to the Extended Events folder, clicked on sessions and right-clicked on the system_health session, already installed as usual. There's a nice selection "Watch Live Data"... the long-sought after Extended Events GUI. And a nice "New Session Wizard" and "New Session" (for big kids), off the  Session folder. Nice.

By then it was around 3am. Time for sleeping...I'll write about this tomorrow. And so I did, just now. Enjoy.

@bobbeauch

Categories:
SQL Server 2012

Just to let you know...

I'll be speaking at the Portland, OR, SQL Server User Group meeting this month on July 28. The topic will be "What's New in SQL Server Denali". There's a lot of new features (and more coming soon), so I'll try not to just list off the name of each of every feature, but pick out a few that I like and show 'em off. And try and leave time for Q&A. If you really, really want a list, there's one here, I recently updated it. Hope to see you there.

@bobbeauch

Categories:
SQL Server 2012

The next version of Data-Tier Applications is version 2.0. You can get a CTP of it today; where the CTP is located really gives away the game. It's located at SQL Azure Labs, listed under SQL Azure "Import/Export". The writeup on this page identifies a few interesting things:

1. It's a preview of the Denali Data-Tier Application Framework (although its not the version in Denali CTP1).
2. It's meant to be "a solution for archival and migration scenarios".
3. (Easy implication on my part) - they're concentrating on SQL Azure "database movement" for now. Although you can use it with SQL Server, coverage includes exactly what's in SQL Azure.

With respect to the second point listed, DAC 2.0 supports two new operations: import and export. This import and export is on an entire database at a time, as opposed to say, BCP, which is table-by-table. It also imports/exports data and schema at once, unlike BCP which needs a schema to be defined ahead of time. This import/export is NOT backup/restore. There's no point-in-time recovery, and, if the export is executed on an "active" database, there's no guarentee that its transactionally consistent. It's just of copy of all of the schema and data into one container called (are you ready for this?) a .bacpac. The data is actually stored row-at-a-time, in JSON (Javascript Object Notation) format.

Better than that (depending on your point of view), DAC 2.0 adds support for additional database objects and types, including ALL of the database objects supported by SQL Azure Database (but not objects not supported by SQL Azure, like Service Broker objects). Objects added in the CTP include:

Geometry, Geography, and HierarchyId data types
Spatial Indexes
Statistics
Synonyms
CursorParams
Security-Related Objects
  Permissions
  Role Memberships
  Logins and Users (I don't see Denali's Contained Database logins, but I bet these will be added eventually)


The CTP offering consists of five components: new versions of the four components in earlier DAC versions (DAC framework, SMO, SQLClrTypes, T-SQL LanguageService) and one new component: the T-SQL ScriptDom. The T-SQL ScriptDom is a component that generates SQL Server version-specific scripts for create and alter/update.

When you install DAC 2.0 CTP (the components actually show up in Add/Remove Programs as e.g. "Microsoft SQL Server "Denali" Data-Tier App Framework (CTP 2.1)", there are four such components), it does NOT replace existing DAC components. It installs side-by-side in the GAC and in the "110" subdirectory SQL Server in program files. I didn't install it on Denali CTP1 system, but installed it on an OS with SQL Server 2008 R2. It doesn't affect any GUI components, and existing VS 2010 (RTM or SP1) and SQL Server 2008 R2 (RTM or SP1) DAC 1.0 or 1.1 functionality continues to work as it used to.

Because there's no GUI components or new VS Projects for now, it comes with a command-line utility. Unless you program it using the object model, you interact using the utility for now (although the code for the command-line utility is available on CodePlex). The utility, DACImportExportCli.exe (sounds like they'll implement a Windows Azure service over it, no?), contains at least one exciting feature: BACPACs can be imported/exported from/to Windows Azure Blob storage. You just name your Windows Azure storage server and cert as command-line parameters and "it all happens in the cloud" (and no, I'm not contracting to write advertising slogans, thanks).

So, to wrap this up, both SQL Azure Database and SQL Server Denali work with the concept of the portable, movable, database. It's meant to be unencombered by server-level (or MSDB-level) objects, and supports a subset of database objects that increases with each new SQL Azure Database Service Release. And the vehicle for wrapping these databases up and moving them around is the DAC. In addition to working with multiple SQL Server versions, it also encapsulates the differences between applying database schema changes to disperate versions. As an example, at TechEd 2011, Adrian Bethune illustrated how to change the data type of a clustering key on SQL Server Denali and SQL Azure Database. It's different on Azure because you can't simply drop and re-create the clusting key, so DAC accomplishes it differently on each instance. Very interesting...

@bobbeauch

I'd been fretting over the lack of a true visualizer for the new circularstring and compoundcurve spatial types in SQL Server Denali. I'd installed SSMS from SQL Server 2008 R2, and although it works just fine for Denali spatial types (even though the binary format has changed since 2008/R2), trying to use it with either of the new "curve" types produces an empty graph pane. Because of this lack of visualization, I also wondered how these types would ever work with map websites/controls that only support Points, LineStrings, and Polygons.

Silly me. When I mentioned this to Ed Katibah (aka Spatial Ed), he pointed out that this was exactly the purpose of STCurveToLine(). I thought this method would only produce LineStrings that had as many vertices as the curve, resulting in a "square" rendition of a circle, for example. But no... this routine "densifies" the linestring, resulting in a linestring with more points, that really looks like a curve/circle. Nice! If this doesn't look circular enough (or is too many points) you can always use CurveToLineWithTolerance() instead of STCurveToLine(). Thanks, Ed! Here's a short example:

declare @g geometry = 'CIRCULARSTRING(0 2, 2 0, 4 2, 2 4, 0 2)' -- here's a circle
select @g.STIsValid() -- ensure it's valid
select @g.STNumPoints() -- 5 points in the definition

-- show in the SSMS 2008/R2 "spatial results tab"
select @g.STCurveToLine()
select @g.CurveToLineWithTolerance(0.01,0)

-- circle densification
select @g.STCurveToLine().STNumPoints(), -- 65 points
         @g.CurveToLineWithTolerance(0.01,0).STNumPoints() -- 33 points

@bobbeauch

One of the interesting spatial features in SQL Server Denali is the inclusion of spatial aggregates, namely CollectionAggregate, ConvexHullAggregate, EnvelopeAggregate, and UnionAggregate. Here's an example of unioning two squares together to make a rectangle.

create table t1 (g geometry)

insert t1 values('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))')
insert t1 values('POLYGON((1 0, 2 0, 2 1, 1 1, 1 0))')

select geometry::UnionAggregate(g) from t1

Beside being useful, they are interesting because its the first use of SQLCLR-based aggregates (that I'm aware of) as part of the SQL Server database engine code itself.

The what's new spatial whitepaper has an interesting comment about the spatial aggregates: "The new aggregates are exposed in SQL Server only and are not exposed in the underlying spatial library". I asked (spatial) Ed about this and pointed out that you can use the spatial aggregates (or any .NET-based user-defined aggregate, for that matter) on the client as well as the server. You just need to find the correct class in the library in the spatial library, which is public (and hopefully it will stay public in the released version). Here's the same aggregation in client-side code.

//collection of SqlGeometry
List<SqlGeometry> glist = new List<SqlGeometry>();
glist.Add(SqlGeometry.Parse("POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))"));
glist.Add(SqlGeometry.Parse("POLYGON((1 0, 2 0, 2 1, 1 1, 1 0))"));

GeometryUnionAggregate agg = new GeometryUnionAggregate();
// call the appropriate methods on the aggregate
agg.Init();
foreach (SqlGeometry geom in glist)
    agg.Accumulate(geom);
SqlGeometry theanswer = agg.Terminate();
Console.WriteLine("answer is {0}", theanswer);

The only strange thing is that, in T-SQL, the aggregate appears to be a static property on the geometry class (you use geometry::UnionAggregate to invoke it in T-SQL). But using reflection on the SqlGeometry class in the library reveals no such public (or private) property. But that's fine; you CAN use the spatial aggregates directly; you just need to know the name of the aggregate class. Conceivably, you could even parallelize the client-side aggregation and call Merge() at the appropriate time.

@bobbeauch

 

When I originally wrote the SQL Server Denali full-text search on properties cheap demo, I used property sets with properties (GUIDs) that I got from the SQL Server BOL example (although the example itself didn't work). As it says in BOL, CREATE SEARCH PROPERTY LIST creates an empty property list. Which got me to wondering, where do the property GUIDs for these properties come from? There must be a canonical list somewhere.

And indeed, there is. There's a list in the documentation for "Windows Properties" located on the web here. And there's a LOTS of 'em on which to search. After going back and forth from that web page a few hundred times, I thought that there must be a file that contains them in an easy-to-parse format. Well, there's one of those too; propkey.h from platform SDK. Nice.

But...having the properties/property sets is not enough. You need to have an IFilter that parses out these individual properties as well. Although I've found some third party IFilters on the web that deal with some (few) properties, I'm still looking for one that parses all the EXIF properties out of a JPG file from GPS-enabled cameras. If you find one of those, please let me know.

Happy property searching...

@bobbeauch

I was telling students this week about the story of the semicolon and SQL Server. And, when going over some Denali demos, it came up again.

SQL Server does not mandate semicolons as end-of-statement indicators as some databases do (as well as the ANSI standard). You can use a semicolon (or as many semicolons as you like, really) as end-of-statement, but its most often used as a statement separator for folks that submit multiple SQL statements in a database API (e.g. ADO.NET) CommandText. But you can also use CR-LF or even a single space (if the statements are simple enough) to separate statements in a batch.

In SQL Server 2005, they introduced some ambiguous (to the parser) keywords for the first time. These were CTEs ("WITH"), SEND, RECEIVE, BEGIN DIALOG CONVERSATION, and BEGIN CONVERSATION TIMER. With these, if they are not the first statement in a batch, the statement before them must end with a semicolon.

At that point folks started wondering if they should go back through all their code and put a semi after every statement. My current thought is "not for existing code (unless you're doing maintenance and need it), but yes for new code". Why? They could have more such rules in future and you'd need to memorize them all.

In SQL Server 2008, there is yet another rule, "MERGE" statement must be terminated with a semicolon.

And in Denali, it occurs again. The new "THROW" statement has the same requirement as "WITH"; if its not the first statement in the CATCH block, the statement before it must end with a semicolon. Same thing if THROW is used to throw an exception in a batch, a la RAISERROR, then the statement before it must end with a semi. And, just in case you decide to put a semicolon after everything that looks like a T-SQL statement, an interesting thing about TRY-CATCH blocks. I can put a semicolon after BEGIN TRY, BEGIN CATCH, or END CATCH without incident. But putting a semicolon after END TRY causes an error.

So watch those semicolons and know the rules. Or you'll run into them the hard way.

@bobbeauch

Categories:
SQL Server 2012

I'll be speaking at the New York City SQL Server user group next Thursday (Jan 27) on "What's New in the SQL Server Denali Database Engine". I'll go through almost all of the new features with demos on each one, including contained databases, additional extended event and PowerShell support, T-SQL enhancements, spatial data, and full-text search enhancements. I'll also provide the reasoning behind the utility of each feature. The meeting is being held, as usual, at the Microsoft Manhattan office. For more details and directions, visit the user group website. See you there.

@bobbeauch

Categories:
SQL Server 2012

At SQLPASS a couple of months ago, Microsoft announced SQL Server Denali and, at during the conference, handed out CTP1. Nice. But, the conference keynotes and sessions on it alluded to many more of the upcoming Denali features than appear on the CTP1. That's all well and good, but I've heard tell of one person who installed the CTP1 to show a client a feature he'd heard about, then spent time in front of the client looking for the feature (and no, it wasn't me).

To clarify things, you can reference the What's New for CTP1 in BOL. There's even a blogger (I forget the reference) who cut-and-pasted all the "what's new"s into a blog entry (for reference without using all the BOL links, I guess). But BOL What's New is somewhat incomplete in some areas. So, here's a list I made up after listening to the SQLPASS keynotes yesterday. For your experimenting pleasure.


Available in CTP1

Always On HADR - subset of features (see team blog post)
Full-text - Property-based searches, better NEAR predicate
Spatial Enhancements (see whitepaper)
T-SQL Enhancements
Contained databases
SSIS Enhancements
SSIS Server
Impact Analysis and Lineage
Project Atlanta
User-defined Server Roles
Tools enhancements and SSMS uses VS2010 shell
XEvent enhancements - subset of enhancements
XEvents and SSIS Server in PowerShell provider, and Always On cmdlets.

Announced at SQLPASS, but not available in CTP1

Always on (additional features coming)
Full-text - FileTable
Columnar Storage/Indexes (Vertipak in Engine and SSAS)
Project Crescent
Project Juneau
BIDS integration with Visual Studio 2010
Data Quality Services - SSIS and Standalone
Default schemas for groups
DAC improvements
Semantic Search
Single Model for Analysis and Reporting
Master Data Services tab in Excel

Announced at TechEd 2011, but not in CTP1

T-SQL Analytical Function and OVER clause enhancements
Additional T-SQL functions
XEvent Session create/display GUI and parity with SQLProfiler
XEvents in SMO
Expansion of XEvents to more subsystems (SSAS, Replication, PDW)
Audit Enhancements (including Server Audit Specification for all SKUs)
Cryptography Enhancements
More Spatial enhancements
StreamInsight 1.2 (technically not part of Denali)

As always, corrections or additions are always welcome. You'll have to send them to me in email or on Twitter, though, I had to turn blog comments off.

@bobbeauch

Categories:
SQL Server 2012

The nearest neighbor query is one of the most common queries against spatial data. How many people haven't gone to a mapping app, typed in their current location and asked for the 10 nearest [your favorite thing goes here]? The obvious way to phrase the spatial query for this, given an STDistance method on the SqlGeometry data type, would be:

DECLARE @me = 'POINT (-121.626 47.8315)'
SELECT TOP(10) Location, Description, Location.STDistance(@me) AS distance
FROM [spatial_table]
ORDER BY distance

Unfortunately (for your performance, it does return the correct answer) in SQL Server 2008/R2, this query doesn't use the spatial index. And attempting to hint the spatial index results in the "could not produce a plan with your hints" error. Oh.

In SQL Server Denali, this just works. You do have to include a predicate that refers to STDistance, like:

...FROM [spatial_table]
WHERE Location.STDistance(@me) < 1000 -- 'where-distance' query
ORDER BY distance

but the predicate can also be something as unintrusive (ie you don't have to commit to a max distance) as:

...FROM [spatial_table]
WHERE Location.STDistance(@me) IS NOT NULL -- 'where is not null' query
ORDER BY distance

This uses the spatial index without hinting. And its way faster than the 2008 version of the same query. Excellent. But...when the "nearest neighbor" question came up once too often in SQL Server 2008, Isaac Kunen came up with an alternate algorithm that uses a numbers table.  And its quite fast, although you do have to hint the spatial index (or, let's say, I've never got it to use the index without hinting). So how does this algorithm do against the new, automatic spatial index use plan?

In the simple example where I tried this out, here's the plan cost:
Isaac's algorithm and hint:             0.517
Denali with WHERE and distance:  4.06683
Denali with WHERE..IS NOT NULL: 14.7

But the differences in query plans that involve spatial don't always seem to the "revelent" (the QO does sometimes underestimate the big difference using the spatial index will make). All the queries are subsecond, compared to 10 seconds or so when not using the index. So, any other comparison? How about I/O or worker time?

Issac's algorithm and hint:   1x worker time/reads/clrtime
Denali with wHERE and distance: 10x worker time/reads/clrtime
Denali with WHERE..IS NOT NULL:  3x worker time/reads/clrtime

This is interesting due to the fact that the plan cost was much less for the where-distance query vs. where-isnotnull query, but the resource cost is exactly the opposite. And more interesting is the fact that Isaac's numbers table-based algorithm wins both ways.

Caveats: The obvious big caveat is that this is only a single test case, albeit a fairly common one (both "@me" and the points-of-interest are points). YMMV. Second, none of the queries I used, run in SSMS, allow for a proper cardinality estimate at plan creation time (aka parameter sniffing). But attempting to use sp_executesql or a sproc so that it could be estimated, causes the Denali "where-distance" query to loop endlessly. Bug reported; it is afterall, CTP1. And sp_executesql with the "where-isnotnull" case, doesn't do much better, still worse than "numbers table". Third: Isaac's original algorithm is fragile in some edge cases (blank spatial table/no qualifying points, IIRC), Ed and I handled these cases in the slightly revised version in the chapter in Inside SQL Server 2008 Programming.

So, it appears that, even though the query is far less straightforward to write and the spatial index needs to be hinted, the numbers table method wins, so far. However, I'd still say that the nearest neighbor optimization is useful to have in the product. But it is still Denali CTP1. And it goes without saying, if you have a test that disproves this, send it right along, I'm always happy to see it.

@bobbeauch

Now about that "other" feature that could be called fullglobe in Denali that I mentioned. I don't know which feature will retain the moniker or whether they both will...that's why I'm not in marketing.

Here's the deal... the SQL Server spatial geography type is limited to a single hemisphere in size in SQL Server 2008 and R2. That doesn't mean that a single feature can't span hemispheres, it means the feature can't be bigger than a hemisphere. And it means that there is a "left-foot rule" that you must follow for geography polygons. You might also have heard it called "left-hand rule", but Ed renamed it, as there already is a well-known right-hand rule in mathematics and physics.

The rule is really meant for disambiguation; if I code up a polygon that describes Iceland, am I representing Iceland or "everything on earth that isn't Iceland"? The left-foot rule makes it unambiguous.

Anyhow, Denali removes the hemisphere limit, so you can represent "not Iceland". And provides a cool convenience feature to make this easy. So here's code inspired by my first (but not my last) teaching gig in Reykjavik, a few years ago, in December (thanks Gunner). I enjoyed myself immensely, especially in the geothermal-heated public pool near downtown (yes, its open in December, sorry, no link).

-- here's Iceland
SELECT geog
FROM dbo.cntry00
WHERE cntry_name = 'Iceland'

-- here's everywhere that's not Iceland
-- well, it better be "not Iceland" I can't see it in spatial results
SELECT geog.ReorientObject()
FROM dbo.cntry00
WHERE cntry_name = 'Iceland'

The only fallout I can envision from this is that some folks might have used the error to indicate their ring-orientation was wrong. Now it will just work, and you'll wind up with "big 'ol hunk o' geography" (TM pending) instead. So if you are one of those people beware...it works now. For the rest of the things that used to break and that this feature fixes reference Ed and Milan's excellent whitepaper.

@bobbeauch

Once upon a time, I wrote a blog entry about CircularString, CompoundCurve, and CurvePolygon (the new geometric types supported in Denali). And mentioned that they came from ISO-ANSI SQL/MM and weren't in the OGC spec. Well, lo and behold, while browsing the latest OGC Simple Features - Part 1 spec, the last incantation (version 1.2.1) does support 'em. We don't have all the OGC types (missing Curve, Surface, multi-versions of those, Polyhedral Surface, and TIN) but it's good to see the OGC support/add the ones we do have.

Speaking of the new types, I'm still looking for a visualizer for them. Spatial result tab was removed from SSMS CTP1 Denali, so I've got SSMS 2008 R2 installed. But of course that doesn't support the new geometric types, so curves, fullglobe, and types greater than a hemisphere large (these were added in Denali too) draw a blank. So I can draw them on graph paper. And in my mind... Neither Bing nor Google, that I'm aware of, support them. Wonder if SAFE FME visualizer does, hmm... You usually approximate them with linestrings/polygons, STCurveToLine and CurveToLineWithTolerance are your friends.

@bobbeauch

So what's the area of the earth? Well...depends on who's measurement you want to use.

There an interesting SQL Server Denali spatial feature known as fullglobe. Actually there's two features with that name, more later; I was using the FULLGLOBE geometric type (geometric type is the term used by OGC, this one only works with SQL Server's GEOGRAPHY data type). You create one like this:

DECLARE @g geography = ('FULLGLOBE', 4326) -- where 4326 is your SRID

It's a special extension to the OGC (that doesn't have a specific geography type as such either) that specifies the entire earth. So you can get the area of the earth by using @g.STArea(). After doing the "figure out the water area by subtracting the union of the countries' area" thing with it, I thought this might be a cool way to demonstrate one facet of SRIDs (that is spatial reference system identifiers).

There's a table called sys.spatial_reference_systems that lists all of the SRIDs supported by SQL Server's geography type (they use these in the computations). The also includes the OCG well-known text that describes them. One parameter describes the ellipsoid used by this SRID. Ellipsoids' measurements differ for two main reasons; different surveys and the fact that the earth changes shape slightly over time.

So you could probably guess (and you'd be right) that different SRIDs would produce different areas for the earth. Here's the code. Note that as a special bonus for using it, I found two new SRIDs in Denali, Unit Sphere (SRID 101) and Spherical Earth (SRID 126). Nice.

SELECT spatial_reference_id AS SRID,
       dbo.measure_globe(spatial_reference_id) AS Area,
       unit_of_measure,
       well_known_text,
       dbo.get_ellipsoid(well_known_text) AS Ellipsoid
FROM sys.spatial_reference_systems
ORDER BY area, ellipsoid, spatial_reference_id – whatever you want

CREATE FUNCTION dbo.measure_globe(@srid INT)
RETURNS FLOAT
AS
BEGIN
DECLARE @g geography = geography::STGeomFromText('FULLGLOBE',@srid);
RETURN @g.STArea()
END

CREATE FUNCTION dbo.get_ellipsoid(@wkt NVARCHAR(4000))
RETURNS NVARCHAR(4000)
AS
BEGIN
-- grungy string-handling alert
DECLARE @begin INT = PATINDEX('%ELLIPSOID%',@wkt) + 11
DECLARE @rest NVARCHAR(4000) = SUBSTRING(@wkt, @begin, 4000)
DECLARE @end INT = CHARINDEX('"', @rest)
RETURN SUBSTRING(@rest, 1, @end-1)
END
GO

@bobbeauch

I've been trying out the new Denali "AUTO_GRID" spatial index (that is ...USING GEOGRAPHY_AUTO_GRID or GEOMETRY_AUTO_GRID in the CREATE SPATIAL INDEX DDL) on some of my old spatial performance use cases. It's worth repeating (from Ed's excellent whitepaper) that the AUTOGRID index uses 8 levels of tessellation instead of the 4 levels that the original "manual grid" spatial index uses. Let's go on from there. BTW, If you're not familiar with spatial indexes you might want to skip to the last paragraph for the motivation first.

In the manual-grid index, you can specify a grid resolution of high/medium/low for each of the 4 levels. This equates to a grid cell size of 16 (low), 64 (medium), or 256 (high) cells for grid level. I've adopted a convension of calling an index with grid level of GRIDS = (HIGH, HIGH, HIGH, HIGH ), as "HHHH", GRIDS = (MEDIUM, LOW, MEDIUM, HIGH ) as "MLMH", etc.

Using this convention, the new AUTOGRID index is "HLLLLLLL" that is high-res at the first level, low-res at the other 7 levels. You can't specify an 8 level index AND specify grid-res at the same time, the only 8- level index supported is the "AUTOGRID". And you can't get the grid-res of an AUTOGRID via the usual mechanism, sys.spatial_index_tessellations reports NULL for these fields. I got the info from the spatial diagnostic stored procs.

So far, I've tried it on two use cases: point-in-polygon with a zipcode polygon and intersecting polygons (zipcode and congressional district). With the intersecting polygons, a "HHHH" index seems to outperform the autogrid by 17 vs 24 seconds with a "primed" data cache. With the point-in-polygon case, the autogrid beats the "HHHH" index by 14 vs 21 seconds with a primed data cache and 1000 points. I'm also encouraged by the fact that, given multiple spatial indexes, the query optimizer picked the correct (best) one in both cases.

However, I'm less impressed by the spatial diagnostic procedures, which picked the new autogrid as best both times, that is, Primary_Filter_Efficiency and Percentage_Of_Rows_NotSelected_By_Primary_Filter were better for the autogrid spatial index even when its not the better choice. Hmmm...well I can still adjust cells_per_object and try again... And perhaps I used a non-representative query sample in the disgnostic procs for the polygon case.

So what's this autogrid index for anyhow? It's there for software products that can automatically create a single spatial index for each spatial table/column. These products can get a "usually best" and at least "good enough" default spatial index without doing any special analysis of the spatial objects in the table or knowing what spatial queries you're going to perform. Autogrid is supposed to be often the best choice and certainly never a bad choice (as the old default of "MMMM" sometimes was. And if you can find a better spatial index for your specific queries, you're still allowed to use it, of course. Nice feature.

So here's a cheap demo of FTS property search in Denali. For the motivation for this feature, see the previous blog entry. I'm a minimalist (minimalism is a teaching aid), so its just the basics. In some places, I may be doing more than is minimally required for clarity.

First off, I didn't use the AdventureWorks version for Denali plus BOL examples to do this. I tried, though. AdventureWorks Production.Document table only has .doc files; the IFilter for these doesn't support extended property extraction according to BOL. And, just trying to put the pieces in place eventually gives me Msg 30045 "fulltext index error...". Don't know why. Rather than trying to debug this, I'll start from scratch.

First, acquire an IFilter that extracts extended properties. The Office 2010 IFilter Packs do. You must install the FilterPack on the SQL Server machine, naturally. After installation, refresh FTS's IFilter list from the registry.

exec sp_fulltext_service 'load_os_resources', 1

Now, create yourself an Office 2010 document (actually sounds like Office 2007 and above). Type some text. Change the Author and Title properties, check it with Windows Explorer (or you can accomplish this with Windows Explorer).

Create a database. Create a table with three columns. Here's mine:
create table dbo.docs (
 id int identity not null,
 doctype varchar(20),
 doc varbinary(max),
   constraint pk_id primary key (id)
)

Add your Office 2010 document from the file system using OPENROWSET BULK and specifying .docx in the doctype column.

Create a search property list. The example from BOL works fine, and they used the correct canonical GUIDs for Title, Author, and Tags.

Create a fulltext catalog and fullindex index. The fulltext index should reference your search property list. Specify change_tracking auto (the default) on the index or cause a full population of the fulltext index.

Now you can use the new systax to search for properties:
SELECT * FROM dbo.docs WHERE CONTAINS(PROPERTY(doc,'Author'), 'Bob');

Here's a script that does this. And even a document too. Enjoy.

fts_with_propertylist.sql (2.28 kb)

SampleDoc.docx (12.25 kb)

Years after its inception, I think I've figured out what WinFS was supposed to do. It sometime takes me a while. I got a new mp3 player (I don't have appliances at my house with cute monikers that start with the letter "i") and wondered why it sometime played songs "out of order". I like my songs (in any musical genre) in "album" order. You wouldn't play movements of a symphony out of order, order-sensitivity adds to the enjoyment. Finally realized that the player was not ordering songs by filename like my old player did, but by the extended attributes in the file (Title, Authors, Track, etc). There's lots of these in common multimedia files, have a look at the "Propeties" page in Windows Explorer if you have pictures taken with a digital camera.

What does this have to do with SQL Server, you ask? In SQL Server Denali, they added another full-text predicate to distinguish searches on specific properties. In previous versions, these properties would show up in a search but you couldn't distinguish "Author property contains bob" from "document contains bob". In Denali, you can.

Normally, you might not store items like pictures directly in a database just to search on property metadata. With the upcoming filetable feature (non-transactional access to filestreams, standard table layout, filestream storage) it might be worth your while just to have a cheap property/content search. BTW, "upcoming" means filetable doesn't work in CTP1, though they showed the feature at SQLPASS. But FTS + properties work.

Having figured out the reason for this feature (I think, you tell me if I'm wrong), in the next entry, I'll show you how to use it.

TechEd Europe 2010 ended yesterday. I thought I'd write about it when I got home, but now I'm unsure if that will happen in a timely manner. I'm waiting at the fourth different gate in Dulles for my flight home (last of three) to Portland. It's just been delayed by 3 hours, and I hope I get home before it's time for the next trip out. If I'd have just taken that direct-from-Frankfurt Luftansa flight, I'd have been home half a *day* ago. Sigh.

Well, back to TechEd. I had a great time and met up with lots of old friends. Attended the PowerShell dinner with the MVPs. Made some new friends. Spoke about some controversial topics and participated in a Denali-thon.

First talk was about the DAC (not dedicated admin connection, but data-tier application). After explaining the ultimate purpose behind it and the 5 reasons folks are afraid of it/think it sucks now...a miracle occured. These folks actually got it! Of course it helped to point out that incremental deployment was announced at SQLConnections last week (no timeframe). And DAC seems to be an obvious delivery mechanism for the new contained database in Denali.

There were two iterations of LINQ to Entities vs. Stored Procedures. The database afficianados seemed to think I was too easy on LINQ and the LINQ fans said I couldn't get object graphs with sprocs or anonymous types. Both were right, of course ;-)

Finally, my favorite, on Thursday noon. We were permitted to do an overview of Denali, and, even though the talk was announced the day before, the room was packed and an angry mob gathered outside; they were short on seats. We promised to redo things at the SQL Server booth, and I actually redid most of the demos at least twice.

Kevin Ashby started off the proceedings, followed by Justin Erikson, who gen'd up a cluster, implemented HADRON, and failed it over in 15 minutes. Impressive. Ralph Kemperdick talked about FTS and Semantic Search. His favorite. Then it was my turn. I approached the podium and screwed up my best Johnny Knoxville impression: "I'm Bob Beauchemin and this is 'how many Denali programming demos can you do in 15 minutes'". I did 12, but the feedback indicated that I was speaking a bit fast at the end. Steffen Krause finished up with a walkthrough of Project Cresant, which looked impressive.

Thanks to everyone that attended. Demos of my stuff will be posted to the SQLskills website soon.

Categories:
SQL Server 2012

Quick correction to my PowerShell Denali CTP1 post here. Max Trinidad did the obvious get-command call and found some new PowerShell cmdlets related to the new HADR feature. Knew I should have tried that. ;-)

Categories:
PowerShell | SQL Server 2012

OK, last one for the evening, its getting late in Berlin. And BTW, we might have some Denali goodness here at TechEd. Search for "futures", perhaps.


There a new hint for spatial queries in Denali CTP1, namely, SPATIAL_WINDOW_MAX_CELLS. This is a table hint that, as far as I can tell, this hint affects the query sample (ie, the "other side" of the STIntersects, for example) rather than affecting the spatial index itself. If you run this example (from the census database we used with SQL Jumpstart, else substitute your own query that uses a spatial index), you can see this in the query plan.

select * from zipcodes z with(SPATIAL_WINDOW_MAX_CELLS=1536)
join Congress c
on z.shapegeog.STIntersects(c.shapegeog)=1
where c.id = 116
 
1.       Run the query with "include actual execution plan" ON.
2.       Left-Click  on the step at the extreme right, that is, "Table-valued function, Get_Geography_Tessellation_Varbinary" to select it.
3.       Hit F4 to bring up the properties window (it’s also available on the View menu)
4.       Click on the line "Parameter List", second to the bottom. Click the three dots to being up a window with parameters.
5.       Without the hint, the value of the 2-nd last parameter is 1024 (max cells that they use to tessellation the other side (in the case, the congress table) to match to zipcode table (spatial index side)
6.       With the hint (of 1536), the value of the 2-nd last parameter is 1536
 
The default value is 1024 for both geography and geometry. That’s at odds with the Denali spatial whitepaper that claims "The default value is 512 for geometry and 768 for geography".
 
Also, note the variant that uses the table hint as a query hint, this one:
select * from zipcodes z
join Congress c
on z.shapegeog.STIntersects(c.shapegeog)=1
where c.id = 116
OPTION (TABLE HINT(z, SPATIAL_WINDOW_MAX_CELLS=1536))
 
This executes without syntax errors, but doesn’t use the hint. As it should, see http://technet.microsoft.com/en-us/library/ms181714.aspx . Look for "Using table hints as query hints" and see the examples at the bottom of the BOL page, where you would use this in a plan guide (which is the whole point. Being able to use it in a plan guide, for vendor-generated code, for example, might be a big deal.

Only because I've been waiting for it since I first saw try-catch back in 2005 beta 2 or something. In dirt-simple code:

begin try
  delete jobs
end try
begin catch
  throw
end catch

Nuff said. Error 547 is sooooo much more backward compatible than error 50000. Realize that, if throw is preceeded by another statement (begin catch is part of the codeblock, not a statement), the preceeding statement must end with a semicolon (you were putting in those semicolons when they were almost always optional, weren't you?):

begin try
delete jobs
end try
begin catch
print 'in catch block';
throw
end catch

Finally, all we need is a finally block. Well, maybe they could go all the way with ANSI SQL standard error handling. Some day. But it would have to be combined with a backward-compat switch. Maybe at a database level or something.

Categories:
SQL Server 2012

Once upon a time, there was to be a SQL Server-based file system for Windows. This file system would not only let you search on file properties like filename and file length, but also on what's known as "Extended Properties". For an idea of what extended properties refers to, have a look at the "Properties" tab on any Office Document, or jpg file (especially if you have a GPS camera) or MP3 files, or...other files like [your favorite file type goes here]. Wikipedia defines extended properties as "a file system feature that enables users to associate computer files with metadata not interpreted by the filesystem". To find some extended properties viewers and searchers, I simply did a search on "Windows OS Extended Properties". The APIs to enumerate these properties are the Windows Shell APIs.

In SQL Server Denali CTP1, you can use a new feature of Fulltext Search to search data stored in FTS, not only on content, but if the IFilter exposes it, on extended properties as well. According to BOL, you first need to create SEARCH_PROPERTY_LISTs and add the extended properties you'd like to search on:

CREATE SEARCH PROPERTY LIST DocumentProperties;
GO
ALTER SEARCH PROPERTY LIST DocumentProperties
   ADD 'Title'
   WITH ( PROPERTY_SET_GUID = 'F29F85E0-4FF9-1068-AB91-08002B27B3D9', PROPERTY_INT_ID = 2,
      PROPERTY_DESCRIPTION = 'System.Title - Title of the item.' );
GO -- add more properties

And alter the fulltext index to use your search property list:

ALTER FULLTEXT INDEX ON Production.Document
   SET SEARCH PROPERTY LIST DocumentProperties
   WITH NO POPULATION;
GO

ALTER FULLTEXT INDEX ON Production.Document
   START FULL POPULATION;

the available metadata looks like this:

select * from sys.registered_search_properties
select * from sys.registered_search_property_lists
select * from sys.dm_fts_index_keywords_by_property

Then you'd use them with the improved FTS functions:

SELECT Document FROM Production.Document WHERE CONTAINS(PROPERTY(Document,'Title'), 'Maintenance OR Repair');

Cool, eh? Now Office 2007 and above IFilters support these properties on Office docs. What is needed now is a way to extract property lists using the Windows Shell APIs. And more IFilters that support this. And....given that there's already FTS support over FILESTREAM columns, a way to easily expose the native filesystem as a set of tables with filestream columns.

WinFS, anyone...?

SQL Server Denali CTP1 was released today. I don’t want to be the fiftieth person to post the link to the download; reference Ed's blog posting for the location.

This is a repost (with some corrections now that the CTP is actually out) of a blog posting I did when someone discovered a web location of an updated serialization spec. It attempted to give an overview of where the new spatial types (like circular string and curves) came from and what they mean. I took down the blog posting when the spec "disappeared". But now that Ed Katibah blogged about it (be sure to read his excellent whitepaper), I guess it’s *really* OK now.

Everyone's on the lookout for info about features of SQL Server Denali. The docs folks may have given us some insight in the publication of an update of the "Microsoft SQL Server CLR Types Serialization Formats [MS-SSCLRT]" spec. This includes updates that "apply to SQL Server Denali CTP1" according to footnotes in the spec. A message about this spec update was posted on the Technet SQL Server spatial forum. The spec is publically posted.

Besides a bit indicating "this geography instance is bigger than half a hemisphere", there's four new types listed under OpenGIS Types: CircularString, CompoundCurve, CurvePolygon, and FullGlobe. The 1.2 versions of the OGC specs mention the first three. No mention of FullGlobe, that must technically not be a new type. So what are the three remaining types and what are they good for? That's more interesting than just knowing they exist. The best information I could find comes from the ISO-ANSI SQL/MM Part 3 spec.

In OGC's original type hierarchy (I couldn't find an updated one in their new specs, maybe I didn't look hard enough), LineString is the only direct subtype of Curve and Line and LinearRing derive from LineString. In SQL/MM's type hierarchy LineString, CircularString, and CompoundCurve all directly derive from Curve (an abstract type). So both CircularString and CompoundCurve "is a" Curve, just as LineString is. There is no Line and LinearRing in SQL/MM spec. In a somewhat similar fashion, Surface is declared as abstract type, CurvePolygon is the direct subtype, and Polygon is a subtype of CurvePolygon.

Back to LineString vs. CircularString vs CompoundCurve. A single segment of a LineString consists of two points connected by a straight line. But a single segment of a CircularString consists of three points that form an arc: start point, end point, and any point on the arc in between. You can have multiple segments; the end point of one segment can be the start point of the next segment. "A combination of linear and circular strings can be modeled using the ST_CompoundCurve type." (quote from the spec). Ah...so it would seem to be less work to define an arc as three points than approximate an arc with many small linestrings. More accurate too. That's (possibly) the goodness.

For CurvePolygon, to relate it to Polygon (which we all know and love) and Surface (the abstract class), from the spec: "Surfaces, as 2-dimensional geometries, are defined in the same way as curves using a sequence of points. The boundary of each surface is a curve, or a set of curves if the surface has any holes in it. The boundary of a surface consists of a set of rings, where each ring is a curve. The type ST_CurvePolygon stands for such a generalized surface, and the subtype ST_Polygon restricts the conditions for the rings of the boundary to linear strings."

Finnaly there was a question I just had to find the answer to: how then do you define a Circle? According to the new CTP1 docs, you need to define a circle in halves. So something like: declare @g geometry = 'COMPOUNDCURVE(CIRCULARSTRING(0 2, 2 0, 4 2), CIRCULARSTRING(4 2, 2 4, 0 2))'; might do the trick.

Just having a look at PowerShell support in SQL Server Denali CTP1. In Denali you must install PowerShell V2 or it will be installed as part of the installation process. Although the minishell architecture of SQLPS is unchanged so far. Rumor is that changing this (maybe into a module, like Chad Miller did?) might in the offing.

Not new cmdlets yet, unless I'm missing something, but there are enhancements to the provider. There are two new "subdirectories" in the provider, under the SQLSERVER: psdrive: XEvent and IntegrationServices. XEvent has two subdirectories, Packages and Sessions. Packages is a hierarchical rendering of metadata; the information you'd get from the sys.dm_xe_* metadata views. Sessions is a hierarchical rendering of information about currently defined sessions.The IntegrationServices subdirectory contains a Packages subdirectory (after the subdirectory specifying server\instance, naturally).

Having XEvents in the provides means that there is, of course, a set of classes that encapsulate XEvents in SMO. This was something that was missing in SQL Server 2008.So Jonathan Kehayias wrote one for his Extended Events Manager And where there is SMO, there is likely a set of dialogs in SSMS. Well, not really... there's a node off of the Management Node in SSMS and you can import event sessions from XML files. Not quite Extended Events Manager.

By browsing through the hierarchy (or looking at the sys.dm_xe_* views, you can also observe that XEvents has been greatly expanded in SQL Server 2011 CTP1. At the top level, there are three new XEvent packages, ucs, sqlclr, and a second sqlserver (same name but different package id). XEvent events in Denali have increased in number from 254 to 449; almost twice as many items can be tracked by extended events. I selected each event metadata set into tables on both releases and did the SQL equivalent of a diff, there's something new in almost every kind of metadata. For example There is a target called the "router" target that "routes events to listeners". But...routes to whom?

There may be a hint in two new DMVs for extended events, dbo.trace_xe_event_map and dbo.trace_xe_action_map. These DMVs map XEvent events to profiler events and XEvent actions to profiler columns. Perhaps an XEvent-based/profiler trace synergy of some kind?

Theme design by Nukeation based on Jelle Druyts