I've been looking at full-text query plans in Denali lately, and looking at the SQL query plan for the full-text part, there isn't much to see (modulo docid lookup or lack of it, see a couple of posts ago). Everything seemed to be hidden behind the TableValuedFunction iterator for FulltextMatch function, and although there's the usual iterator info on this one, there's nothing about how they do the full-text query. Not even in the parameters passed to the TVF. So what does FulltextMatch do?

Notice I said "seemed" to be hidden. Just ran across some new extended events for FTS that are quite interesting.These are fulltext_exec_query_stats and fulltext_query_recompile. Note: I don't find these events pre-Denali. Sounds promising. Started up an Extended Event session with these, set on "Watch Live Data" (did I already say a few times how much I liked the new "extended event trace" GUI?). Along with the action for sql_text.

For a full-text query that had a predicate that read like this: "WHERE CONTAINS(file_stream, 'NEAR(("data", "SQL"), 5, FALSE)');", fulltext_exec_query_stats retrieved this lovely chunk of XML. Hope it doesn't get munged in the poster or your reader.

<Root FragmentSelectionTimestamp="0x000001ec34" QueryHandle="1241413008" IsParallel="false">
 <Scalar Name="ContainsTableSSERankForNear" InclusiveTime="0ms" RowCount="92">
  <AND Name="AND" InclusiveTime="0ms" RowCount="92">
   <Filter Name="SingleFragmentDocidFilter" InclusiveTime="0ms" RowCount="124" FilterTableStrategy="Scan" SetRangeCalls="2">
    <Filter Name="SingleFragmentSeekFilter" InclusiveTime="0ms" RowCount="124">
     <Fragment Keyword="SQL" InclusiveTime="0ms" RowCount="124">
      <PhysicalRead InclusiveTime="0ms" RowCount="19" Scans="1" LogicalReads="6" PhysicalReads="0" ReadAheads="0" LobLogicalReads="0" LobPhysicalReads="0" LobReadAheads="0" />
      <Decompression InclusiveTime="0ms" RowCount="124" />
     </Fragment>
    </Filter>
  </Filter>
  <Filter Name="SingleFragmentDocidFilter" InclusiveTime="0ms" RowCount="124" FilterTableStrategy="Scan" SetRangeCalls="2">
   <Filter Name="SingleFragmentSeekFilter" InclusiveTime="0ms" RowCount="124">
    <Fragment Keyword="data" InclusiveTime="0ms" RowCount="124">
     <PhysicalRead InclusiveTime="0ms" RowCount="18" Scans="1" LogicalReads="6" PhysicalReads="0" ReadAheads="0" LobLogicalReads="0" LobPhysicalReads="0" LobReadAheads="0" />
     <Decompression InclusiveTime="0ms" RowCount="124" />
    </Fragment>
   </Filter>
   </Filter>
  </AND>
 </Scalar>
</Root>

VERY cool. So, although there's no magic decoder ring for this one (yet, I'm working on it), you can deduce (OK, guess) the following just by reading:
   FTS in Denali has the ability to execute this function in Parallel (although I haven't made it do this yet)
   The CONTAINS predicate appears to be implemented using CONTAINSTABLE to begin with. They seem to be using a "rank for NEAR" rather than the CONTAINSTABLE rank.
   I think for "SSE" may have something to do with FTS using streaming SIMD Extensions
   The FTS portion of the query has its own cache (haven't hit the full_text_recompile event yet).
   Each FTS "Filter iterator" (for lack of a better term) has its own strategy and records its own I/O info, timing, and row count.
   BTW, There are a lot more topmost-level iterators than in this relatively simple query. There's one (sometimes multiple) for FREETEXT, TOP N, etc.

Nice. If you're tired of wondering, "what are those FTS predicates and TVFs doing", check out this mechanism for looking into it.

@bobbeauch

SQL Server Denali's new Filetable feature and full-text search are made for each other. Set up database for non-transactional filestream access with a directory name, create a table "AS FILETABLE" with a subdirectory name, drag over a bunch of files and turn on full-text search. Voila, instant search with two predicates and two TVFs. Even better, if you add the keyword SEMANTIC_STATISTICS after your column name in the CREATE FULLTEXT INDEX statement you have three new semantic search table-valued functions as well (provided that they're supported in your language). Add a property list and you can search on documents' extended properties. What could be easier? Looks quite like the functionality of the long-forgotten WinFS, if you ask me (although I don't remember any Semantic Search in WinFS, just property search).

There's just one thing. Creating a fulltext index on using a FILETABLE uses an additional internal table the fulltext docidmap.The reason for this table is to map full-text's docid (an integer) to the the FTS index's key column. In the FILETABLE's case this would be the stream_id column, which as marked as a ROWGUIDCOL. CREATE FULLTEXT INDEX will use the ROWGUIDCOL column as a key column if you don't specify a key column. FTS uses the docid to do its lookup.

But...

In SQL Server 2008, an optimization was introduced if and only if the FTS index key column was an INT or BIGINT. In the case where FTS index key column is an INT/BIGINT, no docidmap table is created when the FTS index is. Better yet, this optimization does away with the docidmap lookup step in the query plan. This step which basically adds another JOIN to every FTS and every Semantic Search query, is simply removed, improving the plan.

Because the FILETABLE's FTS key column is a UNIQUEIDENTIFIER, it can't take advantage of this optimization. You can't add columns to a FILETABLE either, so this isn't an option. Darn. One last question...because a UNIQUEIDENTIFIER (GUID) has a  bigger value space than BIGINT, what happen's when you get more than VALUESPACE_OF(BIGINT) documents in your FTS/Semantic corpus? Guess I'd need to have over 18,446,744,073,709,551,616 documents to find out...wait, how many are on that C drive again?

@bobbeauch

When I originally wrote the SQL Server Denali full-text search on properties cheap demo, I used property sets with properties (GUIDs) that I got from the SQL Server BOL example (although the example itself didn't work). As it says in BOL, CREATE SEARCH PROPERTY LIST creates an empty property list. Which got me to wondering, where do the property GUIDs for these properties come from? There must be a canonical list somewhere.

And indeed, there is. There's a list in the documentation for "Windows Properties" located on the web here. And there's a LOTS of 'em on which to search. After going back and forth from that web page a few hundred times, I thought that there must be a file that contains them in an easy-to-parse format. Well, there's one of those too; propkey.h from platform SDK. Nice.

But...having the properties/property sets is not enough. You need to have an IFilter that parses out these individual properties as well. Although I've found some third party IFilters on the web that deal with some (few) properties, I'm still looking for one that parses all the EXIF properties out of a JPG file from GPS-enabled cameras. If you find one of those, please let me know.

Happy property searching...

@bobbeauch

So here's a cheap demo of FTS property search in Denali. For the motivation for this feature, see the previous blog entry. I'm a minimalist (minimalism is a teaching aid), so its just the basics. In some places, I may be doing more than is minimally required for clarity.

First off, I didn't use the AdventureWorks version for Denali plus BOL examples to do this. I tried, though. AdventureWorks Production.Document table only has .doc files; the IFilter for these doesn't support extended property extraction according to BOL. And, just trying to put the pieces in place eventually gives me Msg 30045 "fulltext index error...". Don't know why. Rather than trying to debug this, I'll start from scratch.

First, acquire an IFilter that extracts extended properties. The Office 2010 IFilter Packs do. You must install the FilterPack on the SQL Server machine, naturally. After installation, refresh FTS's IFilter list from the registry.

exec sp_fulltext_service 'load_os_resources', 1

Now, create yourself an Office 2010 document (actually sounds like Office 2007 and above). Type some text. Change the Author and Title properties, check it with Windows Explorer (or you can accomplish this with Windows Explorer).

Create a database. Create a table with three columns. Here's mine:
create table dbo.docs (
 id int identity not null,
 doctype varchar(20),
 doc varbinary(max),
   constraint pk_id primary key (id)
)

Add your Office 2010 document from the file system using OPENROWSET BULK and specifying .docx in the doctype column.

Create a search property list. The example from BOL works fine, and they used the correct canonical GUIDs for Title, Author, and Tags.

Create a fulltext catalog and fullindex index. The fulltext index should reference your search property list. Specify change_tracking auto (the default) on the index or cause a full population of the fulltext index.

Now you can use the new systax to search for properties:
SELECT * FROM dbo.docs WHERE CONTAINS(PROPERTY(doc,'Author'), 'Bob');

Here's a script that does this. And even a document too. Enjoy.

fts_with_propertylist.sql (2.28 kb)

SampleDoc.docx (12.25 kb)

Years after its inception, I think I've figured out what WinFS was supposed to do. It sometime takes me a while. I got a new mp3 player (I don't have appliances at my house with cute monikers that start with the letter "i") and wondered why it sometime played songs "out of order". I like my songs (in any musical genre) in "album" order. You wouldn't play movements of a symphony out of order, order-sensitivity adds to the enjoyment. Finally realized that the player was not ordering songs by filename like my old player did, but by the extended attributes in the file (Title, Authors, Track, etc). There's lots of these in common multimedia files, have a look at the "Propeties" page in Windows Explorer if you have pictures taken with a digital camera.

What does this have to do with SQL Server, you ask? In SQL Server Denali, they added another full-text predicate to distinguish searches on specific properties. In previous versions, these properties would show up in a search but you couldn't distinguish "Author property contains bob" from "document contains bob". In Denali, you can.

Normally, you might not store items like pictures directly in a database just to search on property metadata. With the upcoming filetable feature (non-transactional access to filestreams, standard table layout, filestream storage) it might be worth your while just to have a cheap property/content search. BTW, "upcoming" means filetable doesn't work in CTP1, though they showed the feature at SQLPASS. But FTS + properties work.

Having figured out the reason for this feature (I think, you tell me if I'm wrong), in the next entry, I'll show you how to use it.

Once upon a time, there was to be a SQL Server-based file system for Windows. This file system would not only let you search on file properties like filename and file length, but also on what's known as "Extended Properties". For an idea of what extended properties refers to, have a look at the "Properties" tab on any Office Document, or jpg file (especially if you have a GPS camera) or MP3 files, or...other files like [your favorite file type goes here]. Wikipedia defines extended properties as "a file system feature that enables users to associate computer files with metadata not interpreted by the filesystem". To find some extended properties viewers and searchers, I simply did a search on "Windows OS Extended Properties". The APIs to enumerate these properties are the Windows Shell APIs.

In SQL Server Denali CTP1, you can use a new feature of Fulltext Search to search data stored in FTS, not only on content, but if the IFilter exposes it, on extended properties as well. According to BOL, you first need to create SEARCH_PROPERTY_LISTs and add the extended properties you'd like to search on:

CREATE SEARCH PROPERTY LIST DocumentProperties;
GO
ALTER SEARCH PROPERTY LIST DocumentProperties
   ADD 'Title'
   WITH ( PROPERTY_SET_GUID = 'F29F85E0-4FF9-1068-AB91-08002B27B3D9', PROPERTY_INT_ID = 2,
      PROPERTY_DESCRIPTION = 'System.Title - Title of the item.' );
GO -- add more properties

And alter the fulltext index to use your search property list:

ALTER FULLTEXT INDEX ON Production.Document
   SET SEARCH PROPERTY LIST DocumentProperties
   WITH NO POPULATION;
GO

ALTER FULLTEXT INDEX ON Production.Document
   START FULL POPULATION;

the available metadata looks like this:

select * from sys.registered_search_properties
select * from sys.registered_search_property_lists
select * from sys.dm_fts_index_keywords_by_property

Then you'd use them with the improved FTS functions:

SELECT Document FROM Production.Document WHERE CONTAINS(PROPERTY(Document,'Title'), 'Maintenance OR Repair');

Cool, eh? Now Office 2007 and above IFilters support these properties on Office docs. What is needed now is a way to extract property lists using the Windows Shell APIs. And more IFilters that support this. And....given that there's already FTS support over FILESTREAM columns, a way to easily expose the native filesystem as a set of tables with filestream columns.

WinFS, anyone...?

Theme design by Nukeation based on Jelle Druyts