The new Semantic Search feature of SQL Server Denali enhances full-text search by letting you do a search on the meaning of documents. To use this feature, its required that you install the SemanticsDB. The SemanticsDB ships with SQL Server or can be downloaded as part of the SQL Server Denali CTP3 Feature Pack. One you have attached the SemanticsDB to your SQL Server instance and provided appropriate access to it, you can use the STATISTICAL_SEMANTICS keyword of CREATE FULLTEXT INDEX or ALTER FULLTEXT INDEX to build semantic indices, and use them with the three table-valued functions I mentioned in an earlier blog post. So what does the SemanticsDB provide?

The SemanticsDB contains Statistical Language Models. A starter description of this can be found in Wikipedia, but these models are used to build the semantic indexes that are used with the semantic search table-valued functions. There is actually one language model internal table for each supported language. The SemanticDB is static and you can't customize it or provide your own. It's not updated during semantic index processing, in fact, you can make the SemanticDB read-only with no ill effects to the feature. It's used to make the index-building scalable, as the speed of semantic index building is not dependant on the size of the corpus of documents.

Hope this gets you intrigued about this interesting new feature.

@bobbeauch

I've been working lately with SQL Server Denali's Semantic Search feature. There's a more complete description of this feature (as well as how to set it up, its an addition to full-text indexing) in BOL, but I just wanted to summarize my one sentence "raison d'etre" for it on a function by function basis. And mention a couple of the more obvious use cases.

First, the three functions:

SEMANTICKEYPHRASETABLE - Without a source_key argument, this is "show me all the keyphrases that appear in my entire corpus of documents". With a source_key, just substitute "...in this particular document". You have to know what's in your document collection, after all.

SEMANTICSIMILARITYTABLE - Given an exemplar document, this one is "show me the (usually TOP N) documents that have similar content to this one" on a scale of 0 to 1.0.

SEMANTICSIMILARITYDETAILSTABLE - Given two examplar documents, this one is "show me why you think these are the same" in terms of specific semantic keyphrases.
 
Here's a use case. Want to hire the perfect job candidate? Make up an exemplar resume in a standard format, filling in the qualifications of your perfect candidate. Then find, in your existing resume pile, the ones that are most like your perfect candidate. Check your work by checking the matching keyphrases for each of the TOP N candidates. Be careful though, candidates have ways of knowing what keyphrases you might want to hear... Do the interview anyway. And keep track of how semantic search's predication was.

Here's a couple more. Reading a book? Buying a product? The library or the store doesn't have *exactly* what you want? Find the closest match. And then the answer to why its the closest match.

And finally, a friend of mine, Andrew Fryer, did a blog post and set of videos about semantic search. Sounds like (from the name of the table in his blog post) that he might be matching up similar PowerPoint decks. Or at least the table is called MyDecks.

Hope this gets you going with semantic search. Can't wait till they can find similar songs and pictures, a la their presentation at PDC about it a few years ago. Maybe next release?

@bobbeauch

SQL Server Denali's new Filetable feature and full-text search are made for each other. Set up database for non-transactional filestream access with a directory name, create a table "AS FILETABLE" with a subdirectory name, drag over a bunch of files and turn on full-text search. Voila, instant search with two predicates and two TVFs. Even better, if you add the keyword SEMANTIC_STATISTICS after your column name in the CREATE FULLTEXT INDEX statement you have three new semantic search table-valued functions as well (provided that they're supported in your language). Add a property list and you can search on documents' extended properties. What could be easier? Looks quite like the functionality of the long-forgotten WinFS, if you ask me (although I don't remember any Semantic Search in WinFS, just property search).

There's just one thing. Creating a fulltext index on using a FILETABLE uses an additional internal table the fulltext docidmap.The reason for this table is to map full-text's docid (an integer) to the the FTS index's key column. In the FILETABLE's case this would be the stream_id column, which as marked as a ROWGUIDCOL. CREATE FULLTEXT INDEX will use the ROWGUIDCOL column as a key column if you don't specify a key column. FTS uses the docid to do its lookup.

But...

In SQL Server 2008, an optimization was introduced if and only if the FTS index key column was an INT or BIGINT. In the case where FTS index key column is an INT/BIGINT, no docidmap table is created when the FTS index is. Better yet, this optimization does away with the docidmap lookup step in the query plan. This step which basically adds another JOIN to every FTS and every Semantic Search query, is simply removed, improving the plan.

Because the FILETABLE's FTS key column is a UNIQUEIDENTIFIER, it can't take advantage of this optimization. You can't add columns to a FILETABLE either, so this isn't an option. Darn. One last question...because a UNIQUEIDENTIFIER (GUID) has a  bigger value space than BIGINT, what happen's when you get more than VALUESPACE_OF(BIGINT) documents in your FTS/Semantic corpus? Guess I'd need to have over 18,446,744,073,709,551,616 documents to find out...wait, how many are on that C drive again?

@bobbeauch

Theme design by Nukeation based on Jelle Druyts