WebsiteSpark

Saturday 21 January 2012

Introduction to SQL 2008 full text search

In this introduction, we will see how to setup full-text, setup full-text indexing and basic search on full-text and some full-text management scripts. In the next advanced full-text, we will see the advanced features of full-text search options.  
Full text search actually is a search based on special index called full-text index it’s not regular search based on SQL LIKE operator with % and ? wild characters. In order to perform Full-text search, first thing needed is a full-text-catalog on the database and next full-text index on the table. The full text-index can include all char based columns namely char, varchar, text, nchar, nvarchar, ntext, and varbinary(max).
We will be performing our demo on Microsoft’s famous AdventureWorks2008R2 database. You can get a copy of this sample database from http://msftdbprodsamples.codeplex.com/releases/view/55926
Let’s take the first step by building a full-text catalog on AdventureWorks2008R2 database.

CREATE FULLTEXT CATALOG AWFTC
AS DEFAULT;
GO


Name of our full-text catalog is AWFTC and it’s the default catalog meaning all the full-text indexes will be created on this catalog if catalog is unspecified. For detailed documentation, visit BoL at
http://msdn.microsoft.com/en-us/library/ms189520.aspx
Commands for altering or to drop FTC are as follows

-- rebuild
ALTER FULLTEXT CATALOG AWFTC REBUILD;
GO
--reorganize
ALTER FULLTEXT CATALOG AWFTC REORGANIZE;
GO
--Drop
DROP FULLTEXT CATALOG AWFTC ;
GO


The difference between REBUILD & REORGANIZE is that REBUILD is an offline process while REORGANIZE is an online process, hence making REORGANIZE a slow process than REBUILD.
Best place to begin using full-text queries is the ProductDescription table within Production schema. One good advantage of this sample database is that we have data structure and data in place to play with.

select count(1) from Production.Product
------------------------------
SELECT
      p.Name,
      p.ProductID,
      d.Description,
      pd.CultureID
FROM
      Production.Product p
      LEFT JOIN Production.ProductModelProductDescriptionCulture pd
                                    ON pd.ProductModelID  = p.ProductModelID
                                    and pd.CultureID = 'en'
      LEFT JOIN Production.ProductDescription d
                                    ON d.ProductDescriptionID = pd.ProductDescriptionID
ORDER BY 2,1,4
------------------------------
SELECT
      Count(1)
FROM
      Production.Product p
      Inner JOIN Production.ProductModelProductDescriptionCulture pd
                                    ON pd.ProductModelID  = p.ProductModelID
                                    --and pd.CultureID = 'en'
      Inner JOIN Production.ProductDescription d
                                    ON d.ProductDescriptionID = pd.ProductDescriptionID
--ORDER BY 2,1,4



There are 504 products and not all the product has English description and some of the products have description in multiple languages defined.
Now let’s build a full-text index on Production.ProductDescription table on Description column.

CREATE FULLTEXT INDEX ON Production.ProductDescription
([Description])
KEY INDEX PK_ProductDescription_ProductDescriptionID
ON AWFTC
WITH CHANGE_TRACKING AUTO;


The “On” is optional if we didn’t specify this, then the full-text index will be created on the default catalog which is “AWFTC” in our case. WITH CHANGE_TRACKING defines tracking nature of our index in case of and DML operation on the table. The options are AUTO, MANUAL and OFF. In case of AUTO, SQL server automatically updates the index whenever a DML is performed on the table, thereby hampering performance of the DML operation. In manual mode, SQL server tracks the changes but does not populate the index and manually we have to issue the command to populate the index through scheduling or running the “ALTER FULLTEXT INDEX … START UPDATE POPULATION” statement on the index. Similarly for OFF, where SQL server does not even track the changes made by DML on the table and waits for user to issue “START FULL POPULATION” or “START INCREMENTAL POPULATION” statement.
Selecting using full-text indexes there are two basic ways for selecting using the full-text indexes, we shall leave rest to be covered in advanced topic.
The first one is CONTAINS and the other is FREETEXT.

SELECT ProductDescriptionID as PDID, [Description], *
FROM [Production].[ProductDescription] pd
WHERE CONTAINS(*, 'light');
go


-- freetext
SELECT ProductDescriptionID as PDID, [Description], *
FROM [Production].[ProductDescription] pd
WHERE FREETEXT(*, 'light');
go



The first query returns 7 rows while the second one returns 9 rows. CONTAINS returns the rows with has exactly matching word “light” within it, while FREETEXT in addition returns words like “Lighter” and “Lightest”.
Some more examples of select using CONTAINS clause

SELECT ProductDescriptionID as PDID, [Description], *
FROM [Production].[ProductDescription] pd
WHERE CONTAINS(*, '"delivers performance"');
go

SELECT ProductDescriptionID as PDID, [Description], *
FROM [Production].[ProductDescription] pd
WHERE CONTAINS(*, 'delivers ~ performance');
go

SELECT ProductDescriptionID as PDID, [Description], *
FROM [Production].[ProductDescription] pd
WHERE CONTAINS(*, 'delivers NEAR performance');
go


SELECT ProductDescriptionID as PDID, [Description], *
FROM [Production].[ProductDescription] pd
WHERE CONTAINS(*, 'delivers OR performance');
go

SELECT ProductDescriptionID as PDID, [Description], *
FROM [Production].[ProductDescription] pd
WHERE CONTAINS(*, 'delivers AND performance');
go

No comments:

Post a Comment