SQL 2008 Fulltext Query Performance issue using FREETEXT and CONTAINS June 6, 2010
Posted by Chirag in Full Text Search, SQL 2005, SQL 2008, SQL 2008 R2, SQL Server 2008.Tags: Fulltext, Performance, SQL 2008, SQL 2008 R2, T-SQL
trackback
Of all the things that have improved in SQL 2008 Fulltext seems to have taken a step backwards. We were helping one of our clients upgrade from SQL 2005 to SQL 2008. They are a 24×7 shop with a framework that supports various websites for their customers. Fulltext is heavily used for text based searches on the websites. We had set up a test rig for them on SQL 2008 to test their applications. The testing process revealed sever performance issues with the text based searches. We investigated and identified the cause of the performance degradation to be the FREETEXT clause used in a multiple OR condition.
Issue:
A fulltext query using the FREETEXT predicate in a multiple OR combination works magnitudes slower in SQL 2008 than in SQL 2005 like for like vanilla installs. We tested this in SQL 2008/R2 and the same issue persists so it pointed to an engine issue in the way fulltext query plans were generated and handled in SQL 2008.
Eg.
SELECT A.Col1, A.Col2, B.Col1, B.Col1….
FROM TableA AS A LEFT JOIN TableB AS B ON A.ID = B.A_ID
WHERE FREETEXT((A.Name, A.Description), ‘SomeSearchText’)
OR FREETEXT((B.Name, B.Description),’SomeSearchText’)
The fulltext catalog encapsulates multiple tables which each have a single fulltext index.
Workaround:
After thorough investigation we identified a workaround for this issue, it is to separate the query using a UNION clause which has extremely good performance on SQL 2008/R2.
Eg.
SELECT A.Col1, A.Col2, B.Col1, B.Col1….
FROM TableA AS A LEFT JOIN TableB AS B ON A.ID = B.A_ID
WHERE FREETEXT((A.Name, A.Description), ‘SomeSearchText’)
UNION
SELECT A.Col1, A.Col2, B.Col1, B.Col1….
FROM TableA AS A LEFT JOIN TableB AS B ON A.ID = B.A_ID
WHERE FREETEXT((B.Name, B.Description),’SomeSearchText’)
Unfortunately, this leads to code bloat and may not be the best approach to resolve this issue as it could mean many man hours to update the related code which could be hundreds of stored procedures as was in our clients case.
We raised a PSS case with Microsoft to identify the root cause and possibly provide a solution for the same.
I have attached the repro scripts I provided Microsoft to help them recreate the issue:
- FREETEXT_Issue_CreateSchema.sql => Contains the Sample Schema Creation Script
- FREETEXT_Issue_Query.sql => Contains the Sample Problem Queries and the Workarounds we have found
Instructions to Reproduce the issue(Run in SQL 2005 and SQL 2008 instances preferably similar hardware/software settings):
- Run the FREETEXT_Issue_CreateSchema.sql Script => this will create the schema and populate the tables and the fulltext catatlog
- Wait for the fulltext catalog to finish population
- In a separate window run the FREETEXT_Issue_Query.sql script, switch to the messages tab and compare the CPU times in the output.
Microsoft confirmed that this was a BUG in the SQL 2008/R2 and infact affected the CONATINS clause too. They confirmed that it would take a substantial design change to fix this issue. After much persistence they have confirmed that a fix will be released as a Cumulative update “soon”. I will update this blog with the appropriate link for the Cumulative update as soon as it is released.
UPDATE: (24th August 2010)
Microsoft has released a patch which resolves this issue in the SQL 2008 SP1 CU9 rollout. You can obtain it from here: http://support.microsoft.com/kb/2083921. I am still awaiting confirmation for SQL 2008 R2. Will update as soon as I have more news.
UPDATE: (27th January 2011)
Microsoft has released a patch which resolves this issue in SQL 2008 R2 CU4 rollout. You can obtain it from here: http://support.microsoft.com/kb/2264562. Although many readers have informed me that the workaround still works faster. Test it and let me know.
Fantastic! My full-text query has now gone from 90 seconds to ~3 seconds. Thanks for posting this – I’ll keep an eye out for the cumulative update.
Doesn’t look like it’s been fixed in CU2:
http://support.microsoft.com/kb/2072493/en-us
This is the exact problem I’ve been running into as well (using CONTAINS) but its not just SQL 2008…I see it also in SQL 2005 (more specifically SP2 on a Windows 2003 server…very standard install). Are you certain about SQL2005 not having the same issue?
Also, I’m curious if there is any official Microsoft links related to this issue that I can use for documentation? Thanks.
Hello Chirag,
Many thanks for this post.
We are facing a similar problem with full text searches with “OR” clauses.
I will appreciate if you can let us know about the cumulative update.
Regards,
Ed.
“Soon” isn’t soon enough – We expeienced the same issue today when we upgraded to SQL 2008. A query that took 6 seconds to resolve on SQL 2005 took over 5 minutes to complete natively on SQL 2008. To say its a bug is an understatement.
We’re not going to re-write hundreds of SP’s either. I’ll use SQL 2005 on our new server instead!
I’ll keep my eye on your post for a patch.
Hi Rui
Good news! Just checked with Microsoft and seems they had released a patch for this with the SQL 2008 SP1 CU9 rollout. You can get it from here: http://support.microsoft.com/kb/2083921
I am still awaiting conformation for SQL 2008 R2.
Cheers
Chirag
Weird, in http://support.microsoft.com/kb/2264562/ it says in the title “SQL Server 2008″ and it only mentions 2008 / SP1 in the fix. However at the bottom of the article, in the “Applies To” section, it only lists 2008 R2, not 2008. Yet there is no fix to be found in any of the cumulative updates for R2.
Hi Aaron, I have already raised this with the Microsoft PSS Engineer who was handling the case. Seems a bit misleading on the KB. Will provide update on this as soon as I hear back. Many Thanks.
Regards
Chirag
Hello,
any updates from Microsoft, I face the same problem in R2
There appears to be a fix in CU4 for SQL2008 R2 and whilst it’s made some improvement in that it has reduced a simple FT search (using ‘contains’ and ‘OR’) from over 3 minutes down to 1min30, the union solution above runs in 4secs the first time it’s runs and 0secs for subsequent runs!
So thank you very much; once I’ve refactored my app that creates the where clauses, my users are going to be very much happier than they are at the moment.
Best wishes
Barry
Hi Barry
Thanks for the update on CU4. Glad at least the workaround works for you.
Regards
Chirag
Hi Nabeel
There is a fix in CU4 which can be downloaded from http://support.microsoft.com/kb/2264562
Though some have told me the workaround still produces faster results.
Regards
Chirag