SQL Server 2005 Full Text forum Search

I'm working on a search stored procedure for our existing forums.

I've written the following code, which uses standard SQL full-text indexes. However, I'm sure there is a better way of doing it, and I would like a pointer in the right direction.

To give some info on how it needs to work: the page has one search text box which, when submitted, searches thread titles, thread descriptions and post text. It should return the title matches first, then the description matches, then the post matches.

Below is what I've written so far. It works, but it is not elegant or as fast as I would like. As an example of performance: with 20K threads and 80K posts, it takes about 12 seconds to search for 5 random words.

ALTER PROCEDURE [dbo].[SearchForums]
(
    --Input Params
    @SearchText VARCHAR(200),
    @GroupId INT = -1,
    @ClientId INT,
    --Paging Params
    @CurrentPage INT,
    @PageSize INT,           
    @OutTotalRecCount INT OUTPUT
)
AS

--Create Temp Table to Store Query Data
CREATE TABLE #SearchResults
(
    Relevance INT IDENTITY,
    ThreadID INT,
    PostID INT,
    [Description] VARCHAR(2000),
    Author BIGINT
)

--Create and populate table of all GroupID's This search will return from
CREATE TABLE #GroupsToSearch
(
GroupId INT
)
IF @GroupId = -1
    BEGIN
        INSERT INTO #GroupsToSearch
        SELECT GroupID FROM SNetwork_Groups WHERE ClientId = @ClientId
    END
ELSE
    BEGIN
        INSERT INTO #GroupsToSearch
        VALUES(@GroupId)
    END

--Get Thread Titles
INSERT INTO #SearchResults
    SELECT 
        SNetwork_Threads.[ThreadId],
        (SELECT NULL) AS PostId,
        SNetwork_Threads.[Description],
        SNetwork_Threads.[OwnerUserId]
    FROM 
        SNetwork_Threads
        INNER JOIN SNetwork_Groups ON SNetwork_Groups.GroupId = SNetwork_Threads.GroupId        
    WHERE 
        FREETEXT(SNetwork_Threads.[Name], @SearchText) AND
        Snetwork_Threads.GroupID IN (SELECT GroupID FROM #GroupsToSearch) AND
        SNetwork_Groups.ClientId = @ClientId


--Get Thread Descriptions
INSERT INTO #SearchResults
    SELECT 
        SNetwork_Threads.[ThreadId],
        (SELECT NULL) AS PostId,
        SNetwork_Threads.[Description],
        SNetwork_Threads.[OwnerUserId]
    FROM 
        SNetwork_Threads
        INNER JOIN SNetwork_Groups ON SNetwork_Groups.GroupId = SNetwork_Threads.GroupId        
    WHERE 
        FREETEXT(SNetwork_Threads.[Description], @SearchText) AND
        Snetwork_Threads.GroupID IN (SELECT GroupID FROM #GroupsToSearch) AND
        SNetwork_Groups.ClientId = @ClientId


--Get Posts
INSERT INTO #SearchResults
    SELECT 
        SNetwork_Threads.ThreadId,
        SNetwork_Posts.PostId,
        SNetwork_Posts.PostText,
        SNetwork_Posts.[OwnerUserId]
    FROM 
        SNetwork_Posts 
        INNER JOIN SNetwork_Threads ON SNetwork_Threads.ThreadId = SNetwork_Posts.ThreadId
        INNER JOIN SNetwork_Groups ON SNetwork_Groups.GroupId = SNetwork_Threads.GroupId        
    WHERE 
        FREETEXT(SNetwork_Posts.PostText, @SearchText) AND
        Snetwork_Threads.GroupID IN (SELECT GroupID FROM #GroupsToSearch) AND
        SNetwork_Groups.ClientId = @ClientId


--Return Paged Result Sets
SELECT @OutTotalRecCount =  COUNT(*) FROM #SearchResults
SELECT  
    #SearchResults.[ThreadID],
    #SearchResults.[PostID],
    #SearchResults.[Description],
    #SearchResults.[Author]
FROM  
    #SearchResults          
WHERE  
    #SearchResults.[Relevance] >= (@CurrentPage - 1) * @PageSize + 1 AND 
    #SearchResults.[Relevance] <= @CurrentPage*@PageSize
ORDER BY Relevance ASC


--Clean Up
DROP TABLE #SearchResults
DROP TABLE #GroupsToSearch

I know it's a bit long-winded, but just a nudge in the right direction would be much appreciated.

In case it helps: 80% of the query time is taken up by the post search and, according to the query plan, is spent on a "Clustered Index Scan" on the posts table. I can't see any way around this.

Thanks

Gavin

I'd really have to see the execution plan to know where the slow parts are, as I don't see anything particularly nasty in your code. Very first thing: make sure all your indexes are in good shape and are actually being used, that statistics are up to date, and so on.
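As a rough sketch of what those checks might look like on SQL Server 2005, something along these lines could be run by hand. The `dbo` schema is assumed from the procedure definition, and `ForumCatalog` is a placeholder for whatever your full-text catalog is actually called.

```sql
-- Refresh optimizer statistics on the tables the search touches.
UPDATE STATISTICS dbo.SNetwork_Threads;
UPDATE STATISTICS dbo.SNetwork_Posts;

-- Check that the posts table has an active full-text index (1 = yes).
SELECT OBJECTPROPERTY(OBJECT_ID('dbo.SNetwork_Posts'),
                      'TableHasActiveFulltextIndex') AS HasActiveFullTextIndex;

-- Check whether the catalog is still populating (0 = idle).
-- 'ForumCatalog' is a placeholder, not a name taken from the question.
SELECT FULLTEXTCATALOGPROPERTY('ForumCatalog', 'PopulateStatus') AS PopulateStatus;

-- Look at fragmentation of the indexes on the posts table.
DECLARE @DbId INT, @ObjId INT;
SET @DbId  = DB_ID();
SET @ObjId = OBJECT_ID('dbo.SNetwork_Posts');

SELECT index_id, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(@DbId, @ObjId, NULL, NULL, 'LIMITED');
```

If the full-text index turns out to be missing or still populating, that alone could explain a slow or scan-heavy plan.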

One other idea would be to do the search on thread title first, then use the results from that to prune the searches on thread description and post text. Similarly, use the results from the thread description search to prune the post text search.

The basic idea here is that if you find the keywords in the thread title, why bother searching the description and posts? I realize this may not work depending on how you are presenting the search results to the user, and it may not make a huge difference, but it's something to think about.
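A minimal sketch of that pruning, assuming it replaces the description and post inserts in your procedure and runs after the thread-title insert. It reuses your `#SearchResults` and `#GroupsToSearch` tables and the procedure's parameters; the `NOT EXISTS` filters are my addition. Note that it changes what the user sees: a thread already matched by title no longer gets separate rows for its posts.

```sql
-- Fragment for the body of dbo.SearchForums, placed after the title search.

-- Description search, skipping threads the title search already found.
INSERT INTO #SearchResults (ThreadID, PostID, [Description], Author)
SELECT t.ThreadId, NULL, t.[Description], t.OwnerUserId
FROM SNetwork_Threads AS t
    INNER JOIN SNetwork_Groups AS g ON g.GroupId = t.GroupId
WHERE FREETEXT(t.[Description], @SearchText)
  AND t.GroupId IN (SELECT GroupId FROM #GroupsToSearch)
  AND g.ClientId = @ClientId
  AND NOT EXISTS (SELECT 1 FROM #SearchResults AS r
                  WHERE r.ThreadID = t.ThreadId);

-- Post search, skipping threads found by either thread-level search.
INSERT INTO #SearchResults (ThreadID, PostID, [Description], Author)
SELECT t.ThreadId, p.PostId, p.PostText, p.OwnerUserId
FROM SNetwork_Posts AS p
    INNER JOIN SNetwork_Threads AS t ON t.ThreadId = p.ThreadId
    INNER JOIN SNetwork_Groups AS g ON g.GroupId = t.GroupId
WHERE FREETEXT(p.PostText, @SearchText)
  AND t.GroupId IN (SELECT GroupId FROM #GroupsToSearch)
  AND g.ClientId = @ClientId
  AND NOT EXISTS (SELECT 1 FROM #SearchResults AS r
                  WHERE r.ThreadID = t.ThreadId);
```

Whether this actually saves time depends on the plan: the full-text predicate may still be evaluated over the whole posts index before the exclusion is applied, so it is worth comparing plans before and after.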

80k records isn't that much. I'd recommend not inserting the resulting data into your temp table, and instead inserting only the IDs, then joining to that table afterward. This will save on writing to the temp table, as you may store 10,000 ints instead of 10,000 full posts (of which you discard all but one page). This may reduce the amount of time spent scanning posts as well.

It looks like you would need two temp tables, one for threads and one for posts. You would union them in the final select.
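Roughly, the ID-only approach with two temp tables might look like the sketch below. The names `#ThreadHits` and `#PostHits` are made up for illustration, `[Name]` is assumed to be the thread title column, and the `VARCHAR(2000)` cast mirrors the `[Description]` column of your original `#SearchResults` table. For brevity the join to `SNetwork_Groups` on `ClientId` is left out; keep it if `@GroupId` could belong to another client.

```sql
-- Fragment for the body of dbo.SearchForums; relies on its parameters
-- and on #GroupsToSearch being populated as in the original procedure.

-- ID-only hit tables (hypothetical names): ints only, no text columns.
CREATE TABLE #ThreadHits (Relevance INT IDENTITY(1,1), ThreadID INT NOT NULL);
CREATE TABLE #PostHits   (Relevance INT IDENTITY(1,1), ThreadID INT NOT NULL,
                          PostID INT NOT NULL);

-- Title matches go in first, then description matches.
INSERT INTO #ThreadHits (ThreadID)
SELECT t.ThreadId
FROM SNetwork_Threads AS t
WHERE FREETEXT(t.[Name], @SearchText)
  AND t.GroupId IN (SELECT GroupId FROM #GroupsToSearch);

INSERT INTO #ThreadHits (ThreadID)
SELECT t.ThreadId
FROM SNetwork_Threads AS t
WHERE FREETEXT(t.[Description], @SearchText)
  AND t.GroupId IN (SELECT GroupId FROM #GroupsToSearch);

INSERT INTO #PostHits (ThreadID, PostID)
SELECT p.ThreadId, p.PostId
FROM SNetwork_Posts AS p
    INNER JOIN SNetwork_Threads AS t ON t.ThreadId = p.ThreadId
WHERE FREETEXT(p.PostText, @SearchText)
  AND t.GroupId IN (SELECT GroupId FROM #GroupsToSearch);

SELECT @OutTotalRecCount = (SELECT COUNT(*) FROM #ThreadHits)
                         + (SELECT COUNT(*) FROM #PostHits);

-- Union the ID sets, number them (threads before posts), and join back
-- to the base tables only for the rows on the requested page.
SELECT
    hits.ThreadID,
    hits.PostID,
    CASE WHEN hits.PostID IS NULL THEN t.[Description]
         ELSE CAST(p.PostText AS VARCHAR(2000)) END AS [Description],
    CASE WHEN hits.PostID IS NULL THEN t.OwnerUserId
         ELSE p.OwnerUserId END AS Author
FROM (
    SELECT u.ThreadID, u.PostID,
           ROW_NUMBER() OVER (ORDER BY u.SourceRank, u.Relevance) AS RowNum
    FROM (
        SELECT ThreadID, CAST(NULL AS INT) AS PostID, 1 AS SourceRank, Relevance
        FROM #ThreadHits
        UNION ALL
        SELECT ThreadID, PostID, 2 AS SourceRank, Relevance
        FROM #PostHits
    ) AS u
) AS hits
    INNER JOIN SNetwork_Threads AS t ON t.ThreadId = hits.ThreadID
    LEFT JOIN SNetwork_Posts AS p ON p.PostId = hits.PostID
WHERE hits.RowNum BETWEEN (@CurrentPage - 1) * @PageSize + 1
                      AND @CurrentPage * @PageSize
ORDER BY hits.RowNum;

DROP TABLE #ThreadHits;
DROP TABLE #PostHits;
```

The point of the design is that the wide text columns are only read for one page of rows, while the temp tables hold nothing but integers.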
