
SQL Counting occurrences of document keywords

I'm trying to work out the most efficient way to count how many times each keyword appears in my document table, restricted to a specific list of document ids passed into my stored procedure.

The SP takes a parameter @DocIds as a comma-separated list, e.g. 100, 2010, 2340.

What I want to do is select the records whose DocID is in the comma-separated list I'm passing in, record their keywords in a temporary table, and increment a count whenever a keyword has already been added to the temp table.

So for example (document table):

DocID | Keywords
-----------------------------
100   | Test, Document, Info
2010  | Document, users
4     | ....    
2340  | users, client  

Temp table would return:

Keyword  | Count
Test     | 1
Document | 2
Info     | 1  
users    | 2
client   | 1

I'm sure some SQL guru has a great solution for this. Any help would be greatly appreciated.

Many thanks M

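Here's a solution for SQL Server 2005+. It uses a recursive CTE to split each keyword list into individual words, which are then grouped and counted. (The multi-row VALUES syntax in the sample data below needs SQL Server 2008 or later.)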

Sample data and temp table creation

CREATE Table #Temp ([Count] int, Keyword varchar(max) );

DECLARE @document AS TABLE ( 
  docid    INT, 
  keywords VARCHAR(MAX)) 

INSERT INTO @document 
VALUES      (100, 'Test, Document, Info'), 
            (2010, 'Document, users'), 
            (4, '....'), 
            (2340, 'users, client')

Query

;WITH cte (docid, word, keywords) AS (
    -- Anchor member: peel the first word off each keyword list
    SELECT docid,
           LEFT(keywords, CHARINDEX(',', keywords + ',') - 1),
           STUFF(keywords, 1, CHARINDEX(',', keywords + ','), '')
    FROM   @document
    UNION ALL
    -- Recursive member: keep peeling words until nothing is left
    SELECT docid,
           LEFT(keywords, CHARINDEX(',', keywords + ',') - 1),
           STUFF(keywords, 1, CHARINDEX(',', keywords + ','), '')
    FROM   cte
    WHERE  keywords > ''
)
INSERT INTO #Temp ([Count], Keyword)
SELECT COUNT(docid),
       LTRIM(RTRIM(word))
FROM   cte
GROUP  BY LTRIM(RTRIM(word))

SELECT [Count], Keyword FROM #Temp

Output

Count       Keyword
--------    -----
1           ....
1           client
2           Document
1           Info
1           Test
2           users
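
The query above counts keywords across every row of the table, which is why the '....' row for DocID 4 shows up in the output. The question also asks to restrict the count to the ids in @DocIds. A minimal sketch of one way to do that, assuming @DocIds holds only integers, commas and spaces: wrap the list in delimiters and filter the anchor member with LIKE. The variable value and the MAXRECURSION hint are illustrative additions, not part of the original answer.

```sql
-- Sketch only: @DocIds stands in for the stored procedure parameter.
DECLARE @DocIds VARCHAR(MAX) = '100, 2010, 2340';

DECLARE @document AS TABLE (docid INT, keywords VARCHAR(MAX));
INSERT INTO @document (docid, keywords)
VALUES (100,  'Test, Document, Info'),
       (2010, 'Document, users'),
       (4,    '....'),
       (2340, 'users, client');

WITH cte (docid, word, keywords) AS (
    -- Anchor member: only documents whose id appears in the list.
    -- ',100,2010,2340,' LIKE '%,2010,%' matches whole ids, not substrings.
    SELECT docid,
           LEFT(keywords, CHARINDEX(',', keywords + ',') - 1),
           STUFF(keywords, 1, CHARINDEX(',', keywords + ','), '')
    FROM   @document
    WHERE  ',' + REPLACE(@DocIds, ' ', '') + ','
           LIKE '%,' + CAST(docid AS VARCHAR(20)) + ',%'
    UNION ALL
    -- Recursive member: unchanged from the answer above.
    SELECT docid,
           LEFT(keywords, CHARINDEX(',', keywords + ',') - 1),
           STUFF(keywords, 1, CHARINDEX(',', keywords + ','), '')
    FROM   cte
    WHERE  keywords > ''
)
SELECT LTRIM(RTRIM(word)) AS Keyword,
       COUNT(docid)       AS [Count]
FROM   cte
GROUP  BY LTRIM(RTRIM(word))
OPTION (MAXRECURSION 0);  -- lifts the default limit of 100 recursion levels
```

With the filter in place, DocID 4 is never fed into the recursion, so its '....' entry drops out of the result. The MAXRECURSION hint only matters if a single document can carry more than about a hundred keywords.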

I think you have to run a query for each keyword:

INSERT INTO tmp VALUES ('users', (
   SELECT COUNT(DocID) FROM Documents WHERE keywords LIKE '%users%'
))
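
One caveat with a bare LIKE '%users%' is that it also matches keywords that merely contain the text, such as 'superusers'. A hedged sketch of a delimiter-aware variant of the same per-keyword count follows. It assumes keywords are separated by a comma and a single space, as in the question's sample data; the table variable, the @keyword variable and the 'superusers' row are made up for illustration.

```sql
-- Illustrative data: the 'superusers' row is invented to show the pitfall.
DECLARE @documents TABLE (DocID INT, keywords VARCHAR(MAX));
INSERT INTO @documents (DocID, keywords)
VALUES (100,  'Test, Document, Info'),
       (2010, 'Document, users'),
       (2340, 'superusers, client');

DECLARE @keyword VARCHAR(100) = 'users';

-- Normalise 'a, b, c' to ',a,b,c,' so that only whole keywords match.
-- Here only DocID 2010 qualifies; LIKE '%users%' would also pick up 2340.
SELECT @keyword     AS Keyword,
       COUNT(DocID) AS [Count]
FROM   @documents
WHERE  ',' + REPLACE(keywords, ', ', ',') + ','
       LIKE '%,' + @keyword + ',%';
```

If a keyword can itself contain LIKE wildcards (%, _ or [), it would need escaping before being used this way.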

Ask Tom has a technique for selecting out of a keyword list. With that technique and a GROUP BY on the keyword, you can get the COUNT(*), which is exactly what you are looking for.

Assuming you're using SQL Server, look at one of the many answers on splitting a string into individual rows, e.g. "How to split a string in T-SQL?"

Then your steps are:
1. Parse the list of relevant doc ids into a temporary table (@selectedDocs) using this function (may need a data type conversion)
2. Populate another temporary table (@keywords) with the keywords used by these documents:
   insert into @keywords (docID, keyword)
   select d.docID, ltrim(rtrim(words.s))
   from @selectedDocs sd
   inner join @documents d on d.docID = sd.docID
   cross apply (select * from dbo.Split(',', d.keywords)) words
3. Count how many times each keyword is used:
   select k.keyword, count(k.docID)
   from @keywords k
   group by k.keyword

Note that you'd usually use a table variable only when you expect a small number of rows, and a temporary table when you expect more.
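
The three steps can be sketched end to end as follows. This is not the answer's exact code: it swaps the user-defined dbo.Split for the built-in STRING_SPLIT, which assumes SQL Server 2016 or later (database compatibility level 130), and it folds steps 2 and 3 into a single query. The sample data comes from the question; the @DocIds value stands in for the stored procedure parameter.

```sql
-- Assumption: SQL Server 2016+ so that STRING_SPLIT is available.
DECLARE @DocIds VARCHAR(MAX) = '100, 2010, 2340';

DECLARE @documents TABLE (docID INT, keywords VARCHAR(MAX));
INSERT INTO @documents (docID, keywords)
VALUES (100,  'Test, Document, Info'),
       (2010, 'Document, users'),
       (4,    '....'),
       (2340, 'users, client');

-- Step 1: parse the id list, skipping blanks and non-numeric entries.
DECLARE @selectedDocs TABLE (docID INT PRIMARY KEY);
INSERT INTO @selectedDocs (docID)
SELECT DISTINCT TRY_CAST(LTRIM(RTRIM(value)) AS INT)
FROM   STRING_SPLIT(@DocIds, ',')
WHERE  LTRIM(RTRIM(value)) <> ''
  AND  TRY_CAST(LTRIM(RTRIM(value)) AS INT) IS NOT NULL;

-- Steps 2 and 3: split each selected document's keywords and count them.
SELECT LTRIM(RTRIM(words.value)) AS Keyword,
       COUNT(*)                  AS [Count]
FROM   @selectedDocs sd
INNER JOIN @documents d ON d.docID = sd.docID
CROSS APPLY STRING_SPLIT(d.keywords, ',') AS words
GROUP  BY LTRIM(RTRIM(words.value));
```

Because the join starts from @selectedDocs, DocID 4 is never split, so the result should contain only the five keywords listed in the question's expected table. On versions before 2016, the same shape works with a user-defined split function in place of STRING_SPLIT, as in the steps above.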
