I'm trying to work out the most efficient way to count how many times each keyword appears in my document table, restricted to a specific list of document ids passed into my stored procedure.
The SP takes a parameter @DocIds as a comma-separated list, e.g. 100, 2010, 2340.
What I want to do is select the records whose DocID exists in the comma-separated list I'm passing in and record their keywords in a temporary table, keeping a count of how many times each keyword has been added.
So for example (document table):
DocID | Keywords
-----------------------------
100 | Test, Document, Info
2010 | Document, users
4 | ....
2340 | users, client
Temp table would return:
Keyword | Count
-----------------------------
Test | 1
Document | 2
Info | 1
users | 2
client | 1
I'm sure some SQL guru has a great solution for this. Any help would be greatly appreciated.
Many thanks M
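Here's a solution for SQL Server 2005+. It uses a recursive CTE to split each keyword list into individual words and then counts them.
Sample data and temp table creation (the multi-row VALUES syntax used here needs SQL Server 2008 or later)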
CREATE TABLE #Temp ([Count] INT, Keyword VARCHAR(MAX));
DECLARE @document AS TABLE (
    docid    INT,
    keywords VARCHAR(MAX));
INSERT INTO @document
VALUES (100, 'Test, Document, Info'),
       (2010, 'Document, users'),
       (4, '....'),
       (2340, 'users, client');
Query
; WITH cte(docid, word, keywords)
AS (-- Anchor: peel the first keyword off each document's list
    SELECT docid,
           LEFT(keywords, CHARINDEX(',', keywords + ',') - 1),
           STUFF(keywords, 1, CHARINDEX(',', keywords + ','), '')
    FROM @document
    UNION ALL
    -- Recursive step: keep peeling until nothing is left
    SELECT docid,
           LEFT(keywords, CHARINDEX(',', keywords + ',') - 1),
           STUFF(keywords, 1, CHARINDEX(',', keywords + ','), '')
    FROM cte
    WHERE keywords > '')
INSERT INTO #Temp ([Count], Keyword)
SELECT COUNT(docid),
       LTRIM(RTRIM(word))
FROM cte
GROUP BY LTRIM(RTRIM(word));
SELECT [Count], Keyword FROM #Temp;
Output
Count Keyword
-------- -----
1 ....
1 client
2 Document
1 Info
1 Test
2 users
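The query above counts keywords across every row of @document, whereas the question only wants the documents listed in @DocIds. Below is a minimal sketch of one way to add that filter, reusing the same recursive-CTE split for the id list. The ids CTE and its column names are illustrative assumptions, and the snippet has to run in the same batch as the @document sample data:

```sql
-- Sketch only: assumes @document from the sample data is declared in this batch.
DECLARE @DocIds VARCHAR(MAX);
SET @DocIds = '100, 2010, 2340';

;WITH ids(id, rest)
AS (-- Split @DocIds into one id per row, same technique as the keyword split
    SELECT LEFT(@DocIds, CHARINDEX(',', @DocIds + ',') - 1),
           STUFF(@DocIds, 1, CHARINDEX(',', @DocIds + ','), '')
    UNION ALL
    SELECT LEFT(rest, CHARINDEX(',', rest + ',') - 1),
           STUFF(rest, 1, CHARINDEX(',', rest + ','), '')
    FROM ids
    WHERE rest > ''),
cte(docid, word, keywords)
AS (-- Only documents whose id appears in the list are split
    SELECT d.docid,
           LEFT(d.keywords, CHARINDEX(',', d.keywords + ',') - 1),
           STUFF(d.keywords, 1, CHARINDEX(',', d.keywords + ','), '')
    FROM @document d
    WHERE d.docid IN (SELECT CAST(LTRIM(RTRIM(id)) AS INT) FROM ids)
    UNION ALL
    SELECT docid,
           LEFT(keywords, CHARINDEX(',', keywords + ',') - 1),
           STUFF(keywords, 1, CHARINDEX(',', keywords + ','), '')
    FROM cte
    WHERE keywords > '')
SELECT COUNT(docid) AS [Count],
       LTRIM(RTRIM(word)) AS Keyword
FROM cte
GROUP BY LTRIM(RTRIM(word));
```

With the sample data this should leave out DocID 4, so the '....' row disappears from the result. A list with more than about 100 entries would run into the default recursion limit; adding OPTION (MAXRECURSION 0) to the final statement lifts it.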
I think you have to run a query for each keyword:
INSERT INTO tmp
VALUES ('users',
        (SELECT COUNT(DocID) FROM Documents WHERE keywords LIKE '%users%'))
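One caveat with LIKE '%users%' is that it also matches longer keywords that merely contain the text, such as a hypothetical 'superusers'. A hedged sketch of a stricter match follows: it removes the space after each comma and pads the list with commas so that only whole keywords match. The @Documents table variable and its rows are made up for illustration:

```sql
-- Illustrative data only; DocID 5 is invented to show the false-match case.
DECLARE @Documents TABLE (DocID INT, keywords VARCHAR(MAX));
INSERT INTO @Documents VALUES (100, 'Test, Document, Info');
INSERT INTO @Documents VALUES (2010, 'Document, users');
INSERT INTO @Documents VALUES (5, 'superusers');

-- Normalise ', ' to ',' and wrap in commas so only whole keywords match.
SELECT COUNT(DocID) AS [Count]
FROM @Documents
WHERE ',' + REPLACE(keywords, ', ', ',') + ',' LIKE '%,users,%';
```

Here only DocID 2010 is counted; the plain '%users%' pattern would also have counted DocID 5. This still needs one query per keyword, so it assumes the keyword list is known up front.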
Ask Tom has a technique for selecting out of a keyword list. With that technique and a GROUP BY keyword, you can get the COUNT(*), which is exactly what you are looking for.
Assuming you're using SQL Server, look at one of the many answers for splitting a string into individual rows, e.g. "How to split a string in T-SQL?"
Then your steps are:
1. Parse the list of relevant doc ids into a temporary table (@selectedDocs) using this function (may need a data type conversion)
2. Populate another temporary table (@keywords) with the keywords used by these documents:
insert into @keywords (docID, keyword)
select d.docID, ltrim(rtrim(words.s))
from @selectedDocs sd
inner join @documents d on d.docID = sd.docID
cross apply (select * from dbo.Split(',', d.keywords)) words
3. Count how many times each keyword is used:
select k.keyword, count(k.docID)
from @keywords k
group by k.keyword
Note that a table variable is usually only appropriate when you expect a small number of rows; use a temporary table for larger sets.
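The three steps above can be sketched end to end as follows. The dbo.Split function is not defined in this answer, so the version here is a hypothetical stand-in that matches the call shape used in the steps (separator first, string second, one row per piece in a column named s). The VARCHAR(8000) limit, the sample rows, and the table variable definitions are likewise assumptions:

```sql
-- Hypothetical splitter: returns one row per piece in column s.
CREATE FUNCTION dbo.Split (@sep CHAR(1), @s VARCHAR(8000))
RETURNS TABLE
AS
RETURN (
    WITH pieces(pn, start, stop) AS (
        SELECT 1, 1, CHARINDEX(@sep, @s)
        UNION ALL
        SELECT pn + 1, stop + 1, CHARINDEX(@sep, @s, stop + 1)
        FROM pieces
        WHERE stop > 0
    )
    SELECT pn,
           SUBSTRING(@s, start,
                     CASE WHEN stop > 0 THEN stop - start ELSE 8000 END) AS s
    FROM pieces
);
GO

DECLARE @DocIds VARCHAR(8000);
SET @DocIds = '100, 2010, 2340';

-- Sample data from the question
DECLARE @documents TABLE (docID INT, keywords VARCHAR(8000));
INSERT INTO @documents VALUES (100, 'Test, Document, Info');
INSERT INTO @documents VALUES (2010, 'Document, users');
INSERT INTO @documents VALUES (4, '....');
INSERT INTO @documents VALUES (2340, 'users, client');

DECLARE @selectedDocs TABLE (docID INT);
DECLARE @keywords TABLE (docID INT, keyword VARCHAR(8000));

-- Step 1: parse the id list (trim the spaces, convert to INT)
INSERT INTO @selectedDocs (docID)
SELECT CAST(LTRIM(RTRIM(s)) AS INT)
FROM dbo.Split(',', @DocIds);

-- Step 2: one row per (document, keyword) for the selected documents
INSERT INTO @keywords (docID, keyword)
SELECT d.docID, LTRIM(RTRIM(words.s))
FROM @selectedDocs sd
INNER JOIN @documents d ON d.docID = sd.docID
CROSS APPLY dbo.Split(',', d.keywords) words;

-- Step 3: count how many documents use each keyword
SELECT k.keyword, COUNT(k.docID) AS [Count]
FROM @keywords k
GROUP BY k.keyword;
```

GO is a batch separator understood by SSMS and sqlcmd; it is needed because CREATE FUNCTION must be the first statement in its batch. Like any recursive CTE, the splitter is subject to the default recursion limit, so very long lists would need a different splitting approach or a MAXRECURSION hint on the calling statement.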