I have a table of documents, and a table of tags. The documents are tagged with various values.
I am attempting to create a search of these tags, and for the most part it is working. However, I am getting extra results returned when it matches any tag. I only want results where it matches all tags.
I have created this to illustrate the problem http://sqlfiddle.com/#!3/8b98e/11
Tables and Data:
CREATE TABLE Documents
(
DocId INT,
DocText VARCHAR(500)
);
CREATE TABLE Tags
(
TagId INT,
TagName VARCHAR(50)
);
CREATE TABLE DocumentTags
(
DocTagId INT,
DocId INT,
TagId INT,
Value VARCHAR(50)
);
INSERT INTO Documents VALUES (1, 'Document 1 Text');
INSERT INTO Documents VALUES (2, 'Document 2 Text');
INSERT INTO Tags VALUES (1, 'Tag Name 1');
INSERT INTO Tags VALUES (2, 'Tag Name 2');
INSERT INTO DocumentTags VALUES (1, 1, 1, 'Value 1');
INSERT INTO DocumentTags VALUES (1, 1, 2, 'Value 2');
INSERT INTO DocumentTags VALUES (1, 2, 1, 'Value 1');
Code:
-- Set up the parameters
DECLARE @TagXml VARCHAR(max)
SET @TagXml = '<tags>
<tag>
<description>Tag Name 1</description>
<value>Value 1</value>
</tag>
<tag>
<description>Tag Name 2</description>
<value>Value 2</value>
</tag>
</tags>'
-- Create a table to store the parsed xml in
DECLARE @XmlTagData TABLE
(
id varchar(20)
,[description] varchar(100)
,value varchar(250)
)
-- Populate our XML table
DECLARE @iTag int
EXEC sp_xml_preparedocument @iTag OUTPUT, @TagXml
-- Execute a SELECT statement that uses the OPENXML rowset provider
-- to produce a table from our xml structure and insert it into our temp table
INSERT INTO @XmlTagData (id, [description], value)
SELECT id, [description], value
FROM OPENXML (@iTag, '/tags/tag',1)
WITH (id varchar(20),
[description] varchar(100) 'description',
value varchar(250) 'value')
EXECUTE sp_xml_removedocument @iTag
-- Update the XML table ids to match existing tag ids
UPDATE X
SET X.id = T.TagId
FROM @XmlTagData X
INNER JOIN Tags T ON X.[description] = T.TagName
-- Check it looks right
--SELECT *
--FROM @XmlTagData
-- This is where things do not quite work. I get both doc 1 & 2 back,
-- but what I want is just document 1.
-- i.e. documents that have both tags with matching values
SELECT DISTINCT D.*
FROM Documents D
INNER JOIN DocumentTags T ON T.DocId = D.DocId
INNER JOIN @XmlTagData X ON X.id = T.TagId AND X.value = T.Value
(Note I am not a DBA, so there may be better ways of doing things. Hopefully I am on the right track, but I am open to other suggestions if my implementation can be improved.)
Can anyone offer any suggestions on how to get only results that have all tags?
Many thanks.
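Use [NOT] EXISTS together with the EXCEPT operator in the last query. A document qualifies when no searched (tag, value) pair is left over after removing the pairs that the document actually has: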
SELECT *
FROM Documents D
WHERE NOT EXISTS (
SELECT X.ID , X.Value
FROM @XmlTagData X
EXCEPT
SELECT T.TagId, T.VALUE
FROM DocumentTags T
WHERE T.DocId = D.DocId
)
Demo on SQLFiddle
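Or, with EXISTS. Note that this form compares the searched pairs against the tags of the other documents, so it gives the expected result on the sample data but not in general (for example, when two documents both carry every searched tag, neither is returned):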
SELECT *
FROM Documents D
WHERE EXISTS (
SELECT X.ID , X.Value
FROM @XmlTagData X
EXCEPT
SELECT T.TagId, T.VALUE
FROM DocumentTags T
WHERE T.DocId != D.DocId
)
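The difference between the two forms above can be checked with a small self-contained sketch. The table variables @Docs, @DocTags and @Search are hypothetical stand-ins for Documents, DocumentTags and @XmlTagData, and document 3 is an extra row added here (not in the original data) so that two documents carry every searched tag.

```sql
-- Hypothetical stand-ins for the question's tables, so the script runs on its own.
DECLARE @Docs TABLE (DocId INT, DocText VARCHAR(500));
DECLARE @DocTags TABLE (DocId INT, TagId INT, Value VARCHAR(50));
DECLARE @Search TABLE (TagId INT, Value VARCHAR(50));

INSERT INTO @Docs VALUES (1, 'Document 1 Text'), (2, 'Document 2 Text'), (3, 'Document 3 Text');
INSERT INTO @DocTags VALUES
    (1, 1, 'Value 1'), (1, 2, 'Value 2'),
    (2, 1, 'Value 1'),
    (3, 1, 'Value 1'), (3, 2, 'Value 2');   -- assumed extra document with both tags
INSERT INTO @Search VALUES (1, 'Value 1'), (2, 'Value 2');

-- NOT EXISTS form: keeps a document when none of the searched pairs is missing from it.
-- Returns documents 1 and 3.
SELECT D.DocId
FROM @Docs D
WHERE NOT EXISTS (
    SELECT S.TagId, S.Value FROM @Search S
    EXCEPT
    SELECT T.TagId, T.Value FROM @DocTags T WHERE T.DocId = D.DocId
);

-- EXISTS form: keeps a document when some searched pair is absent from all OTHER documents.
-- Returns no rows here, because documents 1 and 3 cover each other's pairs.
SELECT D.DocId
FROM @Docs D
WHERE EXISTS (
    SELECT S.TagId, S.Value FROM @Search S
    EXCEPT
    SELECT T.TagId, T.Value FROM @DocTags T WHERE T.DocId <> D.DocId
);
```

On this data the NOT EXISTS form looks like the safer choice whenever more than one document can match.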
Demo on SQLFiddle
Alternatively, you can parse the XML directly with the XQuery methods nodes() and value(), using either a CTE or a subquery.
-- Set up the parameters
DECLARE @TagXml XML
SET @TagXml = '<tags>
<tag>
<description>Tag Name 1</description>
<value>Value 1</value>
</tag>
<tag>
<description>Tag Name 2</description>
<value>Value 2</value>
</tag>
</tags>'
;WITH cte AS
(
SELECT TagValue.value('(./value)[1]', 'nvarchar(100)') AS value,
TagValue.value('(./description)[1]', 'nvarchar(100)') AS [description]
FROM @TagXml.nodes('/tags/tag') AS T(TagValue)
)
SELECT *
FROM Documents D
WHERE NOT EXISTS (
SELECT T.TagId, c.value
FROM cte c JOIN Tags T WITH(FORCESEEK)
ON c.[description] = T.TagName
EXCEPT
SELECT T.TagId, T.VALUE
FROM DocumentTags T WITH(FORCESEEK)
WHERE T.DocId = D.DocId
)
Demo on SQLFiddle
Or, with the XML parsing inlined as a subquery instead of a CTE:
-- Set up the parameters
DECLARE @TagXml XML
SET @TagXml = '<tags>
<tag>
<description>Tag Name 1</description>
<value>Value 1</value>
</tag>
<tag>
<description>Tag Name 2</description>
<value>Value 2</value>
</tag>
</tags>'
SELECT *
FROM Documents D
WHERE NOT EXISTS (
SELECT T2.TagId,
TagValue.value('(./value)[1]', 'nvarchar(100)') AS value
FROM @TagXml.nodes('/tags/tag') AS T(TagValue)
JOIN Tags T2 WITH(FORCESEEK)
ON TagValue.value('(./description)[1]', 'nvarchar(100)') = T2.TagName
EXCEPT
SELECT T.TagId, T.VALUE
FROM DocumentTags T WITH(FORCESEEK)
WHERE T.DocId = D.DocId
)
Demo on SQLFiddle
To improve performance, create indexes that allow an index seek on the Tags and DocumentTags tables. The FORCESEEK table hint in the queries above forces a seek, and the query may fail with an error if no suitable index exists, so create these first:
CREATE INDEX x ON Documents(DocId) INCLUDE(DocText)
CREATE INDEX x ON Tags(TagName) INCLUDE(TagId)
CREATE INDEX x ON DocumentTags(DocId) INCLUDE(TagID, VALUE)
I am not sure of the exact syntax for SQL Server, but something like this should work:
SELECT D.DocId
FROM Documents D
INNER JOIN DocumentTags T ON T.DocId = D.DocId
INNER JOIN @XmlTagData X ON X.id = T.TagId AND X.value = T.Value
GROUP BY D.DocId
HAVING COUNT(*) = 2 -- total number of tags being searched
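The hard-coded count can be replaced by a count taken from the search table itself. The following is a minimal sketch of that idea, not a tested drop-in: @Docs, @DocTags and @Search are hypothetical table variables standing in for Documents, DocumentTags and @XmlTagData, and it assumes neither the search list nor the document tags contain duplicate (tag, value) rows.

```sql
-- Hypothetical stand-ins for the question's tables, so the script runs on its own.
DECLARE @Docs TABLE (DocId INT, DocText VARCHAR(500));
DECLARE @DocTags TABLE (DocId INT, TagId INT, Value VARCHAR(50));
DECLARE @Search TABLE (TagId INT, Value VARCHAR(50));

INSERT INTO @Docs VALUES (1, 'Document 1 Text'), (2, 'Document 2 Text');
INSERT INTO @DocTags VALUES (1, 1, 'Value 1'), (1, 2, 'Value 2'), (2, 1, 'Value 1');
INSERT INTO @Search VALUES (1, 'Value 1'), (2, 'Value 2');

-- Count the matched pairs per document and keep only the documents
-- whose match count equals the number of searched pairs.
SELECT D.DocId, D.DocText
FROM @Docs D
INNER JOIN @DocTags T ON T.DocId = D.DocId
INNER JOIN @Search S ON S.TagId = T.TagId AND S.Value = T.Value
GROUP BY D.DocId, D.DocText
HAVING COUNT(*) = (SELECT COUNT(*) FROM @Search);
-- Returns document 1 only: document 2 matches one of the two pairs.
```

If duplicates are possible in either table, the counts no longer line up and the rows would need to be de-duplicated first.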
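Add a WHERE clause with a NOT EXISTS check that rejects any document missing a tag. Note that the check runs against every row in Tags, not only the searched tags, so it matches the intent only when the search covers all tags: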
SELECT DISTINCT D.*
FROM Documents D
INNER JOIN DocumentTags T ON T.DocId = D.DocId
INNER JOIN @XmlTagData X ON X.id = T.TagId AND X.value = T.Value
WHERE NOT EXISTS (SELECT 1 FROM Documents dt2
CROSS JOIN Tags t2
LEFT JOIN DocumentTags dt3
ON t2.TagId = dt3.TagId
AND dt2.DocId = dt3.DocId
WHERE dt3.DocTagId IS NULL
AND dt2.DocId = D.DocId)
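A variant of the same NOT EXISTS idea that only considers the searched tags could look like the sketch below. It uses a nested NOT EXISTS ("there is no searched pair that the document lacks"). The table variables @Docs, @DocTags and @Search are hypothetical stand-ins for Documents, DocumentTags and @XmlTagData.

```sql
-- Hypothetical stand-ins for the question's tables, so the script runs on its own.
DECLARE @Docs TABLE (DocId INT, DocText VARCHAR(500));
DECLARE @DocTags TABLE (DocId INT, TagId INT, Value VARCHAR(50));
DECLARE @Search TABLE (TagId INT, Value VARCHAR(50));

INSERT INTO @Docs VALUES (1, 'Document 1 Text'), (2, 'Document 2 Text');
INSERT INTO @DocTags VALUES (1, 1, 'Value 1'), (1, 2, 'Value 2'), (2, 1, 'Value 1');
INSERT INTO @Search VALUES (1, 'Value 1'), (2, 'Value 2');

SELECT D.DocId, D.DocText
FROM @Docs D
WHERE NOT EXISTS (
    -- a searched pair ...
    SELECT 1
    FROM @Search S
    WHERE NOT EXISTS (
        -- ... that this document does not have
        SELECT 1
        FROM @DocTags T
        WHERE T.DocId = D.DocId
          AND T.TagId = S.TagId
          AND T.Value = S.Value
    )
);
-- Returns document 1 only: document 2 lacks the pair (2, 'Value 2').
```

One design consequence to be aware of: with an empty search list, every document satisfies the condition and is returned.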