Selecting from Table A where it joins to all data in Table B

I have a table of documents, and a table of tags. The documents are tagged with various values.

I am attempting to create a search of these tags, and for the most part it is working. However, I am getting extra results returned when it matches any tag. I only want results where it matches all tags.

I have created this to illustrate the problem http://sqlfiddle.com/#!3/8b98e/11

Tables and Data:

CREATE TABLE Documents
(
 DocId INT,
 DocText VARCHAR(500)
);

CREATE TABLE Tags
(
  TagId INT,
  TagName VARCHAR(50)
);

CREATE TABLE DocumentTags
(
  DocTagId INT,
  DocId INT,
  TagId INT,
  Value VARCHAR(50)
);

INSERT INTO Documents VALUES (1, 'Document 1 Text');
INSERT INTO Documents VALUES (2, 'Document 2 Text');

INSERT INTO Tags VALUES (1, 'Tag Name 1');
INSERT INTO Tags VALUES (2, 'Tag Name 2');

INSERT INTO DocumentTags VALUES (1, 1, 1, 'Value 1');
INSERT INTO DocumentTags VALUES (1, 1, 2, 'Value 2');
INSERT INTO DocumentTags VALUES (1, 2, 1, 'Value 1');

Code:

-- Set up the parameters
DECLARE @TagXml VARCHAR(max)
SET @TagXml = '<tags>
                  <tag>
                    <description>Tag Name 1</description>
                    <value>Value 1</value>
                  </tag>
                  <tag>
                    <description>Tag Name 2</description>
                    <value>Value 2</value>
                  </tag>
                </tags>'

-- Create a table to store the parsed xml in
DECLARE @XmlTagData TABLE 
(
    id varchar(20)
    ,[description] varchar(100)
    ,value varchar(250)
)

-- Populate our XML table
DECLARE @iTag int
EXEC sp_xml_preparedocument @iTag OUTPUT, @TagXml
-- Execute a SELECT statement that uses the OPENXML rowset provider
-- to produce a table from our xml structure and insert it into our temp table
INSERT INTO @XmlTagData (id, [description], value)
SELECT  id, [description], value
FROM    OPENXML (@iTag, '/tags/tag',1)
        WITH (id varchar(20),
                [description] varchar(100) 'description',
                value varchar(250) 'value')

EXECUTE sp_xml_removedocument @iTag

-- Update the XML table ids to match existing tag ids
UPDATE      @XmlTagData
SET         X.Id = T.TagId
FROM        @XmlTagData X
INNER JOIN  Tags T ON X.[description] = T.TagName

-- Check it looks right
--SELECT * 
--FROM @XmlTagData

-- This is where things do not quite work. I get both doc 1 & 2 back, 
-- but what I want is just document 1.
-- i.e. documents that have both tags with matching values
SELECT DISTINCT D.*
FROM Documents D
INNER JOIN DocumentTags T ON T.DocId = D.DocId
INNER JOIN @XmlTagData X ON X.id = T.TagId AND X.value = T.Value

(Note I am not a DBA, so there may be better ways of doing things. Hopefully I am on the right track, but I am open to other suggestions if my implementation can be improved.)

Can anyone offer any suggestions on how to get only results that have all tags?

Many thanks.

Use [NOT] EXISTS together with the EXCEPT operator in the last query:

SELECT *
FROM Documents D
WHERE NOT EXISTS (
                  SELECT X.ID , X.Value
                  FROM @XmlTagData X 
                  EXCEPT
                  SELECT T.TagId, T.VALUE
                  FROM DocumentTags T
                  WHERE T.DocId = D.DocId
                  )
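
To see why this pattern works, it can help to list which searched (TagId, Value) pairs each document is missing. The EXCEPT subquery computes exactly that set per document, and NOT EXISTS keeps only the documents for which it is empty. The following is a minimal, self-contained sketch on the question's sample data; the table variables @DocTags and @Wanted are hypothetical stand-ins for DocumentTags and the parsed @XmlTagData.

```sql
-- Hypothetical stand-ins for DocumentTags and the parsed search tags.
DECLARE @DocTags TABLE (DocId INT, TagId INT, Value VARCHAR(50));
INSERT INTO @DocTags VALUES (1, 1, 'Value 1');
INSERT INTO @DocTags VALUES (1, 2, 'Value 2');
INSERT INTO @DocTags VALUES (2, 1, 'Value 1');

DECLARE @Wanted TABLE (TagId INT, Value VARCHAR(50));
INSERT INTO @Wanted VALUES (1, 'Value 1');
INSERT INTO @Wanted VALUES (2, 'Value 2');

-- For every document, list the searched pairs it does NOT have.
-- A document with no rows here matches all searched tags.
SELECT D.DocId, W.TagId, W.Value
FROM (SELECT DISTINCT DocId FROM @DocTags) D
CROSS JOIN @Wanted W
WHERE NOT EXISTS (
                  SELECT 1
                  FROM @DocTags T
                  WHERE T.DocId = D.DocId
                    AND T.TagId = W.TagId
                    AND T.Value = W.Value
                  );
```

On this data the only row returned is document 2 with the pair (2, 'Value 2'), which is why document 2 is filtered out and document 1 is kept.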

Demo on SQLFiddle

OR

SELECT *
FROM Documents D
WHERE EXISTS (
              SELECT X.ID , X.Value
              FROM @XmlTagData X 
              EXCEPT
              SELECT T.TagId, T.VALUE
              FROM DocumentTags T
              WHERE T.DocId != D.DocId
              )   
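
One caveat about the EXISTS variant with T.DocId != D.DocId: it compares the searched tags against the tags of all other documents, so it appears to depend on the sample data. A hedged sketch of a case where the two variants disagree, assuming a third document that also carries both tags (the table variables @Docs, @DocTags and @Wanted are hypothetical stand-ins for the real tables):

```sql
-- Hypothetical data: documents 1 and 3 carry both searched tags, document 2 only one.
DECLARE @Docs TABLE (DocId INT);
INSERT INTO @Docs VALUES (1);
INSERT INTO @Docs VALUES (2);
INSERT INTO @Docs VALUES (3);

DECLARE @DocTags TABLE (DocId INT, TagId INT, Value VARCHAR(50));
INSERT INTO @DocTags VALUES (1, 1, 'Value 1');
INSERT INTO @DocTags VALUES (1, 2, 'Value 2');
INSERT INTO @DocTags VALUES (2, 1, 'Value 1');
INSERT INTO @DocTags VALUES (3, 1, 'Value 1');
INSERT INTO @DocTags VALUES (3, 2, 'Value 2');

DECLARE @Wanted TABLE (TagId INT, Value VARCHAR(50));
INSERT INTO @Wanted VALUES (1, 'Value 1');
INSERT INTO @Wanted VALUES (2, 'Value 2');

-- NOT EXISTS with T.DocId = D.DocId: returns documents 1 and 3.
SELECT D.DocId
FROM @Docs D
WHERE NOT EXISTS (
                  SELECT W.TagId, W.Value FROM @Wanted W
                  EXCEPT
                  SELECT T.TagId, T.Value FROM @DocTags T
                  WHERE T.DocId = D.DocId
                  );

-- EXISTS with T.DocId != D.DocId: returns no rows here, because for every
-- document the remaining documents together already cover both searched tags.
SELECT D.DocId
FROM @Docs D
WHERE EXISTS (
              SELECT W.TagId, W.Value FROM @Wanted W
              EXCEPT
              SELECT T.TagId, T.Value FROM @DocTags T
              WHERE T.DocId != D.DocId
              );
```

If more than one document can match all tags, the NOT EXISTS form is probably the safer choice.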

Demo on SQLFiddle

OR

You can also use a simple solution with the XQuery methods nodes() and value() and a CTE or subquery.

-- Set up the parameters
DECLARE @TagXml XML
SET @TagXml = '<tags>
                  <tag>
                    <description>Tag Name 1</description>
                    <value>Value 1</value>
                  </tag>
                  <tag>
                    <description>Tag Name 2</description>
                    <value>Value 2</value>
                  </tag>              
                </tags>'               


;WITH cte AS
 (
  SELECT TagValue.value('(./value)[1]', 'nvarchar(100)') AS value,
         TagValue.value('(./description)[1]', 'nvarchar(100)') AS [description]       
  FROM @TagXml.nodes('/tags/tag') AS T(TagValue)
  )
  SELECT *
  FROM Documents D
  WHERE NOT EXISTS (
                    SELECT T.TagId, c.value
                    FROM cte c JOIN Tags T WITH(FORCESEEK) 
                      ON c.[description] = T.TagName
                    EXCEPT
                    SELECT T.TagId, T.VALUE
                    FROM DocumentTags T WITH(FORCESEEK)
                    WHERE T.DocId = D.DocId                          
                    )

Demo on SQLFiddle

OR

-- Set up the parameters
DECLARE @TagXml XML
SET @TagXml = '<tags>
                  <tag>
                    <description>Tag Name 1</description>
                    <value>Value 1</value>
                  </tag>
                  <tag>
                    <description>Tag Name 2</description>
                    <value>Value 2</value>
                  </tag>              
                </tags>'      

  SELECT *
  FROM Documents D
  WHERE NOT EXISTS (
                    SELECT T2.TagId,
                           TagValue.value('(./value)[1]', 'nvarchar(100)') AS value                           
                    FROM @TagXml.nodes('/tags/tag') AS T(TagValue)
                      JOIN Tags T2 WITH(FORCESEEK)
                        ON TagValue.value('(./description)[1]', 'nvarchar(100)') = T2.TagName                                        
                    EXCEPT
                    SELECT T.TagId, T.VALUE
                    FROM DocumentTags T WITH(FORCESEEK)
                    WHERE T.DocId = D.DocId                       
                    )

Demo on SQLFiddle

To improve performance by forcing index seeks on the Tags and DocumentTags tables, create the indexes below and use table hints (the FORCESEEK hint is already added to the queries above):

CREATE INDEX x ON Documents(DocId) INCLUDE(DocText)
CREATE INDEX x ON Tags(TagName) INCLUDE(TagId)
CREATE INDEX x ON DocumentTags(DocId) INCLUDE(TagID, VALUE)

I am not really sure of the syntax for SQL Server, but I guess something like this should work:

SELECT d.docId
FROM Documents D
INNER JOIN DocumentTags T ON T.DocId = D.DocId
INNER JOIN @XmlTagData X ON X.id = T.TagId AND X.value = T.Value
GROUP BY D.DocId
HAVING COUNT(*) = 2 -- [total number of tags being searched]
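
The hard-coded tag count can be derived from the search table instead. A minimal, self-contained sketch of that idea on the question's sample data, assuming each tag appears at most once in the search; the table variables @Docs, @DocTags and @Wanted are hypothetical stand-ins for Documents, DocumentTags and the parsed @XmlTagData:

```sql
-- Hypothetical stand-ins for the question's tables.
DECLARE @Docs TABLE (DocId INT, DocText VARCHAR(500));
INSERT INTO @Docs VALUES (1, 'Document 1 Text');
INSERT INTO @Docs VALUES (2, 'Document 2 Text');

DECLARE @DocTags TABLE (DocId INT, TagId INT, Value VARCHAR(50));
INSERT INTO @DocTags VALUES (1, 1, 'Value 1');
INSERT INTO @DocTags VALUES (1, 2, 'Value 2');
INSERT INTO @DocTags VALUES (2, 1, 'Value 1');

DECLARE @Wanted TABLE (TagId INT, Value VARCHAR(50));
INSERT INTO @Wanted VALUES (1, 'Value 1');
INSERT INTO @Wanted VALUES (2, 'Value 2');

-- Keep documents whose number of matched tags equals the number of searched tags.
-- COUNT(DISTINCT ...) guards against duplicate rows in the document tag table.
SELECT D.DocId, D.DocText
FROM @Docs D
INNER JOIN @DocTags T ON T.DocId = D.DocId
INNER JOIN @Wanted W ON W.TagId = T.TagId AND W.Value = T.Value
GROUP BY D.DocId, D.DocText
HAVING COUNT(DISTINCT T.TagId) = (SELECT COUNT(*) FROM @Wanted);
```

On the sample data this returns only document 1, and the query does not need editing when the number of searched tags changes.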

Add a WHERE clause with a NOT EXISTS condition:

SELECT DISTINCT D.*
FROM Documents D
INNER JOIN DocumentTags T ON T.DocId = D.DocId
INNER JOIN @XmlTagData X ON X.id = T.TagId AND X.value = T.Value
WHERE NOT EXISTS (SELECT 1 FROM Documents dt2
                  CROSS JOIN Tags t2 
                  LEFT JOIN DocumentTags dt3 
                  ON t2.TagId = dt3.TagId
                  AND dt2.DocId = dt3.DocId
                  WHERE dt3.DocTagId IS NULL
                  AND dt2.DocId = D.DocId)

SQL Fiddle.
