SQL Server query by column pair

Question

I'm working on a product filter (faceted search) like Amazon's. I have a table with properties (color, RAM, screen) like this:

ArticleID  PropertyID  Value
---------  ----------  ------------
1          1           Black
1          2           8 GB
1          3           15"
2          1           White
2          2           8 GB
3          3           13"

I have to select articles depending on what properties are selected. You can select multiple values for one property (for example RAM: 4 GB and 8 GB) and you can select multiple properties (for example RAM and screen size).

I need functionality like this:

SELECT ArticleID
FROM ArticlesProperties
WHERE (PropertyID = 2 AND Value IN ('4 GB', '8 GB'))
  AND (PropertyID = 3 AND Value IN ('13"'))

I used to do that by creating a dynamic query and then executing that query:

SELECT ArticleID
FROM ArticlesProperties
WHERE PropertyID = 2 AND Value IN ('4 GB', '8 GB')

INTERSECT

SELECT ArticleID
FROM ArticlesProperties
WHERE PropertyID = 3 AND Value IN ('13"')

But I don't think this is a good way; there must be a better solution. There are millions of properties in the table, so optimization is necessary.

A solution should work on SQL Server 2014 Standard Edition without add-ons or external search engines such as Solr.

I am in a pickle so if someone has some idea or solution, I would really appreciate it. Thanks!

Answer 1

INTERSECT is likely to work very well.

An alternative approach is to construct a WHERE clause and use aggregation with HAVING:

SELECT ArticleID
FROM ArticlesProperties
WHERE ( PropertyID = 2 AND Value IN ('4 GB', '8 GB') ) OR
      ( PropertyID = 3 AND Value IN ('13"') )
GROUP BY ArticleId
HAVING COUNT(DISTINCT PropertyId) = 2;

However, the INTERSECT method might make better use of an index on ArticlesProperties(PropertyId, Value), so try that first to see what performance an alternative would have to beat.
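
To check that the two query shapes return the same articles, here is a minimal sketch run against the sample rows from the question. It rests on several assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server (both support INTERSECT and GROUP BY ... HAVING), and the helper names make_db, by_intersect and by_having are hypothetical. It says nothing about SQL Server performance; for that, compare the actual execution plans.

```python
import sqlite3

# Sample rows from the question: (ArticleID, PropertyID, Value).
ROWS = [
    (1, 1, "Black"), (1, 2, "8 GB"), (1, 3, '15"'),
    (2, 1, "White"), (2, 2, "8 GB"),
    (3, 3, '13"'),
]


def make_db():
    """Build an in-memory SQLite stand-in for the ArticlesProperties table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ArticlesProperties ("
        " ArticleID INTEGER NOT NULL,"
        " PropertyID INTEGER NOT NULL,"
        " Value TEXT NOT NULL)"
    )
    # Index shape suggested in the answer: (PropertyID, Value).
    conn.execute(
        "CREATE INDEX IX_PropertyValue ON ArticlesProperties (PropertyID, Value)"
    )
    conn.executemany("INSERT INTO ArticlesProperties VALUES (?, ?, ?)", ROWS)
    return conn


def _branch(values):
    """One 'PropertyID = ? AND Value IN (?, ...)' predicate with placeholders."""
    marks = ", ".join("?" for _ in values)
    return f"PropertyID = ? AND Value IN ({marks})"


def _params(filters):
    """Flatten {property_id: [values]} in the same order as the placeholders."""
    params = []
    for prop_id, values in filters.items():
        params.append(prop_id)
        params.extend(values)
    return params


def by_intersect(conn, filters):
    """One SELECT per property, combined with INTERSECT.

    filters must be non-empty, and so must every value list.
    """
    selects = [
        "SELECT DISTINCT ArticleID FROM ArticlesProperties WHERE " + _branch(v)
        for v in filters.values()
    ]
    sql = " INTERSECT ".join(selects) + " ORDER BY ArticleID"
    return [row[0] for row in conn.execute(sql, _params(filters))]


def by_having(conn, filters):
    """OR the predicates, then keep articles that matched every property."""
    where = " OR ".join("(" + _branch(v) + ")" for v in filters.values())
    sql = (
        "SELECT ArticleID FROM ArticlesProperties WHERE " + where
        + " GROUP BY ArticleID"
        + " HAVING COUNT(DISTINCT PropertyID) = ?"
        + " ORDER BY ArticleID"
    )
    return [row[0] for row in conn.execute(sql, _params(filters) + [len(filters)])]


if __name__ == "__main__":
    conn = make_db()
    wanted = {1: ["Black", "White"], 2: ["8 GB"]}
    print(by_intersect(conn, wanted), by_having(conn, wanted))
```

Note that with the question's six sample rows, the question's own filter (RAM of 4 GB or 8 GB, and a 13" screen) matches no article, because article 3 has no RAM row.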

Answer 2

I made a snippet showing the lines along which I would work. Good choice of indices is important to speed up queries. Always check the execution plan for tweaking of indices.

Notes:

The script uses temporary tables, but in essence they are no different from regular tables. Except for #select_properties, the temporary tables should become regular tables if you plan to work as outlined in the script. Note that SQL Server does not enforce FOREIGN KEY constraints on temporary tables, so the foreign keys in the script only take effect once the tables are regular tables.
Store the article properties with IDs for property choice values instead of the actual choice values. This saves disk space, and memory when these tables are cached by SQL Server. SQL Server will cache tables in memory as much as it can to service SELECT statements faster.
If the article properties table is too big, SQL Server may have to do disk I/O to execute the SELECT statement, and that will slow the statement down.
An added benefit is that lookups compare IDs (integers) rather than text (VARCHARs). Integer lookups are generally faster than string lookups.
Provide suitable indices on tables to speed up queries. To that end, it is good practice to analyze queries by inspecting the Actual Execution Plan.
I've included several such indices in the snippet below. Depending on the number of rows in the article properties table and on statistics, SQL Server will choose the index it estimates to be best for the query.
If SQL Server thinks a statement is missing a proper index, the actual execution plan will indicate the missing index. When your queries become slow, analyze them by inspecting the actual execution plan in SQL Server Management Studio.
The snippet uses a temporary table, #select_properties, to specify which properties you are looking for. Supply the criteria by inserting the property IDs and property choice value IDs into that table. The final selection query selects articles where at least one of the property choice values applies for each property.
You would create this temporary table in the session in which you want to select articles. Then insert the search criteria, run the SELECT statement, and finally drop the temporary table.

CREATE TABLE #articles(
    article_id INT NOT NULL,
    article_desc VARCHAR(128) NOT NULL,
    CONSTRAINT PK_articles PRIMARY KEY CLUSTERED(article_id)
);

CREATE TABLE #properties(
    property_id INT NOT NULL, -- color, size, capacity
    property_desc VARCHAR(128) NOT NULL,
    CONSTRAINT PK_properties PRIMARY KEY CLUSTERED(property_id)
);

CREATE TABLE #property_values(
    property_id INT NOT NULL,
    property_choice_id INT NOT NULL, -- eg color -> black, white, red
    property_choice_val VARCHAR(128) NOT NULL,
    CONSTRAINT PK_property_values PRIMARY KEY CLUSTERED(property_id,property_choice_id),
    CONSTRAINT FK_values_to_properties FOREIGN KEY (property_id) REFERENCES #properties(property_id)
);

CREATE TABLE #article_properties(
    article_id INT NOT NULL,
    property_id INT NOT NULL,
    property_choice_id INT NOT NULL,
    CONSTRAINT PK_article_properties PRIMARY KEY CLUSTERED(article_id,property_id,property_choice_id),
    CONSTRAINT FK_ap_to_articles FOREIGN KEY (article_id) REFERENCES #articles(article_id),
    CONSTRAINT FK_ap_to_property_values FOREIGN KEY (property_id,property_choice_id) REFERENCES #property_values(property_id,property_choice_id)
);
CREATE NONCLUSTERED INDEX IX_article_properties ON #article_properties(property_id,property_choice_id) INCLUDE(article_id);

INSERT INTO #properties(property_id,property_desc)VALUES
    (1,'color'),(2,'capacity'),(3,'size');

INSERT INTO #property_values(property_id,property_choice_id,property_choice_val)VALUES
    (1,1,'black'),(1,2,'white'),(1,3,'red'),
    (2,1,'4 Gb') ,(2,2,'8 Gb') ,(2,3,'16 Gb'),
    (3,1,'13"')  ,(3,2,'15"')  ,(3,3,'17"');

INSERT INTO #articles(article_id,article_desc)VALUES
    (1,'First article'),(2,'Second article'),(3,'Third article');

-- the table you have in your question, slightly modified
INSERT INTO #article_properties(article_id,property_id,property_choice_id)VALUES 
    (1,1,1),(1,2,2),(1,3,2), -- article 1: color=black, capacity=8gb, size=15"
    (2,1,2),(2,2,2),(2,3,1), -- article 2: color=white, capacity=8Gb, size=13"
    (3,1,3),        (3,3,3); -- article 3: color=red, size=17"

-- The table with the criteria you are selecting on
CREATE TABLE #select_properties(
    property_id INT NOT NULL,
    property_choice_id INT NOT NULL,
    CONSTRAINT PK_select_properties PRIMARY KEY CLUSTERED(property_id,property_choice_id)
);
INSERT INTO #select_properties(property_id,property_choice_id)VALUES
    (2,1),(2,2),(3,1); -- looking for '4Gb' or '8Gb', and size 13"

;WITH aid AS (  
    SELECT ap.article_id
    FROM #select_properties AS sp
         INNER JOIN #article_properties AS ap ON
            ap.property_id=sp.property_id AND
            ap.property_choice_id=sp.property_choice_id
    GROUP BY ap.article_id
    HAVING COUNT(DISTINCT ap.property_id)=(SELECT COUNT(DISTINCT property_id) FROM #select_properties)
    -- criteria met when article has a number of properties matching, equal to the distinct number of properties in the selection set
)
SELECT a.article_id,a.article_desc
FROM aid 
     INNER JOIN #articles AS a ON 
         a.article_id=aid.article_id
ORDER BY a.article_id;
-- result is the 'Second article' with id 2

DROP TABLE #select_properties;
DROP TABLE #article_properties;
DROP TABLE #property_values;
DROP TABLE #properties;
DROP TABLE #articles;
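
The create, insert, select, drop lifecycle of the criteria table described in the notes can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module instead of SQL Server, the table names drop the # prefix (SQLite uses CREATE TEMP TABLE instead), and the helper names make_db and select_articles are hypothetical. The rows are the ones from the script above.

```python
import sqlite3

# Rows from the script above: (article_id, property_id, property_choice_id).
ARTICLE_PROPERTIES = [
    (1, 1, 1), (1, 2, 2), (1, 3, 2),  # article 1: black, 8 Gb, 15"
    (2, 1, 2), (2, 2, 2), (2, 3, 1),  # article 2: white, 8 Gb, 13"
    (3, 1, 3), (3, 3, 3),             # article 3: red, 17"
]


def make_db():
    """In-memory SQLite stand-in for the article properties table."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # temp table can be created and dropped without transaction bookkeeping.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE article_properties ("
        " article_id INTEGER NOT NULL,"
        " property_id INTEGER NOT NULL,"
        " property_choice_id INTEGER NOT NULL,"
        " PRIMARY KEY (article_id, property_id, property_choice_id))"
    )
    # SQLite has no INCLUDE clause, so article_id is a trailing key column.
    conn.execute(
        "CREATE INDEX IX_article_properties"
        " ON article_properties (property_id, property_choice_id, article_id)"
    )
    conn.executemany(
        "INSERT INTO article_properties VALUES (?, ?, ?)", ARTICLE_PROPERTIES
    )
    return conn


def select_articles(conn, criteria):
    """Return article ids matching at least one choice for every property.

    criteria is a list of unique (property_id, property_choice_id) pairs.
    """
    conn.execute(
        "CREATE TEMP TABLE select_properties ("
        " property_id INTEGER NOT NULL,"
        " property_choice_id INTEGER NOT NULL,"
        " PRIMARY KEY (property_id, property_choice_id))"
    )
    try:
        conn.executemany("INSERT INTO select_properties VALUES (?, ?)", criteria)
        rows = conn.execute(
            """
            SELECT ap.article_id
            FROM select_properties AS sp
                 INNER JOIN article_properties AS ap
                    ON ap.property_id = sp.property_id
                   AND ap.property_choice_id = sp.property_choice_id
            GROUP BY ap.article_id
            HAVING COUNT(DISTINCT ap.property_id) =
                   (SELECT COUNT(DISTINCT property_id) FROM select_properties)
            ORDER BY ap.article_id
            """
        ).fetchall()
    finally:
        # Always drop the criteria table so the next search starts clean.
        conn.execute("DROP TABLE select_properties")
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = make_db()
    # '4 Gb' or '8 Gb', and size 13"
    print(select_articles(conn, [(2, 1), (2, 2), (3, 1)]))
```

Passing the criteria as data rather than building the statement text keeps the select statement identical for every search, whatever the number of properties or values.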

Answer 3

XML parameter

Your procedure takes an XML parameter, @criteria XML.

A couple of things I used to debug:

drop table #properties
drop table #criteria

create table #properties (propertyId int)
insert into #properties values (1), (2) --presuming that you have a list of all the possible properties somewhere

-- This would be passed in by the application
declare @criteria XML = '<criteria>
<property id="1">
    <item value="8 GB" />
    <item value="4 GB" />
</property>
<property id="2">
    <item value="13 in" /> 
    <item value="4 in" />
</property>
</criteria>'

--encode the '"' and replace 'in' as needed

Code you need starts here:

create table #criteria 
(propertyId int, searchvalue nvarchar(20))


insert into #criteria (propertyId, searchvalue)
select
    cc.propertyId,
    c.value('@value','nvarchar(20)')
from #properties cc
cross apply @criteria.nodes(N'/criteria/property[@id=sql:column("cc.propertyId")]/item') t(c)

-- an article qualifies when it matches at least one value for every property in #criteria
SELECT ap.ArticleID, count(1)
FROM ArticlesProperties ap
join #criteria cc on cc.propertyId = ap.propertyId and cc.searchvalue = ap.value
group by ap.ArticleID
having count(distinct ap.propertyId) = (select count(distinct propertyId) from #criteria)
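
To make the shape of #criteria concrete, the sketch below shreds the same XML into (propertyId, searchvalue) rows. It is an illustration only, written in Python with the standard xml.etree.ElementTree module instead of SQL Server's nodes() and value() methods, and the function name shred_criteria is hypothetical. The known_property_ids argument plays the role of the #properties table.

```python
import xml.etree.ElementTree as ET

# The criteria document from the answer above.
CRITERIA_XML = """<criteria>
<property id="1">
    <item value="8 GB" />
    <item value="4 GB" />
</property>
<property id="2">
    <item value="13 in" />
    <item value="4 in" />
</property>
</criteria>"""


def shred_criteria(xml_text, known_property_ids):
    """Return (property_id, search_value) rows, one per <item> element.

    Properties whose id is not in known_property_ids are skipped, which
    mirrors joining against the list of known properties.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for prop in root.findall("property"):
        prop_id = int(prop.get("id"))
        if prop_id not in known_property_ids:
            continue
        for item in prop.findall("item"):
            rows.append((prop_id, item.get("value")))
    return rows


if __name__ == "__main__":
    for row in shred_criteria(CRITERIA_XML, {1, 2}):
        print(row)
```

Each resulting row corresponds to one (propertyId, searchvalue) pair that the final query joins against ArticlesProperties.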

Answer 4

I'm assuming (ArticleID, PropertyID) is a key.

This looks like an entity-attribute-value (EAV) table or an "open schema" design, so there's essentially no good way to query anything. You might even consider setting up dynamic PIVOTs, but that's rather complex.

One method for this is EXISTS expressions:

SELECT DISTINCT ArticleID
FROM ArticlesProperties ap
WHERE EXISTS (SELECT 1 FROM ArticlesProperties 
        WHERE ArticleID = ap.ArticleID AND PropertyID = 2 AND Value IN ('4 GB', '8 GB'))
    AND EXISTS (SELECT 1 FROM ArticlesProperties 
        WHERE ArticleID = ap.ArticleID AND PropertyID = 3 AND Value IN ('13"'));

Or you might try OR combined with COUNT() and HAVING. This relies on (ArticleID, PropertyID) being a key, so that each matching property is counted once:

SELECT ArticleID
FROM ArticlesProperties
WHERE (PropertyID = 2 AND Value IN ('4 GB', '8 GB'))
    OR (PropertyID = 3 AND Value IN ('13"'))
GROUP BY ArticleID
HAVING COUNT(PropertyID) = 2;

4 answers

solution1
1 2016-02-29 21:42:14

solution2
1 ACCEPTED 2016-02-29 22:44:25

solution3
0 2016-02-29 21:43:06

solution4
0 2016-02-29 21:43:52
