SQL Server query by column pair

I'm working on a product filter (faceted search) like Amazon's. I have a table of properties (color, RAM, screen size) like this:

ArticleID  PropertyID  Value
---------  ----------  ------------
1          1           Black
1          2           8 GB
1          3           15"
2          1           White
2          2           8 GB
3          3           13"

I have to select articles depending on which properties are selected. You can select multiple values for one property (for example RAM: 4 GB and 8 GB), and you can select multiple properties (for example RAM and screen size).

I need functionality like this:

SELECT ArticleID
FROM ArticlesProperties
WHERE (PropertyID = 2 AND Value IN ('4 GB', '8 GB'))
  AND (PropertyID = 3 AND Value IN ('13"'))

I used to do that by building a dynamic query and then executing it:

SELECT ArticleID
FROM ArticlesProperties
WHERE PropertyID = 2 AND Value IN ('4 GB', '8 GB')

INTERSECT

SELECT ArticleID
FROM ArticlesProperties
WHERE PropertyID = 3 AND Value IN ('13"')

But I don't think this is a good way; there must be a better solution. There are millions of property rows in the table, so optimization is necessary.

A solution should work on SQL Server 2014 Standard Edition without add-ons or external search engines such as Solr.

I am in a pickle, so if someone has an idea or a solution, I would really appreciate it. Thanks!

INTERSECT is likely to work very well.

An alternative approach is to build a single WHERE clause and use aggregation with HAVING:

SELECT ArticleID
FROM ArticlesProperties
WHERE ( PropertyID = 2 AND Value IN ('4 GB', '8 GB') ) OR
      ( PropertyID = 3 AND Value IN ('13"') )
GROUP BY ArticleId
HAVING COUNT(DISTINCT PropertyId) = 2;

However, the INTERSECT method might make better use of an index on ArticlesProperties(PropertyId, Value), so try that first to see what performance an alternative would have to beat.
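To make that comparison concrete, here is a minimal sketch of the index and of one way to measure both queries. It assumes the ArticlesProperties table from the question already exists and that Value is a bounded type such as VARCHAR(50); the index name is made up for illustration. SET STATISTICS IO and SET STATISTICS TIME print logical reads and CPU/elapsed time to the Messages tab, which gives a rough baseline for each variant.

```sql
-- Assumption: ArticlesProperties(ArticleID, PropertyID, Value) exists as in the question.
-- The index name is illustrative. INCLUDE(ArticleID) lets both queries be answered
-- from the index alone (it is redundant but harmless if ArticleID is already part
-- of the clustered key).
CREATE NONCLUSTERED INDEX IX_ArticlesProperties_PropertyID_Value
    ON ArticlesProperties (PropertyID, Value)
    INCLUDE (ArticleID);

-- Report logical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Baseline: INTERSECT, one branch per selected property.
SELECT ArticleID
FROM ArticlesProperties
WHERE PropertyID = 2 AND Value IN ('4 GB', '8 GB')
INTERSECT
SELECT ArticleID
FROM ArticlesProperties
WHERE PropertyID = 3 AND Value IN ('13"');

-- Alternative: one pass with OR, then keep articles that matched both properties.
SELECT ArticleID
FROM ArticlesProperties
WHERE (PropertyID = 2 AND Value IN ('4 GB', '8 GB'))
   OR (PropertyID = 3 AND Value IN ('13"'))
GROUP BY ArticleID
HAVING COUNT(DISTINCT PropertyID) = 2;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Run each query a few times so the figures reflect a warm cache, and compare the logical reads rather than a single elapsed time.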

I made a snippet showing the lines along which I would work. A good choice of indices is important to speed up queries. Always check the execution plan when tweaking indices.

Notes:

  • The script uses temporary tables, but in essence they're no different from regular tables. Except for #select_properties, the temporary tables should become regular tables if you plan to use the way of working outlined in the script.

  • Store the article properties with IDs for the property choice values, instead of the actual choice values. This saves disk space, and memory when these tables are cached by SQL Server. SQL Server caches table pages in memory as much as it can to service select statements faster.

    If the article properties table is too big to stay in memory, SQL Server will have to do disk I/O to execute the select statement, and that will slow the statement down.

    An added benefit is that for lookups you are looking for IDs (integers) rather than text (VARCHARs). Comparing integers is generally cheaper than comparing strings.

  • Provide suitable indices on the tables to speed up queries. To that end, it is good practice to analyze queries by inspecting the Actual Execution Plan.

    I've included several such indices in the snippet below. Depending on the number of rows in the article properties table and on its statistics, SQL Server will pick the index it estimates to be cheapest for the query.

    If SQL Server thinks a proper index is missing for a SQL statement, the actual execution plan will show a missing-index hint. When your queries become slow, analyze them by inspecting the actual execution plan in SQL Server Management Studio.

  • The snippet uses a temporary table to specify which properties you are looking for: #select_properties. Supply the criteria in that table by inserting the property IDs and property choice value IDs. The final selection query returns the articles for which, for every selected property, at least one of the selected choice values applies.

    You would create this temporary table in the session in which you want to select articles. Then insert the search criteria, run the select statement, and finally drop the temporary table.


CREATE TABLE #articles(
    article_id INT NOT NULL,
    article_desc VARCHAR(128) NOT NULL,
    CONSTRAINT PK_articles PRIMARY KEY CLUSTERED(article_id)
);

CREATE TABLE #properties(
    property_id INT NOT NULL, -- color, size, capacity
    property_desc VARCHAR(128) NOT NULL,
    CONSTRAINT PK_properties PRIMARY KEY CLUSTERED(property_id)
);

CREATE TABLE #property_values(
    property_id INT NOT NULL,
    property_choice_id INT NOT NULL, -- eg color -> black, white, red
    property_choice_val VARCHAR(128) NOT NULL,
    CONSTRAINT PK_property_values PRIMARY KEY CLUSTERED(property_id,property_choice_id),
    CONSTRAINT FK_values_to_properties FOREIGN KEY (property_id) REFERENCES #properties(property_id)
);

CREATE TABLE #article_properties(
    article_id INT NOT NULL,
    property_id INT NOT NULL,
    property_choice_id INT NOT NULL,
    CONSTRAINT PK_article_properties PRIMARY KEY CLUSTERED(article_id,property_id,property_choice_id),
    -- note: SQL Server does not enforce FOREIGN KEY constraints on temporary tables;
    -- they take effect once these become regular tables
    CONSTRAINT FK_ap_to_articles FOREIGN KEY (article_id) REFERENCES #articles(article_id),
    CONSTRAINT FK_ap_to_property_values FOREIGN KEY (property_id,property_choice_id) REFERENCES #property_values(property_id,property_choice_id)
);
CREATE NONCLUSTERED INDEX IX_article_properties ON #article_properties(property_id,property_choice_id) INCLUDE(article_id);

INSERT INTO #properties(property_id,property_desc)VALUES
    (1,'color'),(2,'capacity'),(3,'size');

INSERT INTO #property_values(property_id,property_choice_id,property_choice_val)VALUES
    (1,1,'black'),(1,2,'white'),(1,3,'red'),
    (2,1,'4 Gb') ,(2,2,'8 Gb') ,(2,3,'16 Gb'),
    (3,1,'13"')  ,(3,2,'15"')  ,(3,3,'17"');

INSERT INTO #articles(article_id,article_desc)VALUES
    (1,'First article'),(2,'Second article'),(3,'Third article');

-- the table you have in your question, slightly modified
INSERT INTO #article_properties(article_id,property_id,property_choice_id)VALUES 
    (1,1,1),(1,2,2),(1,3,2), -- article 1: color=black, capacity=8gb, size=15"
    (2,1,2),(2,2,2),(2,3,1), -- article 2: color=white, capacity=8Gb, size=13"
    (3,1,3),        (3,3,3); -- article 3: color=red, size=17"

-- The table with the criteria you are selecting on
CREATE TABLE #select_properties(
    property_id INT NOT NULL,
    property_choice_id INT NOT NULL,
    CONSTRAINT PK_select_properties PRIMARY KEY CLUSTERED(property_id,property_choice_id)
);
INSERT INTO #select_properties(property_id,property_choice_id)VALUES
    (2,1),(2,2),(3,1); -- looking for '4Gb' or '8Gb', and size 13"

;WITH aid AS (  
    SELECT ap.article_id
    FROM #select_properties AS sp
         INNER JOIN #article_properties AS ap ON
            ap.property_id=sp.property_id AND
            ap.property_choice_id=sp.property_choice_id
    GROUP BY ap.article_id
    HAVING COUNT(DISTINCT ap.property_id)=(SELECT COUNT(DISTINCT property_id) FROM #select_properties)
    -- criteria met when article has a number of properties matching, equal to the distinct number of properties in the selection set
)
SELECT a.article_id,a.article_desc
FROM aid 
     INNER JOIN #articles AS a ON 
         a.article_id=aid.article_id
ORDER BY a.article_id;
-- result is the 'Second article' with id 2

DROP TABLE #select_properties;
DROP TABLE #article_properties;
DROP TABLE #property_values;
DROP TABLE #properties;
DROP TABLE #articles;
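
If creating and dropping #select_properties in every session is inconvenient, the same criteria could be passed to a stored procedure as a table-valued parameter instead. This is a swapped-in technique, not part of the script above. It is a rough sketch that assumes the temporary tables have become regular tables; dbo.article_properties, dbo.PropertyCriteria and dbo.FindArticles are hypothetical names chosen for illustration.

```sql
-- Hypothetical permanent counterpart of #article_properties (foreign keys omitted for brevity).
CREATE TABLE dbo.article_properties(
    article_id INT NOT NULL,
    property_id INT NOT NULL,
    property_choice_id INT NOT NULL,
    CONSTRAINT PK_dbo_article_properties PRIMARY KEY CLUSTERED(article_id,property_id,property_choice_id)
);
CREATE NONCLUSTERED INDEX IX_dbo_article_properties
    ON dbo.article_properties(property_id,property_choice_id) INCLUDE(article_id);
GO

-- Table type that plays the role of #select_properties.
CREATE TYPE dbo.PropertyCriteria AS TABLE(
    property_id INT NOT NULL,
    property_choice_id INT NOT NULL,
    PRIMARY KEY CLUSTERED(property_id,property_choice_id)
);
GO

-- Same selection logic as the CTE in the script, reading the criteria from the parameter.
CREATE PROCEDURE dbo.FindArticles
    @criteria dbo.PropertyCriteria READONLY
AS
BEGIN
    SET NOCOUNT ON;

    SELECT ap.article_id
    FROM @criteria AS c
         INNER JOIN dbo.article_properties AS ap ON
            ap.property_id=c.property_id AND
            ap.property_choice_id=c.property_choice_id
    GROUP BY ap.article_id
    -- an article qualifies when it matches every distinct property in the criteria
    HAVING COUNT(DISTINCT ap.property_id)=(SELECT COUNT(DISTINCT property_id) FROM @criteria)
    ORDER BY ap.article_id;
END;
GO

-- Usage: same sample rows and criteria as in the script.
INSERT INTO dbo.article_properties(article_id,property_id,property_choice_id)VALUES
    (1,1,1),(1,2,2),(1,3,2),
    (2,1,2),(2,2,2),(2,3,1),
    (3,1,3),        (3,3,3);

DECLARE @c dbo.PropertyCriteria;
INSERT INTO @c(property_id,property_choice_id)VALUES
    (2,1),(2,2),(3,1); -- capacity 4 Gb or 8 Gb, and size 13"
EXEC dbo.FindArticles @criteria=@c;
```

Table-valued parameters must be declared READONLY, and client libraries such as ADO.NET can pass them directly, which avoids building dynamic SQL in the application.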

XML parameter

Your procedure takes an XML parameter, @criteria XML. A couple of things I used while debugging:

drop table #properties
drop table #criteria

create table #properties (propertyId int)
insert into #properties values (1), (2) --presuming that you have a list of all the possible properties somewhere

-- This would be passed in by the application
declare @criteria XML = '<criteria>
<property id="1">
    <item value="8 GB" />
    <item value="4 GB" />
</property>
<property id="2">
    <item value="13 in" /> 
    <item value="4 in" />
</property>
</criteria>'

--encode the '"' and replace 'in' as needed

The code you need starts here:

create table #criteria 
(propertyId int, searchvalue nvarchar(20))


insert into #criteria (propertyId, searchvalue)
select  
    cc.propertyId,
    c.value('@value','nvarchar(20)')  
from #properties cc
cross apply @criteria.nodes(N'/criteria/property[@id=sql:column("cc.propertyId")]/item') t(c)

SELECT ArticleID, count(1)
FROM ArticlesProperties ap
join #criteria cc on  cc.propertyId = ap.propertyId and cc.searchvalue = ap.value
group by ArticleID 
having count(1) = (select count(distinct propertyId) from #criteria)

I'm assuming (ArticleID, PropertyID) is a key.
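
If you do not have a list of all possible properties at hand, the XML could also be shredded directly by walking the property nodes and then their item children. This is only a sketch of that variation, using the same @criteria shape as above; the aliases p, i and x are arbitrary.

```sql
-- Same XML shape as the @criteria parameter above.
declare @criteria XML = '<criteria>
<property id="1">
    <item value="8 GB" />
    <item value="4 GB" />
</property>
<property id="2">
    <item value="13 in" />
    <item value="4 in" />
</property>
</criteria>';

-- One row per (property, item) pair: the outer nodes() call yields each <property>,
-- the inner nodes() call yields the <item> children of that property.
select
    p.x.value('@id', 'int')             as propertyId,
    i.x.value('@value', 'nvarchar(20)') as searchvalue
from @criteria.nodes(N'/criteria/property') as p(x)
cross apply p.x.nodes(N'item') as i(x);
```

For the sample document this yields four rows, two for each property id, which can be inserted into #criteria in place of the select that joins against #properties.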

This looks like an entity-attribute-value (EAV) table or an "open schema" design, so there's essentially no good way to query anything. You might even consider setting up dynamic PIVOTs, but that's rather complex.

One method for this is EXISTS expressions:

SELECT DISTINCT ArticleID
FROM ArticlesProperties ap
WHERE EXISTS (SELECT 1 FROM ArticlesProperties 
        WHERE ArticleID = ap.ArticleID AND PropertyID = 2 AND Value IN ('4 GB', '8 GB'))
    AND EXISTS (SELECT 1 FROM ArticlesProperties 
        WHERE ArticleID = ap.ArticleID AND PropertyID = 3 AND Value IN ('13"'));

Or you might try OR combined with COUNT() and HAVING:

SELECT ArticleID
FROM ArticlesProperties
WHERE (PropertyID = 2 AND Value IN ('4 GB', '8 GB'))
    OR (PropertyID = 3 AND Value IN ('13"'))
GROUP BY ArticleID
HAVING COUNT(PropertyID) = 2;

Technical posts on this site are published under the CC BY-SA 4.0 license; if you republish them, please cite this site or the original source.