
How to implement a filter system in SQL?

Right now I am planning to add a filter system to my site.

Examples:

(ID=apple, COLOR=red, TASTE=sweet, ORIGIN=US)
(ID=mango, COLOR=yellow, TASTE=sweet, ORIGIN=MEXICO)
(ID=banana, COLOR=yellow, TASTE=bitter-sweet, ORIGIN=US)

So now I am interested in doing the following: SELECT ID FROM thisTable WHERE COLOR='yellow' AND TASTE='sweet'

But my problem is that I am doing this for multiple categories on my site, and the columns are NOT consistent (for example, if the table is for handphones, the columns will be BRAND, 3G-ENABLED, PRICE, COLOR, WAVELENGTH, etc.).

How could I design a general schema that allows this?

Right now I am planning on doing:

table(ID, KEY, VALUE)

This allows an arbitrary number of columns, but for the query I am using SELECT ID FROM table WHERE (KEY=X1 AND VALUE=V1) AND (KEY=X2 AND VALUE=V2), ... which returns an empty set.

Can someone recommend a good solution to this? Note that the number of columns WILL change regularly.

The entity-attribute-value (EAV) model that you suggest could fit this scenario.

Your query returns an empty set because each row holds a single key/value pair, so no single row can satisfy two different key conditions at once. Regarding the filtering query, you have to understand that with the EAV model you will sacrifice plenty of query power, so this can become quite tricky. However, this is one way to tackle your problem:

-- stuff is the (id, key, value) table; X1/V1 and X2/V2 are placeholders
SELECT    stuff.id
FROM      stuff
JOIN      (SELECT    COUNT(*) matches, id
           FROM      stuff
           WHERE     (`key` = X1 AND `value` = V1) OR
                     (`key` = X2 AND `value` = V2)
           GROUP BY  id
          ) sub_t ON (sub_t.matches = 2 AND sub_t.id = stuff.id)
GROUP BY  stuff.id;

One inelegant feature of this approach is that you need to specify the number of attribute/value pairs that you expect to match in sub_t.matches = 2. If we had three conditions, we would have to specify sub_t.matches = 3, and so on.

Let's build a test case:

CREATE TABLE stuff (`id` varchar(20), `key` varchar(20), `value` varchar(20));

INSERT INTO stuff VALUES ('apple',  'color',  'red');
INSERT INTO stuff VALUES ('mango',  'color',  'yellow');
INSERT INTO stuff VALUES ('banana', 'color',  'yellow');

INSERT INTO stuff VALUES ('apple',  'taste',  'sweet');
INSERT INTO stuff VALUES ('mango',  'taste',  'sweet');
INSERT INTO stuff VALUES ('banana', 'taste',  'bitter-sweet');

INSERT INTO stuff VALUES ('apple',  'origin',  'US');
INSERT INTO stuff VALUES ('mango',  'origin',  'MEXICO');
INSERT INTO stuff VALUES ('banana', 'origin',  'US');

Query:

SELECT    stuff.id 
FROM      stuff 
JOIN      (SELECT    COUNT(*) matches, id
           FROM      stuff
           WHERE     (`key` = 'color' AND `value` = 'yellow') OR 
                     (`key` = 'taste' AND `value` = 'sweet')
           GROUP BY  id
          ) sub_t ON (sub_t.matches = 2 AND sub_t.id = stuff.id)
GROUP BY  stuff.id;

Result:

+-------+
| id    |
+-------+
| mango |
+-------+
1 row in set (0.02 sec)

Now let's insert another fruit with color=yellow and taste=sweet:

INSERT INTO stuff VALUES ('pear', 'color', 'yellow');
INSERT INTO stuff VALUES ('pear', 'taste', 'sweet');
INSERT INTO stuff VALUES ('pear', 'origin', 'somewhere');

The same query would return:

+-------+
| id    |
+-------+
| mango |
| pear  |
+-------+
2 rows in set (0.00 sec)

If we want to restrict this result to entities with origin=MEXICO, we would have to add another OR condition and check for sub_t.matches = 3 instead of 2.

SELECT    stuff.id 
FROM      stuff 
JOIN      (SELECT    COUNT(*) matches, id
           FROM      stuff
           WHERE     (`key` = 'color' AND `value` = 'yellow') OR 
                     (`key` = 'taste' AND `value` = 'sweet') OR 
                     (`key` = 'origin' AND `value` = 'MEXICO')
           GROUP BY  id
          ) sub_t ON (sub_t.matches = 3 AND sub_t.id = stuff.id)
GROUP BY  stuff.id;

Result:

+-------+
| id    |
+-------+
| mango |
+-------+
1 row in set (0.00 sec)
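
Because the expected match count has to track the number of conditions, it can help to generate the query instead of writing it by hand. The following is only a sketch and not part of the original answer: it assumes Python with the standard sqlite3 module standing in for MySQL (so identifiers are quoted with double quotes rather than backticks), and find_ids is a hypothetical helper name. It reuses the test data from above, passes the values as bound parameters, and derives the count from the number of filters.

```python
import sqlite3


def find_ids(conn, filters):
    """Return the ids whose rows match every key/value pair in `filters`."""
    if not filters:
        raise ValueError("at least one filter is required")
    # One parenthesised condition per filter, OR-ed together as in the query above.
    conditions = " OR ".join('("key" = ? AND "value" = ?)' for _ in filters)
    sql = f"""
        SELECT stuff.id
        FROM stuff
        JOIN (SELECT COUNT(*) AS matches, id
              FROM stuff
              WHERE {conditions}
              GROUP BY id
             ) sub_t ON (sub_t.matches = ? AND sub_t.id = stuff.id)
        GROUP BY stuff.id
        ORDER BY stuff.id
    """
    # Flatten {"color": "yellow", ...} into [key, value, key, value, ...].
    params = [item for pair in filters.items() for item in pair]
    params.append(len(filters))  # the match count is derived, not typed by hand
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE stuff (id TEXT, "key" TEXT, "value" TEXT)')
conn.executemany("INSERT INTO stuff VALUES (?, ?, ?)", [
    ("apple", "color", "red"), ("apple", "taste", "sweet"), ("apple", "origin", "US"),
    ("mango", "color", "yellow"), ("mango", "taste", "sweet"), ("mango", "origin", "MEXICO"),
    ("banana", "color", "yellow"), ("banana", "taste", "bitter-sweet"), ("banana", "origin", "US"),
    ("pear", "color", "yellow"), ("pear", "taste", "sweet"), ("pear", "origin", "somewhere"),
])

print(find_ids(conn, {"color": "yellow", "taste": "sweet"}))  # ['mango', 'pear']
print(find_ids(conn, {"color": "yellow", "taste": "sweet", "origin": "MEXICO"}))  # ['mango']
```

Only the placeholders are interpolated into the SQL text; the keys and values themselves travel as parameters, which avoids quoting problems when the filters come from user input.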

As with every approach, there are certain advantages and disadvantages when using the EAV model. Make sure you research the topic extensively in the context of your application. You may even want to consider alternatives to relational databases, such as Cassandra, CouchDB, MongoDB, Voldemort, HBase, SimpleDB or other key-value stores.

The following worked for me:

-- select only the grouped column so the query also runs under ONLY_FULL_GROUP_BY
SELECT t.id FROM mytable t WHERE
    (t.key = 'key' AND t.value = 'value') OR
    (t.key = 'key' AND t.value = 'value') OR
    ....
    (t.key = 'key' AND t.value = 'value')
GROUP BY t.id HAVING count(*) = 3;

The number in count(*) = 3 must match the number of (t.key = 'key' AND t.value = 'value') cases.
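
Counting with count(*) relies on each id having at most one row per key; a duplicated row is counted twice and can push a partial match over the threshold. The sketch below illustrates that caveat under stated assumptions: it uses Python's standard sqlite3 module rather than MySQL, and the table names loose and strict are hypothetical. The strict table adds a primary key on (id, key), so the accidental duplicate is rejected at insert time.

```python
import sqlite3

QUERY = """
    SELECT id FROM {table}
    WHERE ("key" = 'color' AND "value" = 'yellow')
       OR ("key" = 'taste' AND "value" = 'sweet')
    GROUP BY id
    HAVING COUNT(*) = 2
"""

rows = [
    ("banana", "color", "yellow"),
    ("banana", "taste", "bitter-sweet"),
    ("banana", "color", "yellow"),  # accidental duplicate
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE loose (id TEXT, "key" TEXT, "value" TEXT)')
conn.execute(
    'CREATE TABLE strict (id TEXT, "key" TEXT, "value" TEXT, '
    'PRIMARY KEY (id, "key"))'
)

conn.executemany("INSERT INTO loose VALUES (?, ?, ?)", rows)
for row in rows:
    try:
        conn.execute("INSERT INTO strict VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        pass  # the duplicate (id, key) pair is rejected

loose_ids = [r[0] for r in conn.execute(QUERY.format(table="loose"))]
strict_ids = [r[0] for r in conn.execute(QUERY.format(table="strict"))]

print(loose_ids)   # ['banana']: the duplicate row is counted twice
print(strict_ids)  # []: banana is yellow but not sweet
```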

What you are suggesting is known as an Entity-Attribute-Value structure and is highly discouraged. One of the (many) big problems with EAV designs is data integrity. How do you enforce that colors only consist of "red", "yellow", "blue", etc.? In short, you can't without a lot of hacks. Another problem rears its head in querying (as you have seen) and searching for data.

Instead, I would recommend creating a table that represents each type of entity and thus each table can have attributes (columns) that are specific to that type of entity.
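
As a rough illustration of the table-per-entity approach, here is a minimal sketch that assumes Python's standard sqlite3 module; the fruit table, its column types, and the rejected sample row are hypothetical. With real columns, the allowed colors can be enforced by a CHECK constraint, and the filter becomes the plain WHERE clause from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table per entity type, with columns specific to that type.
conn.execute("""
    CREATE TABLE fruit (
        id     TEXT PRIMARY KEY,
        color  TEXT NOT NULL CHECK (color IN ('red', 'yellow', 'blue')),
        taste  TEXT NOT NULL,
        origin TEXT NOT NULL
    )
""")
conn.executemany("INSERT INTO fruit VALUES (?, ?, ?, ?)", [
    ("apple", "red", "sweet", "US"),
    ("mango", "yellow", "sweet", "MEXICO"),
    ("banana", "yellow", "bitter-sweet", "US"),
])

# The filter is an ordinary WHERE clause on typed columns.
matches = [row[0] for row in conn.execute(
    "SELECT id FROM fruit WHERE color = 'yellow' AND taste = 'sweet'"
)]
print(matches)  # ['mango']

# A misspelled color violates the CHECK constraint and is refused.
try:
    conn.execute("INSERT INTO fruit VALUES ('kiwi', 'grean', 'sour', 'NZ')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```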

In order to convert the data into columns in a result query as you are seeking, you will need to create what is often called a crosstab query. There are report engines that will do it, and you can do it in code, but most database products will not do it natively (meaning without building the SQL string manually). Performance will of course not be good if you have a lot of data, and you will run into problems filtering on the data. For example, suppose that some of the values are supposed to be numeric. Because the value part of the EAV is likely to be a string, you will have to cast those values to an integer before you can filter on them, and that presumes that the data will be convertible to an integer.
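
The casting problem can be shown with a small sketch, again assuming Python's standard sqlite3 module; the phone_attr table and its prices are made up for illustration. Comparing the stored strings directly compares them character by character, so the value has to be cast before a numeric filter behaves as expected. How a cast treats non-numeric strings varies between database products, which is the "convertible to an integer" assumption mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE phone_attr (id TEXT, "key" TEXT, "value" TEXT)')
conn.executemany(
    "INSERT INTO phone_attr VALUES (?, 'price', ?)",
    [("a", "99"), ("b", "250"), ("c", "1200")],  # prices stored as strings
)

# String comparison: '99' sorts after '500' because '9' > '5'.
as_text = [row[0] for row in conn.execute("""
    SELECT id FROM phone_attr
    WHERE "key" = 'price' AND "value" > '500'
    ORDER BY id
""")]

# Numeric comparison after an explicit cast.
as_number = [row[0] for row in conn.execute("""
    SELECT id FROM phone_attr
    WHERE "key" = 'price' AND CAST("value" AS INTEGER) > 500
    ORDER BY id
""")]

print(as_text)    # ['a']: wrong, 99 is not greater than 500
print(as_number)  # ['c']
```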

The price you pay for simplistic table design at this stage will cost you in terms of performance in the long run. Using an ORM to reduce the cost of modifying the database to fit the data into an appropriate structure would probably be a good investment of time, even allowing for the ORM's performance cost.

Otherwise, you may want to look for a "reverse ORM" that maps the code from your database, which has the benefit of being less expensive and having higher performance. (Slightly higher starting cost compared to ORM, but better long-term performance and reliability.)

It's a costly problem regardless of how you slice it. Do you want to pay now with development time or pay later when your performance tanks? ("Pay later" is the wrong answer.)
