SQL: Understanding the OR operator in a WHERE clause

I have tables called Movie, Genre and Keyword, from which I have created a view called 'genkeyword'. The view has a lot of tuples, so it can be accessed at DB Fiddle.

I have the following query:

SELECT title, 
       year, 
       Count(DISTINCT genre)   AS genre_freq, 
       Count(DISTINCT keyword) AS keyword_freq 
FROM   genkeyword 
WHERE  ( genre IN (SELECT genre 
                   FROM   genkeyword 
                   WHERE  title = 'Harry Potter and the Deathly Hallows')
          OR keyword IN (SELECT keyword
                         FROM   genkeyword
                         WHERE  title = 'Harry Potter and the Deathly Hallows') )
       AND title <> 'Harry Potter and the Deathly Hallows' 
GROUP  BY title, 
          year 
ORDER  BY genre_freq DESC, 
          keyword_freq DESC; 

What I intend to do with this query is get, for each movie, the number of genres and the number of keywords it has in common with Harry Potter. The output should be:

title                      |      genre_freq    |    keyword_freq
Cinderella                        2                        2
The Shape of Water                2                        1
How to Train Your Dragon          2                        0
Enchanted                         1                        3

I know that the query is not correct, since I get the following output instead:

    title                      |      genre_freq    |    keyword_freq
    The Shape of Water                4                  3       
    Enchanted                         3                  4
    Cinderella                        2                  5
    How to Train Your Dragon          2                  3              

However, I would like to clarify my understanding of how the query works.

In the 'where' clause of my query:

where (genre in (select genre from genkeyword where title='Harry Potter') or 
keyword in (select keyword from genkeyword where title='Harry Potter')) 

Am I right in saying that there are two result sets generated, one containing all the tuples which have a genre that is in Harry Potter (let this be R1) and the other containing all the tuples that have a keyword that is in Harry Potter (let this be R2)?

If the tuple under consideration contains a genre that is in the genre result set R1 or a keyword that is in the keyword result set R2, then the genre/keyword is counted. I am not sure how count(distinct genre) and count(distinct keyword) work in this case. If the tuple contains a genre that is in R1, is only the genre counted, or is the keyword counted as well? Likewise, when the tuple contains a keyword that is in R2, is the genre counted as well as the keyword?

I don't understand why I am getting the wrong genre_freq and keyword_freq values from my query. This is because I don't fully understand how the genre and keyword frequencies are counted behind the scenes. Any insights are appreciated.

One of the best-asked questions I have seen so far on SO.

To answer your question: the OR condition keeps every row that satisfies either the genre part or the keyword part. SQL works in rows (or records), so you should always think in rows.

First, it selects all the rows that have the same genre as Harry Potter. Then it selects all the rows that have one of its keywords. Then it performs the count. Obviously, the result is too high, because you also get all the records that do not have the same genre but do have overlapping keywords. You also get all the rows that have overlapping genres but no overlapping keywords.
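To see the over-counting on a small scale, here is a minimal sketch that runs the same shape of query against SQLite from Python. The three movies ('Ref Movie', 'Movie A', 'Movie B') and their genres and keywords are made up for illustration, the year column is left out, and the table is assumed to hold one row per genre/keyword combination of each movie, like the view in the question.

```python
import sqlite3
from itertools import product

# Made-up sample data: title -> (genres, keywords). Not the real view contents.
movies = {
    "Ref Movie": (["Fantasy", "Adventure"], ["magic", "school"]),
    "Movie A": (["Fantasy", "Drama"], ["magic", "love"]),
    "Movie B": (["Adventure", "Fantasy"], ["dragon"]),
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genkeyword (title TEXT, genre TEXT, keyword TEXT)")
for title, (genres, keywords) in movies.items():
    # Assumption: the view contains every genre/keyword pair of a movie.
    conn.executemany(
        "INSERT INTO genkeyword VALUES (?, ?, ?)",
        [(title, g, k) for g, k in product(genres, keywords)],
    )

# Same structure as the query in the question: OR filter, then COUNT(DISTINCT).
or_query = """
SELECT title,
       COUNT(DISTINCT genre)   AS genre_freq,
       COUNT(DISTINCT keyword) AS keyword_freq
FROM   genkeyword
WHERE  (genre IN (SELECT genre FROM genkeyword WHERE title = 'Ref Movie')
        OR keyword IN (SELECT keyword FROM genkeyword WHERE title = 'Ref Movie'))
       AND title <> 'Ref Movie'
GROUP  BY title
ORDER  BY title
"""
or_counts = conn.execute(or_query).fetchall()
print(or_counts)
```

Movie A shares only Fantasy and magic with the reference movie, yet it comes back with 2 genres and 2 keywords. The row (Drama, magic) survives the filter through its keyword and brings Drama into the genre count, and (Fantasy, love) does the same for love in the keyword count. The WHERE clause decides which rows stay; COUNT(DISTINCT ...) then counts every value in those rows, whether or not that particular value matched.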

To properly count the records, simply change OR to AND. This will only select the records that have the same genre and also contain one of the keywords. Counting these will produce the correct result.
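A small sketch of the AND variant, again on made-up data in SQLite (the titles, genres and keywords are hypothetical, and the year column is left out). With AND a row has to match on both columns, so the counts are right for a movie that shares both, while a movie that shares genres but no keywords returns no row at all instead of a keyword count of 0.

```python
import sqlite3
from itertools import product

# Made-up sample data: title -> (genres, keywords). Not the real view contents.
movies = {
    "Ref Movie": (["Fantasy", "Adventure"], ["magic", "school"]),
    "Movie A": (["Fantasy", "Drama"], ["magic", "love"]),
    "Movie B": (["Adventure", "Fantasy"], ["dragon"]),  # shares genres only
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genkeyword (title TEXT, genre TEXT, keyword TEXT)")
for title, (genres, keywords) in movies.items():
    conn.executemany(
        "INSERT INTO genkeyword VALUES (?, ?, ?)",
        [(title, g, k) for g, k in product(genres, keywords)],
    )

# OR replaced by AND: only rows matching on genre and keyword survive.
and_query = """
SELECT title,
       COUNT(DISTINCT genre)   AS genre_freq,
       COUNT(DISTINCT keyword) AS keyword_freq
FROM   genkeyword
WHERE  genre IN (SELECT genre FROM genkeyword WHERE title = 'Ref Movie')
       AND keyword IN (SELECT keyword FROM genkeyword WHERE title = 'Ref Movie')
       AND title <> 'Ref Movie'
GROUP  BY title
ORDER  BY title
"""
and_counts = conn.execute(and_query).fetchall()
print(and_counts)
```

Here Movie A is reported with 1 genre and 1 keyword, which is what it really shares, and Movie B is missing from the result.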

As Imre_G said, this is a good question, and his explanation of what is going wrong is spot on. You are basically picking up genres and keywords you don't want, then counting them because they sit in a row that shares a common element.
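One way to count only the shared values, sketched here as an assumption rather than something taken from the answers: keep the OR filter, but wrap each counted column in a CASE so that values not shared with the reference movie become NULL, which COUNT(DISTINCT ...) ignores. The data is made up, the year column is left out, and the sketch runs on SQLite; the MySQL syntax should be close but is not verified here.

```python
import sqlite3
from itertools import product

# Made-up sample data: title -> (genres, keywords). Not the real view contents.
movies = {
    "Ref Movie": (["Fantasy", "Adventure"], ["magic", "school"]),
    "Movie A": (["Fantasy", "Drama"], ["magic", "love"]),
    "Movie B": (["Adventure", "Fantasy"], ["dragon"]),
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genkeyword (title TEXT, genre TEXT, keyword TEXT)")
for title, (genres, keywords) in movies.items():
    conn.executemany(
        "INSERT INTO genkeyword VALUES (?, ?, ?)",
        [(title, g, k) for g, k in product(genres, keywords)],
    )

# The CASE turns non-shared values into NULL, so they are not counted.
case_query = """
SELECT title,
       COUNT(DISTINCT CASE WHEN genre IN (SELECT genre FROM genkeyword
                                          WHERE title = 'Ref Movie')
                           THEN genre END)   AS genre_freq,
       COUNT(DISTINCT CASE WHEN keyword IN (SELECT keyword FROM genkeyword
                                            WHERE title = 'Ref Movie')
                           THEN keyword END) AS keyword_freq
FROM   genkeyword
WHERE  (genre IN (SELECT genre FROM genkeyword WHERE title = 'Ref Movie')
        OR keyword IN (SELECT keyword FROM genkeyword WHERE title = 'Ref Movie'))
       AND title <> 'Ref Movie'
GROUP  BY title
ORDER  BY title
"""
case_counts = conn.execute(case_query).fetchall()
print(case_counts)
```

With this data Movie A gets 1 genre and 1 keyword, and Movie B gets 2 genres and 0 keywords, so a movie that matches on only one of the two columns stays in the result.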

Here is one way to fix it, maybe not the best, but the simplest:

SELECT
    COALESCE(a.title, b.title) AS title,
    COALESCE(a.year, b.year) AS year,
    a.genre_freq,
    b.keyword_freq
FROM
    (SELECT title, year, COUNT(DISTINCT genre) AS genre_freq
     FROM   genkeyword
     WHERE  genre IN (SELECT genre
                      FROM   genkeyword
                      WHERE  title = 'Harry Potter and the Deathly Hallows')
            AND title <> 'Harry Potter and the Deathly Hallows'
     GROUP  BY title, year) a
LEFT JOIN
    (SELECT title, year, COUNT(DISTINCT keyword) AS keyword_freq
     FROM   genkeyword
     WHERE  keyword IN (SELECT keyword
                        FROM   genkeyword
                        WHERE  title = 'Harry Potter and the Deathly Hallows')
            AND title <> 'Harry Potter and the Deathly Hallows'
     GROUP  BY title, year) b
    ON b.title = a.title;

Now, that solution only returns movies that have at least one genre match, because the genre subquery is on the left side of the LEFT JOIN; a movie that matches only on keywords is dropped. The proper solution would be to replace the LEFT JOIN with a FULL OUTER JOIN, but MySQL doesn't support FULL OUTER JOINs for some reason. There is a workaround for this as well, but it's long and involves lots of UNIONs ;(

How to do a FULL OUTER JOIN in MySQL?
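A common way to emulate a FULL OUTER JOIN is to take a LEFT JOIN in one direction and append the rows from the other side that found no partner. The sketch below shows that pattern on two tiny, made-up tables of per-movie counts (genre_counts and keyword_counts are hypothetical names standing in for the two subqueries). It is run on SQLite from Python, and the same pattern is expected to carry over to MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genre_counts   (title TEXT, genre_freq INT);
CREATE TABLE keyword_counts (title TEXT, keyword_freq INT);
-- Made-up rows: Movie B has no keyword match, Movie C has no genre match.
INSERT INTO genre_counts   VALUES ('Movie A', 1), ('Movie B', 2);
INSERT INTO keyword_counts VALUES ('Movie A', 1), ('Movie C', 3);
""")

full_join_query = """
-- Part 1: every genre row, with its keyword row if there is one.
SELECT a.title                    AS title,
       a.genre_freq               AS genre_freq,
       COALESCE(b.keyword_freq, 0) AS keyword_freq
FROM   genre_counts a
       LEFT JOIN keyword_counts b ON b.title = a.title
UNION ALL
-- Part 2: keyword rows that had no genre row (the part a LEFT JOIN loses).
SELECT b.title, 0, b.keyword_freq
FROM   keyword_counts b
       LEFT JOIN genre_counts a ON a.title = b.title
WHERE  a.title IS NULL
ORDER  BY title
"""
full_rows = conn.execute(full_join_query).fetchall()
print(full_rows)
```

The `WHERE a.title IS NULL` filter in the second half is what keeps rows from being returned twice, which is why UNION ALL is enough here.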

You could invert your logic and drive from genre and keywords using a subquery before totaling:

select title,year,
        sum(case when src = 'g' then 1 else 0 end) as genre,
        sum(case when src = 'k' then 1 else 0 end) as keyword
from
(
select 'g' as src, g1.title ,g1.year, g1.genre
from genkeyword g
join genkeyword g1 on g1.genre = g.genre
where g.title =  'Harry Potter and the Deathly Hallows' and g1.title <> 'Harry Potter and the Deathly Hallows'
union
-- select the keyword here (not the genre), so that union keeps one row per distinct shared keyword
select 'k' as src, g1.title ,g1.year, g1.keyword
from genkeyword g
join genkeyword g1 on g1.keyword = g.keyword
where g.title =  'Harry Potter and the Deathly Hallows' and g1.title <> 'Harry Potter and the Deathly Hallows'
) s
group by title , year;

+--------------------------+------+-------+---------+
| title                    | year | genre | keyword |
+--------------------------+------+-------+---------+
| Cinderella               | 2015 |     2 |       2 |
| Enchanted                | 2007 |     1 |       3 |
| How to Train Your Dragon | 2010 |     2 |       0 |
| The Shape of Water       | 2017 |     2 |       1 |
+--------------------------+------+-------+---------+
4 rows in set (0.10 sec)

Try this query.
I haven't used any of the views you created, but you can use those if you want.

MySQL

SET @tmpMovieid = (SELECT DISTINCT id 
                   FROM Movie 
                   WHERE title = 'Harry Potter and the Deathly Hallows');

SELECT id,
       title,
       IFNULL(Max(CASE WHEN coltype = 'genre' THEN col end),   0) AS genre_freq,
       IFNULL(Max(CASE WHEN coltype = 'Keyword' THEN col end), 0) AS keyword_freq

FROM   (SELECT id,
               title,
               Count(g.genre) AS col,
               'genre'        AS colType
        FROM   Movie m
               INNER JOIN Genre g ON m.id = g.Movie_id
        WHERE  g.genre IN (SELECT DISTINCT genre
                           FROM   Genre
                           WHERE  Movie_id = @tmpMovieid)
        GROUP  BY id, title

        UNION ALL

        SELECT id,
               title,
               Count(k.keyword) AS col,
               'Keyword'        AS colType
        FROM   Movie m
               INNER JOIN Keyword k ON m.id = k.Movie_id
        WHERE  k.keyword IN (SELECT DISTINCT keyword
                             FROM   Keyword
                             WHERE  Movie_id = @tmpMovieid)
        GROUP  BY id, title) tmp

WHERE  id <> @tmpMovieid
GROUP  BY id, title
ORDER  BY genre_freq DESC, keyword_freq DESC;

Online Demo: https://www.db-fiddle.com/f/s1xLQ6r4Zwi5hVjCsdcwV8/0


SQL Server
Note: Since you have used 'text' as the data type of some columns, they need to be cast for some operations. But then again, since you're using MySQL, you don't need this. I wrote it anyway to show you the difference, and for fun.

DECLARE @tmpMovieID INT;
SET @tmpMovieID = (SELECT DISTINCT id
                   FROM   movie
                   WHERE  Cast(title AS NVARCHAR(MAX)) = 'Harry Potter and the Deathly Hallows');

SELECT COALESCE(tmpGenre.id, tmpKeyword.id)       AS id,
       COALESCE(tmpGenre.title, tmpKeyword.title) AS title,
       ISNULL(tmpGenre.genre, 0)                  AS genre,
       ISNULL(tmpKeyword.keyword, 0)              AS keyword

FROM   (SELECT id,
               Cast(title AS NVARCHAR(MAX))          AS title,
               Count(Cast(g.genre AS NVARCHAR(MAX))) AS genre
        FROM   movie m
               INNER JOIN genre g ON m.id = g.movie_id
        WHERE  Cast(g.genre AS NVARCHAR(MAX)) IN (SELECT DISTINCT Cast(genre AS NVARCHAR(MAX))
                                                 FROM   genre
                                                 WHERE  movie_id = @tmpMovieID)
        GROUP  BY id, Cast(title AS NVARCHAR(MAX))) tmpGenre

       FULL OUTER JOIN (SELECT id,
                               Cast(title AS NVARCHAR(MAX))            AS title,
                               Count(Cast(k.keyword AS NVARCHAR(MAX))) AS Keyword
                        FROM   movie m
                               INNER JOIN keyword k ON m.id = k.movie_id
                        WHERE  Cast(k.keyword AS NVARCHAR(MAX)) IN
                               (SELECT DISTINCT Cast(keyword AS NVARCHAR(MAX))
                                FROM   keyword
                                WHERE  movie_id = @tmpMovieID)
                        GROUP  BY id, Cast(title AS NVARCHAR(MAX))) tmpKeyword

                    ON tmpGenre.id = tmpKeyword.id
-- COALESCE keeps movies that match on keywords only; tmpGenre.id is NULL for those rows.
WHERE  COALESCE(tmpGenre.id, tmpKeyword.id) <> @tmpMovieID
ORDER  BY tmpGenre.genre DESC, tmpKeyword.keyword DESC;

Online Demo: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=a1ee14e1e08b7e55eff2e8e94f89a287&hide=1


Result

+------+---------------------------+-------------+--------------+
| id   |          title            | genre_freq  | keyword_freq |
+------+---------------------------+-------------+--------------+
| 407  | Cinderella                |          2  |            2 |
| 826  | The Shape of Water        |          2  |            1 |
| 523  | How to Train Your Dragon  |          2  |            0 |
| 799  | Enchanted                 |          1  |            3 |
+------+---------------------------+-------------+--------------+

By the way, thank you for asking a clear question and providing the table schema, sample data, and desired output.
