SQL: Understanding the OR operator in a WHERE clause
I have tables named Movie, Genre, and Keyword, from which I created a view named 'genkeyword'. The view contains a lot of tuples, so it is available on DB Fiddle.
I have the following query:
SELECT title,
       year,
       Count(DISTINCT genre)   AS genre_freq,
       Count(DISTINCT keyword) AS keyword_freq
FROM   genkeyword
WHERE  ( genre IN (SELECT genre
                   FROM   genkeyword
                   WHERE  title = 'Harry Potter and the Deathly Hallows')
          OR keyword IN (SELECT keyword
                         FROM   genkeyword
                         WHERE  title = 'Harry Potter and the Deathly Hallows') )
       AND title <> 'Harry Potter and the Deathly Hallows'
GROUP  BY title,
          year
ORDER  BY genre_freq DESC,
          keyword_freq DESC;
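I intend this query to return the genre and keyword frequencies for every movie that has genres and keywords in common with Harry Potter. The output should be:
title                    | genre_freq | keyword_freq
Cinderella               | 2          | 2
The Shape of Water       | 2          | 1
How to Train Your Dragon | 2          | 0
Enchanted                | 1          | 3
I know the query is incorrect, because I get the following output instead:
title                    | genre_freq | keyword_freq
The Shape of Water       | 4          | 3
Enchanted                | 3          | 4
Cinderella               | 2          | 5
How to Train Your Dragon | 2          | 3
However, I would like to clarify my understanding of how the query works.
In the 'where' clause of my query: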
where (genre in (select genre from genkeyword where title='Harry Potter') or
keyword in (select keyword from genkeyword where title='Harry Potter'))
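Am I right in saying that two result sets are generated: one containing all tuples whose genre is one of Harry Potter's genres (call it R1), and another containing all tuples whose keyword is one of Harry Potter's keywords (call it R2)?
If the tuple under consideration contains a genre from the genre result set R1 or a keyword from the keyword result set R2, the genre/keyword is counted. I am not sure how count(distinct genre) and count(distinct keyword) work in this case. If a tuple contains a genre in R1, is only the genre counted, or is the keyword counted as well? The same goes for a tuple that contains a keyword in R2: are both the genre and the keyword counted?
I don't understand why the genre_freq and keyword_freq values from my query are wrong. That is because I don't fully understand how the genre and keyword frequencies are counted behind the scenes. Any insight is appreciated.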
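One of the best-asked questions I have seen on SO so far.
To answer your question: the OR clause basically pastes the results of the keyword part and the genre part underneath each other. SQL works in rows (or records), so you should always think in rows.
First, it selects all rows whose genre is one of Harry Potter's genres. Then it selects all rows whose keyword is one of Harry Potter's keywords. Then it performs the counts over every selected row. Obviously the counts come out too high, because you also get all records that do not have a matching genre but do have an overlapping keyword, as well as all rows that have an overlapping genre but no overlapping keyword.
To count the records correctly, simply change the OR to an AND. This selects only the records that have a matching genre and also a matching keyword, and counting those yields the correct result. Note, however, that a movie matching on only one of the two (such as How to Train Your Dragon, whose expected keyword_freq is 0) then drops out of the result entirely.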
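The difference between the two operators can be reproduced on a toy table. The sketch below uses Python's built-in sqlite3 module rather than MySQL, and the movies 'Ref', 'A', and 'B' with their genres and keywords are made-up data, not the contents of the real view. As in a view built by joining genres and keywords, each movie gets one row per (genre, keyword) pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genkeyword (title TEXT, year INT, genre TEXT, keyword TEXT)")

# Hypothetical data: (title, year) -> (genres, keywords)
movies = {
    ("Ref", 2010): (["Fantasy", "Adventure"], ["magic", "school"]),
    ("A", 2011): (["Fantasy", "Romance"], ["magic", "dance"]),  # shares 1 genre, 1 keyword
    ("B", 2012): (["Adventure", "Fantasy"], ["dragon"]),        # shares 2 genres, 0 keywords
}
# One row per (genre, keyword) combination of each movie
rows = [(t, y, g, k) for (t, y), (gs, ks) in movies.items() for g in gs for k in ks]
conn.executemany("INSERT INTO genkeyword VALUES (?, ?, ?, ?)", rows)

QUERY = """
SELECT title, COUNT(DISTINCT genre), COUNT(DISTINCT keyword)
FROM genkeyword
WHERE (genre IN (SELECT genre FROM genkeyword WHERE title = 'Ref')
       {op} keyword IN (SELECT keyword FROM genkeyword WHERE title = 'Ref'))
  AND title <> 'Ref'
GROUP BY title, year
ORDER BY title
"""
or_result = conn.execute(QUERY.format(op="OR")).fetchall()
and_result = conn.execute(QUERY.format(op="AND")).fetchall()

print(or_result)   # [('A', 2, 2), ('B', 2, 1)]
print(and_result)  # [('A', 1, 1)]
```

With OR, the row ('A', 'Fantasy', 'dance') survives because of its genre, so 'dance' is counted as a keyword even though the reference movie does not have it. With AND, movie 'A' is counted correctly, but movie 'B' disappears because none of its rows has a matching keyword.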
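As Imre_G said, this is a well-asked question, and his explanation of what goes wrong is spot on. You are basically selecting genres and keywords that you don't want, and then counting them because they share a row with a matching element.
Here is one way to fix it; it may not be the best, but it is the simplest: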
SELECT
    COALESCE(a.title, b.title) AS title,
    COALESCE(a.year, b.year) AS year,
    a.genre_freq,
    COALESCE(b.keyword_freq, 0) AS keyword_freq  -- 0 when no keyword matches
FROM
    -- a: number of matching genres per movie
    (SELECT title, year, COUNT(DISTINCT genre) AS genre_freq
     FROM genkeyword
     WHERE genre IN (SELECT genre FROM genkeyword
                     WHERE title = 'Harry Potter and the Deathly Hallows')
       AND title <> 'Harry Potter and the Deathly Hallows'
     GROUP BY title, year) a
LEFT JOIN
    -- b: number of matching keywords per movie
    (SELECT title, year, COUNT(DISTINCT keyword) AS keyword_freq
     FROM genkeyword
     WHERE keyword IN (SELECT keyword FROM genkeyword
                       WHERE title = 'Harry Potter and the Deathly Hallows')
       AND title <> 'Harry Potter and the Deathly Hallows'
     GROUP BY title, year) b
ON b.title = a.title AND b.year = a.year;
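Note that this solution only returns movies with at least one matching genre; a movie that matches only on keywords is dropped by the LEFT JOIN. The proper solution is to replace the LEFT JOIN with a FULL OUTER JOIN, but MySQL for some reason does not support FULL OUTER JOIN. There is a workaround for that too, but it is long and involves a lot of UNIONs ;(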
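A minimal sketch of that UNION workaround: a FULL OUTER JOIN can be emulated by taking the LEFT JOIN in both directions and combining the two results with UNION. To keep it short, the two per-movie counts are written here as common table expressions (supported in MySQL 8.0 and later), and the sketch runs against SQLite through Python's sqlite3 module. The movies 'Ref', 'A', 'B', and 'D' are hypothetical data, not taken from the real view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genkeyword (title TEXT, year INT, genre TEXT, keyword TEXT)")

# Hypothetical data: (title, year) -> (genres, keywords)
movies = {
    ("Ref", 2010): (["Fantasy", "Adventure"], ["magic", "school"]),
    ("A", 2011): (["Fantasy", "Romance"], ["magic", "dance"]),  # matches on both
    ("B", 2012): (["Adventure", "Fantasy"], ["dragon"]),        # matches on genre only
    ("D", 2013): (["Horror"], ["magic"]),                       # matches on keyword only
}
conn.executemany(
    "INSERT INTO genkeyword VALUES (?, ?, ?, ?)",
    [(t, y, g, k) for (t, y), (gs, ks) in movies.items() for g in gs for k in ks],
)

# g: matching-genre count per movie, k: matching-keyword count per movie
CTES = """
WITH g AS (SELECT title, year, COUNT(DISTINCT genre) AS genre_freq
           FROM genkeyword
           WHERE genre IN (SELECT genre FROM genkeyword WHERE title = 'Ref')
             AND title <> 'Ref'
           GROUP BY title, year),
     k AS (SELECT title, year, COUNT(DISTINCT keyword) AS keyword_freq
           FROM genkeyword
           WHERE keyword IN (SELECT keyword FROM genkeyword WHERE title = 'Ref')
             AND title <> 'Ref'
           GROUP BY title, year)
"""
# Every movie with a genre match, plus its keyword count if it has one
GENRE_SIDE = """
SELECT g.title, g.year, g.genre_freq, COALESCE(k.keyword_freq, 0)
FROM g LEFT JOIN k ON k.title = g.title AND k.year = g.year
"""
# Every movie with a keyword match, plus its genre count if it has one
KEYWORD_SIDE = """
SELECT k.title, k.year, COALESCE(g.genre_freq, 0), k.keyword_freq
FROM k LEFT JOIN g ON g.title = k.title AND g.year = k.year
"""
left_only = conn.execute(CTES + GENRE_SIDE + "ORDER BY 1").fetchall()
full = conn.execute(CTES + GENRE_SIDE + "UNION" + KEYWORD_SIDE + "ORDER BY 1").fetchall()

print(left_only)  # [('A', 2011, 1, 1), ('B', 2012, 2, 0)]
print(full)       # [('A', 2011, 1, 1), ('B', 2012, 2, 0), ('D', 2013, 0, 1)]
```

UNION (rather than UNION ALL) removes the duplicate row for movies such as 'A' that appear on both sides, so only the keyword-only movie 'D' is added to what the single LEFT JOIN returns.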
You could reverse your logic and drive the genres and keywords from a subquery before totalling:
select title, year,
       sum(case when src = 'g' then 1 else 0 end) as genre,
       sum(case when src = 'k' then 1 else 0 end) as keyword
from
(
    select 'g' as src, g1.title, g1.year, g1.genre as item
    from genkeyword g
    join genkeyword g1 on g1.genre = g.genre
    where g.title = 'Harry Potter and the Deathly Hallows'
      and g1.title <> 'Harry Potter and the Deathly Hallows'
    union  -- union (not union all) collapses the repeated rows of the view
    -- the keyword branch selects the keyword, so each matching keyword counts once
    select 'k' as src, g1.title, g1.year, g1.keyword
    from genkeyword g
    join genkeyword g1 on g1.keyword = g.keyword
    where g.title = 'Harry Potter and the Deathly Hallows'
      and g1.title <> 'Harry Potter and the Deathly Hallows'
) s
group by title, year;
+--------------------------+------+-------+---------+
| title                    | year | genre | keyword |
+--------------------------+------+-------+---------+
| Cinderella               | 2015 |     2 |       2 |
| Enchanted                | 2007 |     1 |       3 |
| How to Train Your Dragon | 2010 |     2 |       0 |
| The Shape of Water       | 2017 |     2 |       1 |
+--------------------------+------+-------+---------+
4 rows in set (0.10 sec)
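Another option, not taken from the answers here, is to keep a single pass over the view and move the membership test inside the aggregate: COUNT(DISTINCT CASE WHEN ... END) skips the NULLs produced for non-matching values, so only genres and keywords shared with the reference movie are counted. The sketch below is run against SQLite through Python's sqlite3 module, and the movies 'Ref', 'A', 'B', and 'C' are hypothetical data. Whether a subquery is accepted inside an aggregate argument depends on the database engine, so treat this as an illustration rather than a portable recipe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genkeyword (title TEXT, year INT, genre TEXT, keyword TEXT)")

# Hypothetical data: (title, year) -> (genres, keywords)
movies = {
    ("Ref", 2010): (["Fantasy", "Adventure"], ["magic", "school"]),
    ("A", 2011): (["Fantasy", "Romance"], ["magic", "dance"]),  # 1 genre, 1 keyword shared
    ("B", 2012): (["Adventure", "Fantasy"], ["dragon"]),        # 2 genres, 0 keywords shared
    ("C", 2013): (["Horror"], ["ghost"]),                       # nothing shared
}
conn.executemany(
    "INSERT INTO genkeyword VALUES (?, ?, ?, ?)",
    [(t, y, g, k) for (t, y), (gs, ks) in movies.items() for g in gs for k in ks],
)

QUERY = """
SELECT * FROM (
    SELECT title, year,
           -- CASE yields NULL for non-matching values, which COUNT ignores
           COUNT(DISTINCT CASE WHEN genre IN (SELECT genre FROM genkeyword WHERE title = 'Ref')
                               THEN genre END) AS genre_freq,
           COUNT(DISTINCT CASE WHEN keyword IN (SELECT keyword FROM genkeyword WHERE title = 'Ref')
                               THEN keyword END) AS keyword_freq
    FROM genkeyword
    WHERE title <> 'Ref'
    GROUP BY title, year
) AS t
WHERE genre_freq + keyword_freq > 0   -- drop movies that share nothing
ORDER BY genre_freq DESC, keyword_freq DESC
"""
result = conn.execute(QUERY).fetchall()
print(result)  # [('B', 2012, 2, 0), ('A', 2011, 1, 1)]
```

Movie 'B' keeps its keyword count of 0 instead of being dropped, and movie 'C' is filtered out by the outer WHERE.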
Try this query.
I did not use any of the views you created, but you can use them if you like.
MySQL
SET @tmpMovieid = (SELECT DISTINCT id
                   FROM Movie
                   WHERE title = 'Harry Potter and the Deathly Hallows');

SELECT id,
       title,
       IFNULL(MAX(CASE WHEN coltype = 'genre' THEN col END), 0)   AS genre_freq,
       IFNULL(MAX(CASE WHEN coltype = 'Keyword' THEN col END), 0) AS keyword_freq
FROM (SELECT id,
             title,
             COUNT(g.genre) AS col,
             'genre'        AS colType
      FROM Movie m
           INNER JOIN Genre g ON m.id = g.Movie_id
      WHERE g.genre IN (SELECT DISTINCT genre
                        FROM Genre
                        WHERE Movie_id = @tmpMovieid)
      GROUP BY id, title
      UNION ALL
      SELECT id,
             title,
             COUNT(k.keyword) AS col,
             'Keyword'        AS colType
      FROM Movie m
           INNER JOIN Keyword k ON m.id = k.Movie_id
      WHERE k.keyword IN (SELECT DISTINCT keyword
                          FROM Keyword
                          WHERE Movie_id = @tmpMovieid)
      GROUP BY id, title) tmp
WHERE id <> @tmpMovieid
GROUP BY id, title
ORDER BY genre_freq DESC, keyword_freq DESC;
Online demo: https://www.db-fiddle.com/f/s1xLQ6r4Zwi5hVjCsdcwV8/0
SQL Server
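Note: since you used 'text' as the data type for some columns, some operations need casts. Then again, since you are using MySQL, you don't need this. I wrote it anyway to show you the difference, and for fun.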
DECLARE @tmpMovieID INT;
SET @tmpMovieID = (SELECT DISTINCT id
                   FROM movie
                   WHERE CAST(title AS NVARCHAR(MAX)) = 'Harry Potter and the Deathly Hallows');

-- After a FULL OUTER JOIN either side can be NULL, so take id/title from whichever side exists
SELECT COALESCE(tmpGenre.id, tmpKeyword.id)       AS id,
       COALESCE(tmpGenre.title, tmpKeyword.title) AS title,
       ISNULL(tmpGenre.genre, 0)                  AS genre,
       ISNULL(tmpKeyword.keyword, 0)              AS keyword
FROM (SELECT id,
             CAST(title AS NVARCHAR(MAX))          AS title,
             COUNT(CAST(g.genre AS NVARCHAR(MAX))) AS genre
      FROM movie m
           INNER JOIN genre g ON m.id = g.movie_id
      WHERE CAST(g.genre AS NVARCHAR(MAX)) IN (SELECT DISTINCT CAST(genre AS NVARCHAR(MAX))
                                               FROM genre
                                               WHERE movie_id = @tmpMovieID)
      GROUP BY id, CAST(title AS NVARCHAR(MAX))) tmpGenre
     FULL OUTER JOIN (SELECT id,
                             CAST(title AS NVARCHAR(MAX))            AS title,
                             COUNT(CAST(k.keyword AS NVARCHAR(MAX))) AS keyword
                      FROM movie m
                           INNER JOIN keyword k ON m.id = k.movie_id
                      WHERE CAST(k.keyword AS NVARCHAR(MAX)) IN
                            (SELECT DISTINCT CAST(keyword AS NVARCHAR(MAX))
                             FROM keyword
                             WHERE movie_id = @tmpMovieID)
                      GROUP BY id, CAST(title AS NVARCHAR(MAX))) tmpKeyword
     ON tmpGenre.id = tmpKeyword.id
WHERE COALESCE(tmpGenre.id, tmpKeyword.id) <> @tmpMovieID
ORDER BY ISNULL(tmpGenre.genre, 0) DESC, ISNULL(tmpKeyword.keyword, 0) DESC;
Online demo: https://dbfiddle.uk/?drbms=sqlserver_2017&fiddle=a1ee14e1e08b7e55eff2e8e94f89a287&hide=1
Result
+------+--------------------------+------------+--------------+
| id   | title                    | genre_freq | keyword_freq |
+------+--------------------------+------------+--------------+
| 407  | Cinderella               | 2          | 2            |
| 826  | The Shape of Water       | 2          | 1            |
| 523  | How to Train Your Dragon | 2          | 0            |
| 799  | Enchanted                | 1          | 3            |
+------+--------------------------+------------+--------------+
By the way, thank you for asking a clear question and for providing the table schema, sample data, and desired output.