[英]SQL : Calculating Percentage by joining a sub table to another
I've the above dataset, I need to report for each year the percentage of movies in that year with only female actors, and the total number of movies made that year.我有上面的数据集,我需要报告每年只有女演员的电影百分比,以及当年制作的电影总数。 For example, one answer will be: 1990 31.81 13522 meaning that in 1990 there were 13,522 movies, and 31.81%例如,一个答案是:1990 31.81 13522 意味着 1990 年有 13,522 部电影,31.81%
In order to get the moves with only female actors, wrote the following code:为了获得只有女演员的动作,编写了以下代码:
SELECT a.year as Year, COUNT(a.title) AS Female_Movies, a.title
FROM Movie a
WHERE a.title NOT IN (
SELECT b.title from Movie b
Inner Join M_cast c
on TRIM(c.MID) = b.MID
Inner Join Person d
on TRIM(c.PID) = d.PID
WHERE d.Gender='Male'
GROUP BY b.title
)
GROUP BY a.year,a.title
Order By a.year asc
The total movies in each year , can be found using the following:每年的电影总数,可以使用以下方法找到:
SELECT a.year, count(a.title) AS Total_Movies
FROM Movie a
GROUP BY a.year
ORDER BY COUNT(a.title) DESC
Combinig the both I wrote, the following code:结合我写的两者,代码如下:
SELECT z.year as Year, count(z.title) AS Total_Movies, count(x.title) as Female_movies, count(z.title)/ count(x.title) As percentage
FROM Movie z
Inner Join (
SELECT a.year as Year, COUNT(a.title) AS Female_Movies, a.title
FROM Movie a
WHERE a.title NOT IN (
SELECT b.title from Movie b
Inner Join M_cast c
on TRIM(c.MID) = b.MID
Inner Join Person d
on TRIM(c.PID) = d.PID
WHERE d.Gender='Male'
GROUP BY b.title
)
GROUP BY a.year,a.title
Order By a.year asc
)x
on x.year = z.year
GROUP BY z.year
ORDER BY COUNT(z.title) DESC
However, in th output I'm seeing the years with only female movies correctly, but the count of total movies is equal to female_movies so I'm getting 1%, I tried debugging the code, but not sure where this is going wrong.但是,在 th 输出中,我正确地看到了只有女性电影的年份,但电影总数等于女性电影,所以我得到了 1%,我尝试调试代码,但不确定这是哪里出了问题。 Any insights would be appreciated.任何见解将不胜感激。
You assume that your 'z' contains all movies but since you do an inner join on the female movies, they'll also only contain female movies.您假设您的“z”包含所有电影,但由于您对女性电影进行了内部连接,因此它们也将仅包含女性电影。 You could fix that with a 'left join'.你可以用“左连接”来解决这个问题。
Assuming your two queries are correct, you can join on them with a 'WITH' like this:假设您的两个查询是正确的,您可以像这样使用“WITH”加入它们:
WITH allmovies (year, cnt) as
(SELECT a.year, count(a.title) AS Total_Movies
FROM Movie a
GROUP BY a.year
ORDER BY COUNT(a.title) DESC)
,
femalemovies (year, cnt, title) as
(SELECT a.year as Year, COUNT(a.title) AS Female_Movies, a.title
FROM Movie a
WHERE a.title NOT IN (
SELECT b.title from Movie b
Inner Join M_cast c
on TRIM(c.MID) = b.MID
Inner Join Person d
on TRIM(c.PID) = d.PID
WHERE d.Gender='Male'
GROUP BY b.title
)
GROUP BY a.year,a.title
Order By a.year asc)
select * from allmovies left join femalemovies on allmovies.year = femalemovies.year
You can use conditional aggregation.您可以使用条件聚合。 In a CASE
expression check if no cast member that isn't female exists with a correlated subquery.在CASE
表达式中检查是否不存在具有相关子查询的非女性演员。 If the check is successful, return something not NULL
and count()
that to get the number of movies with only female cast members (or none at all).如果检查成功,则返回一些非NULL
和count()
以获取只有女性演员(或根本没有)的电影数量。
SELECT m.year,
count(*) count_all,
count(CASE
WHEN NOT EXISTS (SELECT *
FROM m_cast c
INNER JOIN person p
ON p.pid = c.pid
WHERE c.mid = m.mid
AND p.gender <> 'Female') THEN
1
END)
/
count(*)
*
100 percentage_only_female
FROM movie m
GROUP BY m.year;
Since in MySQL Boolean expressions in numerical context evaluate to 1
if true and to 0
otherwise, you could also use a sum()
over the NOT EXISTS
.由于在数值上下文中的 MySQL 布尔表达式中,如果为真,则为1
,否则为0
,因此您还可以在NOT EXISTS
使用sum()
。
SELECT m.year,
count(*) count_all,
sum(NOT EXISTS (SELECT *
FROM m_cast c
INNER JOIN person p
ON p.pid = c.pid
WHERE c.mid = m.mid
AND p.gender <> 'Female'))
/
count(*)
*
100 percentage_only_female
FROM movie m
GROUP BY m.year;
That however isn't compatible with most other DBMS in contrast to the first one.然而,与第一个相比,这与大多数其他 DBMS 不兼容。
I would use two levels of aggregation:我会使用两个级别的聚合:
SELECT m.MID, m.title, m.year,
COUNT(*) as num_actors,
SUM(gender = 'Female') as num_female_actors
FROM Movie m JOIN
M_cast c
ON c.MID = b.MID JOIN
Person p
ON p.PID = c.PID
GROUP BY m.MID, m.title, m.year;
Then a simple outer aggregation:然后是一个简单的外部聚合:
SELECT year,
COUNT(*) as num_movies,
SUM( num_actors = num_female_actors ) as num_female_only,
AVG( num_actors = num_female_actors ) as female_only_ratio
FROM (SELECT m.MID, m.title, m.year,
COUNT(*) as num_actors,
SUM(gender = 'Female') as num_female_actors
FROM Movie m JOIN
M_cast c
ON c.MID = b.MID JOIN
Person p
ON p.PID = c.PID
GROUP BY m.MID, m.title, m.year
) m
GROUP BY year;
Notes:笔记:
JOIN
ing unless necessary.除非必要,否则不要在过滤或JOIN
时使用函数。 I removed the TRIM()
.我删除了TRIM()
。 If you need it use it.如果您需要它,请使用它。 Or better yet, fix the data.或者更好的是,修复数据。 SELECT m.Year,COUNT(m.Year),x.t,
(COUNT(m.Year)*1.0/x.t*1.0)*100
FROM Movie m LEFT JOIN
(SELECT Year,COUNT(Year) AS t FROM Movie GROUP BY year) AS x
ON m.Year=x.Year
WHERE m.MID IN
(SELECT MID FROM M_Cast WHERE PID in
(SELECT PID FROM Person WHERE Gender='Female')
AND m.MID NOT IN
(SELECT MID FROM M_Cast WHERE PID in
(SELECT PID FROM Person WHERE Gender='Male'))) GROUP BY m.year
Check if this is what you're looking for.检查这是否是您要查找的内容。
select movie.year, count(movie.mid) as Year_Wise_Movie_Count,cast(x.Female_Cast_Only as real) / count(movie.mid) As Percentage_of_Female_Cast from movie
inner join
(
SELECT Movie.year as Year, COUNT(Movie.mid) AS Female_Cast_Only
FROM Movie
WHERE Movie.MID NOT IN (
SELECT Movie.MID from Movie
Inner Join M_cast
on TRIM(M_cast.MID) = Movie.MID
Inner Join Person
on TRIM(M_cast.PID) = Person.PID
WHERE Person.Gender!='Female'
GROUP BY Movie.MID
)
GROUP BY Movie.year
Order By Movie.year asc
) x
on x.year = movie.year
GROUP BY movie.year
ORDER BY movie.year
Output:输出:
year Year_Wise_Movie_Count Percentage_of_Female_Cast
---- --------------------- -------------------------
1939 2 0.5
1999 66 0.0151515151515152
2000 64 0.015625
2018 104 0.00961538461538462
Note: This was executed in SQLIte3注意:这是在 SQLIte3 中执行的
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.