
SQL: Calculating Percentage by joining a sub table to another


I have a movie dataset (the queries below use its Movie, M_cast, and Person tables). For each year, I need to report the percentage of movies in that year with only female actors, and the total number of movies made that year. For example, one answer would be: 1990 31.81 13522, meaning that in 1990 there were 13,522 movies, and 31.81% of them had only female actors.

To get the movies with only female actors, I wrote the following code:

SELECT a.year as Year, COUNT(a.title) AS Female_Movies, a.title 
FROM Movie a
WHERE a.title NOT IN (

  SELECT b.title from Movie b
  Inner Join M_cast c
  on TRIM(c.MID) = b.MID
  Inner Join Person d
  on TRIM(c.PID) = d.PID
  WHERE d.Gender='Male'
  GROUP BY b.title
  )
GROUP BY a.year,a.title
Order By a.year asc

The total number of movies in each year can be found using the following:

SELECT a.year, count(a.title) AS Total_Movies
FROM Movie a
GROUP BY a.year
ORDER BY COUNT(a.title) DESC

Combining the two, I wrote the following code:

SELECT z.year as Year, count(z.title) AS Total_Movies, count(x.title) as Female_movies, count(z.title)/ count(x.title) As percentage
FROM Movie z
Inner Join (
SELECT a.year as Year, COUNT(a.title) AS Female_Movies, a.title 
FROM Movie a
WHERE a.title NOT IN (

  SELECT b.title from Movie b
  Inner Join M_cast c
  on TRIM(c.MID) = b.MID
  Inner Join Person d
  on TRIM(c.PID) = d.PID
  WHERE d.Gender='Male'
  GROUP BY b.title
  )
GROUP BY a.year,a.title
Order By a.year asc
)x
on x.year = z.year 
GROUP BY z.year
ORDER BY COUNT(z.title) DESC

However, in the output the years with female-only movies appear correctly, but the total movie count equals Female_movies, so I'm getting 1%. I tried debugging the code, but I'm not sure where it goes wrong. Any insights would be appreciated.

You assume that 'z' contains all movies, but because you inner join it to the female-only movies, the result keeps only the rows that match a female-only movie. You could fix that with a LEFT JOIN.

Assuming your two queries are correct, you can combine them with a WITH clause like this:

WITH allmovies (year, cnt) AS
-- total number of movies per year
(SELECT a.year, COUNT(a.title)
 FROM Movie a
 GROUP BY a.year),

femalemovies (year, cnt) AS
-- movies per year with no male cast member; grouped by year only,
-- so the join below returns one row per year
(SELECT a.year, COUNT(a.title)
 FROM Movie a
 WHERE a.title NOT IN (
   SELECT b.title FROM Movie b
   INNER JOIN M_cast c
   ON TRIM(c.MID) = b.MID
   INNER JOIN Person d
   ON TRIM(c.PID) = d.PID
   WHERE d.Gender = 'Male'
   )
 GROUP BY a.year)

-- LEFT JOIN keeps years that have no female-only movie at all
SELECT allmovies.year,
       allmovies.cnt AS Total_Movies,
       COALESCE(femalemovies.cnt, 0) AS Female_Movies
FROM allmovies
LEFT JOIN femalemovies ON allmovies.year = femalemovies.year
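
To see why the combined query reports equal counts, here is a minimal sketch that runs both shapes on toy data through Python's built-in sqlite3 module. The column layout (Movie(MID, title, year), M_cast(MID, PID), Person(PID, Gender)) and every row are assumptions made up for illustration; they are not the asker's data.

```python
import sqlite3

# Hypothetical toy data; the schema is assumed from the queries in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie  (MID TEXT, title TEXT, year INTEGER);
CREATE TABLE Person (PID TEXT, Gender TEXT);
CREATE TABLE M_cast (MID TEXT, PID TEXT);
INSERT INTO Movie VALUES ('m1','A',1990),('m2','B',1990),('m3','C',1990),
                         ('m4','D',1991),('m5','E',1991),('m6','F',1992);
INSERT INTO Person VALUES ('p1','Female'),('p2','Female'),('p3','Male');
INSERT INTO M_cast VALUES ('m1','p1'),('m1','p2'),('m2','p1'),('m2','p3'),
                          ('m3','p3'),('m4','p2'),('m5','p3'),('m6','p3');
""")

# One row per movie that has no male cast member.
FEMALE_ONLY = """
SELECT a.year, a.title FROM Movie a
WHERE a.title NOT IN (
  SELECT b.title FROM Movie b
  JOIN M_cast c ON c.MID = b.MID
  JOIN Person d ON d.PID = c.PID
  WHERE d.Gender = 'Male')
"""

# Shape of the question's query: every joined row carries both a z.title and
# an x.title, so the two COUNTs are always equal, and years without a
# female-only movie disappear.
joined = conn.execute(f"""
SELECT z.year, COUNT(z.title), COUNT(x.title)
FROM Movie z JOIN ({FEMALE_ONLY}) x ON x.year = z.year
GROUP BY z.year ORDER BY z.year
""").fetchall()

# Aggregate each side per year first, then LEFT JOIN the two results.
fixed = conn.execute(f"""
WITH allmovies AS (SELECT year, COUNT(*) AS cnt FROM Movie GROUP BY year),
     femalemovies AS (SELECT s.year, COUNT(*) AS cnt
                      FROM ({FEMALE_ONLY}) s GROUP BY s.year)
SELECT t.year, t.cnt, COALESCE(f.cnt, 0)
FROM allmovies t LEFT JOIN femalemovies f ON f.year = t.year
ORDER BY t.year
""").fetchall()

print(joined)
print(fixed)
```

On this toy data the first query returns the same number in both count columns and drops 1992, while the second keeps all three years with separate totals.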

You can use conditional aggregation. In a CASE expression, use a correlated subquery to check that no cast member who isn't female exists. If the check succeeds, return something that is not NULL and count() that to get the number of movies with only female cast members (or with no cast members at all).

SELECT m.year,
       count(*) count_all,
       count(CASE
               WHEN NOT EXISTS (SELECT *
                                       FROM m_cast c
                                            INNER JOIN person p
                                                       ON p.pid = c.pid
                                       WHERE c.mid = m.mid
                                             AND p.gender <> 'Female') THEN
                 1
              END)
       * 100.0  -- multiply first, so a DBMS with integer division does not truncate the ratio to 0
       /
       count(*) percentage_only_female
       FROM movie m
       GROUP BY m.year;

Since in MySQL a Boolean expression in a numerical context evaluates to 1 if true and to 0 otherwise, you could also use a sum() over the NOT EXISTS.

SELECT m.year,
       count(*) count_all,
       sum(NOT EXISTS (SELECT *
                              FROM m_cast c
                                   INNER JOIN person p
                                              ON p.pid = c.pid
                              WHERE c.mid = m.mid
                                    AND p.gender <> 'Female'))
       /
       count(*)
       *
       100 percentage_only_female
       FROM movie m
       GROUP BY m.year;

That, however, isn't compatible with most other DBMS, in contrast to the first one.
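
As a rough check of the conditional-aggregation idea, the sketch below runs the CASE variant on SQLite through Python's sqlite3 module. The schema and all rows are hypothetical; movie 'm7' deliberately has no cast, to show that such a movie is counted as female-only, as noted above.

```python
import sqlite3

# Hypothetical toy data; table and column names are assumed from the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie  (mid TEXT, title TEXT, year INTEGER);
CREATE TABLE person (pid TEXT, gender TEXT);
CREATE TABLE m_cast (mid TEXT, pid TEXT);
INSERT INTO movie VALUES ('m1','A',1990),('m2','B',1990),('m3','C',1990),
                         ('m4','D',1991),('m5','E',1991),('m6','F',1992),
                         ('m7','G',1992);  -- m7 has no cast rows at all
INSERT INTO person VALUES ('p1','Female'),('p2','Female'),('p3','Male');
INSERT INTO m_cast VALUES ('m1','p1'),('m1','p2'),('m2','p1'),('m2','p3'),
                          ('m3','p3'),('m4','p2'),('m5','p3'),('m6','p3');
""")

rows = conn.execute("""
SELECT year, count_all, count_only_female,
       100.0 * count_only_female / count_all AS percentage_only_female
FROM (SELECT m.year,
             COUNT(*) AS count_all,
             -- CASE yields NULL when a non-female cast member exists,
             -- and COUNT() skips NULLs
             COUNT(CASE WHEN NOT EXISTS (SELECT 1
                                         FROM m_cast c
                                         JOIN person p ON p.pid = c.pid
                                         WHERE c.mid = m.mid
                                           AND p.gender <> 'Female')
                        THEN 1 END) AS count_only_female
      FROM movie m
      GROUP BY m.year) per_year
ORDER BY year
""").fetchall()

for row in rows:
    print(row)
```

Computing the two counts in an inner query and the percentage outside avoids repeating the CASE expression; that split is a readability choice, not something the answer requires.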

I would use two levels of aggregation. First, one row per movie:

SELECT m.MID, m.title, m.year,
       COUNT(*) as num_actors,
       SUM(gender = 'Female') as num_female_actors
FROM Movie m JOIN
     M_cast c
     ON c.MID = m.MID JOIN
     Person p
     ON p.PID = c.PID
GROUP BY m.MID, m.title, m.year;

Then a simple outer aggregation:

SELECT year,
       COUNT(*) as num_movies,
       SUM( num_actors = num_female_actors ) as num_female_only,
       AVG( num_actors = num_female_actors ) as female_only_ratio
FROM (SELECT m.MID, m.title, m.year,
             COUNT(*) as num_actors,
             SUM(gender = 'Female') as num_female_actors
      FROM Movie m JOIN
           M_cast c
           ON c.MID = m.MID JOIN
           Person p
           ON p.PID = c.PID
      GROUP BY m.MID, m.title, m.year
     ) m
GROUP BY year;
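
The two-level aggregation can be tried on toy data with the sketch below, run on SQLite through Python's sqlite3 module. The schema and rows are hypothetical, and the Boolean sums are swapped for CASE expressions so the query does not depend on a DBMS treating comparisons as 0/1. Because the inner query uses inner joins, a movie without any cast rows ('m7' here) drops out of the totals.

```python
import sqlite3

# Hypothetical toy data; table and column names are assumed from the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie  (MID TEXT, title TEXT, year INTEGER);
CREATE TABLE Person (PID TEXT, Gender TEXT);
CREATE TABLE M_cast (MID TEXT, PID TEXT);
INSERT INTO Movie VALUES ('m1','A',1990),('m2','B',1990),('m3','C',1990),
                         ('m4','D',1991),('m5','E',1991),('m6','F',1992),
                         ('m7','G',1992);  -- m7 has no cast rows at all
INSERT INTO Person VALUES ('p1','Female'),('p2','Female'),('p3','Male');
INSERT INTO M_cast VALUES ('m1','p1'),('m1','p2'),('m2','p1'),('m2','p3'),
                          ('m3','p3'),('m4','p2'),('m5','p3'),('m6','p3');
""")

rows = conn.execute("""
SELECT year,
       COUNT(*) AS num_movies,
       SUM(CASE WHEN num_actors = num_female_actors THEN 1 ELSE 0 END)
           AS num_female_only,
       AVG(CASE WHEN num_actors = num_female_actors THEN 1.0 ELSE 0.0 END)
           AS female_only_ratio
FROM (-- level 1: one row per movie with its cast size and female cast size
      SELECT m.MID, m.title, m.year,
             COUNT(*) AS num_actors,
             SUM(CASE WHEN p.Gender = 'Female' THEN 1 ELSE 0 END)
                 AS num_female_actors
      FROM Movie m
      JOIN M_cast c ON c.MID = m.MID
      JOIN Person p ON p.PID = c.PID
      GROUP BY m.MID, m.title, m.year) per_movie
GROUP BY year   -- level 2: one row per year
ORDER BY year
""").fetchall()

for row in rows:
    print(row)
```

Whether cast-less movies should count toward the yearly total is a decision for the asker; switching the inner joins to LEFT JOINs would keep them.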

Notes:

  • Use meaningful table aliases, rather than arbitrary letters. You'll note that the table aliases are abbreviations of the table names.
  • Do not use functions when filtering or JOINing unless necessary. I removed the TRIM(). If you need it, use it. Or better yet, fix the data.

SELECT m.Year, COUNT(m.Year), x.t,
       (COUNT(m.Year) * 1.0 / x.t * 1.0) * 100
FROM Movie m
LEFT JOIN (SELECT Year, COUNT(Year) AS t FROM Movie GROUP BY Year) AS x
       ON m.Year = x.Year
WHERE m.MID IN
      (SELECT MID FROM M_Cast WHERE PID IN
        (SELECT PID FROM Person WHERE Gender = 'Female')
       AND m.MID NOT IN
        (SELECT MID FROM M_Cast WHERE PID IN
          (SELECT PID FROM Person WHERE Gender = 'Male')))
GROUP BY m.Year

Check if this is what you're looking for.

select movie.year, count(movie.mid) as Year_Wise_Movie_Count,cast(x.Female_Cast_Only as real) / count(movie.mid) As Percentage_of_Female_Cast from movie
inner join
(
SELECT Movie.year as Year, COUNT(Movie.mid) AS Female_Cast_Only
FROM Movie
WHERE Movie.MID NOT IN (
  SELECT Movie.MID from Movie
  Inner Join M_cast
  on TRIM(M_cast.MID) = Movie.MID
  Inner Join Person
  on TRIM(M_cast.PID) = Person.PID
  WHERE Person.Gender!='Female'
  GROUP BY Movie.MID
  )
GROUP BY Movie.year
Order By Movie.year asc
) x
on x.year = movie.year 
GROUP BY movie.year
ORDER BY movie.year

Output:

year  Year_Wise_Movie_Count  Percentage_of_Female_Cast
----  ---------------------  -------------------------
1939  2                      0.5
1999  66                     0.0151515151515152
2000  64                     0.015625
2018  104                    0.00961538461538462

Note: This was executed in SQLite3.
