
SQL calculating average on an occurrence within a set

I have a dataset for a school project that consists of a set of movies and the genres they belong to. A movie can belong to more than one genre, and each movie-genre pairing is one row in the table (id is the primary key). For example, a small sample:

1  Taken            Action
2  Sherlock Holmes  Mystery
3  Sherlock Holmes  Action
4  Predator         Horror
5  Predator         Action
6  Omen             Horror
7  Pink Panther     Comedy

For a given genre, how would I find the average number of genres that a movie in that genre belongs to? For example, a movie in the Horror genre belongs to 1.5 genres on average.
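To pin down what is being asked, here is a minimal sketch in plain Python that works out the expected figures by hand from the sample rows. The function name mean_genres_per_movie and the tuple layout are assumptions made for illustration only; they are not part of the original question.

```python
from collections import defaultdict

# Sample rows from the question: (id, movie, genre).
rows = [
    (1, "Taken", "Action"),
    (2, "Sherlock Holmes", "Mystery"),
    (3, "Sherlock Holmes", "Action"),
    (4, "Predator", "Horror"),
    (5, "Predator", "Action"),
    (6, "Omen", "Horror"),
    (7, "Pink Panther", "Comedy"),
]

def mean_genres_per_movie(rows):
    """For each genre, average the genre counts of the movies in that genre."""
    genres_of = defaultdict(set)   # movie -> set of its genres
    movies_in = defaultdict(set)   # genre -> set of its movies
    for _id, movie, genre in rows:
        genres_of[movie].add(genre)
        movies_in[genre].add(movie)
    return {
        genre: sum(len(genres_of[m]) for m in movies) / len(movies)
        for genre, movies in movies_in.items()
    }

result = mean_genres_per_movie(rows)
for genre in sorted(result):
    print(genre, round(result[genre], 4))
```

For Horror, Predator is in two genres and Omen in one, so the mean is (2 + 1) / 2 = 1.5, matching the figure quoted above.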

Generally I am used to calculating averages over items like salaries or other numbers, but this is slightly different.

Please try the following...

SELECT genreListGenre,
       AVG( movieGenreCount ) AS genreMean
FROM ( SELECT genreList.genre AS genreListGenre,
              genreList.movieID AS genreListMovie,
              COUNT( moreGenres.genre ) AS movieGenreCount
       FROM movies AS genreList
       JOIN movies AS moreGenres ON genreList.movieName = moreGenres.movieName
       GROUP BY genreList.genre,
                genreList.movieID
     ) AS genreListGenerator
GROUP BY genreListGenre
ORDER BY genreListGenre;

The inner query joins one instance of movies to another instance of the same table. So that the fields of each can be referred to without confusion, each instance is given an alias, genreList and moreGenres. The purpose of the join is to build a list comprising each genre, each movieID associated with that genre, and each genre associated with that movieID's corresponding movieName. The first two fields come from one instance of movies; the third comes from the second instance and depends on the value of movieName in the first instance, which is why the join is on the shared value of movieName.
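To make the intermediate list concrete, the following sketch runs the self-join on its own, without the grouping. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in for MySQL, and loads the sample rows from the question; the variable names are illustrative.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (movieID INT, movieName VARCHAR(50), genre VARCHAR(20))"
)
conn.executemany(
    "INSERT INTO movies (movieID, movieName, genre) VALUES (?, ?, ?)",
    [
        (1, "Taken", "Action"),
        (2, "Sherlock Holmes", "Mystery"),
        (3, "Sherlock Holmes", "Action"),
        (4, "Predator", "Horror"),
        (5, "Predator", "Action"),
        (6, "Omen", "Horror"),
        (7, "Pink Panther", "Comedy"),
    ],
)

# The join alone: one output row per (genre row, genre of the same movie).
joined = conn.execute(
    """
    SELECT genreList.genre, genreList.movieID, moreGenres.genre
    FROM movies AS genreList
    JOIN movies AS moreGenres ON genreList.movieName = moreGenres.movieName
    ORDER BY genreList.genre, genreList.movieID, moreGenres.genre
    """
).fetchall()

for row in joined:
    print(row)
```

In this list the Horror row for Predator (movieID 4) appears twice, once for each of Predator's genres, while the Horror row for Omen (movieID 6) appears once. Those per-row multiplicities are what the subsequent grouping counts.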

Once the list is formed, it is grouped by genreList.genre and genreList.movieID, and a count of the related genres (moreGenres.genre) is made for each group.

The outer query takes the fields returned by the inner query, groups them by genre, and for each genre calculates the mean of the per-movie genre counts, that is, the average number of genres that each movie in that genre belongs to.

If you have any questions or comments, then please feel free to post a Comment accordingly.

Addendum

This code was tested against an instance of movies created (in MySQL) using the following script...

CREATE TABLE movies
(
    movieID     INT,
    movieName   VARCHAR( 50 ),
    genre       VARCHAR( 20 )
);
INSERT INTO movies ( movieID,
                     movieName,
                     genre )
VALUES ( 1, 'Taken',           'Action'  ),
       ( 2, 'Sherlock Holmes', 'Mystery' ),
       ( 3, 'Sherlock Holmes', 'Action'  ),
       ( 4, 'Predator',        'Horror'  ),
       ( 5, 'Predator',        'Action'  ),
       ( 6, 'Omen',            'Horror'  ),
       ( 7, 'Pink Panther',    'Comedy'  );
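
For readers without a MySQL instance to hand, here is a rough way to repeat the check. It assumes SQLite (via Python's standard sqlite3 module) behaves the same as MySQL for this particular query, which has not been verified against the original setup; the Python variable names are illustrative.

```python
import sqlite3

# Assumption: SQLite in memory as a substitute for the MySQL test instance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (movieID INT, movieName VARCHAR(50), genre VARCHAR(20))"
)
conn.executemany(
    "INSERT INTO movies (movieID, movieName, genre) VALUES (?, ?, ?)",
    [
        (1, "Taken", "Action"),
        (2, "Sherlock Holmes", "Mystery"),
        (3, "Sherlock Holmes", "Action"),
        (4, "Predator", "Horror"),
        (5, "Predator", "Action"),
        (6, "Omen", "Horror"),
        (7, "Pink Panther", "Comedy"),
    ],
)

# The query from the answer, unchanged apart from layout.
genre_means = conn.execute(
    """
    SELECT genreListGenre,
           AVG( movieGenreCount ) AS genreMean
    FROM ( SELECT genreList.genre AS genreListGenre,
                  genreList.movieID AS genreListMovie,
                  COUNT( moreGenres.genre ) AS movieGenreCount
           FROM movies AS genreList
           JOIN movies AS moreGenres ON genreList.movieName = moreGenres.movieName
           GROUP BY genreList.genre,
                    genreList.movieID
         ) AS genreListGenerator
    GROUP BY genreListGenre
    ORDER BY genreListGenre
    """
).fetchall()

for genre, mean in genre_means:
    print(genre, round(mean, 4))
```

With the sample data, Horror comes out at 1.5, as the question expects, and the other genres follow the same per-movie counting.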
