SQL calculating average on an occurrence within a set
I have a dataset for a school project that consists of a set of movies and the genres they belong to. A movie can belong to more than one genre, and each movie/genre pairing is one row in the table (id is the primary key).
For example, a small sample:
1 Taken Action
2 Sherlock Holmes Mystery
3 Sherlock Holmes Action
4 Predator Horror
5 Predator Action
6 Omen Horror
7 Pink Panther Comedy
For a given genre, how would I find the average number of genres that a movie in that genre belongs to? For example, on average a movie in the Horror genre is in 1.5 genres.
I am generally used to computing averages over values such as salaries or other numbers, but this is slightly different.
Please try the following...
SELECT genreListGenre,
       AVG( movieGenreCount ) AS genreMean
FROM ( SELECT genreList.genre AS genreListGenre,
              genreList.movieID AS genreListMovie,
              COUNT( moreGenres.genre ) AS movieGenreCount
       FROM movies AS genreList
       JOIN movies AS moreGenres ON genreList.movieName = moreGenres.movieName
       GROUP BY genreList.genre,
                genreList.movieID
     ) AS genreListGenerator
GROUP BY genreListGenre
ORDER BY genreListGenre;
The inner query joins one instance of movies to another instance of the same table. So that the fields from each may still be referred to without confusion, each instance is given an alias, genreList and moreGenres. The purpose of the join is to develop a list comprising each genre, each movieID associated with that genre, and each genre associated with that movieID's corresponding movieName. The first two fields can be determined from one instance of movies, and the third can be drawn from the second instance and depends on the value of movieName from the first instance, hence the join on the shared value of movieName.
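To make the self-join concrete, here is a rough sketch that runs just the join, before any grouping, and lists the rows it produces for the Horror genre. It assumes Python's built-in sqlite3 module with an in-memory SQLite database standing in for MySQL; the WHERE filter and the ORDER BY are additions for readability and are not part of the query above.

```python
import sqlite3

# Sample rows from the question. SQLite is used here as a stand-in for MySQL (an assumption).
ROWS = [
    (1, "Taken", "Action"),
    (2, "Sherlock Holmes", "Mystery"),
    (3, "Sherlock Holmes", "Action"),
    (4, "Predator", "Horror"),
    (5, "Predator", "Action"),
    (6, "Omen", "Horror"),
    (7, "Pink Panther", "Comedy"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (movieID INT, movieName VARCHAR(50), genre VARCHAR(20))"
)
conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", ROWS)

# The self-join alone: each Horror row is paired with every genre row
# that shares its movieName.
join_rows = conn.execute(
    """
    SELECT genreList.genre, genreList.movieID, moreGenres.genre
    FROM movies AS genreList
    JOIN movies AS moreGenres ON genreList.movieName = moreGenres.movieName
    WHERE genreList.genre = 'Horror'
    ORDER BY genreList.movieID, moreGenres.genre
    """
).fetchall()

for row in join_rows:
    print(row)
# ('Horror', 4, 'Action')
# ('Horror', 4, 'Horror')
# ('Horror', 6, 'Horror')
```

Predator (movieID 4) contributes two rows because it appears under two genres, while Omen (movieID 6) contributes only one.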
Once the list is formed, it is grouped by genreList.genre and genreList.movieID, and a count of the related genres (moreGenres.genre) is made for each group.
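As an illustration of the grouping step, the inner query can be run on its own to see the per-movie genre counts it hands to the outer query. This is a sketch under the assumption that an in-memory SQLite database (via Python's sqlite3 module) behaves like MySQL for this query; the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Sample rows from the question. SQLite is used here as a stand-in for MySQL (an assumption).
ROWS = [
    (1, "Taken", "Action"),
    (2, "Sherlock Holmes", "Mystery"),
    (3, "Sherlock Holmes", "Action"),
    (4, "Predator", "Horror"),
    (5, "Predator", "Action"),
    (6, "Omen", "Horror"),
    (7, "Pink Panther", "Comedy"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (movieID INT, movieName VARCHAR(50), genre VARCHAR(20))"
)
conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", ROWS)

# The inner query: one row per (genre, movieID) with the number of genres
# recorded for that movie's name.
inner_rows = conn.execute(
    """
    SELECT genreList.genre AS genreListGenre,
           genreList.movieID AS genreListMovie,
           COUNT( moreGenres.genre ) AS movieGenreCount
    FROM movies AS genreList
    JOIN movies AS moreGenres ON genreList.movieName = moreGenres.movieName
    GROUP BY genreList.genre, genreList.movieID
    ORDER BY genreList.genre, genreList.movieID
    """
).fetchall()

for row in inner_rows:
    print(row)
# ('Action', 1, 1)
# ('Action', 3, 2)
# ('Action', 5, 2)
# ('Comedy', 7, 1)
# ('Horror', 4, 2)
# ('Horror', 6, 1)
# ('Mystery', 2, 2)
```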
The outer query takes the fields returned by the inner query, groups them by genre, and for each genre calculates the mean of the per-movie genre counts, that is, the average number of genres associated with each movie in that genre.
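For anyone who wants to check the result without a MySQL server, the whole query can be sketched against an in-memory SQLite database using Python's sqlite3 module. SQLite is an assumption here (the answer itself was tested in MySQL), and MySQL may format the averages differently, but the values should agree.

```python
import sqlite3

# Sample rows from the question. SQLite is used here as a stand-in for MySQL (an assumption).
ROWS = [
    (1, "Taken", "Action"),
    (2, "Sherlock Holmes", "Mystery"),
    (3, "Sherlock Holmes", "Action"),
    (4, "Predator", "Horror"),
    (5, "Predator", "Action"),
    (6, "Omen", "Horror"),
    (7, "Pink Panther", "Comedy"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (movieID INT, movieName VARCHAR(50), genre VARCHAR(20))"
)
conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", ROWS)

# The full query from the answer: average the per-movie genre counts by genre.
result = conn.execute(
    """
    SELECT genreListGenre,
           AVG( movieGenreCount ) AS genreMean
    FROM ( SELECT genreList.genre AS genreListGenre,
                  genreList.movieID AS genreListMovie,
                  COUNT( moreGenres.genre ) AS movieGenreCount
           FROM movies AS genreList
           JOIN movies AS moreGenres ON genreList.movieName = moreGenres.movieName
           GROUP BY genreList.genre, genreList.movieID
         ) AS genreListGenerator
    GROUP BY genreListGenre
    ORDER BY genreListGenre
    """
).fetchall()

means = dict(result)
for genre, mean in result:
    print(genre, round(mean, 4))
# Action 1.6667
# Comedy 1.0
# Horror 1.5
# Mystery 2.0
```

The Horror figure of 1.5 matches the example given in the question: Predator is in two genres and Omen in one.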
If you have any questions or comments, then please feel free to post a Comment accordingly.
Addendum
This code was tested against an instance of movies created (in MySQL) using the following script...
CREATE TABLE movies
(
movieID INT,
movieName VARCHAR( 50 ),
genre VARCHAR( 20 )
);
INSERT INTO movies ( movieID,
movieName,
genre )
VALUES ( 1, 'Taken', 'Action' ),
( 2, 'Sherlock Holmes', 'Mystery' ),
( 3, 'Sherlock Holmes', 'Action' ),
( 4, 'Predator', 'Horror' ),
( 5, 'Predator', 'Action' ),
( 6, 'Omen', 'Horror' ),
( 7, 'Pink Panther', 'Comedy' );