R join two data frames, group by column and calculate mean
I've Googled around, but I can't seem to find a solution for the problem I'm having. I have two data frames. One holds movies by ID and contains ratings for them:
> summary(ratings)
    movieId        mean_rating      rating_count
 Min.   :     1   Min.   : 1.000   Min.   :    1.0
 1st Qu.:  6796   1st Qu.: 5.600   1st Qu.:    3.0
 Median : 65880   Median : 6.471   Median :   18.0
 Mean   : 58790   Mean   : 6.266   Mean   :  747.8
 3rd Qu.: 99110   3rd Qu.: 7.130   3rd Qu.:  205.0
 Max.   :131262   Max.   :10.000   Max.   :67310.0
      rn
 Length:26744
 Class :character
 Mode  :character
The other one is a collection of user-defined tags that have been added to these movies. It also has a column called movieId that corresponds to movieId in the first data frame.
> summary(tags)
     userId          movieId           tag
 Min.   :    18   Min.   :     1   Length:465564
 1st Qu.: 28780   1st Qu.:  2571   Class :character
 Median : 70201   Median :  7373   Mode  :character
 Mean   : 68712   Mean   : 32628
 3rd Qu.:107322   3rd Qu.: 62235
 Max.   :138472   Max.   :131258
   timestamp              rn
 Min.   :1135429210   Length:465564
 1st Qu.:1245007262   Class :character
 Median :1302291181   Mode  :character
 Mean   :1298711076
 3rd Qu.:1366217861
 Max.   :1427771352
What I want to do is get the mean movie rating for each of the tags. Basically, the equivalent of this SQL query:
SELECT t.tag, AVG(r.mean_rating) FROM movielens_tags t RIGHT JOIN movielens_ratings r ON t.movieId = r.movieId GROUP BY t.tag;
I just need 2 columns in the output:
tag mean_rating
sci_fi 6.23
bollywood 7.45
action 5.75
However, this SQL query never finishes. That's why I want to do it in R. Can anyone help me with how I should approach this?
Here is the dplyr translation of your SQL code (the dplyr package must be installed):
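library(dplyr)
# movielens_tags and movielens_ratings are the table names from the SQL query;
# in the R session shown in the question the data frames are called tags and ratings.
movielens_tags %>%
  # RIGHT JOIN: keep every row of the ratings table
  right_join(movielens_ratings, by = "movieId") %>%
  # GROUP BY t.tag
  group_by(tag) %>%
  # AVG(r.mean_rating)
  summarise(mean_rating = mean(mean_rating))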
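If you want to check the pipeline on something small before running it on the full data, here is a minimal sketch on made-up data. The names ratings_toy, tags_toy and tag_means and all the values are hypothetical, not taken from your data. Note that a right join keeps movies that have no tag at all, which end up in a group where tag is NA. The extra filter() step below drops that group; leave it out if you want the exact behaviour of the SQL query.

```r
library(dplyr)

# Hypothetical toy data standing in for the real tables (values are made up)
ratings_toy <- data.frame(
  movieId = c(1, 2, 3, 4),
  mean_rating = c(6, 8, 5, 7)
)
tags_toy <- data.frame(
  movieId = c(1, 2, 2, 3),
  tag = c("sci_fi", "sci_fi", "action", "action"),
  stringsAsFactors = FALSE
)

tag_means <- tags_toy %>%
  # keep every movie from the ratings table, tagged or not
  right_join(ratings_toy, by = "movieId") %>%
  # movie 4 has no tag, so its tag is NA after the join; drop it (an assumption
  # about what you want, the SQL query itself would keep a NULL group)
  filter(!is.na(tag)) %>%
  group_by(tag) %>%
  summarise(mean_rating = mean(mean_rating))

print(tag_means)
```

With this toy data, sci_fi averages the ratings of movies 1 and 2 (7) and action averages movies 2 and 3 (6.5), so the result has exactly the two columns asked for.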