Calculate mean of dataset in R that satisfies two conditions
I want to find the mean of a dataset in R, grouped by year and book_id.
I tried using tapply, but I can only put one index condition in that function.
In SQL it would look like:
Select year, book_id, avg(users_read)
From
Where year = 2018
Group by year, book_id
So my final table would look like:
year | book_id | avg(users_read)
2018 | 1       | 12
2018 | 2       | 8
2018 | 3       | 13
The R translation of the SQL code would be:
res <- aggregate(users_read~year + book_id, subset(df, year == 2018), mean)
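The formula method of `aggregate()` also accepts a `subset` argument, so the year filter can be passed directly instead of wrapping the data in `subset()`. A minimal sketch on made-up toy data (the values are hypothetical). Note that the formula interface drops rows containing `NA` by default (`na.action = na.omit`), which mirrors how SQL's `AVG()` skips `NULL`s:

```r
# Hypothetical toy data using the column names from the question
df <- data.frame(
  year       = c(2018, 2018, 2018, 2018, 2019),
  book_id    = c(1, 1, 2, 2, 1),
  users_read = c(10, NA, 8, 6, 5)
)

# Filter on year through aggregate()'s own `subset` argument;
# the row with a missing users_read is dropped by the default na.action = na.omit
res <- aggregate(users_read ~ year + book_id, data = df,
                 FUN = mean, subset = year == 2018)
print(res)
```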
Or in dplyr:
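library(dplyr)
res <- df %>%
  filter(year == 2018) %>%
  group_by(year, book_id) %>%   # keep year in the output, as the SQL query does
  summarise(users_read = mean(users_read, na.rm = TRUE))  # like SQL AVG(), ignore missing values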
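Regarding the tapply attempt in the question: `tapply()` is not limited to a single grouping variable, because its `INDEX` argument can be a list of factors, in which case the result is a cross-tabulated array. A minimal sketch on hypothetical toy data:

```r
# Hypothetical toy data using the column names from the question
df <- data.frame(
  year       = c(2018, 2018, 2018, 2019, 2018),
  book_id    = c(1, 1, 2, 1, 2),
  users_read = c(10, 14, 8, 5, 6)
)

# Keep only 2018, then group by both year and book_id via a list INDEX
d18 <- subset(df, year == 2018)
means <- tapply(d18$users_read, list(d18$year, d18$book_id), mean)
print(means)  # a 1 x 2 matrix: rows = year, columns = book_id
```

The matrix layout is less convenient than the long data frame returned by `aggregate()` or dplyr, which is one reason those are usually preferred for this kind of query.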
Thanks to the sqldf package, you can also use your SQL statement directly in R:
sqldf::sqldf("
Select year, book_id, avg(users_read)
From df1
Where year = 2018
Group by year, book_id
")
  year book_id avg(users_read)
1 2018       1            10.4
2 2018       2            15.5
3 2018       3             9.0
set.seed(123)
n <- 20
df1 <- data.frame(year = sample(2018:2019, n, TRUE),
                  book_id = sample(3, n, TRUE),
                  users_read = sample(c(1:(n-1), NA), n))
Note that the column users_read contains one NA value.
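That `NA` matters for any plain `mean()` call: by default a single missing value makes the whole mean `NA`, whereas SQL's `AVG()` ignores `NULL`s. A `summarise(... mean(users_read))` step would therefore return `NA` for every group containing the missing value unless `na.rm = TRUE` is set. A tiny illustration:

```r
x <- c(9, NA, 10)

# Default: the NA propagates, so the mean is NA
m_default <- mean(x)

# na.rm = TRUE drops the NA first, matching SQL AVG() semantics
m_narm <- mean(x, na.rm = TRUE)

print(c(default = m_default, na_removed = m_narm))
```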
df1
   year book_id users_read
1  2018       1          9
2  2018       1         NA
3  2018       1         10
4  2019       1          7
5  2018       3          5
6  2019       2         11
7  2019       3          6
8  2019       2         19
9  2018       1          2
10 2018       2         16
11 2019       3          8
12 2019       2         12
13 2019       1          1
14 2018       3         18
15 2019       3          3
16 2018       1         17
17 2019       3         13
18 2018       2         15
19 2018       1         14
20 2018       3          4
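As a cross-check, the base R `aggregate()` approach gives the same numbers as the sqldf query. The sketch below re-enters the printed df1 values by hand rather than relying on `set.seed()`, since `sample()` output changed between R versions (R 3.6.0 switched the default `sample.kind`), and then applies the 2018 filter:

```r
# df1 as printed above, typed in by hand
df1 <- data.frame(
  year       = c(2018, 2018, 2018, 2019, 2018, 2019, 2019, 2019, 2018, 2018,
                 2019, 2019, 2019, 2018, 2019, 2018, 2019, 2018, 2018, 2018),
  book_id    = c(1, 1, 1, 1, 3, 2, 3, 2, 1, 2,
                 3, 2, 1, 3, 3, 1, 3, 2, 1, 3),
  users_read = c(9, NA, 10, 7, 5, 11, 6, 19, 2, 16,
                 8, 12, 1, 18, 3, 17, 13, 15, 14, 4)
)

# Same query as the SQL: filter year == 2018, group by year and book_id, average;
# the NA in row 2 is dropped by the formula method's default na.action = na.omit
res <- aggregate(users_read ~ year + book_id,
                 data = subset(df1, year == 2018), FUN = mean)
print(res)
```

The means for book_id 1, 2 and 3 come out as 10.4, 15.5 and 9, matching the sqldf output.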