I am trying to use the aggregate function to get the same result as an SQL query, but the row counts differ:
SQL:
sqldf(" SELECT
PhotoID,
UserID,
SUM(Points) AS PhotoTotalPoints
FROM Photos
GROUP BY PhotoId, UserId")
116 186 rows.
R base:
aggregate(x = Photos["Points"]
, by = Photos[c("PhotoId","UserId")]
, FUN = sum
)
114 950 rows.
Using dplyr:
Photos %>%
group_by(PhotoId,UserId) %>%
summarise(sum = sum(Points))
116 186 rows.
I am new to R. I tried to solve this in several ways but couldn't find an explanation in the docs. What am I missing?
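The most likely cause is NA values in one of the grouping columns. aggregate leaves out rows whose grouping values are NA, whereas SQL's GROUP BY and dplyr's group_by keep NA as a group of its own. In the formula method, the default na.action = na.omit removes rows containing NA before grouping; to prevent that, we can use na.action = NULL. Note that ?aggregate also states that rows with missing values in any of the by variables are omitted from the result, so NA groups may still be missing after this change.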
aggregate(Points ~ PhotoId + UserId
, data = Photos
, FUN = sum, na.rm = TRUE, na.action = NULL
)
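To check whether NA grouping values explain the gap, something like the following sketch may help. The small Photos data frame here is made up for illustration, and wrapping the grouping columns in addNA() is one possible workaround rather than the only one: it turns NA into an explicit factor level so that it can form a group.

```r
# Toy stand-in for the Photos table; the values are invented for illustration.
Photos <- data.frame(
  PhotoId = c(1, 1, 2, 2, 3),
  UserId  = c(10, 10, 20, NA, NA),
  Points  = c(5, 3, 2, 4, 1)
)
keys <- c("PhotoId", "UserId")

# Number of rows with an NA in at least one grouping column.
n_na_rows <- sum(!complete.cases(Photos[keys]))

# Distinct key combinations, counting NA as a value of its own.
n_groups_with_na <- nrow(unique(Photos[keys]))

# Default aggregate(): rows with NA in the grouping columns are left out.
agg_default <- aggregate(x = Photos["Points"],
                         by = Photos[keys],
                         FUN = sum)

# Possible workaround: addNA() makes NA an explicit factor level.
agg_keep_na <- aggregate(x = Photos["Points"],
                         by = lapply(Photos[keys], addNA),
                         FUN = sum)

print(c(na_rows = n_na_rows,
        groups_with_na = n_groups_with_na,
        default = nrow(agg_default),
        keep_na = nrow(agg_keep_na)))
```

If n_na_rows is greater than zero on the real data, and the addNA version matches the sqldf and dplyr row counts, then NA grouping values are very likely the cause of the difference.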
Alternatively, unused combinations of the grouping values may be getting dropped, because the data.frame method defaults to drop = TRUE. Setting drop = FALSE keeps them:
aggregate(x = Photos["Points"]
, by = Photos[c("PhotoId","UserId")]
, FUN = sum, na.rm = TRUE, drop = FALSE
)
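A quick way to see what drop = FALSE changes is to compare row counts on a tiny example. This is only a sketch on made-up data, and it assumes a reasonably recent R version. Since drop = FALSE can only add rows for combinations that never occur in the data, it would push the count above what GROUP BY reports rather than close a gap caused by NA.

```r
# Toy data with no NA values; only two of the four possible
# PhotoId/UserId combinations actually occur.
Photos <- data.frame(
  PhotoId = c(1, 1, 2),
  UserId  = c(10, 10, 20),
  Points  = c(5, 3, 2)
)
keys <- c("PhotoId", "UserId")

# Default (drop = TRUE): one row per combination present in the data.
observed <- aggregate(x = Photos["Points"],
                      by = Photos[keys],
                      FUN = sum)

# drop = FALSE: also returns rows for combinations that never occur.
all_combos <- aggregate(x = Photos["Points"],
                        by = Photos[keys],
                        FUN = sum, drop = FALSE)

print(c(observed = nrow(observed), all_combos = nrow(all_combos)))
```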