
base R: Aggregate and sum by two columns

I am trying to use the aggregate function to reproduce the result of an SQL query, but I get a different number of rows:

SQL:

sqldf("SELECT PhotoId,
              UserId,
              SUM(Points) AS PhotoTotalPoints
       FROM Photos
       GROUP BY PhotoId, UserId")
Result: 116,186 rows.

Base R:

aggregate(x = Photos["Points"]
  , by = Photos[c("PhotoId","UserId")]
  , FUN = sum
)
Result: 114,950 rows.

Using dplyr:

Photos %>%
    group_by(PhotoId,UserId) %>%
    summarise(sum = sum(Points)) 
Result: 116,186 rows.

I am new to R. I tried to solve this in several ways but couldn't find an explanation in the docs. What am I missing?
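A quick check for missing keys is to count the distinct (PhotoId, UserId) pairs with and without NA. GROUP BY and group_by() treat a pair containing NA like any other pair, so if PhotoId or UserId has missing values the two counts differ. The sketch below uses a small made-up Photos data frame (hypothetical, standing in for the real table); on the real data, a difference of 1,236 pairs (116,186 − 114,950) would point to missing keys as the cause.

```r
# Hypothetical toy data standing in for the real Photos table;
# two rows have a missing key
Photos <- data.frame(
  PhotoId = c(1, 1, 2, 3, NA),
  UserId  = c(10, 10, 20, NA, 30),
  Points  = c(5, 3, 4, 1, 2)
)

keys <- Photos[c("PhotoId", "UserId")]

# Distinct key pairs, NA included (what GROUP BY and group_by() return)
n_all <- nrow(unique(keys))

# Distinct key pairs without any NA
n_complete <- nrow(unique(keys[complete.cases(keys), ]))

# The data.frame method of aggregate() on the same keys
agg <- aggregate(x = Photos["Points"], by = keys, FUN = sum)

cat("groups incl. NA keys:  ", n_all, "\n")
cat("groups without NA keys:", n_complete, "\n")
cat("rows from aggregate(): ", nrow(agg), "\n")
```

Here aggregate() returns exactly as many rows as there are complete key pairs, while the SQL and dplyr versions would also return the two pairs that contain NA.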

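This is most likely caused by NA values. sqldf (SQLite) and dplyr's group_by() both keep missing keys together as a group of their own, whereas aggregate() removes them: the data.frame method omits every row that has NA in any of the by columns, and the formula method, whose default is na.action = na.omit, additionally drops rows where Points is NA. The second case can be prevented by passing na.action = NULL to the formula method (together with na.rm = TRUE, so that sum() ignores the NA values):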

aggregate(Points ~ PhotoId + UserId, data = Photos,
          FUN = sum, na.rm = TRUE, na.action = NULL)
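The effect of na.action = NULL can be seen on a small made-up data frame (hypothetical, not the asker's data). The group (2, 20) has only an NA in Points, so the default na.omit removes it entirely; with na.action = NULL the row is kept and sum(..., na.rm = TRUE) gives 0 for it. The row with a missing PhotoId is dropped in both calls, because the formula method passes its grouping columns on to the data.frame method.

```r
# Hypothetical toy data: (2, 20) has only NA points, one row has no PhotoId
Photos <- data.frame(
  PhotoId = c(1, 1, 2, NA),
  UserId  = c(10, 10, 20, 30),
  Points  = c(5, 3, NA, 2)
)

# Default: na.action = na.omit drops both rows containing NA
default_res <- aggregate(Points ~ PhotoId + UserId, data = Photos, FUN = sum)

# na.action = NULL keeps rows with NA in Points; na.rm = TRUE goes to sum().
# The row with NA in PhotoId is still omitted when grouping.
kept_res <- aggregate(Points ~ PhotoId + UserId, data = Photos,
                      FUN = sum, na.rm = TRUE, na.action = NULL)

print(default_res)
print(kept_res)
```

So na.action = NULL only helps when the lost groups come from NA in Points; missing keys need separate handling.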

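Or it could be a case where combinations are dropped because of the default drop = TRUE of the data.frame method. Note, however, that drop only concerns combinations of grouping values that never occur in the data, which SQL's GROUP BY does not return either, so with drop = FALSE the result may end up with more rows than the SQL query rather than the same number: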

aggregate(x = Photos["Points"]
   , by = Photos[c("PhotoId","UserId")]
   , FUN = sum, na.rm = TRUE, drop = FALSE
   )
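If the missing rows come from NA in PhotoId or UserId, neither call above brings them back. One base-R workaround, sketched here on hypothetical toy data, is to turn NA into an explicit factor level with addNA() before grouping. This relies on aggregate() treating such rows as complete (the NA level has a real integer code), which is an observation about how the function currently behaves rather than a documented feature, so treat it as an assumption and check it on your R version.

```r
# Hypothetical toy data with one missing UserId
Photos <- data.frame(
  PhotoId = c(1, 1, 2, 3),
  UserId  = c(10, 10, 20, NA),
  Points  = c(5, 3, 4, 1)
)

# Convert each key column to a factor; addNA(ifany = TRUE) adds an explicit
# NA level only to columns that actually contain NA
keys <- Photos[c("PhotoId", "UserId")]
keys[] <- lapply(keys, function(k) addNA(factor(k), ifany = TRUE))

res <- aggregate(x = Photos["Points"], by = keys, FUN = sum)
print(res)  # the NA-keyed group appears with <NA> in the UserId column
```

The key columns in res are factors; as.numeric(as.character(res$UserId)) turns them back into numbers with NA restored.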
