base R: Aggregate and sum by two columns

Question

I am trying to use the aggregate function to get the same result as an SQL query, but the row counts differ:

SQL:

sqldf(" SELECT
                PhotoID,
                UserID,
                SUM(Points) AS PhotoTotalPoints
            FROM Photos
            GROUP BY PhotoId, UserId")
116 186 rows.

R base:

aggregate(x = Photos["Points"]
  , by = Photos[c("PhotoId","UserId")]
  , FUN = sum
)
114 950 rows.

Using dplyr:

Photos %>%
    group_by(PhotoId,UserId) %>%
    summarise(sum = sum(Points)) 
116 186 rows.

I am new to R. I tried to solve this in several ways but couldn't find an explanation in the docs. What am I missing?

Answer 1

A likely cause is NA values in one of the grouping columns: by default, aggregate drops rows where a grouping value is NA. With the formula interface, the na.action argument controls how rows containing NA are handled, and na.action = NULL stops them from being removed up front:

aggregate(Points ~ PhotoId + UserId
    , data = Photos
    , FUN = sum, na.rm = TRUE, na.action = NULL
   )
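
To check whether missing grouping values explain the gap, it may help to reproduce it on a small example. The sketch below uses a made-up Photos data frame (the values are hypothetical, not the asker's data) and compares the number of groups returned by the data.frame method with the number of distinct (PhotoId, UserId) pairs when NA counts as a value of its own.

```r
# Hypothetical toy data: the last row has a missing UserId
Photos <- data.frame(
  PhotoId = c(1, 1, 2, 3),
  UserId  = c(10, 10, 20, NA),
  Points  = c(5, 3, 2, 7)
)

# data.frame method, as in the question: the NA group is left out
res <- aggregate(x = Photos["Points"],
                 by = Photos[c("PhotoId", "UserId")],
                 FUN = sum)
n_groups <- nrow(res)

# Distinct (PhotoId, UserId) pairs, treating NA as a value of its own
n_pairs <- nrow(unique(Photos[c("PhotoId", "UserId")]))

# Rows with an NA in at least one grouping column
n_na_rows <- sum(!complete.cases(Photos[c("PhotoId", "UserId")]))

print(c(groups = n_groups, pairs = n_pairs, na_rows = n_na_rows))
```

On this toy data aggregate returns 2 groups while there are 3 distinct pairs. Running the complete.cases line on the real Photos table should show whether NA values in PhotoId or UserId account for the missing rows.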

Alternatively, unused combinations of the grouping values may be dropped, since drop = TRUE is the default for the data.frame method. Setting drop = FALSE keeps them:

aggregate(x = Photos["Points"]
   , by = Photos[c("PhotoId","UserId")]
   , FUN = sum, na.rm = TRUE, drop = FALSE
   )
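
A rough way to see what drop changes is to run both settings on a tiny data frame. The data below is hypothetical and contains no NA values, so any difference in row count comes from drop alone. This is a sketch assuming a recent R version, since the drop = FALSE behaviour was amended in R 3.5.0.

```r
# Hypothetical toy data: only the pairs (1, 10) and (2, 20) occur
Photos <- data.frame(
  PhotoId = c(1, 1, 2),
  UserId  = c(10, 10, 20),
  Points  = c(5, 3, 2)
)
by_cols <- Photos[c("PhotoId", "UserId")]

# Default (drop = TRUE): one row per combination that actually occurs
res_drop <- aggregate(x = Photos["Points"], by = by_cols, FUN = sum)

# drop = FALSE: also keeps combinations that never occur in the data
res_keep <- aggregate(x = Photos["Points"], by = by_cols, FUN = sum,
                      drop = FALSE)

print(nrow(res_drop))
print(nrow(res_keep))
```

Here I would expect 2 rows with the default and 4 rows (2 PhotoId values times 2 UserId values) with drop = FALSE. Because drop = FALSE adds combinations that are absent from the data, its row count need not match what GROUP BY returns.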
