简体   繁体   中英

Using group_by and summarise from dplyr for all rows not containing the variable to group_by

I have a data.frame with such as

df1 <- data.frame(id = c("A", "A", "B", "B", "B"), 
                  cost = c(100, 10, 120, 102, 102)

I know that I can use

df1.a <- group_by(df1, id) %>%
    summarise(no.c = n(), 
              m.costs = mean(cost))

to calculate the number of observations and mean by id. How could I do so if I want to calculate the number of observations and mean for all rows that are NOT equal to the ID, so it would for example give me 3 as value for observations not A and 2 for observations not B.

I would like to use the dplyr package and the group_by functions since I have to this for a lot of huge dataframes.

You can use the . to refer to the whole data.frame, which lets you calculate the differences between the group and the whole:

df1 %>% group_by(id) %>% 
    summarise(n = n(), 
              n_other = nrow(.) - n, 
              mean_cost = mean(cost), 
              mean_other = (sum(.$cost) - sum(cost)) / n_other)

## # A tibble: 2 × 5
##       id     n n_other mean_cost mean_other
##   <fctr> <int>   <int>     <dbl>      <dbl>
## 1      A     2       3        55        108
## 2      B     3       2       108         55

As you can see from the results, with two groups you could just use rev , but this approach will scale to more groups or calculations easily.

Looking for something like this? This calculates the total cost and total number of rows firstly and then subtract the total cost and total number of rows for each group and take average for the cost:

sumCost = sum(df1$cost)
totRows = nrow(df1)

df1 %>% 
        group_by(id) %>% 
        summarise(no.c = totRows - n(), 
                  m.costs = (sumCost - sum(cost))/no.c)

# A tibble: 2 x 3
#      id  no.c m.costs
#  <fctr> <int>   <dbl>
#1      A     3     108
#2      B     2      55

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM