R Studio - group by dataframe and get statistics using dplyr

Question

I have a dataframe:

I want to group by "ID" and "direction", then compute statistics for "value". The hardest part for me is the "category" column: I always need to output the last "category" within each "ID" group, as highlighted in the picture.

I have some code, but the result is not what I want. Can anyone please help me modify the existing code? Thank you for your time!

library(dplyr)

ID        <- c(1,1,1,2,2,2,3,3)
category  <- c("green", "green", "red", "red","green", "green", "yellow", "yellow")
direction <- c("in", "out","in", "out","in", "out","in", "out")
value     <- c(4,5,6,7,8,9,10,11)
df        <- data.frame(ID, category, direction, value)

res <- df %>%
  group_by(ID, direction) %>%
  arrange(ID, direction) %>%
  summarize(
    category    = last(category),
    sum_value   = sum(value),
    count_value = length(value)
  )

Answer 1

You're almost there. The problem is that "last(category)" is evaluated within each ID and direction group, whereas you want it evaluated within each ID group only. If you change it to:

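res <- df %>%
  # Step 1: overwrite category with the last category seen within each ID
  group_by(ID) %>%
  mutate(category = last(category)) %>%
  ungroup() %>%
  # Step 2: category is now constant within each ID, so adding it to the
  # grouping keeps it in the output without creating extra groups
  group_by(ID, direction, category) %>%
  summarise(
    sum_value = sum(value),
    count_value = length(value)
  ) %>%
  ungroup()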

It should do the trick.
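
To see why the grouping level matters, here is a minimal sketch that uses the sample data from the question and compares "last(category)" computed per ID and direction with "last(category)" computed per ID. The names per_pair and per_id are made up for this illustration and are not part of the original code.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  ID        = c(1, 1, 1, 2, 2, 2, 3, 3),
  category  = c("green", "green", "red", "red", "green", "green", "yellow", "yellow"),
  direction = c("in", "out", "in", "out", "in", "out", "in", "out"),
  value     = c(4, 5, 6, 7, 8, 9, 10, 11)
)

# last(category) within each (ID, direction) pair, as in the question's code
per_pair <- df %>%
  group_by(ID, direction) %>%
  summarise(category = last(category)) %>%
  ungroup()

# last(category) within each ID only, which is what the question asks for
per_id <- df %>%
  group_by(ID) %>%
  summarise(category = last(category))

print(per_pair)
print(per_id)
```

For ID 1, the per-pair version returns "red" for "in" but "green" for "out", because the last "out" row of ID 1 is green. The per-ID version returns "red" for ID 1 regardless of direction, which is why the category has to be fixed per ID before summarising by ID and direction.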

solution1
1 ACCEPTED 2020-04-17 21:09:31