I'm struggling to use dplyr and tidyr to take a data frame in this form:
myDf <- data.frame(id = c(1, 1, 1, 1, 2, 2),
                   event = c('a', 'b', 'a', 'b', 'a', 'b'),
                   a_property = c(1, NA, 2, NA, 3, NA),
                   b_property = c(NA, 2, NA, 3, NA, 4))
> myDf
id event a_property b_property
 1     a          1         NA
 1     b         NA          2
 1     a          2         NA
 1     b         NA          3
 2     a          3         NA
 2     b         NA          4
and transform it into this desired format:
id count_event_a count_event_b sum_property_a sum_property_b
 1             2             2              3              5
 2             1             1              5              4
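library(dplyr)

myDf %>%
  group_by(id) %>%
  # a non-NA property marks one occurrence of that event
  summarise(count_event_a = sum(!is.na(a_property)),
            count_event_b = sum(!is.na(b_property)),
            # na.rm = TRUE skips rows that belong to the other event
            sum_property_a = sum(a_property, na.rm = TRUE),
            sum_property_b = sum(b_property, na.rm = TRUE)) %>%
  ungroup()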
There is a typo in your expected output: sum_property_a for id 2 should be 3, not 5. The result should be:
# A tibble: 2 × 5
     id count_event_a count_event_b sum_property_a sum_property_b
  <dbl>         <int>         <int>          <dbl>          <dbl>
1     1             2             2              3              5
2     2             1             1              3              4
A slightly more general approach, which does not hard-code one summary per property column:
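library(dplyr)
library(tidyr)

myDf %>%
  # long format: one row per (id, event, property column)
  gather(key, value, -id, -event) %>%
  filter(!is.na(value)) %>%
  group_by(id, event) %>%
  summarise(count = n(),
            sum = sum(value)) %>%
  # stack count and sum, then build names such as count_a and sum_b
  gather(key, value, -id, -event) %>%
  unite(measure, key, event) %>%
  # wide format again; the columns are count_a, count_b, sum_a, sum_b
  spread(measure, value)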
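In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A minimal sketch of the same idea with those functions follows, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 are installed; the names key, value and result are only illustrative. As with the gather/spread version, the resulting columns are named count_a, count_b, sum_a, sum_b rather than the exact names in the question.

```r
library(dplyr)
library(tidyr)

myDf <- data.frame(id = c(1, 1, 1, 1, 2, 2),
                   event = c('a', 'b', 'a', 'b', 'a', 'b'),
                   a_property = c(1, NA, 2, NA, 3, NA),
                   b_property = c(NA, 2, NA, 3, NA, 4))

result <- myDf %>%
  # long format; values_drop_na = TRUE drops the NA rows in the same step
  pivot_longer(c(a_property, b_property),
               names_to = "key", values_to = "value",
               values_drop_na = TRUE) %>%
  group_by(id, event) %>%
  # .groups = "drop" returns an ungrouped tibble
  summarise(count = n(), sum = sum(value), .groups = "drop") %>%
  # one column per (summary, event) pair, named <summary>_<event>
  pivot_wider(names_from = event, values_from = c(count, sum))

result
```

If the exact column names from the question are needed, a rename() step at the end would be one way to get them.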