dplyr group operations adding NA
Here is my data:
places <- c("London", "London", "London", "Paris", "Paris", "Rennes")
years <- c(2019, 2019, 2020, 2019, 2019, 2020)
dataset <- data.frame(years, places)
Result:
years places
1 2019 London
2 2019 London
3 2020 London
4 2019 Paris
5 2019 Paris
6 2020 Rennes
I count by place and year:
library(dplyr)
dataset2 <- dataset %>%
  count(places, years)
places years n
1 London 2019 2
2 London 2020 1
3 Paris 2019 2
4 Rennes 2020 1
I would like my table to show both years for every city, even when there are no values for that combination:
places years n
1 London 2019 2
2 London 2020 1
3 Paris 2019 2
4 Paris 2020 NA # or better 0
5 Rennes 2019 NA # or better 0
6 Rennes 2020 1
You can use complete() from tidyr to fill in the missing combinations:
library(dplyr)
library(tidyr)
dataset %>% count(places, years) %>% complete(places, years, fill = list(n = 0))
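To make clearer what complete() does here, the following is a rough sketch of the same steps written out by hand: build every places/years pair with expand(), join the counts onto that grid, and replace the missing counts with 0. The names counts, all_pairs and result are only illustrative, and this is an approximation of the idea rather than the internals of complete().

```r
library(dplyr)
library(tidyr)

places <- c("London", "London", "London", "Paris", "Paris", "Rennes")
years <- c(2019, 2019, 2020, 2019, 2019, 2020)
dataset <- data.frame(years, places)

# Counts for the combinations that actually occur in the data.
counts <- dataset %>% count(places, years)

# Every places/years pair built from the observed values of each column.
all_pairs <- counts %>% expand(places, years)

# Join the counts onto the full grid; pairs with no rows get n = NA,
# which replace_na() then turns into 0.
result <- all_pairs %>%
  left_join(counts, by = c("places", "years")) %>%
  replace_na(list(n = 0L))

result
```

The one-line complete() call above is the more convenient form; the long version is mainly useful if the grid should contain values that never appear in the data.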
If you convert years to a factor, you can specify .drop = FALSE in count() so that empty groups are kept:
dataset %>% mutate(years = factor(years)) %>% count(places, years, .drop = FALSE)
# places years n
# <fct> <fct> <int>
#1 London 2019 2
#2 London 2020 1
#3 Paris 2019 2
#4 Paris 2020 0
#5 Rennes 2019 0
#6 Rennes 2020 1
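The printed header shows places as <fct> as well, which data.frame() only produces by default in R versions before 4.0.0 (where strings were converted to factors automatically). A minimal sketch that converts both grouping columns explicitly, so the result does not depend on the R version; the name result is only illustrative:

```r
library(dplyr)

places <- c("London", "London", "London", "Paris", "Paris", "Rennes")
years <- c(2019, 2019, 2020, 2019, 2019, 2020)
dataset <- data.frame(years, places)

# Turn both grouping columns into factors, then keep empty
# level combinations when counting.
result <- dataset %>%
  mutate(places = factor(places), years = factor(years)) %>%
  count(places, years, .drop = FALSE)

result
```

With both columns as factors, every combination of levels appears in the output, and combinations with no rows get n = 0.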
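We can use CJ from data.table to build every years/places pair and join the counts onto it; pairs with no rows come back with N = NA: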
library(data.table)
setDT(dataset)[, .N, .(years, places)][CJ(years, places, unique = TRUE), on = .(years, places)]
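Since the question prefers 0 over NA, here is a sketch of the same join with the missing counts filled in afterwards. It works on a separate data.table (dt, an illustrative name) instead of converting dataset by reference, and the intermediate names counts and full are likewise only for illustration.

```r
library(data.table)

places <- c("London", "London", "London", "Paris", "Paris", "Rennes")
years <- c(2019, 2019, 2020, 2019, 2019, 2020)
dt <- data.table(years, places)

# Count the combinations that occur in the data.
counts <- dt[, .N, by = .(years, places)]

# Cross join of the unique years and places, used as the lookup table.
grid <- CJ(years = dt$years, places = dt$places, unique = TRUE)

# Join counts onto the grid; unmatched pairs get N = NA.
full <- counts[grid, on = .(years, places)]

# Replace the NA counts with 0 by reference.
full[is.na(N), N := 0L]

full
```

The rows follow the order of the CJ() grid, which is sorted by years and then places.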