[英]dplyr group operations adding na
Here are my data:这是我的数据:
places <- c("London", "London", "London", "Paris", "Paris", "Rennes")
years <- c(2019, 2019, 2020, 2019, 2019, 2020)
dataset <- data.frame(years, places)
The result:结果:
years places
1 2019 London
2 2019 London
3 2020 London
4 2019 Paris
5 2019 Paris
6 2020 Rennes
I am counting by place and years我按地点和年份计算
dataset2 <- dataset %>%
count(places, years)
places years n
1 London 2019 2
2 London 2020 1
3 Paris 2019 2
4 Rennes 2020 1
I want my table to show the two years for each city even if there are no values.即使没有值,我也希望我的表格显示每个城市的两年。
places years n
1 London 2019 2
2 London 2020 1
3 Paris 2019 2
4 Paris 2020 NA # or better 0
5 Rennes 2019 NA # or better 0
6 Rennes 2020 1
You could use complete
from tidyr
to fill in missing sequence:您可以使用
complete
的tidyr
来填写缺失的序列:
library(dplyr)
library(tidyr)
dataset %>% count(places, years) %>% complete(places, years, fill = list(n = 0))
If you convert years
to factor
you can specify .drop = FALSE
.如果将
years
转换为factor
,则可以指定.drop = FALSE
。
dataset %>% mutate(years = factor(years)) %>% count(places, years, .drop = FALSE)
# places years n
# <fct> <fct> <int>
#1 London 2019 2
#2 London 2020 1
#3 Paris 2019 2
#4 Paris 2020 0
#5 Rennes 2019 0
#6 Rennes 2020 1
We can use CJ
from data.table
我们可以使用来自
data.table
的CJ
library(data.table)
setDT(dataset)[, .N, .(years, places)][CJ(years, places, unique = TRUE), on = .(years, places)]
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.