[英]dplyr group operations adding na
這是我的數據:
places <- c("London", "London", "London", "Paris", "Paris", "Rennes")
years <- c(2019, 2019, 2020, 2019, 2019, 2020)
dataset <- data.frame(years, places)
結果:
years places
1 2019 London
2 2019 London
3 2020 London
4 2019 Paris
5 2019 Paris
6 2020 Rennes
我按地點和年份計算
dataset2 <- dataset %>%
count(places, years)
places years n
1 London 2019 2
2 London 2020 1
3 Paris 2019 2
4 Rennes 2020 1
即使沒有值,我也希望我的表格顯示每個城市的兩年。
places years n
1 London 2019 2
2 London 2020 1
3 Paris 2019 2
4 Paris 2020 NA # or better 0
5 Rennes 2019 NA # or better 0
6 Rennes 2020 1
您可以使用complete
的tidyr
來填寫缺失的序列:
library(dplyr)
library(tidyr)
dataset %>% count(places, years) %>% complete(places, years, fill = list(n = 0))
如果將years
轉換為factor
,則可以指定.drop = FALSE
。
dataset %>% mutate(years = factor(years)) %>% count(places, years, .drop = FALSE)
# places years n
# <fct> <fct> <int>
#1 London 2019 2
#2 London 2020 1
#3 Paris 2019 2
#4 Paris 2020 0
#5 Rennes 2019 0
#6 Rennes 2020 1
我們可以使用來自data.table
的CJ
library(data.table)
setDT(dataset)[, .N, .(years, places)][CJ(years, places, unique = TRUE), on = .(years, places)]
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.