group_by and count number of elements in each column in R
I have a data table like the one below:
city year t_20 t_25
Seattle 2019 82 91
Seattle 2018 0 103
NYC 2010 78 8
DC 2011 71 0
DC 2011 0 0
DC 2018 60 0
I would like to group them by city and year and count the number of zeros in each group.
How can I do this? With summarize_at?
df %>% group_by(city, year) %>% summarise_at( WHAT GOES HERE , vars(t_20:t_25))
What should be the first argument of summarize_at?
Or is there another way, e.g. with tally?
An option is to reshape from wide to long before summarising:
library(tidyverse)
df %>%
gather(k, v, -city, -year) %>%
group_by(city, year) %>%
summarise(n_0 = sum(v == 0))
# # A tibble: 5 x 3
## Groups: city [?]
# city year n_0
# <fct> <int> <int>
#1 DC 2011 3
#2 DC 2018 1
#3 NYC 2010 0
#4 Seattle 2018 1
#5 Seattle 2019 0
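`gather()` still works but is superseded in tidyr. A minimal sketch of the same reshape-then-count approach with `pivot_longer()` might look like this, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 (for the `.groups` argument). The names `k` and `v` are just the same placeholder column names used above.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df <- data.frame(
  city = c("Seattle", "Seattle", "NYC", "DC", "DC", "DC"),
  year = c(2019L, 2018L, 2010L, 2011L, 2011L, 2018L),
  t_20 = c(82L, 0L, 78L, 71L, 0L, 60L),
  t_25 = c(91L, 103L, 8L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Stack t_20/t_25 into a single value column, then count zeros per city/year
result <- df %>%
  pivot_longer(c(t_20, t_25), names_to = "k", values_to = "v") %>%
  group_by(city, year) %>%
  summarise(n_0 = sum(v == 0), .groups = "drop")

print(result)
```

`.groups = "drop"` returns an ungrouped tibble, so the "Groups: city" line seen in the output above does not appear.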
To summarise each column separately, you can do:
df %>%
group_by(city, year) %>%
summarise_all(~ sum(. == 0))
## A tibble: 5 x 4
## Groups: city [?]
# city year t_20 t_25
# <fct> <int> <int> <int>
#1 DC 2011 1 2
#2 DC 2018 0 1
#3 NYC 2010 0 0
#4 Seattle 2018 1 0
#5 Seattle 2019 0 0
Sample data:
df <- read.table(text =
"city year t_20 t_25
Seattle 2019 82 91
Seattle 2018 0 103
NYC 2010 78 8
DC 2011 71 0
DC 2011 0 0
DC 2018 60 0", header = TRUE)
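On the `summarise_at()` question itself: its first argument is `.vars` (the columns, e.g. `vars(t_20:t_25)`) and its second is `.funs` (the function), so the two arguments in the question's attempt are in the wrong order. In dplyr >= 1.0.0, `summarise_at()` is superseded by `across()` inside `summarise()`. A minimal sketch of both, assuming dplyr >= 1.0.0:

```r
library(dplyr)

df <- data.frame(
  city = c("Seattle", "Seattle", "NYC", "DC", "DC", "DC"),
  year = c(2019L, 2018L, 2010L, 2011L, 2011L, 2018L),
  t_20 = c(82L, 0L, 78L, 71L, 0L, 60L),
  t_25 = c(91L, 103L, 8L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# summarise_at(): columns first (.vars), function second (.funs)
res_at <- df %>%
  group_by(city, year) %>%
  summarise_at(vars(t_20:t_25), ~ sum(. == 0))

# across(): the current dplyr idiom for the same per-column zero count
res_across <- df %>%
  group_by(city, year) %>%
  summarise(across(t_20:t_25, ~ sum(.x == 0)), .groups = "drop")

print(res_across)
```

Both produce one zero count per column per `city`/`year` group; `res_at` stays grouped by `city`, while `res_across` is ungrouped because of `.groups = "drop"`.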
A simple group-by operation lends itself well to SQL. For those who are SQL-inclined, we could also solve this problem using the sqldf library:
library(sqldf)
# COUNT() ignores NULLs, so each CASE counts only the rows where that column is 0
sql <- "SELECT city, year,
               COUNT(CASE WHEN t_20 = 0 THEN 1 END) AS t_20_cnt,
               COUNT(CASE WHEN t_25 = 0 THEN 1 END) AS t_25_cnt
        FROM df
        GROUP BY city, year"
output <- sqldf(sql)
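If you would rather not load any packages, base R's `aggregate()` can express the same grouping. This is a sketch not taken from the answers above; `count_zeros` is a hypothetical helper name introduced only for illustration. Note that `aggregate()` orders its output by `year` first (the first grouping variable varies fastest), so the rows are sorted by `city` and `year` afterwards to match the layout above.

```r
df <- data.frame(
  city = c("Seattle", "Seattle", "NYC", "DC", "DC", "DC"),
  year = c(2019L, 2018L, 2010L, 2011L, 2011L, 2018L),
  t_20 = c(82L, 0L, 78L, 71L, 0L, 60L),
  t_25 = c(91L, 103L, 8L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of zeros in a vector
count_zeros <- function(x) sum(x == 0)

# Apply count_zeros to t_20 and t_25 within each city/year group
res <- aggregate(cbind(t_20, t_25) ~ city + year, data = df, FUN = count_zeros)

# Reorder rows by city, then year
res <- res[order(res$city, res$year), ]
print(res)
```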