
group_by and count number of elements in each column in R

I have a data table like the one below:

city         year    t_20   t_25 
Seattle      2019    82      91  
Seattle      2018     0      103   
NYC          2010    78       8 
DC           2011    71       0  
DC           2011     0       0    
DC           2018    60       0

I would like to group the rows by city and year and count the number of zeros in each group.

How can I do this? With summarize_at?

df %>% group_by(city, year) %>% summarise_at( WHAT GOES HERE , vars(t_20:t_25))

What should the first argument of summarize_at be?

Or is there another way, such as tally?

One option is to reshape from wide to long before summarising:

library(tidyverse)
df %>%
    gather(k, v, -city, -year) %>%
    group_by(city, year) %>%
    summarise(n_0 = sum(v == 0)) 
## A tibble: 5 x 3
## Groups:   city [?]
#  city     year   n_0
#  <fct>   <int> <int>
#1 DC       2011     3
#2 DC       2018     1
#3 NYC      2010     0
#4 Seattle  2018     1
#5 Seattle  2019     0
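
In current tidyr, gather() is superseded by pivot_longer(). The following is a minimal sketch of the same wide-to-long approach, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 are installed. The sample data is rebuilt inline so the snippet runs on its own, and the name zeros_long is only illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt inline so the snippet is self-contained
df <- data.frame(
  city = c("Seattle", "Seattle", "NYC", "DC", "DC", "DC"),
  year = c(2019, 2018, 2010, 2011, 2011, 2018),
  t_20 = c(82, 0, 78, 71, 0, 60),
  t_25 = c(91, 103, 8, 0, 0, 0)
)

zeros_long <- df %>%
  # Stack every column except city and year into key/value pairs
  pivot_longer(cols = -c(city, year), names_to = "k", values_to = "v") %>%
  group_by(city, year) %>%
  # Count the zeros across all stacked values in each group
  summarise(n_0 = sum(v == 0), .groups = "drop")

print(zeros_long)
```

This should give the same per-group counts as the gather() version, with one n_0 value per city/year pair.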

To summarise each column separately, you can do:

df %>%
    group_by(city, year) %>%
    summarise_all(~ sum(. == 0))
## A tibble: 5 x 4
## Groups:   city [?]
#  city     year  t_20  t_25
#  <fct>   <int> <int> <int>
#1 DC       2011     1     2
#2 DC       2018     0     1
#3 NYC      2010     0     0
#4 Seattle  2018     1     0
#5 Seattle  2019     0     0
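
To answer the summarise_at question directly: the column selection goes first and the function second. The sketch below shows that form alongside across(), which newer dplyr releases recommend in place of the scoped verbs. It assumes dplyr >= 1.0.0; the names by_col_at and by_col_across are illustrative only, and the sample data is rebuilt inline.

```r
library(dplyr)

# Sample data from the question, rebuilt inline so the snippet is self-contained
df <- data.frame(
  city = c("Seattle", "Seattle", "NYC", "DC", "DC", "DC"),
  year = c(2019, 2018, 2010, 2011, 2011, 2018),
  t_20 = c(82, 0, 78, 71, 0, 60),
  t_25 = c(91, 103, 8, 0, 0, 0)
)

# summarise_at(): vars() selects the columns, the lambda counts zeros in each
by_col_at <- df %>%
  group_by(city, year) %>%
  summarise_at(vars(t_20:t_25), ~ sum(. == 0))

# across(): the same per-column count inside a plain summarise()
by_col_across <- df %>%
  group_by(city, year) %>%
  summarise(across(t_20:t_25, ~ sum(.x == 0)), .groups = "drop")

print(by_col_across)
```

Both forms should produce one row per city/year pair with a zero count for each of t_20 and t_25.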

Sample data

df <- read.table(text =
    "city         year    t_20   t_25
Seattle      2019    82      91
Seattle      2018     0      103
NYC          2010    78       8
DC           2011    71       0
DC           2011     0       0
DC           2018    60       0", header = TRUE)

A simple group-by operation lends itself well to being expressed in SQL. For those who are SQL inclined, we could also solve this problem using the sqldf library:

library(sqldf)

sql <- "SELECT city, year,
            COUNT(CASE WHEN t_20 = 0 THEN 1 END) AS t_20_cnt,
            COUNT(CASE WHEN t_25 = 0 THEN 1 END) AS t_25_cnt
        FROM df
        GROUP BY city, year"

output <- sqldf(sql)
