Count non-zero observations by group

Question

For the data below, I want to count the number of students in each class for each year.

   Class  Students  Gender  Height  Year_1999  Year_2000  Year_2001  Year_2002
     1    Mark      M       180     80         54         22         12
     2    John      M       234     0          59         32         62
     1    Tom       M       124     0          53         26         12
     2    Jane      F       180     80         54         22         0
     3    Kim       F       140     0          2          3          32

The output should be

    Class  Year_1999   Year_2000   Year_2001  Year_2002
     1       1            2            2         2
     2       1            2            2         1
     3       0            1            1         1

I tried the following, but had no luck:

Number_obs = df %>% 
    group_by(class) %>% 
    summarise(count=n())

Answer 1

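We can use summarise_at from dplyr. After grouping by 'Class', summarise_at loops over the columns whose names match "Year" (selected with matches) and takes the sum of the values that are not equal to 0. Here as.logical turns every non-zero value into TRUE, so the sum is the count of non-zero entries.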

library(dplyr)
df1 %>% 
   group_by(Class) %>%
   summarise_at(vars(matches("Year")), list(~ sum(as.logical(.))))
# A tibble: 3 x 5
#  Class Year_1999 Year_2000 Year_2001 Year_2002
#  <int>     <int>     <int>     <int>     <int>
#1     1         1         2         2         2
#2     2         1         2         2         1
#3     3         0         1         1         1
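
In dplyr 1.0.0 and later, summarise_at is superseded by across(). The following is a minimal sketch of the same count, assuming dplyr >= 1.0.0 is installed. The data frame is a trimmed copy of df1 that keeps only the columns needed here, and the name res is just an illustrative choice.

```r
library(dplyr)

# Trimmed copy of the example data (Students, Gender, Height omitted)
df1 <- data.frame(
  Class     = c(1L, 2L, 1L, 2L, 3L),
  Year_1999 = c(80L, 0L, 0L, 80L, 0L),
  Year_2000 = c(54L, 59L, 53L, 54L, 2L),
  Year_2001 = c(22L, 32L, 26L, 22L, 3L),
  Year_2002 = c(12L, 62L, 12L, 0L, 32L)
)

res <- df1 %>%
  group_by(Class) %>%
  # For each Year_* column, count the values that are not 0 within the class
  summarise(across(starts_with("Year"), ~ sum(.x != 0)))

print(res)
```

The comparison .x != 0 states the "non-zero" condition directly, which may read more clearly than as.logical.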

Alternatively, we can gather the data into 'long' format, do the group_by on a single value column, and spread the result back into 'wide' format.

library(tidyr)
df1 %>% 
    gather(key, val, matches("Year")) %>%
    group_by(Class, key) %>%
    summarise(val = sum(val  != 0)) %>% 
    spread(key, val)
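
Since tidyr 1.0.0, gather and spread are superseded by pivot_longer and pivot_wider. A sketch of the same long-then-wide approach under that assumption (tidyr >= 1.0.0 and dplyr >= 1.0.0), again on a trimmed copy of df1, with res as an illustrative name:

```r
library(dplyr)
library(tidyr)

# Trimmed copy of the example data (Students, Gender, Height omitted)
df1 <- data.frame(
  Class     = c(1L, 2L, 1L, 2L, 3L),
  Year_1999 = c(80L, 0L, 0L, 80L, 0L),
  Year_2000 = c(54L, 59L, 53L, 54L, 2L),
  Year_2001 = c(22L, 32L, 26L, 22L, 3L),
  Year_2002 = c(12L, 62L, 12L, 0L, 32L)
)

res <- df1 %>%
  # Long format: one row per (student, year) pair
  pivot_longer(starts_with("Year"), names_to = "key", values_to = "val") %>%
  group_by(Class, key) %>%
  # Count the non-zero values per class and year
  summarise(val = sum(val != 0), .groups = "drop") %>%
  # Back to wide format: one column per year
  pivot_wider(names_from = key, values_from = val)

print(res)
```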

Or using data.table

library(data.table)
setDT(df1)[, lapply(.SD, function(x) sum(as.logical(x))), .(Class), .SDcols = 5:8]

Or using base R with aggregate

aggregate(.~ Class, df1[-(2:4)], function(x) sum(x != 0))
#    Class Year_1999 Year_2000 Year_2001 Year_2002
#1     1         1         2         2         2
#2     2         1         2         2         1
#3     3         0         1         1         1

Or with rowsum

rowsum(+(!!df1[5:8]), df1$Class)
#    Year_1999 Year_2000 Year_2001 Year_2002
#1         1         2         2         2
#2         1         2         2         1
#3         0         1         1         1
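
The expression +(!!df1[5:8]) is terse, so here is a step-by-step sketch of what it does, using base R only. The intermediate names (m, nz, ind, res) are illustrative and not part of the original answer, and the data frame is a trimmed copy of df1, so the year columns are selected by name rather than by position.

```r
# Trimmed copy of the example data (Students, Gender, Height omitted)
df1 <- data.frame(
  Class     = c(1L, 2L, 1L, 2L, 3L),
  Year_1999 = c(80L, 0L, 0L, 80L, 0L),
  Year_2000 = c(54L, 59L, 53L, 54L, 2L),
  Year_2001 = c(22L, 32L, 26L, 22L, 3L),
  Year_2002 = c(12L, 62L, 12L, 0L, 32L)
)

# Numeric matrix of the year columns
m <- as.matrix(df1[grep("^Year", names(df1))])

# Logical matrix: TRUE where the value is non-zero (same result as !!m)
nz <- m != 0

# Unary plus coerces TRUE/FALSE to 1/0 and keeps the matrix shape
ind <- +nz

# rowsum adds up the rows of the indicator matrix within each class
res <- rowsum(ind, df1$Class)

print(res)
```

Summing a 0/1 indicator per group is the same as counting the non-zero entries, which is why rowsum gives the requested table.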

Or with colSums

t(sapply(split(as.data.frame(df1[5:8] != 0), df1$Class), colSums))

Data

df1 <- structure(list(Class = c(1L, 2L, 1L, 2L, 3L), Students = c("Mark", 
"John", "Tom", "Jane", "Kim"), Gender = c("M", "M", "M", "F", 
"F"), Height = c(180L, 234L, 124L, 180L, 140L), Year_1999 = c(80L, 
0L, 0L, 80L, 0L), Year_2000 = c(54L, 59L, 53L, 54L, 2L), Year_2001 = c(22L, 
32L, 26L, 22L, 3L), 
Year_2002 = c(12L, 62L, 12L, 0L, 32L)), class = "data.frame", 
  row.names = c(NA, 
-5L))

Answer 2

Similar to @akrun's colSums solution, but using by.

do.call(rbind, by(df[5:8] > 0, df[1], colSums))
#   Year_1999 Year_2000 Year_2001 Year_2002
# 1         1         2         2         2
# 2         1         2         2         1
# 3         0         1         1         1

Or

Reduce(rbind, by(df[5:8] > 0, df[1], colSums))
#      Year_1999 Year_2000 Year_2001 Year_2002
# init         1         2         2         2
#              1         2         2         1
#              0         1         1         1

do.call is faster.

Answer 3

Using dplyr, we can use summarise_at

library(dplyr)

df %>%
  group_by(Class) %>%
  summarise_at(vars(starts_with("Year")), ~sum(. != 0))

#  Class Year_1999 Year_2000 Year_2001 Year_2002
#  <int>     <int>     <int>     <int>     <int>
#1     1         1         2         2         2
#2     2         1         2         2         1
#3     3         0         1         1         1

Answer 1
1 2019-06-25 04:32:54

Answer 2
1 2019-06-25 07:38:06

Answer 3
0 2019-06-25 04:32:56
