簡體   English   中英

按組計數非零觀測數

[英]Counting number of non zero observation by group

對於以下數據-我想計算每年每班的學生人數。

   Class  Students Gender Height Year_1999  Year_2000 Year_2001 Year_2002
     1      Mark     M     180      80        54         22       12
     2      John     M     234      0         59         32       62
     1      Tom      M     124      0         53         26       12
     2      Jane     F     180      80        54         22       0
     3      Kim      F     140      0         2           3       32

輸出應為

    Class  Year_1999   Year_2000   Year_2001  Year_2002
     1       1            2            2         2
     2       1            2            2         1
     3       0            1            1         1

我嘗試了以下方法,但運氣不佳

Number_obs = df %>% 
    group_by(class) %>% 
    summarise(count=n())

我們可以使用summarise_atdplyr 按“類別”分組后,循環瀏覽summarise_at列名稱中具有“ year” matches的列,獲得不等於0的值的sum

library(dplyr)
df1 %>% 
   group_by(Class) %>%
   summarise_at(vars(matches("Year")), list(~ sum(as.logical(.))))
# A tibble: 3 x 5
#  Class Year_1999 Year_2000 Year_2001 Year_2002
#  <int>     <int>     <int>     <int>     <int>
#1     1         1         2         2         2
#2     2         1         2         2         1
#3     3         0         1         1         1

或者我們可以gather成“長”格式,在單列上執行group_by操作並將其spread為“寬”格式

library(tidyr)
df1 %>% 
    gather(key, val, matches("Year")) %>%
    group_by(Class, key) %>%
    summarise(val = sum(val  != 0)) %>% 
    spread(key, val)

或使用data.table

library(data.table)
setDT(df1)[, lapply(.SD, function(x) sum(as.logical(x))), .(Class), .SDcols = 5:8]

或使用base Raggregate

aggregate(.~ Class, df1[-(2:4)], function(x) sum(x != 0))
#    Class Year_1999 Year_2000 Year_2001 Year_2002
#1     1         1         2         2         2
#2     2         1         2         2         1
#3     3         0         1         1         1

或使用rowsum

rowsum(+(!!df1[5:8]), df1$Class)
#    Year_1999 Year_2000 Year_2001 Year_2002
#1         1         2         2         2
#2         1         2         2         1
#3         0         1         1         1

或使用colSums

t(sapply(split(as.data.frame(df1[5:8] != 0), df1$Class), colSums))

數據

df1 <- structure(list(Class = c(1L, 2L, 1L, 2L, 3L), Students = c("Mark", 
"John", "Tom", "Jane", "Kim"), Gender = c("M", "M", "M", "F", 
"F"), Height = c(180L, 234L, 124L, 180L, 140L), Year_1999 = c(80L, 
0L, 0L, 80L, 0L), Year_2000 = c(54L, 59L, 53L, 54L, 2L), Year_2001 = c(22L, 
32L, 26L, 22L, 3L), 
Year_2002 = c(12L, 62L, 12L, 0L, 32L)), class = "data.frame", 
  row.names = c(NA, 
-5L))

@akruncolSums解決方案類似,使用by

do.call(rbind, by(df[5:8] > 0, df[1], colSums))
#   Year_1999 Year_2000 Year_2001 Year_2002
# 1         1         2         2         2
# 2         1         2         2         1
# 3         0         1         1         1

要么

Reduce(rbind, by(df[5:8] > 0, df[1], colSums))
#      Year_1999 Year_2000 Year_2001 Year_2002
# init         1         2         2         2
#              1         2         2         1
#              0         1         1         1

do.call更快。

使用dplyr ,我們可以使用summarise_at

library(dplyr)

df %>%
  group_by(Class) %>%
  summarise_at(vars(starts_with("Year")), ~sum(. != 0))

#  Class Year_1999 Year_2000 Year_2001 Year_2002
#  <int>     <int>     <int>     <int>     <int>
#1     1         1         2         2         2
#2     2         1         2         2         1
#3     3         0         1         1         1

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM