
How to aggregate one column of an R data frame based on the values of another

My data frame looks like the one below. (This is a simplified version; the real one has many more rows and columns.)

      Gender Energetic   Weekly_Apple   Weekly_Banana
1   Female        3           No           Yes
2   Female        3           No           Yes
3   Male          5           No           Yes
4   Male          2           No            No
5   Female        1           No            No

I would like short code that counts the "Yes" responses and produces the following output:

        Male        Female
Apples    0           0                
Bananas   1           2

The number of people eating apples is 0 for each gender. 1 male and 2 females eat bananas.

I tried the following:

count(original_data, c("Gender","Weekly_Apple"))
count(original_data, c("Gender","Weekly_Banana"))
count(original_data, c("Gender","Weekly_Grape"))
count(original_data, c("Gender","Weekly_PineApple"))

aggregate(x = original_data[c("Weekly_Apple", 
                          "Weekly_Banana",
                          "Weekly_Grape")],
                   by = original_data[c("Gender")],
                   FUN = n())

As NelsonGon suggested, I have replaced df1 <- t(df1) with tidyr::crossing(df1).

library(dplyr)

# Sample data from the question
df <- data.frame(
  Gender = c("Female", "Female", "Male", "Male", "Female"),
  Energetic = c(3, 3, 5, 2, 1),
  Weekly_Apple = c("No", "No", "No", "No", "No"),
  Weekly_Banana = c("Yes", "Yes", "Yes", "No", "No"))

# Count the "Yes" answers per gender
df1 <- df %>%
  group_by(Gender) %>%
  summarise(
    Apples = sum(Weekly_Apple == "Yes"),
    Bananas = sum(Weekly_Banana == "Yes")
  )

df1 <- tidyr::crossing(df1)
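
Note that the summary above still has one row per gender. As far as I can tell, tidyr::crossing() applied to a single data frame returns its distinct, sorted rows and does not transpose it. Below is a minimal sketch of one way to reach the layout asked for in the question (fruits in rows, genders in columns), assuming dplyr is installed. The names counts and result are illustrative only and do not come from the original answer.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  Gender = c("Female", "Female", "Male", "Male", "Female"),
  Energetic = c(3, 3, 5, 2, 1),
  Weekly_Apple = c("No", "No", "No", "No", "No"),
  Weekly_Banana = c("Yes", "Yes", "Yes", "No", "No"))

# One row per gender, one count column per fruit
counts <- df %>%
  group_by(Gender) %>%
  summarise(
    Apples = sum(Weekly_Apple == "Yes"),
    Bananas = sum(Weekly_Banana == "Yes")
  )

# Transpose only the count columns, then label the columns by gender
result <- t(as.matrix(counts[, c("Apples", "Bananas")]))
colnames(result) <- as.character(counts$Gender)
result
```

The result is a plain integer matrix, so a single cell can be read with, for example, result["Bananas", "Female"].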

One data.table possibility:

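library(data.table)

# data.table's melt() and dcast() expect a data.table, so convert first
long <- melt(as.data.table(df)[, !"Energetic"], id.vars = "Gender")
dcast(long, variable ~ Gender,
      value.var = "value",
      fun.aggregate = function(x) sum(x == "Yes"))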

       variable Female Male
1  Weekly_Apple      0    0
2 Weekly_Banana      2    1

You can use base R:

table(reshape(cbind(df,id=1:nrow(df)),3:4,idvar = "id",dir="long",sep="_")[-(2:3)])[,,'Yes']
        time
Gender   Apple Banana
  Female     0      2
  Male       0      1

Or even:

xtabs(Weekly~time+Gender,transform(reshape(cbind(df,id=1:nrow(df)),3:4,idvar = "id",dir="long",sep="_"),Weekly=Weekly=="Yes"))

        Gender
time     Female Male
  Apple       0    0
  Banana      2    1
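
The two one-liners above are dense, so here is the same idea unpacked into steps. This is only a sketch in base R; the intermediate names with_id, long, yes and counts are mine, not the original answerer's.

```r
# Sample data from the question
df <- data.frame(
  Gender = c("Female", "Female", "Male", "Male", "Female"),
  Energetic = c(3, 3, 5, 2, 1),
  Weekly_Apple = c("No", "No", "No", "No", "No"),
  Weekly_Banana = c("Yes", "Yes", "Yes", "No", "No"))

# Step 1: add a row id so reshape() can tell the original rows apart
with_id <- cbind(df, id = seq_len(nrow(df)))

# Step 2: stack the Weekly_* columns; sep = "_" splits each name into
# the value column "Weekly" and the time label ("Apple", "Banana")
long <- reshape(with_id,
                varying = c("Weekly_Apple", "Weekly_Banana"),
                idvar = "id", direction = "long", sep = "_")

# Step 3: turn the answers into 0/1 and sum them per fruit and gender
long$yes <- as.integer(long$Weekly == "Yes")
counts <- xtabs(yes ~ time + Gender, data = long)
counts
```

Naming the varying columns instead of using the positions 3:4 keeps the code working if columns are added or reordered.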

A dplyr-tidyr alternative:

df %>%
  group_by(Gender) %>%
  summarise_at(vars(contains("Weekly")), function(x) sum(x == "Yes")) %>%
  tidyr::gather(key, val, -Gender) %>%
  tidyr::spread(Gender, val)
# A tibble: 2 x 3
  key           Female  Male
  <chr>          <int> <int>
1 Weekly_Apple       0     0
2 Weekly_Banana      2     1

Data:

df <-  structure(list(Gender = structure(c(1L, 1L, 2L, 2L, 1L), .Label = c("Female", 
    "Male"), class = "factor"), Energetic = c(3, 3, 5, 2, 1), Weekly_Apple = structure(c(1L, 
    1L, 1L, 1L, 1L), .Label = "No", class = "factor"), Weekly_Banana = structure(c(2L, 
    2L, 2L, 1L, 1L), .Label = c("No", "Yes"), class = "factor")), class = "data.frame", row.names = c(NA, 
    -5L))
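
In current releases of dplyr and tidyr, summarise_at(), gather() and spread() are marked as superseded. Here is a rough equivalent of the dplyr-tidyr alternative above using across(), pivot_longer() and pivot_wider(), assuming dplyr >= 1.0.0 and tidyr >= 1.0.0 are installed. The name result is illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  Gender = c("Female", "Female", "Male", "Male", "Female"),
  Energetic = c(3, 3, 5, 2, 1),
  Weekly_Apple = c("No", "No", "No", "No", "No"),
  Weekly_Banana = c("Yes", "Yes", "Yes", "No", "No"))

result <- df %>%
  group_by(Gender) %>%
  # count the "Yes" answers in every Weekly_* column
  summarise(across(starts_with("Weekly"), ~ sum(.x == "Yes"))) %>%
  # long form: one row per gender/fruit pair
  pivot_longer(-Gender, names_to = "key", values_to = "val") %>%
  # wide form: one column per gender
  pivot_wider(names_from = Gender, values_from = val)

result
```

Because the columns are selected with starts_with("Weekly"), any further Weekly_* columns in the real data would be picked up without changing the code.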

Another base R version, with tapply:

t(sapply(names(df)[3:4], function(nm) with(df, tapply(df[[nm]]=="Yes", Gender,sum))))
#               Female Male
#Weekly_Apple       0    0
#Weekly_Banana      2    1

Or with split:

sapply(split(df[3:4], df$Gender), function(x) colSums(x == "Yes"))

Or a variation of it:

sapply(split(as.data.frame(df[3:4] == "Yes"), df$Gender), colSums)
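
The aggregate() attempt from the question can also be made to work. It fails as written because FUN = n() calls dplyr's n() immediately instead of passing a function, and n() is only meant to be used inside dplyr verbs. Here is a minimal sketch with an anonymous function that counts the "Yes" answers; the names counts and result are illustrative.

```r
# Sample data from the question
df <- data.frame(
  Gender = c("Female", "Female", "Male", "Male", "Female"),
  Energetic = c(3, 3, 5, 2, 1),
  Weekly_Apple = c("No", "No", "No", "No", "No"),
  Weekly_Banana = c("Yes", "Yes", "Yes", "No", "No"))

# FUN must be a function; here it counts the "Yes" answers in each group
counts <- aggregate(x = df[c("Weekly_Apple", "Weekly_Banana")],
                    by = df["Gender"],
                    FUN = function(x) sum(x == "Yes"))

# Put the fruits in rows and the genders in columns
result <- t(as.matrix(counts[-1]))
colnames(result) <- as.character(counts$Gender)
result
```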

