简体   繁体   中英

How to aggregate R dataframe of one column based on values of another

My dataframe is as follows. (similar, there are actually many more rows and columns)

      Gender Energetic   Weekly_Apple   Weekly_Banana
1   Female        3           No           Yes
2   Female        3           No           Yes
3   Male          5           No           Yes
4   Male          2           No            No
5   Female        1           No            No

I want brief code that outputs the following, based on aggregating the "Yes" responses :

        Male        Female
Apples    0           0                
Bananas   1           2

The number of apples that each gender eats =0. 1 male & 2 females eats apple.

I have tried the following:

count(original_data, c("Gender","Weekly_Apple"))
count(original_data, c("Gender","Weekly_Banana"))
count(original_data, c("Gender","Weekly_Grape"))
count(original_data, c("Gender","Weekly_PineApple"))

aggregate(x = original_data[c("Weekly_Apple", 
                          "Weekly_Banana",
                          "Weekly_Grape")],
                   by = original_data[c("Gender")],
                   FUN = n())

As suggested by NelsonGon, I have replaced df1 <- t(df1) with a tidyr::crossing(df1) .

library(dplyr)    
df<-data.frame(
  Gender=c("Female", "Female", "Male", "Male", "Female"), 
  Energetic =c(3,3,5,2,1), 
  Weekly_Apple = c("No", "No", "No", "No", "No"), 
  Weekly_Banana = c("Yes", "Yes", "Yes", "No", "No"))

df1 <- df %>% 
  group_by(Gender) %>% 
  summarise(
    Apples = sum(Weekly_Apple=="Yes"), 
    Bananas = sum(Weekly_Banana =="Yes")
  )

df1 <- tidyr::crossing(df1)

One data.table possibility could be:

dcast(variable ~ Gender, 
      value.var = "value", 
      fun = function(x) sum(x == "Yes"), 
      data = melt(df[-2], id.vars = "Gender"))

       variable Female Male
1  Weekly_Apple      0    0
2 Weekly_Banana      2    1

you could use base R:

table(reshape(cbind(df,id=1:nrow(df)),3:4,idvar = "id",dir="long",sep="_")[-(2:3)])[,,'Yes']
        time
Gender   Apple Banana
  Female     0      2
  Male       0      1

or even

xtabs(Weekly~time+Gender,transform(reshape(cbind(df,id=1:nrow(df)),3:4,idvar = "id",dir="long",sep="_"),Weekly=Weekly=="Yes"))

        Gender
time     Female Male
  Apple       0    0
  Banana      2    1

A dplyr-tidyr alternative:

    df %>% 
  group_by(Gender) %>% 
   summarise_at(vars(contains("Weekly")), function(x) sum(x=="Yes")) %>% 
   tidyr::gather(key, val , -Gender) %>% 
   tidyr::spread(Gender, val)
# A tibble: 2 x 3
  key           Female  Male
  <chr>          <int> <int>
1 Weekly_Apple       0     0
2 Weekly_Banana      2     1

Data:

df <-  structure(list(Gender = structure(c(1L, 1L, 2L, 2L, 1L), .Label = c("Female", 
    "Male"), class = "factor"), Energetic = c(3, 3, 5, 2, 1), Weekly_Apple = structure(c(1L, 
    1L, 1L, 1L, 1L), .Label = "No", class = "factor"), Weekly_Banana = structure(c(2L, 
    2L, 2L, 1L, 1L), .Label = c("No", "Yes"), class = "factor")), class = "data.frame", row.names = c(NA, 
    -5L))

Another base R version with tapply

t(sapply(names(df)[3:4], function(nm) with(df, tapply(df[[nm]]=="Yes", Gender,sum))))
#               Female Male
#Weekly_Apple       0    0
#Weekly_Banana      2    1

Or with split

sapply(split(df[3:4], df$Gender), function(x) colSums(x == "Yes"))

or a variation of it

sapply(split(as.data.frame(df[3:4] == "Yes"), df$Gender), colSums)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM