
R: adding rows in a data frame depending on another variable

I'm trying to do a kind of conditional rowSums.

I have a data frame with four columns containing 1's and 0's, and another variable that indicates which columns should be added to make the row totals.

For example:

df <- matrix(rbinom(40, 1, 0.5), ncol = 4)
df <- as.data.frame(df)
df$group <- sample(c('12', '123', '1234'), 10, replace = TRUE)

If the group is 12, then columns V1:V2 should be added; if 123, then V1:V3; and if 1234, then V1:V4.

I've tried a labour-intensive approach:

df$total12 <- rowSums(df[,c('V1', 'V2')])
df$total123 <- rowSums(df[,c('V1', 'V2', 'V3')])
df$total1234 <- rowSums(df[,c('V1', 'V2', 'V3', 'V4')])
df$total <- ifelse(df$group == '12', df$total12,
                   ifelse(df$group == '123', df$total123, df$total1234))

Is there a simpler way to do this?

Here is an option. We build a row/column index matrix by splitting 'group' into its digits, extract the values of 'df' at those positions, and sum them grouped by the row index.

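# Split each group code into its digits; each digit is a column number
lst <- strsplit(df$group, "")
# Two-column index matrix: row number, column number of every cell to add
i1 <- cbind(rep(seq_len(nrow(df)), lengths(lst)), as.integer(unlist(lst)))
# df[-5] drops 'group' (assumed to be column 5). tapply() gives one sum per row;
# ave() would return one value per selected cell, which does not fit the data frame
df$total <- as.vector(tapply(df[-5][i1], i1[,1], FUN = sum))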
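To make the indexing step easier to follow, here is a small sketch on a hand-made data frame, so the totals can be checked by eye. The name `toy` and its values are made up for illustration, and `tapply()` is used here to get exactly one total per row.

```r
# Toy data with fixed values so the result can be checked by hand.
toy <- data.frame(V1 = c(1, 0, 1),
                  V2 = c(1, 1, 0),
                  V3 = c(0, 1, 1),
                  V4 = c(1, 1, 1),
                  group = c("12", "123", "1234"),
                  stringsAsFactors = FALSE)

# Each character of 'group' is a column number.
lst <- strsplit(toy$group, "")

# Two-column matrix: row number, column number of every cell to add.
i1 <- cbind(rep(seq_len(nrow(toy)), lengths(lst)), as.integer(unlist(lst)))

# Pull the selected cells and add them up per row.
vals <- as.matrix(toy[paste0("V", 1:4)])[i1]
toy$total <- as.vector(tapply(vals, i1[, 1], sum))
toy$total
```

Row 1 adds V1 and V2, row 2 adds V1 to V3, and row 3 adds all four columns, so the totals come out as 2, 2 and 3.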

Here is another option using the switch function. This is more readable and easier to extend than a series of nested ifelse calls.

df$total <- sapply(seq_along(df$group), function(i) {
  switch(df$group[i],
         "12"   = rowSums(df[i, c('V1', 'V2')]),
         "123"  = rowSums(df[i, c('V1', 'V2', 'V3')]),
         "1234" = rowSums(df[i, c('V1', 'V2', 'V3', 'V4')]))
})

Basically, this loops through the elements of df$group and selects the matching formula for each row. Because it evaluates one row at a time, it is slower than a vectorized approach, but if your dataset isn't too long, performance should be acceptable.
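If the row-by-row loop becomes a bottleneck, one possible vectorized variant is to turn the group codes into a logical mask and multiply it into the 0/1 columns. This is only a sketch, not one of the approaches above: the names `value_cols` and `mask` are made up here, the seed is arbitrary, and it assumes every character in 'group' is a single-digit column number.

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible
df <- as.data.frame(matrix(rbinom(40, 1, 0.5), ncol = 4))
df$group <- sample(c("12", "123", "1234"), 10, replace = TRUE)

value_cols <- paste0("V", 1:4)

# One row per observation, one column per V column; TRUE where the column's
# number appears in that row's group code.
mask <- t(vapply(strsplit(df$group, ""),
                 function(ch) seq_along(value_cols) %in% as.integer(ch),
                 logical(length(value_cols))))

# Zero out the cells that are not selected, then sum each row.
df$total <- rowSums(as.matrix(df[value_cols]) * mask)
```

Adding a new group code such as '134' needs no extra branch here, since the mask is derived from the code itself.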

