简体   繁体   中英

Calculate mean of every 2nd column in a dataframe

I would like to calculate the row-wise means of every second column in my dataset, meaning: Average of columns A and B, C and D, E and F. My data look like this:

|A|B|C|D|E|F|
|-|-|-|-|-|-|
|0|1|1|1|0|1|
|0|0|1|1|0|0|
|1|1|0|0|0|1|
|0|1|1|1|1|1|
|1|1|1|1|0|1|

As a condition I want to include that both values should be greater 0 to compute the mean:

data$meanAB <-if_else(A > 0 & B > 0, rowMeans(data[,1:2]), 0)

I manage to do this for two columns, but I would like a solution with new columns added to my dataframe with the rowwise means of every 2 columns. I want to end up with a table like this:

|A|B|C|D|E|F|meanAB|meanCD|meanEF|
|-|-|-|-|-|-|-|-|-|
|0|1|1|1|0|1|0|1|0|
|0|0|1|1|0|0|0|1|0|
|1|1|0|0|0|1|1|0|0|
|0|1|1|1|1|1|0|1|1|
|1|1|1|1|0|1|1|1|0|
|0|1|1|1|0|1|0|1|0|
|0|0|1|1|0|0|0|1|0|
|1|1|0|0|0|1|1|0|0|
|0|1|1|1|1|1|0|1|1|
|1|1|1|1|0|1|1|1|0|

Thanks in advance!

Base R option using split.default -

cbind(df, sapply(split.default(df, ceiling(seq_along(df)/2)), function(x) {
  ifelse(x[1] > 0 & x[2] > 0, rowMeans(x), 0)
}))

#  A B C D E F 1 2 3
#1 0 1 1 1 0 1 0 1 0
#2 0 0 1 1 0 0 0 1 0
#3 1 1 0 0 0 1 1 0 0
#4 0 1 1 1 1 1 0 1 1
#5 1 1 1 1 0 1 1 1 0

where column 1 is mean of A & B, column 2 is mean of C & D and so on.

Here is a way. It uses a cumsum trick to get groups of columns, two by two. Then it loops through the split data and computes the row means. Finally, it combines the output with the original input data.

cs <- cumsum(seq_len(ncol(data)) %% 2)
res <- lapply(split(as.list(data), cs), \(x){
  rowMeans(as.data.frame(x))
})
res <- do.call(cbind, res)
colnames(res) <- paste0("mean", tapply(names(data), cs, paste, collapse = ""))
cbind(data, res)
#  A B C D E F meanAB meanCD meanEF
#1 0 1 1 1 0 1    0.5      1    0.5
#2 0 0 1 1 0 0    0.0      1    0.0
#3 1 1 0 0 0 1    1.0      0    0.5
#4 0 1 1 1 1 1    0.5      1    1.0
#5 1 1 1 1 0 1    1.0      1    0.5

Data in dput format

data <-
structure(list(A = c(0L, 0L, 1L, 0L, 1L), B = c(1L, 0L, 1L, 1L, 
1L), C = c(1L, 1L, 0L, 1L, 1L), D = c(1L, 1L, 0L, 1L, 1L), E = c(0L, 
0L, 0L, 1L, 0L), F = c(1L, 0L, 1L, 1L, 1L)), row.names = c(NA, 
-5L), class = "data.frame")

A tidyverse solution would be. To me this is very short and neat.

library(dplyr)
#> 
#> Attache Paket: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

data <-
  structure(
    list(
      A = c(0L, 0L, 1L, 0L, 1L),
      B = c(1L, 0L, 1L, 1L, 1L),
      C = c(1L, 1L, 0L, 1L, 1L),
      D = c(1L, 1L, 0L, 1L, 1L),
      E = c(0L, 0L, 0L, 1L, 0L),
      F = c(1L, 0L, 1L, 1L, 1L)
    ),
    row.names = c(NA, -5L),
    class = "data.frame"
  )

data %>%
  rowwise() %>%
  mutate(meanAB = mean(c(A, B)),
         meanCD = mean(c(C, D)),
         meanEF = mean(c(E, F)))
#> # A tibble: 5 x 9
#> # Rowwise: 
#>       A     B     C     D     E     F meanAB meanCD meanEF
#>   <int> <int> <int> <int> <int> <int>  <dbl>  <dbl>  <dbl>
#> 1     0     1     1     1     0     1    0.5      1    0.5
#> 2     0     0     1     1     0     0    0        1    0  
#> 3     1     1     0     0     0     1    1        0    0.5
#> 4     0     1     1     1     1     1    0.5      1    1  
#> 5     1     1     1     1     0     1    1        1    0.5

Created on 2021-10-27 by the reprex package (v2.0.1)

We can use

data[paste0('mean', 1:3)] <-  sapply(split.default(df, as.integer(gl(ncol(df), 
      2, ncol(df)))), function(x) {
     i1 <- rowSums(x > 0) == 2
   replace(rowMeans(x), !i1, 0)})

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM