Calculate mean of every 2nd column in a dataframe

Question

I would like to calculate the row-wise means of every second column in my dataset, meaning: Average of columns A and B, C and D, E and F. My data look like this:

|A|B|C|D|E|F|
|-|-|-|-|-|-|
|0|1|1|1|0|1|
|0|0|1|1|0|0|
|1|1|0|0|0|1|
|0|1|1|1|1|1|
|1|1|1|1|0|1|

As a condition, both values should be greater than 0 for the mean to be computed; otherwise the result should be 0:

data$meanAB <- if_else(data$A > 0 & data$B > 0, rowMeans(data[, 1:2]), 0)

I managed to do this for two columns, but I would like a solution that adds new columns to my dataframe holding the row-wise means of every pair of columns. I want to end up with a table like this:

|A|B|C|D|E|F|meanAB|meanCD|meanEF|
|-|-|-|-|-|-|-|-|-|
|0|1|1|1|0|1|0|1|0|
|0|0|1|1|0|0|0|1|0|
|1|1|0|0|0|1|1|0|0|
|0|1|1|1|1|1|0|1|1|
|1|1|1|1|0|1|1|1|0|

Thanks in advance!

Answer 1

A base R option using `split.default` (here `df` is the data frame from the question):

cbind(df, sapply(split.default(df, ceiling(seq_along(df)/2)), function(x) {
  ifelse(x[1] > 0 & x[2] > 0, rowMeans(x), 0)
}))

#  A B C D E F 1 2 3
#1 0 1 1 1 0 1 0 1 0
#2 0 0 1 1 0 0 0 1 0
#3 1 1 0 0 0 1 1 0 0
#4 0 1 1 1 1 1 0 1 1
#5 1 1 1 1 0 1 1 1 0

Here, column 1 is the mean of A and B, column 2 is the mean of C and D, and so on.
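The result above has columns named 1, 2 and 3. As a minimal sketch of the same `split.default` idea, the following builds names such as `meanAB` from each pair of column names. The object names `grp`, `pairs`, `means` and `result` are illustrative and not part of the original answer, and the data frame is rebuilt here from the values in the question.

```r
# Example data with the same values as in the question
data <- data.frame(
  A = c(0L, 0L, 1L, 0L, 1L), B = c(1L, 0L, 1L, 1L, 1L),
  C = c(1L, 1L, 0L, 1L, 1L), D = c(1L, 1L, 0L, 1L, 1L),
  E = c(0L, 0L, 0L, 1L, 0L), F = c(1L, 0L, 1L, 1L, 1L)
)

# Group id per column: 1, 1, 2, 2, 3, 3
grp <- ceiling(seq_along(data) / 2)

# One two-column data frame per group
pairs <- split.default(data, grp)

# Row mean where both values are > 0, otherwise 0
means <- sapply(pairs, function(x) {
  ifelse(x[[1]] > 0 & x[[2]] > 0, rowMeans(x), 0)
})

# Name each new column after the pair it summarises, e.g. "meanAB"
colnames(means) <- paste0(
  "mean",
  sapply(pairs, function(x) paste(names(x), collapse = ""))
)

result <- cbind(data, means)
result
```

This sketch assumes an even number of columns; with an odd number, the last group would hold a single column and `x[[2]]` would fail.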

Answer 2

Here is a way. It uses a cumsum trick to get groups of columns, two by two. Then it loops through the split data and computes the row means. Finally, it combines the output with the original input data.

cs <- cumsum(seq_len(ncol(data)) %% 2)
res <- lapply(split(as.list(data), cs), \(x){
  rowMeans(as.data.frame(x))
})
res <- do.call(cbind, res)
colnames(res) <- paste0("mean", tapply(names(data), cs, paste, collapse = ""))
cbind(data, res)
#  A B C D E F meanAB meanCD meanEF
#1 0 1 1 1 0 1    0.5      1    0.5
#2 0 0 1 1 0 0    0.0      1    0.0
#3 1 1 0 0 0 1    1.0      0    0.5
#4 0 1 1 1 1 1    0.5      1    1.0
#5 1 1 1 1 0 1    1.0      1    0.5

Data in `dput` format

data <-
structure(list(A = c(0L, 0L, 1L, 0L, 1L), B = c(1L, 0L, 1L, 1L, 
1L), C = c(1L, 1L, 0L, 1L, 1L), D = c(1L, 1L, 0L, 1L, 1L), E = c(0L, 
0L, 0L, 1L, 0L), F = c(1L, 0L, 1L, 1L, 1L)), row.names = c(NA, 
-5L), class = "data.frame")
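Note that the output above contains plain row means (for example 0.5 for A = 0, B = 1) and does not apply the question's condition that both values be greater than 0. One possible way to add that condition to the same cumsum-based grouping is sketched below; the helper name `both_positive` is hypothetical, and `function(x)` is used in place of `\(x)` only so that the sketch also runs on R versions before 4.1.

```r
# Example data with the same values as the dput output above
data <- data.frame(
  A = c(0L, 0L, 1L, 0L, 1L), B = c(1L, 0L, 1L, 1L, 1L),
  C = c(1L, 1L, 0L, 1L, 1L), D = c(1L, 1L, 0L, 1L, 1L),
  E = c(0L, 0L, 0L, 1L, 0L), F = c(1L, 0L, 1L, 1L, 1L)
)

# Group id per column: 1, 1, 2, 2, 3, 3
cs <- cumsum(seq_len(ncol(data)) %% 2)

res <- lapply(split(as.list(data), cs), function(x) {
  x <- as.data.frame(x)
  # TRUE for rows where every value in the pair is > 0
  both_positive <- rowSums(x > 0) == ncol(x)
  # Row mean where the condition holds, otherwise 0
  ifelse(both_positive, rowMeans(x), 0)
})
res <- do.call(cbind, res)
colnames(res) <- paste0("mean", tapply(names(data), cs, paste, collapse = ""))

result <- cbind(data, res)
result
```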

Answer 3

Here is a tidyverse solution, which I find short and neat.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

data <-
  structure(
    list(
      A = c(0L, 0L, 1L, 0L, 1L),
      B = c(1L, 0L, 1L, 1L, 1L),
      C = c(1L, 1L, 0L, 1L, 1L),
      D = c(1L, 1L, 0L, 1L, 1L),
      E = c(0L, 0L, 0L, 1L, 0L),
      F = c(1L, 0L, 1L, 1L, 1L)
    ),
    row.names = c(NA, -5L),
    class = "data.frame"
  )

data %>%
  rowwise() %>%
  mutate(meanAB = mean(c(A, B)),
         meanCD = mean(c(C, D)),
         meanEF = mean(c(E, F)))
#> # A tibble: 5 x 9
#> # Rowwise: 
#>       A     B     C     D     E     F meanAB meanCD meanEF
#>   <int> <int> <int> <int> <int> <int>  <dbl>  <dbl>  <dbl>
#> 1     0     1     1     1     0     1    0.5      1    0.5
#> 2     0     0     1     1     0     0    0        1    0  
#> 3     1     1     0     0     0     1    1        0    0.5
#> 4     0     1     1     1     1     1    0.5      1    1  
#> 5     1     1     1     1     0     1    1        1    0.5

Created on 2021-10-27 by the reprex package (v2.0.1)
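The output above shows unconditional means. If the question's condition (both values greater than 0, otherwise 0) is wanted, one option is a vectorised `if_else()` inside `mutate()`, which also makes `rowwise()` unnecessary. This is only a sketch: it still names each pair by hand, and the data frame is rebuilt from the values in the question.

```r
library(dplyr)

# Example data with the same values as in the question
data <- data.frame(
  A = c(0L, 0L, 1L, 0L, 1L), B = c(1L, 0L, 1L, 1L, 1L),
  C = c(1L, 1L, 0L, 1L, 1L), D = c(1L, 1L, 0L, 1L, 1L),
  E = c(0L, 0L, 0L, 1L, 0L), F = c(1L, 0L, 1L, 1L, 1L)
)

# if_else() is vectorised over rows, so no rowwise() is needed.
# Both branches are doubles, which if_else() requires to be compatible.
result <- data %>%
  mutate(
    meanAB = if_else(A > 0 & B > 0, (A + B) / 2, 0),
    meanCD = if_else(C > 0 & D > 0, (C + D) / 2, 0),
    meanEF = if_else(E > 0 & F > 0, (E + F) / 2, 0)
  )
result
```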

Answer 4

We can use

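# `data` is the data frame from the question; columns are grouped in pairs
data[paste0('mean', 1:3)] <- sapply(split.default(data, as.integer(gl(ncol(data),
      2, ncol(data)))), function(x) {
  i1 <- rowSums(x > 0) == 2       # TRUE where both values in the pair are > 0
  replace(rowMeans(x), !i1, 0)    # row mean where the condition holds, else 0
})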
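As an alternative to splitting the data frame, the same conditional means can be computed with plain matrix arithmetic on the odd and even column positions. This is a different technique from the answer above, offered only as a sketch; the names `first`, `second`, `a`, `b`, `means` and `result` are illustrative, and the code assumes an even number of columns.

```r
# Example data with the same values as in the question
data <- data.frame(
  A = c(0L, 0L, 1L, 0L, 1L), B = c(1L, 0L, 1L, 1L, 1L),
  C = c(1L, 1L, 0L, 1L, 1L), D = c(1L, 1L, 0L, 1L, 1L),
  E = c(0L, 0L, 0L, 1L, 0L), F = c(1L, 0L, 1L, 1L, 1L)
)

first  <- seq(1, ncol(data), by = 2)  # positions of A, C, E
second <- first + 1                   # positions of B, D, F

a <- as.matrix(data[first])
b <- as.matrix(data[second])

# Element-wise mean of each pair, then zero out entries failing the condition
means <- (a + b) / 2
means[!(a > 0 & b > 0)] <- 0

# Name each new column after the pair it summarises, e.g. "meanAB"
colnames(means) <- paste0("mean", names(data)[first], names(data)[second])

result <- cbind(data, means)
result
```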


solution1 1 2021-10-27 14:07:24
solution2 0 ACCEPTED 2021-10-27 14:06:52
solution3 0 2021-10-27 15:37:55
solution4 0 2021-10-27 16:37:49
