R tibble: aggregating by row across specific columns, by column groups

Question

I have data on the levels of biological compounds in test patients, who are divided into groups according to which drug they were administered. That is, we have:

Columns: drugs (or groups) A, B and C, each with 3 patients. The patients in A are denoted A1, A2, A3; the patients in B are denoted B1, B2, B3; and so on.
Rows: the biological compounds we are monitoring, such as Coronin, Dystrophin and Tubulin (randomly Googled protein names), and so on.

So we have a tibble like this (all values in the tibble are floating-point numbers):

| compound  | A1 | A2 | A3 | B1 ... C3|
|-----------|----|----|----|---- ... --|
| Coronin   |
| Dystrophin|
| Gloverin  |
| keratin   |
| Tubulin   |

For each compound, I want to compute the mean of each group as a new column, like so:

| compound   | A1 | A2 | A3 | B1 ... C3 | mean_A | mean_B | mean_C |
|------------|----|----|----|-----------|--------|--------|--------|
| Coronin    | 1  | 2  | 3  | ...       | 2      | ...    | ...    |
| Dystrophin | 4  | 5  | 6  | ...       | 5      | ...    | ...    |
| Gloverin   | ...|
| keratin    |
| Tubulin    |

The code to do this, with the three groups hard-coded, is:

library(dplyr)

my_tibble <- my_tibble %>% 
  mutate(mean_A = rowMeans(select(., c("A1", "A2", "A3")))) %>%
  mutate(mean_B = rowMeans(select(., c("B1", "B2", "B3")))) %>%
  mutate(mean_C = rowMeans(select(., c("C1", "C2", "C3"))))
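On dplyr 1.1.0 or later, pick() is an alternative to select(., ...) inside mutate() for grabbing a set of columns as a data frame. A minimal sketch, assuming dplyr >= 1.1.0 and using only the A-group values from the example table (Coronin 1, 2, 3; Dystrophin 4, 5, 6):

```r
library(dplyr)   # pick() requires dplyr >= 1.1.0
library(tibble)

# A-group values from the example table; the other groups are omitted
my_tibble <- tibble(
  compound = c("Coronin", "Dystrophin"),
  A1 = c(1, 4),
  A2 = c(2, 5),
  A3 = c(3, 6)
)

# pick() returns the chosen columns of the current data as a tibble,
# so rowMeans() averages across them row by row
my_tibble <- my_tibble %>%
  mutate(mean_A = rowMeans(pick(A1, A2, A3)))

print(my_tibble$mean_A)  # 2 5
```

This still hard-codes one mutate() per group; it only replaces the select(., ...) idiom.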

The question is: I'd like to be able to do this for a dynamically input number of groups (i.e. C, D, E, etc.), where the column-to-group mapping is itself a separate, user-input tibble, say:

| group_name | name1 | name2 | name3 |
|------------|-------|-------|-------|
|      A     |  A1   |  A2   |  A3   |
|      B     |  B1   |  B2   |  B3   |
|     ...    |  ...  |  ...  |  ...  |
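One way to make a mapping tibble like this usable in a loop is to turn it into a named list from group name to sample column names. A minimal sketch, assuming one row per group and NA in any unused name column; groups and group_cols are hypothetical names:

```r
library(tibble)

# Hypothetical user-supplied mapping: one row per group, one column per sample
groups <- tribble(
  ~group_name, ~name1, ~name2, ~name3,
  "A",         "A1",   "A2",   "A3",
  "B",         "B1",   "B2",   "B3",
  "C",         "C1",   "C2",   NA     # hypothetical group with only two samples
)

# Turn each row into a character vector of sample column names,
# dropping NA cells so groups may have different sizes
group_cols <- lapply(seq_len(nrow(groups)), function(i) {
  samples <- unlist(groups[i, -1], use.names = FALSE)
  samples[!is.na(samples)]
})
names(group_cols) <- groups$group_name

str(group_cols)
```

Keying the list by group_name means nothing downstream depends on the groups being called "A", "B", and so on.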

How can I add the mutate() calls iteratively, according to a user-specified number of groups (and the associated sample-to-group mapping)?

Note: the group names "A", "B", etc. are arbitrary (the groups will likely be named after the drug each group was given), so I can't use an iterative approach that relies on them literally being named "A", "B", and so on.

Answer 1

One option is to derive each column's group from its name, split the columns by group, loop over the resulting list with sapply to get the rowMeans of each group, and assign the results to new mean_ columns (three in this example).

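# Group key for each sample column: drop the trailing replicate digit ("g11" -> "g1")
nm1 <- substr(names(df1)[-1], 1, nchar(names(df1)[-1]) - 1)
# split.default() splits the columns (not the rows) into one data frame per group
grp <- split.default(df1[-1], nm1)
# Name the new columns from names(grp) so they line up with sapply()'s column order
df1[paste0("mean_", toupper(names(grp)))] <- sapply(grp, rowMeans)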

df1
#  compound g11 g12 g13 g21 g22 g23 g31 g32 g33  mean_G1  mean_G2  mean_G3
#1        A   7   3   9   8   8   1   3   7   2 6.333333 5.666667 4.000000
#2        B   3   8   8   1   2   5   1   1   4 6.333333 2.666667 2.000000
#3        C   8   6   7   5   1   4   3   6   3 7.000000 3.333333 4.000000
#4        D   7   9   8   5   5   6   8   7   6 8.000000 5.333333 7.000000
#5        E   2   4   1   5   2   6   6   1   3 2.333333 4.333333 3.333333

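NOTE: This extends to any number of groups, because the groups are derived from the column names rather than hard-coded. It does assume that each sample column is named as its group name followed by a single-character replicate suffix (e.g. g11 belongs to group g1). In the example data below, the 1:3 ranges only control how many example groups and replicates are generated.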
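The answer infers groups from column names. If the groups should instead come from a mapping tibble like the one in the question, a plain loop over its rows works for any number of groups and any group names. A minimal sketch; df, groups and the drug-style column names (aspirin1, placebo1, ...) are hypothetical:

```r
library(tibble)

# Hypothetical data: compounds in rows, sample columns named after drugs
df <- tibble(
  compound = c("Coronin", "Dystrophin"),
  aspirin1 = c(1, 4), aspirin2 = c(2, 5), aspirin3 = c(3, 6),
  placebo1 = c(2, 0), placebo2 = c(4, 0), placebo3 = c(6, 3)
)

# Hypothetical user-supplied mapping: one row per group
groups <- tribble(
  ~group_name, ~name1,     ~name2,     ~name3,
  "aspirin",   "aspirin1", "aspirin2", "aspirin3",
  "placebo",   "placebo1", "placebo2", "placebo3"
)

# Add one mean_<group> column per mapping row; the number of groups is
# simply nrow(groups), and NA cells in the mapping are skipped
for (i in seq_len(nrow(groups))) {
  cols <- unlist(groups[i, -1], use.names = FALSE)
  cols <- cols[!is.na(cols)]
  df[[paste0("mean_", groups$group_name[i])]] <- rowMeans(df[cols])
}

print(df)
```

Here mean_aspirin is 2 and 5 and mean_placebo is 4 and 1 for the two compounds.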

data

set.seed(24)
# 5 compounds x 9 sample columns (groups g1-g3, 3 replicates each), values 1-9
df1 <- cbind(compound = LETTERS[1:5],
             as.data.frame(matrix(sample(1:9, 5 * 9, replace = TRUE),
                                  nrow = 5, ncol = 9,
                                  dimnames = list(NULL,
                                    paste0(rep(paste0("g", 1:3), each = 3), 1:3)))))
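Another option, not taken by the answer, is to reshape to long format, attach each sample's group from the mapping with a join, average per compound and group, and widen back. A minimal sketch, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0; df, groups, sample_groups and the values are hypothetical:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical wide data and mapping, shaped as in the question
df <- tibble(
  compound = c("Coronin", "Dystrophin"),
  A1 = c(1, 4), A2 = c(2, 5), A3 = c(3, 6),
  B1 = c(2, 0), B2 = c(4, 0), B3 = c(6, 3)
)
groups <- tribble(
  ~group_name, ~name1, ~name2, ~name3,
  "A",         "A1",   "A2",   "A3",
  "B",         "B1",   "B2",   "B3"
)

# Mapping in long form: one row per (group_name, sample)
sample_groups <- groups %>%
  pivot_longer(-group_name, values_to = "sample", values_drop_na = TRUE) %>%
  select(group_name, sample)

# Long data labelled with groups, averaged per compound and group,
# then spread back to one mean_<group> column per group
group_means <- df %>%
  pivot_longer(-compound, names_to = "sample", values_to = "value") %>%
  inner_join(sample_groups, by = "sample") %>%
  group_by(compound, group_name) %>%
  summarise(mean = mean(value), .groups = "drop") %>%
  pivot_wider(names_from = group_name, values_from = mean,
              names_prefix = "mean_")

result <- left_join(df, group_means, by = "compound")
print(result)
```

The long format avoids building one mutate() per group; the trade-off is an extra reshape and join, which may matter for very wide data.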

solution1
0 2019-06-06 04:58:55
