
R tibble: aggregating by row across specific columns, by column groups

I have data on the levels of biological compounds in test patients, who are divided into groups according to the drug they were administered. That is, we have:

  • Columns: drugs (or groups) A, B and C, where each group has 3 patients (patients in A are denoted A1, A2, A3; patients in B are denoted B1, B2, B3; and so on).
  • Rows: the biological compounds being monitored, such as Coronin, Dystrophin, Tubulin (randomly Googled protein names), and so on.

So we have a tibble like this (all values are floats):

| compound   | A1 | A2 | A3 | B1 ... C3 |
|------------|----|----|----|-----------|
| Coronin    |    |    |    |           |
| Dystrophin |    |    |    |           |
| Gloverin   |    |    |    |           |
| Keratin    |    |    |    |           |
| Tubulin    |    |    |    |           |

For each compound, I wish to compute the means of each group, as a new column, like so:

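| compound   | A1  | A2  | A3  | B1 ... C3 | mean_A | mean_B | mean_C |
|------------|-----|-----|-----|-----------|--------|--------|--------|
| Coronin    |  1  |  2  |  3  |    ...    |   2    |  ...   |  ...   |
| Dystrophin |  4  |  5  |  6  |    ...    |   5    |  ...   |  ...   |
| Gloverin   | ... | ... | ... |    ...    |  ...   |  ...   |  ...   |
| Keratin    | ... | ... | ... |    ...    |  ...   |  ...   |  ...   |
| Tubulin    | ... | ... | ... |    ...    |  ...   |  ...   |  ...   |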

The code to do this is:

my_tibble <- my_tibble %>% 
  mutate(mean_A = rowMeans(select(., c("A1", "A2", "A3")))) %>%
  mutate(mean_B = rowMeans(select(., c("B1", "B2", "B3")))) %>%
  mutate(mean_C = rowMeans(select(., c("C1", "C2", "C3"))))

The question is: I'd like to be able to do this for a dynamically input number of groups (e.g. C, D, E, and so on), where the column-to-group mapping is itself a separate, user-supplied tibble, say:

| group_name | name1 | name2 | name3 |
|------------|-------|-------|-------|
|      A     |  A1   |  A2   |  A3   |
|      B     |  B1   |  B2   |  B3   |
...
and so on

How might I iteratively add mutate verbs, according to a user-specified number of groups (and associated sample-to-group names)?

Note: the group names "A", "B", etc. are arbitrary (each group is likely to be named after the drug it was given), so I don't want an iterative operation that relies on them literally being named "A", "B", and so on.

One option is to split the data columns by group, taking each column name minus its last character as the group label, then apply rowMeans to each piece with sapply and assign the results to new columns:

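# group label for each data column: the column name without its last character
nm1 <- substr(names(df1)[-1], 1, nchar(names(df1)[-1]) - 1)
# one column of row means per group (split.default orders the groups by sorted label)
grp_means <- sapply(split.default(df1[-1], nm1), rowMeans)
# name the new columns from the result itself so names and values stay aligned
df1[paste0("mean_", toupper(colnames(grp_means)))] <- grp_means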

df1
#  compound g11 g12 g13 g21 g22 g23 g31 g32 g33  mean_G1  mean_G2  mean_G3
#1        A   7   3   9   8   8   1   3   7   2 6.333333 5.666667 4.000000
#2        B   3   8   8   1   2   5   1   1   4 6.333333 2.666667 2.000000
#3        C   8   6   7   5   1   4   3   6   3 7.000000 3.333333 4.000000
#4        D   7   9   8   5   5   6   8   7   6 8.000000 5.333333 7.000000
#5        E   2   4   1   5   2   6   6   1   3 2.333333 4.333333 3.333333

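NOTE: This extends to any number of groups. The only thing to change is the 1:3 used to build the column names in the example data below. It does assume that each column name is the group label followed by a single-character sample suffix.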
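To illustrate the note, here is a minimal sketch of the same split-and-average approach with four groups instead of three. The data frame `df2`, its seed, and its values are made up for this example, and the sketch assumes the same naming scheme (group label plus a one-character sample suffix).

```r
# Hypothetical data: 4 groups (g1..g4) with 3 samples each, values made up
set.seed(1)
df2 <- cbind(
  compound = LETTERS[1:5],
  as.data.frame(matrix(
    sample(1:9, 5 * 12, replace = TRUE), nrow = 5, ncol = 12,
    dimnames = list(NULL, paste0(rep(paste0("g", 1:4), each = 3), 1:3))
  ))
)

# Group label = column name minus its last character (assumes a 1-character suffix)
grp <- substr(names(df2)[-1], 1, nchar(names(df2)[-1]) - 1)

# One column of row means per group; result columns are named after the groups
grp_means <- sapply(split.default(df2[-1], grp), rowMeans)
df2[paste0("mean_", toupper(colnames(grp_means)))] <- grp_means

names(df2)
```

Nothing in the averaging step refers to the number of groups; only the line that builds the example column names changed.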

data

set.seed(24)
df1 <- cbind(compound = LETTERS[1:5], as.data.frame(matrix(sample(1:9, 5 * 9,
      replace = TRUE), nrow = 5, ncol = 9, dimnames = list(NULL,
        paste0(rep(paste0("g", 1:3), each = 3), 1:3)))))
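If the group names are arbitrary and the sample-to-group assignment comes from a separate mapping table, as described in the question, the column names cannot be parsed for a prefix. A minimal base R sketch of a mapping-driven variant follows. The names `my_data` and `group_map`, the drug labels, and all values are hypothetical; the example uses plain data frames so it runs without extra packages, and the same indexing should carry over to tibbles.

```r
# Example data (values made up for illustration)
my_data <- data.frame(
  compound = c("Coronin", "Dystrophin"),
  A1 = c(1, 4), A2 = c(2, 5), A3 = c(3, 6),
  B1 = c(2, 8), B2 = c(4, 10), B3 = c(6, 12),
  stringsAsFactors = FALSE
)

# Hypothetical user-supplied mapping: one row per group, one column per sample
group_map <- data.frame(
  group_name = c("drugX", "drugY"),
  name1 = c("A1", "B1"),
  name2 = c("A2", "B2"),
  name3 = c("A3", "B3"),
  stringsAsFactors = FALSE
)

# For each group, look up its sample columns and add their row means
for (i in seq_len(nrow(group_map))) {
  cols <- as.character(unlist(group_map[i, -1]))
  new_col <- paste0("mean_", group_map$group_name[i])
  my_data[[new_col]] <- rowMeans(my_data[cols])
}

my_data
```

The loop reads only the mapping table, so adding a group means adding a row to `group_map`; the code makes no assumption about how groups or samples are named.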
