Normalise all numeric columns in a dataframe by relevant control group with R and dplyr

I have a large data set with multiple observations of different conditions:

Condition = c('A', 'A', 'B', 'B', 'C', 'C', 'D', 'D')
Control = c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B')
Value_1 = 1:4
Value_2 = 2 * 1:4
Value_3 = 3 * 1:4

t = data.frame(Condition, Control, Value_1, Value_2, Value_3)

  Condition Control Value_1 Value_2 Value_3
1         A       A       1       2       3
2         A       A       2       4       6
3         B       A       3       6       9
4         B       A       4       8      12
5         C       B       1       2       3
6         C       B       2       4       6
7         D       B       3       6       9
8         D       B       4       8      12

I want to divide each value column by the mean of that column in the row's specified control group. The desired output is:

  Condition Control Value_1 Value_2 Value_3
  <chr>     <chr>     <dbl>   <dbl>   <dbl>
1 A         A         0.667   0.667   0.667
2 A         A         1.33    1.33    1.33 
3 B         A         2       2       2    
4 B         A         2.67    2.67    2.67 
5 C         B         0.286   0.286   0.286
6 C         B         0.571   0.571   0.571
7 D         B         0.857   0.857   0.857
8 D         B         1.14    1.14    1.14 

If I only had one control group and specified the columns, then I would use:

library(dplyr)

t %>% group_by(Control) %>%
   mutate(Value_1 = Value_1/Value_1[Condition == 'A'])

However, this will not work here for several reasons. I have multiple values for each control group, which need to be averaged first. I also have multiple controls, and need to specify that the relevant control for each row is the one given in that row's Control column (not just A in every case). I also want to apply the normalisation to every numeric column. I know that mutate_if(is.numeric, .fun) can be used to select numeric columns, but I do not know how to write a generic function that performs the normalisation to a control group.

With this small dataset it would be easiest to split it by control group and write out the mutation for each column manually. However, I am looking for a solution that can handle larger datasets with arbitrary numbers of variables and control groups.

You can make a table of the means for each condition, then left join that to your data. Now that you have the means and the values in the same table, you just need to transmute to do the division.

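library(dplyr)

# Mean of every numeric column for each condition
cond_means <- 
  t %>% 
    group_by(Condition) %>% 
    summarise_if(is.numeric, mean)

# Attach the means of each row's *control* condition, then divide.
# Shared column names get the default join suffixes: .x = value, .y = mean.
t %>% 
  left_join(cond_means, by = c(Control = 'Condition')) %>% 
  transmute(Condition, 
            Control,
            Value_1 = Value_1.x/Value_1.y,
            Value_2 = Value_2.x/Value_2.y,
            Value_3 = Value_3.x/Value_3.y)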

#   Condition Control   Value_1   Value_2   Value_3
# 1         A       A 0.6666667 0.6666667 0.6666667
# 2         A       A 1.3333333 1.3333333 1.3333333
# 3         B       A 2.0000000 2.0000000 2.0000000
# 4         B       A 2.6666667 2.6666667 2.6666667
# 5         C       B 0.2857143 0.2857143 0.2857143
# 6         C       B 0.5714286 0.5714286 0.5714286
# 7         D       B 0.8571429 0.8571429 0.8571429
# 8         D       B 1.1428571 1.1428571 1.1428571
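The transmute() call above still names each value column by hand. A minimal sketch of a generalised version, assuming dplyr >= 1.0.0 (for across() and where(), which supersede summarise_if()), passes suffix = c("", ".ctrl") to left_join() so the original columns keep their names and the control means are tagged ".ctrl"; that suffix and the names value_cols, joined and normalised are illustrative choices, not part of the original answer.

```r
library(dplyr)

t <- data.frame(
  Condition = c('A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'),
  Control   = c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
  Value_1 = 1:4,
  Value_2 = 2 * 1:4,
  Value_3 = 3 * 1:4,
  stringsAsFactors = FALSE
)

# Mean of every numeric column for each condition
cond_means <- t %>%
  group_by(Condition) %>%
  summarise(across(where(is.numeric), mean))

# Columns to normalise: everything in the means table except the key
value_cols <- setdiff(names(cond_means), "Condition")

# Join each row to the means of its *control* condition. The (assumed)
# suffix keeps original names and tags the control means with ".ctrl".
joined <- t %>%
  left_join(cond_means, by = c(Control = "Condition"), suffix = c("", ".ctrl"))

# Element-wise division of each value column by its matching control mean
joined[value_cols] <- joined[value_cols] / joined[paste0(value_cols, ".ctrl")]

# Drop the helper columns again
normalised <- select(joined, -ends_with(".ctrl"))

normalised
```

Because value_cols is derived from the means table, adding more numeric columns or more control groups needs no change to the code.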

One change I made was to use stringsAsFactors = FALSE when creating the table, because factors are a pain to work with. Since R 4.0.0 this is already the default for data.frame().

t = data.frame(Condition, Control, Value_1, Value_2, Value_3, 
               stringsAsFactors = FALSE)
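The question also asks for a generic function to use with a column-wise verb. A minimal sketch of that route, without a join, assuming dplyr >= 1.0.0: normalise_to_control() is a hypothetical helper (not a dplyr function) that computes the per-condition means of one column and looks up each row's control mean by name, and across(where(is.numeric), ...) applies it to every numeric column. It assumes the data are ungrouped and that every control also appears as a condition (otherwise the result is NA).

```r
library(dplyr)

t <- data.frame(
  Condition = c('A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'),
  Control   = c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
  Value_1 = 1:4,
  Value_2 = 2 * 1:4,
  Value_3 = 3 * 1:4,
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of dplyr): divide x by the mean of x over
# the rows whose condition equals each row's control. Controls with no
# matching condition give NA.
normalise_to_control <- function(x, condition, control) {
  # Named vector of per-condition means, e.g. c(A = 1.5, B = 3.5, ...)
  cond_means <- vapply(split(x, condition), mean, numeric(1))
  # as.character() makes factor controls match by name, not by level code
  x / unname(cond_means[as.character(control)])
}

# Apply to every numeric column; Condition and Control are looked up as
# columns of t. The data must be ungrouped so the means span all rows.
normalised <- t %>%
  mutate(across(where(is.numeric),
                ~ normalise_to_control(.x, Condition, Control)))

normalised
```

Compared with the join, this keeps the lookup inside a single mutate() call, at the cost of recomputing the condition means once per column.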
