

How to use dplyr to mutate columns based on conditions from two dataframes

I have two dataframes, both derived from a third, larger dataset. I want to normalize the data in one dataframe according to the entries in the second. I'd prefer to use dplyr, but other packages/solutions are very welcome, too :)

In my first dataframe, I have the counts of different organs.

Dataframe organ_count

# A tibble: 5 x 2
  organs  count
  <fctr>  <int>
1 Organ_A    23
2 Organ_B    29
3 Organ_C    24
4 Organ_D   145
5 Organ_E    97

In my second dataframe, I have the counts of the same organs, but split by the state in which they appear in the large source dataset.

Dataframe organ_state_count

# A tibble: 15 x 3
   organs  hmm_state count
   <fctr>  <chr>     <int>
 1 Organ_A E1           12
 2 Organ_A E2            2
 3 Organ_A E3            9
 4 Organ_B E1           13
 5 Organ_B E2           10
 6 Organ_B E3            6
 7 Organ_C E1            7
 8 Organ_C E2            7
 9 Organ_C E3           10
10 Organ_D E1           72
11 Organ_D E2           23
12 Organ_D E3           50
13 Organ_E E1           90
14 Organ_E E2            2
15 Organ_E E3            5

What I want to do now is:

I want to divide organ_state_count$count by the total number of entries for that organ (given in organ_count), which gives the share (percentage) of that organ's entries that fall in each state.
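Since the question is explicitly about combining the two dataframes, here is a minimal sketch that attaches each organ's total from organ_count to the matching rows of organ_state_count with a join and then divides. It assumes dplyr is installed and stores organs as character rather than factor to keep the join simple; the names organ_totals, total and result are illustrative choices, not part of the question.

```r
library(dplyr)

# Data from the question; `organs` is character here instead of a factor
# (an assumption made so the join key types match trivially)
organ_count <- tibble(
  organs = paste0("Organ_", LETTERS[1:5]),
  count  = c(23L, 29L, 24L, 145L, 97L)
)
organ_state_count <- tibble(
  organs    = rep(paste0("Organ_", LETTERS[1:5]), each = 3),
  hmm_state = rep(c("E1", "E2", "E3"), times = 5),
  count     = c(12L, 2L, 9L, 13L, 10L, 6L, 7L, 7L, 10L,
                72L, 23L, 50L, 90L, 2L, 5L)
)

# Rename the total first so the two `count` columns don't clash in the join
organ_totals <- rename(organ_count, total = count)

result <- organ_state_count %>%
  left_join(organ_totals, by = "organs") %>%  # copy each organ's total onto its state rows
  mutate(pct = count / total)                 # fraction of the organ's entries in that state

result
```

A join like this is the general pattern whenever the normalizing values really live in a separate table; `left_join()` keeps every row of organ_state_count in its original order.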

I already tried something like this:

organ_state_count %>% 
    rowwise() %>% 
    do(organ_total = filter(organ_count,organs == .$organs)) %>%

But it throws this error message:

Error in mutate_impl(.data, dots) : 
Evaluation error: arguments imply differing number of rows: 1, 0.
In addition: Warning messages:
1: Unknown or uninitialised column: 'count'. 
2: In Ops.factor(left, right) : ‘/’ not meaningful for factors

I must admit I'm fairly new to R and to the whole dplyr/tidyverse ecosystem as well, so I'm a bit overwhelmed.

I also suspect it might be possible to use only the organ_state_count frame for this task and solve everything in a single dataframe, but I'm not sure how.

Thanks for your answers and help!

You can try something like:

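library(dplyr)

df %>%
  group_by(organs) %>%           # one group per organ
  mutate(tot = sum(count)) %>%   # per-organ total, repeated on every row of the group
  ungroup() %>%
  mutate(pct = count / tot)      # share of the organ's entries in this state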

There's no need to use the first dataframe: summing count within each organ in the second dataframe reproduces exactly the totals in organ_count (e.g. 12 + 2 + 9 = 23 for Organ_A). Afterwards, just select the columns you want in the final output.
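To check that claim and show the final column selection, here is a short sketch. It assumes dplyr is installed and rebuilds organ_state_count with organs as character; the names organ_totals and final are illustrative only.

```r
library(dplyr)

# organ_state_count from the question (organs kept as character for simplicity)
organ_state_count <- tibble(
  organs    = rep(paste0("Organ_", LETTERS[1:5]), each = 3),
  hmm_state = rep(c("E1", "E2", "E3"), times = 5),
  count     = c(12L, 2L, 9L, 13L, 10L, 6L, 7L, 7L, 10L,
                72L, 23L, 50L, 90L, 2L, 5L)
)

# Summing the state counts per organ recovers the first dataframe
organ_totals <- organ_state_count %>%
  group_by(organs) %>%
  summarise(count = sum(count))

organ_totals

# Same calculation as above, keeping only the columns of interest
final <- organ_state_count %>%
  group_by(organs) %>%
  mutate(pct = count / sum(count)) %>%
  ungroup() %>%
  select(organs, hmm_state, pct)

final
```

The grouped `mutate()` keeps one row per organ/state pair, whereas `summarise()` collapses each group to a single row, which is why the first is used for the percentages and the second only for the check.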

Data:

df <- read.table( text = "id organs hmm_state count
1 Organ_A E1 12
2 Organ_A E2 2
3 Organ_A E3 9
4 Organ_B E1 13
5 Organ_B E2 10
6 Organ_B E3 6
7 Organ_C E1 7
8 Organ_C E2 7
9 Organ_C E3 10
10 Organ_D E1 72
11 Organ_D E2 23
12 Organ_D E3 50
13 Organ_E E1 90
14 Organ_E E2 2
15 Organ_E E3 5", sep =" ", header = TRUE) 

Output:

      id  organs hmm_state count   tot        pct
   <int>  <fctr>    <fctr> <int> <int>      <dbl>
1      1 Organ_A        E1    12    23 0.52173913
2      2 Organ_A        E2     2    23 0.08695652
3      3 Organ_A        E3     9    23 0.39130435
4      4 Organ_B        E1    13    29 0.44827586
5      5 Organ_B        E2    10    29 0.34482759
6      6 Organ_B        E3     6    29 0.20689655
7      7 Organ_C        E1     7    24 0.29166667
8      8 Organ_C        E2     7    24 0.29166667
9      9 Organ_C        E3    10    24 0.41666667
10    10 Organ_D        E1    72   145 0.49655172
11    11 Organ_D        E2    23   145 0.15862069
12    12 Organ_D        E3    50   145 0.34482759
13    13 Organ_E        E1    90    97 0.92783505
14    14 Organ_E        E2     2    97 0.02061856
15    15 Organ_E        E3     5    97 0.05154639
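Because the question also welcomes non-dplyr solutions, here is a base R sketch of the same calculation using `ave()`, which returns, for every row, the result of `FUN` applied to that row's group. It rebuilds the answer's `df` with `data.frame()` instead of `read.table()` and omits the id column; that construction is an assumption for self-containment.

```r
# Same values as the answer's `df`
df <- data.frame(
  organs    = rep(paste0("Organ_", LETTERS[1:5]), each = 3),
  hmm_state = rep(c("E1", "E2", "E3"), times = 5),
  count     = c(12L, 2L, 9L, 13L, 10L, 6L, 7L, 7L, 10L,
                72L, 23L, 50L, 90L, 2L, 5L)
)

# Per-organ total, aligned with the original rows
df$tot <- ave(df$count, df$organs, FUN = sum)
df$pct <- df$count / df$tot

# Sanity check: the percentages within each organ should add up to 1
tapply(df$pct, df$organs, sum)
```

The tot and pct columns should match the dplyr output above row for row.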

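Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site or link to the original. For any questions, contact yoyou2525@163.com.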

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM