
Calculating multiple percents with multiple rows, grouping, then iterating over columns in R

Longtime lurker, first-time writer.

Using dataframe A, I am trying to calculate four percentages from multiple rows, grouped by a column. I then want to repeat the same calculations for the other columns and save the outputs in dataframe B.

Dataframe A (output by another program) looks like this:

sample_number <- c("1","1","1","1","1","2","2","2","2","2","3","3","3","3","3")
condition <- c("A","B","C","D","E","A","B","C","D","E","A","B","C","D","E")
celltype_1 <- c(1220,800,700,300,200,1000,900,500,100,100,1700,600,800,300,200)
celltype_2 <- c(950,850,450,50,50,1650,550,750,250,150,1150,750,650,250,150)
dat_a<-data.frame(sample_number,condition, celltype_1, celltype_2)

dat_a

   sample_number condition celltype_1 celltype_2
1              1         A       1220        950
2              1         B        800        850
3              1         C        700        450
4              1         D        300         50
5              1         E        200         50
6              2         A       1000       1650
7              2         B        900        550
8              2         C        500        750
9              2         D        100        250
10             2         E        100        150
11             3         A       1700       1150
12             3         B        600        750
13             3         C        800        650
14             3         D        300        250
15             3         E        200        150

I want to calculate the following percentages, using the values in columns celltype_1 and celltype_2 that correspond to these letters in the condition column:

per_w = 100*((A - B)/(A-D))
per_x = 100 - per_w
per_y = 100*((A - C)/(A-D))
per_z = 100 - per_y
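As a quick sanity check of the four formulas, they can be evaluated by hand for a single group. The sketch below uses sample 1 / celltype_1 from dat_a (A = 1220, B = 800, C = 700, D = 300); `percents()` is a hypothetical helper name used only for illustration. Condition E does not appear in any of the formulas, so it is not needed here.

```r
# Hypothetical helper: applies the four percentage formulas to one set of A, B, C, D values
percents <- function(A, B, C, D) {
  per_w <- 100 * ((A - B) / (A - D))
  per_y <- 100 * ((A - C) / (A - D))
  c(per_w = per_w, per_x = 100 - per_w, per_y = per_y, per_z = 100 - per_y)
}

# Sample 1, celltype_1 values taken from dat_a
round(percents(A = 1220, B = 800, C = 700, D = 300), 1)
# per_w = 45.7, per_x = 54.3, per_y = 56.5, per_z = 43.5
```

These values agree with the first row of the dat_test output further down.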

and output the results into dataframe B:

# Desired layout of dataframe B
# (the numbers are illustrative placeholders, not values computed from dat_a)
dat_b <- data.frame(
  sample_number = c("1", "2", "3", "1", "2", "3"),
  celltype      = c("1", "2", "1", "2", "1", "2"),
  per_w         = c(35, 20, 70, 45, 15, 90),
  per_x         = c(65, 80, 30, 55, 85, 10),
  per_y         = c(25, 60, 40, 75, 5, 30),
  per_z         = c(75, 40, 60, 15, 95, 70)
)

dat_b

  sample_number celltype per_w per_x per_y per_z
1             1        1    35    65    25    75
2             2        2    20    80    60    40
3             3        1    70    30    40    60
4             1        2    45    55    75    15
5             2        1    15    85     5    95
6             3        2    90    10    30    70

I have tried various combinations of loops, group_by(), and sapply(), but here is the most successful code so far. It calculates the results for celltype_1 (although dataframe B is not formatted exactly as I want), but it cannot yet be applied across columns.

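library(tidyverse)

dat_test = dat_a %>% 
  select(c(1,2,3)) %>%       # keep sample_number, condition, celltype_1
  group_by(sample_number) %>% 
  spread("condition",3)  %>% # one column per condition (A to E), filled from column 3
  mutate(per_w = 100*((A - B)/(A-D))) %>% 
  mutate(per_x = 100 - per_w) %>% 
  mutate(per_y = 100*((A - C)/(A-D))) %>%
  mutate(per_z = 100 - per_y) 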

dat_test

  sample_number     A     B     C     D     E per_w per_x per_y per_z
  <chr>         <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1              1220   800   700   300   200  45.7  54.3  56.5  43.5
2 2              1000   900   500   100   100  11.1  88.9  55.6  44.4
3 3              1700   600   800   300   200  78.6  21.4  64.3  35.7

I have seen parts of my question answered in other Stack Overflow questions, but I haven't worked out how to put all the pieces together. I would appreciate any help you can provide. Thank you!

If you want to perform the calculation on both cell types, you'll need to separate them into different rows first (that is what the first pivot_longer() does); pivot_wider() then turns conditions A to E into columns so the formulas can be applied row by row.

library(tidyverse)

dat_a %>% 
  # one row per sample, condition and cell type; "celltype_1" becomes "1"
  pivot_longer(starts_with("celltype"), names_to = "celltype", names_pattern = "celltype_(\\d+)") %>% 
  # one row per sample and cell type, with conditions A to E as columns
  pivot_wider(names_from = condition, values_from = value) %>% 
  group_by(celltype, sample_number) %>% 
  mutate(per_w = 100*((A - B)/(A-D)), 
         per_x = 100 - per_w,
         per_y = 100*((A - C)/(A-D)),
         per_z = 100 - per_y) %>% 
  select(-(A:E)) %>%  # drop the raw condition columns
  ungroup()

# A tibble: 6 × 6
  sample_number celltype per_w per_x per_y per_z
  <chr>         <chr>    <dbl> <dbl> <dbl> <dbl>
1 1             1         45.7  54.3  56.5  43.5
2 1             2         11.1  88.9  55.6  44.4
3 2             1         11.1  88.9  55.6  44.4
4 2             2         78.6  21.4  64.3  35.7
5 3             1         78.6  21.4  64.3  35.7
6 3             2         44.4  55.6  55.6  44.4
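For comparison, the same result can be sketched in base R without the tidyverse by looping over the celltype columns explicitly, which is closer to the iteration described in the question. This is a minimal sketch, assuming every sample has exactly one row for each of conditions A, B, C and D; the names `cell_cols`, `rows` and `per_sample` are illustrative. Rows come out ordered by cell type and then by sample, rather than by sample as in the tibble above.

```r
# dat_a from the question, rebuilt so this block runs on its own
dat_a <- data.frame(
  sample_number = rep(c("1", "2", "3"), each = 5),
  condition     = rep(c("A", "B", "C", "D", "E"), times = 3),
  celltype_1    = c(1220, 800, 700, 300, 200, 1000, 900, 500, 100, 100, 1700, 600, 800, 300, 200),
  celltype_2    = c(950, 850, 450, 50, 50, 1650, 550, 750, 250, 150, 1150, 750, 650, 250, 150)
)

# Every column whose name starts with "celltype_"
cell_cols <- grep("^celltype_", names(dat_a), value = TRUE)

rows <- lapply(cell_cols, function(col) {
  # Within one celltype column, handle each sample separately
  per_sample <- lapply(split(dat_a, dat_a$sample_number), function(d) {
    v <- setNames(d[[col]], d$condition)  # look up values by condition letter
    per_w <- 100 * (v[["A"]] - v[["B"]]) / (v[["A"]] - v[["D"]])
    per_y <- 100 * (v[["A"]] - v[["C"]]) / (v[["A"]] - v[["D"]])
    data.frame(sample_number = d$sample_number[1],
               celltype      = sub("^celltype_", "", col),
               per_w = per_w, per_x = 100 - per_w,
               per_y = per_y, per_z = 100 - per_y)
  })
  do.call(rbind, per_sample)
})

# Stack the per-celltype results into one data frame
dat_b <- do.call(rbind, rows)
rownames(dat_b) <- NULL
dat_b
```

Because `cell_cols` is built with `grep()`, any additional `celltype_*` columns in dat_a are picked up without changing the loop body.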
