Calculate multiple percentages using multiple rows, grouped, then iterate over columns in R

Question

Longtime lurker, first time writer.

Using dataframe A, I am trying to calculate 4 percentages using multiple rows, grouped by a column. I then hope to iterate those same calculations over other columns, saving the outputs into dataframe B.

Dataframe A (output by another program) looks like this:

sample_number <- c("1","1","1","1","1","2","2","2","2","2","3","3","3","3","3")
condition <- c("A","B","C","D","E","A","B","C","D","E","A","B","C","D","E")
celltype_1 <- c(1220,800,700,300,200,1000,900,500,100,100,1700,600,800,300,200)
celltype_2 <- c(950,850,450,50,50,1650,550,750,250,150,1150,750,650,250,150)
dat_a<-data.frame(sample_number,condition, celltype_1, celltype_2)

dat_a

   sample_number condition celltype_1 celltype_2
1              1         A       1220        950
2              1         B        800        850
3              1         C        700        450
4              1         D        300         50
5              1         E        200         50
6              2         A       1000       1650
7              2         B        900        550
8              2         C        500        750
9              2         D        100        250
10             2         E        100        150
11             3         A       1700       1150
12             3         B        600        750
13             3         C        800        650
14             3         D        300        250
15             3         E        200        150

I hope to calculate the following percentages using the values in columns celltype_1 and celltype_2 that correspond to these letters in the condition column:

per_w = 100*((A - B)/(A-D))
per_x = 100 - per_w
per_y = 100*((A - C)/(A-D))
per_z = 100 - per_y
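
As a quick check of the formulas, here is a minimal base R sketch that works them out for a single group (sample 1, celltype_1), using the values from dat_a above. The vector name `v` is only for illustration.

```r
# Values for sample 1, celltype_1, keyed by condition letter (from dat_a)
v <- c(A = 1220, B = 800, C = 700, D = 300)

per_w <- 100 * ((v[["A"]] - v[["B"]]) / (v[["A"]] - v[["D"]]))
per_x <- 100 - per_w
per_y <- 100 * ((v[["A"]] - v[["C"]]) / (v[["A"]] - v[["D"]]))
per_z <- 100 - per_y

round(c(per_w = per_w, per_x = per_x, per_y = per_y, per_z = per_z), 1)
# per_w per_x per_y per_z
#  45.7  54.3  56.5  43.5
```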

and output the results into dataframe B:

dat_b

  sample_number celltype per_w per_x per_y per_z
1             1        1    35    65    25    75
2             2        2    20    80    60    40
3             3        1    70    30    40    60
4             1        2    45    55    75    15
5             2        1    15    85     5    95
6             3        2    90    10    30    70

I have tried different combinations of loops, group_by(), and sapply(), but here is the most successful code thus far. It calculates results for celltype_1 (albeit without a perfectly formatted dataframe B), but doesn't yet have the flexibility of being applied across columns.

dat_test = dat_a %>% 
  select(c(1,2,3)) %>% 
  group_by(sample_number) %>% 
  spread("condition",3)  %>% 
  mutate(per_w = 100*((A - B)/(A-D))) %>% 
  mutate(per_x = 100 - per_w) %>% 
  mutate(per_y = 100*((A - C)/(A-D))) %>%
  mutate(per_z = 100 - per_y) 

dat_test

  sample_number     A     B     C     D     E per_w per_x per_y per_z
  <chr>         <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1              1220   800   700   300   200  45.7  54.3  56.5  43.5
2 2              1000   900   500   100   100  11.1  88.9  55.6  44.4
3 3              1700   600   800   300   200  78.6  21.4  64.3  35.7

I have seen parts of my question in other Stack Overflow questions, but have not determined how to put all the pieces together. I would appreciate any help you can provide. Thank you!

Answer 1

If you want to perform the calculation on both cell types, you'll need to separate them into different rows (i.e. the first pivot_longer).

library(tidyverse)

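dat_a %>% 
  # stack celltype_1 / celltype_2 into rows; "celltype" keeps only the digit
  pivot_longer(starts_with("celltype"), names_to = "celltype", names_pattern = "celltype_(\\d)") %>% 
  # spread conditions A-E into columns: one row per sample/celltype pair
  pivot_wider(names_from = condition, values_from = value) %>% 
  group_by(celltype, sample_number) %>% 
  mutate(per_w = 100*((A - B)/(A-D)), 
         per_x = 100 - per_w,
         per_y = 100*((A - C)/(A-D)),
         per_z = 100 - per_y) %>% 
  # drop the raw condition columns, keeping only the percentages
  select(-(A:E)) %>% 
  ungroup()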

# A tibble: 6 × 6
  sample_number celltype per_w per_x per_y per_z
  <chr>         <chr>    <dbl> <dbl> <dbl> <dbl>
1 1             1         45.7  54.3  56.5  43.5
2 1             2         11.1  88.9  55.6  44.4
3 2             1         11.1  88.9  55.6  44.4
4 2             2         78.6  21.4  64.3  35.7
5 3             1         78.6  21.4  64.3  35.7
6 3             2         44.4  55.6  55.6  44.4
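
If you would rather not reshape the data, the same calculation can be sketched in base R by looping over every column whose name starts with "celltype_". This is only an illustrative alternative, not part of the pipeline above: the helper name `calc_pct` and the intermediate `cell_cols` are made up for this example, and it assumes each sample has exactly one row per condition.

```r
# Rebuild the example data from the question
dat_a <- data.frame(
  sample_number = rep(c("1", "2", "3"), each = 5),
  condition     = rep(c("A", "B", "C", "D", "E"), times = 3),
  celltype_1    = c(1220, 800, 700, 300, 200, 1000, 900, 500, 100, 100,
                    1700, 600, 800, 300, 200),
  celltype_2    = c(950, 850, 450, 50, 50, 1650, 550, 750, 250, 150,
                    1150, 750, 650, 250, 150)
)

# Hypothetical helper: the four percentages for one sample and one column
calc_pct <- function(values, condition) {
  v <- setNames(values, condition)  # look up values by condition letter
  per_w <- 100 * (v[["A"]] - v[["B"]]) / (v[["A"]] - v[["D"]])
  per_y <- 100 * (v[["A"]] - v[["C"]]) / (v[["A"]] - v[["D"]])
  c(per_w = per_w, per_x = 100 - per_w, per_y = per_y, per_z = 100 - per_y)
}

# Every column to iterate over
cell_cols <- grep("^celltype_", names(dat_a), value = TRUE)

# One block of rows per celltype column, stacked into dat_b
dat_b <- do.call(rbind, lapply(cell_cols, function(col) {
  pcts <- t(sapply(split(dat_a, dat_a$sample_number),
                   function(d) calc_pct(d[[col]], d$condition)))
  data.frame(sample_number = rownames(pcts),
             celltype = sub("^celltype_", "", col),
             pcts, row.names = NULL)
}))

dat_b
```

Here the rows of dat_b are ordered by celltype first and sample_number second, so the order differs from the tidyverse result even though the values are the same. Adding further celltype_ columns to dat_a needs no change to the code.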
