Calculating multiple percents with multiple rows, grouping, then iterating over columns in R
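Long-time lurker, first-time poster.
Using dataframe A, I am trying to calculate 4 percentages from multiple rows, grouped by a column. I then want to iterate the same calculations over the other columns and save the output to dataframe B.
Dataframe A (output by another program) looks like this: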
sample_number <- c("1","1","1","1","1","2","2","2","2","2","3","3","3","3","3")
condition <- c("A","B","C","D","E","A","B","C","D","E","A","B","C","D","E")
celltype_1 <- c(1220,800,700,300,200,1000,900,500,100,100,1700,600,800,300,200)
celltype_2 <- c(950,850,450,50,50,1650,550,750,250,150,1150,750,650,250,150)
dat_a<-data.frame(sample_number,condition, celltype_1, celltype_2)
dat_a
   sample_number condition celltype_1 celltype_2
1              1         A       1220        950
2              1         B        800        850
3              1         C        700        450
4              1         D        300         50
5              1         E        200         50
6              2         A       1000       1650
7              2         B        900        550
8              2         C        500        750
9              2         D        100        250
10             2         E        100        150
11             3         A       1700       1150
12             3         B        600        750
13             3         C        800        650
14             3         D        300        250
15             3         E        200        150
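I would like to use the values in the celltype_1 and celltype_2 columns that correspond to these letters in the condition column to calculate the following percentages: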
per_w = 100*((A - B)/(A-D))
per_x = 100 - per_w
per_y = 100*((A - C)/(A-D))
per_z = 100 - per_y
and output the results to dataframe B, which would look like this:
dat_b
  sample_number celltype per_w per_x per_y per_z
1             1        1    35    65    25    75
2             2        2    20    80    60    40
3             3        1    70    30    40    60
4             1        2    45    55    75    15
5             2        1    15    85     5    95
6             3        2    90    10    30    70
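I have tried various combinations of loops, group_by() and sapply(), but this is the most successful code so far. It calculates the results for celltype_1 (although not in the exact format of dataframe B), but it does not yet have the flexibility to be applied across columns.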
library(tidyverse)  # provides %>%, select(), group_by(), mutate() and spread()
dat_test = dat_a %>%
  select(c(1,2,3)) %>%
  group_by(sample_number) %>%
  spread("condition",3) %>%
  mutate(per_w = 100*((A - B)/(A-D))) %>%
  mutate(per_x = 100 - per_w) %>%
  mutate(per_y = 100*((A - C)/(A-D))) %>%
  mutate(per_z = 100 - per_y)
dat_test
  sample_number     A     B     C     D     E per_w per_x per_y per_z
  <chr>         <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1              1220   800   700   300   200  45.7  54.3  56.5  43.5
2 2              1000   900   500   100   100  11.1  88.9  55.6  44.4
3 3              1700   600   800   300   200  78.6  21.4  64.3  35.7
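As a quick sanity check, the first row of this output can be reproduced by hand from the sample 1, celltype_1 values in dat_a (condition E is not used by any of the four formulas). A minimal base R sketch:

```r
# Values for sample 1, celltype_1, copied from dat_a
A <- 1220; B <- 800; C <- 700; D <- 300

per_w <- 100 * ((A - B) / (A - D))  # 100 * 420 / 920
per_x <- 100 - per_w
per_y <- 100 * ((A - C) / (A - D))  # 100 * 520 / 920
per_z <- 100 - per_y

round(c(per_w = per_w, per_x = per_x, per_y = per_y, per_z = per_z), 1)
# per_w per_x per_y per_z
#  45.7  54.3  56.5  43.5
```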
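I have seen parts of my problem in other Stack Overflow questions, but have not worked out how to put all the pieces together. I would appreciate any help you can offer. Thanks!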
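If you want to perform the calculations for both cell types, you need to split them into separate rows first (that is what the first pivot_longer does).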
library(tidyverse)
dat_a %>%
  pivot_longer(starts_with("celltype"), names_to = "celltype", names_pattern = "celltype_(\\d)") %>%
  pivot_wider(names_from = condition, values_from = value) %>%
  group_by(celltype, sample_number) %>%
  mutate(per_w = 100*((A - B)/(A-D)),
         per_x = 100 - per_w,
         per_y = 100*((A - C)/(A-D)),
         per_z = 100 - per_y) %>%
  select(-(A:E)) %>%
  ungroup()
# A tibble: 6 × 6
  sample_number celltype per_w per_x per_y per_z
  <chr>         <chr>    <dbl> <dbl> <dbl> <dbl>
1 1             1         45.7  54.3  56.5  43.5
2 1             2         11.1  88.9  55.6  44.4
3 2             1         11.1  88.9  55.6  44.4
4 2             2         78.6  21.4  64.3  35.7
5 3             1         78.6  21.4  64.3  35.7
6 3             2         44.4  55.6  55.6  44.4
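If the other program may emit more celltype columns later, the same reshape can be wrapped in a small function. The following is only a sketch under a few assumptions: calc_percents() and its prefix argument are hypothetical names made up here, every value column shares the "celltype_" prefix, and each sample has exactly one row per condition. It uses names_prefix instead of a single-digit capture group so that labels longer than one digit are kept whole.

```r
library(dplyr)
library(tidyr)

# Rebuild dat_a from the question so the sketch is self-contained
dat_a <- data.frame(
  sample_number = rep(c("1", "2", "3"), each = 5),
  condition     = rep(c("A", "B", "C", "D", "E"), times = 3),
  celltype_1    = c(1220, 800, 700, 300, 200, 1000, 900, 500, 100, 100,
                    1700, 600, 800, 300, 200),
  celltype_2    = c(950, 850, 450, 50, 50, 1650, 550, 750, 250, 150,
                    1150, 750, 650, 250, 150)
)

# calc_percents() is a hypothetical helper, not part of any package
calc_percents <- function(df, prefix = "celltype_") {
  df %>%
    # one row per sample / condition / cell type
    pivot_longer(starts_with(prefix),
                 names_to = "celltype",
                 names_prefix = prefix,
                 values_to = "value") %>%
    # one column per condition (A to E)
    pivot_wider(names_from = condition, values_from = value) %>%
    mutate(per_w = 100 * (A - B) / (A - D),
           per_x = 100 - per_w,
           per_y = 100 * (A - C) / (A - D),
           per_z = 100 - per_y) %>%
    select(sample_number, celltype, per_w, per_x, per_y, per_z)
}

dat_b <- calc_percents(dat_a)
dat_b
```

For the dat_a in the question this should give the same six rows as the output above. The group_by() step is left out because each row after pivot_wider() already holds one sample and cell type, so the mutate() arithmetic is row-wise either way.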