Longtime lurker, first time writer.
Using dataframe A, I am trying to calculate 4 percentages using multiple rows, grouped by a column. I then hope to iterate those same calculations over other columns, saving the outputs into dataframe B.
Dataframe A (output by another program) looks like this:
sample_number <- c("1","1","1","1","1","2","2","2","2","2","3","3","3","3","3")
condition <- c("A","B","C","D","E","A","B","C","D","E","A","B","C","D","E")
celltype_1 <- c(1220,800,700,300,200,1000,900,500,100,100,1700,600,800,300,200)
celltype_2 <- c(950,850,450,50,50,1650,550,750,250,150,1150,750,650,250,150)
dat_a<-data.frame(sample_number,condition, celltype_1, celltype_2)
dat_a
   sample_number condition celltype_1 celltype_2
1              1         A       1220        950
2              1         B        800        850
3              1         C        700        450
4              1         D        300         50
5              1         E        200         50
6              2         A       1000       1650
7              2         B        900        550
8              2         C        500        750
9              2         D        100        250
10             2         E        100        150
11             3         A       1700       1150
12             3         B        600        750
13             3         C        800        650
14             3         D        300        250
15             3         E        200        150
I hope to calculate the following percentages using the values in columns celltype_1 & _2 that correspond with these letters in the condition column:
per_w = 100*((A - B)/(A-D))
per_x = 100 - per_w
per_y = 100*((A - C)/(A-D))
per_z = 100 - per_y
and output the results into dataframe B:
dat_b
  sample_number celltype per_w per_x per_y per_z
1             1        1    35    65    25    75
2             2        2    20    80    60    40
3             3        1    70    30    40    60
4             1        2    45    55    75    25
5             2        1    15    85     5    95
6             3        2    90    10    30    70
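For reference, the four formulas can be checked by hand for a single group. This is a minimal base R sketch, assuming the celltype_1 counts of sample 1 from dat_a; the scalar names A, B, C and D are stand-ins for the counts at each condition, not columns that exist in dat_a.

```r
# celltype_1 counts for sample 1 at conditions A to D, copied from dat_a
A <- 1220
B <- 800
C <- 700
D <- 300

per_w <- 100 * ((A - B) / (A - D))
per_x <- 100 - per_w
per_y <- 100 * ((A - C) / (A - D))
per_z <- 100 - per_y

# Rounded to one decimal: per_w 45.7, per_x 54.3, per_y 56.5, per_z 43.5
round(c(per_w = per_w, per_x = per_x, per_y = per_y, per_z = per_z), 1)
```

The values in the dat_b mock-up above do not follow from this arithmetic, so they presumably only illustrate the desired layout.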
I have tried different combinations of loops, group_by(), and sapply(), but here is the most successful code so far. It calculates the results for celltype_1 (albeit without a perfectly formatted dataframe B), but it doesn't yet have the flexibility to be applied across columns.
library(dplyr)
library(tidyr)

dat_test = dat_a %>%
  select(c(1,2,3)) %>%
  group_by(sample_number) %>%
  spread("condition",3) %>%
  mutate(per_w = 100*((A - B)/(A-D))) %>%
  mutate(per_x = 100 - per_w) %>%
  mutate(per_y = 100*((A - C)/(A-D))) %>%
  mutate(per_z = 100 - per_y)
dat_test
  sample_number     A     B     C     D     E per_w per_x per_y per_z
  <chr>         <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1              1220   800   700   300   200  45.7  54.3  56.5  43.5
2 2              1000   900   500   100   100  11.1  88.9  55.6  44.4
3 3              1700   600   800   300   200  78.6  21.4  64.3  35.7
I have seen parts of my question in other stack questions, but have not determined how to put all the pieces together. I would appreciate any help you can provide. Thank you!
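If you want to perform the calculation on both cell types, you'll first need to separate them into different rows (i.e. the first pivot_longer()). The pivot_wider() call then spreads the conditions A to E into columns, so the formulas can refer to them directly.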
library(tidyverse)
dat_a %>%
  pivot_longer(starts_with("celltype"), names_to = "celltype", names_pattern = "celltype_(\\d)") %>%
  pivot_wider(names_from = condition, values_from = value) %>%
  group_by(celltype, sample_number) %>%
  mutate(per_w = 100*((A - B)/(A-D)),
         per_x = 100 - per_w,
         per_y = 100*((A - C)/(A-D)),
         per_z = 100 - per_y) %>%
  select(-(A:E)) %>%
  ungroup()
# A tibble: 6 × 6
  sample_number celltype per_w per_x per_y per_z
  <chr>         <chr>    <dbl> <dbl> <dbl> <dbl>
1 1             1         45.7  54.3  56.5  43.5
2 1             2         11.1  88.9  55.6  44.4
3 2             1         11.1  88.9  55.6  44.4
4 2             2         78.6  21.4  64.3  35.7
5 3             1         78.6  21.4  64.3  35.7
6 3             2         44.4  55.6  55.6  44.4
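To see what the two reshaping steps do before the percentages are computed, here is a small sketch that runs them one at a time on the dat_a from the question. The intermediate names long and wide are only illustrative. It also uses \\d+ instead of \\d in names_pattern, which is a change from the code above: as far as I can tell, a single \\d would capture only the first digit, so a hypothetical celltype_10 column would end up labelled "1".

```r
library(tidyr)

# dat_a rebuilt as in the question
dat_a <- data.frame(
  sample_number = rep(c("1", "2", "3"), each = 5),
  condition     = rep(c("A", "B", "C", "D", "E"), times = 3),
  celltype_1    = c(1220, 800, 700, 300, 200, 1000, 900, 500, 100, 100,
                    1700, 600, 800, 300, 200),
  celltype_2    = c(950, 850, 450, 50, 50, 1650, 550, 750, 250, 150,
                    1150, 750, 650, 250, 150)
)

# Step 1: one row per sample/condition/celltype; the counts go into a
# column called "value" (the pivot_longer default)
long <- pivot_longer(dat_a, starts_with("celltype"),
                     names_to = "celltype",
                     names_pattern = "celltype_(\\d+)")

# Step 2: one row per sample/celltype, with conditions A to E as columns
wide <- pivot_wider(long, names_from = condition, values_from = value)

dim(long)  # 30 rows, 4 columns
dim(wide)  # 6 rows, 7 columns
```

Once the data is in the wide shape, each row already holds A to E for one sample and cell type, so the mutate() formulas work row by row and any additional celltype_ columns are picked up by starts_with("celltype") without further changes.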