So I searched for a few other questions but they weren't quite what I was looking for.
I have a data frame with genes in columns and samples (one per donor, time point and stimulation condition) in rows. The data is arranged as below, except there are around 200 rows and around 30000 columns:
donor_id time stimulation Gene_1   Gene_2   Gene_3   Gene_4   Gene_5   Gene_6   Gene_7   Gene_8
A        0.5h U           80.56644 0        55.68308 3.567304 6.465864 1.095409 490.3318 2.322889
A        0.5h Stim        79.37402 0        55.88619 4.394622 6.503430 1.190555 453.7305 0.169858
A        1h   U           62.73152 0        53.01435 3.596723 7.272073 0.736384 349.6818 1.307157
A        1h   Stim        54.82245 0        53.17697 3.445614 5.385228 1.520416 332.2109 1.378058
B        0.5h U           69.89228 0        51.78394 2.410192 5.668343 1.482302 377.0095 0.589922
B        0.5h Stim        64.42587 0        52.67998 1.085260 8.958538 0.977994 382.8479 0.312372
B        1h   U           56.47391 0.323123 52.93331 2.925232 5.650667 1.396532 356.9900 1.657515
B        1h   Stim        0.25548  0.085027 49.85429 1.355360 5.030664 2.175491 218.5442 0.290898
I want to subtract each "U" row from its matching "Stim" row, leaving me with half the number of rows I started with. Each combination of donor_id and time appears exactly twice in the full table: once as "U" and once as "Stim".
All the similar questions I can find by searching want either to subtract one row from every other row, or to subtract every row from the row above it, rather than working on pairs of rows. I am sure there must be some way using a for loop or lapply, but I can't figure out how to apply it across all rows and columns.
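This is a base R option. Because each U row sits directly above its Stim row, diff() within each donor_id/time group returns Stim minus U:
# Gene columns are 4:11 in this example; use df[-(1:3)] for the full table.
aggregate(df[4:11], by = list(donor_id = df$donor_id, time = df$time), FUN = diff)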
  donor_id time    Gene_1    Gene_2   Gene_3    Gene_4    Gene_5    Gene_6    Gene_7    Gene_8
1        A 0.5h  -1.19242  0.000000  0.20311  0.827318  0.037566  0.095146  -36.6013 -2.153031
2        B 0.5h  -5.46641  0.000000  0.89604 -1.324932  3.290195 -0.504308    5.8384 -0.277550
3        A   1h  -7.90907  0.000000  0.16262 -0.151109 -1.886845  0.784032  -17.4709  0.070901
4        B   1h -56.21843 -0.238096 -3.07902 -1.569872 -0.620003  0.778959 -138.4458 -1.366617
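If the row order cannot be relied on, the same subtraction can be done by splitting the table by stimulation and lining the two halves up on donor_id and time. This is only a sketch in base R, not part of the original answer: it uses a two-gene subset of the example data, and the names gene_cols, stim, unst and result are made up for illustration.

```r
# Two-gene subset of the example data from the question (illustrative only).
df <- data.frame(
  donor_id    = rep(c("A", "B"), each = 4),
  time        = rep(c("0.5h", "0.5h", "1h", "1h"), times = 2),
  stimulation = rep(c("U", "Stim"), times = 4),
  Gene_1 = c(80.56644, 79.37402, 62.73152, 54.82245,
             69.89228, 64.42587, 56.47391, 0.25548),
  Gene_2 = c(0, 0, 0, 0, 0, 0, 0.323123, 0.085027)
)

# Pick up every gene column by name, however many there are.
gene_cols <- grep("^Gene_", names(df), value = TRUE)

# Split by condition, then reorder the U rows so that row i of `unst`
# has the same donor_id and time as row i of `stim`.
stim <- df[df$stimulation == "Stim", ]
unst <- df[df$stimulation == "U", ]
unst <- unst[match(paste(stim$donor_id, stim$time),
                   paste(unst$donor_id, unst$time)), ]

# Stim minus U for all gene columns at once.
result <- cbind(stim[c("donor_id", "time")],
                stim[gene_cols] - unst[gene_cols])
rownames(result) <- NULL
result
```

Matching on the key columns makes the direction of the subtraction explicit, at the cost of a few more lines than diff().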
Or a dplyr solution:
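library(dplyr)
# summarise_at() still works but is superseded by summarise(across(...)) in dplyr >= 1.0.
df %>%
  group_by(donor_id, time) %>%
  summarise_at(vars(starts_with("Gene")), diff)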
# Groups: donor_id [2]
donor_id time Gene_1 Gene_2 Gene_3 Gene_4 Gene_5 Gene_6 Gene_7 Gene_8
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 0.5h -1.19 0 0.203 0.827 0.0376 0.0951 -36.6 -2.15
2 A 1h -7.91 0 0.163 -0.151 -1.89 0.784 -17.5 0.0709
3 B 0.5h -5.47 0 0.896 -1.32 3.29 -0.504 5.84 -0.278
4 B 1h -56.2 -0.238 -3.08 -1.57 -0.620 0.779 -138. -1.37
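Both options use diff(), so they assume that every donor_id/time group has exactly one U row and one Stim row, with U listed first. A quick check before running either one might look like the sketch below; it is not from the original answer, it rebuilds only the key columns of the example, and the names counts and first_in_group are illustrative.

```r
# Key columns of the example data from the question (gene columns omitted).
df <- data.frame(
  donor_id    = rep(c("A", "B"), each = 4),
  time        = rep(c("0.5h", "0.5h", "1h", "1h"), times = 2),
  stimulation = rep(c("U", "Stim"), times = 4)
)

# Every donor_id/time group should contain exactly one U and one Stim row.
counts <- table(paste(df$donor_id, df$time), df$stimulation)
paired <- all(counts == 1)

# The first row of every group should be U, so that diff() gives Stim - U.
first_in_group <- df$stimulation[!duplicated(df[c("donor_id", "time")])]
u_first <- all(first_in_group == "U")

paired
u_first

# If u_first is FALSE, sorting puts U before Stim within each group:
# df <- df[order(df$donor_id, df$time, df$stimulation == "Stim"), ]
```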