R: Subtracting every other row from the row below
I searched for similar questions, but none were quite what I was looking for.
I have a data frame with samples in columns and conditions in rows. The data is arranged as below, except that there are around 200 rows and around 30000 columns:
donor_id time stimulation Gene_1 Gene_2 Gene_3 Gene_4 Gene_5 Gene_6 Gene_7 Gene_8
A 0.5h U 80.56644 0 55.68308 3.567304 6.465864 1.095409 490.3318 2.322889
A 0.5h Stim 79.37402 0 55.88619 4.394622 6.503430 1.190555 453.7305 0.169858
A 1h U 62.73152 0 53.01435 3.596723 7.272073 0.736384 349.6818 1.307157
A 1h Stim 54.82245 0 53.17697 3.445614 5.385228 1.520416 332.2109 1.378058
B 0.5h U 69.89228 0 51.78394 2.410192 5.668343 1.482302 377.0095 0.589922
B 0.5h Stim 64.42587 0 52.67998 1.085260 8.958538 0.977994 382.8479 0.312372
B 1h U 56.47391 0.323123 52.93331 2.925232 5.650667 1.396532 356.9900 1.657515
B 1h Stim 0.25548 0.085027 49.85429 1.355360 5.030664 2.175491 218.5442 0.290898
I want to subtract each "U" row from its matching "Stim" row, leaving me with half the number of rows I started with. Each U/Stim pair in the full table has a unique combination of donor_id and time.
All the similar questions I can find either subtract one row from everything else, or subtract every row from the row above it, rather than every other row. I am sure there must be some way using a for loop or lapply, but I can't figure out how to apply it across all rows and columns.
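This is a base R option. It relies on each U row sitting directly above its Stim row within every donor_id/time group, because diff() returns the second value minus the first: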
aggregate(df[4:11], by = list("donor_id" = df$donor_id, "time" = df$time), diff)
donor_id time Gene_1 Gene_2 Gene_3 Gene_4 Gene_5 Gene_6 Gene_7
1 A 0.5h -1.19242 0.000000 0.20311 0.827318 0.037566 0.095146 -36.6013
2 B 0.5h -5.46641 0.000000 0.89604 -1.324932 3.290195 -0.504308 5.8384
3 A 1h -7.90907 0.000000 0.16262 -0.151109 -1.886845 0.784032 -17.4709
4 B 1h -56.21843 -0.238096 -3.07902 -1.569872 -0.620003 0.778959 -138.4458
Gene_8
1 -2.153031
2 -0.277550
3 0.070901
4 -1.366617
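With around 30000 gene columns, it may also be worth subtracting whole matrices rather than calling diff() once per column and group. The following is a minimal sketch of that idea, not part of the original answer: it rebuilds a two-gene subset of the example data, pairs each Stim row with its U row by donor_id and time using match(), and subtracts in one vectorised step. The names `gene_cols`, `u`, `stim`, `idx`, `delta` and `result` are illustrative, and the `"|"` key separator assumes that character does not occur in donor_id or time.

```r
# Example data: a two-gene subset of the table in the question.
df <- data.frame(
  donor_id    = rep(c("A", "B"), each = 4),
  time        = rep(rep(c("0.5h", "1h"), each = 2), times = 2),
  stimulation = rep(c("U", "Stim"), times = 4),
  Gene_1 = c(80.56644, 79.37402, 62.73152, 54.82245,
             69.89228, 64.42587, 56.47391, 0.25548),
  Gene_2 = c(0, 0, 0, 0, 0, 0, 0.323123, 0.085027)
)

# All gene columns, selected by name rather than by position.
gene_cols <- grep("^Gene_", names(df), value = TRUE)

# Split into unstimulated and stimulated rows.
u    <- df[df$stimulation == "U", ]
stim <- df[df$stimulation == "Stim", ]

# Align each Stim row with the U row sharing its donor_id and time,
# so the result does not depend on the row order of df.
idx <- match(paste(stim$donor_id, stim$time, sep = "|"),
             paste(u$donor_id, u$time, sep = "|"))

# Stim minus U, computed as one matrix subtraction over all gene columns.
delta <- as.matrix(stim[gene_cols]) - as.matrix(u[idx, gene_cols])

result <- cbind(stim[c("donor_id", "time")], delta)
rownames(result) <- NULL
result
```

Because the pairing is done by key, this version still gives Stim minus U if the rows are shuffled. A Stim row with no matching U row would yield NA in `idx` and therefore NA differences.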
Or a dplyr solution:
library(dplyr)

df %>%
  group_by(donor_id, time) %>%
  summarise_at(vars(starts_with("Gene")), diff)
# Groups: donor_id [2]
donor_id time Gene_1 Gene_2 Gene_3 Gene_4 Gene_5 Gene_6 Gene_7 Gene_8
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 0.5h -1.19 0 0.203 0.827 0.0376 0.0951 -36.6 -2.15
2 A 1h -7.91 0 0.163 -0.151 -1.89 0.784 -17.5 0.0709
3 B 0.5h -5.47 0 0.896 -1.32 3.29 -0.504 5.84 -0.278
4 B 1h -56.2 -0.238 -3.08 -1.57 -0.620 0.779 -138. -1.37
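In dplyr 1.0.0 and later, summarise_at() is superseded by across(). The sketch below is one possible way to write the same grouped subtraction with across(), and it names the Stim and U rows explicitly instead of relying on row order through diff(). It is not taken from the original answer; it assumes dplyr 1.0.0 or newer, exactly one U and one Stim row per donor_id/time group, and uses a two-gene subset of the example data with the illustrative name `result`.

```r
library(dplyr)

# Example data: a two-gene subset of the table in the question.
df <- data.frame(
  donor_id    = rep(c("A", "B"), each = 4),
  time        = rep(rep(c("0.5h", "1h"), each = 2), times = 2),
  stimulation = rep(c("U", "Stim"), times = 4),
  Gene_1 = c(80.56644, 79.37402, 62.73152, 54.82245,
             69.89228, 64.42587, 56.47391, 0.25548),
  Gene_2 = c(0, 0, 0, 0, 0, 0, 0.323123, 0.085027)
)

result <- df %>%
  group_by(donor_id, time) %>%
  summarise(
    # Within each group, take the Stim value minus the U value per gene.
    across(starts_with("Gene"),
           ~ .x[stimulation == "Stim"] - .x[stimulation == "U"]),
    .groups = "drop"
  )

result
```

Selecting the rows by their stimulation label makes the sign of the difference explicit, at the cost of requiring that each group contains exactly one row of each kind.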