
R: Subtracting every other row from the row below

So I searched for a few other questions but they weren't quite what I was looking for.

I have a data frame with genes in columns and samples (one per donor, time point and stimulation condition) in rows. The data is arranged as below, except there are around 200 rows and around 30000 columns:

donor_id time  stimulation         Gene_1         Gene_2         Gene_3         Gene_4         Gene_5         Gene_6         Gene_7         Gene_8
A        0.5h         U         80.56644               0        55.68308        3.567304        6.465864        1.095409        490.3318        2.322889
A        0.5h         Stim      79.37402               0        55.88619        4.394622        6.503430        1.190555        453.7305        0.169858
A        1h           U         62.73152               0        53.01435        3.596723        7.272073        0.736384        349.6818        1.307157
A        1h           Stim      54.82245               0        53.17697        3.445614        5.385228        1.520416        332.2109        1.378058
B        0.5h         U         69.89228               0        51.78394        2.410192        5.668343        1.482302        377.0095        0.589922
B        0.5h         Stim      64.42587               0        52.67998        1.085260        8.958538        0.977994        382.8479        0.312372
B        1h           U         56.47391        0.323123        52.93331        2.925232        5.650667        1.396532        356.9900        1.657515
B        1h           Stim      0.25548         0.085027        49.85429        1.355360        5.030664        2.175491        218.5442        0.290898

I want to subtract all of the "U" rows from the matching "Stim" rows, leaving me with half the number of rows I started with. Each U/Stim pair of rows in the full table has a unique combination of donor_id and time.

All the similar questions I can find by searching seem to want either to subtract one row from everything else, or to subtract every row from the row above it, rather than every other row. I am sure there must be some way using a for loop or lapply, but I can't figure out how to apply it across all rows and columns.

This is a base R option:

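# Columns 4:11 are the gene columns here; diff() returns Stim - U when each U row precedes its Stim row
aggregate(df[4:11], by = list("donor_id" = df$donor_id, "time" = df$time), diff)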

  donor_id time    Gene_1    Gene_2   Gene_3    Gene_4    Gene_5    Gene_6    Gene_7    Gene_8
1        A 0.5h  -1.19242  0.000000  0.20311  0.827318  0.037566  0.095146  -36.6013 -2.153031
2        B 0.5h  -5.46641  0.000000  0.89604 -1.324932  3.290195 -0.504308    5.8384 -0.277550
3        A   1h  -7.90907  0.000000  0.16262 -0.151109 -1.886845  0.784032  -17.4709  0.070901
4        B   1h -56.21843 -0.238096 -3.07902 -1.569872 -0.620003  0.778959 -138.4458 -1.366617
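
Note that diff() only gives Stim minus U if every "U" row sits directly above its "Stim" partner. If the row order cannot be trusted, something like the following might be safer. It is only a sketch: the toy data frame keeps two of the genes from the question, and the names gene_cols, unstim, stim and idx are made up for illustration.

```r
# Toy data with the same layout as the question (two genes only, values copied from the example)
df <- data.frame(
  donor_id    = c("A", "A", "A", "A", "B", "B", "B", "B"),
  time        = c("0.5h", "0.5h", "1h", "1h", "0.5h", "0.5h", "1h", "1h"),
  stimulation = c("U", "Stim", "U", "Stim", "U", "Stim", "U", "Stim"),
  Gene_1      = c(80.56644, 79.37402, 62.73152, 54.82245, 69.89228, 64.42587, 56.47391, 0.25548),
  Gene_3      = c(55.68308, 55.88619, 53.01435, 53.17697, 51.78394, 52.67998, 52.93331, 49.85429)
)

# Pick the gene columns by name rather than by position
gene_cols <- grep("^Gene_", names(df), value = TRUE)

# Split by stimulation, then line the U rows up with the Stim rows on donor_id and time
unstim <- df[df$stimulation == "U", ]
stim   <- df[df$stimulation == "Stim", ]
idx    <- match(paste(stim$donor_id, stim$time), paste(unstim$donor_id, unstim$time))

# Stim minus U for every gene column at once
result <- cbind(stim[c("donor_id", "time")], stim[gene_cols] - unstim[idx, gene_cols])
rownames(result) <- NULL
result
```

A Stim row without a matching U row ends up with NA in the gene columns instead of being silently paired with the wrong row.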

Or a dplyr solution:

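library(dplyr)

# diff() gives second row minus first row per group, so each U row must precede its Stim row
df %>%
  group_by(donor_id, time) %>%
  summarise_at(vars(starts_with("Gene")), diff)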

# Groups:   donor_id [2]
  donor_id time  Gene_1 Gene_2 Gene_3 Gene_4  Gene_5  Gene_6  Gene_7  Gene_8
  <fct>    <fct>  <dbl>  <dbl>  <dbl>  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 A        0.5h   -1.19  0      0.203  0.827  0.0376  0.0951  -36.6  -2.15  
2 A        1h     -7.91  0      0.163 -0.151 -1.89    0.784   -17.5   0.0709
3 B        0.5h   -5.47  0      0.896 -1.32   3.29   -0.504     5.84 -0.278 
4 B        1h    -56.2  -0.238 -3.08  -1.57  -0.620   0.779  -138.   -1.37 
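
The same caveat about row order applies here. As a rough alternative, assuming dplyr 1.0.0 or later (for across()), the subtraction can be written out explicitly so it no longer depends on which row comes first. The toy data frame below is only for illustration and keeps two of the genes from the question.

```r
library(dplyr)

# Toy data with the same layout as the question (two genes only, values copied from the example)
df <- data.frame(
  donor_id    = c("A", "A", "A", "A", "B", "B", "B", "B"),
  time        = c("0.5h", "0.5h", "1h", "1h", "0.5h", "0.5h", "1h", "1h"),
  stimulation = c("U", "Stim", "U", "Stim", "U", "Stim", "U", "Stim"),
  Gene_1      = c(80.56644, 79.37402, 62.73152, 54.82245, 69.89228, 64.42587, 56.47391, 0.25548),
  Gene_3      = c(55.68308, 55.88619, 53.01435, 53.17697, 51.78394, 52.67998, 52.93331, 49.85429)
)

# Within each donor_id/time group, take the Stim value minus the U value for every gene column
result <- df %>%
  group_by(donor_id, time) %>%
  summarise(
    across(starts_with("Gene"), ~ .x[stimulation == "Stim"] - .x[stimulation == "U"]),
    .groups = "drop"
  )
result
```

This assumes every donor_id/time group has exactly one "U" row and one "Stim" row, as in the question.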

