How to subtract odd-numbered rows from even-numbered rows across columns using dplyr in R

My data frame looks like this

df <-data.frame(col1=c(1,2,3,4), col2=c(5,6,7,8), time=rep(c("0h","72h"),2))
  col1 col2 time
1    1    5   0h
2    2    6  72h
3    3    7   0h
4    4    8  72h

I want to use mutate() with across(), or any other function (preferably from dplyr), to subtract the 0h values in the previous row from the 72h values, in each column.

I would like my data to look like this

  col1 col2 time
     1    1   72h
     1    1   72h

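Base R (note that time is stored as a number here, so that whole rows can be subtracted from each other):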

df <-data.frame(col1=c(1,2,3,4), col2=c(5,6,7,8), time=rep(c(0,72),2))

df[c(FALSE,TRUE), ] - df[c(TRUE, FALSE), ]
#>   col1 col2 time
#> 2    1    1   72
#> 4    1    1   72

Created on 2021-07-06 by the reprex package (v2.0.0)
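
If you want to keep time as a character column, one possible variation is to subtract only the numeric columns and carry the 72h labels over unchanged. This is a minimal sketch, assuming the rows strictly alternate 0h, 72h; the helper names num_cols, odd, even and res are made up for illustration.

```r
df <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(5, 6, 7, 8),
                 time = rep(c("0h", "72h"), 2))

# Assumption: rows strictly alternate 0h, 72h, so even rows are the 72h rows.
num_cols <- vapply(df, is.numeric, logical(1))  # columns that can be subtracted
odd  <- seq(1, nrow(df), by = 2)                # positions of the 0h rows
even <- seq(2, nrow(df), by = 2)                # positions of the 72h rows

res <- df[even, , drop = FALSE]                 # keep the 72h rows, including `time`
res[num_cols] <- df[even, num_cols, drop = FALSE] - df[odd, num_cols, drop = FALSE]
rownames(res) <- NULL
res
```

Here the time column stays as "72h" and is never involved in the arithmetic.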

tidyverse, using the approach from @Emir Dakin:

library(tidyverse)
df <-data.frame(col1=c(1,2,3,4), col2=c(5,6,7,8), time=rep(c("0h", "72h"),2))

df %>%
  mutate(across(where(is.numeric), ~.x - lag(.x, default = first(.x)))) %>%
  filter(time == "72h")
#>   col1 col2 time
#> 1    1    1  72h
#> 2    1    1  72h

If the data are ordered as neatly as you've shown, you can use the lag function. This is a very straightforward approach, and I don't think you need anything other than mutate:

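library(dplyr)

df %>%
  # subtract the previous row; the first row has no predecessor, so it is compared with itself
  mutate(col1 = col1 - lag(col1, default = first(col1)),
         col2 = col2 - lag(col2, default = first(col2))) %>%
  filter(time == "72h")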
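
Since the lag approach silently gives wrong differences when the rows are not in strict 0h/72h order, it may be worth checking that assumption first. The following is only a sketch: the expected vector and the stopifnot check are my own additions, and they assume the data always starts with a 0h row.

```r
library(dplyr)

df <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(5, 6, 7, 8),
                 time = rep(c("0h", "72h"), 2))

# Assumption: valid input alternates "0h", "72h", starting with "0h".
expected <- rep(c("0h", "72h"), length.out = nrow(df))
stopifnot(nrow(df) %% 2 == 0,
          identical(as.character(df$time), expected))

res <- df %>%
  # lag() leaves NA in the first row, which is a 0h row and gets filtered out
  mutate(across(where(is.numeric), ~ .x - lag(.x))) %>%
  filter(time == "72h")
res
```

If the check fails, the code stops with an error instead of returning differences between unrelated rows.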

Building on the answer by Emir Dakin, I have added a pairing index based on the order in which each time value occurs, so that each 72h row is matched with its own 0h row:

library(dplyr)
df %>%
  group_by(time) %>%
  mutate(sl = row_number()) %>%  # occurrence index within each time point
  group_by(sl) %>%               # pair the n-th 0h row with the n-th 72h row
  mutate(col1 = col1 - lag(col1, default = first(col1), order_by = time),
         col2 = col2 - lag(col2, default = first(col2), order_by = time)) %>%
  ungroup() %>%
  filter(time == "72h") %>%
  select(col1, col2, time)

# A tibble: 2 x 3
   col1  col2 time 
  <dbl> <dbl> <chr>
1     1     1 72h  
2     1     1 72h  
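
The same pairing idea can also be written without lag, by picking the 72h and 0h values explicitly within each pair. This is a rough sketch under the assumption that every pair has exactly one 0h row and one 72h row; df_blocked is a hypothetical reordering of the question's data (both 0h rows first), used only to show that adjacent rows are not required.

```r
library(dplyr)

# Hypothetical reordering of the example data: both 0h rows come first.
df_blocked <- data.frame(col1 = c(1, 3, 2, 4), col2 = c(5, 7, 6, 8),
                         time = c("0h", "0h", "72h", "72h"))

res <- df_blocked %>%
  group_by(time) %>%
  mutate(pair = row_number()) %>%  # n-th occurrence of each time point
  group_by(pair) %>%
  # assumes exactly one 0h row and one 72h row per pair
  summarise(across(where(is.numeric), ~ .x[time == "72h"] - .x[time == "0h"]),
            time = "72h")
res
```

The result has one row per pair, with an extra pair column that can be dropped with select(-pair) if it is not needed.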

Or:

library(tidyverse)

df <-data.frame(col1=c(1,2,3,4), col2=c(5,6,7,8), time=rep(c("0h","72h"),2))

df %>%
  mutate(id = rep(seq(nrow(df) / 2), each = 2), # create an id of what belongs together
         tmp = rep(c("start", "end"), nrow(df) / 2),
         time = as.numeric(str_remove(time, "h"))) %>%
  mutate_at(vars("col1":"time"), ~if_else(tmp == "start", .x * -1, .x)) %>%
  group_by(id) %>%
  summarise_at(vars("col1":"time"), sum) 

# # A tibble: 2 x 4
#      id  col1  col2  time
#   <int> <dbl> <dbl> <dbl>
# 1     1     1     1    72
# 2     2     1     1    72
