Preserving row order while adding one data.frame's values to another based on conditions

tstep <- rep(c("a", "b", "c", "d", "e"), 5)
Variable <- c(rep(c("v"), 5), rep(c("w"), 5), rep(c("x"), 5), rep(c("y"), 5), rep(c("x"), 5))  
Value <- c(1,2,3,4,5,10,11,12,13,14,33,22,44,57,5,3,2,1,2,3,34,24,11,11,7)
Scenario <- c(rep(c("i"), 20), rep(c("j"), 5) ) 
df1 <- data.frame(tstep, Variable, Value, Scenario)

tstep <- c("a", "b", "c", "d", "e")
Variable <- rep(c("x"), 5) 
Value <- c(100, 34, 100,22, 100)
Scenario <- c(rep(c("i"), 5))
df2<- data.frame(tstep, Variable, Value, Scenario)

I've found similar posts, but there seem to be quite a few methods. I'm hoping to find a fast one, because these are samples of ~0.5 GB .csv files with many variables, and I may need to include more columns. I'd prefer not to have to cut df1 up and put it back together.

Which method would you use to add df2$Value to df1$Value for rows whose tstep, Variable, and Scenario columns match, while preserving the original row order of df1?

 #df2 from above, that I want to add to df1 from above, for matching rows
 tstep Variable Value Scenario
  a        x     100        i
  b        x     34         i
  c        x     100        i
  d        x     22         i
  e        x     100        i

  #df1 from above               #desired df1:
  tstep Variable Value Scenario tstep Variable Value Scenario
  a        v     1         i    a        v     1         i
  b        v     2         i    b        v     2         i
  c        v     3         i    c        v     3         i
  d        v     4         i    d        v     4         i
  e        v     5         i    e        v     5         i
  a        w    10         i    a        w    10         i
  b        w    11         i    b        w    11         i
  c        w    12         i    c        w    12         i
  d        w    13         i    d        w    13         i
  e        w    14         i    e        w    14         i
  a        x    33         i    a        x   133         i
  b        x    22         i    b        x    56         i
  c        x    44         i    c        x   144         i
  d        x    57         i    d        x    79         i
  e        x     5         i    e        x   105         i
  a        y     3         i    a        y     3         i
  b        y     2         i    b        y     2         i
  c        y     1         i    c        y     1         i
  d        y     2         i    d        y     2         i
  e        y     3         i    e        y     3         i
  a        x    34         j    a        x    34         j
  b        x    24         j    b        x    24         j
  c        x    11         j    c        x    11         j
  d        x    11         j    d        x    11         j
  e        x     7         j    e        x     7         j

Here is a short solution using an update join from the data.table package:

library(data.table)
#convert df1 and df2 into data.table
setDT(df1)
setDT(df2)

#this is an update join:
#'join' df1 with df2 on tstep, Variable, Scenario;
#'update' (`:=`) Value in df1 to its own Value + df2's Value (i.Value) wherever a row matches
df1[df2, Value := Value + i.Value, on=.(tstep, Variable, Scenario)]
df1

output:

    tstep Variable Value Scenario
 1:     a        v     1        i
 2:     b        v     2        i
 3:     c        v     3        i
 4:     d        v     4        i
 5:     e        v     5        i
 6:     a        w    10        i
 7:     b        w    11        i
 8:     c        w    12        i
 9:     d        w    13        i
10:     e        w    14        i
11:     a        x   133        i
12:     b        x    56        i
13:     c        x   144        i
14:     d        x    79        i
15:     e        x   105        i
16:     a        y     3        i
17:     b        y     2        i
18:     c        y     1        i
19:     d        y     2        i
20:     e        y     3        i
21:     a        x    34        j
22:     b        x    24        j
23:     c        x    11        j
24:     d        x    11        j
25:     e        x     7        j
    tstep Variable Value Scenario

Some introductory data.table materials: https://github.com/Rdatatable/data.table/wiki/Getting-started
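
One thing to be aware of: setDT() and `:=` both modify their input by reference, so the original df1 is changed in place. If the unmodified table is still needed, or if more join columns get added later, a variant along these lines may help. This is only a sketch: the names dt1, dt2, keys and res are made up for illustration, and the sample data is rebuilt so the block runs on its own.

```r
library(data.table)

# Sample data from the question, built directly as data.tables.
dt1 <- data.table(
  tstep    = rep(c("a", "b", "c", "d", "e"), 5),
  Variable = rep(c("v", "w", "x", "y", "x"), each = 5),
  Value    = c(1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 33, 22, 44, 57, 5,
               3, 2, 1, 2, 3, 34, 24, 11, 11, 7),
  Scenario = rep(c("i", "j"), times = c(20, 5))
)
dt2 <- data.table(
  tstep    = c("a", "b", "c", "d", "e"),
  Variable = "x",
  Value    = c(100, 34, 100, 22, 100),
  Scenario = "i"
)

# Join columns kept in a character vector (hypothetical name), so that
# extra key columns can be appended here without touching the join itself.
keys <- c("tstep", "Variable", "Scenario")

# copy() makes the update join work on a duplicate; dt1 itself is not modified.
res <- copy(dt1)[dt2, Value := Value + i.Value, on = keys]
```

Here res holds the summed values in the original row order, while dt1 keeps the values it started with. On ~0.5 GB inputs the copy costs extra memory, so it is probably only worth it when the original really is needed afterwards.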


To address the OP's comment about applying this to multiple CSV files:

library(data.table)
rbindlist(
    lapply(c("csv1.csv", "csv14.csv"), function(nm) {
        x <- fread(nm)
        x[x[Variable=="y"], Value := Value + i.Value, on=.(tstep, Variable, Scenario)]
        x
    }),
    use.names=TRUE)

Not the most efficient solution but one possible alternative:

library(dplyr)    

df1 %>% 
  left_join(df2, by = c("tstep", "Variable", "Scenario")) %>%
  mutate(Value.x = if_else(is.na(Value.y), Value.x, Value.x + Value.y)) %>%
  select(1, 2, Value = 3, 4)
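
If adding a package is not an option, a similar in-place update can be sketched in base R with match() on a combined key. Treat this as an illustration rather than a benchmarked solution: make_key is a hypothetical helper, the "|" separator assumes that character never appears in the key columns, and the key combinations in df2 are assumed to be unique.

```r
# Sample data from the question, as plain data.frames.
df1 <- data.frame(
  tstep    = rep(c("a", "b", "c", "d", "e"), 5),
  Variable = rep(c("v", "w", "x", "y", "x"), each = 5),
  Value    = c(1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 33, 22, 44, 57, 5,
               3, 2, 1, 2, 3, 34, 24, 11, 11, 7),
  Scenario = rep(c("i", "j"), times = c(20, 5))
)
df2 <- data.frame(
  tstep    = c("a", "b", "c", "d", "e"),
  Variable = "x",
  Value    = c(100, 34, 100, 22, 100),
  Scenario = "i"
)

# Hypothetical helper: one key string per row, built from the join columns.
make_key <- function(d) paste(d$tstep, d$Variable, d$Scenario, sep = "|")

# For each row of df1, the position of its matching row in df2 (NA if none).
idx <- match(make_key(df1), make_key(df2))
hit <- !is.na(idx)

# Add df2's Value only where a match exists; rows are never reordered.
df1$Value[hit] <- df1$Value[hit] + df2$Value[idx[hit]]
```

Because only the matched elements of df1$Value are overwritten, the row order of df1 is untouched by construction.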
