tstep <- rep(c("a", "b", "c", "d", "e"), 5)
Variable <- c(rep(c("v"), 5), rep(c("w"), 5), rep(c("x"), 5), rep(c("y"), 5), rep(c("x"), 5))
Value <- c(1,2,3,4,5,10,11,12,13,14,33,22,44,57,5,3,2,1,2,3,34,24,11,11,7)
Scenario <- c(rep(c("i"), 20), rep(c("j"), 5) )
df1 <- data.frame(tstep, Variable, Value, Scenario)
tstep <- c("a", "b", "c", "d", "e")
Variable <- rep(c("x"), 5)
Value <- c(100, 34, 100, 22, 100)
Scenario <- c(rep(c("i"), 5))
df2 <- data.frame(tstep, Variable, Value, Scenario)
I've found similar posts, but there seem to be quite a few methods. I'm hoping to find a fast one, as these are samples of ~0.5 GB .csv files with many variables, and I may need to include more columns. I'd also like to avoid cutting up df1 and putting it back together.
Which method do you prefer for adding the Value of df2 to the Value of df1 in rows where the tstep, Variable, and Scenario columns match, while preserving the original row order of df1?
#df2 from above, that I want to add to df1 from above, for matching rows
tstep Variable Value Scenario
a x 100 i
b x 34 i
c x 100 i
d x 22 i
e x 100 i
#df1 from above                   #desired df1:
tstep Variable Value Scenario     tstep Variable Value Scenario
a     v        1     i            a     v        1     i
b     v        2     i            b     v        2     i
c     v        3     i            c     v        3     i
d     v        4     i            d     v        4     i
e     v        5     i            e     v        5     i
a     w        10    i            a     w        10    i
b     w        11    i            b     w        11    i
c     w        12    i            c     w        12    i
d     w        13    i            d     w        13    i
e     w        14    i            e     w        14    i
a     x        33    i            a     x        133   i
b     x        22    i            b     x        56    i
c     x        44    i            c     x        144   i
d     x        57    i            d     x        79    i
e     x        5     i            e     x        105   i
a     y        3     i            a     y        3     i
b     y        2     i            b     y        2     i
c     y        1     i            c     y        1     i
d     y        2     i            d     y        2     i
e     y        3     i            e     y        3     i
a     x        34    j            a     x        34    j
b     x        24    j            b     x        24    j
c     x        11    j            c     x        11    j
d     x        11    j            d     x        11    j
e     x        7     j            e     x        7     j
Here is a short solution using an update join from the data.table package:
library(data.table)
#convert df1 and df2 into data.table
setDT(df1)
setDT(df2)
#this is an update join:
#'join' df1 with df2 on tstep, Variable, Scenario, then
#'update' (`:=`) Value in df1 to its own Value + df2's Value (i.Value) for the matched rows only
df1[df2, Value := Value + i.Value, on=.(tstep, Variable, Scenario)]
df1
output:
tstep Variable Value Scenario
1: a v 1 i
2: b v 2 i
3: c v 3 i
4: d v 4 i
5: e v 5 i
6: a w 10 i
7: b w 11 i
8: c w 12 i
9: d w 13 i
10: e w 14 i
11: a x 133 i
12: b x 56 i
13: c x 144 i
14: d x 79 i
15: e x 105 i
16: a y 3 i
17: b y 2 i
18: c y 1 i
19: d y 2 i
20: e y 3 i
21: a x 34 j
22: b x 24 j
23: c x 11 j
24: d x 11 j
25: e x 7 j
tstep Variable Value Scenario
Some introductory data.table materials: https://github.com/Rdatatable/data.table/wiki/Getting-started
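One thing to keep in mind with the update join above: `setDT()` and `:=` both work by reference, so df1 itself is changed. If the original values are still needed, a minimal sketch along the following lines could be used. The small tables here are illustrative stand-ins for df1 and df2 (same column names, fewer rows), and `res` is just a name chosen for this example.

```r
library(data.table)

# Small stand-ins for df1 and df2 (same column names as in the question)
df1 <- data.table(
  tstep    = c("a", "b", "a", "b"),
  Variable = c("v", "v", "x", "x"),
  Value    = c(1, 2, 33, 22),
  Scenario = "i"
)
df2 <- data.table(
  tstep    = c("a", "b"),
  Variable = "x",
  Value    = c(100, 34),
  Scenario = "i"
)

# copy() makes a deep copy, so the update join modifies the copy and not df1
res <- copy(df1)[df2, Value := Value + i.Value, on = .(tstep, Variable, Scenario)]

res$Value  # 1 2 133 56 (updated copy, same row order as df1)
df1$Value  # 1 2 33 22  (original left untouched)
```

The copy costs extra memory, so for files around 0.5 GB it is probably only worth doing when the unmodified table is really needed afterwards.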
To address OP's comment when applying to multiple csvs:
library(data.table)
rbindlist(
    lapply(c("csv1.csv", "csv14.csv"), function(nm) {
        x <- fread(nm)
        x[x[Variable=="y"], Value := Value + i.Value, on=.(tstep, Variable, Scenario)]
        x
    }),
    use.names=TRUE)
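Not the most efficient solution, but one possible alternative using dplyr:
library(dplyr)
df1 %>%
  left_join(df2, by = c("tstep", "Variable", "Scenario")) %>%
  #rows without a match in df2 get NA in Value.y, so keep their original Value
  mutate(Value.x = if_else(is.na(Value.y), Value.x, Value.x + Value.y)) %>%
  #keep the original columns and rename Value.x back to Value
  select(tstep, Variable, Value = Value.x, Scenario)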
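For comparison, the same lookup can also be sketched in base R with `match()` on a pasted key. This is not taken from either answer above; it is a rough illustration that assumes each key combination appears at most once in df2 (`match()` only returns the first hit) and that the key columns never contain the separator character. The names `key_cols`, `key1`, `key2`, `idx` and `hit` are made up for this example, and the data frames are small stand-ins for df1 and df2.

```r
# Small stand-ins for df1 and df2 (same column names as in the question)
df1 <- data.frame(
  tstep    = c("a", "b", "a", "b"),
  Variable = c("v", "v", "x", "x"),
  Value    = c(1, 2, 33, 22),
  Scenario = "i"
)
df2 <- data.frame(
  tstep    = c("a", "b"),
  Variable = "x",
  Value    = c(100, 34),
  Scenario = "i"
)

key_cols <- c("tstep", "Variable", "Scenario")

# Build one string key per row; "\r" is an arbitrary separator assumed
# not to occur in the data
key1 <- do.call(paste, c(df1[key_cols], sep = "\r"))
key2 <- do.call(paste, c(df2[key_cols], sep = "\r"))

# For each df1 row, the position of its partner in df2 (NA if there is none)
idx <- match(key1, key2)
hit <- !is.na(idx)

# Add df2's Value only to the matched rows; row order of df1 is unchanged
df1$Value[hit] <- df1$Value[hit] + df2$Value[idx[hit]]

df1$Value  # 1 2 133 56
```

Because nothing is merged or re-sorted, the row order of df1 is preserved by construction. Pasting keys for very large tables does build long character vectors, so the data.table update join is likely the faster option at the 0.5 GB scale.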