
R: Merging dataframes

I am looking to merge two data frames, but the way I would like to merge them is a bit unusual.

I will illustrate with an example:

Matrix1

 Col1 Col2 Vol VWAP Value
 ABC  1    2   4    8
 ABC  2    3   5    15
 DEF  1    8   9    72
 DEF  2    8   9    72

Matrix2

 Col1 Col2 Vol VWAP Value
 ABC  1    4   7    28
 ABC  2    5   1    5
 HIJ  1    6   6    36
 HIJ  2    7   3    21

I would like to then get the following matrix:

Matrix3

 Col1 Col2 Vol VWAP Value
 ABC  1    6   6    36
 ABC  2    8   2.5  20
 DEF  1    8   9    72
 DEF  2    8   9    72
 HIJ  1    6   6    36
 HIJ  2    7   3    21

In the first two matrices the VWAP column is just the Value column divided by the Vol column. The third matrix combines the first two in the following manner: If the first two Cols are the same, add the Vol and Value cols of the matching rows. If there is no match, just add the unmatched rows to the end of the matrix. The VWAP column of Matrix3 is then again just the Value col divided by the Vol col.

I tried the following:

Matrix3 = merge(Matrix1 ,Matrix2, all = TRUE)  
Matrix3[,4] = Matrix3[,5]/Matrix3[,3]

but for some reason it isn't summing the Vol or the Value columns. I have checked: the first column is character, while the rest are numeric or integer.

Any ideas?

Thanks

Mike

If you treat them as data frames, you can append them first using `rbind()`, then use `ddply()` to sum Vol and Value and calculate the VWAP:

df1<-data.frame(Col1=c("ABC","ABC","DEF","DEF"),
                Col2=c(1,2,1,2),
                Vol=c(2,3,8,8),
                VWAP=c(4,5,9,9),
                Value=c(8,15,72,72))  

df2<-data.frame(Col1=c("ABC","ABC","HIJ","HIJ"),
                Col2=c(1,2,1,2),
                Vol=c(4,5,6,7),
                VWAP=c(7,1,6,3),
                Value=c(28,5,36,21))  

merged <- rbind(df1, df2)         # stack the two data frames
library(plyr)                     # provides ddply()
ddply(merged,
     .(Col1,Col2),
     summarize,
     Vol=sum(Vol),
     VWAP=sum(Value)/sum(Vol),
     Value=sum(Value))

  Col1 Col2 Vol VWAP Value
1  ABC    1   6  6.0    36
2  ABC    2   8  2.5    20
3  DEF    1   8  9.0    72
4  DEF    2   8  9.0    72
5  HIJ    1   6  6.0    36
6  HIJ    2   7  3.0    21
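If you would rather not load plyr, the same stack-then-summarize idea can be sketched in base R with aggregate(). This is only an illustration of the approach above, not a replacement for it; the name `stacked` is made up here, and the data is repeated so the snippet runs on its own.

```r
# Same data as df1/df2 above, repeated so the snippet is self-contained
df1 <- data.frame(Col1 = c("ABC", "ABC", "DEF", "DEF"),
                  Col2 = c(1, 2, 1, 2),
                  Vol = c(2, 3, 8, 8),
                  VWAP = c(4, 5, 9, 9),
                  Value = c(8, 15, 72, 72))
df2 <- data.frame(Col1 = c("ABC", "ABC", "HIJ", "HIJ"),
                  Col2 = c(1, 2, 1, 2),
                  Vol = c(4, 5, 6, 7),
                  VWAP = c(7, 1, 6, 3),
                  Value = c(28, 5, 36, 21))

stacked <- rbind(df1, df2)

# Sum Vol and Value within each (Col1, Col2) group
agg <- aggregate(cbind(Vol, Value) ~ Col1 + Col2, data = stacked, FUN = sum)

# Recompute VWAP from the summed columns
agg$VWAP <- agg$Value / agg$Vol

# Order rows and columns like the ddply() result
agg <- agg[order(agg$Col1, agg$Col2), c("Col1", "Col2", "Vol", "VWAP", "Value")]
agg
```

The old VWAP column is deliberately left out of the aggregation, since a sum of per-row VWAPs has no meaning; it is rebuilt from the totals instead.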

First a comment on notation: Don't call your data.frame Matrix1. In R the classes matrix and data.frame are different.
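A small sketch of why the distinction matters in practice (the objects `df` and `m` are made up for illustration): a data.frame can keep a character column next to numeric ones, whereas a matrix holds a single type.

```r
# A data.frame can hold a character column next to a numeric one
df <- data.frame(Col1 = c("ABC", "DEF"), Vol = c(2, 8))
class(df)       # "data.frame"
is.matrix(df)   # FALSE

# Coercing to a matrix forces one common type, so the numbers become strings
m <- as.matrix(df)
is.matrix(m)    # TRUE
typeof(m)       # "character"
```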

Anyway, the merge command cannot possibly know that it is supposed to add your "Value" and "Vol" columns. You should first merge and then take care of the addition afterwards. Here's how you can solve this:

m3 <- merge(Matrix1, Matrix2, by=c("Col1", "Col2"), all=TRUE)
# add vol and value
m3[, "Vol"] <- rowSums(m3[, c("Vol.x", "Vol.y")], na.rm=TRUE)
m3[, "Value"] <- rowSums(m3[, c("Value.x", "Value.y")], na.rm=TRUE)
# divide to get vwap
m3[, "VWAP"] <- m3[, "Value"]/m3[, "Vol"]
# extract result
res <- m3[, c("Col1", "Col2", "Vol", "VWAP", "Value")]
res 
##    Col1 Col2 Vol VWAP Value
##  1  ABC    1   6  6.0    36
##  2  ABC    2   8  2.5    20
##  3  DEF    1   8  9.0    72
##  4  DEF    2   8  9.0    72
##  5  HIJ    1   6  6.0    36
##  6  HIJ    2   7  3.0    21
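To see why the rowSums() step with na.rm=TRUE is needed, it may help to look at what merge() returns before any addition. The sketch below rebuilds Matrix1 and Matrix2 from the question as data frames (that reconstruction is an assumption, as is the name `vol_total`) and inspects the intermediate result.

```r
# Assumed reconstruction of the question's inputs as data frames
Matrix1 <- data.frame(Col1 = c("ABC", "ABC", "DEF", "DEF"),
                      Col2 = c(1, 2, 1, 2),
                      Vol = c(2, 3, 8, 8),
                      VWAP = c(4, 5, 9, 9),
                      Value = c(8, 15, 72, 72))
Matrix2 <- data.frame(Col1 = c("ABC", "ABC", "HIJ", "HIJ"),
                      Col2 = c(1, 2, 1, 2),
                      Vol = c(4, 5, 6, 7),
                      VWAP = c(7, 1, 6, 3),
                      Value = c(28, 5, 36, 21))

m3 <- merge(Matrix1, Matrix2, by = c("Col1", "Col2"), all = TRUE)

# Non-key columns that exist in both inputs get .x / .y suffixes
names(m3)
# "Col1" "Col2" "Vol.x" "VWAP.x" "Value.x" "Vol.y" "VWAP.y" "Value.y"

# Keys found in only one input have NA on the other side,
# so a plain "+" propagates the NA
m3$Vol.x + m3$Vol.y

# rowSums(..., na.rm = TRUE) skips the NA, treating the missing side as 0
vol_total <- rowSums(m3[, c("Vol.x", "Vol.y")], na.rm = TRUE)
vol_total
```

Nothing is summed by merge() itself; it only lines the rows up side by side, which is why the addition has to happen afterwards.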

You can do it manually:

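# Build a composite key so rows are matched on the (Col1, Col2) pair
key1 <- paste(mat1$Col1, mat1$Col2)
key2 <- paste(mat2$Col1, mat2$Col2)

pos <- match(key1, key2)   # row of mat2 matching each row of mat1 (NA if none)
hit <- !is.na(pos)

# Add Vol and Value of the matching mat2 rows onto mat1
mat1$Vol[hit] <- mat1$Vol[hit] + mat2$Vol[pos[hit]]
mat1$Value[hit] <- mat1$Value[hit] + mat2$Value[pos[hit]]

# Append the mat2 rows that have no partner in mat1
m3 <- rbind(mat1, mat2[!(key2 %in% key1), ])

# Recompute VWAP from the totals
m3[, "VWAP"] <- m3[, "Value"]/m3[, "Vol"]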

# Col1 Col2 Vol VWAP Value
# 1   ABC    1   6  6.0    36
# 2   ABC    2   8  2.5    20
# 3   DEF    1   8  9.0    72
# 4   DEF    2   8  9.0    72
# 31  HIJ    1   6  6.0    36
# 41  HIJ    2   7  3.0    21
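When matching on two key columns by hand, it matters whether the columns are tested separately or as a pair. The sketch below uses made-up inputs `a` and `b` (not the question's data) in which no (Col1, Col2) pair occurs in both, yet a column-by-column %in% test still reports matches.

```r
# Hypothetical inputs: no (Col1, Col2) pair is shared between a and b
a <- data.frame(Col1 = c("ABC", "DEF"), Col2 = c(2, 1))
b <- data.frame(Col1 = c("ABC", "DEF"), Col2 = c(1, 2))

# Column-by-column test: each value appears somewhere in b,
# so both rows look matched
separate <- a$Col1 %in% b$Col1 & a$Col2 %in% b$Col2
separate   # TRUE TRUE

# Pair-wise test on a composite key: no row of a has a partner in b
paired <- paste(a$Col1, a$Col2) %in% paste(b$Col1, b$Col2)
paired     # FALSE FALSE
```

For the question's data both tests happen to agree, but the pair-wise key is the safer choice when the two inputs are not lined up row for row.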
