简体   繁体   中英

Multiply column values in one data.frame by column in another data.frame on a condition in R

I have two data frames in r that I am trying to combine based on the values in a column for each.

df1=data.frame(comp=c("comp1", "comp2", "comp3","comp1"),
 state1=c(1,0,0,1),
 state2=c(1,1,0,1),
 state3=c(0,1,1,0),
 state4=c(0,0,1,0),year=c(1,1,1,2))

   comp state1 state2 state3 state4 year
1 comp1      1      1      0      0    1
2 comp2      0      1      1      0    1
3 comp3      0      0      1      1    1
4 comp1      1      1      0      0    2

df2=data.frame(state=c("state1","state2", "state3", "state4", 
                       "state1","state2", "state3", "state4"), 
 var1=c(1,0,0,1,0,0,1,1), 
 var2=c(0,1,0,0,0,1,1,0), 
 year=c(1,1,1,1,2,2,2,2))

df2

    state var1 var2 year
1 state1    1    0    1
2 state2    0    1    1
3 state3    0    0    1
4 state4    1    0    1
5 state1    0    1    2
6 state2    0    1    2
7 state3    1    1    2
8 state4    1    0    2

I'd like to append columns to df1 that are var1, var2 which is the mean of all states for that comp.

so, var1 for comp1 should be 1*1+1*0+0*0+0*1/(1+1) or state*var/sum(state for comp) by year.

df3 would look like:

         state1 state2 state3 state4 year  var1  var2
    1 comp1  1      1      0      0     1   0.5   0.5
    2 comp2  0      1      1      0     1   0.0   0.5
    3 comp3  0      0      1      1     1   0.5   0.0
    4 comp1  1      1      0      0     2   0.5   1.0

Is this possible? I tried to use ddply with mean of var1, summarizing by comp and year, but that doesn't work. I end up with more than one row per comp per year.

Thanks in advance. This one is the most similar to my problem, but it doesn't show a conditional in the second data set. Multiply various subsets of a data frame by different vectors

Please advise.

My hope is that by breaking this into segments you can find out why my results look different than your prediction:

 df3 <- matrix(NA, ncol=2, nrow=nrow(df1))
 for (i in seq(nrow(df1))) {
     df3[i, 1] <- sum(df2[ df2$year==df1$year[i], "var1"] * df1[i, 2:5])
     df3[i, 2] <- sum(df2[ df2$year==df1$year[i], "var2"] * df1[i, 2:5])
 }
 m4<-df3/rowSums(df1[2:5])
 cbind(df1, m4)
#---------------
   comp state1 state2 state3 state4 year   1         2
1 comp1      1      1      0      0    1 0.5 0.5000000
2 comp2      0      1      1      0    1 0.0 0.3333333
3 comp3      0      0      1      1    1 0.5 0.0000000
4 comp1      1      1      0      0    2 0.0 0.3333333

Seems to match up ok on "var1" entries and I'm hoping you just threw in some guesses for "var2".

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM