Reshaping in data.table

EDIT: I have edited my question slightly because the suggested solution was a bit problematic for my dataset. The OP is written below.

I have a dataset df in which prop is the number of observations in a given year as a fraction of that country's total observations. For example, for the Netherlands (NLD), 60% of the observations are from 2005; for Bulgaria (BLG), this is 50%.

    row country year prop
1:   1     NLD 2005  0.6
2:   2     NLD 2005  0.6
3:   3     BLG 2006  0.5
4:   4     BLG 2005  0.5
5:   5     GER 2005  1.0
6:   6     NLD 2007  0.2
7:   7     NLD 2005  0.6
8:   8     NLD 2008  0.2
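
For reference, the prop column described above can be reproduced from the raw rows. This is only a sketch under the assumption that prop is the per-country/year row count divided by the per-country row count, which matches the numbers in the table; n_country is a hypothetical helper column.

```r
library(data.table)

# Example rows copied from the table above, without the prop column
df <- data.table(
  row     = 1:8,
  country = c("NLD", "NLD", "BLG", "BLG", "GER", "NLD", "NLD", "NLD"),
  year    = c(2005L, 2005L, 2006L, 2005L, 2005L, 2007L, 2005L, 2008L)
)

# Rows per country (hypothetical helper column)
df[, n_country := .N, by = country]

# Share of each country's rows that fall in each year
df[, prop := .N / n_country, by = .(country, year)]

# Drop the helper column again
df[, n_country := NULL]
```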

What I want is to get the following:

   row country prop2005 prop2006 prop2007 prop2008
1:   1     NLD      0.6      0.0      0.2      0.2
2:   2     NLD      0.6      0.0      0.2      0.2
3:   3     NLD      0.6      0.0      0.2      0.2
4:   4     BLG      0.5      0.5      0.0      0.0
5:   5     BLG      0.5      0.5      0.0      0.0
6:   6     BLG      0.5      0.5      0.0      0.0
7:   7     GER      1.0      0.0      0.0      0.0
8:   8     GER      1.0      0.0      0.0      0.0
9:   9     GER      1.0      0.0      0.0      0.0
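
One way this shape could be produced is sketched below: drop the repeated country/year rows, cast to wide with missing years filled as 0, and join the result onto the row/country pairs. The table obs and the names wide and res are hypothetical; obs simply mirrors the row and country columns of the expected output.

```r
library(data.table)

df <- data.table(
  country = c("NLD", "NLD", "BLG", "BLG", "GER", "NLD", "NLD", "NLD"),
  year    = c(2005L, 2005L, 2006L, 2005L, 2005L, 2007L, 2005L, 2008L),
  prop    = c(0.6, 0.6, 0.5, 0.5, 1.0, 0.2, 0.6, 0.2)
)

# Hypothetical observation table: one row per observation, three per country
obs <- data.table(row = 1:9, country = rep(c("NLD", "BLG", "GER"), each = 3))

# One row per country/year, then one column per year; absent years become 0
wide <- dcast(unique(df[, .(country, year, prop)]),
              country ~ year, value.var = "prop", fill = 0)
setnames(wide, as.character(2005:2008), paste0("prop", 2005:2008))

# Attach the country-level proportions to every observation, keeping obs order
res <- wide[obs, on = "country"]
setcolorder(res, c("row", "country"))
```

Deduplicating first matters here because the country/year pairs repeat in df; without it, dcast would have to aggregate the repeated values.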

ORIGINAL POST:

I have a dataset df in which prop is the number of observations in a given year as a fraction of that country's total observations. For example, for the Netherlands (NLD), 60% of the observations are from 2005; for Bulgaria (BLG), this is 50%.

    row country year prop
1:   1     NLD 2005  0.6
2:   2     NLD 2005  0.6
3:   3     BLG 2006  0.5
4:   4     BLG 2005  0.5
5:   5     GER 2005  1.0
6:   6     NLD 2007  0.2
7:   7     NLD 2005  0.6
8:   8     NLD 2008  0.2

I would like to attach these values to a second dataset, df2, which holds the answers to questions for those years and looks as follows:

    row country q05 q06 q07 q08 
1:   1     NLD  1   2   1   3   
2:   2     NLD  2   1   2   3   
3:   3     NLD  1   2   2   4   
4:   4     BLG  5   5   2   4   
5:   5     BLG  1   2   1   1   
6:   6     BLG  2   2   5   1   
7:   7     GER  3   5   4   4   
8:   8     GER  2   5   3   4   
9:   9     GER  1   2   3   5  

What I want is to get the following:

   row country q05 q06 q07 q08 prop2005 prop2006 prop2007 prop2008
1:   1     NLD   1   2   1   3      0.6      0.0      0.2      0.2
2:   2     NLD   2   1   2   3      0.6      0.0      0.2      0.2
3:   3     NLD   1   2   2   4      0.6      0.0      0.2      0.2
4:   4     BLG   5   5   2   4      0.5      0.5      0.0      0.0
5:   5     BLG   1   2   1   1      0.5      0.5      0.0      0.0
6:   6     BLG   2   2   5   1      0.5      0.5      0.0      0.0
7:   7     GER   3   5   4   4      1.0      0.0      0.0      0.0
8:   8     GER   2   5   3   4      1.0      0.0      0.0      0.0
9:   9     GER   1   2   3   5      1.0      0.0      0.0      0.0

In other words, I want every observation to carry the proportions that belong to its country, because they act as weights.

I am reasonably familiar with merging in data.table:

df1 <- merge(df1, df2,  by= "country", all.x = TRUE, allow.cartesian=FALSE)

However, I don't really know how I can reshape the data.table to correctly merge it.

Any suggestions?

CURRENT "SOLUTION":

df1 <- dcast(df1, country ~ year, value.var = "prop")
df1 <- merge(df1, df2, by = "country", all.x = TRUE, allow.cartesian = FALSE)
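
A likely reason this attempt misbehaves (an assumption, since the question does not show the actual output) is that the country/year pairs repeat in the proportions table. With repeated pairs, dcast has to aggregate, and when no aggregate function is given it falls back to counting rows. The sketch below makes that fallback explicit with fun.aggregate = length and shows the deduplicated table that avoids it; counts and dedup are hypothetical names.

```r
library(data.table)

df <- data.table(
  country = c("NLD", "NLD", "BLG", "BLG", "GER", "NLD", "NLD", "NLD"),
  year    = c(2005L, 2005L, 2006L, 2005L, 2005L, 2007L, 2005L, 2008L),
  prop    = c(0.6, 0.6, 0.5, 0.5, 1.0, 0.2, 0.6, 0.2)
)

# Row counts per country/year: NLD has three 2005 rows, so that cell is 3, not 0.6
counts <- dcast(df, country ~ year, value.var = "prop", fun.aggregate = length)

# Keeping one row per country/year removes the need to aggregate at all
dedup <- unique(df[, .(country, year, prop)])
```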

A possible solution:

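# melt df2 to long format, derive the year from the column name (q05 -> 2005),
# copy prop from df by country and year, set missing values to 0, cast back to wide
melt(df2, id.vars = 1:2, value.name = 'q'
     )[, year := as.integer(paste0('20', sub('\\D+', '', variable)))
       ][df, on = .(country, year), prop := i.prop
         ][is.na(prop), prop := 0
           ][, dcast(.SD, row + country ~ year, value.var = c('q', 'prop'), sep = '')]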

which gives:

   row country q2005 q2006 q2007 q2008 prop2005 prop2006 prop2007 prop2008
1:   1     NLD     1     2     1     3      0.6      0.0      0.2      0.2
2:   2     NLD     2     1     2     3      0.6      0.0      0.2      0.2
3:   3     NLD     1     2     2     4      0.6      0.0      0.2      0.2
4:   4     BLG     5     5     2     4      0.5      0.5      0.0      0.0
5:   5     BLG     1     2     1     1      0.5      0.5      0.0      0.0
6:   6     BLG     2     2     5     1      0.5      0.5      0.0      0.0
7:   7     GER     3     5     4     4      1.0      0.0      0.0      0.0
8:   8     GER     2     5     3     4      1.0      0.0      0.0      0.0
9:   9     GER     1     2     3     5      1.0      0.0      0.0      0.0
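
The year derivation in the chain above can be looked at in isolation. The snippet below is a small illustration that assumes the column names are q05 to q08 as in df2 and that all years fall in the 2000s, since the century is hard-coded as '20'.

```r
# Column names of df2 as they appear in the melted 'variable' column
variable <- c("q05", "q06", "q07", "q08")

# Strip the leading non-digit characters, leaving the two-digit year
digits <- sub("\\D+", "", variable)

# Prefix the century and convert to integer so it matches df$year
year <- as.integer(paste0("20", digits))
```

Converting to integer is what lets the later join on year line up with the numeric year column in df.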

To see how this works, you can split the code into several steps as follows:

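# 1. long format plus a numeric year column (q05 -> 2005)
df3 <- melt(df2, id = 1:2, value.name = 'q')[, year := as.integer(paste0('20',sub('\\D+','',variable)))]

# 2. update join: copy prop from df where country and year match
df3[df, on = .(country, year), prop := i.prop][]
# 3. country/year combinations that are missing from df get a proportion of 0
df3[is.na(prop), prop := 0][]
# 4. back to wide format, with one q and one prop column per year
df3[, dcast(.SD, row + country ~ year, value.var = c('q','prop'), sep = '')]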
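
The join step is the least obvious part, so here is a reduced sketch of it on toy data. The table long is a hypothetical stand-in for the melted df2. Inside x[i, on = ..., col := i.col], the i. prefix refers to columns of the table passed as i, and rows of x without a match keep NA.

```r
library(data.table)

# Hypothetical stand-in for the melted df2: one row per country and year
long <- data.table(country = c("NLD", "NLD", "BLG"),
                   year    = c(2005L, 2006L, 2005L))

# Proportions that exist only for some country/year pairs
df <- data.table(country = c("NLD", "BLG"),
                 year    = c(2005L, 2005L),
                 prop    = c(0.6, 0.5))

# Update join: add prop to long by reference; NLD 2006 has no match and stays NA
long[df, on = .(country, year), prop := i.prop]

# Replace the unmatched NA with 0, as in the answer
long[is.na(prop), prop := 0]
```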
