R - Create a new variable where each observation depends on another table and other variables in the data frame

I have the two following tables:

df <- data.frame(eth = c("A","B","B","A","C"),ZIP1 = c(1,1,2,3,5))
Inc <- data.frame(ZIP2 = c(1,2,3,4,5,6,7),A = c(56,98,43,4,90,19,59), B = c(49,10,69,30,10,4,95),C = c(69,2,59,8,17,84,30))

eth    ZIP1         ZIP2    A    B    C
A      1            1      56   49   69
B      1            2      98   10   2
B      2            3      43   69   59
A      3            4      4    30   8
C      5            5      90   10   17
                    6      19   4    84
                    7      59   95   30

I would like to create a variable Inc in the df data frame where, for each observation, the value is looked up in the Inc table: the row is the one whose ZIP2 equals the observation's ZIP1, and the column is the one named by its eth. In my example, this would give:

   eth    ZIP1   Inc        
    A      1    56
    B      1    49
    B      2    10
    A      3    43
    C      5    17

A loop or some other brute-force approach would solve it, but that is slow on my dataset, so I'm looking for a more elegant way, perhaps using data.table. This seems like a very standard question, and I apologize if it is a duplicate; my inability to formulate a precise title for the problem (as you may have noticed) is probably why I haven't found a similar question when searching the forum.

Thanks!

Sure, it can be done in data.table:

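library(data.table)
setDT(df)

# reshape Inc to long format (one row per ZIP2/eth pair), then update df by join;
# as.data.table() makes sure data.table's own melt method is used
df[ melt(as.data.table(Inc), id.vars="ZIP2", variable.name="eth", value.name="Inc"),
  Inc := i.Inc,
  on=c(ZIP1 = "ZIP2", "eth") ]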

The syntax for this "merge-assign" operation is X[i, Xcol := expression, on=merge_cols].

You can run the melt(...) call that is passed as i on its own to see the long-format table it produces. Inside the merge, columns from i can be referred to with the i.* prefix.
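
To make the intermediate step concrete, here is a minimal sketch of the long-format table and of how the update join consumes it. It uses the data from the question; the names dt and inc_long are made up for this illustration, and variable.factor = FALSE is an addition of mine so that eth stays a character column.

```r
library(data.table)

# Data from the question; `dt` and `inc_long` are illustrative names.
dt  <- data.table(eth = c("A","B","B","A","C"), ZIP1 = c(1,1,2,3,5))
Inc <- data.table(ZIP2 = c(1,2,3,4,5,6,7),
                  A = c(56,98,43,4,90,19,59),
                  B = c(49,10,69,30,10,4,95),
                  C = c(69,2,59,8,17,84,30))

# Long format: one row per (ZIP2, eth) pair, 7 ZIPs x 3 groups = 21 rows.
inc_long <- melt(Inc, id.vars = "ZIP2", variable.name = "eth",
                 value.name = "Inc", variable.factor = FALSE)

# Update join: for each row of dt, find the row of inc_long with the same
# ZIP and eth, and copy its Inc value (referred to as i.Inc) into dt.
dt[inc_long, Inc := i.Inc, on = c(ZIP1 = "ZIP2", eth = "eth")]
print(dt)
```

Rows of inc_long that have no partner in dt (for example ZIP2 = 7) are simply ignored by the update.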


Alternately...

setDT(df)
setDT(Inc)
df[, Inc := Inc[.(ZIP1), eth, on="ZIP2", with=FALSE], by=eth]

This is built on a similar idea. The package vignettes are a good place to start for this sort of syntax.

We can use row/column indexing

df$Inc <- Inc[cbind(match(df$ZIP1, Inc$ZIP2), match(df$eth, colnames(Inc)))]

df
#  eth ZIP1 Inc
#1   A    1  56
#2   B    1  49
#3   B    2  10
#4   A    3  43
#5   C    5  17
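
The one-liner above packs three steps together. As a rough sketch of what happens inside it, using the data from the question (rows, cols, idx and vals are names introduced here only for illustration):

```r
df  <- data.frame(eth = c("A","B","B","A","C"), ZIP1 = c(1,1,2,3,5),
                  stringsAsFactors = FALSE)
Inc <- data.frame(ZIP2 = c(1,2,3,4,5,6,7),
                  A = c(56,98,43,4,90,19,59),
                  B = c(49,10,69,30,10,4,95),
                  C = c(69,2,59,8,17,84,30))

# Row of Inc holding each observation's ZIP code.
rows <- match(df$ZIP1, Inc$ZIP2)        # 1 1 2 3 5
# Column of Inc named after each observation's eth (ZIP2 is column 1).
cols <- match(df$eth, colnames(Inc))    # 2 3 3 2 4

# A two-column matrix of (row, column) pairs selects one cell per pair.
idx  <- cbind(rows, cols)
vals <- Inc[idx]                        # 56 49 10 43 17
```

Because the lookup goes through match(), it does not rely on ZIP2 being sorted or equal to the row number.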

What about this?

library(reshape2)
merge(df, melt(Inc, id="ZIP2"), by.x = c("ZIP1", "eth"), by.y = c("ZIP2", "variable"))
  ZIP1 eth value
1    1   A    56
2    1   B    49
3    2   B    10
4    3   A    43
5    5   C    17

Another option:

library(dplyr)
library(tidyr)
Inc %>%
  gather(eth, value, -ZIP2) %>%
  left_join(df, ., by = c("eth", "ZIP1" = "ZIP2"))
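
As a side note, tidyr's documentation marks gather() as superseded by pivot_longer(). A sketch of the same idea with pivot_longer(), assuming a reasonably recent tidyr (the names inc_long and res are made up for this example):

```r
library(dplyr)
library(tidyr)

df  <- data.frame(eth = c("A","B","B","A","C"), ZIP1 = c(1,1,2,3,5),
                  stringsAsFactors = FALSE)
Inc <- data.frame(ZIP2 = c(1,2,3,4,5,6,7),
                  A = c(56,98,43,4,90,19,59),
                  B = c(49,10,69,30,10,4,95),
                  C = c(69,2,59,8,17,84,30))

# Long format: every column except ZIP2 becomes an (eth, Inc) pair.
inc_long <- pivot_longer(Inc, cols = -ZIP2,
                         names_to = "eth", values_to = "Inc")

# Keep all rows of df and attach the matching Inc value.
res <- left_join(df, inc_long, by = c("eth", "ZIP1" = "ZIP2"))
print(res)
```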

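My solution (which may seem awkward):

df$Inc <- NA_real_  # pre-allocate the result column
for (i in seq_len(nrow(df))) {
    # look up the row by ZIP code (not by position) and the column by eth
    df$Inc[i] <- Inc[match(df$ZIP1[i], Inc$ZIP2), as.character(df$eth[i])]
}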
