
Using mapply for indirect addressing in a data frame

With the following two data frames

> d1
  keystr keynum 
1    abc      5
2    def      2
3    def      7 
4    abc      3 

> d2
   HD  2  3  5  7
1 abc  H  I  J  K
2 def  L  M  N  P

I would like to insert a column d1$val that uses the string in keystr and the number in keynum as indices in the d2 data frame. The result should be:

> d1
  keystr keynum  val
1    abc      5    J
2    def      2    L
3    def      7    P 
4    abc      3    I 

This should be an indirect application of mapply. How can I make the code below

d1 <- data.frame("keystr"=c("abc","def","def","abc"), "keynum"=c(5,2,7,3))
d2 <- data.frame("HD"=c("abc","def"),
                 "2"=c("H","L"), "3"=c("I","M"),
                 "5"=c("J","N"), "7"=c("K","P"))
d1$val <- mapply(function(kstr,knum) d2[kstr,knum],
                 d1$keystr, d1$keynum )

access the entries in this (indirect) fashion?

If you are not bound to mapply, you can do a join instead:

Code:

library(tidyverse)
d1 <- data.frame("keystr"=c("abc","def","def","abc"), "keynum"=c(5,2,7,3))
d2 <- data.frame("HD"=c("abc","def"),
                 "2"=c("H","L"), "3"=c("I","M"),
                 "5"=c("J","N"), "7"=c("K","P"))

d2 %>%
  gather(keynum, value, -HD) %>%
  mutate(keynum = as.numeric(gsub(keynum, pattern = "X", replacement = ""))) %>%
  left_join(y = ., x = d1, by = c("keystr" = "HD", "keynum"))

Output:

  keystr keynum value
1    abc      5     J
2    def      2     L
3    def      7     P
4    abc      3     I

We can reshape d2 to long format and then join it to d1 by keystr and keynum.

library(dplyr)
library(tidyr)

d3 <- d2 %>%
  gather(keynum, letter, -HD) %>%
  mutate(keynum = as.numeric(sub("X", "", keynum)))

d4 <- d1 %>%
  left_join(d3, by = c("keystr" = "HD", "keynum"))
d4
#   keystr keynum letter
# 1    abc      5      J
# 2    def      2      L
# 3    def      7      P
# 4    abc      3      I

DATA

Notice that I set stringsAsFactors = FALSE when creating the data frames.

d1 <- data.frame("keystr"=c("abc","def","def","abc"), "keynum"=c(5,2,7,3),
                 stringsAsFactors = FALSE)
d2 <- data.frame("HD"=c("abc","def"),
                 "2"=c("H","L"), "3"=c("I","M"),
                 "5"=c("J","N"), "7"=c("K","P"),
                 stringsAsFactors = FALSE)
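One reason to prefer character columns over factors when a column is later used as a key: R treats a factor used as an index as its integer codes, not its labels. The small sketch below illustrates this with a made-up named vector (lookup, f, by_code and by_label are illustrative names, not part of the data above). Since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so this mainly matters on older versions.

```r
# Hypothetical lookup vector whose names are NOT in alphabetical order
lookup <- c(def = "L", abc = "H")

# Factor levels are sorted: "abc" has code 1, "def" has code 2
f <- factor(c("abc", "def"))

# Indexing with the factor uses the integer codes 1 and 2 (positions)
by_code <- lookup[f]

# Converting to character first indexes by name, as intended
by_label <- lookup[as.character(f)]

unname(by_code)   # "L" "H"  (positional, not what we want)
unname(by_label)  # "H" "L"
```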

You can use the d1 columns to index the character values in d2[-1] if you convert d2[-1] to a matrix and then cbind the two key columns as character values. The matrix acts as a two-dimensional lookup table to which you pass row and column indices at the same time, so a two-column index matrix yields a vector of outputs. (The same idea extends to 3, 4, or higher-dimensional R arrays, to which one would pass index matrices with 3, 4, or more columns.) This assumes the columns of d2 are named 2, 3, 5 and 7, for example because d2 was built with check.names = FALSE:

( m2 <- sapply(d2[ , -1], as.character) )
#------
     2   3   5   7  
[1,] "H" "I" "J" "K"
[2,] "L" "M" "N" "P"

rownames(m2) <- as.character(d2[[1]])
m2
#--------
    2   3   5   7  
abc "H" "I" "J" "K"
def "L" "M" "N" "P"


(d1$val <-  m2[ cbind(as.character(d1[[1]]),as.character(d1[[2]])) ])
[1] "J" "L" "P" "I"

 d1
#--------
  keystr keynum val
1    abc      5   J
2    def      2   L
3    def      7   P
4    abc      3   I

Note the need to use as.character repeatedly, because those were factor columns. It would have been better to build the data frames with stringsAsFactors=FALSE. Building the matrix is fast, and the indexing is likely to be very efficient.
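As a sketch of the higher-dimensional case mentioned above, the same matrix indexing works on a 3-D array with a three-column index matrix. The array a, its third dimension ("lo"/"hi") and the index matrix idx are invented purely for illustration:

```r
# Hypothetical 2 x 2 x 2 array with named dimensions
a <- array(1:8, dim = c(2, 2, 2),
           dimnames = list(c("abc", "def"),   # rows
                           c("2", "3"),       # columns
                           c("lo", "hi")))    # made-up third dimension

# Each row of the character matrix is one (row, column, layer) triple,
# matched against the dimnames of the array
idx <- cbind(c("abc", "def"),
             c("3",   "2"),
             c("hi",  "lo"))

a[idx]  # 7 2
```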

You can reshape and join the data.frames using base R:

d1 <- read.table(text = 'keystr keynum 
1    abc      5
2    def      2
3    def      7 
4    abc      3', stringsAsFactors = FALSE)

d2 <- read.table(text = 'HD  2  3  5  7
1 abc  H  I  J  K
2 def  L  M  N  P', stringsAsFactors = FALSE, check.names = FALSE)

d2 <- reshape(d2, idvar = "HD", varying = names(d2)[-1], v.names = "val",
              times = names(d2)[-1], direction = "long")
merge(d1, d2, by.x = c("keystr", "keynum"), by.y = c("HD", "time"))
#>   keystr keynum val
#> 1    abc      3   I
#> 2    abc      5   J
#> 3    def      2   L
#> 4    def      7   P
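Note that merge() sorts the result by the key columns by default, so the rows above come back in a different order than in d1. If the original order matters, one option is to look the values up with match() instead of merging. This is only a sketch under the same data assumptions; the names long, key_d1 and key_long are made up for illustration:

```r
d1 <- data.frame(keystr = c("abc", "def", "def", "abc"),
                 keynum = c(5, 2, 7, 3),
                 stringsAsFactors = FALSE)
d2 <- data.frame("HD" = c("abc", "def"),
                 "2" = c("H", "L"), "3" = c("I", "M"),
                 "5" = c("J", "N"), "7" = c("K", "P"),
                 stringsAsFactors = FALSE, check.names = FALSE)

# Wide to long: one row per (HD, time) pair, value stored in "val"
long <- reshape(d2, idvar = "HD", varying = names(d2)[-1], v.names = "val",
                times = names(d2)[-1], direction = "long")

# Build a combined key on both sides and look it up in d1's row order
key_d1   <- paste(d1$keystr, d1$keynum)
key_long <- paste(long$HD, long$time)
d1$val   <- long$val[match(key_d1, key_long)]

d1$val  # "J" "L" "P" "I"
```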

I think the OP was right that mapply can provide a direct solution, and the attempt is close to working. Only two things need to change: the row has to be selected by comparing against d2$HD, and the column name has to be built with paste0, because data.frame() prefixed the numeric column names of d2 with "X".

d1$val <- mapply(function(x,y)d2[d2$HD==x,paste0("X",y)],d1$keystr, d1$keynum)
d1
#    keystr keynum val
# 1    abc      5   J
# 2    def      2   L
# 3    def      7   P
# 4    abc      3   I
# 
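To see why the paste0("X", ...) step is needed, it helps to inspect names(d2): with the default check.names = TRUE, data.frame() turns the name "2" into the syntactically valid "X2". The sketch below repeats the approach with character columns and with USE.NAMES = FALSE (an optional addition, not part of the answer above) so that the result is a plain unnamed vector:

```r
d1 <- data.frame(keystr = c("abc", "def", "def", "abc"),
                 keynum = c(5, 2, 7, 3),
                 stringsAsFactors = FALSE)
d2 <- data.frame("HD" = c("abc", "def"),
                 "2" = c("H", "L"), "3" = c("I", "M"),
                 "5" = c("J", "N"), "7" = c("K", "P"),
                 stringsAsFactors = FALSE)

names(d2)  # "HD" "X2" "X3" "X5" "X7"

# Select the row where HD equals the key string, and the column "X<keynum>"
d1$val <- mapply(function(kstr, knum) d2[d2$HD == kstr, paste0("X", knum)],
                 d1$keystr, d1$keynum, USE.NAMES = FALSE)

d1$val  # "J" "L" "P" "I"
```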

I added check.names = FALSE so that the data frame keeps column names that start with digits. When you index with a two-column cbind() matrix, each row is an (i, j) pair, and all pairs are extracted at once.

d1 <- data.frame("keystr"=c("abc","def","def","abc"), "keynum"=c(5,2,7,3))
d2 <- data.frame("HD"=c("abc","def"),
                 "2"=c("H","L"), "3"=c("I","M"),
                 "5"=c("J","N"), "7"=c("K","P"), check.names=FALSE)

# The row index must come from d2$HD (the lookup column), not d1$keystr
d1$val <- mapply(function(kstr,knum) d2[cbind(match(kstr, d2$HD),
                                              match(knum, names(d2)))],
                 d1$keystr,
                 d1$keynum)

  keystr keynum val
1    abc      5   J
2    def      2   L
3    def      7   P
4    abc      3   I
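Because a cbind() index matrix extracts all pairs in one step, the mapply() wrapper is not strictly needed. A sketch of a fully vectorized form, assuming d2 was built with check.names = FALSE as above (idx is an illustrative name):

```r
d1 <- data.frame(keystr = c("abc", "def", "def", "abc"),
                 keynum = c(5, 2, 7, 3),
                 stringsAsFactors = FALSE)
d2 <- data.frame("HD" = c("abc", "def"),
                 "2" = c("H", "L"), "3" = c("I", "M"),
                 "5" = c("J", "N"), "7" = c("K", "P"),
                 stringsAsFactors = FALSE, check.names = FALSE)

# Row positions come from d2$HD, column positions from names(d2);
# match() compares the numeric keynum against the character column names
idx <- cbind(match(d1$keystr, d2$HD),
             match(d1$keynum, names(d2)))

# Indexing a data frame with a two-column matrix returns one value per row of idx
d1$val <- d2[idx]

d1$val  # "J" "L" "P" "I"
```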


 