Using mapply for indirect addressing in a data frame
With the following two data frames:
> d1
keystr keynum
1 abc 5
2 def 2
3 def 7
4 abc 3
> d2
HD 2 3 5 7
1 abc H I J K
2 def L M N P
I would like to insert a column d1$val that uses the string in keystr and the number in keynum as indices into the d2 data frame. The result should be:
> d1
keystr keynum val
1 abc 5 J
2 def 2 L
3 def 7 P
4 abc 3 I
This should be an indirect application of mapply. How can I make the code below
d1 <- data.frame("keystr"=c("abc","def","def","abc"), "keynum"=c(5,2,7,3))
d2 <- data.frame("HD"=c("abc","def"),
"2"=c("H","L"), "3"=c("I","M"),
"5"=c("J","N"), "7"=c("K","P"))
d1$val <- mapply(function(kstr,knum) d2[kstr,knum],
d1$keystr, d1$keynum )
access the entries in this (indirect) fashion?
If you are not bound to using mapply, you can do a join:
Code:
library(tidyverse)
d1 <- data.frame("keystr"=c("abc","def","def","abc"), "keynum"=c(5,2,7,3))
d2 <- data.frame("HD"=c("abc","def"),
"2"=c("H","L"), "3"=c("I","M"),
"5"=c("J","N"), "7"=c("K","P"))
d2 %>%
gather(keynum, value, -HD) %>%
mutate(keynum = as.numeric(gsub(keynum, pattern = "X", replacement = ""))) %>%
left_join(y = ., x = d1, by = c("keystr" = "HD", "keynum"))
Output:
keystr keynum value
1 abc 5 J
2 def 2 L
3 def 7 P
4 abc 3 I
We can reshape d2 into long format and then join it to d1 using tidyr and dplyr.
library(dplyr)
library(tidyr)
d3 <- d2 %>%
gather(keynum, letter, -HD) %>%
mutate(keynum = as.numeric(sub("X", "", keynum)))
d4 <- d1 %>%
left_join(d3, by = c("keystr" = "HD", "keynum"))
d4
# keystr keynum letter
# 1 abc 5 J
# 2 def 2 L
# 3 def 7 P
# 4 abc 3 I
DATA
Notice that I set stringsAsFactors = FALSE when creating the data frames.
d1 <- data.frame("keystr"=c("abc","def","def","abc"), "keynum"=c(5,2,7,3),
stringsAsFactors = FALSE)
d2 <- data.frame("HD"=c("abc","def"),
"2"=c("H","L"), "3"=c("I","M"),
"5"=c("J","N"), "7"=c("K","P"),
stringsAsFactors = FALSE)
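As a side note, gather() has been superseded in current tidyr releases. The same reshape-then-join idea can be sketched with pivot_longer(), assuming tidyr 1.0.0 or later is installed. Here d2 is built with check.names = FALSE (an assumption that differs from the data above) so the column names stay "2", "3", "5", "7" and no "X" prefix has to be stripped:

```r
library(dplyr)
library(tidyr)

d1 <- data.frame(keystr = c("abc", "def", "def", "abc"),
                 keynum = c(5, 2, 7, 3),
                 stringsAsFactors = FALSE)

# check.names = FALSE keeps the numeric-looking column names as given
d2 <- data.frame(HD = c("abc", "def"),
                 "2" = c("H", "L"), "3" = c("I", "M"),
                 "5" = c("J", "N"), "7" = c("K", "P"),
                 check.names = FALSE, stringsAsFactors = FALSE)

# Long format: one row per (HD, keynum) pair
d3 <- d2 %>%
  pivot_longer(cols = -HD, names_to = "keynum", values_to = "letter") %>%
  mutate(keynum = as.numeric(keynum))

# left_join keeps the row order of d1
d4 <- left_join(d1, d3, by = c("keystr" = "HD", "keynum" = "keynum"))
d4
```

Because the join is a left join on d1, the rows come back in d1's original order.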
You can use the d1 columns to index the character values in d2[-1] if you convert them to a matrix and then cbind the key columns as character values. This creates a two-dimensional lookup table to which you pass indices for both row and column at the same time: a two-column matrix used as the index yields a vector of outputs, one per row of the index matrix. (You can also use 3-, 4-, or higher-dimensional indexing with R arrays, to which one would pass matrices with 3, 4, or more columns):
( m2 <- sapply(d2[ , -1], as.character) )
#------
2 3 5 7
[1,] "H" "I" "J" "K"
[2,] "L" "M" "N" "P"
rownames(m2) <- as.character(d2[[1]])
m2
#--------
2 3 5 7
abc "H" "I" "J" "K"
def "L" "M" "N" "P"
(d1$val <- m2[ cbind(as.character(d1[[1]]),as.character(d1[[2]])) ])
[1] "J" "L" "P" "I"
d1
#--------
keystr keynum val
1 abc 5 J
2 def 2 L
3 def 7 P
4 abc 3 I
Note the need to use as.character repeatedly, because those were factor columns. A better construction would have been to build your data.frames with stringsAsFactors=FALSE. Building the matrix will be fast, and the indexing is likely to be very efficient.
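A self-contained sketch of the same lookup follows. It assumes d2 is created with check.names = FALSE: the matrix printed above has the column names 2, 3, 5, 7, whereas the question's construction would produce X2, X3, X5, X7. With stringsAsFactors = FALSE most of the as.character calls disappear. The last lines illustrate the higher-dimensional remark with a made-up 3-D array (a3 and idx3 are hypothetical names used only here):

```r
d1 <- data.frame(keystr = c("abc", "def", "def", "abc"),
                 keynum = c(5, 2, 7, 3),
                 stringsAsFactors = FALSE)
d2 <- data.frame(HD = c("abc", "def"),
                 "2" = c("H", "L"), "3" = c("I", "M"),
                 "5" = c("J", "N"), "7" = c("K", "P"),
                 check.names = FALSE, stringsAsFactors = FALSE)

# Character lookup table with row names taken from HD
m2 <- as.matrix(d2[, -1])
rownames(m2) <- d2$HD

# A two-column character matrix is matched against the dimnames
idx <- cbind(d1$keystr, as.character(d1$keynum))
d1$val <- m2[idx]
d1

# The same idea in three dimensions: one index column per dimension
a3 <- array(1:24, dim = c(2, 3, 4))
idx3 <- cbind(c(1, 2), c(3, 1), c(4, 2))
a3[idx3]   # picks a3[1, 3, 4] and a3[2, 1, 2]
```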
You can reshape and join the data.frames using base R:
d1 <- read.table(text = 'keystr keynum
1 abc 5
2 def 2
3 def 7
4 abc 3', stringsAsFactors = FALSE)
d2 <- read.table(text = 'HD 2 3 5 7
1 abc H I J K
2 def L M N P', stringsAsFactors = FALSE, check.names = FALSE)
d2 <- reshape(d2, idvar = "HD", varying = names(d2)[-1], v.names = "val",
times = names(d2)[-1], direction = "long")
merge(d1, d2, by.x = c("keystr", "keynum"), by.y = c("HD", "time"))
#> keystr keynum val
#> 1 abc 3 I
#> 2 abc 5 J
#> 3 def 2 L
#> 4 def 7 P
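Note that merge() sorts the result by the key columns, so the rows above are no longer in d1's original order. If that order matters, one possible variation (a sketch, not part of the answer above) is to look the values up with match() on pasted keys instead of merging. The names long, key_d1 and key_long are illustrative only:

```r
d1 <- data.frame(keystr = c("abc", "def", "def", "abc"),
                 keynum = c(5, 2, 7, 3),
                 stringsAsFactors = FALSE)
d2 <- data.frame(HD = c("abc", "def"),
                 "2" = c("H", "L"), "3" = c("I", "M"),
                 "5" = c("J", "N"), "7" = c("K", "P"),
                 check.names = FALSE, stringsAsFactors = FALSE)

# Same reshape as above: one row per (HD, time) pair
long <- reshape(d2, idvar = "HD", varying = names(d2)[-1], v.names = "val",
                times = names(d2)[-1], direction = "long")

# Build a combined key on both sides and look each d1 row up in 'long'
key_d1   <- paste(d1$keystr, d1$keynum)
key_long <- paste(long$HD, long$time)
d1$val <- long$val[match(key_d1, key_long)]
d1
```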
I think the OP was right in thinking that mapply can provide a direct solution, and the mapply approach in the question is pretty close to working. Only the logic for selecting the row has to be corrected, and paste0 has to be used to select the column from d2.
d1$val <- mapply(function(x,y)d2[d2$HD==x,paste0("X",y)],d1$keystr, d1$keynum)
d1
# keystr keynum val
# 1 abc 5 J
# 2 def 2 L
# 3 def 7 P
# 4 abc 3 I
#
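To see why the original d2[kstr, knum] does not work and why paste0("X", y) is needed, it can help to inspect the names that data.frame() actually assigns. The sketch below assumes the question's construction of d2 (default check.names = TRUE) plus stringsAsFactors = FALSE; lookup is a hypothetical helper name:

```r
d1 <- data.frame(keystr = c("abc", "def", "def", "abc"),
                 keynum = c(5, 2, 7, 3),
                 stringsAsFactors = FALSE)
d2 <- data.frame(HD = c("abc", "def"),
                 "2" = c("H", "L"), "3" = c("I", "M"),
                 "5" = c("J", "N"), "7" = c("K", "P"),
                 stringsAsFactors = FALSE)

names(d2)      # the numeric-looking names were turned into X2, X3, X5, X7
rownames(d2)   # "1" "2": there is no row called "abc" or "def"

# Select the row by comparing against HD, the column by its prefixed name
lookup <- function(kstr, knum) d2[d2$HD == kstr, paste0("X", knum)]

# mapply names its result after the first argument; unname() drops that
d1$val <- unname(mapply(lookup, d1$keystr, d1$keynum))
d1
```

In the original attempt, a numeric knum such as 5 is treated as a column position rather than a column name, and kstr is looked up among the row names "1" and "2", so neither index points at the intended cell.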
I added check.names = FALSE so that the data.frame keeps column names that start with digits. If you index with a two-column cbind() matrix, the i, j pairs are extracted all at once.
d1 <- data.frame("keystr"=c("abc","def","def","abc"), "keynum"=c(5,2,7,3))
d2 <- data.frame("HD"=c("abc","def"),
"2"=c("H","L"), "3"=c("I","M"),
"5"=c("J","N"), "7"=c("K","P"), check.names=FALSE)
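# Row index: position of kstr in d2$HD (matching against d1$keystr only works by coincidence here)
# Column index: position of knum among the column names of d2
d1$val <- mapply(function(kstr,knum) d2[cbind(match(kstr, d2$HD),
                                              match(knum, names(d2)))],
                 d1$keystr,
                 d1$keynum)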
keystr keynum val
1 abc 5 J
2 def 2 L
3 def 7 P
4 abc 3 I
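Since a two-column index matrix already extracts all i, j pairs in one step, the mapply wrapper is arguably optional. A minimal vectorized sketch follows, assuming the same check.names = FALSE construction and adding stringsAsFactors = FALSE; rows and cols are illustrative names:

```r
d1 <- data.frame(keystr = c("abc", "def", "def", "abc"),
                 keynum = c(5, 2, 7, 3),
                 stringsAsFactors = FALSE)
d2 <- data.frame(HD = c("abc", "def"),
                 "2" = c("H", "L"), "3" = c("I", "M"),
                 "5" = c("J", "N"), "7" = c("K", "P"),
                 check.names = FALSE, stringsAsFactors = FALSE)

rows <- match(d1$keystr, d2$HD)        # row position of each key string in d2
cols <- match(d1$keynum, names(d2))    # numbers are compared as character to the names

# One row of the index matrix per lookup; the data frame is indexed as a matrix
d1$val <- d2[cbind(rows, cols)]
d1
```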