R: populate data.frame within function in mapply
A data.frame df1 is queried (fuzzy match) against another data.frame df2 with agrep. By iterating over its output (a list called matches holding the row numbers of the respective matches in df2), df1 is populated with the affiliated values from df2. The goal is a function that is passed to mapply; however, in all my attempts df1 remains unchanged.
In a for loop, the code works as expected and populates df1 with the affiliated variables from df2. Still, I would be interested in how to solve this with a function that is passed to mapply.
First, the two data.frames:
df1 <- structure(list(Species = c("Alisma plantago-aquatica", "Alnus glutinosa",
                                  "Carex davalliana", "Carex echinata",
                                  "Carex elata"),
                      CheckPoint = c(NA, NA, NA, NA, NA),
                      L = c(NA, NA, NA, NA, NA),
                      R = c(NA, NA, NA, NA, NA),
                      K = c(NA, NA, NA, NA, NA)),
                 row.names = c(NA, 5L), class = "data.frame")
df2 <- structure(list(Species = c("Alisma gramineum", "Alisma lanceolatum",
                                  "Alisma plantago-aquatica", "Alnus glutinosa",
                                  "Alnus incana", "Alnus viridis",
                                  "Carex davalliana", "Carex depauperata",
                                  "Carex diandra", "Carex digitata",
                                  "Carex dioica", "Carex distans",
                                  "Carex disticha", "Carex echinata",
                                  "Carex elata"),
                      L = c(7L, 7L, 7L, 5L, 6L, 7L, 9L, 4L, 8L, 3L, 9L, 9L, 8L,
                            8L, 8L),
                      R = c(7L, 7L, 5L, 5L, 4L, 3L, 4L, 7L, 6L, NA, 4L, 6L, 6L,
                            NA, NA),
                      K = c(6L, 2L, NA, 3L, 5L, 4L, 4L, 2L, 7L, 4L, NA, 3L, NA,
                            3L, 2L)),
                 row.names = seq(1:15), class = "data.frame")
Then, fuzzy match by Species:
matches <- lapply(df1$Species, agrep, x = df2$Species, value = FALSE,
                  max.distance = c(deletions = 0,
                                   insertions = 1,
                                   substitutions = 1))
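Both the for loop that follows and the sapply shortcut in the answer assume that agrep finds exactly one row of df2 for every species. The sketch below shows a quick sanity check on matches. It uses hypothetical toy vectors (patterns, lookup) and max.distance = 0 (exact substring matching) instead of the question's data and settings, so that the result is predictable.

```r
# Hypothetical toy stand-ins for df1$Species and df2$Species
patterns <- c("Carex elata", "Alnus glutinosa", "Salix alba")
lookup   <- c("Alnus glutinosa", "Carex elata", "Carex echinata")

# Same shape as `matches` in the question: a list with one integer vector per
# pattern. max.distance = 0 (exact substring match) keeps the toy run predictable.
matches <- lapply(patterns, agrep, x = lookup, max.distance = 0)

# Number of lookup rows found per pattern; anything other than 1 breaks
# df2[matches[[i]], ] in the loop as well as the sapply() shortcut
n_hits <- lengths(matches)
print(n_hits)

# Patterns that need attention before populating df1
problems <- patterns[n_hits != 1L]
print(problems)
```

With the real data, you would run lengths() on the question's matches before indexing df2.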
Populating df1 with the values from df2 works as expected:
for (i in seq_len(nrow(df1))) {
  # copy the matched df2 row (Species, L, R, K) into CheckPoint, L, R, K
  df1[i, 2:5] <- df2[matches[[i]], ]
}
In contrast, my approach with mapply does return the correct values, but as a list of disassembled values that are never written into df1. No combination (with or without return(df1), writing the result into another variable, or desperate attempts with the settings of SIMPLIFY or USE.NAMES) yielded the desired result.
populatedf1 <- function(matches, index) {
  df1[index, 2:5] <- df2[matches, ]
  #return(df1)
}
mapply(populatedf1, matches, seq_along(matches), SIMPLIFY = FALSE,
       USE.NAMES = FALSE)
It would be great if someone knows the solution or could point me in the right direction. Thanks! :)
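The reason df1 never changes is R's copy-on-modify semantics. The assignment inside populatedf1 modifies a copy of df1 that lives only in the function's environment, and that copy is discarded when the call returns. The toy sketch below demonstrates this with a hypothetical data frame d and helper set_row. It also shows the workaround with the super-assignment operator <<-, which writes into the enclosing (here global) environment. That workaround is generally considered less idiomatic than returning values.

```r
# Hypothetical toy data frame standing in for df1
d <- data.frame(x = c(NA_real_, NA_real_))

# Assigning inside a function creates a modified *local* copy of `d`
set_row <- function(i, value) {
  d[i, "x"] <- value
  d  # return the local copy; the global `d` is never touched
}
res <- mapply(set_row, 1:2, c(10, 20), SIMPLIFY = FALSE)
unchanged <- d$x
print(unchanged)   # NA NA: the global data frame did not change
print(res[[2]]$x)  # NA 20: each call only saw its own modification

# `<<-` assigns into `d` in the enclosing (here: global) environment.
# It works, but hidden side effects like this are usually best avoided.
set_row_global <- function(i, value) {
  d[i, "x"] <<- value
  invisible(NULL)
}
invisible(mapply(set_row_global, 1:2, c(10, 20)))
print(d$x)         # 10 20
```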
Actually, you would not need any loop here (for or mapply) if you replace lapply with sapply (so that it returns a vector instead of a list) and then do a direct assignment.
matches <- sapply(df1$Species, agrep, x = df2$Species, value = FALSE,
                  max.distance = c(deletions = 0,
                                   insertions = 1,
                                   substitutions = 1))
df1[, 2:5] <- df2[matches, ]
df1
#                   Species               CheckPoint L  R  K
#1 Alisma plantago-aquatica Alisma plantago-aquatica 7  5 NA
#2          Alnus glutinosa          Alnus glutinosa 5  5  3
#3         Carex davalliana         Carex davalliana 9  4  4
#4           Carex echinata           Carex echinata 8 NA  3
#5              Carex elata              Carex elata 8 NA  2
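Note that sapply only simplifies to a plain integer vector when every species has exactly one match. With zero or several matches it returns a list (or a matrix) instead, and the direct assignment no longer does what you want. The sketch below is a more defensive variant using vapply, under two assumptions: the first match should win, and unmatched species should be left as NA. The helper name first_match and the toy data are hypothetical, and max.distance = 0 stands in for the question's settings.

```r
# Hypothetical toy data; stringsAsFactors = FALSE keeps Species as character in R < 4.0
df2 <- data.frame(Species = c("Alnus glutinosa", "Carex elata", "Carex echinata"),
                  L = c(5L, 8L, 8L), stringsAsFactors = FALSE)
df1 <- data.frame(Species = c("Carex elata", "Salix alba"),
                  CheckPoint = NA, L = NA, stringsAsFactors = FALSE)

# Hypothetical helper: first df2 row matched by agrep(), or NA if there is none.
# max.distance = 0 (exact substring) only keeps the toy example predictable;
# swap in the question's max.distance settings for real use.
first_match <- function(pattern, x) {
  hits <- agrep(pattern, x, max.distance = 0)
  if (length(hits) == 0L) NA_integer_ else as.integer(hits[1L])
}

# vapply() insists on exactly one integer per species, so `idx` is always a plain vector
idx <- vapply(df1$Species, first_match, integer(1), x = df2$Species,
              USE.NAMES = FALSE)

# An NA index selects an all-NA row, so unmatched species simply stay empty
df1[, 2:3] <- df2[idx, ]
print(df1)
```

Taking the first hit is only one possible policy. If several species can match, you may prefer to stop with an error instead.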
As far as your approach is concerned, you can use Map, or mapply with SIMPLIFY = FALSE, to get a list of data frames, bind them into one data frame with do.call and rbind, and then assign the result.
df1[, 2:5] <- do.call(rbind, Map(populatedf1, matches, seq_along(matches)))
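This works even though populatedf1 never returns df1. In R, the value of an assignment such as df1[index, 2:5] <- df2[matches, ] is its right-hand side (returned invisibly), so each call yields one row of df2, and do.call(rbind, ...) stacks those rows. The sketch below makes that explicit on hypothetical toy data. It uses a hard-coded matches list, assumed to be what the agrep step returns, and an anonymous function that returns the df2 row directly.

```r
# Hypothetical toy data with the question's column layout
df2 <- data.frame(Species = c("Alisma gramineum", "Alnus glutinosa", "Carex elata"),
                  L = c(7L, 5L, 8L), R = c(7L, 5L, NA), K = c(6L, 3L, 2L),
                  stringsAsFactors = FALSE)
df1 <- data.frame(Species = c("Alnus glutinosa", "Carex elata"),
                  CheckPoint = NA, L = NA, R = NA, K = NA,
                  stringsAsFactors = FALSE)

# Assumed output of the agrep() step: one df2 row number per df1 row
matches <- list(2L, 3L)

# The value of an assignment is its right-hand side, which is why a function
# whose last line is `df1[index, 2:5] <- df2[matches, ]` returns a df2 row
last_is_assignment <- function() { v <- c(0, 0); v[2] <- 42 }
print(last_is_assignment())  # [1] 42

# Returning the matched row explicitly makes the intent clearer
rows <- Map(function(m) df2[m, ], matches)  # list of one-row data frames
df1[, 2:5] <- do.call(rbind, rows)          # stack them, then assign once
print(df1)
```

Returning the row explicitly, rather than relying on the value of an assignment, keeps the function free of the misleading df1 modification.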