R: populate data.frame within function in mapply

A data.frame df1 is queried (fuzzy match) against another data.frame df2 with agrep. By iterating over its output (a list called matches holding the row numbers of the respective matches in df2), df1 is populated with the associated values from df2. The goal is a function that is passed to mapply; however, in all my attempts df1 remains unchanged.

In a for-loop, the code works as expected and populates df1 with the associated variables from df2. Still, I would be interested in how to solve this with a function that is passed to mapply.

First, the two data.frames:

df1 <- structure(list(Species = c("Alisma plantago-aquatica", "Alnus glutinosa",
                                  "Carex davalliana", "Carex echinata",
                                  "Carex elata"),
                      CheckPoint = c(NA, NA, NA, NA, NA),
                      L = c(NA, NA, NA, NA, NA),
                      R = c(NA, NA, NA, NA, NA),
                      K = c(NA, NA, NA, NA, NA)),
                 row.names = c(NA, 5L), class = "data.frame")

df2 <- structure(list(Species = c("Alisma gramineum", "Alisma lanceolatum",
                                  "Alisma plantago-aquatica", "Alnus glutinosa",
                                  "Alnus incana", "Alnus viridis",
                                  "Carex davalliana", "Carex depauperata",
                                  "Carex diandra", "Carex digitata",
                                  "Carex dioica", "Carex distans",
                                  "Carex disticha", "Carex echinata",
                                  "Carex elata"),
                      L = c(7L, 7L, 7L, 5L, 6L, 7L, 9L, 4L, 8L, 3L, 9L, 9L, 8L,
                            8L, 8L),
                      R = c(7L, 7L, 5L, 5L, 4L, 3L, 4L, 7L, 6L, NA, 4L, 6L, 6L,
                            NA, NA),
                      K = c(6L, 2L, NA, 3L, 5L, 4L, 4L, 2L, 7L, 4L, NA, 3L, NA,
                            3L, 2L)),
                 row.names = 1:15, class = "data.frame")

Then, fuzzy match by Species:

# max.distance is documented as a single number or a list of components
matches <- lapply(df1$Species, agrep, x = df2$Species, value = FALSE,
                  max.distance = list(deletions = 0,
                                      insertions = 1,
                                      substitutions = 1))

Populating df1 with the values from df2 in a loop works as expected:

for (i in seq_len(nrow(df1))) {
  df1[i, 2:5] <- df2[matches[[i]], ]
}

My approach with mapply, in contrast, does return the correct values, but only as a list of disassembled values that are never written into df1. No combination (with or without return(df1), writing the result into another variable, or desperate attempts with the settings of SIMPLIFY and USE.NAMES) yielded the desired result.

populatedf1 <- function(matches, index) {
  df1[index, 2:5] <- df2[matches, ]
  # return(df1)
}

mapply(populatedf1, matches, seq_along(matches), SIMPLIFY = FALSE,
       USE.NAMES = FALSE)

It would be great if someone knows the solution or could point me in the right direction. Thanks! :)

Actually, you do not need any loop here (for or mapply) if you replace lapply with sapply, so that it returns a vector instead of a list, and then do a direct assignment. This relies on every species having exactly one match in df2, which is the case here.

# max.distance is documented as a single number or a list of components
matches <- sapply(df1$Species, agrep, x = df2$Species, value = FALSE,
                  max.distance = list(deletions = 0,
                                      insertions = 1,
                                      substitutions = 1))

df1[, 2:5] <- df2[matches, ]
df1

#                   Species               CheckPoint L  R  K
#1 Alisma plantago-aquatica Alisma plantago-aquatica 7  5 NA
#2          Alnus glutinosa          Alnus glutinosa 5  5  3
#3         Carex davalliana         Carex davalliana 9  4  4
#4           Carex echinata           Carex echinata 8 NA  3
#5              Carex elata              Carex elata 8 NA  2
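
If you want the direct assignment to fail loudly rather than misbehave when a species has zero or several matches, vapply can enforce the one-match assumption. The following is only a minimal sketch: the toy data and the helper match_one are made up for illustration and are not part of the question.

```r
# Toy data, made up for illustration (not the question's df1/df2).
lookup <- data.frame(Species = c("Carex elata", "Carex echinata", "Alnus incana"),
                     L = c(8L, 8L, 6L),
                     stringsAsFactors = FALSE)
query <- c("Carex elata", "Alnus incana")

# Hypothetical helper: return the single matching row index, or stop.
match_one <- function(pattern, candidates) {
  hit <- agrep(pattern, candidates, value = FALSE,
               max.distance = list(deletions = 0,
                                   insertions = 1,
                                   substitutions = 1))
  if (length(hit) != 1L) {
    stop("expected exactly one match for '", pattern, "', got ", length(hit))
  }
  hit
}

# vapply() insists on one integer per pattern, whereas sapply() would
# quietly fall back to a list if the match counts differed.
idx <- vapply(query, match_one, integer(1), candidates = lookup$Species)
result <- lookup[idx, ]
```

Here idx is a named integer vector that can be used for row indexing in the same way as matches above.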

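As far as your approach is concerned: inside populatedf1 the assignment only changes a local copy of df1, so the global df1 is never modified. As written (with return(df1) commented out), the function does return the matched row of df2, though. You can therefore use Map (or mapply with SIMPLIFY = FALSE), combine the resulting list of data frames into one data frame using do.call and rbind, and then assign.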

df1[, 2:5] <- do.call(rbind, Map(populatedf1, matches, seq_along(matches)))
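
To see why the assignment inside the function has no visible effect, here is a small sketch on toy data. The names target, src, fill_local and get_row are all made up for illustration. It contrasts assigning inside the function with returning the pieces and assigning once at top level.

```r
# Toy data, made up for illustration (not the question's df1/df2).
target <- data.frame(id = 1:3, a = NA_integer_, b = NA_integer_)
src <- data.frame(a = c(10L, 20L, 30L), b = 1:3)

# Attempt 1: assign inside the function. R modifies a copy of 'target'
# that lives in the function's own environment; the global is untouched.
fill_local <- function(i) {
  target[i, c("a", "b")] <- src[i, ]
  invisible(NULL)
}
invisible(lapply(seq_len(nrow(src)), fill_local))
unchanged <- all(is.na(target$a))  # checks that the global still holds only NA

# Attempt 2: return each piece, combine them, and assign once at top level.
get_row <- function(i) src[i, ]
pieces <- Map(get_row, seq_len(nrow(src)))
target[, c("a", "b")] <- do.call(rbind, pieces)
```

Using <<- inside the function would also reach the global data frame when the function is called from the top level, but returning values and assigning once is usually easier to reason about.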
