
R: Recursive *ply/plyr function; for loop replacement

I am trying to replace a for loop with a *ply-type function.

The issue I am having is that I'm not sure how to update the same data repeatedly.

Here is some sample data (I know this specific example could be done in other ways, but this is just for simplicity; my real example is much more complicated):

sample_pat_rep <-  data.frame(matrix(NA, ncol=2, nrow=3, dimnames=list(c(), c("Pattern","Replacement"))), stringsAsFactors=FALSE)
sample_pat_rep[1,] <-  c("a","A")
sample_pat_rep[2,] <-  c("b","B")
sample_pat_rep[3,] <-  c("c","C")

sample_strings <-  data.frame(matrix(NA, ncol=2, nrow=3, dimnames=list(c(), c("Original","Fixed"))), stringsAsFactors=FALSE)
sample_strings[1,] <-  c("aaaaaaaa bbbbbbbb cccccccc","aaaaaaaa bbbbbbbb cccccccc")
sample_strings[2,] <-  c("aAaAaAaA bBbBbBbB cCcCcCcC","aAaAaAaA bBbBbBbB cCcCcCcC")
sample_strings[3,] <-  c("AaAaAaAa BbBbBbBb CcCcCcCc","AaAaAaAa BbBbBbBb CcCcCcCc")

Here is a for loop version:

sample_strings1 <- sample_strings
for (i in 1:nrow(sample_pat_rep))
{
  sample_strings1[,c("Fixed")] <- gsub(sample_pat_rep[i,c("Pattern")], sample_pat_rep[i,c("Replacement")], sample_strings1[,c("Fixed")],ignore.case = TRUE)
} 
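The loop above is a fold: each iteration takes the previous result and applies the next pattern/replacement pair to it. Base R's `Reduce()` (a base function, not part of plyr) expresses that pattern directly. A minimal sketch, assuming the same sample data rebuilt in a more compact form; `apply_rule` is a hypothetical helper name:

```r
# Same sample data as the question, built more compactly
sample_pat_rep <- data.frame(Pattern = c("a", "b", "c"),
                             Replacement = c("A", "B", "C"),
                             stringsAsFactors = FALSE)
sample_strings <- data.frame(
  Original = c("aaaaaaaa bbbbbbbb cccccccc",
               "aAaAaAaA bBbBbBbB cCcCcCcC",
               "AaAaAaAa BbBbBbBb CcCcCcCc"),
  stringsAsFactors = FALSE)
sample_strings$Fixed <- sample_strings$Original

# Hypothetical helper: apply the i-th pattern/replacement pair to the
# current state of the Fixed column and return the updated column
apply_rule <- function(fixed, i) {
  gsub(sample_pat_rep$Pattern[i], sample_pat_rep$Replacement[i],
       fixed, ignore.case = TRUE)
}

# Reduce() passes each result on to the next call, starting from the
# original Fixed column, just like the loop's repeated reassignment
sample_strings3 <- sample_strings
sample_strings3$Fixed <- Reduce(apply_rule, seq_len(nrow(sample_pat_rep)),
                                sample_strings$Fixed)
sample_strings3$Fixed
```

The accumulator here is just the `Fixed` column; if a real rule needs other columns, the accumulator can be the whole data frame instead.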

When I try to replicate this with adply, it does not update the data; it essentially replicates it and rbinds the copies.

library(plyr)

sample_strings2 <- adply(.data=sample_pat_rep, .margins=1, .fun = function(x,data){

data[,c("Fixed")] <- gsub(x[,c("Pattern")], x[,c("Replacement")], data[,c("Fixed")],ignore.case = TRUE)
return(data)

}, data=sample_strings, .expand = FALSE, .progress = "none", .inform = FALSE, .parallel = FALSE, .paropts = NULL)
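Why this happens: adply calls the function once per row of `sample_pat_rep`, and every call receives the same unmodified `sample_strings` through `data`, so nothing is carried from one call to the next. The per-row results are then row-bound. The base-R sketch below mimics that behaviour conceptually; it is not plyr's actual implementation, and it omits any index columns adply may add:

```r
sample_pat_rep <- data.frame(Pattern = c("a", "b", "c"),
                             Replacement = c("A", "B", "C"),
                             stringsAsFactors = FALSE)
sample_strings <- data.frame(
  Original = c("aaaaaaaa bbbbbbbb cccccccc",
               "aAaAaAaA bBbBbBbB cCcCcCcC",
               "AaAaAaAa BbBbBbBb CcCcCcCc"),
  stringsAsFactors = FALSE)
sample_strings$Fixed <- sample_strings$Original

# Each pattern row is handled independently, always starting from the ORIGINAL data
pieces <- lapply(seq_len(nrow(sample_pat_rep)), function(i) {
  data <- sample_strings
  data$Fixed <- gsub(sample_pat_rep$Pattern[i], sample_pat_rep$Replacement[i],
                     data$Fixed, ignore.case = TRUE)
  data
})

# The pieces are stacked: 3 copies x 3 rows, each copy with only one rule applied
stacked <- do.call(rbind, pieces)
nrow(stacked)
```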

I'm sure there is an easy fix. I looked at rapply, but it wasn't clear that it was the fix.

Maybe write a function that makes the call? Use rapply?

Thanks in advance!


UPDATE: NEW DATA

This is closer to an actual scenario. The matches are dynamic and based on an external system. I am trying to avoid overly complicated regex or nested if-else statements.

library(plyr)

sample_match <-  data.frame(matrix(NA, ncol=1, nrow=3, dimnames=list(c(), c("Match"))), stringsAsFactors=FALSE)
sample_match[1,] <-  c("dog")
sample_match[2,] <-  c("cat")
sample_match[3,] <-  c("bear")

sample_strings <-  data.frame(matrix(NA, ncol=2, nrow=3, dimnames=list(c(), c("Sentence","Has_Animal"))), stringsAsFactors=FALSE)
sample_strings[1,] <-  c("This person only has a cat",0)
sample_strings[2,] <-  c("This person has a cat and a dog",0)
sample_strings[3,] <-  c("This person has no animals",0)

sample_strings1 <- sample_strings
for (i in 1:nrow(sample_match))
{
 sample_strings1[,c("Has_Animal")] <- ifelse(grepl(sample_match[i,c("Match")], sample_strings1[,c("Sentence")]), 1,sample_strings1[,c("Has_Animal")])
} 


sample_strings2 <- adply(.data=sample_match, .margins=1, .fun = function(x,data){

 data[,c("Has_Animal")] <- ifelse(grepl(x[,c("Match")], data[,c("Sentence")]), 1,data[,c("Has_Animal")])
 return(data)

}, data=sample_strings, .expand = FALSE, .progress = "none", .inform = FALSE, .parallel = FALSE, .paropts = NULL)
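The same fold idea applies to the new data: `Reduce()` threads the `Has_Animal` vector through each match term, mirroring what the loop produces in `sample_strings1`. A sketch, assuming `Has_Animal` starts as numeric 0 (in the code above, `c("...", 0)` coerces it to the character "0"); `flag_match` is a hypothetical helper name:

```r
sample_match <- data.frame(Match = c("dog", "cat", "bear"),
                           stringsAsFactors = FALSE)
sample_strings <- data.frame(
  Sentence = c("This person only has a cat",
               "This person has a cat and a dog",
               "This person has no animals"),
  Has_Animal = 0,
  stringsAsFactors = FALSE)

# Hypothetical helper: set the flag to 1 where the sentence matches `pattern`,
# otherwise keep the flag from the previous step
flag_match <- function(flags, pattern) {
  ifelse(grepl(pattern, sample_strings$Sentence), 1, flags)
}

sample_strings3 <- sample_strings
sample_strings3$Has_Animal <- Reduce(flag_match, sample_match$Match,
                                     sample_strings$Has_Animal)
sample_strings3
```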

Update: I misunderstood the question and thought sample_strings2 was the required result. I've updated the answer so that it now gives sample_strings1, which IIUC (if I understand correctly) is what's required.

Here's a solution using base R:

pattern = paste(sample_match$Match, collapse="|")
transform(sample_strings, Has_Animal = grepl(pattern, Sentence)*1L)

#                          Sentence Has_Animal
# 1      This person only has a cat          1
# 2 This person has a cat and a dog          1
# 3      This person has no animals          0
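Since the match terms come from an external system, note that the `|`-joined pattern is a regular expression, so a term containing `.`, `+` or `(` would be read as regex syntax. If the terms should be matched literally (an assumption about the requirements), one option is to test each term with `fixed = TRUE` and OR the results together:

```r
sample_match <- data.frame(Match = c("dog", "cat", "bear"),
                           stringsAsFactors = FALSE)
sample_strings <- data.frame(
  Sentence = c("This person only has a cat",
               "This person has a cat and a dog",
               "This person has no animals"),
  stringsAsFactors = FALSE)

# One logical vector per term, each matched as a literal string
hits <- lapply(sample_match$Match, function(p) {
  grepl(p, sample_strings$Sentence, fixed = TRUE)
})

# Element-wise OR across all terms, stored as 0/1
sample_strings$Has_Animal <- as.integer(Reduce(`|`, hits))
sample_strings
```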

If you don't want to match words that merely contain the pattern (for example, concatenate contains cat), you can use the regex \\b for a word boundary.

pattern = paste(paste("\\b", sample_match$Match, "\\b", sep=""), collapse="|")
grepl(pattern, c("cat", "concatenate"))
# [1] TRUE FALSE
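The word-boundary pattern plugs straight into the same `transform()` call. A sketch using a made-up sentence that contains "concatenate" (the data here is illustrative, not from the question):

```r
sample_match <- data.frame(Match = c("dog", "cat", "bear"),
                           stringsAsFactors = FALSE)
sentences <- data.frame(
  Sentence = c("This person only has a cat",
               "Please concatenate these strings",
               "This person has no animals"),
  stringsAsFactors = FALSE)

# Wrap every term in \b ... \b so only whole words match, then join with |
pattern <- paste(paste("\\b", sample_match$Match, "\\b", sep = ""), collapse = "|")

result <- transform(sentences, Has_Animal = grepl(pattern, Sentence) * 1L)
result
```

"concatenate" no longer counts as a match, while the standalone word "cat" still does.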

Here is a straight plyr approach to the question:

ddply(sample_strings,.(Sentence),function(x,ref = sample_match) {
  any(unlist(strsplit(x[["Sentence"]]," ")) %in% ref[[1]])
  })

                         Sentence    V1
1 This person has a cat and a dog  TRUE
2      This person has no animals FALSE
3      This person only has a cat  TRUE
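Because ddply splits on `Sentence`, the result comes back sorted alphabetically (note the row order above), duplicate sentences are collapsed into one row, and the flag is a logical `V1` column. If you need the original row order and a 0/1 `Has_Animal` column, here is a base-R sketch of the same whitespace-tokenising idea:

```r
sample_match <- data.frame(Match = c("dog", "cat", "bear"),
                           stringsAsFactors = FALSE)
sample_strings <- data.frame(
  Sentence = c("This person only has a cat",
               "This person has a cat and a dog",
               "This person has no animals"),
  stringsAsFactors = FALSE)

# Split each sentence into words on single spaces
words <- strsplit(sample_strings$Sentence, " ", fixed = TRUE)

# TRUE if any word is exactly one of the match terms; rows keep their original order
sample_strings$Has_Animal <- as.integer(
  vapply(words, function(w) any(w %in% sample_match$Match), logical(1)))
sample_strings
```

Splitting on a single space means a term followed by punctuation (e.g. "cat.") won't match; the regex approaches are more forgiving there.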
