Pattern matching in dataset

I've been struggling with this for a while.

I have a dataset with two columns: a Description column and a pattern column that I am trying to match against the Description column. If the corresponding pattern exists in the Description column, it needs to be replaced by an asterisk.

For instance, if the Description is ABCDEisthedescription and the Pattern is ABCDE, then the new description should be *isthedescription.

I tried the following:

data$NewDescription <- gsub(data$pattern, "*", data$Description)

Since there is more than one row in the dataset, it throws an error (a warning, rather): "argument 'pattern' has length > 1 and only the first element will be used".

Any help will be hugely appreciated.

You can use mapply here to apply the function to each row, pairing each pattern with its own description.

#sample data
data<-data.frame(
    pattern=c("ABCDE","XYZ"), 
    Description=c("ABCDEisthedescription", "sillyXYZvalue")
)

Now use mapply:

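# fixed = TRUE matches the pattern literally; with it the replacement is also
# taken literally, so use "*" rather than "\\*" to get a bare asterisk
mapply(function(p, d) gsub(p, "*", d, fixed = TRUE),
       data$pattern, data$Description, USE.NAMES = FALSE)
# [1] "*isthedescription" "silly*value"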
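
For comparison, here is a minimal sketch that contrasts the original call with the row-wise version and stores the result in the NewDescription column from the question. It rebuilds the sample data with stringsAsFactors = FALSE (an assumption, so both columns are plain character vectors on any R version), and first_only is a hypothetical variable name used only for illustration.

```r
# Sample data as above, with strings kept as character vectors
data <- data.frame(
  pattern     = c("ABCDE", "XYZ"),
  Description = c("ABCDEisthedescription", "sillyXYZvalue"),
  stringsAsFactors = FALSE
)

# gsub() is vectorised over `x` only: a multi-element `pattern` is cut down
# to its first element (hence the warning), so "XYZ" is never tried here.
first_only <- suppressWarnings(
  gsub(data$pattern, "*", data$Description, fixed = TRUE)
)

# Row-wise version: each pattern is applied to its own description.
data$NewDescription <- mapply(
  function(p, d) gsub(p, "*", d, fixed = TRUE),
  data$pattern, data$Description,
  USE.NAMES = FALSE
)
```

In this sketch first_only leaves "sillyXYZvalue" unchanged, because only "ABCDE" is ever used as the pattern, while data$NewDescription has both rows replaced. Keeping fixed = TRUE also means a pattern containing regex metacharacters such as "." or "(" is matched literally.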

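Additionally, here is an sapply-based alternative on a larger sample data set. It replaces the pattern only when the description starts with it: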

Patterns <- paste0(
  sample(LETTERS[1:4],500,replace=TRUE),
  sample(LETTERS[1:4],500,replace=TRUE),
  sample(LETTERS[1:4],500,replace=TRUE),
  sample(LETTERS[1:4],500,replace=TRUE))
##
Desc <- paste0(Patterns,"isthedescription")
Ptrn <- sample(Patterns,500)
##
Data <- data.frame(
  Description=Desc,
  Pattern=Ptrn,
  stringsAsFactors=FALSE)
##
newDesc <- sapply(1:nrow(Data), function(X){
  if(substr(Data$Description[X],1,4)==Data$Pattern[X]){
    gsub(Data$Pattern[X],"*",Data$Description[X])
  } else {
    Data$Description[X]
  }
})

@MrFlick's approach seems more concise though.
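
If only a leading match should be replaced, as in the sapply version, the same idea can be sketched without an explicit loop. This is an illustration under stated assumptions, not part of either answer: it uses a small hand-made Data frame instead of the random one, relies on startsWith() (available in base R from version 3.3.0), and hit is a hypothetical helper variable. Unlike gsub, it replaces only the leading occurrence.

```r
# Small hand-made example: the first row starts with its pattern,
# the second does not.
Data <- data.frame(
  Description = c("ABCDisthedescription", "BADCisthedescription"),
  Pattern     = c("ABCD", "ABCD"),
  stringsAsFactors = FALSE
)

# startsWith() compares each description with its own pattern, element-wise.
hit <- startsWith(Data$Description, Data$Pattern)

# Where the prefix matches, drop it and put "*" in front of the remainder;
# otherwise keep the description unchanged.
newDesc <- ifelse(
  hit,
  paste0("*", substring(Data$Description, nchar(Data$Pattern) + 1)),
  Data$Description
)
```

Because the comparison is literal, no regex escaping is needed for the patterns.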
