Find and replace text containing punctuation in a file

Question

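Hi friends, I asked a related question here. The problem this time is that keywords (txt) containing punctuation are not detected. I tried to make the answer generic, but failed.

Basically, I have keywords (txt), some with punctuation and some without, and I need to search for them in the file toSearch.

For example, these are the contents of my file toSearch: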

 [1]'Nokia. Okay. R: Samsung R: Samsung M: And you have? R: I have Micromax'
 [2]'M: Okay, you have taken car. R: I have (Mahindra Scorpio and Mahindra's) this Duro DZ.M: Okay.'
 [3]'M: What is your age ? R: 32 years R: My name is "Nitish". I have Interior designing business.'
 [4]'R: 3rd, Not extra spicy. R: 4th, Fresh. R: 5th, Variety. R: 6th, Hygienic environment'
 [5]'How you feel? How it should be? We will move forward, if there we have to make an ideal'
 [6]'What is the strength of your organisation? How many people a re working.'
 [7]'R: Read newspaper R:Had breakfast with family.'

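And the keywords (txt) are as follows. I use #@ to separate the keywords because I can't use , (comma).

 txt <- "R: Samsung R: Samsung M:#@I have (Mahindra Scorpio and Mahindra's)#@R: 32 years R: My name is \"Nitish\"#@R: 4th, Fresh. R: 5th, Variety#@How you feel? How it should be?"

My expected output is to find each occurrence and replace the spaces inside the keyword with an underscore _: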
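Because several keywords contain regex metacharacters, it may be safer to split the keyword string literally. A minimal sketch, assuming the keyword string is exactly the one above (inner double quotes escaped so it parses as R):

```r
# Keyword string from the question; "#@" separates the keywords
txt <- "R: Samsung R: Samsung M:#@I have (Mahindra Scorpio and Mahindra's)#@R: 32 years R: My name is \"Nitish\"#@R: 4th, Fresh. R: 5th, Variety#@How you feel? How it should be?"

# fixed = TRUE treats "#@" literally; strsplit() returns a list, so take its first element
kw <- strsplit(txt, "#@", fixed = TRUE)[[1]]

length(kw)
# [1] 5
kw[5]
# [1] "How you feel? How it should be?"
```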

 [1]'Nokia. Okay. R:_Samsung_R:_Samsung_M: And you have? R: I have Micromax'
 [2]'M: Okay, you have taken car. R: I_have_(Mahindra_Scorpio_and_Mahindra's) this Duro DZ.M: Okay.'
 [3]'M: What is your age ? R:_32_years_R:_My_name_is_"Nitish". I have Interior designing business.'
 [4]'R: 3rd, Not extra spicy. R:_4th,_Fresh._R:_5th,_Variety. R: 6th, Hygienic environment'
 [5]'How_you_feel?_How_it_should_be? We will move forward, if there we have to make an ideal'
 [6]'What is the strength of your organisation? How many people a re working.'
 [7]'R: Read newspaper R:Had breakfast with family.'

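In case it's unclear, this is a simple Find And Replace Text (FART) function. Only the spaces are replaced with _.

I tried using this regular expression: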
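For a single keyword, the operation described here can be sketched in base R as two literal substitutions: build the underscored version of the keyword, then swap it into the line. The variable names below are illustrative only:

```r
keyword <- "R: 4th, Fresh. R: 5th, Variety"
line <- "R: 3rd, Not extra spicy. R: 4th, Fresh. R: 5th, Variety. R: 6th, Hygienic environment"

# Only the spaces inside the keyword change; the punctuation is kept as-is
replacement <- gsub(" ", "_", keyword, fixed = TRUE)
replacement
# [1] "R:_4th,_Fresh._R:_5th,_Variety"

# Swap the literal keyword for its underscored version
result <- gsub(keyword, replacement, line, fixed = TRUE)
result
# [1] "R: 3rd, Not extra spicy. R:_4th,_Fresh._R:_5th,_Variety. R: 6th, Hygienic environment"
```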

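library(stringi)  # stri_stats_latex()

# split the keyword string on the #@ separator
txt <- strsplit(txt, "#@", fixed = TRUE)[[1]]

for (i in seq_along(txt))
{
    # finding the first word of the keyword
    start <- head(strsplit(txt[i], split = " ")[[1]], 1)
    # number of words in the keyword
    n <- stri_stats_latex(txt[i])[4]

    # all possible occurrences of the keyword in the text
    o <- unlist(regmatches(toSearch, gregexpr(paste0(start, "(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,", n - 1, "}"), toSearch, ignore.case = TRUE, perl = TRUE)))

    # exact match with the result
    p <- which(!is.na(pmatch(txt, o)))

    # replace the matched keywords in the text, turning their spaces into _
    for (k in p) toSearch <- gsub(txt[k], gsub(" ", "_", txt[k]), toSearch, fixed = TRUE)
}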

Answer 1

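So you have to be very careful with punctuation when using regular expressions. If you want exact matches, it's better not to use regular expressions at all and to set fixed=T in grep/gsub. You can then use Reduce to do the find and replace.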
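To see why punctuation breaks regex matching, here is a small sketch using two keywords from the question. As a regular expression, `?` makes the preceding character optional and `( )` form a group rather than matching literal parentheses, so neither keyword is found; with `fixed = TRUE` both are matched literally:

```r
line1 <- "How you feel? How it should be? We will move forward"
kw1 <- "How you feel? How it should be?"
line2 <- "R: I have (Mahindra Scorpio and Mahindra's) this Duro DZ."
kw2 <- "I have (Mahindra Scorpio and Mahindra's)"

# Regex interpretation: no match, so both lines come back unchanged
regex1 <- gsub(kw1, gsub(" ", "_", kw1, fixed = TRUE), line1)
regex2 <- gsub(kw2, gsub(" ", "_", kw2, fixed = TRUE), line2)

# Literal interpretation: the keywords are found and underscored
fixed1 <- gsub(kw1, gsub(" ", "_", kw1, fixed = TRUE), line1, fixed = TRUE)
fixed2 <- gsub(kw2, gsub(" ", "_", kw2, fixed = TRUE), line2, fixed = TRUE)

fixed1
# [1] "How_you_feel?_How_it_should_be? We will move forward"
fixed2
# [1] "R: I_have_(Mahindra_Scorpio_and_Mahindra's) this Duro DZ."
```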

#input data
target<-c("Nokia. Okay. R: Samsung R: Samsung M: And you have? R: I have Micromax", 
"M: Okay, you have taken car. R: I have (Mahindra Scorpio and Mahindra's) this Duro DZ.M: Okay.", 
"M: What is your age ? R: 32 years R: My name is \"Nitish\". I have Interior designing business.", 
"R: 3rd, Not extra spicy. R: 4th, Fresh. R: 5th, Variety. R: 6th, Hygienic environment", 
"How you feel? How it should be? We will move forward, if there we have to make an ideal", 
"What is the strength of your organisation? How many people a re working.", 
"R: Read newspaper R:Had breakfast with family.")

kw<-c("R: Samsung R: Samsung M:", "I have (Mahindra Scorpio and Mahindra's)", 
"R: 32 years R: My name is \"Nitish\"", "R: 4th, Fresh. R: 5th, Variety", 
"How you feel? How it should be?")

Here Reduce applies the replacement for each keyword in turn to the target text:

Reduce(function (t,kw) gsub(kw, gsub(" ","_",kw), t, fixed=T), 
    kw, init=target, accumulate=F)

# [1] "Nokia. Okay. R:_Samsung_R:_Samsung_M: And you have? R: I have Micromax"                         
# [2] "M: Okay, you have taken car. R: I_have_(Mahindra_Scorpio_and_Mahindra's) this Duro DZ.M: Okay." 
# [3] "M: What is your age ? R:_32_years_R:_My_name_is_\"Nitish\". I have Interior designing business."
# [4] "R: 3rd, Not extra spicy. R:_4th,_Fresh._R:_5th,_Variety. R: 6th, Hygienic environment"          
# [5] "How_you_feel?_How_it_should_be? We will move forward, if there we have to make an ideal"        
# [6] "What is the strength of your organisation? How many people a re working."                       
# [7] "R: Read newspaper R:Had breakfast with family."
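Since the title asks about a file, the same Reduce call can be wrapped in a function and combined with readLines/writeLines. A sketch, where `underscore_keywords()` is a hypothetical helper name and a temporary file stands in for the real input file:

```r
# Hypothetical helper wrapping the Reduce() call above
underscore_keywords <- function(lines, keywords) {
  Reduce(function(t, k) gsub(k, gsub(" ", "_", k, fixed = TRUE), t, fixed = TRUE),
         keywords, init = lines)
}

# Temporary file standing in for the real input file
infile <- tempfile(fileext = ".txt")
writeLines(c("Nokia. Okay. R: Samsung R: Samsung M: And you have? R: I have Micromax",
             "R: Read newspaper R:Had breakfast with family."), infile)

keywords <- strsplit("R: Samsung R: Samsung M:#@How you feel? How it should be?",
                     "#@", fixed = TRUE)[[1]]

result <- underscore_keywords(readLines(infile), keywords)
writeLines(result, infile)  # overwrites the input; write to another path to keep the original

readLines(infile)
# [1] "Nokia. Okay. R:_Samsung_R:_Samsung_M: And you have? R: I have Micromax"
# [2] "R: Read newspaper R:Had breakfast with family."
```

Because the keywords are applied one after another, a later keyword that overlaps text already underscored by an earlier one will no longer match; ordering longer keywords first is one way to handle that, if it matters for your data.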

I hope this helps with your FART.

Answer 2

A simplified example that should carry over to the larger problem:

toSearch <- c("this is some text","something else to search")
txt <- c("is some#@else to")
txt <- strsplit(txt,"#@")[[1]]
txtundsc <- gsub("\\s+","_",txt)

for(i in seq_along(txt)) { toSearch <- gsub(txt[i], txtundsc[i], toSearch, fixed = TRUE) }
toSearch
# [1] "this is_some text"        "something else_to search"
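Both answers match case-sensitively, while the question's attempt passed `ignore.case = TRUE`. `fixed = TRUE` overrides `ignore.case`, so if case-insensitive matching is needed, one option (a sketch, not taken from either answer) is PCRE's `\Q...\E` literal quoting together with `regmatches<-`, which underscores only the matched text and keeps its original capitalisation. It assumes no keyword contains the sequence `\E`:

```r
toSearch <- c("M: What is your age ? r: 32 years R: My name is \"Nitish\".",
              "how you feel? How it should be? We will move forward")
kw <- c("R: 32 years R: My name is \"Nitish\"", "How you feel? How it should be?")

for (k in kw) {
  # \Q...\E makes PCRE treat the keyword literally, so ? ( ) . need no escaping
  m <- gregexpr(paste0("\\Q", k, "\\E"), toSearch, ignore.case = TRUE, perl = TRUE)
  # Replace spaces only inside the matched text, preserving its case
  regmatches(toSearch, m) <- lapply(regmatches(toSearch, m),
                                    function(s) gsub(" ", "_", s, fixed = TRUE))
}
toSearch
# [1] "M: What is your age ? r:_32_years_R:_My_name_is_\"Nitish\"."
# [2] "how_you_feel?_How_it_should_be? We will move forward"
```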

Answer 1: 2 votes, accepted, 2014-05-27 05:28:21

Answer 2: 0 votes, 2014-05-27 05:29:39