How do I replace a string exactly using gsub()?

I have a corpus: txt = "a patterned layer within a microelectronic pattern." I would like to replace the term "pattern" exactly with "form", so I tried this code:

txt_replaced = gsub("pattern","form",txt)

However, the result in txt_replaced is: "a formed layer within a microelectronic form."

As you can see, "patterned" is wrongly replaced by "formed" because part of "patterned" matches "pattern".

Is there a way to replace the string exactly using gsub()? That is, only exact whole-word matches should be replaced.

The result I am hoping for is: "a patterned layer within a microelectronic form."

Many thanks!

As @koshke noted, a very similar question has been answered before (by me). ...But that was grep and this is gsub, so I'll answer it again:

\< is the regex escape for the beginning of a word, and \> is the end. In R strings you need to double the backslashes (written "\\<" and "\\>"), so:

txt <- "a patterned layer within a microelectronic pattern."
txt_replaced <- gsub("\\<pattern\\>","form",txt)
txt_replaced
# [1] "a patterned layer within a microelectronic form."
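A quick check of what counts as a word edge with R's default (TRE) regex engine, using a made-up input vector: a plural such as "patterns" is left alone, but a hyphen or punctuation mark ends a word, so "pattern-based" and "pattern," are still replaced. Whether that is desirable depends on your corpus.

```r
# Made-up inputs to probe the word-boundary behaviour of \\< and \\>.
x <- c("patterns of pattern",      # plural is a different word; only the last one changes
       "pattern-based design",     # a hyphen counts as a word boundary
       "a pattern, then another")  # a comma counts as a word boundary
out <- gsub("\\<pattern\\>", "form", x)
print(out)
```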

Or, you could use \\b instead of \\< and \\>. \\b matches a word boundary, so it can be used at both ends:

txt_replaced <- gsub("\\bpattern\\b","form",txt)
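If the word to replace comes from a variable, it is worth quoting it so characters such as "." are matched literally instead of as regex metacharacters. Below is a minimal sketch of a hypothetical helper, replace_word() (not part of base R). It assumes perl = TRUE so that PCRE's \Q...\E quoting is available (\\b is understood by PCRE as well), that the word begins and ends with a word character (letter, digit or underscore), since \b needs a word character on one side, and that the word does not itself contain \E.

```r
# Hypothetical helper (not base R): replace `word` only where it appears as
# a whole word. \Q...\E (PCRE, hence perl = TRUE) quotes any regex
# metacharacters in `word`, so e.g. "." matches a literal dot.
replace_word <- function(x, word, replacement, first_only = FALSE) {
  pattern <- paste0("\\b\\Q", word, "\\E\\b")
  if (first_only) {
    sub(pattern, replacement, x, perl = TRUE)   # first match in each element
  } else {
    gsub(pattern, replacement, x, perl = TRUE)  # every match
  }
}

txt <- "a patterned layer within a microelectronic pattern."
replace_word(txt, "pattern", "form")
# [1] "a patterned layer within a microelectronic form."

# The dot is literal, so "eXg" is not touched:
replace_word("e.g and eXg", "e.g", "for example")
# [1] "for example and eXg"
```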

Also note that if you want to replace only the first occurrence, you should use sub instead of gsub.
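On a made-up string with several whole-word matches, sub() stops after the first match (in each element of the input vector), while gsub() replaces all of them:

```r
s <- "pattern, pattern and patterned"
first_only <- sub("\\bpattern\\b", "form", s)  # first whole-word match only
every <- gsub("\\bpattern\\b", "form", s)      # all whole-word matches
first_only
# [1] "form, pattern and patterned"
every
# [1] "form, form and patterned"
```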
