
How to replace a word unless it immediately follows another word

How can I replace a word unless it immediately follows another word? For example, in the vector vec below, how can I replace the in the third element with of the?

The rule for the example below is: replace 'the' unless it comes immediately after 'of'.

vec <- c("time of the day", "word of the day", "time the day")

# This also replaces the 'the' when following 'of'
gsub("the", "of the", vec)
# "time of of the day" "word of of the day" "time of the day" 

The expected outcome is c("time of the day", "word of the day", "time of the day").

If your strings always contain exactly one space between words, you may use

gsub("(?<!\\bof\\s)the\\b", "of the", vec, perl=TRUE)
library(stringr)
str_replace_all(vec, "(?<!\\bof\\s)the\\b", "of the")

See the regex demo. Here, the is replaced with of the only if it is NOT immediately preceded by the whole word of followed by exactly one whitespace character. Note that the pattern has a word boundary only after the, so it also matches the at the end of a longer word such as bathe; put \\b in front of the to match it strictly as a whole word.

However, users often type more than one space between words, and the single-space lookbehind does not cover that case.
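A quick check of that limitation, as a minimal sketch: vec2 is a hypothetical input (not from the question) with two spaces between of and the. The fixed-width lookbehind only sees one whitespace character before the, so the unwanted replacement still happens.

```r
# Hypothetical input: two spaces between 'of' and 'the'
vec2 <- c("time of  the day")

# The lookbehind checks for 'of' plus exactly ONE whitespace character,
# so it does not recognise 'of' followed by two spaces
res_single <- gsub("(?<!\\bof\\s)the\\b", "of the", vec2, perl = TRUE)
print(res_single)
# [1] "time of  of the day"
```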

Hence, a more generic solution, which allows any amount of whitespace between of and the, is

> gsub("\\bof\\s+the\\b(*SKIP)(?!)|\\bthe\\b", "of the", vec, perl=TRUE)
[1] "time of the day" "word of the day" "time of the day"

See the regex demo and the R demo online.

Details:

  • \bof\s+the\b - matches of and the as whole words separated by one or more whitespace characters
  • (*SKIP)(?!) - discards the text matched so far and forces a failure; the regex engine then resumes the search from the position where the failure occurred, that is, right after the skipped text
  • | - or
  • \bthe\b - matches the as a whole word in any other context.
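The skip-and-fail idea can be tried on input with irregular spacing. This is only a sketch: vec3 is a hypothetical vector, and the pattern assumes \\s+ (one or more whitespace characters) between of and the.

```r
# Hypothetical inputs: extra spaces after 'of', and words that merely contain 'the'
vec3 <- c("time of   the day", "time the day", "soothe the bather")

# First alternative: consume 'of <spaces> the', then skip it and fail.
# Second alternative: any remaining whole word 'the' gets replaced.
res_skip <- gsub("\\bof\\s+the\\b(*SKIP)(?!)|\\bthe\\b", "of the", vec3, perl = TRUE)
print(res_skip)
# Expected elements: "time of   the day", "time of the day", "soothe of the bather"
```

The first element is left untouched because of   the is consumed and skipped as a unit; soothe and bather are not affected because \\bthe\\b requires word boundaries on both sides.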

If the number of whitespace characters between of and the is bounded, say 1 to 100, you can use a stringr-based solution like

library(stringr)
vec <- c("time of the day", "word of the day", "time the day")
str_replace_all(vec, "\\b(?<!\\bof\\s{1,100})the\\b", "of the")
## => [1] "time of the day" "word of the day" "time of the day"

See this online R demo. The ICU regex flavor used by the stringr regex functions allows limiting quantifiers inside lookbehind patterns.

See this regex demo (it uses the Java 8 option, as that flavor also supports constrained-width lookbehind patterns). Details:

  • \b - a word boundary
  • (?<!\bof\s{1,100}) - a negative lookbehind that fails the match if there is a whole word of followed by one to 100 whitespace characters immediately before the current location
  • the - the literal string the
  • \b - a word boundary.
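As a rough illustration of the bounded lookbehind, the same stringr call can be run on a string with several spaces after of. vec4 is a hypothetical input, and the sketch assumes the stringr package is installed.

```r
library(stringr)

# Hypothetical input: five spaces between 'of' and 'the' in the first element
vec4 <- c("time of     the day", "time the day")

# ICU accepts the bounded quantifier {1,100} inside the lookbehind
res_icu <- str_replace_all(vec4, "\\b(?<!\\bof\\s{1,100})the\\b", "of the")
print(res_icu)
# Expected elements: "time of     the day", "time of the day"
```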
