How to replace a word unless it immediately follows another word
How can I replace a word unless it immediately follows another word? For example, for the vector vec below, how can I replace the 'the' in the third element with 'of the'?
The rule for the example below is: replace 'the' unless it comes immediately after 'of'.
vec <- c("time of the day", "word of the day", "time the day")
# This also replaces the 'the' when following 'of'
gsub("the", "of the", vec)
# "time of of the day" "word of of the day" "time of the day"
The expected outcome is c("time of the day", "word of the day", "time of the day")
If your strings always contain exactly one space between words, you may use
# The leading \\b keeps 'the' a whole word (so 'bathe' is not touched)
gsub("\\b(?<!\\bof\\s)the\\b", "of the", vec, perl=TRUE)
library(stringr)
str_replace_all(vec, "\\b(?<!\\bof\\s)the\\b", "of the")
See the regex demo. The whole word 'the' is replaced with 'of the' only if it is NOT immediately preceded by the whole word 'of' and a single whitespace character.
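To make the single-space limitation concrete, here is a minimal sketch using a hypothetical input (not from the question) with two spaces between 'of' and 'the'. The lookbehind only looks back over one whitespace character, so it no longer sees 'of' and the replacement is applied anyway.

```r
# Hypothetical input: note the TWO spaces between "of" and "the"
x <- "time of  the day"

# Single-space lookbehind pattern, with a leading \\b for a whole-word match
res <- gsub("\\b(?<!\\bof\\s)the\\b", "of the", x, perl = TRUE)
res
# [1] "time of  of the day"
```

The unwanted "of  of the" in the result shows that this pattern is only safe when the spacing is known to be regular.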
However, users often type more than one space between words.
Hence, a more generic solution is
gsub("\\bof\\s+the\\b(*SKIP)(?!)|\\bthe\\b", "of the", vec, perl=TRUE)
# [1] "time of the day" "word of the day" "time of the day"
See the regex demo and the R demo online.
Details:
\bof\s+the\b - matches 'of', one or more whitespace characters, and 'the' as whole words
(*SKIP)(?!) - discards that match, and the regex engine goes on to search for the next match from the position where the failure occurred
| - or
\bthe\b - matches the whole word 'the' in any other context.
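As a quick check of the skip-and-fail approach on irregular spacing, the sketch below uses hypothetical inputs (several spaces, a tab, and a leading 'the'). It assumes 'of' and 'the' are joined with \s+ in the pattern so that any run of whitespace is tolerated.

```r
# Hypothetical inputs with irregular whitespace (three spaces, a tab)
vec2 <- c("time of   the day", "word of\tthe day", "the end of the day")

# Assumption: "of" and "the" are joined with \\s+ to allow any whitespace run
pattern <- "\\bof\\s+the\\b(*SKIP)(?!)|\\bthe\\b"

res2 <- gsub(pattern, "of the", vec2, perl = TRUE)
res2
```

The first two elements should come back unchanged, while in the third only the leading 'the' becomes 'of the'.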
If the amount of whitespace between 'of' and 'the' is bounded, say 1 to 100 characters, you can use a stringr-based solution like
library(stringr)
vec <- c("time of the day", "word of the day", "time the day")
str_replace_all(vec, "\\b(?<!\\bof\\s{1,100})the\\b", "of the")
## => [1] "time of the day" "word of the day" "time of the day"
See this online R demo. The ICU regex flavor used by the stringr regex functions allows limiting quantifiers in lookbehind patterns.
See this regex demo (it uses the Java 8 option, since that flavor also supports constrained-width lookbehind patterns).
Details:
\b - a word boundary
(?<!\bof\s{1,100}) - a negative lookbehind that fails the match if there is a whole word 'of' followed by 1 to 100 whitespace characters immediately before the current location
the - the literal string 'the'
\b - a word boundary.
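The bound in the lookbehind is a real limit, which the following sketch tries to illustrate. It assumes the stringr package is installed; the helper name replace_the and both inputs are hypothetical and only serve the illustration.

```r
# Assumes the stringr package is installed; the helper name is hypothetical.
replace_the <- function(x) {
  stringr::str_replace_all(x, "\\b(?<!\\bof\\s{1,100})the\\b", "of the")
}

# Hypothetical inputs: 3 spaces (within the bound) and 101 spaces (beyond it)
within_bound <- "time of   the day"
beyond_bound <- paste0("time of", strrep(" ", 101), "the day")

if (requireNamespace("stringr", quietly = TRUE)) {
  print(replace_the(within_bound))                  # left unchanged
  print(replace_the(beyond_bound) != beyond_bound)  # TRUE: 'the' was replaced
}
```

With more whitespace than the upper bound allows, the lookbehind can no longer reach 'of', so 'the' is replaced after all. If the spacing is truly unbounded, the (*SKIP)(?!) variant with gsub and perl=TRUE is probably the safer choice.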