Matching repeating words in a row by regex
I would like to find and replace repeating words in a string, but only if they are next to each other or separated by a space. For example:
"<number> <number>" -> "<number>"
"<number><number>" -> "<number>"
but not:
"<number> test <number>" -> "<number> test <number>"
I have tried this:
import re
label = "<number> test <number>"  # the third example from above
print(re.sub(r"(.+)(?=\<number>+)", "", label).strip())
but it gives the wrong result for the last example: "<number> test <number>" comes out as "<number>", because the greedy (.+) removes everything before the final <number>.
Could you please help me with that?
You can use
re.sub(r"(<number>)(?:\s*<number>)+", r"\1", label).strip()
See the regex demo. Details:
- (<number>) - Group 1: a <number> string
- (?:\s*<number>)+ - one or more occurrences of the following sequence of patterns:
  - \s* - zero or more whitespaces
  - <number> - a <number> string

The \1 is the replacement backreference to the Group 1 value.
import re
text = '"<number> <number>", "<number><number>", not "<number> test <number>"'
print( re.sub(r"(<number>)(?:\s*<number>)+", r'\1', text) )
# => "<number>", "<number>", not "<number> test <number>"
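Note that \s* also accepts tabs, newlines and runs of several spaces. If, as the question states, only adjacent tokens or tokens separated by a single space should collapse, the pattern can be narrowed to ` ?`. This is a minimal sketch assuming a literal space is the only allowed separator; `collapse_number_runs` is a hypothetical helper name:

```python
import re

# " ?" allows at most one literal space between repeats, unlike "\s*"
PATTERN = re.compile(r"(<number>)(?: ?<number>)+")


def collapse_number_runs(label: str) -> str:
    """Hypothetical helper: collapse <number> runs that are adjacent or single-space separated."""
    return PATTERN.sub(r"\1", label)


for s in ["<number> <number>", "<number><number>",
          "<number> test <number>", "<number>  <number>"]:
    print(repr(collapse_number_runs(s)))
```

With this variant, "<number>  <number>" (two spaces) is left unchanged, whereas the \s* pattern would collapse it.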
Alternatively, you can use
(<number>\s*){2,}
- (<number>\s*) Capture group 1, match <number> followed by optional whitespace chars
- {2,} Repeat 2 or more times

In the replacement, use group 1 (\1). Group 1 holds the last repetition, including any whitespace that followed it.
import re
strings = [
"<number> <number>",
"<number><number>",
"not <number> test <number>",
" <number> <number><number> <number> test"
]
for s in strings:
print(re.sub(r"(<number>\s*){2,}", r"\1", s))
Output:
<number>
<number>
not <number> test <number>
 <number> test
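The question's title asks about repeating words in general, while both answers hard-code <number>. Assuming every token to deduplicate has the form <word> (an assumption; the question only shows <number>), a backreference inside the pattern itself can collapse a run of any repeated placeholder while leaving different neighbours alone. `dedupe_placeholders` is a hypothetical helper name:

```python
import re

# Group 1 captures one placeholder such as "<number>" or "<date>";
# (?:\s*\1)+ then requires the *same* text to repeat one or more times.
PLACEHOLDER_RUN = re.compile(r"(<\w+>)(?:\s*\1)+")


def dedupe_placeholders(text: str) -> str:
    """Replace each run of an identical <placeholder> with a single copy."""
    return PLACEHOLDER_RUN.sub(r"\1", text)


print(dedupe_placeholders("<number> <number><date> <date> test <number>"))
# → <number><date> test <number>
```

Because \1 appears in the pattern and not only in the replacement, "<number> <date>" is left untouched; only identical neighbours collapse.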
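Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.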