Obtain word before/after a character with regex in Python
This is supposed to be easy using capturing groups, but I am not getting the correct words. I have been using the following:
#Before
print(re.sub(r'\b([A-Za-z0-9]+)\b(?=\.?\s*(\&|\-|and))',r'\1','A. & B.',flags=re.IGNORECASE))
A. & B.
#After
print(re.sub(r'(\&|\-|and)\s*\b([A-Za-z0-9]+)\b',r'\2','A. & B.',flags=re.IGNORECASE))
A. B.
The string can be one of the following:
A. - B.
A.-B.
A. & B.
A.&B.
A. AND B.
The idea is to get the word before/after an ampersand, a hyphen, or "and". I split this into two regexes to get both words. In this example, "before" would get just A and "after" just B.
Why are the capturing groups not printing A and B in the previous examples?
Thanks in advance :)
The string '\1' is an octal escape: in a normal (non-raw) string literal it is the single character with decimal value 1 (0x01 hex).
>>> import re
>>> re.sub(r'\b([A-Za-z0-9]+)\b(?=\.?\s*(\&|\-|and))','\1','A. & B.',flags=re.IGNORECASE)
'\x01. & B.'
For the regex engine to see a backreference, the backslash itself has to survive Python's string parsing, so it must be escaped or written in a raw string.
Either of these replacement strings refers to capture group 1:
'\\1'
>>> import re
>>> re.sub(r'\b([A-Za-z0-9]+)\b(?=\.?\s*(\&|\-|and))','\\1','A. & B.',flags=re.IGNORECASE)
'A. & B.'
Or,
r'\1'
>>> import re
>>> re.sub(r'\b([A-Za-z0-9]+)\b(?=\.?\s*(\&|\-|and))',r'\1','A. & B.',flags=re.IGNORECASE)
'A. & B.'
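To make the difference between the three spellings concrete, here is a small sketch that compares them side by side. The variable names are only illustrative, and the pattern is the one from the question with the inner group made non-capturing; treat it as a demonstration, not as the one correct way to write it.

```python
import re

# Pattern from the question; the alternation is non-capturing here so that
# group 1 is the only capture group (an illustrative simplification).
pattern = r'\b([A-Za-z0-9]+)\b(?=\.?\s*(?:&|-|and))'
text = 'A. & B.'

# '\1' in a normal string literal is ONE character: the control char U+0001.
assert len('\1') == 1 and '\1' == '\x01'

# '\\1' and r'\1' are the same TWO characters: a backslash followed by '1'.
assert '\\1' == r'\1' and len(r'\1') == 2

# The octal escape inserts the control character in place of the match.
octal_result = re.sub(pattern, '\1', text, flags=re.IGNORECASE)

# The real backreference puts group 1 back in place of the match.
backref_result = re.sub(pattern, r'\1', text, flags=re.IGNORECASE)

print(repr(octal_result))
print(repr(backref_result))
```

Note that with the real backreference the result equals the input. The lookahead consumes nothing, so the match is exactly group 1, and re.sub() swaps it for itself and returns the whole string. That appears to be why the calls in the question never print just A or B.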
Use re.search() instead and group the desired words before and after one of the options &, -, and:
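import re
# Capture the word before and after the separator: &, - or "and" (any case)
text = re.search(r'(\w+)\.+\s*(?:&|-|and)\s*(\w+)\.+', 'A. & B.', flags=re.IGNORECASE)
print(text.groups())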
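As a rough check of the re.search() approach against every input listed in the question, the following sketch wraps the search in a helper. The name words_around and the exact pattern are my own assumptions, not part of the original answer; the trailing dot is made optional and the separator is an explicit alternation.

```python
import re

# Hypothetical pattern: a word, an optional dot, optional whitespace,
# one of the separators (&, -, "and" in any case), then the next word.
SEPARATOR = re.compile(r'(\w+)\.?\s*(?:&|-|and)\s*(\w+)', flags=re.IGNORECASE)


def words_around(text):
    """Return (before, after) around the separator, or None if absent."""
    match = SEPARATOR.search(text)
    return match.groups() if match else None


samples = ['A. - B.', 'A.-B.', 'A. & B.', 'A.&B.', 'A. AND B.']
for sample in samples:
    print(sample, '->', words_around(sample))
```

Each of the five sample strings should yield ('A', 'B'). One caveat of this sketch: "and" is not anchored with word boundaries, so it could also match inside a longer word. Adding \b around it would be a reasonable tightening if real inputs contain such words.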