
Python regex match between characters

I'm doing a pretty straightforward regex in Python and seeing some odd behavior when I use the "or" operator (|).

I am trying to parse the following:

>> str = "blah [in brackets] stuff"

so that it returns:

>> ['blah', 'in brackets', 'stuff']

To match the text between brackets, I am using a lookbehind and a lookahead, i.e.:

>> '(?<=\[).*?(?=\])'

If used alone, this does indeed capture the text in brackets:

>> re.findall( '(?<=\[).*?(?=\])' , str )
>> ['in brackets']

But when I add an alternative with the or operator to also parse the strings between spaces, the bracket match somehow breaks down:

>> [x for x in re.findall( '(?<=\[).*?(?=\])|.*?[, ]' , str ) if x!=' ' ] 
>> ['blah', '[in ', 'brackets] ']

For the life of me I can't understand this behavior. Any help would be appreciated.

Thanks!
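The breakdown appears to follow from how the engine scans: it tries each start position from left to right and takes the first alternative that succeeds there. At the position of the '[' the lookbehind fails (the previous character is a space), so the second alternative matches and consumes '[in ', bracket included. The next attempt starts after that space, where the previous character is again not '[', so the lookbehind branch never gets a turn. A minimal sketch that traces this with re.finditer, followed by one possible adjustment (my own assumption, not the asker's final pattern) in which the second alternative is not allowed to consume brackets:

```python
import re

s = "blah [in brackets] stuff"

# Pattern from the question: lookbehind branch first, "up to a space/comma" second.
original = r'(?<=\[).*?(?=\])|.*?[, ]'
trace = [(m.start(), m.group()) for m in re.finditer(original, s)]
print(trace)   # [(0, 'blah '), (5, '[in '), (9, 'brackets] ')]

# Hypothetical adjustment: plain tokens may not contain whitespace,
# brackets, or commas, so the '[' is skipped instead of consumed.
adjusted = r'(?<=\[)[^\]]*(?=\])|[^\s\[\],]+'
result = re.findall(adjusted, s)
print(result)  # ['blah', 'in brackets', 'stuff']
```

Because the plain-token branch can no longer swallow the '[', the scan reaches the position right after it, where the lookbehind succeeds.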

You can do:

>>> s = "blah [in brackets] stuff"

>>> re.findall(r'\b\w+\s*\w+\b', s)
['blah', 'in brackets', 'stuff']
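One caveat worth checking before relying on this pattern: it does not look at the brackets at all. It matches one or two word runs separated by optional whitespace. A small sketch with two made-up inputs (not from the original post) shows how the result changes with the number of words:

```python
import re

pattern = r'\b\w+\s*\w+\b'

# Hypothetical inputs chosen to probe the pattern's behavior.
three_words = re.findall(pattern, "blah [one two three] stuff")
plain_pair = re.findall(pattern, "more stuff [tag]")

print(three_words)  # ['blah', 'one two', 'three', 'stuff']
print(plain_pair)   # ['more stuff', 'tag']
```

So three bracketed words are split after the second one, and two unbracketed neighbors are merged into a single item.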

For those interested, this is the regex that I ended up going with. There is probably a more elegant solution somewhere, but this works:

>>> s = "blah 2.0 stuff 1 1 0 [in brackets] more stuff [1]"

>>> brackets_re = r'(?<=\[).*?(?=\])'   # raw strings, so the backslashes reach the regex engine untouched
>>> space_re = r'[-\.\w]+(?= )'
>>> my_re = brackets_re + '|' + space_re

>>> re.findall(my_re, s)
['blah', '2.0', 'stuff', '1', '1', '0', 'in brackets', 'more', 'stuff', '1']
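Note that the space branch only matches a token that is followed by a literal space, so an unbracketed token at the very end of the string is skipped. The sketch below compares the posted pattern with a relaxed variant on the original example string; relaxed_re is my own assumption (it accepts any whitespace or the end of the string), not part of the answer above:

```python
import re

brackets_re = r'(?<=\[).*?(?=\])'
space_re = r'[-\.\w]+(?= )'        # as posted: token must be followed by a space
relaxed_re = r'[-\.\w]+(?=\s|$)'   # assumption: whitespace or end of string also counts

s = "blah [in brackets] stuff"

posted = re.findall(brackets_re + '|' + space_re, s)
relaxed = re.findall(brackets_re + '|' + relaxed_re, s)

print(posted)   # ['blah', 'in brackets']
print(relaxed)  # ['blah', 'in brackets', 'stuff']
```

On the longer test string from the answer both variants give the same list, since its last token is bracketed.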

If you are looking for an easy way to do this, then use this. Note: I replaced str with string, because str is the name of a built-in type in Python and should not be shadowed.

import re
string = "blah [in brackets] stuff"
f = re.findall(r'\w+\w', string)
print(f)

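Output: ['blah', 'in', 'brackets', 'stuff']

Note that \w+\w matches any run of two or more word characters, so the bracketed phrase comes back as two separate items ('in' and 'brackets') rather than as 'in brackets'.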

The answers so far don't take into account that you may have more than two words inside the brackets, or only one. The following regex splits on the brackets together with any whitespace immediately before an opening bracket or after a closing one. It also works if the string contains more than one bracketed section.

import re

s = "blah [in brackets] stuff"

s = re.split(r'\s*\[|\]\s*', s) # note the 'or' operator is used and literal opening and closing brackets '\[' and '\]'

print(s)

output: ['blah', 'in brackets', 'stuff']

And an example using a string with different numbers of words inside the brackets and several sets of brackets:

s = "blah [in brackets] stuff [three words here] more stuff [one-word] stuff [a digit 1!] stuff."

s = re.split(r'\s*\[|\]\s*', s)

print (s)

output: ['blah', 'in brackets', 'stuff', 'three words here', 'more stuff', 'one-word', 'stuff', 'a digit 1!', 'stuff.']
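One thing to watch for with re.split: when the string starts with '[' or ends with ']', the result contains an empty string at that end, because the delimiter sits at the edge of the string. A minimal sketch that filters those out; split_on_brackets is a hypothetical helper name introduced here for illustration:

```python
import re

def split_on_brackets(text):
    """Split on bracket delimiters and drop empty pieces."""
    parts = re.split(r'\s*\[|\]\s*', text)
    return [p for p in parts if p]

raw = re.split(r'\s*\[|\]\s*', "more stuff [1]")
clean = split_on_brackets("more stuff [1]")

print(raw)    # ['more stuff', '1', '']
print(clean)  # ['more stuff', '1']
```

Also note that unbracketed runs such as 'more stuff' stay together as one item, since only the brackets act as delimiters.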

