Non-capturing parentheses with lookbehind and lookahead - Python
So I want to capture the indices in a string like this:
"Something bad happened! @ data[u'string_1'][u'string_2']['u2'][0]"
I want to capture the strings string_1, string_2, u2, and 0.
I was able to do this using the following regex:
re.findall("("
"((?<=\[u')|(?<=\['))" # Begins with [u' or ['
"[a-zA-Z0-9_\-]+" # Followed by any letters, numbers, _'s, or -'s
"(?='\])" # Ending with ']
")"
"|" # OR
"("
"(?<=\[)" # Begins with [
"[0-9]+" # Followed by any numbers
"(?=\])" # Ending with ]
")", message)
The problem is that the result includes tuples with empty strings, like this:
[('string_1', '', ''), ('string_2', '', ''), ('u2', '', ''), ('', '', '0')]
I can easily filter the empty strings out of the result, but I would like to prevent them from appearing in the first place.
I believe this is caused by my capture groups. I tried using ?: in those groups, but then my results were completely gone.
This is how I attempted it:
re.findall("(?:"
"((?<=\[u')|(?<=\['))" # Begins with [u' or ['
"[a-zA-Z0-9_\-]+" # Followed by any letters, numbers, _'s, or -'s
"(?='\])" # Ending with ']
")"
"|" # OR
"(?:"
"(?<=\[)" # Begins with [
"[0-9]+" # Followed by any numbers
"(?=\])" # Ending with ]
")", message)
That results in the following output:
['', '', '', '']
I'm assuming the issue is caused by my using lookbehinds together with the non-capturing groups. Any ideas on whether this is possible to do in Python?
Thanks
Regex: (?<=\[)(?:[^'\]]*')?([^'\]]+)
or \[(?:[^'\]]*')?([^'\]]+)
Python code:
import re

def Years(text):
    return re.findall(r'(?<=\[)(?:[^\'\]]*\')?([^\'\]]+)', text)

print(Years('Something bad happened! @ data[u\'string_1\'][u\'string_2\'][\'u2\'][0]'))
Output:
['string_1', 'string_2', 'u2', '0']
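As for why the attempt with ?: returned only empty strings, here is a minimal sketch of what appears to be happening. re.findall returns the contents of the capturing groups whenever a pattern has any, and in that attempt the inner alternation of lookbehinds is still wrapped in a capturing group. Lookbehinds consume no characters, so that group is always empty. The names attempt and fixed below are only illustrative, and the patterns are rewritten as raw strings; making the inner group non-capturing as well seems to be enough:

```python
import re

message = "Something bad happened! @ data[u'string_1'][u'string_2']['u2'][0]"

# The second attempt from the question, as raw strings. The outer groups are
# non-capturing, but the inner lookbehind alternation is still a capturing group.
attempt = (
    r"(?:((?<=\[u')|(?<=\['))[a-zA-Z0-9_\-]+(?='\]))"
    r"|"
    r"(?:(?<=\[)[0-9]+(?=\]))"
)

# Same pattern with the inner group made non-capturing too, so the pattern
# has no capturing groups and findall returns the whole matches.
fixed = (
    r"(?:(?:(?<=\[u')|(?<=\['))[a-zA-Z0-9_\-]+(?='\]))"
    r"|"
    r"(?:(?<=\[)[0-9]+(?=\]))"
)

attempt_result = re.findall(attempt, message)
fixed_result = re.findall(fixed, message)
print(attempt_result)  # ['', '', '', '']
print(fixed_result)    # ['string_1', 'string_2', 'u2', '0']
```

So the lookbehinds themselves are not the problem; the leftover capturing group is.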
You can simplify your regex.
(?<=\[)u?'?([a-zA-Z0-9_\-]+)(?='?\])
See demo.
https://regex101.com/r/SA6shx/1
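For completeness, a small sketch of how the simplified pattern could be used from Python. The helper name extract_keys and the constant KEY_PATTERN are hypothetical, chosen only for this illustration. Because the pattern has exactly one capturing group, re.findall returns a flat list of strings rather than tuples:

```python
import re

# Simplified pattern from this answer; the single capturing group holds the key.
KEY_PATTERN = re.compile(r"(?<=\[)u?'?([a-zA-Z0-9_\-]+)(?='?\])")

def extract_keys(message):
    """Return the bracketed keys found in message (hypothetical helper name)."""
    return KEY_PATTERN.findall(message)

keys = extract_keys("Something bad happened! @ data[u'string_1'][u'string_2']['u2'][0]")
print(keys)  # ['string_1', 'string_2', 'u2', '0']
```

One trade-off to be aware of: this pattern is looser than the one in the question. It accepts mismatched quotes (for example, [abc'] yields abc), and for an unquoted key that starts with u, the optional u? consumes the first letter (for example, [under] yields nder). If the input only ever contains the forms shown in the question, that should not matter.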