Lookbehind and lookahead non-capturing parentheses - Python

Question

I want to capture the indices in a string like this:

 "Something bad happened! @ data[u'string_1'][u'string_2']['u2'][0]"

I want to capture the strings string_1, string_2, u2, and 0.

I was able to do this using the following regex:

re.findall(r"("
           r"((?<=\[u')|(?<=\['))" # Begins with [u' or ['
           r"[a-zA-Z0-9_\-]+" # Followed by any letters, numbers, _'s, or -'s
           r"(?='\])" # Ending with ']
           r")"
           r"|" # OR
           r"("
           r"(?<=\[)" # Begins with [
           r"[0-9]+" # Followed by any numbers
           r"(?=\])" # Ending with ]
           r")", message)

The problem is that the result includes tuples with empty strings:

[('string_1', '', ''), ('string_2', '', ''), ('u2', '', ''), ('', '', '0')]

I can easily filter the empty strings out of the result, but I would like to prevent them from appearing in the first place.

I believe this is caused by my capture groups. I tried using ?: in those groups, but then my results disappeared entirely.

This is how I attempted it:

re.findall(r"(?:"
           r"((?<=\[u')|(?<=\['))" # Begins with [u' or ['
           r"[a-zA-Z0-9_\-]+" # Followed by any letters, numbers, _'s, or -'s
           r"(?='\])" # Ending with ']
           r")"
           r"|" # OR
           r"(?:"
           r"(?<=\[)" # Begins with [
           r"[0-9]+" # Followed by any numbers
           r"(?=\])" # Ending with ]
           r")", message)

That results in the following output:

['', '', '', '']

I assume the issue comes from using lookbehinds together with the non-capturing groups. Any ideas on whether this is possible in Python?

Thanks

Answer 1

Regex: (?<=\[)(?:[^'\]]*')?([^'\]]+) or \[(?:[^'\]]*')?([^'\]]+)

Python code:

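import re

def extract_indices(text):
    # After "[", skip an optional prefix ending in a quote, then capture up to the next quote or "]".
    return re.findall(r"(?<=\[)(?:[^'\]]*')?([^'\]]+)", text)

print(extract_indices("Something bad happened! @ data[u'string_1'][u'string_2']['u2'][0]"))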

Output:

['string_1', 'string_2', 'u2', '0']
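
The pattern above returns plain strings because it has exactly one capturing group: re.findall returns tuples only when a pattern contains more than one group. That likely also explains the ['', '', '', ''] from the question's second attempt, where the inner ((?<=\[u')|(?<=\[')) is still a capturing group and, being made only of lookbehinds, always captures an empty string. A minimal sketch, assuming the same message as in the question; the names message, second_attempt, and no_groups are illustrative and not from the original post:

```python
import re

message = "Something bad happened! @ data[u'string_1'][u'string_2']['u2'][0]"

# The question's second attempt: the outer groups are non-capturing,
# but the inner lookbehind alternation is still a capturing group.
second_attempt = re.compile(
    r"(?:((?<=\[u')|(?<=\['))[a-zA-Z0-9_\-]+(?='\]))"
    r"|"
    r"(?:(?<=\[)[0-9]+(?=\]))"
)

# Same pattern with the inner group made non-capturing as well.
# With no capturing groups left, findall returns the whole match.
no_groups = re.compile(
    r"(?:(?:(?<=\[u')|(?<=\['))[a-zA-Z0-9_\-]+(?='\]))"
    r"|"
    r"(?:(?<=\[)[0-9]+(?=\]))"
)

empty_results = second_attempt.findall(message)
full_matches = no_groups.findall(message)

print(empty_results)  # ['', '', '', '']
print(full_matches)   # ['string_1', 'string_2', 'u2', '0']
```

So lookbehinds and non-capturing groups do work together in Python; the empty strings come from the one group that was left capturing.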

Answer 2

You can simplify your regex.

(?<=\[)u?'?([a-zA-Z0-9_\-]+)(?='?\])

See the demo: https://regex101.com/r/SA6shx/1
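
In Python, the simplified pattern can be passed to re.findall directly. Because it contains a single capturing group, the result should be a flat list of strings rather than tuples. A small sketch, assuming the message from the question; the names simplified and keys are illustrative:

```python
import re

message = "Something bad happened! @ data[u'string_1'][u'string_2']['u2'][0]"

# After "[", allow an optional u and an optional quote, capture the key,
# and require an optional quote followed by "]" right after it.
simplified = re.compile(r"(?<=\[)u?'?([a-zA-Z0-9_\-]+)(?='?\])")

keys = simplified.findall(message)
print(keys)  # ['string_1', 'string_2', 'u2', '0']
```

Note that the optional u means an unquoted index that starts with u, such as [u2], would lose its leading u; the quoted forms in the question are not affected.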


Answer 1
1 2018-02-06 21:25:42

Answer 2
1 Accepted 2018-02-06 21:35:09
