Regular expression for comma-separated words

Question

I'm having some issues trying to parse expressions such as the following:

word1, word2[a,b,c],   word3, ..., wordN

I'd like to get the following groups:

g1: word1
g2: word2[a,b,c]
g3: word3

Please note that the [.+] part is optional; the regular expression must also be able to match expressions like the following:

word1,word2,word3
word1[a,b,c],word2,word3
word1[a,b,c],word2[e,f,g],word3
word1[a,b,c],word2[e,f,g],word3[i,j,l]

I made some attempts, but I can't find a way to correctly separate the groups.

Answer 1

I tried this regex on https://regex101.com and pasted your expressions into the "test strings" box.

/^([a-zA-Z0-9]+(?:\[.*\])?),([a-zA-Z0-9]+(?:\[.*\])?),([a-zA-Z0-9]+(?:\[.*\])?)$/gm

Each word is separated by a comma and has the form:

([a-zA-Z0-9]+(?:\[.*\])?)

Explanation:

(
  [a-zA-Z0-9]+ # one or more alphanumeric characters (could use \w)
  (?:\[.*\])? # an optional sequence surrounded by []s. (?: ) means a non-capturing group
)
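To try the pattern outside regex101, here is a minimal sketch that ports it to Python. The names `WORD`, `PATTERN` and `three_groups` are made up for illustration, and `re.MULTILINE` stands in for the `m` flag. Note that the pattern expects exactly three words and no spaces after the commas.

```python
import re

# The answer's per-word pattern, repeated three times between ^ and $.
WORD = r"([a-zA-Z0-9]+(?:\[.*\])?)"
PATTERN = re.compile(rf"^{WORD},{WORD},{WORD}$", re.MULTILINE)

def three_groups(line):
    """Return the three captured groups, or None if the line does not match."""
    m = PATTERN.match(line)
    return m.groups() if m else None

print(three_groups("word1,word2,word3"))
print(three_groups("word1[a,b,c],word2[e,f,g],word3[i,j,l]"))
print(three_groups("word1, word2, word3"))  # spaces after commas: no match
```

Because `.*` is greedy, the pattern relies on backtracking to find the right closing bracket; it is not a general solution for an arbitrary number of words.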

Answer 2

For the time being this seems to be working:

import re
rgx = re.compile(r"(\w+(\[.*?\])*).*?,?")
[key for key, val in rgx.findall("word1, word2[a,b,[c,,,]],     word,3")]

# The regex starts by matching alphanumeric characters with \w+.
# Then (\[.*?\])* optionally matches a `[` and everything up to the next `]`.
# findall returns one tuple per match, e.g. ('word2[a,b,c]', '[a,b,c]'),
# and the list comprehension keeps only the first value of each tuple.

Output:

['word1', 'word2[a,b,[c,,,]', 'word', '3']

Another example:

[key for key, val in rgx.findall("word1[bbbb,cccc],word2[bbbb,cccc] ")]

Output:

['word1[bbbb,cccc]', 'word2[bbbb,cccc]']
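One possible tightening, sketched under the assumption that brackets are never nested: make the bracket part a non-capturing group that cannot run past the first `]`, so `findall` returns plain strings instead of tuples. The names `ITEM` and `find_items` are hypothetical and not part of the answer above.

```python
import re

# A word, optionally followed by a single non-nested [...] part.
ITEM = re.compile(r"\w+(?:\[[^\]]*\])?")

def find_items(text):
    """Return every word (with its optional bracket part) found in text."""
    return ITEM.findall(text)

print(find_items("word1, word2[a,b,c],   word3"))
print(find_items("word1[bbbb,cccc],word2[bbbb,cccc] "))
```

With no capturing groups in the pattern, there is no need to unpack `(key, val)` pairs.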

PS: I'm still working on the regex to improve it.

Answer 3

You can use re.split to split only on commas that are outside of brackets. Such a comma can be recognized by the fact that, looking ahead from it, you never reach a closing bracket before an opening one (checked with a negative lookahead). This trick only works with non-nested brackets.

import re
print(re.split(r',(?![^[]*\])', 'word1[a,b,c],word2[e,f,g],word3'))

This outputs ['word1[a,b,c]', 'word2[e,f,g]', 'word3'].

http://ideone.com/7vIwFM
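The question's first example has spaces after the commas, which the split keeps in the pieces. A minimal sketch that trims them, assuming surrounding whitespace is never significant (`OUTSIDE_COMMA` and `split_words` are illustrative names, and the `[` inside the character class is escaped, which is equivalent to the pattern above):

```python
import re

# A comma that is not followed by "...]" without a "[" in between,
# i.e. a comma outside of (non-nested) brackets.
OUTSIDE_COMMA = re.compile(r",(?![^\[]*\])")

def split_words(text):
    """Split on commas outside brackets and trim whitespace around each piece."""
    return [part.strip() for part in OUTSIDE_COMMA.split(text)]

print(split_words("word1, word2[a,b,c],   word3"))
```

As with the original split, this does not handle nested brackets.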

3 answers:

Answer 1 (1, 2017-03-03 10:17:12)
Answer 2 (1, accepted, 2017-03-03 10:19:41)
Answer 3 (1, 2017-03-03 10:53:41)