
Comma-separated words regular expression

I'm having trouble parsing expressions such as the following:

word1, word2[a,b,c],   word3, ..., wordN

I'd like to get the following groups:

g1: word1
g2: word2[a,b,c]
g3: word3

Please note that the [.+] part is optional; the regular expression must also be able to match expressions like the following:

word1,word2,word3
word1[a,b,c],word2,word3
word1[a,b,c],word2[e,f,g],word3
word1[a,b,c],word2[e,f,g],word3[i,j,l]

I made some attempts, but I can't find a way to separate the groups correctly.

I tried this regex on https://regex101.com and pasted your expressions into the "test strings" box.

/^([a-zA-Z0-9]+(?:\[.*\])?),([a-zA-Z0-9]+(?:\[.*\])?),([a-zA-Z0-9]+(?:\[.*\])?)$/gm

Each word is separated by a comma and has the form:

([a-zA-Z0-9]+(?:\[.*\])?)

Explanation:

(
  [a-zA-Z0-9]+ # one or more alphanumeric characters (could use \w)
  (?:\[.*\])? # an optional sequence surrounded by []s. (?: ) means a non-capturing group
)
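As a rough sketch of how this pattern behaves outside regex101, the same building block can be tried in Python. The names WORD, LINE and three_groups are illustrative only and are not part of the original answer; re.MULTILINE stands in for the m flag used above.

```python
import re

# One "word": alphanumerics plus an optional [...] suffix (same building block as above).
WORD = r"([a-zA-Z0-9]+(?:\[.*\])?)"

# Exactly three words separated by commas, anchored to the start and end of a line.
LINE = re.compile(rf"^{WORD},{WORD},{WORD}$", re.MULTILINE)

def three_groups(line):
    """Return the three captured words, or None if the line does not match."""
    m = LINE.match(line)
    return m.groups() if m else None

print(three_groups("word1,word2,word3"))
print(three_groups("word1[a,b,c],word2[e,f,g],word3[i,j,l]"))
print(three_groups("word1, word2, word3"))  # spaces after commas are not allowed by this pattern
```

Note that the pattern is fixed at exactly three words and does not allow whitespace around the commas, so the spaced example from the question would need `\s*` added around each comma.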

For the time being this seems to be working:

import re
# raw string, so the backslashes reach the regex engine unchanged
rgx = re.compile(r"(\w+(\[.*?\])*).*?,?")
[key for key, val in rgx.findall("word1, word2[a,b,[c,,,]],     word,3")]

# the regex starts by looking for alphanumeric characters with \w+
# then (\[.*?\])* optionally matches a `[` and everything up to the first closing `]`
# findall returns one tuple per match, such as ('word2[a,b,c]', '[a,b,c]')
# the comprehension keeps only the first value of each tuple

Output:

['word1', 'word2[a,b,[c,,,]', 'word', '3']

Another example:

[key for key, val in rgx.findall("word1[bbbb,cccc],word2[bbbb,cccc] ")]

Output:

['word1[bbbb,cccc]', 'word2[bbbb,cccc]']

PS: I'm still working on the regex to improve it.
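One possible tightening, offered as a sketch rather than as the answer's own method: if brackets are assumed never to be nested, the bracket part can be written as `[^\]]*` and made non-capturing, so findall returns plain strings instead of tuples. The names ITEM and items are hypothetical.

```python
import re

# Assumption: brackets are not nested, so [^\]]* can replace .*? inside the brackets.
# The group is non-capturing, so findall returns the whole match as a string.
ITEM = re.compile(r"\w+(?:\[[^\]]*\])?")

def items(text):
    """Return every word, together with its optional [...] suffix, found in text."""
    return ITEM.findall(text)

print(items("word1, word2[a,b,c],   word3"))
print(items("word1[bbbb,cccc],word2[bbbb,cccc] "))
```

Commas inside a bracketed suffix are consumed as part of that word's match, so they are not treated as separators.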

You can use re.split to split only on commas that are outside brackets. Such a comma can be recognized by the fact that, scanning forward from it, a closing bracket is never reached before an opening one; a negative lookahead checks this. The trick only works with non-nested brackets.

import re
print(re.split(r',(?![^[]*\])', 'word1[a,b,c],word2[e,f,g],word3'))

outputs ['word1[a,b,c]', 'word2[e,f,g]', 'word3']
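For readers who find the one-liner dense, here is the same lookahead spelled out with re.VERBOSE, plus a strip() so the spaced input from the question also works. This is only a sketch; TOP_LEVEL_COMMA and split_words are illustrative names, and the `[` inside the character class is escaped purely for readability.

```python
import re

# Same idea as the one-liner above, written with inline comments.
TOP_LEVEL_COMMA = re.compile(
    r"""
    ,               # a comma ...
    (?!             # ... that is NOT followed by
        [^\[]*      #     a run of characters containing no opening bracket
        \]          #     and then a closing bracket
    )
    """,
    re.VERBOSE,
)

def split_words(text):
    """Split on commas outside non-nested brackets and trim each piece."""
    return [part.strip() for part in TOP_LEVEL_COMMA.split(text)]

print(split_words("word1, word2[a,b,c],   word3"))
```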

http://ideone.com/7vIwFM
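If nested brackets do need to be supported, one alternative is to drop the regex and track bracket depth by hand. The following is a minimal sketch of that different technique, not something proposed in the answers above; split_top_level is a hypothetical helper, and unbalanced input is handled only loosely.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators inside (possibly nested) [...]."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth = max(depth - 1, 0)  # tolerate a stray closing bracket
        if ch == sep and depth == 0:
            # A separator at depth 0 ends the current item.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts

print(split_top_level("word1, word2[a,b,[c,d]],   word3"))
```

The depth counter is what the lookahead cannot express: a comma is a separator only when every opened bracket has already been closed.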
