Split string based on number of commas
I have a piece of text whose elements are separated by commas.
For example:
FOO( something, BOO(tmp, temp), something else)
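The other elements may themselves contain strings with commas...
I want to split the text inside FOO's parentheses into its elements and then paste the elements.
All I know is that FOO must contain exactly two top-level commas.
How can I split the content of FOO into its three elements?
Note: each element can be something like BOO(ddd,ddd) or simply ddd. I cannot assume a simple regular expression such as 'FOO\(\w+, BOO(\w+, \w+), \w+\)'.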
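Assuming the string is valid Python code, you can use a parser for this. If you look closely at the result, you may agree that it is not as bad as it seems at first.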
>>> from parser import *
>>> source="FOO( something, BOO(tmp, temp), something)"
>>> st=suite(source)
>>> st2tuple(st)
(257, (268, (269, (270, (271, (272, (302, (306, (307, (308, (309, (312, (313, (314, (315, (316, (317, (318, (319, (320, (1, 'FOO')), (322, (7, '('), (330, (331, (302, (306, (307, (308, (309, (312, (313, (314, (315, (316, (317, (318, (319, (320, (1, 'something')))))))))))))))), (12, ','), (331, (302, (306, (307, (308, (309, (312, (313, (314, (315, (316, (317, (318, (319, (320, (1, 'BOO')), (322, (7, '('), (330, (331, (302, (306, (307, (308, (309, (312, (313, (314, (315, (316, (317, (318, (319, (320, (1, 'tmp')))))))))))))))), (12, ','), (331, (302, (306, (307, (308, (309, (312, (313, (314, (315, (316, (317, (318, (319, (320, (1, 'temp'))))))))))))))))), (8, ')')))))))))))))))), (12, ','), (331, (302, (306, (307, (308, (309, (312, (313, (314, (315, (316, (317, (318, (319, (320, (1, 'something'))))))))))))))))), (8, ')')))))))))))))))))), (4, ''))), (4, ''), (0, ''))
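The `parser` module used above was deprecated in Python 3.9 and removed in 3.10, so the session will not run on current interpreters. As a rough sketch of the same idea with the standard `ast` module (assuming Python 3.8+ for `ast.get_source_segment`, and assuming, as above, that the string parses as a Python call expression), the arguments can be recovered directly. The helper name `split_call_args` is made up for this illustration.

```python
import ast

def split_call_args(source):
    """Return the source text of each positional argument of a call expression."""
    # mode="eval" parses a single expression; its root node is tree.body.
    tree = ast.parse(source, mode="eval")
    call = tree.body
    if not isinstance(call, ast.Call):
        raise ValueError("expected a call expression such as FOO(a, b, c)")
    # get_source_segment slices the original text using each node's position info.
    return [ast.get_source_segment(source, arg) for arg in call.args]

print(split_call_args("FOO( something, BOO(tmp, temp), something)"))
# ['something', 'BOO(tmp, temp)', 'something']
```

Like the `parser` approach, this only works when every element is valid Python syntax; an element such as `something else` would raise a `SyntaxError`.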
You can do this with the regex module, which supports recursion (needed to handle nested structures):
import regex
s = 'FOO( something, BOO(tmp, temp), something else)'
pat = regex.compile(r'''(?(DEFINE) # inside a definition group
                                   # you can define subpatterns to use later
    (?P<elt>     # define the subpattern "elt"
        [^,()]*+
        (?:
            \( (?&elt) (?: , (?&elt) )* \)
            [^,()]*
        )*+
    )
)
# start of the main pattern
FOO\( \s*
    (?P<elt1> (?&elt) ) # capture group "elt1" contains the subpattern "elt"
    , \s*
    (?P<elt2> (?&elt) ) # same here
    , \s*
    (?P<elt3> (?&elt) ) # etc.
\)''', regex.VERSION1 | regex.VERBOSE )
m = pat.search(s)
print(m.group('elt1'))
print(m.group('elt2'))
print(m.group('elt3'))
Assuming you need a list of the elements inside FOO, preprocess the string first to strip the outer call:
>>> import re
>>> s = 'FOO( something, BOO(tmp, temp), something else)'
>>> s
'FOO( something, BOO(tmp, temp), something else)'
>>> s = re.sub(r'^[^(]+\(|\)\s*$','',s)
>>> s
' something, BOO(tmp, temp), something else'
Then split it with the regex module:
>>> import regex
>>> regex.split(r'[^,(]+\([^)]+\)(*SKIP)(?!)|,', s)
[' something', ' BOO(tmp, temp)', ' something else']
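[^,(]+\([^)]+\)(*SKIP)(?!) matches the pattern [^,(]+\([^)]+\) (a name followed by a parenthesized group) and then skips it, so commas inside it are never seen as separators.
|, is the alternative pattern that actually splits the input string, in this case a comma.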
Another example:
>>> t = 'd(s,sad,e),g(3,2),c(d)'
>>> regex.split(r'[^,(]+\([^)]+\)(*SKIP)(?!)|,', t)
['d(s,sad,e)', 'g(3,2)', 'c(d)']
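Because the skipped pattern uses `[^)]+`, the split above appears to handle only one level of parentheses. If the `regex` module is not available, or the elements can nest more deeply, a plain depth counter is one alternative. This is a different technique from the answers above, given only as a minimal sketch; the function name `split_top_level` is hypothetical, and it assumes parentheses are the only nesting construct and that the outer `FOO(` and `)` have already been stripped.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested inside parentheses."""
    parts = []
    current = []
    depth = 0  # how many '(' are currently open
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')' in input")
        if ch == sep and depth == 0:
            # A top-level separator closes the current element.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced '(' in input")
    parts.append("".join(current).strip())
    return parts

print(split_top_level(" something, BOO(tmp, temp), something else"))
# ['something', 'BOO(tmp, temp)', 'something else']
print(split_top_level("d(s,sad,e),g(3,2),c(d(1,2),3)"))
# ['d(s,sad,e)', 'g(3,2)', 'c(d(1,2),3)']
```

Unlike the regex results above, this sketch strips surrounding whitespace from each element; drop the `.strip()` calls to keep it.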