Split string based on number of commas

I have a text that is split by commas.

For example:

FOO( something, BOO(tmp, temp), something else)

The "something else" part may itself contain a string with commas...

I would like to split the text inside the brackets of FOO into its elements and then parse the elements.

What I do know is that FOO must have exactly two top-level commas.

How can I split the content of FOO into its three elements?

Remark: something else could be BOO(ddd, ddd) or simply ddd, so I cannot rely on a simple regex rule like 'FOO\(\w+, BOO\(\w+, \w+\), \w+\)'.

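Assuming that the string is Python code, you can use the parser module for this (it was deprecated in Python 3.9 and removed in Python 3.10, so the session below only runs on older versions). If you look carefully at the result, you might agree that it's not as bad as it first appears.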

>>> from parser import *
>>> source="FOO( something, BOO(tmp, temp), something)"
>>> st=suite(source)
>>> st2tuple(st)
(257, (268, (269, (270, (271, (272, (302, (306, (307, (308, (309, (312, (313, (314, (315, (316, (317, (318, (319, (320, (1, 'FOO')), (322, (7, '('), (330, (331, (302, (306, (307, (308, (309, (312, (313, (314, (315, (316, (317, (318, (319, (320, (1, 'something')))))))))))))))), (12, ','), (331, (302, (306, (307, (308, (309, (312, (313, (314, (315, (316, (317, (318, (319, (320, (1, 'BOO')), (322, (7, '('), (330, (331, (302, (306, (307, (308, (309, (312, (313, (314, (315, (316, (317, (318, (319, (320, (1, 'tmp')))))))))))))))), (12, ','), (331, (302, (306, (307, (308, (309, (312, (313, (314, (315, (316, (317, (318, (319, (320, (1, 'temp'))))))))))))))))), (8, ')')))))))))))))))), (12, ','), (331, (302, (306, (307, (308, (309, (312, (313, (314, (315, (316, (317, (318, (319, (320, (1, 'something'))))))))))))))))), (8, ')')))))))))))))))))), (4, ''))), (4, ''), (0, ''))
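On Python 3.10 and later, a similar result can be obtained with the standard ast module. The sketch below assumes, as above, that the text is a valid Python expression (so "something else" is replaced by "something"); foo_args is a hypothetical helper name.

```python
import ast


def foo_args(source):
    """Return the source text of each argument of a single FOO(...) call (hypothetical helper)."""
    # mode="eval" parses exactly one expression; tree.body is that expression's node
    tree = ast.parse(source, mode="eval")
    call = tree.body
    if not (isinstance(call, ast.Call)
            and isinstance(call.func, ast.Name)
            and call.func.id == "FOO"):
        raise ValueError("expected a single FOO(...) call")
    # get_source_segment (Python 3.8+) slices the original text for each argument node,
    # so a nested call such as BOO(tmp, temp) comes back as one element
    return [ast.get_source_segment(source, arg) for arg in call.args]


print(foo_args("FOO( something, BOO(tmp, temp), something)"))
# ['something', 'BOO(tmp, temp)', 'something']
```

Working on the syntax tree means the commas inside BOO(...) are never considered, at any nesting depth.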

You may use this regex

,(?=(?:(?:\([^)]*\))?[^)]*)+\)$)

to split your string on the commas, but not on those inside BOO(...).
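A minimal sketch of applying that pattern with Python's standard re module. Because of the [^)]* pieces it only copes with one level of parentheses, and the trailing \)$ assumes the string ends with FOO's closing parenthesis, so the first and last pieces still carry "FOO(" and ")".

```python
import re

s = 'FOO( something, BOO(tmp, temp), something else)'

# Split on a comma only if the rest of the string can be read as
# (optional "(...)" group followed by non-")" characters), repeated, then a final ")" at the end.
# A comma inside BOO(...) fails this test because the next ")" is not the last character.
pattern = r',(?=(?:(?:\([^)]*\))?[^)]*)+\)$)'

parts = re.split(pattern, s)
print(parts)
# ['FOO( something', ' BOO(tmp, temp)', ' something else)']
```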

You can do it with the third-party regex module (pip install regex), which supports recursion, useful for dealing with nested structures:

import regex

s = 'FOO( something, BOO(tmp, temp), something else)'

pat = regex.compile(r'''(?(DEFINE) # inside a definition group
    # you can define subpatterns to use later
    (?P<elt>     # define the subpattern "elt"
        [^,()]*+
        (?:
            \( (?&elt) (?: , (?&elt) )* \)
            [^,()]*
        )*+
    )
)
# start of the main pattern
FOO\( \s*
    (?P<elt1> (?&elt) )  # capture group "elt1" contains the subpattern "elt"
    , \s*
    (?P<elt2> (?&elt) )  # same here
    , \s*
    (?P<elt3> (?&elt) )  # etc.
\)''', regex.VERSION1 | regex.VERBOSE )

m = pat.search(s)

print(m.group('elt1'))
print(m.group('elt2'))
print(m.group('elt3'))

Assuming you need a list of the elements inside FOO, first pre-process the string to strip the FOO( prefix and the final closing parenthesis:

>>> import re
>>> s = 'FOO( something, BOO(tmp, temp), something else)'
>>> s
'FOO( something, BOO(tmp, temp), something else)'
>>> s = re.sub(r'^[^(]+\(|\)\s*$','',s)
>>> s
' something, BOO(tmp, temp), something else'

Then split it using the regex module:

>>> import regex
>>> regex.split(r'[^,(]+\([^)]+\)(*SKIP)(?!)|,', s)
[' something', ' BOO(tmp, temp)', ' something else']
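  • [^,(]+\([^)]+\)(*SKIP)(?!) matches a run of characters followed by a parenthesized group, such as BOO(tmp, temp); (*SKIP)(?!) then forces that match to fail and makes the engine resume after it, so the commas inside are never split on
  • |, is the alternative that actually splits the input string, here on a comma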


Another example:

>>> t = 'd(s,sad,e),g(3,2),c(d)'
>>> regex.split(r'[^,(]+\([^)]+\)(*SKIP)(?!)|,', t)
['d(s,sad,e)', 'g(3,2)', 'c(d)']
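For comparison, the same split can be done without any regex by scanning the string and splitting only on commas at parenthesis depth 0. This is a swapped-in technique rather than one of the answers above; split_top_level and foo_elements are hypothetical names, and the sketch assumes parentheses are balanced and never appear inside quoted strings.

```python
def split_top_level(text, sep=","):
    """Split `text` on `sep`, ignoring separators nested inside parentheses (hypothetical helper)."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == sep and depth == 0:
            # only a separator at nesting depth 0 starts a new element
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])  # the last element has no trailing separator
    return parts


def foo_elements(s):
    """Return the stripped elements between the first "(" and the last ")" of `s`."""
    inner = s[s.index("(") + 1 : s.rindex(")")]
    return [part.strip() for part in split_top_level(inner)]


print(foo_elements('FOO( something, BOO(tmp, temp), something else)'))
# ['something', 'BOO(tmp, temp)', 'something else']
print(split_top_level('d(s,sad,e),g(3,2),c(d)'))
# ['d(s,sad,e)', 'g(3,2)', 'c(d)']
```

Unlike the [^)]*-based patterns, the depth counter also handles deeper nesting, e.g. FOO(a, BOO(x, BOO(y, z)), ddd) gives ['a', 'BOO(x, BOO(y, z))', 'ddd'].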

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.
