
Regular Expression Splitting

I am new to regex and am having trouble splitting the following string:

test_str = "./name[contains(substring(.,1,3),'some')],is-in,up,down"

The string is delimited by commas, but a comma that sits inside [] should not split the string.

So the result should look like this:

["./name[contains(substring(.,1,3),'some')]", "is-in", "up", "down"]

I am trying this regular expression:

r"./*[a-z]+((\[.*?\])?)*,?/*"

...but there is some problem with "-"
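A likely cause of the "-" problem is that the class [a-z] does not include the hyphen, so [a-z]+ stops in the middle of is-in. The sketch below illustrates that. It also shows one possible pattern that treats each [...] group as a single unit. The pattern and the name ITEM are assumptions for illustration, not part of the original attempt. The pattern only works if the brackets are not nested, and it silently skips empty fields.

```python
import re

test_str = "./name[contains(substring(.,1,3),'some')],is-in,up,down"

# [a-z]+ cannot cross the hyphen, so "is-in" is seen as two separate words.
print(re.findall(r"[a-z]+", "is-in"))

# One item is a run of characters that are not commas or brackets,
# mixed with complete (non-nested) [...] groups.
ITEM = re.compile(r"(?:[^,\[\]]|\[[^\]]*\])+")
items = ITEM.findall(test_str)
print(items)
```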

This is not a solution that uses regex, but it's one nonetheless:

# Create a function to get the number of "interesting commas" in the string:
f = lambda x: x.split(']')[1].count(',') if '[' in x and ']' in x else x.count(',')

# Reverse the string and split on the "interesting commas" and then reverse it back to normal:
[x[::-1] for x in test_str[::-1].split(",",f(test_str))][::-1]

Should return:

# ["./name[contains(substring(.,1,3),'some')]", 'is-in', 'up', 'down']

I hope this helps.
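As a side note, the reverse, split, reverse steps above can probably be replaced by str.rsplit, which splits from the right and takes a maximum number of splits. This is a minimal sketch under the same assumption as the lambda (a single bracketed group, located in the first item). The helper name count_split_commas is hypothetical.

```python
test_str = "./name[contains(substring(.,1,3),'some')],is-in,up,down"

def count_split_commas(s):
    """Same logic as the lambda f: count the commas in the piece that
    follows the first ']', or all commas if there are no brackets."""
    if '[' in s and ']' in s:
        return s.split(']')[1].count(',')
    return s.count(',')

# rsplit works from the right, so the commas inside [...] are never reached.
parts = test_str.rsplit(',', count_split_commas(test_str))
print(parts)
```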

Instead of using re, I feel you can just keep track of the opening and closing brackets and concatenate the pieces as necessary. A simple counter is enough here, so a full stack is not needed. This assumes the brackets are balanced. The following code is self-explanatory; I hope it helps a little.

test_str = "./name[contains(substring(.,1,3),'some')],is-in,up,down"

output = []
depth = 0  # number of '[' still open in the item being built

for token in test_str.split(','):
    if depth > 0:
        # Still inside [...]: this comma was not a delimiter, so put it back.
        output[-1] += ',' + token
    else:
        output.append(token)
    depth += token.count('[') - token.count(']')

print(output)

Regular expressions alone are not powerful enough for this task in general (Python's re module cannot match arbitrarily nested brackets), so my solution uses more than just regular expressions.

First, I suggest isolating the [...] parts:

import re

w = re.split(r'(\[.*?\])', test_str)
ts = [[t] if t.startswith('[') else t.split(',') for t in w]

This gives the following value for ts:

[['./name'], ["[contains(substring(.,1,3),'some')]"], ['', 'is-in', 'up', 'down']]

Afterwards, the lists have to be joined:

from functools import reduce  # reduce is a builtin on Python 2 only
reduce(lambda x, y: x + [y[0]] if y[0] and y[0].startswith('[') else x + y, ts)

which yields (in this case):

['./name', "[contains(substring(.,1,3),'some')]", '', 'is-in', 'up', 'down']

What remains is joining each bracketed part back onto the item before it and removing the empty strings. This approach should be applicable to most cases.
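The remaining joining step could be sketched as follows. This is one possible way to finish the idea, not the answer's own code: the function name split_outside_brackets is hypothetical, and the sketch assumes the brackets are not nested, since the non-greedy pattern stops at the first ']'. Instead of building nested lists and cleaning up afterwards, it glues each piece onto the current item as it goes, so genuinely empty fields are kept.

```python
import re

test_str = "./name[contains(substring(.,1,3),'some')],is-in,up,down"

def split_outside_brackets(s):
    """Split s on commas, keeping every [...] group attached to its item."""
    items = [""]
    for part in re.split(r"(\[.*?\])", s):
        if part.startswith("["):
            # Bracketed part: glue it onto the item being built.
            items[-1] += part
        else:
            # Plain text: the first field continues the current item,
            # every further field starts a new item.
            first, *rest = part.split(",")
            items[-1] += first
            items.extend(rest)
    return items

print(split_outside_brackets(test_str))
```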
