
Regular Expression Splitting

I am new to regex, so I am having trouble splitting the following string:

test_str = "./name[contains(substring(.,1,3),'some')],is-in,up,down"

The string is delimited by commas, but commas inside a [...] group should not be treated as separators.

So the result should look like this:

["./name[contains(substring(.,1,3),'some')]", "is-in", "up", "down"]

I am trying this regular expression:

r"./*[a-z]+((\[.*?\])?)*,?/*"

...but there is some problem with "-".
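One likely cause of the "-" problem is that the character class [a-z] does not include the hyphen, so "is-in" cannot be matched as a single token. As a rough sketch only (not the pattern from the question), a regex can also split on just the commas that sit outside the brackets by using a negative lookahead. The name OUTSIDE_COMMA is made up for illustration, and the pattern assumes square brackets are never nested:

```python
import re

test_str = "./name[contains(substring(.,1,3),'some')],is-in,up,down"

# Split on a comma only when the next square bracket after it is not a
# closing ']' (in other words, the comma is not inside a [...] group).
# Assumption: square brackets are never nested.
OUTSIDE_COMMA = re.compile(r",(?![^\[\]]*\])")

fields = OUTSIDE_COMMA.split(test_str)
print(fields)
# ["./name[contains(substring(.,1,3),'some')]", 'is-in', 'up', 'down']
```

Because the pattern never looks at the field contents themselves, hyphens and other punctuation inside a field do not matter.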

This is not a solution that uses regex, but it's one nonetheless:

# Create a function to get the number of "interesting commas" in the string:
f = lambda x: x.split(']')[1].count(',') if '[' in x and ']' in x else x.count(',')

# Split from the right on only the "interesting commas", so the commas inside [...] are left alone:
test_str.rsplit(",", f(test_str))

Should return:

# ["./name[contains(substring(.,1,3),'some')]", 'is-in', 'up', 'down']

I hope this helps.

Instead of using re, I feel you can just keep track of the opening and closing brackets, stack-style, and concatenate the pieces as necessary. This assumes the string never has more opening brackets than closing brackets. The following code is self-explanatory; I hope it helps a little.

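test_str = "./name[contains(substring(.,1,3),'some')],is-in,up,down"

output = []
depth = 0  # number of '[' that are currently open

for token in test_str.split(','):
    if depth > 0:
        # Still inside [...]: this comma was not a separator, so put it back.
        output[-1] += ',' + token
    else:
        output.append(token)
    # Update the bracket depth with the brackets seen in this token.
    depth += token.count('[') - token.count(']')

print(output)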
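The same bracket-tracking idea can also be written as a character-by-character scan. The following is only a sketch under stated assumptions: the function name split_outside_brackets and its sep parameter are invented for illustration, only square brackets are tracked, and unbalanced input raises an error rather than being silently accepted:

```python
def split_outside_brackets(text, sep=","):
    """Split text on sep, ignoring separators inside [...] (nesting allowed)."""
    fields = []
    current = []   # characters of the field being built
    depth = 0      # how many '[' are currently open
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched ']'")
        if ch == sep and depth == 0:
            # A separator outside all brackets ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("unmatched '['")
    fields.append("".join(current))
    return fields


test_str = "./name[contains(substring(.,1,3),'some')],is-in,up,down"
print(split_outside_brackets(test_str))
```

Unlike a non-greedy regex, the depth counter copes with nested groups such as "a[b,[c,d]],e".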

Regular expressions on their own are not powerful enough for this task in general (they cannot keep track of nested brackets), so my solution has to use more than just regexes.

First, I suggest isolating the [...] parts:

import re

# The capturing group keeps the [...] parts in the result of re.split.
w = re.split(r'(\[.*?\])', test_str)
ts = [[t] if t.startswith('[') else t.split(',') for t in w]

Then ts contains:

[['./name'], ["[contains(substring(.,1,3),'some')]"], ['', 'is-in', 'up', 'down']]

Afterwards the lists have to be joined:

from functools import reduce  # reduce is no longer a builtin in Python 3

reduce(lambda x, y: x+[y[0]] if y[0] and y[0].startswith('[') else x+y, ts)

which yields (in this case):

['./name', "[contains(substring(.,1,3),'some')]", '', 'is-in', 'up', 'down']

What remains is joining some of the list items and removing the empty strings. This solution should be applicable to most cases...
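The remaining steps could be folded into one pass, roughly as sketched below. This is only one possible way to finish the idea, not necessarily what the answer had in mind: the function name split_keeping_brackets is invented for illustration, brackets are assumed not to be nested (the pattern is non-greedy), and genuinely empty fields are kept rather than removed:

```python
import re


def split_keeping_brackets(text):
    """Split on commas, but glue each [...] part back onto its field."""
    # The capturing group makes re.split return the [...] parts as well.
    parts = re.split(r'(\[.*?\])', text)
    fields = ['']
    for part in parts:
        if part.startswith('['):
            # A bracket group belongs to the field currently being built.
            fields[-1] += part
        else:
            pieces = part.split(',')
            # The first piece continues the current field;
            # every further piece starts a new field.
            fields[-1] += pieces[0]
            fields.extend(pieces[1:])
    return fields


test_str = "./name[contains(substring(.,1,3),'some')],is-in,up,down"
print(split_keeping_brackets(test_str))
# ["./name[contains(substring(.,1,3),'some')]", 'is-in', 'up', 'down']
```

Appending to the last field instead of filtering afterwards means the empty string produced by the comma right after "]" never shows up as a separate item.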


 