
Regular expression negative lookbehind in Python

I'm trying to split a string on commas that are not inside brackets (i.e., the string contains items separated by commas, but it also contains commas within brackets that I don't want to split on). Like so:

A='[1, "A"], [2, "B"], [3, "C"], [4, "D"], [5, "E"], [6, "F"], [7, "G"], [8, "H"], [9, "I"], [10, "J"], [100, "JJ"]'

This should result in:

['[1, "A"]', ' [2, "B"]', ' [3, "C"]', ' [4, "D"]', ' [5, "E"]', ' [6, "F"]', ' [7, "G"]', ' [8, "H"]', ' [9, "I"]', ' [10, "J"]', '[100, "JJ"]']

I tried using a negative lookbehind like this:

B=re.split(r'(?<![[][\d]),',A)

However, this does not work when the number within the brackets has more than one digit, as in the case of [10, "J"]. Any help would be greatly appreciated!

It looks like "split on any comma that is preceded by a ]" could work. For good measure I added \s* to eat up the spaces before the next item.

import re

A = '[1, "A"], [2, "B"], [3, "C"], [4, "D"], [5, "E"], [6, "F"], [7, "G"], [8, "H"], [9, "I"], [10, "J"], [100, "JJ"]'

re.split(r"(?<=]),\s*", A)

gives

['[1, "A"]', '[2, "B"]', '[3, "C"]', '[4, "D"]', '[5, "E"]', '[6, "F"]', '[7, "G"]', '[8, "H"]', '[9, "I"]', '[10, "J"]', '[100, "JJ"]']
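To make the difference concrete, here is a small sketch comparing the two patterns on a shortened input. The pattern `(?<!\[\d),` is my equivalent spelling of the question's `(?<![[][\d]),`, and the variable names `broken` and `fixed` are only for illustration.

```python
import re

A = '[1, "A"], [10, "J"], [100, "JJ"]'

# Equivalent spelling of the question's pattern: a comma is left alone only
# when it is directly preceded by "[" plus exactly one digit, so the commas
# after "[10" and "[100" are still treated as separators.
broken = re.split(r'(?<!\[\d),', A)

# Lookbehind from this answer: split only on a comma that follows "]".
fixed = re.split(r'(?<=]),\s*', A)

print(broken)  # ['[1, "A"]', ' [10', ' "J"]', ' [100', ' "JJ"]']
print(fixed)   # ['[1, "A"]', '[10, "J"]', '[100, "JJ"]']
```

The positive lookbehind does not depend on how many digits the number has, because it only looks at the single character before the comma.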

You can try this:

import re

A='[1, "A"], [2, "B"], [3, "C"], [4, "D"], [5, "E"], [6, "F"], [7, "G"], [8, "H"], [9, "I"], [10, "J"], [100, "JJ"]'
data = re.split(r'(?<=\]),\s', A)  # raw string, so \] and \s reach the regex engine unchanged
print(data)

Output:

['[1, "A"]', '[2, "B"]', '[3, "C"]', '[4, "D"]', '[5, "E"]', '[6, "F"]', '[7, "G"]', '[8, "H"]', '[9, "I"]', '[10, "J"]', '[100, "JJ"]']

If using split is not a requirement, findall can also be used with a very simple expression:

In [27]: re.findall(r'\[.+?\]', A)
Out[27]:
['[1, "A"]', '[2, "B"]', '[3, "C"]', '[4, "D"]', '[5, "E"]', '[6, "F"]', '[7, "G"]', '[8, "H"]', '[9, "I"]', '[10, "J"]', '[100, "JJ"]']

Try this regex and get each item from group 1:

(\[\d+,\s*\"\w+\"\])

You can see the result at this link:

https://regex101.com/r/K5XV6F/1
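In Python, the pattern above could be applied with re.findall, which returns the contents of group 1 when the pattern has a single capturing group. This is only a sketch on a shortened input; the variable name `items` is mine, and it assumes every item has exactly the shape [number, "word"].

```python
import re

A = '[1, "A"], [10, "J"], [100, "JJ"]'

# One capturing group, so findall returns a list of the group 1 matches.
items = re.findall(r'(\[\d+,\s*"\w+"\])', A)

print(items)  # ['[1, "A"]', '[10, "J"]', '[100, "JJ"]']
```

Items that do not fit that shape (for example a quoted value containing spaces) would be skipped silently, so this is stricter than splitting on the commas.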

Using the newer regex module, you can use

\[[^][]*\](*SKIP)(*FAIL) # discard anything in square brackets
|                        # or
,\s*                     # match a comma and any whitespace after it


In Python this looks like

import regex as re

A='[1, "A"], [2, "B"], [3, "C"], [4, "D"], [5, "E"], [6, "F"], [7, "G"], [8, "H"], [9, "I"], [10, "J"], [100, "JJ"]'

rx = re.compile(r'\[[^][]*\](*SKIP)(*FAIL)|,\s*')
print(rx.split(A))
# ['[1, "A"]', '[2, "B"]', '[3, "C"]', '[4, "D"]', '[5, "E"]', '[6, "F"]', '[7, "G"]', '[8, "H"]', '[9, "I"]', '[10, "J"]', '[100, "JJ"]']

See a demo on regex101.com.
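If installing the third-party regex module is not an option, the same "ignore commas inside brackets" idea can be approximated without regular expressions by tracking bracket depth. This is a different technique from (*SKIP)(*FAIL), shown only as a minimal sketch; the function name `split_outside_brackets` is hypothetical, and it assumes brackets are balanced and never appear inside the quoted strings.

```python
def split_outside_brackets(text):
    """Split text on commas that are not enclosed in square brackets."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == '[':
            depth += 1
        elif ch == ']':
            depth -= 1
        if ch == ',' and depth == 0:
            # Top-level comma: close the current item and start a new one.
            parts.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    # Whatever remains after the last top-level comma is the final item.
    parts.append(''.join(current).strip())
    return parts


A = '[1, "A"], [10, "J"], [100, "JJ"]'
print(split_outside_brackets(A))  # ['[1, "A"]', '[10, "J"]', '[100, "JJ"]']
```

Unlike the lookbehind patterns, this also keeps nested items such as [1, [2, 3]] in one piece, since only commas at depth 0 are treated as separators.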
