Regular expression negative lookbehind python
I'm trying to split a string by commas that are not inside brackets (i.e. the string contains items that are separated by commas, but it also contains commas within brackets that I don't want to split on). Like so:
A='[1, "A"], [2, "B"], [3, "C"], [4, "D"], [5, "E"], [6, "F"], [7, "G"], [8, "H"], [9, "I"], [10, "J"], [100, "JJ"]'
Which should result in:
['[1, "A"]', ' [2, "B"]', ' [3, "C"]', ' [4, "D"]', ' [5, "E"]', ' [6, "F"]', ' [7, "G"]', ' [8, "H"]', ' [9, "I"]', ' [10, "J"]', '[100, "JJ"]']
I tried using negative lookbehind like this:
B=re.split(r'(?<![[][\d]),',A)
However, this does not work when the number within the brackets has more than one digit, as in the case of [10, "J"]. Any help would be greatly appreciated!
It looks like "split on any comma that is preceded by a ]" could work. For good measure I added \s* to eat up the spaces before the next item.
import re
A = '[1, "A"], [2, "B"], [3, "C"], [4, "D"], [5, "E"], [6, "F"], [7, "G"], [8, "H"], [9, "I"], [10, "J"], [100, "JJ"]'
re.split(r"(?<=]),\s*", A)
gives
['[1, "A"]', '[2, "B"]', '[3, "C"]', '[4, "D"]', '[5, "E"]', '[6, "F"]', '[7, "G"]', '[8, "H"]', '[9, "I"]', '[10, "J"]', '[100, "JJ"]']
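To see why anchoring on `]` is robust where the attempt in the question is not, here is a small side-by-side sketch on a shortened input. The names `attempt` and `fixed` are mine, and the question's pattern is written in the equivalent form `(?<!\[\d),` for readability.

```python
import re

A = '[1, "A"], [2, "B"], [10, "J"], [100, "JJ"]'

# Question's idea: split on commas NOT preceded by "[<digit>".
# The lookbehind inspects exactly two characters, so for "[10," it sees
# "10" (not "[1") and the comma inside the brackets is split on as well.
attempt = re.split(r'(?<!\[\d),', A)

# Answer's idea: split only on commas that directly follow "]",
# then swallow any whitespace before the next item.
fixed = re.split(r'(?<=\]),\s*', A)

print(attempt)
print(fixed)
```

The positive lookbehind never looks at the number at all, so the digit count no longer matters.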
You can try this:
A='[1, "A"], [2, "B"], [3, "C"], [4, "D"], [5, "E"], [6, "F"], [7, "G"], [8, "H"], [9, "I"], [10, "J"], [100, "JJ"]'
import re
data = re.split(r'(?<=\]),\s', A)  # raw string, so \] and \s reach the regex engine unchanged
Output:
['[1, "A"]', '[2, "B"]', '[3, "C"]', '[4, "D"]', '[5, "E"]', '[6, "F"]', '[7, "G"]', '[8, "H"]', '[9, "I"]', '[10, "J"]', '[100, "JJ"]']
If using split is not a requirement, findall can also be used with a very simple expression:
In [27]: re.findall(r'\[.+?\]', A)
Out[27]:
['[1, "A"]', '[2, "B"]', '[3, "C"]', '[4, "D"]', '[5, "E"]', '[6, "F"]', '[7, "G"]', '[8, "H"]', '[9, "I"]', '[10, "J"]', '[100, "JJ"]']
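One caveat with `.+?`: it stops at the first `]` it meets. That is fine for the flat items in the question, but it would cut an item short if brackets were nested. A minimal sketch, where the `nested` sample is made up for illustration and is not part of the question:

```python
import re

A = '[1, "A"], [2, "B"], [10, "J"], [100, "JJ"]'

# Flat items: the lazy match ends at each item's own closing bracket.
items = re.findall(r'\[.+?\]', A)
print(items)

# Hypothetical nested input: the lazy match ends at the inner "]",
# so the first item loses its outer closing bracket.
nested = '[1, [2, 3]], [4, "D"]'
truncated = re.findall(r'\[.+?\]', nested)
print(truncated)
```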
Try this regex and get each item from group 1:
(\[\d+,\s*\"\w+\"\])
You can see the result at this link:
https://regex101.com/r/K5XV6F/1
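For reference, a rough sketch of how this pattern could be used from Python, written as a raw string with single backslashes. The names `pattern` and `items` are mine. Note that the pattern is strict: it only matches items of the form digits, comma, a quoted word.

```python
import re

A = '[1, "A"], [2, "B"], [10, "J"], [100, "JJ"]'

# Group 1 wraps the whole bracketed item: digits, a comma,
# optional whitespace, then a double-quoted word.
pattern = re.compile(r'(\[\d+,\s*"\w+"\])')

# Collect group 1 of every match, left to right.
items = [m.group(1) for m in pattern.finditer(A)]
print(items)
```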
Using the newer regex module you can use
\[[^][]*\](*SKIP)(*FAIL) # discard anything in square brackets
| # or
,\s* # match , and whitespaces, eventually
In Python this looks like
import regex as re

A='[1, "A"], [2, "B"], [3, "C"], [4, "D"], [5, "E"], [6, "F"], [7, "G"], [8, "H"], [9, "I"], [10, "J"], [100, "JJ"]'
rx = re.compile(r'\[[^][]*\](*SKIP)(*FAIL)|,\s*')
print(rx.split(A))
# ['[1, "A"]', '[2, "B"]', '[3, "C"]', '[4, "D"]', '[5, "E"]', '[6, "F"]', '[7, "G"]', '[8, "H"]', '[9, "I"]', '[10, "J"]', '[100, "JJ"]']
See a demo on regex101.com.
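Note that `regex` is a third-party package that has to be installed separately, and the standard `re` module does not support `(*SKIP)(*FAIL)`. If adding a dependency is not an option, a plain loop that tracks bracket depth can do a similar job. This is a different technique from the pattern above, not a translation of it, and the helper name `split_top_level` is hypothetical. As a sketch it assumes balanced brackets and does not treat brackets inside quoted strings specially.

```python
def split_top_level(text):
    """Split text on commas that are not inside square brackets."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == '[':
            depth += 1
        elif ch == ']':
            depth -= 1
        if ch == ',' and depth == 0:
            # Top-level comma: close the current item, drop surrounding spaces.
            parts.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current).strip())
    return parts


A = '[1, "A"], [2, "B"], [10, "J"], [100, "JJ"]'
result = split_top_level(A)
print(result)
```

Unlike the lazy `findall` approach, this also keeps nested items such as `[1, [2, 3]]` in one piece.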