
Confusion with re.sub

I have the string aa{{{a {{ {aaa{ that I would like to translate to aa { { {a { { {aaa { . Basically, every { must have a space character before it.

The regular expression substitution I am currently using is: re.sub(r'[^\\ ]{', lambda x:x.group(0)[0]+' {', test_case) The result is: aa {{ {a { { {aaa { (close, but there is still a {{ in the string).

My method works well on sections like a{a{a . However, if two { characters are adjacent, as in a{{a , it only seems to operate on the first { and completely neglects the following { .

A clearer example is a long run of {{{{{{{{{{{{ . My regex substitution returns: { {{ {{ {{ {{ {{ { , which clearly skips every other character in a tightly packed run of { .

Why are they being skipped? Any help untangling this confusion would be greatly appreciated!

P.S. I am sorry to everyone out there who has a strong desire to close all the opened curly braces.

I'd use a negative lookbehind:

re.sub(r'(?<!\s)(\{)',r' \1','{{{{{{')

Basically, we scan the string until we hit a { . If the character before it isn't whitespace (that's the (?<!\s) bit), the { matches and we replace it with a { that has a space in front of it.
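As a quick check, here is a minimal sketch that applies the same lookbehind idea to the string from the question. The helper name space_before_braces is only illustrative, and the pattern drops the capture group because the replacement is always a literal " {".

```python
import re

def space_before_braces(text):
    # Insert a space before every "{" not already preceded by whitespace.
    # (?<!\s) is zero-width: it inspects the previous character
    # without making it part of the match.
    return re.sub(r'(?<!\s)\{', ' {', text)

result = space_before_braces('aa{{{a {{ {aaa{')
print(result)  # aa { { {a { { {aaa {
```

Note that a { at the very start of the string also gets a space in front of it, because there is no preceding whitespace character there.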

They are being skipped because your regular expression consumes two characters per match: [^\\ ] and { . After a match such as {{ , scanning resumes after the second { , so that { can never serve as the preceding character of the next match. You need a zero-width negative lookbehind for the preceding whitespace so that it is not consumed: (?<!\s){ . Then you can simply replace the match with " {" , without the lambda hassle.
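To make the consumption visible, the following sketch lists the match spans for both patterns on a run of six braces. The variable names are only illustrative.

```python
import re

text = '{{{{{{'

# Question's pattern: every match takes two characters
# (one non-space, non-backslash character plus the "{").
consuming = [m.span() for m in re.finditer(r'[^\\ ]{', text)]

# Lookbehind pattern: every match takes only the "{" itself.
zero_width = [m.span() for m in re.finditer(r'(?<!\s)\{', text)]

print(consuming)   # [(0, 2), (2, 4), (4, 6)]
print(zero_width)  # [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
```

The first list has only three matches, each covering a pair of braces, which is why every other { is left without a space.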

I hope this solves the problem:

re.sub(' *{', ' {', test_case)
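A small sketch of how this variant behaves, assuming the same test string as the question. The function name normalize_brace_spacing is made up for illustration, and the brace is escaped here only for readability.

```python
import re

def normalize_brace_spacing(text):
    # " *" absorbs any run of spaces directly before "{" (possibly empty);
    # the whole match is then replaced by exactly one space plus "{".
    return re.sub(r' *\{', ' {', text)

print(normalize_brace_spacing('aa{{{a {{ {aaa{'))  # aa { { {a { { {aaa {
print(normalize_brace_spacing('a   {b'))           # a {b
```

Unlike the lookbehind version, this one also collapses several spaces before a { into a single space, and it only looks at literal spaces, so a { that follows a tab or newline still gets a space inserted before it.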
