
Removing data between double squiggly brackets with nested sub brackets in python

I'm having some difficulty with this problem. I need to remove all data that's contained in squiggly brackets.

Like such:

Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there.

Becomes:

Hello there.

Here's my first try (I know it's terrible):

while 1:
    firstStartBracket = text.find('{{')
    if (firstStartBracket == -1):
        break;
    firstEndBracket = text.find('}}')
    if (firstEndBracket == -1):
        break;
    secondStartBracket = text.find('{{',firstStartBracket+2);
    lastEndBracket = firstEndBracket;
    if (secondStartBracket == -1 or secondStartBracket > firstEndBracket):
        text = text[:firstStartBracket] + text[lastEndBracket+2:];
        continue;
    innerBrackets = 2;
    position = secondStartBracket;
    while innerBrackets:
        print innerBrackets;
        #everytime we find a next start bracket before the ending add 1 to inner brackets else remove 1
        nextEndBracket = text.find('}}',position+2);
        nextStartBracket = text.find('{{',position+2);
        if (nextStartBracket != -1 and nextStartBracket < nextEndBracket):
            innerBrackets += 1;
            position = nextStartBracket;
            # print text[position-2:position+4];
        else:
            innerBrackets -= 1;
            position = nextEndBracket;
            # print text[position-2:position+4];
            # print nextStartBracket
            # print lastEndBracket
            lastEndBracket = nextEndBracket;
        print 'pos',position;
    text = text[:firstStartBracket] + text[lastEndBracket+2:];

It seems to work but runs out of memory quite fast. Is there any better way to do this (hopefully with regex)?

EDIT: I was not clear so I'll give another example. I need to allow for multiple top level brackets.

Like such:

Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there {{my }} friend.

Becomes:

Hello there friend.

You can use the pyparsing module here. The solution is based on this answer:

from pyparsing import nestedExpr


s = "Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there {{my }} friend."

expr = nestedExpr('{{', '}}')
result = expr.parseString("{{" + s + "}}").asList()[0]
print(" ".join(item for item in result if not isinstance(item, list)))

Prints:

Hello there friend.

The following would only work if there is only one top-level pair of braces.

If you want to remove everything inside the double curly braces, along with the braces themselves:

>>> import re
>>> 
>>> s = "Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there."
>>> re.sub(r"\{\{.*\}\} ", "", s)
'Hello there.'

\{\{.*\}\} followed by a space matches an opening pair of curly braces, then any characters any number of times (intentionally left "greedy"), then a closing pair of curly braces and a space.
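To see why the greedy pattern is limited to a single top-level pair, here is a small sketch that runs it on the question's second example. The variable names are only illustrative, and the lazy variant is included for comparison, not as a fix:

```python
import re

# The question's second example, with two top-level groups.
two_groups = "Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there {{my }} friend."

# Greedy: runs from the first '{{' to the last '}} ', swallowing ' there '.
greedy = re.sub(r"\{\{.*\}\} ", "", two_groups)

# Lazy: stops at the first '}} ' it can reach, so nested groups are cut short.
lazy = re.sub(r"\{\{.*?\}\} ", "", two_groups)

print(greedy)  # Hello friend.
print(lazy)    # Hello sea }} there friend.
```

Neither quantifier tracks nesting depth, which is why the other answers count brace levels instead.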

This is a regex/generator based solution that works with any number of braces. This problem does not need an actual stack because there is only 1 type (well, pair) of token involved. The level fills the role that a stack would fill in a more complex parser.

import re

def _parts_outside_braces(text):
    level = 0
    for part in re.split(r'(\{\{|\}\})', text):
        if part == '{{':
            level += 1
        elif part == '}}':
            level = level - 1 if level else 0
        elif level == 0:
            yield part

x = 'Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there.  {{ second set {{ of }} braces }}'
print(''.join(_parts_outside_braces(x)))

More general points... the capture group in the regex is what makes the braces show up in the output of re.split , otherwise you only get the stuff in between. There's also some support for mismatched braces. For a strict parser, that should raise an exception, as should running off the end of the string with level > 0. For a loose, web-browser style parser, maybe you would want to display those }} as output...
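As a rough sketch of the strict behaviour described above, the same re.split loop can raise instead of silently recovering. The function name strip_braced_strict and the choice of ValueError are assumptions made for illustration:

```python
import re

def strip_braced_strict(text):
    """Remove every balanced {{...}} group; raise ValueError on mismatched braces."""
    level = 0   # current nesting depth
    kept = []   # pieces of text found at depth 0
    for part in re.split(r'(\{\{|\}\})', text):
        if part == '{{':
            level += 1
        elif part == '}}':
            if level == 0:
                # A closer with no matching opener.
                raise ValueError("unmatched '}}'")
            level -= 1
        elif level == 0:
            kept.append(part)
    if level:
        # Ran off the end of the string while still inside a group.
        raise ValueError("unclosed '{{'")
    return ''.join(kept)

print(strip_braced_strict('Hello {{a {{b}} c}} there'))  # Hello  there
```

A loose parser would instead append the stray }} to kept rather than raising.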

Try the following code:

import re

s = 'Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there'
m = re.search('(.*?) {.*}(.*)',s)
result = m.group(1) + m.group(2)
print(result)

The problem is that you have to deal with a nested structure, which means a regular expression alone may not suffice. However, a simple parser that remembers the depth level can come to the rescue. It is very simple to write: just store the depth level in a variable.

Here is a more Pythonic way of writing the solution, which may be a good reference for you.

import re

def rem_bra(inp):
    i = 0
    lvl = 0
    chars = []
    while i < len(inp):
        if inp[i:i+2] == '{{':
            lvl += 1
            i += 1
        elif inp[i:i+2] == '}}':
            lvl -= 1
            i += 1
        else:
            if lvl < 1:
                chars.append(inp[i])
        i += 1
    result = ''.join(chars)

    # To collapse runs of contiguous whitespace into a single space, add this line:
    result = re.sub(r'\s\s+', r' ', result)

    return result


inp = "Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there."

print(rem_bra(inp))
# prints: Hello there.

For good measure, yet another solution. It starts by finding and replacing the leftmost innermost braces and works its way outwards, rightwards. Takes care of multiple top level braces.

import re

def remove_braces(s):
    pattern = r'\{\{(?:[^{]|\{[^{])*?\}\}'
    while re.search(pattern, s):
        s = re.sub(pattern, '', s)
    return s

Not the most efficient, but short.

>>> remove_braces('Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there {{my }} friend.')
'Hello  there  friend.' 
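One way to trim the repeated scanning a little, offered only as a sketch: re.subn returns the number of replacements made, so the separate re.search call can be dropped. The names _INNERMOST and remove_braces_subn are made up for this example, and the pattern is the one from the answer above:

```python
import re

# Same innermost-group pattern as above, compiled once.
_INNERMOST = re.compile(r'\{\{(?:[^{]|\{[^{])*?\}\}')

def remove_braces_subn(s):
    """Strip innermost {{...}} groups repeatedly until none are left."""
    while True:
        s, count = _INNERMOST.subn('', s)
        if count == 0:  # nothing was replaced, so no groups remain
            return s

print(remove_braces_subn(
    'Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there {{my }} friend.'))
```

This still makes one pass per nesting level, so it is a small tidy-up and does not change the overall approach.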

This question is fun. Here is my attempt:

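import re

def find_str(string):
    # Yield the index of every character that lies outside all braces.
    # Note: this counts single braces, so a lone '{' or '}' also changes the depth.
    flag = 0  # current nesting depth
    for index, item in enumerate(string):
        if item == '{':
            flag += 1
        elif item == '}':
            flag -= 1
        elif flag == 0:
            yield index

s = 'Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there {{my }} friend.'

index = list(find_str(s))
s = ''.join(s[i] for i in index)

# Collapse the runs of whitespace left behind by the removed groups.
print(re.sub(r'\s+', ' ', s))

Hello there friend.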

With the third-party regex package for Python you could use a recursive regex.


{{(?>[^}{]+|(?0))*}} ?

Or another variant (requires a few more steps).


{{(?>[^}{]*(?R)?)*}} ?

At (?0) or (?R) the whole pattern is pasted in again (recursion). Use it with regex.sub:

>>> import regex
>>> s = 'Hello {{world of the {{ crazy}} {{need {{ be}}}} sea }} there.'
>>> regex.sub(r'(?V1){{(?>[^}{]+|(?0))*}} ?', '', s)

(?V1) selects Version 1, which behaves like Perl. I could not test this, so you will need to try it yourself :)
