
How do you find the index positions of unescaped single curly braces in a string?

a = "a"
sample_string = "asdf {{a}} {{ { {a} { {a} }"
## need to find these brackets ^     ^     ^
print(sample_string.format(a=a))

The above string will raise

ValueError: unexpected '{' in field name

I would like to be able to escape the curly brace which _string.formatter_parser is choking on. I started to go down the road of finding all unmatched pairs but realized that wouldn't work for double escaped curly braces. I realized I don't know how to solve this issue.

## this does not solve the problem.
def find_unmatched(s):
    indices = []
    stack = []
    indexstack = []
    for i, e in enumerate(s):
        if e == "{":
            stack.append(e)
            indexstack.append(i)
        elif e == "}":
            if len(stack) < 1:
                indices.append(i)
            else:
                stack.pop()
                indexstack.pop()
    while len(indexstack) > 0:
        indices.append(indexstack.pop())
    return indices

I know I can't simply look for single braces without looking to see if they are also paired. I can't just look for pairs before looking if they are escaped. But there are some cases that throw me off like this:

s1 = f"asdf {{{a}}} {{ {{ {{{a}}} { {a} }"
s2 =  "asdf {{{a}}} {{ {{ {{{a}}} { {a} }"
print(s1)
print(s2.format(a=a))

s1 prints while s2 doesn't.

asdf {a} { { {a} {'a'}
ValueError: unexpected '{' in field name

How do you find the index positions of unescaped curly braces in a string?


Additional info:

The question was asked as to what I was even doing with this. The real-world case is a little awkward. Strings being logged are wrapped in ANSI color codes to colorize the on-screen logs and help differentiate the source of each log line. The same line is also written to a log file, which must not contain the ANSI codes. To accomplish this, a curly-brace format field is added to the line, and the log formatters call format() to replace those fields with either an ANSI color code or an empty string.
Example:

"{color.grey}Log entry which {might contain curly} braces in the string {color.reset}"

The logic to replace the color entries uses a partial formatter: it itemizes all the fields in the string and replaces only those that exist in the dictionary passed in. It does the job, with the exception of singleton curly braces.

import string
# CPython-internal helper, the same one string.Formatter.get_field uses.
from _string import formatter_field_name_split


def partialformat(s: str, recursionlimit: int = 10, **kwargs):
    """
    vformat does the actual work of formatting strings. _vformat is the
    internal call behind vformat and accepts a recursion limit that controls
    how many levels of nested curly braces are handled. vformat itself does
    not expose that limit and hard-codes it to 2.

    The 2nd argument of _vformat, 'args', lets us supply "{}" as the only
    positional argument, so an empty curly brace set in the input is
    replaced by itself and effectively ignored.
    """

    class FormatPlaceholder(object):
        def __init__(self, key):
            self.key = key

        def __format__(self, spec):
            result = self.key
            if spec:
                result += ":" + spec
            return "{" + result + "}"

        def __getitem__(self, item):
            return

    class FormatDict(dict):
        def __missing__(self, key):
            return FormatPlaceholder(key)

    class PartialFormatter(string.Formatter):
        def get_field(self, field_name, args, kwargs):
            try:
                obj, first = super(PartialFormatter, self).get_field(field_name, args, kwargs)
            except (IndexError, KeyError, AttributeError):
                first, rest = formatter_field_name_split(field_name)
                obj = '{' + field_name + '}'

                # loop through the rest of the field_name, doing
                #  getattr or getitem as needed
                for is_attr, i in rest:
                    if is_attr:
                        try:
                            obj = getattr(obj, i)
                        except AttributeError as exc:
                            pass
                    else:
                        obj = obj[i]

            return obj, first

    fmttr = PartialFormatter()
    try:
        fs, _ = fmttr._vformat(s, ("{}",), FormatDict(**kwargs), set(), recursionlimit)
    except ValueError as exc:
        #if we are ever to auto escape unmatched curly braces, it shall go here.
        raise exc
    except Exception as exc:
        raise exc
    return fs

Usage:

class Color:
    grey = '\033[90m'
    reset = '\033[0m'

colorobj = Color()

try:
    s = partialformat(s, **{"color" : colorobj})
except ValueError as exc:
    pass

outputs:

"Log entry which {might contain curly} braces in the string"

or

"\033[90mLog entry which {might contain curly} braces in the string \033[0m"

Additional Edit:

The problem I'm facing is that when a string contains a single curly brace, I cannot call partialformat on it because it raises a ValueError: "Single '{' encountered in format string". This breaks colorizing of the log line.

s = "{trco.grey}FAILED{trco.r} message {blah blah blah"

I figured I might be able to automatically escape the singleton curly braces if I can detect where they are in the string. It's just proving to be more difficult than I had expected.
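Before attempting any automatic escaping, it can help to detect whether a string is a valid format string at all. The following is a minimal sketch, assuming the standard string.Formatter parser is an acceptable proxy for what partialformat will accept; the helper name format_error is hypothetical and not part of the code above.

```python
import string


def format_error(s):
    """Return the parser's error message for s, or None if s parses cleanly."""
    try:
        # parse() returns a lazy iterator, so it has to be consumed
        # before any ValueError for a stray brace is raised.
        for _ in string.Formatter().parse(s):
            pass
    except ValueError as exc:
        return str(exc)
    return None


# A stray "{" makes the parser fail; the exact message depends on where
# the brace sits, so only the presence of an error is checked here.
print(format_error("{trco.grey}FAILED{trco.r} message {blah blah blah"))
print(format_error("{trco.grey}FAILED{trco.r} message"))  # None
```

A caller could use this check to skip colorizing (or fall back to the raw text) instead of letting the ValueError propagate.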

Yet another edit:

I believe this is a problem with order of events.

  1. Original string s = "text with a { single curly brace"
  2. Colorizer function adds some basic curly braced text that will be replaced later: "{color.red}text with a { single curly brace{color.reset}"
  3. During logging.Formatter.doFormat() do a replace on {color.red} with the ansi color code.
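If the ordering above is the root cause, one option is to escape the braces in step 1, before the colorizer adds its own fields in step 2. The sketch below assumes the original text is meant literally, so every brace in it can be doubled; colorize, Color and NoColor are hypothetical names used only for illustration.

```python
def colorize(text, color_name):
    """Wrap literal text in color fields, escaping its own braces first."""
    escaped = text.replace("{", "{{").replace("}", "}}")
    return "{color." + color_name + "}" + escaped + "{color.reset}"


class Color:
    """ANSI codes for the on-screen formatter."""
    red = "\033[31m"
    reset = "\033[0m"


class NoColor:
    """Empty replacements for the file formatter."""
    red = ""
    reset = ""


marked = colorize("text with a { single curly brace", "red")
on_screen = marked.format(color=Color())
in_file = marked.format(color=NoColor())
print(repr(on_screen))
print(repr(in_file))  # 'text with a { single curly brace'
```

Because the stray brace is doubled before any formatting happens, plain str.format is enough here and no partial formatter is needed for the color fields.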

Try this:

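string = "abcd {{a}} {{{{a}{{a}}"
indices = []
for i, e in enumerate(string):
    if e == '{':
        # remember where every opening brace sits
        indices.append(i)
    elif e == '}' and indices:
        # a closing brace cancels the most recent opening brace;
        # the guard avoids an IndexError on a stray '}'
        indices.pop()
print(indices)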

This prints [11, 12, 13], which are the indices of the unmatched opening braces.

The loop iterates over the characters and records the index of every opening brace. Because the innermost braces close first, each closing brace removes the most recent opening index, and whatever remains at the end is unmatched.

Regex would work for this job.

>>> import re
>>> t = re.finditer(r"\s{\s", "asdf {{a}} {{ { {a} { {a} }")
>>> for a in t:
...     # start() is the whitespace before the brace; the brace is at start() + 1
...     print(a.start())
...
13
19

The original question was how can you identify curly braces that aren't matched pairs. The problem is I was trying to identify them at a point where it is impossible to do so.

Example:

Some would say this middle brace is out of place.

"{{a}}{b}}"
     ^

While others might think the last one is out of place

"{{a}}{b}}"
         ^

It's impossible to know from the text snippet alone which brace shouldn't be there. Thus my original question is not definitively solvable. At the time I wrote this post I didn't realize I was asking the wrong question.
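That said, once a convention is fixed, the leftover braces can be listed mechanically. The sketch below assumes one such convention: read left to right, treat doubled braces as escapes, and accept only a brace-free "{...}" as a replacement field. It is not the only reasonable reading, and find_single_braces is a hypothetical name.

```python
import re

# Assumption: a replacement field is "{...}" with no braces inside it.
FIELD = re.compile(r"\{[^{}]*\}")


def find_single_braces(s):
    """Return indices of braces that are neither doubled nor part of a field."""
    indices = []
    i = 0
    while i < len(s):
        if s[i:i + 2] in ("{{", "}}"):
            i += 2  # escaped brace pair, skip both characters
            continue
        if s[i] == "{":
            m = FIELD.match(s, i)
            if m:
                i = m.end()  # well-formed field, skip it entirely
                continue
            indices.append(i)  # opening brace that starts no field
        elif s[i] == "}":
            indices.append(i)  # closing brace with nothing to close
        i += 1
    return indices


print(find_single_braces("asdf {{a}} {{ { {a} { {a} }"))  # [14, 20, 26]
print(find_single_braces("{{a}}{b}}"))                    # [8]
```

Under this convention the second example blames the last brace; a different convention could just as well blame an earlier one, which is exactly the ambiguity described above.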

My original problem: How do you add a marker to logged text that can be resolved later (e.g. during the .doFormat() method of logging), so that it is either replaced with the ANSI color code or stripped out, depending on which formatter is handling the text?

So that a string that is going to be logged to screen will contain ansi color codes, but when it is written to the file log those codes are stripped out.
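One way to approach this without str.format at all is to let each logging.Formatter substitute the markers itself with a regular expression, so that any other braces in the message are never parsed. This is only a sketch under assumptions: the marker syntax "{color.NAME}", the ANSI table, and the MarkerFormatter class are all hypothetical and not taken from the code above.

```python
import logging
import re

# Hypothetical color table and marker syntax, for illustration only.
ANSI = {"grey": "\033[90m", "red": "\033[31m", "reset": "\033[0m"}
MARKER = re.compile(r"\{color\.(\w+)\}")


class MarkerFormatter(logging.Formatter):
    """Replace {color.NAME} markers with ANSI codes, or strip them."""

    def __init__(self, fmt=None, use_color=False):
        super().__init__(fmt)
        self.use_color = use_color

    def format(self, record):
        text = super().format(record)
        if self.use_color:
            # Unknown color names are dropped rather than raising.
            return MARKER.sub(lambda m: ANSI.get(m.group(1), ""), text)
        return MARKER.sub("", text)


record = logging.LogRecord(
    "demo", logging.INFO, "demo.py", 1,
    "{color.grey}FAILED{color.reset} message {blah blah blah", None, None,
)
screen_text = MarkerFormatter("%(message)s", use_color=True).format(record)
file_text = MarkerFormatter("%(message)s", use_color=False).format(record)
print(repr(screen_text))
print(repr(file_text))  # 'FAILED message {blah blah blah'
```

A console handler would get the formatter with use_color=True and a file handler the one with use_color=False. Since only the marker pattern is matched, a lone "{" in the message passes through untouched.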

As far as proper StackOverflow etiquette goes, I'm not sure if I should completely rework my question, close it, or just answer it here.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 