Apply formatting control characters (backspace and carriage return) to string, without needing recursion

What is the easiest way to "interpret" formatting control characters in a string, so that the result looks as if the string had been printed? For simplicity, I will assume there are no newlines in the string.

So, for example,

>>> sys.stdout.write('foo\br')

shows for, therefore

interpret('foo\br') should be 'for'

>>> sys.stdout.write('foo\rbar')

shows bar, therefore

interpret('foo\rbar') should be 'bar'


I can write a regular expression substitution here, but in the case of the '\b' replacement it would have to be applied recursively until there are no more occurrences. It would be quite complex if done without recursion.

Is there an easier way?

If efficiency doesn't matter, a simple stack would work fine:

string = "foo\rbar\rbash\rboo\b\bba\br"

res = []
for char in string:
    if char == "\r":
        res.clear()
    elif char == "\b":
        if res: del res[-1]
    else:
        res.append(char)

"".join(res)
#>>> 'bbr'

Otherwise, I think this is about as fast as you can hope for in complex cases:

string = "foo\rbar\rbash\rboo\b\bba\br"

try:
    string = string[string.rindex("\r")+1:]
except ValueError:
    pass

split_iter = iter(string.split("\b"))
res = list(next(split_iter, ''))
for part in split_iter:
    if res: del res[-1]
    res.extend(part)

"".join(res)
#>>> 'bbr'

Note that I haven't timed this.
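Since the two versions above were not timed, here is a minimal sketch of how one might compare them with the standard timeit module. The function names interpret_stack and interpret_split are made up for this example, and the repeated sample string is only an assumption about what a "complex case" looks like. No timings are claimed here; run it to see the numbers on your own machine.

```python
import timeit


def interpret_stack(string):
    """Character-by-character stack version."""
    res = []
    for char in string:
        if char == "\r":
            res.clear()
        elif char == "\b":
            if res:
                del res[-1]
        else:
            res.append(char)
    return "".join(res)


def interpret_split(string):
    """Split-based version: keep only the text after the last carriage
    return, then split on backspaces."""
    # rfind returns -1 when there is no "\r", so the slice keeps everything.
    string = string[string.rfind("\r") + 1:]
    parts = iter(string.split("\b"))
    res = list(next(parts, ""))
    for part in parts:
        if res:
            del res[-1]
        res.extend(part)
    return "".join(res)


sample = "foo\rbar\rbash\rboo\b\bba\br"
# Both versions should agree before their speed is compared.
assert interpret_stack(sample) == interpret_split(sample) == "bbr"

if __name__ == "__main__":
    long_sample = sample * 1000  # assumed workload, purely illustrative
    for func in (interpret_stack, interpret_split):
        seconds = timeit.timeit(lambda: func(long_sample), number=100)
        print(f"{func.__name__}: {seconds:.4f}s")
```

The equality check matters more than the timing itself: a faster version is only useful if it produces the same result as the simple one.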

Python does not have any built-in or standard library module for doing this. However, if you only care about simple control characters like \r, \b and \n, you can write a simple function to handle them:

def interpret(text):
    lines = []
    current_line = []
    for char in text:
        if char == '\n':
            lines.append(''.join(current_line))
            current_line = []
        elif char == '\r':
            current_line.clear()
            # del current_line[:]  # in old python versions
        elif char == '\b':
            del current_line[-1:]
        else:
            current_line.append(char)
    if current_line:
        # join the characters first: lines must hold strings, not lists
        lines.append(''.join(current_line))
    return '\n'.join(lines)

You can extend the function to handle any control character you want. For example, you might want to ignore control characters that are not actually displayed in a terminal (e.g. the bell, \a).
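As a rough sketch of such an extension, the variant below skips a configurable set of non-displaying characters before applying the same \n, \r and \b rules. The name interpret_ignoring and the ignored parameter are hypothetical, chosen only for this illustration, and the default of ignoring just the bell is an assumption. Unlike the function above, this sketch always appends the final line, so a trailing newline is preserved.

```python
def interpret_ignoring(text, ignored="\a"):
    """Apply \\n, \\r and \\b to text, dropping every character in `ignored`."""
    lines = []
    current_line = []
    for char in text:
        if char in ignored:
            # e.g. the bell: it makes a sound but shows nothing
            continue
        if char == "\n":
            lines.append("".join(current_line))
            current_line = []
        elif char == "\r":
            current_line.clear()
        elif char == "\b":
            # slice deletion is a no-op on an empty list
            del current_line[-1:]
        else:
            current_line.append(char)
    lines.append("".join(current_line))
    return "\n".join(lines)


print(repr(interpret_ignoring("foo\a\br")))
```

Passing ignored="" turns the filtering off, which makes it easy to compare the output with and without the extra rule.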

UPDATE: after 30 minutes of asking for clarifications and an example string, we find the question is actually quite different: "How to repeatedly apply formatting control characters (backspace) to a Python string?" In that case, yes, you apparently need to apply the regex/function repeatedly until you stop getting matches. SOLUTION:

import re

def repeated_re_sub(pattern, sub, s, flags=re.U):
    """Match-and-replace repeatedly until we run out of matches..."""
    patc = re.compile(pattern, flags)

    sold = ''
    while sold != s:
        # remember the previous string so we can detect when nothing changed
        sold = s
        s = patc.sub(sub, sold)

    return s

# In a non-raw string '\b' is the backspace character (same as '\x08'),
# not the regex word-boundary escape.
print(repeated_re_sub('[^\b]\b', '', 'abc\b\x08de\b\bfg'))
#>>> afg

[multiple previous answers, asking for clarifications and pointing out that either re.sub(...) or string.replace(...) could be used to solve the problem non-recursively.]
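One caveat about the answers above: they treat \r as erasing the line and \b as deleting the previous character, which matches the two examples in the question. A real terminal only moves the cursor in both cases, and later characters overwrite what is already there. If that distinction matters, a cursor-based variant might look like the sketch below. The name interpret_overwrite is made up for this example, and the model is a simplified single-line terminal, not a full emulator.

```python
def interpret_overwrite(text):
    """Emulate one terminal line: \\r and \\b move the cursor,
    printable characters overwrite whatever is under it."""
    cells = []   # characters currently on the line
    cursor = 0   # index where the next character will be written
    for char in text:
        if char == "\r":
            cursor = 0
        elif char == "\b":
            cursor = max(cursor - 1, 0)
        elif cursor < len(cells):
            cells[cursor] = char
            cursor += 1
        else:
            cells.append(char)
            cursor += 1
    return "".join(cells)


print(interpret_overwrite("foo\br"))       # for
print(interpret_overwrite("foo\rbar"))     # bar
print(interpret_overwrite("foobar\rabc"))  # abcbar
```

The last call shows where the two models differ: the stack-based versions would return 'abc', because they discard everything before the carriage return.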
