Python regex to match VT100 escape sequences

I'm writing a Python program that logs terminal interaction (similar to the script program), and I'd like to filter out the VT100 escape sequences before writing to disk. I'd like to use a function like this:

import re

def strip_escapes(buf):
    escape_regex = re.compile(???) # <--- this is what I'm looking for
    return escape_regex.sub('', buf)

What should go in escape_regex?

The combined expression for escape sequences can be something generic like this:

(\x1b\[|\x9b)[^@-_]*[@-_]|\x1b[@-_]

It should be used with the re.I flag, so that the range [@-_] also covers lowercase final characters such as m and h.

This incorporates:

  1. Two-byte sequences, i.e. \x1b followed by a character in the range from @ to _.
  2. One-byte CSI, i.e. \x9b as opposed to \x1b + "[".

However, this will not work for sequences that define key mappings or otherwise include strings wrapped in quotes.
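Plugged into the function from the question, the expression above could be used roughly as follows. This is only a sketch: the module-level name ESCAPE_RE and the sample strings are my own choices for illustration, and the limitation about quoted strings still applies.

```python
import re

# Generic pattern from the answer above: a CSI introducer (ESC [ or the
# one-byte CSI \x9b) followed by parameters and a final character, or a
# two-byte ESC sequence. re.I lets [@-_] match lowercase finals too.
ESCAPE_RE = re.compile(r'(\x1b\[|\x9b)[^@-_]*[@-_]|\x1b[@-_]', re.I)

def strip_escapes(buf):
    """Return buf with the matched escape sequences removed."""
    return ESCAPE_RE.sub('', buf)

# Sample input (made up for illustration): coloured text and a screen clear.
print(strip_escapes('\x1b[1;31mred\x1b[0m text'))
print(strip_escapes('\x1b[2J\x1b[Hhello'))
```

Compiling the pattern once at module level, rather than inside the function, avoids looking it up again on every call when the logger processes many buffers.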

VT100 codes are already grouped (mostly) according to similar patterns here:

http://ascii-table.com/ansi-escape-sequences-vt-100.php

I think the simplest approach would be to use a tool like RegexBuddy to define a regex for each group of VT100 codes.
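A per-group approach might be sketched like this. The four groups and their names are hypothetical and chosen only for illustration; they are not taken from the linked table, which lists many more sequences, so anything outside these groups is left untouched.

```python
import re

# Hypothetical grouping for illustration only; extend it with the
# remaining groups from the table as needed.
GROUP_PATTERNS = {
    'attributes': r'\x1b\[[0-9;]*m',        # character attributes / colours
    'cursor_move': r'\x1b\[[0-9]*[ABCD]',   # cursor up / down / right / left
    'cursor_pos': r'\x1b\[[0-9;]*[Hf]',     # absolute cursor position
    'erase': r'\x1b\[[0-9]*[JK]',           # erase in display / in line
}

# Join the per-group expressions into one alternation.
VT100_RE = re.compile('|'.join('(?:%s)' % p for p in GROUP_PATTERNS.values()))

def strip_known_groups(buf):
    """Remove only the sequences covered by GROUP_PATTERNS."""
    return VT100_RE.sub('', buf)

print(strip_known_groups('\x1b[2J\x1b[1;1Hhi\x1b[0m'))
```

Keeping one named pattern per group makes it easy to see which sequences are handled and to test each group separately, at the cost of silently passing through anything not listed.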

I found the following solution, which parses VT100 color codes and removes the non-printable escape sequences. The code snippet below, which I found elsewhere, removed all the codes for me when running a telnet session using telnetlib:

def __processReadLine(self, line_p):
    '''
    Remove non-printable characters from line <line_p> and
    return a printable string.
    '''

    line, i, imax = '', 0, len(line_p)
    while i < imax:
        ac = ord(line_p[i])
        if (32<=ac<127) or ac in (9,10): # printable, \t, \n
            line += line_p[i]
        elif ac == 27:                   # remove coded sequences
            i += 1
            while i<imax and line_p[i].lower() not in 'abcdhsujkm':
                i += 1
        elif ac == 8 or (ac==13 and line and line[-1] == ' '): # backspace or EOL spacing
            if line:
                line = line[:-1]
        i += 1

    return line
