简体   繁体   中英

Replace control characters and spaces with escape sequences

I want to replace control characters (ASCII 0-31) and spaces (ASCII 32) with hex escape codes. For example:

$ escape 'label=My Disc'
label=My\x20Disc
$ escape $'multi\nline\ttabbed string'
multi\x0Aline\x09tabbed\x20string
$ escape '\'
\\

For context, I'm writing a script which statuses a DVD drive. Its output is designed to be parsed by another program. My idea is to print each piece of info as a separate space-separated word. For example:

$ ./discStatus --monitor
/dev/dvd: no-disc
/dev/dvd: disc blank writable size=0 capacity=2015385600
/dev/dvd: disc not-blank not-writable size=2015385600 capacity=2015385600

I want to add the disc's label to this output. To fit with the parsing scheme I need to escape spaces and newlines. I might as well do all the other control characters as well.

I'd prefer to stick to bash, sed, awk, tr, etc., if possible. I can't think of a really elegant way to do this with those tools, though. I'm willing to use perl or python if there's no good solution with basic shell constructs and tools.

Here's a Perl one-liner I came up with. It uses /e to run code in the replacements.

perl -pe 's/([\x00-\x20\\])/sprintf("\\x%02X", ord($1))/eg'

A slight deviation from the example in my question: it emits \\x5C for backslashes instead of \\\\ .

I would use a higher-level language. There are three different types of replacement going on (single character to multicharacter for the control characters and space, identity for other printable characters, and the special case of doubling the backslash), which I think is too much for awk , sed , and the like to handle simply.

Here's my approach for Python

def translate(c):
    cp = ord(c)
    if cp in range(33):
        return '\\x%02x'%(cp,)
    elif c == '\\':
        return r'\\'
    else:
        return c

if __name__ == '__main__':
    import sys
    print ''.join( map(translate, sys.argv[1]) )

If speed is a concern, you can replace the translate function with a prebuilt dictionary mapping each character to its desired string representation.

Wow, it looks like a fairly trivial sed script along the lines of 's|\\n|\\\\n|' for each character you want to substitute.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM