

How do I reverse escape backslash encodings like "\ " and "\303\266" in bash?

I have a script that records files with UTF-8 encoded names. However, the script's encoding/environment wasn't set up right, and it just recorded the raw bytes. I now have lots of lines in the file like this:

.../My\ Folders/My\ r\303\266m/...

So there are spaces in the filenames escaped as "\ ", and UTF-8 encoded characters written as octal byte escapes like \303\266 (which is ö). I want to reverse this encoding. Is there some easy set of bash commands I can chain together to remove these escapes?

I could write lots of sed commands, but it would take ages to list all the non-ASCII characters we have. Or I could start parsing it in Python. But I'm hoping there's some trick I can use.

Here's a rough stab at the Unicode characters:

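text="/My\ Folders/My\ r\303\266m/"
# double every backslash and build the command string: echo $'...'
text="echo \$\'"$(echo "$text"|sed -e 's|\\|\\\\|g')"\'"
# the argument to the echo must not be quoted or escaped-quoted in the next step
# the inner eval prints a $'...' string; the outer eval expands its octal escapes
text=$(eval "echo $(eval "$text")")
# read (without -r) turns the remaining "\ " escapes into plain spaces
read text < <(echo "$text")
echo "$text"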

This makes use of the $'string' quoting feature of Bash.

This outputs "/My Folders/My röm/".
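To see what $'string' quoting does on its own, here is a small sketch (the variable name word is just for illustration). Each \ooo octal escape becomes the raw byte it names, and \303\266 is the two-byte UTF-8 encoding of ö, so the result displays correctly on a UTF-8 terminal.

```shell
#!/usr/bin/env bash
# $'...' turns each \ooo octal escape into the raw byte it names.
word=$'r\303\266m'

printf '%s\n' "$word"                 # shows "röm" on a UTF-8 terminal
printf '%s' "$word" | od -An -tx1     # byte dump: r, the UTF-8 pair c3 b6 for ö, then m
```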

As of Bash 4.4, it's as easy as:

text="/My Folders/My r\303\266m/"
echo "${text@E}"

This uses a new feature of Bash called parameter transformation. The E operator causes the parameter to be expanded as if its contents were inside $'string', so backslash escape sequences, in this case octal values, are evaluated.
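To apply this to a whole file of escaped paths, here is a minimal sketch, assuming Bash 4.4 or later and lines that contain no backslash escapes other than "\ " and octal ones. The function names unescape_line and unescape_file are hypothetical. Since "\ " is not one of the escapes that $'...' recognizes, the sketch first replaces it with a plain space using pattern substitution, then lets ${line@E} expand the octal bytes.

```shell
#!/usr/bin/env bash
# Requires Bash 4.4+ for the ${var@E} transformation.

# Hypothetical helper: decode one escaped line.
unescape_line() {
    local line=$1
    line=${line//\\ / }          # shell-style "\ " -> " "
    printf '%s\n' "${line@E}"    # $'...'-style expansion: \303\266 -> UTF-8 bytes of ö
}

# Hypothetical helper: decode every line of a file (or of stdin if no file is given).
unescape_file() {
    local line
    # IFS= and -r keep each line exactly as written; the || test keeps a final
    # line that has no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        unescape_line "$line"
    done < "${1:-/dev/stdin}"
}
```

For example, unescape_line '/My\ Folders/My\ r\303\266m/' prints /My Folders/My röm/, and unescape_file recorded.txt (a placeholder file name) decodes a whole list.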

It is not clear exactly what kind of escaping is being used. The octal character codes are C-style, but C does not escape spaces. The space escape is used in the shell, but the shell does not use octal character escapes.

Something close to C-style escaping can be undone using the command printf '%b' "$escaped" (quote the variable, otherwise word splitting breaks the string apart at the spaces). (The documentation says that octal escapes start with \0, but GNU printf does not seem to require that.) Another answer mentions read for unescaping shell escapes, although if the space is the only escape that printf %b does not handle, then handling that case with sed would probably be better.
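That combination can be written as a small streaming filter. This is a sketch under the assumption that the input contains no escapes other than "\ " and \ooo octal bytes (printf %b would also act on sequences such as \n or \c); the name unescape_stream is hypothetical. sed handles the spaces, and each line is then passed to printf as an argument to %b rather than as the format string.

```shell
#!/usr/bin/env bash
# Hypothetical filter: reads escaped lines on stdin, writes decoded lines on stdout.
unescape_stream() {
    local line
    # sed undoes the shell-style "\ "; the loop then lets printf %b expand \ooo.
    sed 's/\\ / /g' | while IFS= read -r line || [[ -n $line ]]; do
        # The line is an argument to %b, not the format, so a literal % is printed as is.
        printf '%b\n' "$line"
    done
}
```

Usage: unescape_stream < recorded.txt, where recorded.txt stands in for the file of escaped paths.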

In the end I used something like this:

sed 's/%/%%/g' file | while IFS= read -r line ; do printf "${line}\n" ; done | sed 's/\\ / /g'

Some of the files had % in them, which is a printf special character, so I had to "double it up" so that it would be escaped and passed straight through. The -r in read stops read from removing the \'s; however, that also means read leaves "\ " alone, and printf does not turn it into " " either, so I needed the final sed.

Use printf to solve the issue with the UTF-8 text, and use read to take care of the spaces (\ ).

Like this:

$ text='/My\ Folders/My\ r\303\266m/'
$ IFS='' read t < <(printf "$text")
$ echo "$t"
/My Folders/My röm/

The built-in read command will handle part of the problem:

$ echo "with\ spaces" | while read r; do echo "$r"; done
with spaces
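Only part, because read without -r removes every backslash, not just the ones before a space, so the octal escapes lose their backslashes too. A short sketch of why the octal bytes have to be decoded first (here with printf %b, in the same order as the printf-then-read answer) before read strips the "\ " escapes; the variable names r and r2 are illustrative.

```shell
#!/usr/bin/env bash
# Without -r, read drops every backslash, so "\303" becomes plain "303".
IFS= read r <<< 'My\ r\303\266m'
printf '%s\n' "$r"                      # My r303266m

# Decoding the octal escapes first and only then letting read remove the
# "\ " escapes keeps both parts intact.
IFS= read r2 < <(printf '%b\n' 'My\ r\303\266m')
printf '%s\n' "$r2"                     # My röm on a UTF-8 terminal
```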

Pass the file to the following Perl script, which processes it line by line:

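#!/usr/bin/perl
use strict;
use warnings;

# Print a line with every \ooo octal escape turned back into the raw byte it
# names; consecutive bytes such as \303\266 then form the UTF-8 encoding of ö.
sub encode {
    local $_ = $_[0];
    while (/(\\[0-7]{1,3}|.)/gs) {
        my $Match = $1;

        if ($Match =~ /^\\([0-7]+)$/) {
            print chr(oct($1));    # emit the raw byte
        } else {
            print $Match;          # any other character passes through
        }
    }
}

while ($#ARGV >= 0) {
    my $File = shift();
    open(my $F, '<', $File) or die "Cannot open $File: $!\n";
    while (my $Line = <$F>) {
        my $String = $Line;
        $String =~ s/\\ / /g;      # replace "\ " with a plain space
        encode($String);
    }
    close($F);
}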

Like this:

$ ./PerlEncode.pl Test.txt

Where Test.txt contains:

/My\ Folders/My\ r\303\266m/
/My\ Folders/My\ r\303\266m/
/My\ Folders/My\ r\303\266m/

The line "$String =~ s/\\ / /g;" replaces "\ " with " ", and sub encode decodes the octal escapes of those Unicode characters.

Hope this helps.

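Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.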

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM