
Perl output unreadable even though I asked for it to be encoded as UTF-8

As the title says, I use Perl to run some regex-based queries. My input is UTF-8 encoded and contains diacritics, but when I generate output it always comes out as UTF-16 LE / UCS-2 LE BOM.

My main problem is that the diacritics in the input are replaced with '??' in the output. I think the problem lies in the encoding. I have stripped out as much of the code as I could that might have been responsible, but the problem persisted.

This is my code:

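# Input = élèvàtòr ôpëràtör
# Output = ??l??v??t??r ??p??r??t??r

use strict;
use warnings;
use utf8;                   # the script source itself is UTF-8
use open qw(:std :utf8);    # decode STDIN, encode STDOUT/STDERR as UTF-8

while (my $line = <STDIN>) {
    # remove parenthesis characters; the text between them is kept
    $line =~ s/[()]//g;

    # remove [...] and <...> spans with everything in between; repeating
    # until nothing matches also removes nested spans, innermost first
    1 while $line =~ s/\[[^\[\]]*\]//g;
    1 while $line =~ s/<[^<>]*>//g;

    # print, not printf: input containing "%" must not act as a format string
    print $line;
}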
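One way to check that the regexes themselves do not damage the diacritics is to run the same substitutions on a fixed string, away from the terminal. The sketch below wraps them in a hypothetical helper `strip_spans` (not part of the original script) and uses a made-up test line. Under `use utf8` each accented letter is a single character, so the substitutions leave it alone.

```perl
use strict;
use warnings;
use utf8;                                   # the literals below are UTF-8 text
binmode(STDOUT, ':encoding(UTF-8)');        # encode output as UTF-8

# Hypothetical helper: the same substitutions as the loop above, on one string.
sub strip_spans {
    my ($line) = @_;
    $line =~ s/[()]//g;                     # drop parenthesis characters
    1 while $line =~ s/\[[^\[\]]*\]//g;     # drop [...] spans, innermost first
    1 while $line =~ s/<[^<>]*>//g;         # drop <...> spans, innermost first
    return $line;
}

my $in  = "élèvàtòr (ôpëràtör) [note [nested]] <tag>\n";
my $out = strip_spans($in);

# With "use utf8" each accented letter counts as one character.
print length('élèvàtòr'), "\n";
print $out;
```

If this prints the accented words correctly but the real script's output still shows `??`, the regex step is not the cause.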

I have changed the code several times, following the suggestions in answers to similar questions on this site.

I have tried this option:

use utf8;  # Source is encoded using UTF-8
use open ':std', ':encoding(locale)';

This produced the following errors:

Cannot find encoding "locale" at /usr/share/perl5/core_perl/open.pm line 126.
Cannot find encoding "locale" at /usr/share/perl5/core_perl/open.pm line 134.
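The error means that no encoding named `locale` is registered, so the `:encoding(locale)` layer cannot be built. The sketch below asks the core Encode module which of a few names it can resolve. The `open` pragma's documentation also describes a separate `:locale` subpragma (`use open ':locale';`), and the CPAN module Encode::Locale registers an encoding called `locale`; which of these the original suggestion had in mind is an assumption.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Ask Encode which of these names it can resolve; "locale" is not a name
# that core Encode registers on its own.
my %resolved;
for my $name ('UTF-8', 'utf8', 'locale') {
    my $enc = find_encoding($name);
    $resolved{$name} = $enc ? $enc->name : undef;
    printf "%-6s => %s\n", $name, $enc ? $enc->name : 'not found';
}
```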

I have also tried adding the following options at the end of my command on the command line:

-CDSL -le 'print "\x{1815}"'
-CO
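A note on these switches: `-CSDL` turns on UTF-8 for STDIN/STDOUT/STDERR (`S`) and as the default layer for handles the script opens (`D`), but `L` makes all of that conditional on the locale variables (`LC_ALL`, `LC_CTYPE`, `LANG`) indicating UTF-8. `-CO` affects STDOUT only. `-le 'print "\x{1815}"'` is a separate one-line test program, not an option. Switches must also come before the script name (`perl -CSDL script.pl`); anything after the script name is passed to the script in `@ARGV`. The sketch below starts a child perl with each switch and prints `${^UNICODE}`, the numeric form of the `-C` setting. It assumes a Unix-like system for the list form of a piped `open`.

```perl
use strict;
use warnings;

# Start a child perl for each switch and have it print ${^UNICODE}.
my %flags;
for my $switch ('-CSDL', '-CO') {
    delete local $ENV{PERL_UNICODE};        # keep the environment out of it
    open(my $child, '-|', $^X, $switch, '-e', 'print ${^UNICODE}')
        or die "cannot run $^X: $!";
    $flags{$switch} = <$child>;
    close $child;
    print "$switch => $flags{$switch}\n";
}
```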

I also added the following lines to my code, without success:

binmode(STDOUT, ":utf8");
use open ":encoding(utf8)";
use open IN => ":encoding(utf8)", OUT => ":utf8";
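These lines overlap. `binmode(STDOUT, ":utf8")` covers only STDOUT, and `use open` without `:std` sets default layers only for handles opened later with `open()`, leaving STDIN and STDOUT unchanged. The sketch below uses a single pragma for all three standard handles and then lists the layers actually on STDOUT via `PerlIO::get_layers`. Using `:encoding(UTF-8)` rather than `:utf8` here is a design choice: it checks input for malformed UTF-8 instead of only marking it as UTF-8.

```perl
use strict;
use warnings;
use utf8;

# One declaration for STDIN, STDOUT and STDERR: decode input and encode
# output as UTF-8 for this file.
use open qw(:std :encoding(UTF-8));

my @layers = PerlIO::get_layers(*STDOUT);
print "STDOUT layers: @layers\n";
print "élèvàtòr ôpëràtör\n";
```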

Someone also recommended setting an environment variable, but I couldn't find out how to do that. The code they suggested was:

export PERL_UNICODE=SDL

But I don't know where to put this or how to change it.
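`export PERL_UNICODE=SDL` is a shell command (bash/zsh syntax): run it in the terminal before starting the script, and it applies to every perl started from that shell session. It can also go in the shell's startup file (for example `~/.bashrc`), or be given for a single run as `PERL_UNICODE=SDL perl script.pl`. It has the same meaning as the `-CSDL` switch. Perl reads it only when an interpreter starts, so setting it in `%ENV` from inside a running script affects only perl processes started afterwards. The sketch below demonstrates that with a child perl, assuming a Unix-like system.

```perl
use strict;
use warnings;

# This process's own setting, fixed when it started.
my $own = ${^UNICODE};

# A child perl started with PERL_UNICODE=SDL picks the variable up.
my $child_flags = do {
    local $ENV{PERL_UNICODE} = 'SDL';
    open(my $child, '-|', $^X, '-e', 'print ${^UNICODE}')
        or die "cannot run $^X: $!";
    my $v = <$child>;
    close $child;
    $v;
};
print "this process: $own\n";
print "child started with PERL_UNICODE=SDL: $child_flags\n";
```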

I hope someone can help me with this problem.

The output you showed does not appear to be UTF-16 or UCS-2 as you claim (there are too few `?`), and there is no evidence of a BOM.
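For comparison, the sketch below shows what the first word would look like as UTF-16LE with a byte-order mark. Every character takes two bytes, including plain ASCII letters such as `l`, which get a `00` byte next to them, and the data starts with the BOM bytes `FF FE`. A UTF-16 file would therefore not come out as the clean `l`, `v`, `t`, `r` runs seen in the question's output.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

# U+FEFF at the front becomes the byte-order mark FF FE in UTF-16LE.
my $utf16 = encode('UTF-16LE', "\x{FEFF}élèvàtòr");
print join(' ', map { sprintf '%02X', ord } split //, $utf16), "\n";
```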

On the other hand, the output is consistent with UTF-8. é, è, à, ò, ô, ë and ö are each encoded as two bytes in UTF-8, so a viewer that does not understand UTF-8 and shows one `?` per unrecognized byte displays each of them as `??`.
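The sketch below reproduces the question's output from the input. It encodes the text as UTF-8 (what the script writes), prints the bytes of each accented letter, and then simulates a viewer that prints `?` for every byte above 0x7F. The "viewer" is an assumption standing in for whatever tool or terminal was actually used.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

my $text  = 'élèvàtòr ôpëràtör';
my $bytes = encode('UTF-8', $text);         # what the script actually writes

# Bytes of each accented letter, e.g. U+00E9 -> C3 A9.
for my $ch (grep { ord($_) > 0x7F } split //, $text) {
    printf "U+%04X -> %s\n", ord($ch),
        join(' ', map { sprintf '%02X', ord } split //, encode('UTF-8', $ch));
}

# Simulated viewer: one "?" for every byte it does not understand.
(my $shown = $bytes) =~ s/[\x80-\xFF]/?/g;
print "$shown\n";
```

The last line comes out exactly as the `Output` comment in the question.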

Perl is doing exactly what you asked, but you are viewing UTF-8 with a tool or terminal that expects a different encoding. You need to either produce output in the encoding your tool or terminal expects, or change what it expects.
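The first option means putting a different `:encoding(...)` layer on the output. The sketch below assumes, hypothetically, a viewer that expects Windows-1252 (the question does not say which encoding the tool uses). It writes through a `:encoding(cp1252)` layer into an in-memory buffer so the resulting bytes can be inspected. In a real script the equivalent would be `use open qw(:std :encoding(cp1252));`.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical target encoding: Windows-1252 instead of UTF-8.
my $text = 'élèvàtòr ôpëràtör';
open(my $fh, '>', \my $buf) or die "cannot open in-memory file: $!";
binmode($fh, ':encoding(cp1252)');
print {$fh} $text;
close $fh;

# In cp1252 each of these letters is a single byte (é -> E9, ...).
print length($buf), " bytes: ",
      join(' ', map { sprintf '%02X', ord } split //, $buf), "\n";
```

This works only for characters that exist in the chosen encoding. UTF-8 plus a viewer that understands UTF-8 is usually the more robust fix.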

For example, you can tell a Windows console to expect UTF-8 by running `chcp 65001`.

Since you provided no information about your tool or terminal, this is as far as we can help.
