
How can I guess if a string has text or binary data in Perl?

What is the best way to find out whether a scalar value is ASCII/UTF-8 (text) or binary data in Perl? Is this code right?

if (is_utf8($scalar, 1) or ($scalar =~ m/\A [[:ascii:]]* \Z/xms)) {
     # $scalar is a text
}
else {
     # $scalar is a binary
}

Is there a better way?

is_utf8 tests whether the Perl utf8 flag is turned on or not. It's possible for a scalar to contain correctly formed UTF-8 and not have the flag turned on. I think it's also possible to deliberately turn the flag on even with malformed UTF-8, but I'm not sure.

To check whether the scalar contains UTF-8 data, you need to check the flag and, if it is not set, also try something like

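use Encode qw(decode_utf8);

eval {
    # FB_CROAK makes decode_utf8 die on malformed input; LEAVE_SRC keeps $scalar unmodified
    my $utf8 = decode_utf8($scalar, Encode::FB_CROAK | Encode::LEAVE_SRC);
};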

and then check for errors in $@.

To check whether a non-UTF-8 scalar contains non-ASCII data, your idea $scalar =~ m/\A [[:ascii:]]* \Z/xms looks ok.
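The two checks above can be combined into one helper. This is only a rough sketch: the sub name guess_kind and its return values are made up for illustration, and it assumes the scalar holds raw octets rather than an already-decoded character string. The Encode functions and flags are the ones shipped with the core Encode module.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: returns 'ascii', 'utf8', or 'binary' for a byte string.
sub guess_kind {
    my ($bytes) = @_;

    # Pure 7-bit data: every character is in the ASCII range.
    return 'ascii' if $bytes =~ /\A[[:ascii:]]*\z/;

    # Strict decode: FB_CROAK dies on malformed input,
    # LEAVE_SRC stops Encode from modifying $bytes in place.
    my $ok = eval {
        Encode::decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
        1;
    };
    return $ok ? 'utf8' : 'binary';
}

print guess_kind("hello"), "\n";             # ASCII-only input
print guess_kind("caf\xC3\xA9"), "\n";       # well-formed UTF-8 octets
print guess_kind("\xFF\xFE\x00\x01"), "\n";  # not valid UTF-8
```

Note that the ASCII test also accepts control characters such as NUL, so binary data that happens to use only bytes below 0x80 will still be reported as 'ascii'. It remains a guess, not a proof.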

The best way, clearly, is to simply keep track of this when you are reading the data. You as the programmer should already know whether you are getting text (and its encoding) or binary data. When you're reading text, you Encode::decode() it (see http://p3rl.org/UNI for details) into Perl text strings.
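As a small illustration of decoding at the point where the data comes in, here is a sketch that assumes the input is known to be UTF-8 text. The sample octets and variable names are invented for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Octets as they might arrive from a file or socket known to carry UTF-8 text.
my $octets = "na\xC3\xAFve";

# Decode once at the boundary; from here on $text is a character string.
my $text = decode('UTF-8', $octets);
print length($octets), " octets, ", length($text), " characters\n";

# Encode again only when writing the text back out.
my $out = encode('UTF-8', $text);
```

Binary data simply skips the decode step and stays a byte string, so the program never has to guess later on.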

If you really don't know beforehand, the -T and -B file tests offer a heuristic.
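Keep in mind that -T and -B operate on files and filehandles, not on strings in memory. The sketch below writes two samples to temporary files just to have something to test; the helper name write_sample and the sample data are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write data to a temporary file and return its path.
sub write_sample {
    my ($data) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    binmode $fh;
    print {$fh} $data;
    close $fh;
    return $path;
}

my $text_file   = write_sample("just some plain text\n" x 10);
my $binary_file = write_sample(join '', map { chr } 0 .. 255);

for my $path ($text_file, $binary_file) {
    # Both tests inspect only the first block or so of the file.
    my $kind = (-T $path) ? 'text' : (-B $path) ? 'binary' : 'unknown';
    print "$path: $kind\n";
}
```

An empty file passes both tests, so the result is only meaningful for files that actually contain data.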

Disregard Kinopiko's answer. In the vast majority of cases you should not need to know about the internal representation of data, and messing with the utility functions from the utf8 pragma module is the wrong approach.
