Converting a UTF-8 string to numeric values in Perl

Question

For example, given

my $str = '中國c'; # Chinese language of china

I want to print the numeric values (code points) of its characters:

20013,22283,99

Answer 1

unpack is more efficient than split plus ord, because it does not need to create a bunch of temporary one-character strings:

use utf8;

my $str = '中國c'; # Chinese language of china

my @codepoints = unpack 'U*', $str;

print join(',', @codepoints) . "\n"; # prints 20013,22283,99

A quick benchmark shows it is about 3 times faster than split+ord:

use utf8;
use Benchmark 'cmpthese';

my $str = '中國中國中國中國中國中國中國中國中國中國中國中國中國中國c';

cmpthese(0, {
  'unpack'     => sub { my @codepoints = unpack 'U*', $str; },
  'split-map'  => sub { my @codepoints = map { ord } split //, $str },
  'split-for'  => sub { my @cp; for my $c (split(//, $str)) { push @cp, ord($c) } },
  'split-for2' => sub { my $cp; for my $c (split(//, $str)) { $cp = ord($c) } },
});

Results:

               Rate  split-map  split-for split-for2     unpack
split-map   85423/s         --        -7%       -32%       -67%
split-for   91950/s         8%         --       -27%       -64%
split-for2 125550/s        47%        37%         --       -51%
unpack     256941/s       201%       179%       105%         --

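The difference is less pronounced with shorter strings, but unpack is still about twice as fast. (split-for2 is a bit faster than the other split variants because it does not build a list of code points.)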

Answer 2

See perldoc -f ord:

foreach my $c (split(//, $str))
{
    print ord($c), "\n";
}

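Or compressed into a single line: my @chars = map { ord } split //, $str;

Dumped with Data::Dumper, @chars then holds one numeric value per character: 20013, 22283, 99 when $str has been decoded as UTF-8 (for example under use utf8;).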

Answer 3

For UTF-8 in your source code to be recognized as such, you must say use utf8; beforehand:

$ perl
use utf8;
my $str = '中國c'; # Chinese language of china
foreach my $c (split(//, $str))
{
    print ord($c), "\n";
}
__END__
20013
22283
99

Or more concisely,

print join ',', map ord, split //, $str;
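The same requirement applies when the string does not come from the source file: ord only returns code points once the bytes have been decoded into characters. The following is a minimal sketch of that difference, assuming the input arrives as raw UTF-8 bytes and using the core Encode module; the variable names are illustrative and not taken from the answers above.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes for '中國c', written as escapes so this file needs no "use utf8".
my $bytes = "\xE4\xB8\xAD\xE5\x9C\x8B\x63";

# Without decoding, ord sees one value per byte.
my @byte_values = map { ord } split //, $bytes;

# After decoding, ord sees one value per character (code point).
my $chars      = decode('UTF-8', $bytes);
my @codepoints = map { ord } split //, $chars;

print join(',', @byte_values), "\n";   # prints 228,184,173,229,156,139,99
print join(',', @codepoints), "\n";    # prints 20013,22283,99
```

If the numbers printed look like the first line rather than the second, the string was most likely never decoded.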

Answer 4

http://www.perl.com/pub/2012/04/perlunicook-standard-preamble.html

#!/usr/bin/env perl

use utf8;      # so literals and identifiers can be in UTF-8
use v5.12;     # or later to get "unicode_strings" feature
use strict;    # quote strings, declare variables
use warnings;  # on by default
use warnings  qw(FATAL utf8);    # fatalize encoding glitches
use open      qw(:std :utf8);    # undeclared streams in UTF-8
# use charnames qw(:full :short);  # unneeded in v5.16

# http://perldoc.perl.org/functions/sprintf.html
# vector flag
# This flag tells Perl to interpret the supplied string as a vector of integers, one for each character in the string.

my $str = '中國c';

printf "%*vd\n", ",", $str;
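As a small sketch of how the vector flag behaves, the snippet below compares the default separator (a dot) with a custom one passed through the * flag, and also formats the values in hexadecimal. The string is written with \x{...} escapes so the snippet does not depend on the file's encoding; the variable names are only illustrative.

```perl
use strict;
use warnings;

# '中國c' written as code point escapes, so no "use utf8" is needed here.
my $str = "\x{4E2D}\x{570B}c";

# Default join string of the vector flag is a dot.
my $dotted = sprintf '%vd', $str;

# The * flag takes the join string from the argument list.
my $comma = sprintf '%*vd', ',', $str;

# Any integer conversion works per character, e.g. uppercase hex.
my $hex = sprintf '%*vX', ' ', $str;

print "$dotted\n";   # prints 20013.22283.99
print "$comma\n";    # prints 20013,22283,99
print "$hex\n";      # prints 4E2D 570B 63
```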


4 answers

Answer 1: score 13, accepted, 2010-08-22 21:59:48

Answer 2: score 3, 2010-08-22 17:35:09

Answer 3: score 3, 2010-08-22 18:20:33

Answer 4: score 2, 2014-01-10 11:38:59
