
In Perl, what is the sane way for converting a string into a list of its characters?

I have been wondering whether there is a nicer but still concise way of splitting a string into its characters.

@characters = split //, $string

is not that hard to read, but somehow the use of a regular expression looks like overkill to me.

I have come up with this:

@characters = map { substr $string, $_, 1 } 0 .. length($string) - 1

but I find it uglier and less readable. What is your preferred way of splitting that string into its characters?

Various examples and speed comparisons.

I thought it might be a good idea to measure how fast some of the ways of splitting a string into its characters are.

I ran the test against several versions of Perl that I happen to have on my computer.

test.pl

use 5.010;
use Benchmark qw(:all) ;
my %bench = (
   'split' => sub{
     state $string = 'x' x 1000;
     my @chars = split //, $string;
     \@chars;
   },
   'split-string' => sub{
     state $string = 'x' x 1000;
     my @chars = split '', $string;
     \@chars;
   },
   'split-capture' => sub{
     state $string = 'x' x 1000;
     my @chars = split /(.)/, $string;
     \@chars;
   },
   'unpack' => sub{
     state $string = 'x' x 1000;
     my @chars = unpack( '(a)*', $string );
     \@chars;
   },
   'match' => sub{
     state $string = 'x' x 1000;
     my @chars = $string =~ /./gs;
     \@chars;
   },
   'match-capture' => sub{
     state $string = 'x' x 1000;
     my @chars = $string =~ /(.)/gs;
     \@chars;
   },
   'map-substr' => sub{
     state $string = 'x' x 1000;
     my @chars = map { substr $string, $_, 1 } 0 .. length($string) - 1;
     \@chars;
   },
);
# set the initial state of $string
$_->() for values %bench;
cmpthese( -10, \%bench );

Shell loop used to run test.pl under each Perl:

for perl in /usr/bin/perl /opt/perl-5.10.1/bin/perl /opt/perl-5.11.2/bin/perl;
do
  $perl -v | perl -nlE'if( /(v5\.\d+\.\d+)/ ){
    say "## Perl $1";
    say "<pre>";
    last;
  }';
  $perl test.pl;
  echo -e '</pre>\n';
done

Perl v5.10.0

               Rate split-capture match-capture map-substr match unpack split split-string
split-capture 296/s            --          -20%       -20%  -23%   -58%  -63%         -63%
match-capture 368/s           24%            --        -0%   -4%   -48%  -54%         -54%
map-substr    370/s           25%            0%         --   -3%   -48%  -53%         -54%
match         382/s           29%            4%         3%    --   -46%  -52%         -52%
unpack        709/s          140%           93%        92%   86%     --  -11%         -11%
split         793/s          168%          115%       114%  107%    12%    --          -0%
split-string  795/s          169%          116%       115%  108%    12%    0%           --

Perl v5.10.1

               Rate split-capture map-substr match-capture match unpack split split-string
split-capture 301/s            --       -31%          -41%  -47%   -60%  -65%         -66%
map-substr    435/s           45%         --          -14%  -23%   -42%  -50%         -50%
match-capture 506/s           68%        16%            --  -10%   -32%  -42%         -42%
match         565/s           88%        30%           12%    --   -24%  -35%         -35%
unpack        743/s          147%        71%           47%   32%     --  -15%         -15%
split         869/s          189%       100%           72%   54%    17%    --          -1%
split-string  875/s          191%       101%           73%   55%    18%    1%           --

Perl v5.11.2

               Rate split-capture match-capture match map-substr unpack split-string split
split-capture 300/s            --          -28%  -32%       -38%   -59%         -63%  -63%
match-capture 420/s           40%            --   -5%       -13%   -42%         -48%  -49%
match         441/s           47%            5%    --        -9%   -39%         -46%  -46%
map-substr    482/s           60%           15%    9%         --   -34%         -41%  -41%
unpack        727/s          142%           73%   65%        51%     --         -10%  -11%
split-string  811/s          170%           93%   84%        68%    12%           --   -1%
split         816/s          171%           94%   85%        69%    12%           1%    --

As you can see, split is the quickest, because splitting on an empty pattern is a special case in the code for split.

split-capture is the slowest, probably because it has to set $1, along with several other match variables.

So I would recommend going with plain old split //, ..., or the roughly equivalent split '', ....
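
A quick way to check that the benchmarked variants really are interchangeable is to compare their output on a short sample. This is only a sketch and not part of the original benchmark; the sample string and the %variants hash are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input; it contains a newline on purpose.
my $string = "ab\ncd";

# The same expressions the benchmark times, each collected into an array ref.
my %variants = (
    'split'        => [ split //, $string ],
    'split-string' => [ split '', $string ],
    'unpack'       => [ unpack '(a)*', $string ],
    'match'        => [ $string =~ /./gs ],
    'map-substr'   => [ map { substr $string, $_, 1 } 0 .. length($string) - 1 ],
);

# Compare each result, as a list of character codes, with plain split.
my $expected = join ',', map { ord($_) } split //, $string;
for my $name (sort keys %variants) {
    my $got = join ',', map { ord($_) } @{ $variants{$name} };
    print "$name: ", ( $got eq $expected ? "same" : "DIFFERENT" ), "\n";
}

# split-capture is the odd one out: it returns the captured delimiters
# together with the empty fields between them, so the list is longer.
my @captured = split /(.)/, "abc";
print scalar(@captured), " elements from split /(.)/ on a 3-character string\n";
```

Note that split-capture does not return the same list as the others, so a caller would still have to filter out the empty fields, which is one more reason to prefer the plain form.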

Why would using a regular expression be "overkill"? Many people worry that regexes in Perl are overkill because they assume that running them involves a highly complex and slow regex algorithm. That's not always true: the implementation is highly optimized and many simple cases are treated specially, so what looks like a regex may actually perform as well as a simple substring search. I wouldn't be surprised at all if this type of split is optimized as well. split is faster than your map in some tests I ran. unpack appears to be slightly faster than split.

I recommend split because it is the "idiomatic" way. You'll find it in perldoc and in many books, and any good Perl programmer should know it. (If you are not sure your audience will understand it, you can always add a comment to the code, as someone suggested.)

OTOH, if regexes are "overkill" only because the syntax is ugly, then it's too subjective for me to say anything. ;-)

It doesn't get much clearer than using the split function to split a string. I suppose you could argue that the null pattern is unintuitive, though I find it clear enough. If you want a "clean" alternative, wrap it in a sub:

my @characters = chars($string);
sub chars { split //, $_[0] }

For something less readable but more concise (and still with regex overkill):

@characters = $string =~ /./g;

(I learned this idiom from playing code golf.)

You're right. The standard way to do it is split //, $string. To make the code more readable, you can create a simple function:

sub get_characters {
    my ($string) = @_;
    return ( split //, $string );
}

@characters = get_characters($string);

I prefer using the split technique. It is well known and documented.

Yet another way...

@characters = $string =~ /./gs;
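
The /s modifier is what separates this from the /./g version earlier in the thread, and it matters when the string can contain newlines: without /s, the dot does not match "\n", so those characters are silently dropped. A small sketch, using a made-up sample string:

```perl
use strict;
use warnings;

# Hypothetical sample containing a newline.
my $string = "a\nb";

my @without_s = $string =~ /./g;     # "." skips the newline
my @with_s    = $string =~ /./gs;    # "." matches every character

printf "without /s: %d characters\n", scalar @without_s;
printf "with /s:    %d characters\n", scalar @with_s;
```

For strings that are known never to contain newlines the two forms behave the same.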

Use split with a null pattern to break up the string into individual characters:

@characters = split //, $string;

If you just want the char codes, use unpack:

@values = unpack("C*", $string);

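Note that the C template yields octet values, so this only gives the expected codes for characters in the 0..255 range; for wider characters, look at the U or W templates instead. use utf8 only matters if the string comes from non-ASCII literals in your source file; it does not change what unpack does. You can also combine unpack and chr to split the string into individual characters (TMTOWTDI):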

@characters = map chr, unpack("C*", $string);
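
If the string may hold characters above 255, one option is to swap the C template for W. The sketch below assumes Perl 5.10 or later (where W is available) and uses a made-up sample string; it is an illustration of the idea, not the answerer's own code.

```perl
use strict;
use warnings;

# Hypothetical sample: U+263A (a wide character) followed by "a".
my $string = "\x{263A}a";

my @values     = unpack 'W*', $string;       # character codes, not limited to 0..255
my @characters = map { chr($_) } @values;    # back to one-character strings

printf "U+%04X\n", $_ for @values;
print scalar(@characters), " characters\n";
```

For plain character splitting, split //, $string already handles such strings, so this is mostly useful when the numeric codes themselves are needed.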
