In Perl, what is the sane way to convert a string into a list of its characters?
I have been wondering whether there is a nicer but still concise way to split a string into its characters.
@characters = split //, $string
is not that hard to read, but somehow the use of a regular expression looks like overkill to me.
I have come up with this:
@characters = map { substr $string, $_, 1 } 0 .. length($string) - 1
but I find it uglier and less readable. What is your preferred way of splitting a string into its characters?
I thought it might be a good idea to see how fast some of the ways to split a string into its characters are.
I ran the test against several versions of Perl that I happen to have on my computer.
use 5.010;
use Benchmark qw(:all);
my %bench = (
    'split' => sub {
        state $string = 'x' x 1000;
        my @chars = split //, $string;
        \@chars;
    },
    'split-string' => sub {
        state $string = 'x' x 1000;
        my @chars = split '', $string;
        \@chars;
    },
    'split-capture' => sub {
        state $string = 'x' x 1000;
        my @chars = split /(.)/, $string;
        \@chars;
    },
    'unpack' => sub {
        state $string = 'x' x 1000;
        my @chars = unpack( '(a)*', $string );
        \@chars;
    },
    'match' => sub {
        state $string = 'x' x 1000;
        my @chars = $string =~ /./gs;
        \@chars;
    },
    'match-capture' => sub {
        state $string = 'x' x 1000;
        my @chars = $string =~ /(.)/gs;
        \@chars;
    },
    'map-substr' => sub {
        state $string = 'x' x 1000;
        my @chars = map { substr $string, $_, 1 } 0 .. length($string) - 1;
        \@chars;
    },
);
# set the initial state of $string
$_->() for values %bench;
cmpthese( -10, \%bench );
for perl in /usr/bin/perl /opt/perl-5.10.1/bin/perl /opt/perl-5.11.2/bin/perl;
do
    $perl -v | perl -nlE'if( /(v5\.\d+\.\d+)/ ){
        say "## Perl $1";
        say "<pre>";
        last;
    }';
    $perl test.pl;
    echo -e '</pre>\n';
done
               Rate split-capture match-capture map-substr match unpack split split-string
split-capture 296/s            --          -20%       -20%  -23%   -58%  -63%         -63%
match-capture 368/s           24%            --        -0%   -4%   -48%  -54%         -54%
map-substr    370/s           25%            0%         --   -3%   -48%  -53%         -54%
match         382/s           29%            4%         3%    --   -46%  -52%         -52%
unpack        709/s          140%           93%        92%   86%     --  -11%         -11%
split         793/s          168%          115%       114%  107%    12%    --          -0%
split-string  795/s          169%          116%       115%  108%    12%    0%           --

               Rate split-capture map-substr match-capture match unpack split split-string
split-capture 301/s            --       -31%          -41%  -47%   -60%  -65%         -66%
map-substr    435/s           45%         --          -14%  -23%   -42%  -50%         -50%
match-capture 506/s           68%        16%            --  -10%   -32%  -42%         -42%
match         565/s           88%        30%           12%    --   -24%  -35%         -35%
unpack        743/s          147%        71%           47%   32%     --  -15%         -15%
split         869/s          189%       100%           72%   54%    17%    --          -1%
split-string  875/s          191%       101%           73%   55%    18%    1%           --

               Rate split-capture match-capture match map-substr unpack split-string split
split-capture 300/s            --          -28%  -32%       -38%   -59%         -63%  -63%
match-capture 420/s           40%            --   -5%       -13%   -42%         -48%  -49%
match         441/s           47%            5%    --        -9%   -39%         -46%  -46%
map-substr    482/s           60%           15%    9%         --   -34%         -41%  -41%
unpack        727/s          142%           73%   65%        51%     --         -10%  -11%
split-string  811/s          170%           93%   84%        68%    12%           --   -1%
split         816/s          171%           94%   85%        69%    12%           1%    --
As you can see, split is the quickest, because splitting on an empty pattern is a special case in the code for split.
split-capture is the slowest, probably because it has to set $1, along with several other match variables.
So I would recommend going with plain old split //, ..., or the roughly equivalent split '', ....
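Before comparing speeds, it can be worth checking that the variants actually return the same list. The following is only a minimal sketch of such a check, not part of the benchmark above; the short test string "abc" and the %result hash are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical short input, chosen so the output is easy to read.
my $string = "abc";

# One entry per approach from the benchmark (match-capture omitted for brevity).
my %result = (
    'split'         => [ split //, $string ],
    'split-string'  => [ split '', $string ],
    'split-capture' => [ split /(.)/, $string ],
    'unpack'        => [ unpack '(a)*', $string ],
    'match'         => [ $string =~ /./gs ],
    'map-substr'    => [ map { substr $string, $_, 1 } 0 .. length($string) - 1 ],
);

# Print each list with its elements quoted, so empty strings are visible.
for my $name ( sort keys %result ) {
    printf "%-13s %s\n", $name, join ',', map { "'$_'" } @{ $result{$name} };
}
```

In this sketch split-capture should stand out: because the pattern captures the separator, split returns the (empty) fields between separators as well as the captured characters, so its list is longer than the others. That is one more reason to prefer the plain empty pattern.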
Why would using a regular expression be "overkill"? Many worry that regexes in Perl are overkill because they think that running them involves a highly complex and slow regex algorithm. That's not always true: the implementation is highly optimized and many simple cases are treated specially, so what looks like a regex may actually perform as well as a simple substring search. I wouldn't be surprised at all if this type of split is optimized as well. split is faster than your map in some tests I ran. unpack appears to be slightly faster than split.
I recommend split because it is the "idiomatic" way. You'll find it in perldoc and in many books, and any good Perl programmer should know it. (If you are not sure your audience will understand it, you can always add a comment to the code, as someone suggested.)
OTOH, if regexes are "overkill" only because the syntax is ugly, then it's too subjective for me to say anything. ;-)
It doesn't get much clearer than using the split function to split a string. I suppose you could argue that the null pattern is unintuitive, though I find it clear enough. If you want a "clean" alternative, wrap it in a sub:
my @characters = chars($string);
sub chars { split //, $_[0] }
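One thing to keep in mind with such a wrapper is calling context. The sketch below assumes a reasonably recent Perl, where split in scalar context simply returns the number of fields; the sample string "hello" and the variable $count are made up for illustration.

```perl
use strict;
use warnings;

# Same one-line wrapper as above.
sub chars { split //, $_[0] }

my $string = "hello";

my @characters = chars($string);    # list context: the individual characters
my $count      = chars($string);    # scalar context: the number of fields

print join( '-', @characters ), "\n";
print "$count\n";
```

If the list is always what you want, assign the result to an array, or call the sub in list context explicitly.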
For a less readable but more concise version (still with regex overkill):
@characters = $string =~ /./g;
(I learned this idiom from playing code golf.)
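One caveat with the match idiom: without the /s modifier, "." does not match a newline, so newlines are silently dropped from the result. A small sketch, with a made-up two-line string, comparing it against split:

```perl
use strict;
use warnings;

# Hypothetical input containing a newline in the middle.
my $string = "ab\ncd";

my @without_s = $string =~ /./g;     # "." skips the newline
my @with_s    = $string =~ /./gs;    # /s lets "." match the newline too
my @via_split = split //, $string;   # split keeps every character

printf "without /s: %d characters\n", scalar @without_s;
printf "with /s:    %d characters\n", scalar @with_s;
printf "split //:   %d characters\n", scalar @via_split;
```

So if the string may contain newlines, /./gs is the form that behaves like split //.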
You're right. The standard way to do it is split //, $string. To make the code more readable, you can create a simple function:
sub get_characters {
    my ($string) = @_;
    return ( split //, $string );
}
@characters = get_characters($string);
Use split with a null pattern to break up the string into individual characters:
@characters = split //, $string;
If you just want the char codes, use unpack:
@values = unpack("C*", $string);
Note that the C format treats each character as an 8-bit value, so this is best suited to byte strings. The use utf8 pragma only tells Perl that the source file itself is UTF-8 encoded; it does not change what unpack does. You can also combine unpack and chr to split the string into individual characters (TMTOWTDI):
@characters = map chr, unpack("C*", $string);
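If the string may contain characters above 255, a variation that avoids the C format is to use the core functions ord and chr directly. This is only a sketch under that assumption; the sample string and the variable names are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: "\x{e9}" is above 127 and "\x{263A}" is above 255.
my $string = "caf\x{e9} \x{263A}";

my @characters = split //, $string;           # one element per character
my @codes      = map { ord } @characters;     # code point of each character
my @rebuilt    = map { chr } @codes;          # back from code points to characters

print join( ' ', @codes ), "\n";
print join( '', @rebuilt ) eq $string ? "round trip ok\n" : "round trip failed\n";
```

Only the numeric codes are printed here, so no output encoding layer is needed for the example.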