

Splitting a string which contains numerical digits in Perl

I want to split a string (which contains numerical digits). In the example below, I want to split the string at $k and $k1.

my @array1=("0","23","1","4","65","7");
$k=1;$k1=0;
my $j=join("",@array1);
my @ar=split(/($k|$k1)/,$j);
print join(";",@ar),"\n\n";

The output is ;0;23;1;4657

In the above output, an extra semicolon ";" is printed.

The expected output is 0;23;1;4657

When I try the above code with the example below, the output is correct (0;5;123;4;6); the extra semicolon is not printed there.

my @array1=("0","5","1234","6");
$k=5;$k1=4;

I am not sure why the first example prints the extra semicolon ";".

Can someone help me with this?

The difference is that in the first example the delimiter matches the very first character of the joined string (the "0"), so you get an empty value at the beginning. Hence the extra ; before the 0 (and after the ""). You'll similarly find ;; when splitting on two adjacent delimiter characters.
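A minimal sketch of both cases described above; the second input string, 231045, is made up for illustration so that a "1" and a "0" sit next to each other:

```perl
use strict;
use warnings;

# "0" is the first character, so split yields an empty leading field.
my @leading = split /(1|0)/, '02314657';
print join(';', @leading), "\n";     # ;0;23;1;4657

# "1" and "0" are adjacent: the empty field between them gives ";;".
my @adjacent = split /(1|0)/, '231045';
print join(';', @adjacent), "\n";    # 23;1;;0;45
```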

So the absolute simplest fix would be to use grep to remove the empty strings:

my @ar=split(/($k|$k1)/,$j);
@ar = grep /./, @ar;

This removes the empty strings from @ar.
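Put together with the question's first data set, the grep fix might look like this (a sketch with strict and warnings added; the variable names are the question's own):

```perl
use strict;
use warnings;

my @array1 = ("0", "23", "1", "4", "65", "7");
my ($k, $k1) = (1, 0);

my $j  = join '', @array1;          # "02314657"
my @ar = split /($k|$k1)/, $j;      # ("", "0", "23", "1", "4657")
@ar    = grep /./, @ar;             # drop the empty leading field
print join(';', @ar), "\n";         # 0;23;1;4657
```

For digit strings like these, grep { length } @ar would be an equivalent filter.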

In the bigger picture, you might want to look at why you're joining strings just to split them back apart. You're also splitting on a digit that could just as well appear inside another element, e.g. if $k=1 and @array1 = (11, 23, 1, 4).

This is a highly contrived example with many issues (e.g. $k and $k1 should be declared with my, and you should use strict, etc.), and it's probably going to do something you don't want.
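To make the (11, 23, 1, 4) case concrete: once the array is joined, the two "1" characters of the element 11 look like two separate delimiters, so that element gets torn apart even after the grep fix. A small sketch using that hypothetical input:

```perl
use strict;
use warnings;

# Hypothetical input: "11" is one element, but after joining it
# contributes two "1" delimiters.
my @array1 = (11, 23, 1, 4);
my ($k, $k1) = (1, 0);

my $j  = join '', @array1;                   # "112314"
my @ar = grep /./, split /($k|$k1)/, $j;
print join(';', @ar), "\n";                  # 1;1;23;1;4
```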

The bottom line, and the reason you see the leading semicolon, is that if you split on a delimiter that matches at the beginning of the string, split returns an empty leading field for it.

print join ';', split /0/, '0123';
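Only a leading separator shows up as an empty field because split treats the two ends differently: per the split documentation, a positive-width match at the start produces an empty leading field, while empty trailing fields are stripped unless LIMIT is negative. A small sketch with illustrative inputs:

```perl
use strict;
use warnings;

my @lead  = split /0/, '0123';        # ("", "123")  leading field kept
my @trail = split /3/, '0123';        # ("012")      trailing field dropped
my @keep  = split /3/, '0123', -1;    # ("012", "")  negative LIMIT keeps it

print join(';', @lead),  "\n";        # ;123
print join(';', @trail), "\n";        # 012
print join(';', @keep),  "\n";        # 012;
```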

There's some interesting behaviour in this code which I was not aware of and which hasn't been brought out in the other answers. Normally, when you split on a regular expression, the characters you split on are omitted from the result. However, if the regex contains capturing parentheses, the captured material is kept in the result.

Script

#!/usr/bin/env perl

use strict;
use warnings;

my @array1 = ("0", "23", "1", "4", "65", "7");
my $j = join("", @array1);
my $k;
my $k1;
my @ar;
print "Join [$j]\n";

$k = 1;
$k1 = 0;
printf "%-25s", "Version 1 /($k|$k1)/:";
@ar = split(/($k|$k1)/, $j);
print "[", join(";", @ar), "]\n";

printf "%-25s", "Version 2 /($k|$k1)/:";
$k = "1";
$k1 = "0";
@ar = split(/($k|$k1)/, $j);
print "[", join(";", @ar), "]\n";

printf "%-25s", "Version 3 /[01]/:";
@ar = split(/[01]/, $j);
print "[", join(";", @ar), "]\n";

printf "%-25s", "Version 4 /(0|1)/:";
@ar = split(/(0|1)/, $j);
print "[", join(";", @ar), "]\n";

printf "%-25s", "Version 5 /0|1/:";
@ar = split(/0|1/, $j);
print "[", join(";", @ar), "]\n";

printf "%-25s", "Version 6 /([46])/:";
@ar = split(/([46])/, $j);
print "[", join(";", @ar), "]\n";

printf "%-25s", "Version 7 /(?:[46])/:";
@ar = split(/(?:[46])/, $j);
print "[", join(";", @ar), "]\n";

Output

Join [02314657]
Version 1 /(1|0)/:       [;0;23;1;4657]
Version 2 /(1|0)/:       [;0;23;1;4657]
Version 3 /[01]/:        [;23;4657]
Version 4 /(0|1)/:       [;0;23;1;4657]
Version 5 /0|1/:         [;23;4657]
Version 6 /([46])/:      [0231;4;;6;57]
Version 7 /(?:[46])/:    [0231;;57]

As you can see, when capturing parentheses are present in the regex the string is split on, the (captured) splitting characters are preserved. When the parentheses are missing (Versions 3 and 5) or explicitly non-capturing (Version 7), the splitting characters are not preserved.

And, if you read the manual carefully, the split description does include this paragraph:

If the PATTERN contains capturing groups, then for each separator, an additional field is produced for each substring captured by a group (in the order in which the groups are specified, as per backreferences); if any group does not match, then it captures the undef value instead of a substring. Also, note that any such additional field is produced whenever there is a separator (that is, whenever a split occurs), and such an additional field does not count towards the LIMIT.

Followed by some examples.

Tested with Perl 5.16.0 on Mac OS X 10.7.5.
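To see the undef case and the LIMIT rule from the quoted paragraph in action, here is a small sketch with two alternative capturing groups (the input string is illustrative, in the style of the manual's own examples):

```perl
use strict;
use warnings;

# On each separator only one of the two groups matches,
# so the other one contributes an undef field.
my @all = split /(-)|(,)/, '1-10,20';
# ("1", "-", undef, "10", undef, ",", "20")

# LIMIT 2 allows only one split; the captured extras ("-" and undef)
# do not count towards the limit.
my @two = split /(-)|(,)/, '1-10,20', 2;
# ("1", "-", undef, "10,20")

# Print undef explicitly, since join would warn about it.
print join(';', map { defined $_ ? $_ : 'undef' } @all), "\n";
print join(';', map { defined $_ ? $_ : 'undef' } @two), "\n";
```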

One option is to use a regex match instead of split. This works for both data sets you've shown:

use strict;
use warnings;

my @array1 = ( "0", "23", "1", "4", "65", "7" );
my $k      = 1;
my $k1     = 0;

my $j      = join( '', @array1 );
my @ar = $j =~ /([$k$k1]|[^$k$k1]+)/g;
print join( ";", @ar );

Output:

0;23;1;4657
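The character class [$k$k1] treats $k and $k1 as sets of single characters, so it only expresses the intent while both are single digits (for example, $k = 23 and $k1 = 0 would produce the class [230]). If longer delimiters are possible, a sketch along the same lines could match each delimiter as a literal string instead; tokenize is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: splits $str into delimiter tokens and the runs
# between them. Delimiters are matched as literal strings (quotemeta),
# longest first, so "10" would win over "1" if both were given.
sub tokenize {
    my ($str, @delims) = @_;
    my $d = join '|',
        map { quotemeta($_) } sort { length($b) <=> length($a) } @delims;
    # Either one delimiter, or a run of characters not starting a delimiter.
    return $str =~ /((?:$d)|(?:(?!$d).)+)/g;
}

print join(';', tokenize('02314657', 1, 0)), "\n";      # 0;23;1;4657
print join(';', tokenize('0512346', 5, 4)), "\n";       # 0;5;123;4;6
print join(';', tokenize('02314657', '23', 0)), "\n";   # 0;23;14657
```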

Perl provides a very powerful regex substitution construct which, in this case, removes the need for split-fu and join-fu:

$string =~ s{(?:$k|$k1)\K}{;}g ;
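Note that this substitution inserts ";" only after each delimiter, so for the question's data it yields 0;231;4657 rather than the expected 0;23;1;4657. A sketch of a lookaround variant that also inserts ";" before each delimiter (except at the very start), assuming $string holds the joined digits and that $k and $k1 are single characters:

```perl
use strict;
use warnings;

my ($k, $k1) = (1, 0);           # assumed single-character delimiters
my $string = '02314657';         # the joined array from the question

# The \K version from the answer: ";" goes only AFTER each delimiter.
(my $after = $string) =~ s{(?:$k|$k1)\K}{;}g;
print "$after\n";                # 0;231;4657

# Variant: insert ";" at every position with a character on both sides
# where either the next or the previous character is a delimiter.
(my $both = $string) =~ s{(?<=.)(?=[$k$k1])|(?<=[$k$k1])(?=.)}{;}g;
print "$both\n";                 # 0;23;1;4657
```

Unlike split, this variant puts a single ";" between two adjacent delimiters instead of ";;".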
