
Split string (or regex match) at position/index of nth character in Perl?

There is a similarly worded question, but I think this one is slightly different.

Basically, say I have this string:

"aa{bb{dccd"

Here I would like to split the string at the last brace { and have the parts returned as an array. I can easily find the position (0-based index) of this character using rindex:

perl -e '
$aa="aa{bb{dccd" ;
$ri = rindex($aa, "{") ;
print "$ri\n"; '

5

... and given that I'm not a Perl coder, the first thing I think of is something like $str = split($aa, 3). Unfortunately, that is not how split works: it takes a regex as its first argument (what to match on) and the string as its second, and it doesn't accept an integer position index as an argument.

I found posts like "Perl Guru Forums: Perl Programming Help: Intermediate: split or splice string on char count?", which recommend using substr in a similar context; however, I'd have to write two substr calls to populate the list as per the example above, so I'd rather hear about alternatives to substr.

Basically, if the problem of matching the position of the N-th character can be expressed as a regex match, then split could work just as well, so that would be my primary question. However, I'd also be interested in hearing if there are Perl built-in functions that could accept a list/array of integers specifying character positions, and return an array containing the split sections.

EDIT:

To summarize the above: I'd like to have the character indexes, because I'd like to print them out for debugging, and at the same time use them for splitting a string into an array, but without using substr.

EDIT2: I just realized that I left something out of the OP: in the problem that I'm working on, I first have to retrieve the character indexes (by rindex or otherwise); then I have to do calculations on them (so they may increase or decrease), and only then am I supposed to split the string (based on the new index values). It may be that my original example was too simple and didn't express this focus on indexes/character positions very well (not to mention that my first thought of split implies character indexes anyway, but I really cannot remember which programming language that came from :) )

my ($pre, $post) = split /\{(?!.*\{)/s, $s;

or

my ($pre, $post) = $s =~ /^(.*)\{(.*)/s;

The second is probably better.

If you need the index of the {, use length($pre). (With the second solution, you could also use $-[2] - 1. See @- and @+ in perlvar.)
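As a quick check of the index remark above, here is a minimal sketch using the sample string from the question. The variable names $idx_from_length and $idx_from_offset are made up for illustration; the three printed values should agree.

```perl
use strict;
use warnings;

my $s = "aa{bb{dccd";

# Greedy (.*) grabs as much as it can, so the literal \{ ends up on the last brace.
my ($pre, $post) = $s =~ /^(.*)\{(.*)/s;

my $idx_from_length = length $pre;   # number of characters before the brace
my $idx_from_offset = $-[2] - 1;     # group 2 starts one character past the brace

print "pre=$pre post=$post\n";
print "index via length:  $idx_from_length\n";
print "index via offsets: $idx_from_offset\n";
print "index via rindex:  ", rindex($s, "{"), "\n";
```

Because $-[2] refers to the last successful match, it has to be read before any other regex runs.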

You wrote:

I'd also be interested in hearing if there are Perl built-in functions that could accept a list/array of integers specifying character positions, and return an array containing the split sections.

To create a function that takes a list of offsets and produces a list of substrings with those split positions, convert the offsets to lengths and pass these as an argument to unpack.

There's a &cut2fmt function in Chapter 1 of the Perl Cookbook that does this very thing. Here is an excerpt, reproduced here by kind permission of the author:

Sometimes you prefer to think of your data as being cut up at specific columns. For example, you might want to place cuts right before positions 8, 14, 20, 26, and 30. Those are the column numbers where each field begins. Although you could calculate that the proper unpack format is "A7 A6 A6 A6 A4 A*", this is too much mental strain for the virtuously lazy Perl programmer. Let Perl figure it out for you. Use the cut2fmt function below:

sub cut2fmt {
      my(@positions) = @_;
      my $template   = '';
      my $lastpos    = 1;
      foreach $place (@positions) {
          $template .= "A" . ($place - $lastpos) . " ";
          $lastpos   = $place;
      }
      $template .= "A*";
      return $template;
  }

  $fmt = cut2fmt(8, 14, 20, 26, 30);
  print "$fmt\n";

  A7 A6 A6 A6 A4 A*

So you would use it like this:

$fmt = cut2fmt(8, 14, 20, 26, 30);
@list = unpack($fmt, $string);

or directly as

@list = unpack(cut2fmt(8, 14, 20, 26, 30), $string);

I believe this is what you were asking for.
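One detail to watch: cut2fmt works with 1-based column numbers ($lastpos starts at 1), whereas index and rindex return 0-based offsets. The following is a minimal sketch of the same idea adapted to 0-based offsets. The helper name split_at_offsets is hypothetical (it is not from the Perl Cookbook), and it uses the "a" template letter instead of "A" on the assumption that trailing whitespace in each piece should be preserved.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the Perl Cookbook): split $string at the given
# 0-based offsets, such as those returned by index/rindex.
sub split_at_offsets {
    my ($string, @offsets) = @_;
    my $template = '';
    my $last     = 0;                        # 0-based, unlike cut2fmt's 1
    for my $offset (sort { $a <=> $b } @offsets) {
        $template .= 'a' . ($offset - $last) . ' ';
        $last = $offset;
    }
    $template .= 'a*';                       # whatever is left over
    return unpack $template, $string;
}

my $str    = "aa{bb{dccd";
my $i      = rindex $str, '{';               # 5
my @pieces = split_at_offsets($str, 2, $i);  # cut before each brace

print join('; ', @pieces), "\n";             # aa; {bb; {dccd
```

Since the offsets are plain integers, they can be printed for debugging or adjusted arithmetically before being passed in.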

Here are some ways:

split /.*\K{/, $str;
split /{(?!.*{)/, $str;
$str =~ /(.*){(.*)/;

Use /regex/s if the string can span multiple lines.
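To see why /s matters, here is a small sketch with a two-line string. The sample string is made up for illustration, and the brace is escaped only to keep the pattern unambiguous.

```perl
use strict;
use warnings;

my $str = "aa{bb\ncc{dd";    # the last brace sits on the second line

# Without /s, "." does not match a newline, so the greedy .* stays on the
# first line and the match settles on the last brace of that line.
my @without = $str =~ /(.*)\{(.*)/;

# With /s, "." also matches the newline, so .* can run to the end of the
# string and back off to the last brace overall.
my @with = $str =~ /(.*)\{(.*)/s;

print "without /s: [$without[0]] [$without[1]]\n";
print "with /s:    [$with[0]] [$with[1]]\n";
```

Without the modifier the pieces are "aa" and "bb"; with it they are "aa{bb\ncc" and "dd".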

The way to do this using rindex is to employ substr to extract the two parts of the string according to the position of the {.

Note that the code below includes the { in the suffix part. To exclude it, you would use $i + 1 in the second substr call.

my $str = "aa{bb{dccd";

my $i = rindex $str, '{';
my $pref = substr $str, 0, $i;
my $suff = substr $str, $i;

print $pref, "\n";
print $suff, "\n";

Output:

aa{bb
{dccd

Update

I have just read about your wish to avoid substr and do the split in a single operation. unpack will do that for you, like this:

my $str = "aa{bb{dccd";

my $i = rindex $str, '{';

my ($pref, $suff) = unpack "A$i A*", $str;

print $pref, "\n";
print $suff, "\n";

with identical output to the previous code.
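One caveat with the "A" template letter: when unpacking, it strips trailing spaces and NULs from each field, whereas "a" returns the data with no stripping. The sketch below shows the difference on a made-up string with padding; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $str = "aa{bb  {dccd  ";           # spaces before the last brace and at the end
my $i   = rindex $str, '{';           # 7

my ($pref_A, $suff_A) = unpack "A$i A*", $str;   # "A" trims trailing whitespace
my ($pref_a, $suff_a) = unpack "a$i a*", $str;   # "a" leaves the characters untouched

print "A: [$pref_A] [$suff_A]\n";     # A: [aa{bb] [{dccd]
print "a: [$pref_a] [$suff_a]\n";     # a: [aa{bb  ] [{dccd  ]
```

If the lengths of the pieces have to add up to the length of the original string, "a" is probably the safer choice.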

I still don't see what's so difficult about this. Is it that you don't want to discard the brace (or whatever your delimiter is)? These adaptations of @Qtax's solutions leave the brace in either the first or the second substring:

# split before the brace
split /.*\K(?=\{)/, $str;
split /(?=\{(?!.*\{))/, $str;
$str =~ /(.*)(\{.*)/;

# split after the brace
split /.*\{\K/, $str;
split /(?<=\{(?!.*\{))/, $str;
$str =~ /(.*\{)(.*)/;

(I know it isn't necessary to escape the brace, but I think it's a little easier to read this way.)
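For reference, here is a minimal runnable sketch of the two lookaround-based splits applied to the sample string from the question. The "after" pattern is written as a lookbehind followed by a separate lookahead, which should be equivalent to the nested form above; the array names are made up for illustration.

```perl
use strict;
use warnings;

my $str = "aa{bb{dccd";

# Zero-width split point just before the last brace: the brace stays in the
# second piece.
my @before = split /(?=\{(?!.*\{))/s, $str;

# Zero-width split point just after the last brace: the brace stays in the
# first piece.
my @after = split /(?<=\{)(?!.*\{)/s, $str;

print join('; ', @before), "\n";   # aa{bb; {dccd
print join('; ', @after),  "\n";   # aa{bb{; dccd
```

Because both patterns match an empty string, nothing is consumed and the two pieces always concatenate back to the original.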

Right, I'll post this as an answer; this is how far I got.

Thanks to these resources:

... I learned about the "curly brace" regex quantifier {n}, which 'matches the preceding character, or character range, n times exactly'. Thus, I can match against /.{5}(.)/:

perl -e '
$aa="aa{bb{dccd" ;
$aa =~ /.{5}(.)/  && print "--${1}--\n"; '

--{--

This skips over the first 5 "any" characters, then captures and prints the next one. Or, broken down:

/               # start regex
 .              # match any character
  {5}           # repeat previous five times
     (.)        # capture the next character into group 1 ($1)
        /       # end regex

So, finally, I can use the result of rindex to perform such a split:

perl -e '
$aa="aa{bb{dccd" ;
$ri = rindex($aa, "{") ;
$aa =~ /.{$ri}(.)/  && print "--${1}--\n";
@res = split(/^.{$ri}(.)/, $aa);
print join("; ", @res) . "\n"; '

--{--
; {; dccd

... but that also requires capturing the part at the start, so here are other variants:

@res = split(/^(.{$ri})(.)/, $aa);

--{--
; aa{bb; {; dccd


@res = split(/^(.{$ri})./, $aa);

--{--
; aa{bb; dccd

... both of which would work for me, except that I get a blank first item, which I'd like to get rid of in one pass (without an extra splice call), but I don't know how to :)
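One possible way around the blank first item, offered as a sketch rather than a definitive answer: a regex match in list context returns only the captured groups, so there is no empty leading field to remove. The second variant keeps split and filters empty fields, on the assumption that no legitimately empty piece needs to survive. The array names @res and @res2 are illustrative.

```perl
use strict;
use warnings;

my $aa = "aa{bb{dccd";
my $ri = rindex $aa, "{";        # could be adjusted arithmetically before use

# A match in list context returns just the captures: prefix, the character
# at index $ri, and the rest of the string.
my @res = $aa =~ /^(.{$ri})(.)(.*)/s;
print join("; ", @res), "\n";    # aa{bb; {; dccd

# Alternatively, keep split and drop zero-length fields afterwards.
my @res2 = grep { length } split /^(.{$ri})./s, $aa;
print join("; ", @res2), "\n";   # aa{bb; dccd
```

The grep variant would also discard an empty prefix or suffix (for example when the brace is the first or last character), so the list-context match is probably the more predictable of the two.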

Cheers!
