Perl: Matching an array element then copying a PREVIOUS (5 indices back) array element to a new array

Essentially what I'm trying to do is search through a large text file to identify every element that says "no hits found", and copy that match's identifier to a new list. I am fine with the first part of this, but what I can't seem to figure out is how to then take the element of the array exactly 5 indices back (which is an identifier) and copy it to a different array.

I tried something like this,

$fastafile = 'HpHcTEST.txt';
open(FASTAFILE, $fastafile);
@seq = <FASTAFILE>;
my $fastaid;
foreach (@seq) {
    if ($_ =~ /\*\*\*\*\* No hits found \*\*\*\*\*/){
        $fastaid .= $_[-5];
    }
}

print "here are the IDs\n";
print $fastaid;

with a tonne of variants of the [-5], but none of them worked. I can't seem to find any documentation on how to refer back to a previous element once a match is found. Anyone know how to code for this?

Thank you very much for your time.

Andrew

A quick fix

One way to do it is to walk over @seq with an index.

my @fastaid;

for (my $i = 0; $i < @seq; ++$i) {
    if ($seq[$i] =~ /\*\*\*\*\* No hits found \*\*\*\*\*/){
        push @fastaid, $seq[$i - 5] if $i >= 5;
    }
}

Note the change from the scalar to an array named @fastaid, which you might print using

print "Here are the IDs:\n";
print "  - $_\n" for @fastaid;

or even

print "Here are the IDs:\n",
      map "  - $_\n", @fastaid;
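
As for why the original attempt came back empty: inside the foreach loop, $_ is the current line, but $_[-5] is an element of the unrelated array @_ (the subroutine-argument array), which is empty at file scope. Below is a minimal, self-contained sketch of the index-based fix. The sample lines in @seq are made up for illustration and merely stand in for the contents of HpHcTEST.txt.

```perl
use strict;
use warnings;

# Hypothetical sample data: an identifier line, four filler lines,
# then (sometimes) the "No hits found" marker five lines after the identifier.
my @seq = (
    "Query= id_001\n", "a\n", "b\n", "c\n", "d\n",
    "***** No hits found *****\n",
    "Query= id_002\n", "a\n", "b\n", "c\n", "d\n",
    "Sequences producing significant alignments\n",
    "Query= id_003\n", "a\n", "b\n", "c\n", "d\n",
    "***** No hits found *****\n",
);

my @fastaid;
for my $i (0 .. $#seq) {
    next unless $seq[$i] =~ /\*{5} No hits found \*{5}/;
    push @fastaid, $seq[$i - 5] if $i >= 5;   # skip matches too near the top
}

print "Here are the IDs:\n";
print "  - $_" for @fastaid;   # each line still carries its trailing newline
```

With this sample, only the id_001 and id_003 lines are collected, because id_002 is not followed by the marker.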

Adding polish

As brian d foy notes in a comment below, the code could be more elegant and express the intent more directly.

my $id_offset = 5;
my @fastaid;

for ($id_offset .. $#seq) {
    if ($seq[$_] =~ /\*\*\*\*\* No hits found \*\*\*\*\*/){
        push @fastaid, $seq[$_ - $id_offset];
    }
}

As documented in the “Scalar Values” section of perldata, $#seq is the index of the last element in @seq. The .. range operator correctly handles the case where @seq has fewer than $id_offset elements: the range is then empty, so the loop body never runs.
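
To see the empty-range behaviour in isolation, here is a small sketch; the three-element @seq is invented purely so that it is shorter than $id_offset.

```perl
use strict;
use warnings;

my $id_offset = 5;
my @seq = ("only\n", "three\n", "lines\n");   # made-up data, shorter than $id_offset

print "last index: $#seq\n";                  # $#seq is scalar(@seq) - 1

my $iterations = 0;
$iterations++ for $id_offset .. $#seq;        # 5 .. 2 is an empty list
print "loop ran $iterations time(s)\n";
```

Because the left end of the range is greater than the right end, the list is empty and no negative subscript is ever computed.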

The explicit regex-bind operator is still a bit unperlish. You could go with

my $id_offset = 5;
my @fastaid;

for my $i ($id_offset .. $#seq) {
  for ($seq[$i]) {
    push @fastaid, $seq[$i - $id_offset]
      if /\*\*\*\*\* No hits found \*\*\*\*\*/;
  }
}

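or, if you have at least version 5.10 (bearing in mind that given/when was later marked experimental and is deprecated in recent Perl releases, so the nested for above is the safer choice in new code),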

use feature 'switch';

# ...

my $id_offset = 5;
my @fastaid;

for my $i ($id_offset .. $#seq) {
  given ($seq[$i]) {
    when (/\*\*\*\*\* No hits found \*\*\*\*\*/) {
      push @fastaid, $seq[$i - $id_offset];
    }
  }
}

Historical note

Back in the day, there was some talk of repurposing $# to track the index of an array traversal, so you could have written

for (@seq) {
    if (/\*\*\*\*\* No hits found \*\*\*\*\*/) {
        push @fastaid, $seq[$# - 5] if $# >= 5;
    }
}

but that never materialized.

You can iterate over the indices and subscript to get the array elements:

for (5..$#seq) {
    $fastaid .= $seq[$_-5] if $seq[$_] =~ /your_regex/;
}

In Perl 5.12 or better you can also use each:

while (my ($index, $value) = each @seq) {
    next if $index < 5;
    $fastaid .= $seq[$index-5] if $value =~ /your_regex/;
}

my @fasta_id = map { $seq[$_] =~ /your_regex/ ? $seq[$_-5] : () } 5 .. $#seq;

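Use a C-style 'for' loop with an explicit index instead of iterating over the elements with 'foreach'. Start at index 5 so that $index-5 never goes negative (a negative subscript would count from the end of the array):

for (my $index = 5; $index < @seq; $index++) {
    if ($seq[$index] =~ /\*\*\*\*\* No hits found \*\*\*\*\*/){
        $fastaid .= $seq[$index-5];
    }
}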
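
Since the question mentions a large file, here is a sketch of a different technique from the ones above: read line by line and keep only a rolling window of the last six lines, so the whole file never has to sit in @seq. The in-memory filehandle and its contents are made up for illustration; in practice you would open HpHcTEST.txt instead. The names @window and $id_offset are just illustrative.

```perl
use strict;
use warnings;

my $id_offset = 5;

# Hypothetical stand-in for the real file, read through an in-memory filehandle.
my $sample = "Query= id_001\na\nb\nc\nd\n***** No hits found *****\n";
open my $fh, '<', \$sample or die "Cannot open sample: $!";

my (@window, @fastaid);
while (my $line = <$fh>) {
    push @window, $line;
    shift @window if @window > $id_offset + 1;   # keep current line + 5 before it
    # Once the window is full, $window[0] is the line 5 indices back.
    push @fastaid, $window[0]
        if @window > $id_offset && $line =~ /\*{5} No hits found \*{5}/;
}
close $fh;

print "  - $_" for @fastaid;
```

The trade-off is a little extra bookkeeping in exchange for memory use that stays constant regardless of file size.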
