Perl: Matching an array element then copying a PREVIOUS (5 indices back) array element to a new array
Essentially, what I'm trying to do is search through a large text file to identify every element that says "No hits found", and copy that match's identifier to a new list. I am fine with the first part of this, but what I can't figure out is how to take the element of the array exactly 5 indices back (which is an identifier) and copy it to a different array.
I tried something like this:
$fastafile = 'HpHcTEST.txt';
open(FASTAFILE, $fastafile);
@seq = <FASTAFILE>;
my $fastaid;
foreach (@seq) {
    if ($_ =~ /\*\*\*\*\* No hits found \*\*\*\*\*/){
        $fastaid .= $_[-5];
    }
}
print "here are the IDs\n";
print $fastaid;
with a tonne of variants of the [-5], but none of them worked. I can't seem to find any documentation on how to refer back to a previous element when a match is found. Does anyone know how to code this?
Thank you very much for your time.
Andrew
One way to do it is to walk over @seq with an index.
my @fastaid;
for (my $i = 0; $i < @seq; ++$i) {
    if ($seq[$i] =~ /\*\*\*\*\* No hits found \*\*\*\*\*/){
        push @fastaid, $seq[$i - 5] if $i >= 5;
    }
}
Note the change from a scalar to an array named @fastaid, which you might print using
print "Here are the IDs:\n";
print " - $_\n" for @fastaid;
or even
print "Here are the IDs:\n",
    map " - $_\n", @fastaid;
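To try the index-based loop end to end without the original HpHcTEST.txt, here is a minimal sketch that reads from an in-memory string instead of a file. The sample report text and the query_N identifiers are made up for illustration; only the "No hits found" marker and the offset of five lines come from the question. It also uses chomp so the collected IDs do not carry trailing newlines, which is an assumption about what the output should look like.

```perl
use strict;
use warnings;

# Hypothetical sample data: each ID sits exactly 5 lines above its result line.
my $report = <<'END';
query_1
filler a
filler b
filler c
filler d
***** No hits found *****
query_2
filler a
filler b
filler c
filler d
Sequences producing significant alignments
query_3
filler a
filler b
filler c
filler d
***** No hits found *****
END

# Read from the string as if it were a file (in-memory filehandle).
open my $fh, '<', \$report or die "Cannot open in-memory file: $!";
chomp(my @seq = <$fh>);
close $fh;

my @fastaid;
for (my $i = 0; $i < @seq; ++$i) {
    if ($seq[$i] =~ /\*\*\*\*\* No hits found \*\*\*\*\*/) {
        # Only look back when there really are 5 earlier lines.
        push @fastaid, $seq[$i - 5] if $i >= 5;
    }
}

print "Here are the IDs:\n";
print " - $_\n" for @fastaid;
```

With this sample, query_1 and query_3 are collected and query_2 is skipped, because its result line does not contain the marker.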
As brian d foy notes in a comment below, the indexed loop could be more elegant and express the intent more directly.
my $id_offset = 5;
my @fastaid;
for ($id_offset .. $#seq) {
    if ($seq[$_] =~ /\*\*\*\*\* No hits found \*\*\*\*\*/){
        push @fastaid, $seq[$_ - $id_offset];
    }
}
As documented in the "Scalar Values" section of perldata, $#seq is the index of the last element in @seq. The .. range operator correctly handles the case where @seq is fewer than $id_offset elements in length: the range is then empty and the loop body never runs.
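The short-array case is easy to check in isolation. The following is a small sketch with made-up arrays (@short and @empty are illustrative names, not from the original code) showing that a range whose start exceeds its end produces no iterations.

```perl
use strict;
use warnings;

my $id_offset = 5;
my @short = ('only', 'three', 'lines');   # hypothetical sample data

# The last index of @short is 2, so the range 5 .. 2 is empty
# and the loop body never runs.
my $last_index = $#short;
my $iterations = 0;
for my $i ($id_offset .. $last_index) {
    $iterations++;
}
print "last index: $last_index, iterations: $iterations\n";

# An empty array has a last index of -1, which gives an empty range too.
my @empty;
my @range = ($id_offset .. $#empty);
print "indices visited for an empty array: ", scalar(@range), "\n";
```

Because the range itself is empty, no separate check such as $i >= 5 is needed inside the loop.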
The explicit regex-bind operator is still a bit unperlish. You could go with
my $id_offset = 5;
my @fastaid;
for my $i ($id_offset .. $#seq) {
    for ($seq[$i]) {
        push @fastaid, $seq[$i - $id_offset]
            if /\*\*\*\*\* No hits found \*\*\*\*\*/;
    }
}
or, if you have at least version 5.10 (note that given/when emits "experimental" warnings on Perl 5.18 and later),
use feature 'switch';
# ...
my $id_offset = 5;
my @fastaid;
for my $i ($id_offset .. $#seq) {
    given ($seq[$i]) {
        when (/\*\*\*\*\* No hits found \*\*\*\*\*/) {
            push @fastaid, $seq[$i - $id_offset];
        }
    }
}
Back in the day, there was some talk of repurposing $# to track the index of an array traversal, so you could have written
for (@seq) {
    if (/\*\*\*\*\* No hits found \*\*\*\*\*/) {
        push @fastaid, $seq[$# - 5] if $# >= 5;
    }
}
but that never materialized.
You can iterate over the indices and subscript to get the array elements:
for (5..$#seq) {
    $fastaid .= $seq[$_-5] if $seq[$_] =~ /your_regex/;
}
In Perl 5.12 or better you can also use each:
while (my ($index, $value) = each @seq) {
    next if $index < 5;
    $fastaid .= $seq[$index-5] if $value =~ /your_regex/;
}
my @fasta_id = map { $seq[$_] =~ /your_regex/ ? $seq[$_-5] : () } 5 .. $#seq;
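The map one-liner above works because the ternary returns the empty list () for non-matching lines, and an empty list contributes nothing to map's result. Here is a small sketch of that behaviour; the sample array and the MISS/HIT placeholders are invented for illustration and stand in for the real report lines and regex.

```perl
use strict;
use warnings;

# Hypothetical sample data; 'MISS' stands in for the "No hits found" line.
my @seq = qw(id_A w x y z MISS id_B w x y z HIT);

# For each index from 5 upward, emit the element 5 positions back on a
# match, or the empty list (which adds nothing) otherwise.
my @fasta_id = map { $seq[$_] =~ /^MISS$/ ? $seq[$_ - 5] : () } 5 .. $#seq;

print "@fasta_id\n";
```

Only id_A ends up in @fasta_id here, since the line five positions after id_B is HIT rather than MISS.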
Use a C-style 'for' loop with an explicit index instead of iterating over the elements directly. Starting at index 5 keeps $index-5 from going negative:
for (my $index = 5; $index < @seq; $index++) {
    if ($seq[$index] =~ /\*\*\*\*\* No hits found \*\*\*\*\*/){
        $fastaid .= $seq[$index-5];
    }
}
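One thing to keep in mind with subscript arithmetic such as $index-5 is that Perl treats a negative subscript as counting from the end of the array, so a match in the first five lines would silently pick up a line near the end of the file (or undef) unless the index is kept at 5 or above. This is also why the $_[-5] in the question did not help: it is an element of the array @_, counted from the end, and is unrelated to $_. A minimal sketch with made-up data:

```perl
use strict;
use warnings;

my @seq = ('first', 'second', 'last');   # hypothetical sample data

# A negative subscript counts from the end: -1 is the last element.
my $index   = 0;
my $wrapped = $seq[$index - 1];
print "index -1 gives: $wrapped\n";

# Reading a subscript beyond either end yields undef rather than an error.
my $missing = $seq[$index - 5];
print "index -5 is ", (defined $missing ? "defined" : "undef"), "\n";
```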