How to categorize by one column in Perl
I'm learning gene-related programming with Perl, and I find Perl a little difficult. I apologize that my English is not good.
I want to categorize the lines of a file by one column in Perl.
This is my file. Its name is Annuum.v.2.1.gff3:
PGAv.1.6.scaffold1 PROTEIN gene 909002 910083 . + . ID=CA.PGAv.1.6.scaffold1.1
PGAv.1.6.scaffold1 PROTEIN mRNA 909002 910083 . + . ID=TC.CA.PGAv.1.6.scaffold1.1;Parent=CA.PGAv.1.6.scaffold1.1
PGAv.1.6.scaffold1 PROTEIN exon 909002 909168 . + 0 Parent=TC.CA.PGAv.1.6.scaffold1.1
PGAv.1.6.scaffold1 PROTEIN CDS 909002 909168 . + 0 Parent=TC.CA.PGAv.1.6.scaffold1.1
PGAv.1.6.scaffold1 PROTEIN exon 909759 910083 . + 1 Parent=TC.CA.PGAv.1.6.scaffold1.1
PGAv.1.6.scaffold1 PROTEIN CDS 909759 910083 . + 1 Parent=TC.CA.PGAv.1.6.scaffold1.1
PGAv.1.6.scaffold1 ISGAP gene 930723 931169 783 + . ID=CA.PGAv.1.6.scaffold1.2
PGAv.1.6.scaffold1 ISGAP mRNA 930723 931169 783 + . ID=TC.CA.PGAv.1.6.scaffold1.2;Parent=CA.PGAv.1.6.scaffold1.2
PGAv.1.6.scaffold1 ISGAP exon 930723 931169 . + . Parent=TC.CA.PGAv.1.6.scaffold1.2
PGAv.1.6.scaffold1 ISGAP CDS 930723 931169 . + . Parent=TC.CA.PGAv.1.6.scaffold1.2
I want to categorize by the second column and show the count and the IDs for each category, as below. I forgot to mention something: these are all genes.
PROTEIN number CA.PGAv.1.6.scaffold1.1, CA.PGAv.1.6.scaffold1.3, ...
ISGAP number CA.PGAv.1.6.scaffold1.2, CA.PGAv.1.6.scaffold1.26, ...
Please help me. Thanks.
We can use an implicit loop (-n) and autosplit mode (-a, with -F\t to split on tabs), together with a hash of the categories and an array for each category. This gives:
#!/usr/bin/perl -anF\t
# -n loops over the input lines; -a with -F\t splits each line on tabs into @F
next unless /ID=([^;]*)\n/;        # process only lines whose attributes are just an ID
push @{ $categories{$F[1]} }, $1;  # add the ID to the array for its category (second column, $F[1])
END {
    for (keys %categories) {
        my $number = @{ $categories{$_} };  # number of items in the category
        print "$_\t$number\t", join(", ", @{ $categories{$_} }), "\n";
    }
}
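If the command-line switches feel opaque, the same idea can be written as an ordinary script with strict and warnings enabled. This is only a minimal sketch, not the method above verbatim: the helper name group_gene_ids is hypothetical, it assumes nine tab-separated columns as in standard GFF3, and it selects rows by the third column being "gene" rather than by the shape of the attributes. The sample rows are taken from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): group gene IDs by the
# second column of GFF3-style lines. Returns a hash ref: source => [IDs].
sub group_gene_ids {
    my @lines = @_;
    my %ids_by_source;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^#/ or $line !~ /\S/;              # skip comments and blank lines
        my @fields = split /\t/, $line;
        next unless @fields >= 9 && $fields[2] eq 'gene';    # assumption: keep gene rows only
        next unless $fields[8] =~ /(?:^|;)ID=([^;]+)/;       # pull the ID out of the attributes
        push @{ $ids_by_source{ $fields[1] } }, $1;          # hash of arrays, keyed by column 2
    }
    return \%ids_by_source;
}

# Sample rows from the question, written with explicit tab separators.
my @sample = (
    "PGAv.1.6.scaffold1\tPROTEIN\tgene\t909002\t910083\t.\t+\t.\tID=CA.PGAv.1.6.scaffold1.1\n",
    "PGAv.1.6.scaffold1\tPROTEIN\texon\t909002\t909168\t.\t+\t0\tParent=TC.CA.PGAv.1.6.scaffold1.1\n",
    "PGAv.1.6.scaffold1\tISGAP\tgene\t930723\t931169\t783\t+\t.\tID=CA.PGAv.1.6.scaffold1.2\n",
    "PGAv.1.6.scaffold1\tISGAP\tmRNA\t930723\t931169\t783\t+\t.\tID=TC.CA.PGAv.1.6.scaffold1.2;Parent=CA.PGAv.1.6.scaffold1.2\n",
);

my $groups = group_gene_ids(@sample);
for my $source (sort keys %$groups) {
    my @ids = @{ $groups->{$source} };
    # One line per category: name, count, comma-separated IDs
    print join("\t", $source, scalar @ids, join(", ", @ids)), "\n";
}
```

To run it on the real file, the sample array could be replaced by lines read from a filehandle opened on Annuum.v.2.1.gff3 and passed to group_gene_ids. Sorting the keys just makes the output order repeatable; drop the sort if the order does not matter.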