
How to categorize by one column in Perl

I'm studying gene programming with Perl, and using Perl is a little difficult for me. I'm sorry that my English is not good.

I want to categorize by one column in Perl.

This is my file. Its name is Annuum.v.2.1.gff3.

PGAv.1.6.scaffold1  PROTEIN  gene  909002  910083  .    +  .  ID=CA.PGAv.1.6.scaffold1.1
PGAv.1.6.scaffold1  PROTEIN  mRNA  909002  910083  .    +  .  ID=TC.CA.PGAv.1.6.scaffold1.1;Parent=CA.PGAv.1.6.scaffold1.1
PGAv.1.6.scaffold1  PROTEIN  exon  909002  909168  .    +  0  Parent=TC.CA.PGAv.1.6.scaffold1.1
PGAv.1.6.scaffold1  PROTEIN  CDS   909002  909168  .    +  0  Parent=TC.CA.PGAv.1.6.scaffold1.1
PGAv.1.6.scaffold1  PROTEIN  exon  909759  910083  .    +  1  Parent=TC.CA.PGAv.1.6.scaffold1.1
PGAv.1.6.scaffold1  PROTEIN  CDS   909759  910083  .    +  1  Parent=TC.CA.PGAv.1.6.scaffold1.1
PGAv.1.6.scaffold1  ISGAP    gene  930723  931169  783  +  .  ID=CA.PGAv.1.6.scaffold1.2
PGAv.1.6.scaffold1  ISGAP    mRNA  930723  931169  783  +  .  ID=TC.CA.PGAv.1.6.scaffold1.2;Parent=CA.PGAv.1.6.scaffold1.2
PGAv.1.6.scaffold1  ISGAP    exon  930723  931169  .    +  .  Parent=TC.CA.PGAv.1.6.scaffold1.2
PGAv.1.6.scaffold1  ISGAP    CDS   930723  931169  .    +  .  Parent=TC.CA.PGAv.1.6.scaffold1.2

I want to categorize by the second column and show the count and the IDs for each category, like below. One thing I forgot to mention: these are all genes.

PROTEIN number      CA.PGAv.1.6.scaffold1.1, CA.PGAv.1.6.scaffold1.3, ...

ISGAP   number          CA.PGAv.1.6.scaffold1.2, CA.PGAv.1.6.scaffold1.26, ...

Please help me. Thanks.

We can use Perl's implicit loop (-n) and autosplit mode (-a, with -F\t to split on tabs) together with a hash of the categories and an array of IDs for each category. This gives:

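#!/usr/bin/perl -anF\t
# -n loops over the input lines; -a with -F\t autosplits each line on tabs into @F
next unless /ID=([^;]*)\n/;          # process only lines whose last field is just an ID (the gene lines)
push @{ $categories{$F[1]} }, $1;    # add the ID to this category's array (category = second column)
END {
    for (sort keys %categories) {    # sorted so the output order is stable
        $number = @{ $categories{$_} };    # number of items in category
        print "$_\t$number\t", join(", ", @{ $categories{$_} }), "\n";
    }
}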
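For readers who prefer an explicit loop under strict and warnings, the same idea can be sketched as a small standalone script. This is only an illustration, not the answer's original code: the helper names categorize_genes and format_report are made up for this sketch, the sample rows are modelled on the question's data, and the input is assumed to be tab-separated with the feature type in the third column and the attributes in the ninth. Unlike the one-liner above, it selects rows by the "gene" feature type rather than by "ID is the only attribute".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group gene IDs by the second column of GFF3-style lines.
# Takes a reference to an array of lines; returns a hash reference
# mapping each category to an array reference of IDs.
# (categorize_genes is a hypothetical helper name.)
sub categorize_genes {
    my ($lines) = @_;
    my %categories;
    for my $line (@$lines) {
        (my $copy = $line) =~ s/\r?\n\z//;                 # strip LF or CRLF
        my @fields = split /\t/, $copy;
        next unless @fields >= 9 && $fields[2] eq 'gene';  # keep gene rows only
        next unless $fields[8] =~ /(?:^|;)ID=([^;]+)/;     # extract the ID attribute
        push @{ $categories{ $fields[1] } }, $1;
    }
    return \%categories;
}

# Build one report line per category: name, count, comma-separated IDs.
# (format_report is a hypothetical helper name.)
sub format_report {
    my ($categories) = @_;
    return map {
        my $ids = $categories->{$_};
        join("\t", $_, scalar @$ids, join(", ", @$ids)) . "\n";
    } sort keys %$categories;
}

# Sample rows modelled on the question's file (assumed tab-separated).
my @sample = (
    "PGAv.1.6.scaffold1\tPROTEIN\tgene\t909002\t910083\t.\t+\t.\tID=CA.PGAv.1.6.scaffold1.1\n",
    "PGAv.1.6.scaffold1\tPROTEIN\tmRNA\t909002\t910083\t.\t+\t.\tID=TC.CA.PGAv.1.6.scaffold1.1;Parent=CA.PGAv.1.6.scaffold1.1\n",
    "PGAv.1.6.scaffold1\tISGAP\tgene\t930723\t931169\t783\t+\t.\tID=CA.PGAv.1.6.scaffold1.2\n",
);

print format_report(categorize_genes(\@sample));
```

To run it on the real file, the @sample array could be replaced by the lines read from Annuum.v.2.1.gff3, for example with open my $fh, '<', 'Annuum.v.2.1.gff3' or die $!; my @lines = <$fh>;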
