Parsing a text file with multiple columns
I am trying to extract the 11 columns from the following file:
http://bioinfo.mc.vanderbilt.edu/TSGene/Human_716_TSGs.txt
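...into a list of scalars for an introductory university bioinformatics project. My attempt works, but not perfectly, because the amount of whitespace between the columns varies (see the top of the file for details).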
use strict;
use warnings;
open(FH, '<', 'tsg.txt') or die $!;
my $data = do {local $/; <FH>};
close FH or die $!;
my($id, $sym, $alias, $xref, $chromo, $band, $name, $gene_t, $desc, $nuc_seq,
$pro_seq) = $data =~ /(\S+)\s+
(\S+)\s+
(\S+)\s+
(\S+)\s+
(\S+)\s+
(\S+)\s+
(\S+)\s+
/xms;
print "GeneID: $id", "\n";
print "Gene_symbol: $sym", "\n";
print "Alias: $alias", "\n";
print "XRef: $xref", "\n";
print "Chromosome: $chromo", "\n";
print "Cytoband: $band", "\n";
print "Full_name: $name", "\n";
#print "Gene_type: $gene_t", "\n";
#print "Description: $desc", "\n";
#print "Nucleotide_sequence: $nuc_seq", "\n";
#print "Protein_sequence: $pro_seq", "\n";
Thank you for your help.
The file looks like it is tab-separated, so you should be able to use split on \t to store each line in an array:
my @columns = split( "\t", $data );
You can then access the columns by index:
my $id = $columns[0];
and so on.
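Note that $data in the question holds the whole file, so the split has to be applied one line at a time. A minimal sketch of that, assuming the file really is tab-separated with 11 columns per row: the sub name parse_line and the sample rows are made up for illustration (they are not taken from the real file), and the column names come from the print statements in the question. Header handling is not covered here.

```perl
use strict;
use warnings;

# Column names taken from the print statements in the question.
my @FIELDS = qw(
    GeneID Gene_symbol Alias XRef Chromosome Cytoband Full_name
    Gene_type Description Nucleotide_sequence Protein_sequence
);

# parse_line is a hypothetical helper: it splits one tab-separated
# line into a hash keyed by column name.
sub parse_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                  # strip an LF or CRLF line ending
    my @columns = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    my %record;
    @record{@FIELDS} = @columns;           # hash slice; missing columns stay undef
    return \%record;
}

# Made-up sample rows standing in for the real file (assumption).
my $text =
    join("\t", 'ID1', 'SYM1', 'ALIAS1|ALIAS2', 'XREF1', '1', '1p36.1',
         'example gene one', 'protein-coding', 'first made-up description',
         'ACGT', 'MKV') . "\n" .
    join("\t", 'ID2', 'SYM2', '', 'XREF2', 'X', 'Xq28',
         'example gene two', 'protein-coding', 'second made-up description',
         'TTGA', 'MAL') . "\n";

# Read line by line from an in-memory filehandle; with the real file,
# open 'tsg.txt' here instead of \$text.
open(my $fh, '<', \$text) or die "Cannot open in-memory handle: $!";
while (my $line = <$fh>) {
    my $rec = parse_line($line);
    print "GeneID: $rec->{GeneID}, Full_name: $rec->{Full_name}\n";
}
close($fh) or die $!;
```

Splitting on the tab character alone, rather than on \s+, is what lets fields such as the full name contain spaces, and it keeps empty columns in place instead of shifting the later ones left.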