
How can I use Perl to extract a particular column from a tab-separated file?

I am really new to Perl and have been trying to piece together a solution for this. When I run this program I don't get any errors, but it doesn't display anything either.

The code is as follows:

#!/usr/bin/perl
open (DATA, "<test1.txt") or die ("Unable to open file");
use strict; use warnings;
my $search_string = "Ball";
while ( my $row = <DATA> ) {

    last unless $row =~ /\S/;
    chomp $row;
    my @cells = split /\t/, $row;

    if ($cells[0] =~/$search_string/){
        print $cells[0];
    }
}

My test data file looks like this:

Camera Make     Camera Model    Text    Ball    Swing
a       b       c       d       e
f       g       h       i       j
k       l       m       n       o

I am trying to see how this works before I use the actual test data file.

So how do I search for, say, "Ball" and have it return "d i n"?

The reason you don't get any errors is that your program does exactly what you told it to: print all first-column values that contain the string "Ball". Since none of the cells in the first column contain that string, your program prints nothing.

Your problem is not with your Perl (it could use some minor stylistic improvement, specifically you're using an obsolete form of open(), but it is mostly fine); it's with your algorithm.

HINT: your first task in the algorithm should be finding WHICH column (by number) is the "Ball" column.
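A minimal sketch of that hint, using only core Perl: split the header line on tabs and return the position of the title that equals the search string. The helper name column_index is hypothetical (it does not appear in the question), and the header is hard-coded here with explicit \t escapes so the example does not depend on a file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original posts): return the zero-based
# index of the column whose title equals $name, or -1 if there is none.
sub column_index {
    my ( $header_line, $name ) = @_;
    chomp $header_line;
    my @titles = split /\t/, $header_line;
    for my $i ( 0 .. $#titles ) {
        return $i if $titles[$i] eq $name;
    }
    return -1;
}

my $header = "Camera Make\tCamera Model\tText\tBall\tSwing\n";
print column_index( $header, 'Ball' ), "\n";    # prints 3
```

Once the index is known, every data row can be split the same way and the cell at that index printed.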

Try this out:

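use strict;
use warnings;
use feature 'say';    # say() is not enabled by default
use Data::Dumper;
use List::MoreUtils qw<first_index>;

# Read the header line and find the index of the "Ball" column.
chomp( my $header = <DATA> );
my $column = first_index { $_ eq 'Ball' } split /\t/, $header;
say Data::Dumper->Dump( [ $column ], [ '*column' ] );

# Collect that column from every remaining row.
my @balls  = map { chomp; [split /\t/]->[$column] } <DATA>;
say Data::Dumper->Dump( [ \@balls ], [ '*balls' ] );
# The lines after __DATA__ must use real tab characters between fields.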
__DATA__
Camera Make Camera Model    Text    Ball    Swing
a   b   c   d   e
f   g   h   i   j
k   l   m   n   o

To read from a real file, you would pretty much just have to change the handle from DATA to a file you have open()-ed.

open( my $in, '<', '/path/to/data.file' ) 
    or die "Could not open file: $!"
    ;

And then replace <DATA> with <$in>.

Try this instead:

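#!/usr/bin/perl
use strict;
use warnings;

# Three-argument open with a lexical filehandle; report the OS error on failure.
open( my $in, '<', 'test1.txt' ) or die "Unable to open test1.txt: $!";
my $search_string = "Ball";

# The first line holds the column titles.
my $header = <$in>;
chomp $header;
my @header_titles = split /\t/, $header;
my $extract_col = 0;

# Count columns until the title matching the search string is found.
for my $header_title (@header_titles) {
  last if $header_title =~ m/\Q$search_string\E/;
  $extract_col++;
}
die "No column matching '$search_string'\n" if $extract_col == @header_titles;

print "Extracting column $extract_col\n";

# Print the chosen cell from each row, stopping at the first blank line.
while ( my $row = <$in> ) {
  last unless $row =~ /\S/;
  chomp $row;
  my @cells = split /\t/, $row;
  print "$cells[$extract_col] ";
}
print "\n";
close $in;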
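If more than one column might be needed later, a variation on the loop above is to build a hash from column title to position once, then look columns up by name. This is only a sketch under stated assumptions: the helper names header_index and extract_column are hypothetical, the rows are supplied as an in-memory list with explicit \t escapes rather than read from test1.txt, and blank rows are skipped instead of ending the loop.

```perl
use strict;
use warnings;

# Hypothetical helper: map each column title to its zero-based position.
sub header_index {
    my ($header_line) = @_;
    chomp $header_line;
    my @titles = split /\t/, $header_line;
    my %index_of;
    @index_of{@titles} = ( 0 .. $#titles );
    return \%index_of;
}

# Hypothetical helper: pull the named column out of a list of data lines.
sub extract_column {
    my ( $name, $header_line, @rows ) = @_;
    my $index_of = header_index($header_line);
    die "No column named '$name'\n" unless exists $index_of->{$name};
    my $col = $index_of->{$name};
    my @values;
    for my $row (@rows) {
        chomp $row;
        next unless $row =~ /\S/;    # skip blank lines
        # A limit of -1 keeps trailing empty fields.
        push @values, ( split /\t/, $row, -1 )[$col];
    }
    return @values;
}

my @lines = (
    "Camera Make\tCamera Model\tText\tBall\tSwing\n",
    "a\tb\tc\td\te\n",
    "f\tg\th\ti\tj\n",
    "k\tl\tm\tn\to\n",
);
my ( $header, @rows ) = @lines;
print join( ' ', extract_column( 'Ball', $header, @rows ) ), "\n";    # prints "d i n"
```

The hash lookup requires an exact title match, unlike the regex match in the loop above, so "Ball" would not match a column titled "Football".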

You can use Text::CSV_XS to very conveniently extract the data for you. It might be overkill for your limited data, but it is a very solid solution.

Here I just use the DATA handle to contain the data, but if you prefer, you can replace that with a filehandle, such as open my $fh, '<', 'text1.txt'; and change *DATA to $fh.

Output:

d i n

Code:

use warnings;
use strict;
use Text::CSV_XS;
use autodie;

my $csv = Text::CSV_XS->new( { sep_char => "\t" } );
my @list;
$csv->column_names ($csv->getline (*DATA));
while ( my $hr = $csv->getline_hr(*DATA) ) {
    push @list, $hr->{'Ball'};
}

print "@list\n";
__DATA__
Camera Make Camera Model    Text    Ball    Swing
a   b   c   d   e
f   g   h   i   j
k   l   m   n   o

ETA: If you're going to cut & paste to try it out, make sure that the tabs are carried over in the data.
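One quick way to check whether the tabs survived a copy and paste is to count the fields that a split on \t produces for a line. The sketch below assumes a hypothetical helper named field_count; a five-column line whose tabs were turned into spaces comes back as a single field.

```perl
use strict;
use warnings;

# Hypothetical helper: count the tab-separated fields on one line.
sub field_count {
    my ($line) = @_;
    chomp $line;
    # A limit of -1 keeps trailing empty fields in the count.
    my @fields = split /\t/, $line, -1;
    return scalar @fields;
}

my $with_tabs   = "a\tb\tc\td\te\n";
my $with_spaces = "a   b   c   d   e\n";    # tabs lost during copy and paste

print field_count($with_tabs),   "\n";    # prints 5
print field_count($with_spaces), "\n";    # prints 1
```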
