Unix - Compile a single column from many files into a single, tab-delimited file

I have a large number of files, all in the same tab-delimited format:

Column A    Column B
Data_A1      Data_B1
Data_A2      Data_B2
Data_A3      Data_B3

These files all have the same number of lines.

I want to compile every file's Column B data into a single tab-delimited file. Right now, my best plan is to write a Perl script along these lines:

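#!/usr/bin/perl
use strict;
use warnings;

my ($file, $ref) = @ARGV;

# Read column B from the file with the format described above.
my @B;
open(my $in, '<', $file) or die "$file: $!";
while (<$in>) {
        chomp;
        my @a = split /\t/, $_;
        push(@B, $a[1]);
}
close $in;

# First run: the compilation is missing or empty, so start it with this column.
unless (-s $ref) {
        print "$_\n" for @B;
        exit;
}

# Append this file's column B to each row of the running compilation.
my $counter = 0;
open(my $refh, '<', $ref) or die "$ref: $!";
while (<$refh>) {
        chomp;
        print "$_\t$B[$counter]\n";
        $counter++;    # advance to the next row's value
}
close $refh;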

Then, write a Bash script that loops through all the files, saving the output of the Perl script as its input for the next iteration of the shell loop:

#!/bin/bash

for file in *.txt
do
    # Quote the filename so names containing spaces survive.
    perl Script.pl "$file" Infile > Temp
    mv Temp Infile
done

But this feels like a huge amount of work for something so simple. Is there a simple Unix command that can do the same thing?

Expected Output:

File1_Column_B    File2_Column_B    File3_Column_B    ...
Data_B1           Data_B1           Data_B1           ...
Data_B2           Data_B2           Data_B2           ...
Data_B3           Data_B3           Data_B3           ...
...

bash:

paste -d'\t' input*.txt |
awk -F'\t' '{for (i=2; i<=NF; i+=2) printf "%s%s", (i>2 ? FS : ""), $i; print ""}'

This pastes all the files together with all their columns, then uses awk to extract only the even-numbered columns. It relies on every file having exactly two columns, so that each file's Column B lands in an even-numbered field.
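To see the paste-then-awk approach on concrete data, here is a minimal sketch. The two sample files and the `workdir` and `result` names are made up for illustration; the awk program writes the separator before every field except the first, so output lines do not end with a tab.

```shell
# Hypothetical sample data: two files in the two-column format from the question.
workdir=$(mktemp -d)
printf 'Column A\tColumn B\nData_A1\tData_B1\nData_A2\tData_B2\n' > "$workdir/input1.txt"
printf 'Column A\tColumn B\nData_A1\tData_B3\nData_A2\tData_B4\n' > "$workdir/input2.txt"

# paste joins the files line by line (tab is its default delimiter),
# producing four columns; awk then keeps only fields 2 and 4.
result=$(paste "$workdir"/input*.txt |
  awk -F'\t' '{for (i=2; i<=NF; i+=2) printf "%s%s", (i>2 ? FS : ""), $i; print ""}')

printf '%s\n' "$result"
rm -rf "$workdir"
```

With this sample data the output has three tab-separated rows: "Column B" twice, then Data_B1 and Data_B3, then Data_B2 and Data_B4. The header row is simply each file's own Column B header, not the file name.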

You can do all the work in Perl:

#!/usr/bin/perl
use warnings;
use strict;

my ($result, @input) = @ARGV;        # output input1 input2...

my @table;                           # $table[row][file index]

for my $i (0 .. $#input) {
    my $infile = $input[$i];
    open my $IN, '<', $infile or die "$infile: $!";
    while (<$IN>) {
        chomp;
        # Split on tabs so that values containing spaces stay intact.
        $table[ $. - 1 ][$i] = (split /\t/)[1];
    }
    close $IN;
}

open my $OUT, '>', $result or die "$result: $!";
for my $row (@table) {
    print {$OUT} join("\t", @$row), "\n";
}
close $OUT;

You can use awk to select the columns you want and paste to join them together.

Example:

paste -d '\t' <(awk -F'\t' '{print $2}' file1.tsv) <(awk -F'\t' '{print $3}' file2.tsv)

NOTE: <(command) (bash process substitution) allows the output of a command to be used as a file.
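Process substitution needs one <(...) per file, which gets unwieldy for a large number of files. One possible way around that, sketched below, swaps in a different technique: extract the column from each file into a temporary directory with cut, then paste the extracted columns together. The function name `compile_column` is hypothetical, and the sketch assumes `mktemp -d` is available.

```shell
# compile_column N FILE... : print column N of every FILE side by side,
# tab-separated, in the order the files were given.
# The body runs in a subshell, so its variables do not leak to the caller.
compile_column() (
  [ "$#" -ge 2 ] || { echo "usage: compile_column N FILE..." >&2; exit 2; }
  col=$1
  shift
  tmpdir=$(mktemp -d) || exit 1
  i=0
  for f in "$@"; do
    i=$((i + 1))
    # cut splits on tabs by default; zero-padded names keep the glob
    # below in argument order.
    cut -f"$col" "$f" > "$tmpdir/$(printf '%06d' "$i")" ||
      { rm -rf "$tmpdir"; exit 1; }
  done
  paste "$tmpdir"/*
  status=$?
  rm -rf "$tmpdir"
  exit "$status"
)
```

It could then be called as `compile_column 2 *.txt > compiled.tsv`, where `compiled.tsv` is a placeholder output name. Unlike the paste-then-awk pipeline, this does not depend on every file having the same number of columns.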
