Extract column 'x' from multiple files, and transpose file name with 'x'

I am trying to extract column "m" from multiple txt files (file1.txt, file2.txt, etc.) and transpose each column into a row in a new file.

Below is file1.txt:

contig_1    contig_1    geneX       ctg1_886;ctg1_887;ctg1_888
contig_2    contig_2    geneY       ctg1_886;ctg1_887;ctg1_888
contig_3    contig_3    geneZ       ctg1_886;ctg1_887;ctg1_888

I would like to have a summary.txt file that looks like this:

file1 geneX geneY geneZ
file2 geneA geneY
.
.
.
etc. 

The total number of rows may vary between files. I tried using awk without success.

Following glenn jackman's advice from the comments, a GNU AWK solution would look like this:

awk 'BEGIN {ORS=" "} BEGINFILE{print FILENAME} {print $3} ENDFILE{ printf("\n")}'  file*.txt

A solution for other awk implementations could look like this (sorry, I only have GNU awk for testing):

awk 'BEGIN {ORS=" "} FNR==1 {printf("%s%s ", (NR==1 ? "" : "\n"), FILENAME)} {print $3} END{printf("\n")}' file*.txt

Explanation

There are several special patterns:

  • BEGIN: its action is executed once at the beginning, before any input is read. Here the ORS (output record separator) is set to a space, so every print ends with a space instead of a newline. Each original row therefore becomes a new column; this is the transpose step.
  • END: its action is executed once at the end, after all input has been read.
  • BEGINFILE and ENDFILE (GNU AWK extensions): their actions are executed once at the beginning and once at the end of the processing of each file. Here they print the FILENAME and a linefeed, respectively.
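
To see the effect of ORS on a small case, here is a minimal sketch that does not need GNU AWK. It uses FNR==1 (true on the first line of each file) in place of BEGINFILE. The scratch directory and the two sample files are made up for illustration and are not the asker's real data.

```shell
# Hypothetical sample data in a scratch directory, so nothing real is overwritten.
workdir=$(mktemp -d)
printf 'contig_1 contig_1 geneX a;b\ncontig_2 contig_2 geneY a;b\n' > "$workdir/file1.txt"
printf 'contig_1 contig_1 geneA a;b\n' > "$workdir/file2.txt"

# ORS=" " makes every print end with a space instead of a newline,
# so the third field of each input line lands on the same output line.
# FNR==1 is true on the first line of each file; NR==1 only on the very first line,
# which lets us start a new output line for every file except the first.
summary=$(cd "$workdir" && awk '
    BEGIN  { ORS = " " }
    FNR==1 { printf("%s%s ", (NR==1 ? "" : "\n"), FILENAME) }
           { print $3 }
    END    { printf("\n") }
' file1.txt file2.txt)

printf '%s\n' "$summary"
```

Each output line starts with the file name, followed by that file's third column. Because ORS is a space, every line also ends with a trailing space.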

Assuming the field separators are multiple spaces:

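for f in file*.txt; do
    # tr -s squeezes runs of spaces so that cut sees single-space delimiters;
    # the unquoted $(...) is intentional: word splitting joins the column into one line
    echo "$f" $(tr -s ' ' < "$f" | cut -d ' ' -f 3)
done > summary.txt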

If the data is tab-separated:

for f in file*.txt; do
    # cut splits on tabs by default; the unquoted $(...) joins the column into one line
    echo "$f" $(cut -f 3 "$f")
done > summary.txt
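
The desired summary.txt in the question shows file1 rather than file1.txt. One possible way to get that is to strip the suffix with the shell's ${f%.txt} expansion. The sketch below assumes tab-separated input and a .txt extension; the scratch directory and sample files are hypothetical stand-ins for the real data.

```shell
# Hypothetical tab-separated sample files in a scratch directory.
workdir=$(mktemp -d)
printf 'contig_1\tcontig_1\tgeneX\tctg1_886\ncontig_2\tcontig_2\tgeneY\tctg1_886\n' > "$workdir/file1.txt"
printf 'contig_1\tcontig_1\tgeneA\tctg1_886\n' > "$workdir/file2.txt"

(
    cd "$workdir" || exit 1
    for f in file*.txt; do
        # ${f%.txt} removes a trailing ".txt" from the file name.
        # The unquoted $(...) is intentional: word splitting collapses the
        # newline-separated column into space-separated words on one line.
        echo "${f%.txt}" $(cut -f 3 "$f")
    done > summary.txt
)

cat "$workdir/summary.txt"
```

Since the command substitution relies on word splitting, this only works as intended when the values in the third column contain no whitespace or glob characters.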
