How to combine all files in a directory, adding their individual file names as a new column in the final merged file
I have a directory with files that look like this:
CCG02-215-WGS.format.flt.txt
CCG05-707-WGS.format.flt.txt
CCG06-203-WGS.format.flt.txt
CCG04-967-WGS.format.flt.txt
CCG05-710-WGS.format.flt.txt
CCG06-215-WGS.format.flt.txt
The contents of each file look like this:
1 9061390 14 93246140
1 58631131 2 31823410
1 108952511 3 110694548
1 168056494 19 23850376
etc...
The ideal output would be a single file, let's call it all-samples.format.flt.txt, containing the concatenation of all files plus an additional column showing which sample/file each row came from (with some minor formatting to remove the .format.flt.txt suffix):
1 9061390 14 93246140 CCG02-215-WGS
...
1 58631131 2 31823410 CCG05-707-WGS
...
1 108952511 3 110694548 CCG06-203-WGS
...
1 168056494 19 23850376 CCG04-967-WGS
Currently, I have the following code, which works for individual files.
awk 'BEGIN{OFS="\t"; split(ARGV[1],f,".")}{print $1,$2,$3,$4,f[1]}' CCG05-707-WGS.format.flt.txt
#OUTPUT
1 58631131 2 31823410 CCG05-707-WGS
...
However, when I try to apply it to all files using the star (*), it adds the first filename it finds to the rows of every file as the 5th column.
awk 'BEGIN{OFS="\t"; split(ARGV[1],f,".")}{print $1,$2,$3,$4,f[1]}' *
#OUTPUT, 5th column should be as seen in previous code block
1 9061390 14 93246140 CCG02-215-WGS
...
1 58631131 2 31823410 CCG02-215-WGS
...
1 108952511 3 110694548 CCG02-215-WGS
...
1 168056494 19 23850376 CCG02-215-WGS
I feel like the solution may just lie in adding an additional parameter to awk... but I'm not sure where to start.
Thanks!
UPDATE
Using awk's built-in FILENAME variable solved the issue, together with some elegant formatting logic for the file names.
Thanks, @RavinderSingh13!
awk 'BEGIN{OFS="\t"} FNR==1{file=FILENAME;sub(/\..*/,"",file)} {print $0,file}' *.txt
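For reference, the original attempt fails because a BEGIN block runs exactly once, before any input is read, so ARGV[1] is split a single time and f[1] never changes. FILENAME, by contrast, is updated by awk each time it moves on to the next input file. The following is a minimal sketch of that difference, using two of the sample file names from the question inside a scratch directory (the scratch directory and the variable names are illustrative assumptions, not part of the original post):

```shell
# Create a scratch directory with two small sample files.
tmpdir=$(mktemp -d)
printf '1 9061390 14 93246140\n' > "$tmpdir/CCG02-215-WGS.format.flt.txt"
printf '1 58631131 2 31823410\n' > "$tmpdir/CCG05-707-WGS.format.flt.txt"

# For every input line, print ARGV[1] (always the first argument)
# next to FILENAME (the file the current line was read from).
compare=$(cd "$tmpdir" && awk '{print ARGV[1], FILENAME}' *.txt)
printf '%s\n' "$compare"

rm -rf "$tmpdir"
```

The first column stays the same on both lines while the second column follows the file being read, which is why the filename has to be derived from FILENAME inside the main program rather than from ARGV[1] in BEGIN.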
You may use the following, which works in any version of awk:
awk -v OFS='\t' 'FNR==1{split(FILENAME, a, /\./)} {print $0, a[1]}' *.txt
Or in gnu-awk:
awk -v OFS='\t' 'BEGINFILE{split(FILENAME, a, /\./)} {print $0, a[1]}' *.txt
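One caveat, offered as a sketch rather than as part of the original answer: both commands take the text before the first dot in FILENAME, so they assume the files are passed as bare names. If a path is passed instead (for example ./CCG02-215-WGS.format.flt.txt, or a directory whose name contains a dot), the first dot falls inside the directory part. Stripping the directory before splitting avoids this. The directory name data.v2 and the variable names below are hypothetical.

```shell
# Scratch layout: a directory whose name contains a dot (hypothetical).
tmpdir=$(mktemp -d)
mkdir "$tmpdir/data.v2"
printf '1 9061390 14 93246140\n' > "$tmpdir/data.v2/CCG02-215-WGS.format.flt.txt"

labelled=$(awk -v OFS='\t' '
FNR==1 {
    name = FILENAME
    sub(/.*\//, "", name)   # drop everything up to the last slash
    split(name, a, /\./)    # a[1] is the text before the first dot
}
{ print $0, a[1] }' "$tmpdir"/data.v2/*.txt)
printf '%s\n' "$labelled"

rm -rf "$tmpdir"
```

When the glob is run from inside the directory, as in the question, FILENAME has no directory part and the extra sub() call simply does nothing.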
With your shown samples, please try the following awk code. We need to use awk's built-in FILENAME variable here. Whenever the first line of any txt file is read (all txt files are passed to this program), copy FILENAME into file and remove everything from the first . to the end of the value. Then, in the main block, print the current line followed by file (the file's name, trimmed as required).
awk '
BEGIN { OFS="\t" }
FNR==1{
file=FILENAME
sub(/\..*/,"",file)
}
{
print $0,file
}
' *.txt
Or, in one-liner form, try the following awk code:
awk 'BEGIN{OFS="\t"} FNR==1{file=FILENAME;sub(/\..*/,"",file)} {print $0,file}' *.txt
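To produce the all-samples.format.flt.txt file mentioned in the question, the output can be redirected to it. The sketch below makes one assumption that is not in the original answer: it uses the narrower glob CCG*-WGS.format.flt.txt instead of *.txt, because the output file name also ends in .txt and would be matched as an input by *.txt on any later run in the same directory. The scratch directory, the merge function and the variable names are illustrative only.

```shell
orig_dir=$PWD
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Two small sample files, named as in the question.
printf '1 9061390 14 93246140\n' > CCG02-215-WGS.format.flt.txt
printf '1 58631131 2 31823410\n' > CCG05-707-WGS.format.flt.txt

# Merge all sample files, appending the trimmed file name as a tab-separated column.
merge() {
    awk 'BEGIN{OFS="\t"} FNR==1{file=FILENAME;sub(/\..*/,"",file)} {print $0,file}' \
        CCG*-WGS.format.flt.txt > all-samples.format.flt.txt
}

merge
merge   # a second run: the glob does not match the output file, so nothing is duplicated

merged=$(cat all-samples.format.flt.txt)
line_count=$(awk 'END{print NR}' all-samples.format.flt.txt)
printf '%s\n' "$merged"

cd "$orig_dir" && rm -rf "$tmpdir"
```

Writing the merged file to a different directory would be another way to keep it out of the input set.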