Process multiple files and append them in linux/unix

I have over 100 files, each with at least 5-8 tab-separated columns. I need to extract the first three columns from each file, add a fourth column containing some predefined text, and append the results into a single file.

Let's say I have 3 files: file001.txt, file002.txt, file003.txt.

file001.txt:

chr1 1 2 15
chr2 3 4 17

file002.txt:

chr1 1 2 15
chr2 3 4 17

file003.txt:

chr1 1 2 15
chr2 3 4 17

combined_file.txt:

chr1 1 2 f1
chr2 3 4 f1
chr1 1 2 f2
chr2 3 4 f2
chr1 1 2 f3
chr2 3 4 f3

For simplicity I kept the file contents the same. My script is as follows:

#!/bin/bash
for i in {1..3}; do
j=$(printf '%03d' $i)
awk 'BEGIN { OFS="\t"}; {print $1,$2,$3}' file${j}.txt | awk -v k="$j" 'BEGIN {print $0"\t$k”}' | cat >> combined_file.txt
done

But the script gives the following errors:

awk: non-terminated string $k”}... at source line 1
 context is
 <<<
awk: giving up
 source line number 2

Can someone help me figure it out?

You don't need two separate awk scripts. Also, you don't use $ to refer to variables in awk; $ is used to refer to input fields (i.e. $k means the field whose number is stored in the variable k).

for i in {1..3}; do
    j=$(printf '%03d' $i)
    awk -v k="$j" -v OFS='\t' '{print $1, $2, $3, k}' file$j.txt
done > combined_file.txt
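
To see the difference between k and $k concretely, here is a small sketch. The input line and the value of k are made up for illustration; they are not from the question.

```shell
# k is an awk variable set to 2, so $k means "field number 2".
# The input line has three tab-separated fields: chr1, 10, 20.
out=$(printf 'chr1\t10\t20\n' | awk -v k=2 '{ print k, $k }')

# Prints the variable's value followed by the second field.
echo "$out"
```

This is why the corrected script prints k rather than $k: with k set to "001", $k would look up field 1 instead of printing the label.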

As pointed out in the comments, your problem is that you're using a typographic ("curly") quote character as if it were a plain double quote. Once you fix that, though, you don't need a loop or any of the other complexity. All you need is:

$ awk 'BEGIN{FS=OFS="\t"} {$NF="f"ARGIND} 1' file*
chr1    1       2       f1
chr2    3       4       f1
chr1    1       2       f2
chr2    3       4       f2
chr1    1       2       f3
chr2    3       4       f3

The above uses GNU awk for ARGIND.
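
If GNU awk is not available, or the files have more than four columns (the one-liner above overwrites the last field, so it only matches the desired output for four-column input), a possible alternative is to count files by watching FNR reset. This is a minimal sketch, not the answerer's method: the scratch directory, the five-column sample data, and the counter name n are assumptions made for illustration.

```shell
# Work in a scratch directory so nothing existing is touched.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Sample inputs with five tab-separated columns (made-up data).
for n in 001 002 003; do
    printf 'chr1\t1\t2\t15\tx\nchr2\t3\t4\t17\ty\n' > "file$n.txt"
done

# FNR resets to 1 at the start of each input file, so counting those
# resets gives the file's position on the command line.
# Only the first three fields are printed, followed by the label.
awk 'BEGIN { FS = OFS = "\t" }
     FNR == 1 { n++ }
     { print $1, $2, $3, "f" n }' file0*.txt > combined_file.txt

cat combined_file.txt
```

One caveat: an empty input file never triggers FNR == 1, so it would not advance the counter, whereas ARGIND counts every argument.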
