Using awk on multiple input files

I've been working on a bash script, and at one point in it I need to process two CSV files at once with awk to produce several output files. In short, a main file holds the content to be distributed across the output files, and a second file supplies each output file's name and the number of records it should hold. The first n records go to the first output file, records n+1 to n+k go to the second, and so on.

To make this clearer, here's an example of what the main record file might look like:

x11,x21
x12,x22
x13,x23
x14,x24
x15,x25
x16,x26
x17,x27
x18,x28
x19,x29

and what the other file might look like:

out_file_name_1,2
out_file_name_2,3
out_file_name_3,4

Then the first output file, named out_file_name_1, should look like:

x11,x21
x12,x22

The second output file, named out_file_name_2, should look like:

x13,x23
x14,x24
x15,x25

And the last one should look like:

x16,x26
x17,x27
x18,x28
x19,x29

Hopefully that is clear enough.

Here's a solution in awk since you asked, but clearly triplee's answer is the nicer approach.

$ cat oak.awk
BEGIN { FS = ","; fidx = 1 }

# Processing files.txt, init parallel arrays with filename and number of records
# to print to each one.
NR == FNR {
    file[NR] = $1
    records[NR] = $2
    next
}

# Processing main.txt. Print record to current file. Decrement number of records to print,
# advancing to the next file when number of records to print reaches 0
fidx in file && records[fidx] > 0 {
    print > file[fidx]
    if (! --records[fidx]) ++fidx
    next
}

# If we get here, either we ran out of files before reading all the records
# or a file was specified to contain zero records. Report the error on stderr.
{ print "Error: Insufficient number of files or file with non-positive number of records" > "/dev/stderr"
  exit 1 }


$ cat files.txt
out_file_name_1,2
out_file_name_2,3
out_file_name_3,4

$ cat main.txt
x11,x21
x12,x22
x13,x23
x14,x24
x15,x25
x16,x26
x17,x27
x18,x28
x19,x29

$ awk -f oak.awk files.txt main.txt

$ cat out_file_name_1
x11,x21
x12,x22

$ cat out_file_name_2
x13,x23
x14,x24
x15,x25

$ cat out_file_name_3
x16,x26
x17,x27
x18,x28
x19,x29
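
Since the question is about a bash script, the same idea can also be embedded inline rather than kept in a separate .awk file. The following is only a sketch under a few assumptions: it reuses the files.txt and main.txt names from above, recreates the sample data so it runs on its own, and uses variable names (name, left, idx, n) that are illustrative rather than taken from oak.awk. It also calls close() on each finished output file, an addition that may help when there are many output files, and it silently skips entries with a zero count instead of treating them as an error.

```shell
#!/usr/bin/env bash
# Recreate the sample inputs from the question (file names as in the answer above).
printf '%s\n' out_file_name_1,2 out_file_name_2,3 out_file_name_3,4 >files.txt
for i in 1 2 3 4 5 6 7 8 9; do printf 'x1%s,x2%s\n' "$i" "$i"; done >main.txt

awk -F, '
    BEGIN { idx = 1 }

    # First file (files.txt): remember each output name and its record count.
    NR == FNR { name[NR] = $1; left[NR] = $2; n = NR; next }

    # Second file (main.txt): move past exhausted entries, closing each one.
    {
        while (idx <= n && left[idx] <= 0) {
            close(name[idx])
            idx++
        }
        if (idx > n) {
            print "Error: more records than the output files can hold" > "/dev/stderr"
            exit 1
        }
        print > name[idx]
        left[idx]--
    }
' files.txt main.txt
```

The NR == FNR test is true only while awk is reading the first file named on the command line, which is why files.txt must come before main.txt.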

I wouldn't use Awk for this.

while IFS=, read -u 3 filename lines; do
    head -n "$lines" >"$filename"
done 3<other.csv <main.csv

Using read -u to read from a particular file descriptor is not completely portable, I believe, but your question is tagged bash, so I am assuming that is not a problem here.

Demo: http://ideone.com/6FisHT

If you end up with empty files after the first, head is probably consuming more input than the lines it prints, leaving nothing for the later iterations. In that case, try replacing the head call with an inner loop of read statements.

while IFS=, read -r -u 3 filename lines; do
    for i in $(seq 1 "$lines"); do
        # IFS= and -r keep leading whitespace and backslashes intact
        IFS= read -r line
        printf '%s\n' "$line"
    done >"$filename"
done 3<other.csv <main.csv
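
If the portability of read -u is a concern, redirecting the read itself from the descriptor with <&3 should work in any POSIX shell. The following is a sketch along those lines rather than a tested replacement: it assumes the same other.csv and main.csv names, recreates the sample data so it runs on its own, and swaps seq for a plain counter so that it does not depend on an external command.

```shell
#!/bin/sh
# Recreate the sample inputs from the question (file names as used above).
printf '%s\n' out_file_name_1,2 out_file_name_2,3 out_file_name_3,4 >other.csv
for i in 1 2 3 4 5 6 7 8 9; do printf 'x1%s,x2%s\n' "$i" "$i"; done >main.csv

# Names and counts come from descriptor 3; records come from standard input.
while IFS=, read -r filename lines <&3; do
    i=0
    # Check the counter first so no record is consumed once the quota is met.
    while [ "$i" -lt "$lines" ] && IFS= read -r line; do
        printf '%s\n' "$line"
        i=$((i + 1))
    done >"$filename"
done 3<other.csv <main.csv
```

Because the inner loop stops as soon as main.csv runs out, a count larger than the remaining records simply yields a shorter file.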
