
Using awk on multiple input files

I've been working on a bash script in which I need to process two CSV files at once using awk to produce several output files. In short, there is a main file holding the records to be dispatched to the output files; the names of those output files, and the number of records each should hold, come from a second file. The first n records go to the first output file, the next k records (n+1 to n+k) go to the second one, and so on.

To be clearer, here's an example of how the main record file might look:

x11,x21
x12,x22
x13,x23
x14,x24
x15,x25
x16,x26
x17,x27
x18,x28
x19,x29

and here is how the other file might look:

out_file_name_1,2
out_file_name_2,3
out_file_name_3,4

Then the first output file, named out_file_name_1, should look like:

x11,x21
x12,x22

The second output file, named out_file_name_2, should look like:

x13,x23
x14,x24
x15,x25

And the last one should look like:

x16,x26
x17,x27
x18,x28
x19,x29

Hopefully it is clear enough.

Here's a solution in awk since you asked, but clearly triplee's answer is the nicer approach.

$ cat oak.awk
BEGIN { FS = ","; fidx = 1 }

# Processing files.txt, init parallel arrays with filename and number of records
# to print to each one.
NR == FNR {
    file[NR] = $1
    records[NR] = $2
    next
}

# Processing main.txt. Print record to current file. Decrement number of records to print,
# advancing to the next file when number of records to print reaches 0
fidx in file && records[fidx] > 0 {
    print > file[fidx]
    if (! --records[fidx]) ++fidx
    next
}

# If we get here, either we ran out of files before reading all the records
# or a file was specified to contain zero records.
# The message goes to stderr so it does not mix with normal output.
{ print "Error: Insufficient number of files or file with non-positive number of records" > "/dev/stderr"
  exit 1 }


$ cat files.txt
out_file_name_1,2
out_file_name_2,3
out_file_name_3,4

$ cat main.txt
x11,x21
x12,x22
x13,x23
x14,x24
x15,x25
x16,x26
x17,x27
x18,x28
x19,x29

$ awk -f oak.awk files.txt main.txt

$ cat out_file_name_1
x11,x21
x12,x22

$ cat out_file_name_2
x13,x23
x14,x24
x15,x25

$ cat out_file_name_3
x16,x26
x17,x27
x18,x28
x19,x29
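
As a variation on the awk approach, here is a minimal sketch that closes each output file as soon as it is full, which may matter if the list of output files is long enough to hit the limit on simultaneously open files. The temporary directory and the array name left are assumptions made for this illustration, and writing to "/dev/stderr" assumes an awk (or system) that supports that path.

```shell
#!/bin/sh
# Work in a scratch directory so the demo does not touch existing files.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample inputs, same data as in the question.
printf '%s\n' out_file_name_1,2 out_file_name_2,3 out_file_name_3,4 > files.txt
printf '%s\n' x11,x21 x12,x22 x13,x23 x14,x24 x15,x25 \
              x16,x26 x17,x27 x18,x28 x19,x29 > main.txt

awk -F, '
    BEGIN { fidx = 1 }

    # First input (files.txt): remember each output name and record count.
    NR == FNR { file[NR] = $1; left[NR] = $2; next }

    # Second input (main.txt): stop if no output file can take this record.
    !(fidx in file) || left[fidx] <= 0 {
        print "Error: no output file available for record " FNR > "/dev/stderr"
        exit 1
    }

    # Write the record; once the current file is full, close it and move on.
    {
        print > file[fidx]
        if (--left[fidx] == 0) { close(file[fidx]); fidx++ }
    }
' files.txt main.txt
```

The splitting logic is the same as in oak.awk; only the close() call and the error handling differ.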

I wouldn't use Awk for this.

while IFS=, read -r -u 3 filename lines; do
    head -n "$lines" >"$filename"
done 3<other.csv <main.csv

Using read -u to read from a particular file descriptor is not completely portable, I believe, but your question is tagged bash, so I am assuming that is not a problem here.

Demo: http://ideone.com/6FisHT

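If you end up with empty files after the first (which can happen when head reads ahead and consumes more of the input than it prints), try replacing head with an inner loop of read statements.

while IFS=, read -r -u 3 filename lines; do
    for i in $(seq 1 "$lines"); do
        # IFS= and -r keep leading whitespace and backslashes intact
        IFS= read -r line
        printf '%s\n' "$line"
    done >"$filename"
done 3<other.csv <main.csv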
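
If portability beyond bash matters, the same loop can be sketched in plain POSIX sh by redirecting descriptor 3 onto read's standard input with <&3 instead of using read -u, and by counting with a while loop instead of seq. This is only an illustration under those assumptions; the scratch directory and sample data are set up so the snippet runs on its own (mktemp -d is widely available but not itself required by POSIX).

```shell
#!/bin/sh
# Work in a scratch directory so the demo does not touch existing files.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample inputs, same data as in the question.
printf '%s\n' out_file_name_1,2 out_file_name_2,3 out_file_name_3,4 > other.csv
printf '%s\n' x11,x21 x12,x22 x13,x23 x14,x24 x15,x25 \
              x16,x26 x17,x27 x18,x28 x19,x29 > main.csv

# The outer read takes its input from descriptor 3 (other.csv);
# the inner read uses standard input (main.csv).
while IFS=, read -r filename lines <&3; do
    i=0
    while [ "$i" -lt "$lines" ]; do
        # Stop filling this file if main.csv runs out of records.
        IFS= read -r line || break
        printf '%s\n' "$line"
        i=$((i + 1))
    done > "$filename"
done 3< other.csv < main.csv
```

Because every record is consumed by read itself, no external command can read ahead and swallow lines meant for a later output file.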
