
Detect column separators in awk

I'm trying to split a single *.txt file into several files using awk. The input has the following format:

inline  xline   X    Y     Horizon  Time    
1       159  806313 939258 KF2      0.80
....
81      149  805004 948030 FallRiver 0.85965
....
243     146  804252 965837 TensleepBbase 1.1862

In this case the separator is the value in the fifth column (KF2, FallRiver, TensleepBbase). My idea is to iterate over the lines and start a new file whenever the value of the fifth column changes, but I don't know how to structure the algorithm in awk.

The expected result is three .txt files, one for each Horizon keyword:

File1.txt

inline  xline   X    Y     Horizon  Time    
1       159  806313 939258 KF2      0.80
...
end of KF2 Horizon keyword

File2.txt

inline  xline   X    Y     Horizon  Time    
81      149  805004 948030 FallRiver 0.85965
...
end of FallRiver Horizon keyword

....

Thank you.

Using this input file,

inline  xline   X    Y     Horizon  Time    
1       159  806313 939258 KF2      0.80
2       9  806313 939258 KF2      0.80
3       59  806313 939258 KF2      0.80
81      149  805004 948030 FallRiver 0.85965
82      345  5678   948030 FallRiver 0.85965
243     146  804252 965837 TensleepBbase 1.1862

I would do this:

awk 'NR==1 { hdr=$0;next}            # Pick up column headers, and avoid other processing
           { hrz=$5;                 # Save current horizon
             if(hrz!=oldhrz){        # Check if horizon has changed
                if(length(oldhrz)>0)print "End of ",oldhrz > file
                file=++f ".txt"      # Work out name of output file
                print hdr > file     # Print column headers to new file
                oldhrz=hrz           # Remember which is the current horizon
             } 
             print > file
           }
     END   { print "End of ",hrz > file}' input.txt

Output

1.txt

inline  xline   X    Y     Horizon  Time
1       159  806313 939258 KF2      0.80
2       9  806313 939258 KF2      0.80
3       59  806313 939258 KF2      0.80
End of  KF2

2.txt

inline  xline   X    Y     Horizon  Time
81      149  805004 948030 FallRiver 0.85965
82      345  5678   948030 FallRiver 0.85965
End of  FallRiver

3.txt

inline  xline   X    Y     Horizon  Time
243     146  804252 965837 TensleepBbase 1.1862
End of  TensleepBbase

Ignoring the header for a moment, this is a typical awk use case:

awk '{print > $5}' infile

To skip the header line:

awk 'NR>1{print > $5}' infile

The output files will be missing the header, though. To handle the header:

awk 'NR==1{header=$0;next} !k[$5]++{print header > $5}  {print >> $5}' infile

This saves the header, creates one file per unique value of column 5 with the header as its first line, and appends each data line to the corresponding file.

If you want to use FileX.txt as the file names instead of the field values, you can map each value to a file name as well:

awk 'NR==1{header=$0;next} !k[$5]++{f[$5]="File"++i".txt"; print header > f[$5]}  {print >> f[$5]}' infile

Finally, the footer can be added in the END block by iterating over all unique entries:

awk 'NR==1{header=$0;next} !k[$5]++{f[$5]="File"++i".txt"; print header > f[$5]}  {print >> f[$5]} END{for(t in f) print "End of "t" Horizon Keyword" >> f[t]}' infile

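As suggested in the comments, you can replace >> with >. In awk, > truncates the file only the first time it is opened during a run; later prints to the same file name in that run append to it.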
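How awk's > redirection behaves within one run and across runs can be checked with a small experiment. This is only an illustrative sketch: demo.txt, first_run.txt and the scratch directory are assumed names that do not appear in the answer.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
cd "$(mktemp -d)" || exit 1

# First run: three records are printed to the same file with ">".
# The file is truncated once, when awk opens it, and then all three lines land in it.
printf 'a\nb\nc\n' | awk '{ print > "demo.txt" }'
cp demo.txt first_run.txt

# Second run: a new awk process opens demo.txt with ">" again,
# so the contents from the first run are replaced.
printf 'd\n' | awk '{ print > "demo.txt" }'

cat first_run.txt
cat demo.txt
```

With >> the second run would have appended to the existing file instead, which matters if the script is run more than once in the same directory.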

It SOUNDS like all you need is:

awk '
NR==1 { hdr=$0; fldNr=5; fldName=$fldNr; next }
$fldNr != prev {
    if (out) {
        print "end of", prev, fldName, "keyword" > out
    }
    out="File" ++cnt ".txt"
    print hdr > out
    prev=$fldNr
}
{ print > out }
END { print "end of", prev, fldName, "keyword" > out }
' file

but without testable sample input/output it's an untested guess.
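One way to try this approach is to run it against the sample input from the earlier answer. The sketch below does that under a few assumptions: input.txt is a made-up file name, the rows are already grouped by Horizon, and close(out) is an added call (not in the script above) that releases each finished file, which may help on awk implementations that limit the number of simultaneously open files.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
cd "$(mktemp -d)" || exit 1

# Sample input copied from the earlier answer; input.txt is an assumed name.
cat > input.txt <<'EOF'
inline  xline   X    Y     Horizon  Time
1       159  806313 939258 KF2      0.80
2       9  806313 939258 KF2      0.80
3       59  806313 939258 KF2      0.80
81      149  805004 948030 FallRiver 0.85965
82      345  5678   948030 FallRiver 0.85965
243     146  804252 965837 TensleepBbase 1.1862
EOF

awk '
NR == 1 { hdr = $0; fldNr = 5; fldName = $fldNr; next }  # save header and key column name
$fldNr != prev {                                         # key changed: start a new file
    if (out != "") {
        print "end of", prev, fldName, "keyword" > out   # footer for the finished group
        close(out)                                       # assumed addition: release the file
    }
    out = "File" (++cnt) ".txt"                          # File1.txt, File2.txt, ...
    print hdr > out
    prev = $fldNr
}
{ print > out }                                          # data line goes to the current file
END { if (out != "") print "end of", prev, fldName, "keyword" > out }
' input.txt

cat File1.txt
```

Because the file names come from a counter, a Horizon value that reappears later in ungrouped input would get a second file rather than being merged with the first; the array-based answer above is the one that handles that case.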
