I'm trying to split an initial *.txt file into several files using awk. The input has the following format:
inline xline X Y Horizon Time
1 159 806313 939258 KF2 0.80
....
81 149 805004 948030 FallRiver 0.85965
....
243 146 804252 965837 TensleepBbase 1.1862
In this case my separator is the fifth column (KF2, FallRiver, TensleepBbase). My idea is to iterate over the lines and start a new file whenever the value of the fifth column changes, but I don't know how to structure the algorithm in AWK.
The expected result is 3 txt files, one for each Horizon keyword:
File1.txt
inline xline X Y Horizon Time
1 159 806313 939258 KF2 0.80
...
end of KF2 Horizon keyword
File2.txt
inline xline X Y Horizon Time
81 149 805004 948030 FallRiver 0.85965
...
end of FallRiver Horizon keyword
....
Thank you.
Using this input file,
inline xline X Y Horizon Time
1 159 806313 939258 KF2 0.80
2 9 806313 939258 KF2 0.80
3 59 806313 939258 KF2 0.80
81 149 805004 948030 FallRiver 0.85965
82 345 5678 948030 FallRiver 0.85965
243 146 804252 965837 TensleepBbase 1.1862
I would do this:
awk 'NR==1 { hdr=$0; next }                # Pick up column headers, and avoid other processing
{
    hrz=$5                                 # Save current horizon
    if (hrz != oldhrz) {                   # Check if horizon has changed
        if (length(oldhrz) > 0) print "End of", oldhrz > file   # Footer for previous horizon
        file=++f ".txt"                    # Work out name of output file
        print hdr > file                   # Print column headers to new file
        oldhrz=hrz                         # Remember which is the current horizon
    }
    print > file                           # Write the current row to the current file
}
END { print "End of", hrz > file }' input.txt
Output
1.txt
inline xline X Y Horizon Time
1 159 806313 939258 KF2 0.80
2 9 806313 939258 KF2 0.80
3 59 806313 939258 KF2 0.80
End of KF2
2.txt
inline xline X Y Horizon Time
81 149 805004 948030 FallRiver 0.85965
82 345 5678 948030 FallRiver 0.85965
End of FallRiver
3.txt
inline xline X Y Horizon Time
243 146 804252 965837 TensleepBbase 1.1862
End of TensleepBbase
Ignoring the header for a moment, the typical awk use case is:
awk '{print > $5}' infile
To skip the header line:
awk 'NR>1{print > $5}' infile
The output files will be missing the header, though. To handle headers:
awk 'NR==1{header=$0;next} !k[$5]++{print header > $5} {print >> $5}' infile
This saves the header, creates one file per distinct column 5 value (writing the header first), and appends each row to the corresponding file.
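The `!k[$5]++` pattern is what makes the header appear only once per file: the counter for a key is 0 (false) the first time that key is seen, so the negation is true only on the first row of each group. A minimal sketch of the idiom on its own, using a few rows from the sample input; the array name `seen` and the shell variable `first_rows` are chosen here purely for illustration:

```shell
# Report the first row of each distinct value in column 5.
# seen[$5]++ evaluates to 0 the first time a key appears, so !seen[$5]++ is true only then.
first_rows=$(awk 'NR > 1 && !seen[$5]++ { print "first row for " $5 ": line " NR }' <<'EOF'
inline xline X Y Horizon Time
1 159 806313 939258 KF2 0.80
2 9 806313 939258 KF2 0.80
81 149 805004 948030 FallRiver 0.85965
82 345 5678 948030 FallRiver 0.85965
243 146 804252 965837 TensleepBbase 1.1862
EOF
)
printf '%s\n' "$first_rows"
```

This prints one line per horizon (input lines 2, 4 and 6), which are exactly the points where the one-liner writes the header.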
If you want to use FileX.txt as filenames instead of the field values, you can map them as well:
awk 'NR==1{header=$0;next} !k[$5]++{f[$5]="File"++i".txt"; print header > f[$5]} {print >> f[$5]}' infile
Finally, the footer can be added in the END statement by iterating over all unique entries:
awk 'NR==1{header=$0;next} !k[$5]++{f[$5]="File"++i".txt"; print header > f[$5]} {print >> f[$5]} END{for(t in f) print "End of "t" Horizon Keyword" >> f[t]}' infile
As suggested in the comments, you can replace >> with >.
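The reason `>` can stand in for `>>` is that awk's `>` behaves differently from the shell's: the file is truncated only when awk first opens it, and later prints to the same name within the same run are appended. A small sketch to confirm this, run in a scratch directory (the file name `demo.txt` is just an example):

```shell
# Work in a throwaway directory so nothing existing is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Pre-existing content, to show that the first ">" does truncate.
printf 'stale content\n' > demo.txt

# Both prints use ">", yet the second one does not erase the first.
awk 'BEGIN { print "first" > "demo.txt"; print "second" > "demo.txt" }'

cat demo.txt
```

The file ends up containing `first` and `second`, without the stale line. The practical difference is therefore only on the first write: `>>` keeps whatever was left in the file by an earlier run, while `>` starts clean.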
It SOUNDS like all you need is:
awk '
    NR==1 { hdr=$0; fldNr=5; fldName=$fldNr; next }
    $fldNr != prev {
        if (out) {
            print "end of", prev, fldName, "keyword" > out
        }
        out="File" ++cnt ".txt"
        print hdr > out
        prev=$fldNr
    }
    { print > out }
    END { print "end of", prev, fldName, "keyword" > out }
' file
but without testable sample input/output it's an untested guess.
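One way to check the script is to run it against the seven-line sample input posted in the first answer. The sketch below does that in a scratch directory; the input file name `input.txt` and the use of `mktemp` are assumptions made for the test, and the awk program is the one above, unchanged:

```shell
# Work in a throwaway directory so the generated File*.txt do not clutter anything.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample input taken from the first answer.
cat > input.txt <<'EOF'
inline xline X Y Horizon Time
1 159 806313 939258 KF2 0.80
2 9 806313 939258 KF2 0.80
3 59 806313 939258 KF2 0.80
81 149 805004 948030 FallRiver 0.85965
82 345 5678 948030 FallRiver 0.85965
243 146 804252 965837 TensleepBbase 1.1862
EOF

awk '
    NR==1 { hdr=$0; fldNr=5; fldName=$fldNr; next }   # header: remember it and the column name
    $fldNr != prev {                                  # key column changed: start a new file
        if (out) {
            print "end of", prev, fldName, "keyword" > out
        }
        out="File" ++cnt ".txt"
        print hdr > out
        prev=$fldNr
    }
    { print > out }                                   # every data row goes to the current file
    END { print "end of", prev, fldName, "keyword" > out }
' input.txt

# Show each generated file with its name.
for f in File1.txt File2.txt File3.txt; do
    echo "== $f"
    cat "$f"
done
```

With this input the run produces File1.txt, File2.txt and File3.txt, each starting with the header line and ending with a footer such as `end of KF2 Horizon keyword`, which matches the layout requested in the question.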