Detect column separators in awk
I'm trying to split an initial *.txt file into several files using awk. The file has the following format:
inline xline X Y Horizon Time
1 159 806313 939258 KF2 0.80
....
81 149 805004 948030 FallRiver 0.85965
....
243 146 804252 965837 TensleepBbase 1.1862
In this case my separator is the fifth column (KF2, FallRiver, TensleepBbase). My idea is to iterate and break the loop when the value of the fifth column changes, but I don't know how to structure the algorithm in awk.
The expected result is 3 txt files, one for each Horizon keyword:
File1.txt
inline xline X Y Horizon Time
1 159 806313 939258 KF2 0.80
...
end of KF2 Horizon keyword
File2.txt
inline xline X Y Horizon Time
81 149 805004 948030 FallRiver 0.85965
...
end of FallRiver Horizon keyword
....
Thank you.
Using this input file,
inline xline X Y Horizon Time
1 159 806313 939258 KF2 0.80
2 9 806313 939258 KF2 0.80
3 59 806313 939258 KF2 0.80
81 149 805004 948030 FallRiver 0.85965
82 345 5678 948030 FallRiver 0.85965
243 146 804252 965837 TensleepBbase 1.1862
I would do this:
awk 'NR==1 { hdr=$0; next }        # Pick up column headers, and avoid other processing
     {
       hrz=$5                      # Save current horizon
       if (hrz != oldhrz) {        # Check if horizon has changed
         if (length(oldhrz) > 0) print "End of", oldhrz > file   # Footer for the previous file
         file=++f ".txt"           # Work out name of output file
         print hdr > file          # Print column headers to new file
         oldhrz=hrz                # Remember which is the current horizon
       }
       print > file                # Write the current row to the current file
     }
     END { print "End of", hrz > file }' input.txt
Output
1.txt
inline xline X Y Horizon Time
1 159 806313 939258 KF2 0.80
2 9 806313 939258 KF2 0.80
3 59 806313 939258 KF2 0.80
End of KF2
2.txt
inline xline X Y Horizon Time
81 149 805004 948030 FallRiver 0.85965
82 345 5678 948030 FallRiver 0.85965
End of FallRiver
3.txt
inline xline X Y Horizon Time
243 146 804252 965837 TensleepBbase 1.1862
End of TensleepBbase
Ignoring the header for the moment, the typical awk use case is:
awk '{print > $5}' infile
To skip the header line:
awk 'NR>1{print > $5}' infile
The output files will be missing the header, though. To handle headers:
awk 'NR==1{header=$0;next} !k[$5]++{print header > $5} {print >> $5}' infile
It grabs the header, creates one file per unique column 5 value (writing the header to it first), and appends each line to the corresponding file.
If you want to use FileX.txt as filenames instead of the field values, you can map them as well:
awk 'NR==1{header=$0;next} !k[$5]++{f[$5]="File"++i".txt"; print header > f[$5]} {print >> f[$5]}' infile
Finally, the footer can be added in the END block by iterating over all unique entries:
awk 'NR==1{header=$0;next} !k[$5]++{f[$5]="File"++i".txt"; print header > f[$5]} {print >> f[$5]} END{for(t in f) print "End of "t" Horizon Keyword" >> f[t]}' infile
As suggested in the comments, you can replace >> with >. Within a single awk run, > truncates a file only when it is first opened; later prints to the same file are appended.
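A small demonstration of why that substitution is safe may help. This is a minimal sketch, not part of the original answer: the scratch directory and the file names out_trunc.txt and out_append.txt are made up for illustration. Both files start with a stale line, and only the > version discards it.

```shell
#!/bin/sh
# Work in a scratch directory (hypothetical location, created just for this demo).
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Pre-existing content in both files.
printf 'stale\n' > out_trunc.txt
printf 'stale\n' > out_append.txt

# Three prints with `>`: awk truncates the file once, when it first opens it,
# and then keeps writing to the same open file.
printf 'a\nb\nc\n' | awk '{ print > "out_trunc.txt" }'

# Three prints with `>>`: awk never truncates, so the stale line survives.
printf 'a\nb\nc\n' | awk '{ print >> "out_append.txt" }'

echo "== out_trunc.txt";  cat out_trunc.txt
echo "== out_append.txt"; cat out_append.txt
```

The practical difference is therefore only what happens to files left over from a previous run: > starts each output file fresh, while >> keeps adding to whatever is already there.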
It SOUNDS like all you need is:
awk '
NR==1 { hdr=$0; fldNr=5; fldName=$fldNr; next }
$fldNr != prev {
    if (out) {
        print "end of", prev, fldName, "keyword" > out
    }
    out="File" ++cnt ".txt"
    print hdr > out
    prev=$fldNr
}
{ print > out }
END { print "end of", prev, fldName, "keyword" > out }
' file
but without testable sample input/output it's an untested guess.
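One way to check the script is to run it on the sample input posted in the earlier answer in this thread. The following is a minimal sketch under that assumption; the scratch directory, the input name file, and the parentheses around ++cnt (added only to make the concatenation unambiguous) are choices made here, not part of the original answer.

```shell
#!/bin/sh
# Work in a scratch directory so the generated files do not clutter anything.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Sample input copied from the earlier answer.
cat > file <<'EOF'
inline xline X Y Horizon Time
1 159 806313 939258 KF2 0.80
2 9 806313 939258 KF2 0.80
3 59 806313 939258 KF2 0.80
81 149 805004 948030 FallRiver 0.85965
82 345 5678 948030 FallRiver 0.85965
243 146 804252 965837 TensleepBbase 1.1862
EOF

awk '
NR==1 { hdr=$0; fldNr=5; fldName=$fldNr; next }   # remember header and key column name
$fldNr != prev {                                  # key value changed: start a new file
    if (out) {
        print "end of", prev, fldName, "keyword" > out   # footer for the previous file
    }
    out = "File" (++cnt) ".txt"
    print hdr > out
    prev = $fldNr
}
{ print > out }                                   # copy the current row
END { print "end of", prev, fldName, "keyword" > out }
' file

# Show each generated file under a banner with its name.
for f in File1.txt File2.txt File3.txt; do
    echo "== $f"
    cat "$f"
done
```

Note that this approach starts a new file every time the fifth column changes, so it relies on rows with the same Horizon value being contiguous, as they are in the sample.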