Detect column separators in awk

I'm trying to split an initial *.txt file into several files using awk. The file has the following format:

inline  xline   X    Y     Horizon  Time    
1       159  806313 939258 KF2      0.80
....
81      149  805004 948030 FallRiver 0.85965
....
243     146  804252 965837 TensleepBbase 1.1862

In this case my separator is the fifth column (KF2, FallRiver, TensleepBbase). My idea is to iterate over the lines and break the loop when the value of the fifth column changes, but I don't know how to structure the algorithm in awk.
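The change-detection idea can be tried on its own before any files are written. This is a minimal sketch, assuming whitespace-separated fields and a single header line; `sample.txt` and `changes.txt` are hypothetical file names. It prints the line number at which each new column 5 value starts.

```shell
#!/bin/sh
# Work in a scratch directory so the example files do not clutter anything.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input in the question's format.
cat > sample.txt <<'EOF'
inline  xline   X    Y     Horizon  Time
1       159  806313 939258 KF2      0.80
2       9  806313 939258 KF2      0.80
81      149  805004 948030 FallRiver 0.85965
243     146  804252 965837 TensleepBbase 1.1862
EOF

# Skip the header (NR > 1). "prev" is empty at the start, so the first
# data line always counts as a change; after that a line is reported only
# when its 5th field differs from the previous line's 5th field.
awk 'NR > 1 && $5 != prev { print NR ": " $5; prev = $5 }' sample.txt > changes.txt
cat changes.txt
```

Each reported line is where the loop would close the current output file and start the next one.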

The expected result is three txt files, one for each Horizon keyword:

File1.txt

inline  xline   X    Y     Horizon  Time    
1       159  806313 939258 KF2      0.80
...
end of KF2 Horizon keyword

File2.txt

inline  xline   X    Y     Horizon  Time    
81      149  805004 948030 FallRiver 0.85965
...
end of FallRiver Horizon keyword

....

Thank you.

Using this input file:

inline  xline   X    Y     Horizon  Time    
1       159  806313 939258 KF2      0.80
2       9  806313 939258 KF2      0.80
3       59  806313 939258 KF2      0.80
81      149  805004 948030 FallRiver 0.85965
82      345  5678   948030 FallRiver 0.85965
243     146  804252 965837 TensleepBbase 1.1862

I would do this:

awk 'NR==1 { hdr=$0;next}            # Pick up column headers, and avoid other processing
           { hrz=$5;                 # Save current horizon
             if(hrz!=oldhrz){        # Check if horizon has changed
                if(length(oldhrz)>0)print "End of ",oldhrz > file
                file=++f ".txt"      # Work out name of output file
                print hdr > file     # Print column headers to new file
                oldhrz=hrz           # Remember which is the current horizon
             } 
             print > file
           }
     END   { print "End of ",hrz > file}' input.txt

Output:

1.txt

inline  xline   X    Y     Horizon  Time
1       159  806313 939258 KF2      0.80
2       9  806313 939258 KF2      0.80
3       59  806313 939258 KF2      0.80
End of  KF2

2.txt

inline  xline   X    Y     Horizon  Time
81      149  805004 948030 FallRiver 0.85965
82      345  5678   948030 FallRiver 0.85965
End of  FallRiver

3.txt

inline  xline   X    Y     Horizon  Time
243     146  804252 965837 TensleepBbase 1.1862
End of  TensleepBbase

Ignoring the header, this is the typical awk use case:

awk '{print > $5}' infile

To skip the header line:

awk 'NR>1{print > $5}' infile

The output files will be missing the header, though. To handle the header:

awk 'NR==1{header=$0;next} !k[$5]++{print header > $5}  {print >> $5}' infile

It grabs the header, creates one file per unique column 5 value starting with that header, and appends the remaining lines to the corresponding files.
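The `!k[$5]++` pattern is what limits the header to once per file: `k[$5]++` returns the old count (0, i.e. false, the first time a value is seen) and then increments it, so the negated test is true only on the first occurrence. A minimal sketch on hypothetical input (the file names `infile` and `horizons.txt` are assumptions) that lists each horizon once, in order of first appearance:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical input; note that FallRiver appears again after TensleepBbase.
cat > infile <<'EOF'
inline  xline   X    Y     Horizon  Time
1       159  806313 939258 KF2      0.80
81      149  805004 948030 FallRiver 0.85965
243     146  804252 965837 TensleepBbase 1.1862
82      345  5678   948030 FallRiver 0.85965
EOF

# k[$5]++ yields the previous count (0 on first sight) and then increments,
# so !k[$5]++ is true exactly once per distinct value of column 5.
awk 'NR > 1 && !k[$5]++ { print $5 }' infile > horizons.txt
cat horizons.txt
```

Because the array remembers every value seen so far, this approach also copes with input that is not grouped by horizon, whereas detecting a change from the previous line would start a second file for the repeated FallRiver.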

If you want to use FileX.txt as file names instead of the field values, you can map them as well:

awk 'NR==1{header=$0;next} !k[$5]++{f[$5]="File" (++i) ".txt"; print header > f[$5]}  {print >> f[$5]}' infile
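Conversely, to keep the horizon name as the file name and only add an extension, the name can be built by concatenation. In that case it is safest to wrap the target in parentheses, since an unparenthesized concatenation after `>` (as in `print > $5 ".txt"`) is not portable across awk implementations. A minimal sketch on hypothetical input:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical input in the question's format.
cat > infile <<'EOF'
inline  xline   X    Y     Horizon  Time
1       159  806313 939258 KF2      0.80
81      149  805004 948030 FallRiver 0.85965
82      345  5678   948030 FallRiver 0.85965
EOF

# Parenthesize the concatenated target so every awk sees one file name.
awk 'NR == 1 { header = $0; next }
     !k[$5]++ { print header > ($5 ".txt") }
     { print > ($5 ".txt") }' infile

ls *.txt
```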

Finally, the footer can be added in the END block by iterating over all unique entries:

awk 'NR==1{header=$0;next} !k[$5]++{f[$5]="File" (++i) ".txt"; print header > f[$5]}  {print >> f[$5]} END{for(t in f) print "End of "t" Horizon Keyword" >> f[t]}' infile

As suggested in the comments, you can replace >> with >.
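The swap works because awk truncates a file with `>` only when the program first opens it; later `print > file` statements in the same run reuse the open stream and append. The sketch below (hypothetical files `a.txt` written with `>` and `b.txt` written with `>>`) runs the same program twice to show where the two operators actually differ: across runs, only `>` starts the file afresh.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Run the same program twice. Within one run both files get both lines.
for run in 1 2; do
    awk 'BEGIN {
        print "line 1" > "a.txt";  print "line 2" > "a.txt"    # truncated on first open only
        print "line 1" >> "b.txt"; print "line 2" >> "b.txt"   # never truncated
    }'
done

awk 'END { print NR }' a.txt   # the second run started a.txt over
awk 'END { print NR }' b.txt   # the second run appended to b.txt
```

So with `>`, rerunning the split does not pile duplicate lines into files left over from a previous run.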

It SOUNDS like all you need is:

awk '
NR==1 { hdr=$0; fldNr=5; fldName=$fldNr; next }
$fldNr != prev {
    if (out) {
        print "end of", prev, fldName, "keyword" > out
    }
    out="File" ++cnt ".txt"
    print hdr > out
    prev=$fldNr
}
{ print > out }
END { print "end of", prev, fldName, "keyword" > out }
' file

but without testable sample input/output it's an untested guess.
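One caveat with the scripts above: every output file stays open until awk exits, and some awk implementations cap how many files can be open at once, which can matter with many horizons. Since each block is finished as soon as the next one starts, the file can be closed at that point with awk's built-in `close()`. A sketch of the last answer's logic with that change, run on the sample input from the first answer (saved here as a hypothetical `file`):

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Sample input from the first answer.
cat > file <<'EOF'
inline  xline   X    Y     Horizon  Time
1       159  806313 939258 KF2      0.80
2       9  806313 939258 KF2      0.80
3       59  806313 939258 KF2      0.80
81      149  805004 948030 FallRiver 0.85965
82      345  5678   948030 FallRiver 0.85965
243     146  804252 965837 TensleepBbase 1.1862
EOF

awk '
NR == 1 { hdr = $0; fldNr = 5; fldName = $fldNr; next }
$fldNr != prev {
    if (out) {
        print "end of", prev, fldName, "keyword" > out
        close(out)              # this block is finished, so free its file
    }
    out = "File" (++cnt) ".txt"
    print hdr > out
    prev = $fldNr
}
{ print > out }
END { if (out) print "end of", prev, fldName, "keyword" > out }
' file

cat File2.txt
```

Because the output names come from a counter, a horizon that reappears later in unsorted input gets a new file instead of an earlier one being truncated when it is reopened with `>`.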
