How to extract certain lines from text file by REGEX
I have this complicated text file with over 22,000 lines:
>Cluster 35
0 2856nt, >tru_clu8_1_inde2_or1... *
>Cluster 36
0 1179nt, >gl_isotig07707... *
1 914nt, >un_isotig04557... at +/94.20%
2 1282nt, >cp_isotig06284... at -/92.43%
3 1137nt, >cp_isotig02981... at -/93.84%
>Cluster 37
0 2835nt, >yl_JTQ_com670_c0_seq1... *
>Cluster 38
0 2275nt, >pb_iso00211... at +/93.93%
1 2647nt, >yl_JTQ_com323_c0_seq1... at +/91.39%
I want clusters with only 1 hit:
>Cluster 35
0 2856nt, >tru_clu8_1_inde2_or1... *
>Cluster 37
0 2835nt, >yl_JTQ_com670_c0_seq1... *
Then, if possible, output in this format:
>Cluster 35 tru_clu8_1_inde2_or1
>Cluster 37 yl_JTQ_com670_c0_seq1
$ awk 'NR>2{if(/^>/ && b ~ /^>/) print b"\n"a} {b=a ; a=$0}' infile.txt
>Cluster 35
0 2856nt, >tru_clu8_1_inde2_or1... *
>Cluster 37
0 2835nt, >yl_JTQ_com670_c0_seq1... *
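Edit:
However, this will not work if the final cluster has only one hit, because no later line starting with ">" follows it to trigger the print. This workaround may work, and it also produces the formatted output. Note that it appends a line to infile.txt, and that gensub and \w require GNU awk (gawk):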
$ echo ">" >> infile.txt
$ awk 'NR>2{if(/^>/ && b ~ /^>/) {a=gensub(/^.*>(\w+).*/,"\\1", "g", a) ; print b,a} } {b=a ; a=$0}' infile.txt
>Cluster 35 tru_clu8_1_inde2_or1
>Cluster 37 yl_JTQ_com670_c0_seq1
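If modifying the input file or depending on gawk is a concern, the same idea can be sketched by counting member lines per cluster and checking the count again in an END block, so a final single-hit cluster is also reported. This is only a sketch under some assumptions: the function name single_hit_clusters is made up for illustration, every cluster header starts with ">", and every member line contains ">ID..." as in the sample above.

```shell
# Hypothetical helper (name chosen for illustration): print each cluster
# that has exactly one member line, as "<header> <sequence id>".
# Uses only POSIX awk features, so gawk is not required.
single_hit_clusters() {
  awk '
    # Print the previous cluster if it had exactly one member line.
    function emit() {
      if (hits == 1) {
        name = member
        sub(/^.*>/, "", name)       # drop everything up to the ">" before the ID
        sub(/\.\.\..*$/, "", name)  # drop the trailing "..." and what follows
        print header, name
      }
    }
    NF == 0 { next }                                  # ignore blank lines
    /^>/    { emit(); header = $0; hits = 0; next }   # new cluster header
            { hits++; member = $0 }                   # member line of the current cluster
    END     { emit() }                                # handle the final cluster
  ' "$1"
}

# Usage: single_hit_clusters infile.txt
```

Because the last cluster is checked in the END block, there is no need to append a ">" line to infile.txt first.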