Extracting lines from a multi-column file
I have a data set in the following format:
Identified_____ID#2357_____ReadSequence:1238
Unknown_____0_____ReadSequence:0979
Unknown_____0_____ReadSequence:5476
Identified_____ID#567899_____ReadSequence:4376
Using the awk command, how can I extract the ReadSequences, but only from lines that have been identified (based on the first-column entries)?
$ awk -F"_____" '$1=="Identified" {print $3}' test.in
ReadSequence:1238
ReadSequence:4376
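To try this command end to end, here is a minimal sketch that recreates the four sample lines from the question. The temporary file created with mktemp stands in for test.in and is an assumption made for illustration, as is the variable name result.

```shell
# Recreate the four sample lines from the question in a temporary file.
# The temp file stands in for test.in; its path is an assumption for illustration.
infile=$(mktemp)
cat > "$infile" <<'EOF'
Identified_____ID#2357_____ReadSequence:1238
Unknown_____0_____ReadSequence:0979
Unknown_____0_____ReadSequence:5476
Identified_____ID#567899_____ReadSequence:4376
EOF

# -F"_____" splits each line on the five-underscore separator,
# so $1 is the status, $2 the ID and $3 the ReadSequence field.
result=$(awk -F"_____" '$1=="Identified" {print $3}' "$infile")
printf '%s\n' "$result"
rm -f "$infile"
```

The pattern `$1=="Identified"` is an exact string comparison on the first field, so the two Unknown lines are skipped.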
If you only want the ReadSequence IDs, gsub is your friend:
$ awk -F"_____" '$1=="Identified" {gsub(/^.*:/,"",$3); print $3}' test.in
1238
4376
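As a small sketch of what gsub is doing here: it edits the field in place and returns the number of replacements it made. The variables n and ids are names chosen for this illustration, and the input is two of the sample lines piped in on standard input rather than read from test.in.

```shell
# Feed two sample lines from the question to awk on standard input.
ids=$(printf '%s\n' \
  'Identified_____ID#2357_____ReadSequence:1238' \
  'Unknown_____0_____ReadSequence:0979' |
  awk -F"_____" '$1=="Identified" {
    # gsub rewrites $3 in place and returns how many replacements it made.
    n = gsub(/^.*:/, "", $3)
    print n, $3
  }')
printf '%s\n' "$ids"
```

Because the regular expression is anchored with `^`, it can match at most once per field, so sub would behave the same way in this case.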
awk -F'_____' '/^Identified/ {print $NF}' file
ReadSequence:1238
ReadSequence:4376
OR
awk '/^Identified/ {split($0,a,"_____");print a[3]}' file
ReadSequence:1238
ReadSequence:4376
OR, if you only want to read the value of ReadSequence:
awk -F'_____' '/^Identified/ {split($NF,a,":"); print a[2]}' file
1238
4376
$ awk -F':' '/^Identified/{print $NF}' file
1238
4376
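The answers select lines in two different ways: `/^Identified/` matches any line that begins with that text, while `$1=="Identified"` requires the whole first column to equal it. On the sample data both give the same result. The sketch below shows where they would differ, using a hypothetical second line (IdentifiedLater is invented for illustration and does not appear in the question's data).

```shell
# The second line is hypothetical: its first column only starts with "Identified".
sample='Identified_____ID#2357_____ReadSequence:1238
IdentifiedLater_____ID#1_____ReadSequence:0001'

# Regex test: matches every line that begins with the text "Identified".
by_regex=$(printf '%s\n' "$sample" | awk -F':' '/^Identified/ {print $NF}')

# Exact test: the first "_____"-separated field must equal "Identified".
by_field=$(printf '%s\n' "$sample" |
  awk -F'_____' '$1=="Identified" {split($NF,a,":"); print a[2]}')

printf 'regex match:\n%s\n' "$by_regex"
printf 'exact match:\n%s\n' "$by_field"
```

If the first column can only ever be Identified or Unknown, as in the question, either test is fine.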