
bash script and awk to sort a file

So I have a project for uni, and I can't get past the first exercise. Here is my problem: I have a file, and I want to select some data inside it and "display" it in another file. But the data I'm looking for is scattered across the file, so I need several awk commands in my script to get it.

Query= fig|1240086.14.peg.1

Length=76
                                                                  Score     E
Sequences producing significant alignments:                          (Bits)  Value

 fig|198628.19.peg.2053                                              140     3e-42


> fig|198628.19.peg.2053
Length=553

In the sample above, you can see that there are two kinds of "Length=" lines. I only want to "catch" the "Length=" lines that come right after a "Query=" line. I have to use awk, so I tried this:

 awk '{if(/^$/ && $(NR+1)/^Length=/) {split($(NR+1), b, "="); print b[2]}}'

But it doesn't work. Does anyone have an idea?
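
A side note on why the attempt fails: `$(NR+1)` does not read the next line. `NR` is the current record (line) number, and `$n` is field `n` of the *current* line. So `$(NR+1)` is just one of the fields of the line awk is already looking at. On an empty line there are no fields at all, so it is always empty there. A quick demonstration on two made-up lines:

```shell
#!/bin/sh
# $(NR+1) means "field number NR+1 of the current line":
# on line 1 it is $2, on line 2 it is $3, and so on.
out=$(printf 'a b c\nd e f\n' | awk '{ print $(NR+1) }')
echo "$out"   # line 1 -> $2 = "b", line 2 -> $3 = "f"
```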

awk solution:

awk '/^Length=/ && r~/^Query/{ sub(/^[^=]+=/,""); printf "%s ",$0 }
     NF{ r=$0 }END{ print "" }' file

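  • NF{ r=$0 } - remember the whole line whenever it is non-empty, so r always holds the previous non-empty line
  • /^Length=/ && r~/^Query/ - act on a Length line only if the previous non-empty line started with Query (ensured by r~/^Query/); sub(/^[^=]+=/,"") then strips everything up to and including the "=", leaving just the number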
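
The script above joins all matches on one space-separated line. If you prefer one value per line (for example, to write them into another file), you can replace the `printf` with a plain `print`. The following is a minimal sketch. It runs on an abbreviated copy of the question's sample, and the temporary file stands in for the real input:

```shell
#!/bin/sh
# Abbreviated copy of the question's sample (only the relevant lines).
sample=$(mktemp)
printf 'Query= fig|1240086.14.peg.1\n\nLength=76\n\n> fig|198628.19.peg.2053\nLength=553\n' > "$sample"

# Same logic as the answer. The Length rule must come before NF { r = $0 },
# so that r still holds the *previous* non-empty line when it is tested.
out=$(awk '/^Length=/ && r ~ /^Query/ { sub(/^[^=]+=/, ""); print }
           NF { r = $0 }' "$sample")
echo "$out"
rm -f "$sample"
```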

You need to understand how awk works. It reads one line, evaluates the whole script against it, then starts over with the next line. So there is no direct way to say "the next line contains this". What you can do instead is say "if this line contains X, then remember that until ..."

awk '/Query=/ { q=1; next } /Length/ && q { print } /./ { q=0 }' file

This sets the flag q to 1 (true) when we see Query= and then skips to the next line. If we see Length and we recently saw Query=, then q will be 1, so we print the line. On any other non-empty line (including the Length line just printed), q is reset to 0 ("not recently seen"). I added the non-empty condition so that empty lines can appear anywhere without affecting the overall logic.
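
That script prints the whole `Length=76` line, while the question asks for just the number. The sketch below shows one possible tweak. It anchors both patterns with `^` and strips the prefix with `sub()` before printing. It runs on an abbreviated copy of the question's sample:

```shell
#!/bin/sh
sample=$(mktemp)
printf 'Query= fig|1240086.14.peg.1\n\nLength=76\n\n> fig|198628.19.peg.2053\nLength=553\n' > "$sample"

# q=1 right after a Query= line; any later non-empty line resets it.
# sub() removes the leading "Length=" so only the value is printed.
out=$(awk '/^Query=/ { q = 1; next }
           /^Length=/ && q { sub(/^Length=/, ""); print }
           /./ { q = 0 }' "$sample")
echo "$out"
rm -f "$sample"
```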

It sounds like this is what you want for the first part of your question:

$ awk -F'=' '!NF{next} f && ($1=="Length"){print $2} {f=($1=="Query")}' file
76

but I don't know what the second part is about, since there are no "data" lines in your input and, as far as I can tell, only one valid output from your sample input.
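
The question also says the result should end up in another file. That part is handled by ordinary shell redirection, not by awk. The following is a minimal sketch. Temporary files stand in for the real input and output files, whose names the question doesn't give:

```shell
#!/bin/sh
sample=$(mktemp)
result=$(mktemp)   # placeholder for the "other file" from the question
printf 'Query= fig|1240086.14.peg.1\n\nLength=76\n\n> fig|198628.19.peg.2053\nLength=553\n' > "$sample"

# Same command as the answer; ">" sends awk's output into $result
# instead of the terminal.
awk -F'=' '!NF { next } f && ($1 == "Length") { print $2 } { f = ($1 == "Query") }' "$sample" > "$result"

cat "$result"
```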
