bash script and awk to sort a file

I have a university project and I'm stuck on the first exercise. Here is my problem: I have a file, and I want to select some data from it and write it to another file. The data I'm looking for is scattered through the file, so I need several awk commands in my script to get it.

Query= fig|1240086.14.peg.1

Length=76
                                                                  Score     E
Sequences producing significant alignments:                          (Bits)  Value

 fig|198628.19.peg.2053                                              140     3e-42


> fig|198628.19.peg.2053
Length=553

In the sample above, you can see that there are two kinds of "Length=" lines, and I only want to catch the ones that come right after a "Query=" line. I have to use awk, so I tried this:

 awk '{if(/^$/ && $(NR+1)/^Length=/) {split($(NR+1), b, "="); print b[2]}}'

but it doesn't work. Does anyone have an idea?

awk solution:

awk '/^Length=/ && r~/^Query/{ sub(/^[^=]+=/,""); printf "%s ",$0 }
     NF{ r=$0 }END{ print "" }' file

  • NF{ r=$0 } - remember the most recent non-empty line in r
  • /^Length=/ && r~/^Query/ - matches a Length line whose previous non-empty line started with Query (ensured by r~/^Query/ ); sub(/^[^=]+=/,"") then strips everything up to and including the = so only the number is printed
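
To check the two bullets above against the sample from the question, here is a minimal sketch. It assumes the sample is saved to a temporary file (the path comes from mktemp and is arbitrary), and the variable names are only for illustration.

```shell
# Recreate the sample from the question in a temporary file (path is arbitrary).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Query= fig|1240086.14.peg.1

Length=76
                                                                  Score     E
Sequences producing significant alignments:                          (Bits)  Value

 fig|198628.19.peg.2053                                              140     3e-42


> fig|198628.19.peg.2053
Length=553
EOF

# r holds the most recent non-empty line, so the blank line between
# "Query=" and "Length=" does not break the match.
first_answer_out=$(awk '/^Length=/ && r~/^Query/{ sub(/^[^=]+=/,""); printf "%s ",$0 }
     NF{ r=$0 }END{ print "" }' "$sample")

# Brackets make the trailing space visible; prints [76 ]
printf '[%s]\n' "$first_answer_out"
rm -f "$sample"
```

Note that the values are joined on one line with a trailing space because of printf "%s "; Length=553 is skipped because the non-empty line before it starts with ">", not with Query.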

You need to understand how Awk works. It reads a line, evaluates the script against it, then starts over with the next line, one line at a time. So the script cannot simply say "the next line contains this". What you can do is say "if this line contains X, remember that until ..."

awk '/Query=/ { q=1; next } /Length/ && q { print } /./ { q=0 }' file

This sets the flag q to 1 (true) when we see Query= and then skips to the next line. If we see Length and we recently saw Query=, then q will be 1, and so we print. Any non-empty line that gets this far, including a Length line we just printed, then sets q back to 0 ("not recently seen"). (I added the non-empty condition so that empty lines can appear anywhere without affecting the overall logic.)
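
Two small checks may help here, offered as a sketch rather than a definitive version. The first shows why the attempt in the question cannot work: $(NR+1) is a field reference on the current line, not a look-ahead to the next line. The second is a variant of the flag command above that anchors the patterns and strips the "Length=" prefix so only the number is printed; the input is a shortened, made-up version of the sample, and the variable names are mine.

```shell
# On the first record NR is 1, so $(NR+1) is $2: the second field of the
# CURRENT line, not the contents of the next line.
field_demo=$(printf 'alpha beta gamma\n' | awk '{ print $(NR+1) }')
echo "$field_demo"   # prints beta

# Flag approach, printing only the value after "Length=".
# Shortened stand-in for the sample input (illustrative only).
flag_out=$(printf '%s\n' \
    'Query= fig|1240086.14.peg.1' \
    '' \
    'Length=76' \
    '' \
    '> fig|198628.19.peg.2053' \
    'Length=553' |
  awk '/^Query=/        { q=1; next }                  # remember we saw a query
       /^Length=/ && q  { sub(/^Length=/, ""); print } # keep only the number
       /./              { q=0 }')                      # any non-empty line resets
echo "$flag_out"     # prints 76
```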

It sounds like this is what you want for the first part of your question:

$ awk -F'=' '!NF{next} f && ($1=="Length"){print $2} {f=($1=="Query")}' file
76

but I don't know what the second part is about, since there are no "data" lines in your input and, as far as I can tell, only one valid output from your sample input.
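
If "display it in another file" only means saving the selected values, a plain shell redirect may be enough. This is a guess at the intent, not something the question confirms. The sketch below uses hypothetical file names (report.txt for the input, lengths.txt for the output) inside a temporary directory, with a shortened stand-in for the sample.

```shell
# Work in a throwaway directory; report.txt and lengths.txt are made-up names.
workdir=$(mktemp -d)
printf '%s\n' \
    'Query= fig|1240086.14.peg.1' \
    '' \
    'Length=76' \
    '' \
    '> fig|198628.19.peg.2053' \
    'Length=553' > "$workdir/report.txt"

# Same command as above; ">" sends the result to another file instead of the terminal.
awk -F'=' '!NF{next} f && ($1=="Length"){print $2} {f=($1=="Query")}' \
    "$workdir/report.txt" > "$workdir/lengths.txt"

saved=$(cat "$workdir/lengths.txt")
echo "$saved"        # prints 76
rm -rf "$workdir"
```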
