
BASH List Search Awk

I'm very new to scripting and trying to work something out in bash. I have a data file filled with information that looks like this:

2   aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aaaa.aaa 11111   aaaa    1111    [1] 1   
4   aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aaaa.aaa 11111   aaaa    1111    [1]   1 
8   aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aaaa.aaa 11111   aaaa    1111    [1] 1   
10  aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aaaa.aaa 11111   aaaa    1111    [1] 1   
12  aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aaaaa.aaa    11111   aaaa    1111    [1] 1   
14  aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aaaa.aaa 11111   aaaa    1111    [1] 1   
16  aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aaaaa.aaa    11111   aaaa    1111    [1] 1   
18  aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aaaa.aaa 11111   aaaa    1111    [1] 1   
20  aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aaaaa.aaa    11111   aaaa    1111    [1] 1   
24  aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aaaaa.aaa    11111   aaaa    1111    [1] 1   
26  aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aaaaa.aaa    11111   aaaa    1111    [1] 1   
28  aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aaaa.aaa 11111   aaaa    1111    [1] 1   
30  aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aaaa.aaa 11111   aaaa    1111    [1] 1   
32  aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aaaaa.aaa    11111   aaaa    1111    [1] 1   

where the a's represent various letters and the 1's represent various numbers.

The lists are all supposed to run vertically from 2 to 32, counting by 2's; however, many of the lists are missing a couple of entries, such as the one I posted above, which is missing 6 and 22. What I am trying to do is write a script that goes through and checks whether each number is there and, if not, adds a line with the number at the front and nothing else trailing, so you'd have:

2   aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aaaa.aaa 11111   aaaa    1111    [1] 1
4   aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aaaa.aaa 11111   aaaa    1111    [1]   1 
6

8   aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aa11_1.aaa aaaa.aaa 11111   aaaa    1111    [1] 1      
...

From what I have read, I believe awk is the tool most likely to succeed; however, I'm not sure how to make it work. Thanks!

This awk command assumes the output indexes run from 2 to 32 in steps of two:

awk '{a[$1]=$0} END {for(i=2;i<=32;i+=2) print (i in a ? a[i] : i)}' data

The breakdown:

  • Store all the existing lines in an array, keyed by their index (the first field)
  • At the END, walk all expected indexes (2-32) and print either the stored line or, if it is missing, the bare index
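
As a rough sketch of the same idea, the range could be passed in with -v rather than hard-coded. The function name fill_gaps and the variables lo, hi, and step are illustrative assumptions, not part of the answer above:

```shell
#!/bin/sh
# fill_gaps FILE [LO] [HI] [STEP]
# Hypothetical wrapper around the array-based one-liner: prints every
# expected index from LO to HI, using the stored line when the index
# exists in FILE and the bare index otherwise.
# Defaults (2, 32, 2) match the range described in the question.
fill_gaps() {
    awk -v lo="${2:-2}" -v hi="${3:-32}" -v step="${4:-2}" '
        { line[$1] = $0 }    # remember each line, keyed by its first field
        END {
            for (i = lo; i <= hi; i += step)
                print ((i in line) ? line[i] : i)
        }
    ' "$1"
}
```

Calling `fill_gaps data` should behave like the one-liner, while `fill_gaps data 2 64 2` would cover a longer list. Because every line is stored before anything is printed, the input does not need to be sorted, but lines whose index falls outside the range or off the step are silently dropped.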

Try something like:

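awk '{
        # Assumes the input is sorted by index; gaps after the last line are not filled.
        # Print every index skipped between the previous line and this one.
        while ($1 > last_printed + 2) {
            last_printed += 2
            print last_printed
        }
        print                   # then print the current line unchanged
        last_printed = $1
     }' FILENAME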
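
The streaming approach only fills gaps it sees before a later line, so a list that stops early (for example at 28) would not get its trailing indexes. One possible extension, sketched here under the assumption that the input is sorted and every line starts with a numeric index, adds an END block that counts up to an upper bound. The names fill_gaps_stream, hi, step, and last are hypothetical:

```shell
#!/bin/sh
# fill_gaps_stream FILE [HI] [STEP]
# Hypothetical variant of the streaming answer that also fills indexes
# missing after the final input line. Defaults (32, 2) follow the question.
fill_gaps_stream() {
    awk -v hi="${2:-32}" -v step="${3:-2}" '
        {
            # emit any indexes skipped between the previous line and this one
            while ($1 > last + step) { last += step; print last }
            print
            last = $1
        }
        END {
            # emit indexes still missing after the last line, up to hi
            while (last + step <= hi) { last += step; print last }
        }
    ' "$1"
}
```

Unlike the array version, this keeps only one number in memory and preserves lines in input order, at the cost of requiring sorted input.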
