
How to extract the number after a specific word using awk?

I have several lines of text. I want to extract the number after a specific word using awk.

I tried the following code, but it does not work.

First, create the test file with vi test.text. There are 3 columns (the 3 fields are generated by some other pipeline commands using awk).

Index  AllocTres                              CPUTotal
1      cpu=1,mem=256G                         18
2      cpu=2,mem=1024M                        16
3                                             4
4      cpu=12,gres/gpu=3                      12
5                                             8
6                                             9
7      cpu=13,gres/gpu=4,gres/gpu:ret6000=2   20
8      mem=12G,gres/gpu=3,gres/gpu:1080ti=1   21

Please note there are several empty fields in this file. What I want to achieve is to extract the number after the first gres/gpu= in each line (if no gres/gpu= occurs in a line, the default number is 0) using a pipeline like cat test.text | awk '{some_commands}', and to output 4 columns:

Index  AllocTres                              CPUTotal   GPUAllocated
1      cpu=1,mem=256G                         18         0
2      cpu=2,mem=1024M                        16         0
3                                             4          0
4      cpu=12,gres/gpu=3                      12         3
5                                             8          0
6                                             9          0
7      cpu=13,gres/gpu=4,gres/gpu:ret6000=2   20         4
8      mem=12G,gres/gpu=3,gres/gpu:1080ti=1   21         3

Firstly: awk does not need cat; it can read files on its own. Combining cat and awk is generally discouraged as a useless use of cat.
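As a quick illustration of the point about cat, this sketch runs the same awk program both ways on a scratch file and compares the results. The scratch file and its two lines are made up for the demonstration, and mktemp is assumed to be available (it is on common Linux and macOS systems):

```shell
f=$(mktemp)                                  # scratch file standing in for test.text
printf 'Index AllocTres CPUTotal\n1 cpu=1,mem=256G 18\n' > "$f"

with_cat=$(cat "$f" | awk '{ print $1 }')    # pipeline form used in the question
without_cat=$(awk '{ print $1 }' "$f")       # awk opens the file itself
[ "$with_cat" = "$without_cat" ] && echo "same output"
rm -f "$f"
```

Redirecting stdin (awk '{ print $1 }' < test.text) works too; in every form no extra cat process is needed.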

For this task I would use GNU AWK the following way. Let file.txt content be

cpu=1,mem=256G
cpu=2,mem=1024M

cpu=12,gres/gpu=3


cpu=13,gres/gpu=4,gres/gpu:ret6000=2
mem=12G,gres/gpu=3,gres/gpu:1080ti=1

then

awk 'BEGIN{FS="gres/gpu="}{print $2+0}' file.txt

output

0
0
0
3
0
0
4
3

Explanation: I inform GNU AWK that the field separator (FS) is gres/gpu=, then for each line I print the 2nd field increased by zero. For lines without gres/gpu=, $2 is an empty string; when used in an arithmetic context this is the same as zero, so zero plus zero gives zero. For lines with at least one gres/gpu=, adding zero makes GNU AWK take the longest prefix that is a legal number, thus 3 (4th line) becomes 3, 4,gres/gpu:ret6000=2 (7th line) becomes 4, and 3,gres/gpu:1080ti=1 (8th line) becomes 3.

(tested in GNU Awk 5.0.1)
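The same FS trick can be applied to the question's full three-column test.text, keeping each whole line and appending the number as a fourth column. This is a minimal sketch, assuming the header is always the first line; the function name gpu_column is hypothetical and not part of the answer above. Because $2 is the text after the first gres/gpu=, a later occurrence on the same line is ignored:

```shell
# gpu_column FILE (hypothetical helper): print each line of FILE followed by the
# number after the first "gres/gpu=", or 0 when the line has none.
gpu_column() {
  awk '
    BEGIN   { FS = "gres/gpu=" }                 # split each line on the marker
    NR == 1 { print $0, "GPUAllocated"; next }   # header row gets the new column name
            { print $0, $2 + 0 }                 # $2 is empty without a marker, so +0 gives 0
  ' "$1"
}
```

Usage: gpu_column test.text. Output columns are separated by a single space (the default OFS), so the new column is not padded to line up with the others.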

With your shown samples, in GNU awk you can try the following code. Written and tested in GNU awk. Simple explanation: use awk's match function with the regex gres\/gpu=([0-9]+) (escaping / here), which has one and only one capturing group to capture all digits coming after =. For every line except the header, print the current line followed by the 1st element of array arr plus 0 (so that zero is printed when a line has no match).

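awk '
FNR==1{                               # header line: append the new column name
  print $0,"GPUAllocated"
  next
}
{
  match($0,/gres\/gpu=([0-9]+)/,arr)  # gawk-only 3rd argument: arr[1] = digits after the first gres/gpu=
  print $0,arr[1]+0                   # arr[1] is empty when nothing matched, so +0 prints 0
}
' Input_file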
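The three-argument form of match() used above is a GNU awk extension; POSIX specifies only match(string, regex), so other awk implementations may reject the array argument. A portable sketch under that assumption uses the RSTART and RLENGTH variables set by the two-argument match() together with substr(). The function name gpu_column_posix is hypothetical:

```shell
# gpu_column_posix FILE (hypothetical helper): same output as the gawk version,
# using only POSIX awk features.
gpu_column_posix() {
  awk '
  NR == 1 { print $0, "GPUAllocated"; next }        # header row
  {
    n = 0                                            # default when the line has no gres/gpu=
    if (match($0, /gres\/gpu=[0-9]+/))               # leftmost match = first occurrence
      n = substr($0, RSTART + 9, RLENGTH - 9) + 0    # 9 = length("gres/gpu=")
    print $0, n
  }' "$1"
}
```

Usage: gpu_column_posix test.text. Since match() returns the leftmost match, a line such as gres/gpu=12,gres/gpu=5 yields 12, and multi-digit counts are kept whole.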

Using sed

$ sed '1s/$/\tGPUAllocated/;s~gres/gpu=\([0-9][0-9]*\).*~& \t\1~;1!{\~gres/gpu=[0-9]~!s/$/ \t0/}' input_file
Index  AllocTres                              CPUTotal  GPUAllocated
1      cpu=1,mem=256G                         18        0
2      cpu=2,mem=1024M                        16        0
3                                             4         0
4      cpu=12,gres/gpu=3                      12        3
5                                             8         0
6                                             9         0
7      cpu=13,gres/gpu=4,gres/gpu:ret6000=2   20        4
8      mem=12G,gres/gpu=3,gres/gpu:1080ti=1   21        3
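Another option, using awk:

awk '
    BEGIN{FS="\t"}
    NR==1{
        $(NF+1)="GPUAllocated"
    }
    NR>1{
        $(NF+1)=FS 0
    }
    /gres\/gpu=/{
        split($0, a, "gres/gpu=")            # a[2] starts right after the first gres/gpu=
        gp=a[2]; gsub(/[^0-9].*/, "", gp)    # keep only its leading digits
        $NF=FS gp
    }1' test.text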

Index  AllocTres                              CPUTotal GPUAllocated
1      cpu=1,mem=256G                         18        0
2      cpu=2,mem=1024M                        16        0
3                                             4         0
4      cpu=12,gres/gpu=3                      12        3
5                                             8         0
6                                             9         0
7      cpu=13,gres/gpu=4,gres/gpu:ret6000=2   20        4
8      mem=12G,gres/gpu=3,gres/gpu:1080ti=1   21        3

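Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.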

© 2020-2024 STACKOOM.COM