How to use cut and awk commands to extract text input in a tabular format?

I have a file input.txt as below:

filename: test1.v

BUG: bug 102 is fixed by some user
IO_CHANGE: there is no io_change for this version
FEATURE: no feature added

filename: test2.v

BUG: bug 103 is fixed by some user 
also bug 105 is fixed
IO_CHANGE: there is no io_change for this version
FEATURE: yes feature number 3 also feature 23
and feature 34 is added

filename: test3.v

BUG: bug 104 is fixed by some user
FEATURE: yes feature number 2
IO_CHANGE: 

My question: sometimes the description for BUG/FEATURE/IO_CHANGE is long and spills onto a second line, and sometimes IO_CHANGE has no text at all and is left blank. The output file should list all the bugs, then the features, then the io_changes. The three types can appear in any order in the input file; I need to find all the bugs/features/io_changes in the file and list them column-wise.


How about this? We store the values for each file in an array. Entries that span multiple lines are concatenated.

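awk 'function dump() {          # print one tab-separated record per file
        if (vc>0)
            print fn, vals["BUG"], vals["FEATURE"], vals["IO_CHANGE"]
    }
    BEGIN {FS=":"; OFS="\t"; vc=0}
    # first line of each file: flush the previous record, then reset state
    FNR==1 {dump(); val=""; delete vals; fn=FILENAME; vc=0}
    # "KEY: text" line: remember the key and start (or extend) its value
    NF>1 {val=$1; vals[val]=vals[val] $2; vc++}
    # continuation line without ":": append it to the most recent key
    NF==1 {vals[val] = vals[val] " " $1}
    END {dump()}' test*v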
  1. The dump() function writes one record to the output.
  2. The BEGIN block sets the field separator to ":" (so ":" cannot appear inside the text of a field in this solution). The output is tab-delimited.
  3. At the start of each file (FNR==1) we dump the previous record if we have one, and then we reset our collections.
  4. If a line contains a ":" (which makes NF>1), we remember which key we are setting and store its text in the array. If there is no ":" (NF==1), we append the line to the value we were last adding to.
  5. Finally, at the end of the last file, we dump the contents one last time.
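The restriction in step 2 comes from splitting every line on ":". As a minimal sketch (not the answer's method), the variant below treats only a leading upper-case label followed by ":" as the separator, so a ":" inside the description survives. The sample file contents, the temporary directory, and the names key, text and seen are assumptions made up for this illustration; it also strips the directory part of FILENAME and the leading blank of each value.

```shell
#!/bin/sh
# Hypothetical sample file, created only for this demo.
tmp=$(mktemp -d)
cat > "$tmp/test1.v" <<'EOF'
BUG: bug 102 is fixed, see note: urgent
also bug 105 is fixed
IO_CHANGE:
FEATURE: no feature added
EOF

out=$(awk '
    function dump() { if (seen) print fn, vals["BUG"], vals["FEATURE"], vals["IO_CHANGE"] }
    BEGIN { OFS = "\t" }
    # New file: flush the previous record, clear the array, keep the base name.
    FNR == 1 { dump(); split("", vals); key = ""; seen = 0
               fn = FILENAME; sub(/.*\//, "", fn) }
    # A line that starts with an upper-case label and ":" opens an entry;
    # only that first ":" is treated as the separator.
    match($0, /^[A-Z_]+:/) {
        key  = substr($0, 1, RLENGTH - 1)
        text = substr($0, RLENGTH + 1)
        sub(/^[ \t]+/, "", text)
        vals[key] = vals[key] text
        seen = 1
        next
    }
    # Any other non-blank line continues the current entry.
    NF && key != "" { vals[key] = vals[key] " " $0 }
    END { dump() }
' "$tmp/test1.v")

printf '%s\n' "$out"
rm -rf "$tmp"
```

A continuation line that itself begins with an upper-case word and ":" would be taken as a new label, so this sketch trades one restriction for a narrower one.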

This sets a flag variable when one label is found and clears it when one of the other labels is found, then saves the flagged lines to an array keyed by filename.
It removes everything before the first : in each saved value.
Then it prints the values in columns.

#!/bin/bash

awk 'BEGIN{printf("%-8s%-60s%-60s%-20s\n\n","FILE","|BUG","|IO","|FEATURE")}
    # a, b and c flag which section the current line belongs to
    /BUG/{a=1} /IO_CHANGE:/ || /FEATURE/{a=0} {if (a){Bug[FILENAME]=Bug[FILENAME] $0 " "}}
    /IO_CHANGE:/{b=1} /BUG/ || /FEATURE/{b=0} {if (b){IO[FILENAME]=IO[FILENAME] $0 " "}}
    /FEATURE/{c=1} /IO_CHANGE:/ || /BUG/{c=0} {if (c){Feat[FILENAME]=Feat[FILENAME] $0 " "}}
    END{
        for (k in Bug){
            # keep each value from its first ":" onward, dropping the label
            Bug[k] = substr(Bug[k],index(Bug[k],":"))
            IO[k] = substr(IO[k],index(IO[k],":"))
            Feat[k] = substr(Feat[k],index(Feat[k],":"))
            printf("%-8s%-60s%-60s%-20s\n\n","|"k,"|"Bug[k],"|"IO[k],"|"Feat[k])
        }
    }
' test*v

Unfortunately, this won't print multiple lines for each file.
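To see the flag technique from this answer in isolation, here is a small sketch that collects only the BUG section from a few lines fed on standard input. The sample lines are made up for the demo, and unlike the answer it anchors the patterns at the start of the line and strips the label together with its ":"; both are assumptions of this sketch rather than part of the original script.

```shell
#!/bin/sh
# Sample lines piped in for the demo; they are not read from a real file.
out=$(printf '%s\n' \
        'BUG: bug 103 is fixed by some user' \
        'also bug 105 is fixed' \
        'IO_CHANGE: none' \
        'FEATURE: yes' |
    awk '
        /^BUG:/                  { a = 1 }         # entering the BUG section
        /^(IO_CHANGE|FEATURE):/  { a = 0 }         # another label ends it
        a                        { s = s $0 " " }  # collect lines while flagged
        END {
            sub(/^[^:]*: */, "", s)                # drop the leading label
            sub(/ $/, "", s)                       # drop the trailing blank
            print s
        }')

printf '%s\n' "$out"
```

The continuation line is picked up because the flag stays set until the next label appears, which is the same mechanism the full script uses for all three columns.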
