
Format multiline output of grep command into columns adding/substituting filename as an output field

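I'm trying to format the output of a multiline egrep query into a CSV-compatible format.

I need to pull a handful of values out of a large number of files (some of which may not contain every value I'm looking for).

The grep command I'm using is: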

grep -e Name -e Type -e Schedule -e Pool -e Storage *|awk -F' = '  '{print $1,$2}'|sort

This returns output such as:

IRVLinuxDefault.cfg:  Name "IRVLinuxDefault"
IRVLinuxDefault.cfg:  Pool "IRV_DD890_Full60"
IRVLinuxDefault.cfg:  Schedule "IRV_Backups"
IRVLinuxDefault.cfg:  Storage "IRV_SD_DD890"
IRVLinuxDefault.cfg:  Type "Backup"
LVS_60Day_NDMP_Defs.cfg:  Name "LVS_60Day_NDMP_Defs"
LVS_60Day_NDMP_Defs.cfg:  Pool "LVS_DD_AV_NDMP"
LVS_60Day_NDMP_Defs.cfg:  Schedule "LVS_NDMP_Monthly"
LVS_60Day_NDMP_Defs.cfg:  Storage "LVS_SD_DD990_AV_NDMP"
LVS_60Day_NDMP_Defs.cfg:  Type "Backup"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Name "LVS_60Day_NDMP_NOFileSet_Defs"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Pool "LVS_DD_AV_NDMP"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Schedule "LVS_NDMP_Monthly"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Storage "LVS_SD_DD990_AV_NDMP"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Type "Backup"
LVS_Datalake2_Defs.cfg:  Name "LVS_Datalake2_Defs"
LVS_Datalake2_Defs.cfg:  Pool "LVS_WAS_SD101_13Mo-cloud"
LVS_Datalake2_Defs.cfg:  Schedule "WeeklyCycle"
LVS_Datalake2_Defs.cfg:  Storage "LVS_WAS_SD101_13Mo-cloud"
LVS_Datalake2_Defs.cfg:  Type "Backup"

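I'm trying to output these values in the format FILE,NAME,POOL,SCHEDULE,STORAGE,TYPE, with a header for each column. If one of the files doesn't contain one of the grepped-for values, I'd like an empty field output in its place.

The desired output would look like CSV (example below), with any "s and :s stripped out (notice that row 3 of the desired output is missing the Pool field, so there are 2 consecutive commas to hold the empty cell):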

FILE,NAME,POOL,SCHEDULE,STORAGE,TYPE
IRVLinuxDefault.cfg,IRVLinuxDefault,IRV_DD890_Full60,IRV_Backups,IRV_SD_DD890,Backup
LVS_60Day_NDMP_Defs.cfg,LVS_60Day_NDMP_Defs,LVS_DD_AV_NDMP,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup
LVS_60Day_NDMP_NOFileSet_Defs.cfg,LVS_60Day_NDMP_NOFileSet_Defs,,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup

I've tried numerous approaches with awk, sed, and GNU datamash (transpose), but I'm not having any luck.

Any suggestions?


grep -e Name -e Type -e Schedule -e Pool -e Storage *|awk -F' = '  '{print $1,$2}'|sort|datamash transpose
WindowsDefault_open.cfg:  Name "WindowsDefault_Open"        WindowsDefault_open.cfg:  Pool "BO3_SD01_DD_OPEN1_60Day"        WindowsDefault_open.cfg:  Schedule "WeeklyCycle"    
WindowsDefault_open.cfg:  Storage "BO3_SD01_DD990_OPEN1"        WindowsDefault_open.cfg:  Type "Backup"     
Windows_MBS_SD01_Default.cfg:  Name "Windows_MBS_SD01_Default"  Windows_MBS_SD01_Default.cfg:  Pool "BO3_SD03_DD_V164_OPEN1_60day"  Windows_MBS_SD01_Default.cfg:  Schedule "MonthlyCycle"  Windows_MBS_SD01_Default.cfg:  Storage "BO3_SD03_DD990_OPEN1_V164"  Windows_MBS_SD01_Default.cfg:  Type "Backup"

grep -e Name -e Type -e Schedule -e Pool -e Storage *|awk -F' = '  '{print $1,$2}'|sort|awk '{print $1,$2,$3,$4,$5,$6};'|datamash transpose
1: Name "BO3_Isi_gda_spark_60day_NDMP_Defs"     1!: Name "BO3_vg8-2_2_ucqa-ws_60day_NDMP_Defs"          1: Pool "File"     1!: Pool "File"          1: Schedule "BO3_Prod_Schedule"         1!: Schedule "BO3_Prod_Schedule"        1: Storage "BO3_SD01DD990_NDMP1"           1!: Storage "BO3_SD01_DD990_NDMP1"      1: Type "Backup"        1!: Type "Backup"       AM4WS3LinuxDefault.cfg: Name "AM4WS3LinuxDefault"           AM4WS3LinuxDefault.cfg: Pool "AM4_SD01_WasabiS3-cloud"          AM4WS3LinuxDefault.cfg: Schedule "MonthlyCycle"     AM4WS3LinuxDefault.cfg: Storage "AM4_SD01_WasabiS3-cloud"       AM4WS3LinuxDefault.cfg: Type "Backup"       AM4WS3WindowsDefault.cfg: Name "AM4WS3WindowsDefault"           AM4WS3WindowsDefault.cfg: Pool "AM4SD01_WasabiS3-cloud"    AM4WS3WindowsDefault.cfg: Schedule "MonthlyCycle"       AM4WS3WindowsDefault.cfg: Storage "AM4_SD01_WasabiS3-cloud"         AM4WS3WindowsDefault.cfg: Type "Backup"         backups_BO3LinuxDefault.cfg: Name "backups_BO3LinuxDefault"         backups_BO3LinuxDefault.cfg: Pool "BO3_SD01_2MO"        backups_BO3LinuxDefault.cfg: Schedule "MonthlyCycle"

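For cases where an expected value is not provided, this script lets you specify the string to substitute (`defval`).

It is also tailored so that you can specify the delimiter (for the input) used to pull out the desired value.

Note: a single or double quote is awkward to pass as the delimiter for awk's split() function from inside a shell-quoted awk program, so I put sed between your provided input and the script that converts it to the desired output, turning each double quote into `|` first.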

#!/bin/bash

### Original command
#grep -e Name -e Type -e Schedule -e Pool -e Storage *|awk -F' = '  '{print $1,$2}'|sort

sample="grepOutput.txt"

cat >"${sample}" <<"EnDoFiNpUt"
IRVLinuxDefault.cfg:  Name "IRVLinuxDefault"
IRVLinuxDefault.cfg:  Pool "IRV_DD890_Full60"
IRVLinuxDefault.cfg:  Schedule "IRV_Backups"
IRVLinuxDefault.cfg:  Storage "IRV_SD_DD890"
IRVLinuxDefault.cfg:  Type "Backup"
LVS_60Day_NDMP_Defs.cfg:  Name "LVS_60Day_NDMP_Defs"
LVS_60Day_NDMP_Defs.cfg:  Pool "LVS_DD_AV_NDMP"
LVS_60Day_NDMP_Defs.cfg:  Schedule "LVS_NDMP_Monthly"
LVS_60Day_NDMP_Defs.cfg:  Storage "LVS_SD_DD990_AV_NDMP"
LVS_60Day_NDMP_Defs.cfg:  Type "Backup"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Name "LVS_60Day_NDMP_NOFileSet_Defs"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Pool "LVS_DD_AV_NDMP"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Schedule "LVS_NDMP_Monthly"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Storage "LVS_SD_DD990_AV_NDMP"
LVS_60Day_NDMP_NOFileSet_Defs.cfg:  Type "Backup"
LVS_Datalake2_Defs.cfg:  Name "LVS_Datalake2_Defs"
LVS_Datalake2_Defs.cfg:  Pool "LVS_WAS_SD101_13Mo-cloud"
LVS_Datalake2_Defs.cfg:  Schedule "WeeklyCycle"
LVS_Datalake2_Defs.cfg:  Storage "LVS_WAS_SD101_13Mo-cloud"
LVS_Datalake2_Defs.cfg:  Type "Backup"
EnDoFiNpUt

### cat emulates original grep command output
cat "${sample}" | sed 's/"/|/g' |
awk -v delim='|' -v defval="" 'BEGIN{
    printf("FILENAME,NAME,POOL,SCHEDULE,STORAGE,TYPE") ;
    lastFN="" ;
}
{
    pos=index($0,":") ;
    if( pos > 0 ){
        FN=substr($0, 1, pos-1) ;
        split($0, vals, delim );

        if( FN != lastFN ){
            printf("\n%s", FN) ;
            lastFN=FN ;
        } ;
        if( vals[2] == "" ){
            printf(",%s", defval ) ;
        }else{
            printf(",%s", vals[2] ) ;
        } ;
    } ;
}
END{
    print "" ;
}'

The output looks like this:

FILENAME,NAME,POOL,SCHEDULE,STORAGE,TYPE
IRVLinuxDefault.cfg,IRVLinuxDefault,IRV_DD890_Full60,IRV_Backups,IRV_SD_DD890,Backup
LVS_60Day_NDMP_Defs.cfg,LVS_60Day_NDMP_Defs,LVS_DD_AV_NDMP,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup
LVS_60Day_NDMP_NOFileSet_Defs.cfg,LVS_60Day_NDMP_NOFileSet_Defs,LVS_DD_AV_NDMP,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup
LVS_Datalake2_Defs.cfg,LVS_Datalake2_Defs,LVS_WAS_SD101_13Mo-cloud,WeeklyCycle,LVS_WAS_SD101_13Mo-cloud,Backup
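
Note that the script above substitutes `defval` only when a line is present with an empty value. If a file has no Pool line at all, its row simply comes out one column short. A minimal sketch of one way around that is to store each value under its key and always print all five columns per file. It assumes the same `file:  Key "value"` input shape, the five keys from the question, and input grouped by file (which grep's output already is). The function name `grep_to_csv` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: reads lines shaped like   file.cfg:  Key "value"
# (or   file.cfg:  Key = "value") on stdin and writes one CSV row per file,
# using defval for any key that never appeared for that file.
grep_to_csv() {
    awk -F'"' -v defval="" '
    function flush_row(    i, row) {
        if (file == "") return
        row = file
        for (i = 1; i <= ncols; i++)
            row = row "," ((cols[i] in val) ? val[cols[i]] : defval)
        print row
        split("", val)                  # portable way to empty the array
    }
    BEGIN {
        ncols = split("Name,Pool,Schedule,Storage,Type", cols, ",")
        print "FILE,NAME,POOL,SCHEDULE,STORAGE,TYPE"
    }
    {
        pos = index($1, ":")            # $1 is everything before the first quote
        if (pos == 0) next
        name = substr($1, 1, pos - 1)   # file name, without the colon
        key  = substr($1, pos + 1)
        gsub(/[ \t=]/, "", key)         # "  Pool = " -> "Pool"
        if (name != file) { flush_row(); file = name }
        val[key] = $2                   # text between the first pair of quotes
    }
    END { flush_row() }
    '
}
```

Because the key is cleaned of spaces and `=`, this sketch should accept either the question's full `grep | awk | sort` pipeline or the raw grep output, for example `grep -e Name -e Type -e Schedule -e Pool -e Storage *.cfg | grep_to_csv`. Lines whose key is not one of the five columns are ignored.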

Once awk is part of the solution, there is usually no need for grep.
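
To make that concrete, here is a minimal sketch of the question's `grep ... | awk -F' = ' ...` stage done in awk alone. It assumes the config lines look like `Key = "value"` (inferred from the question's `-F' = '`), and the function name `list_fields` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: awk-only stand-in for
#   grep -e Name -e Type -e Schedule -e Pool -e Storage FILES | awk -F' = ' '{print $1,$2}'
# The regex does the filtering that grep did; FILENAME supplies the prefix
# that grep adds when it is given more than one file.
list_fields() {
    awk -F' = ' '
    /Name|Type|Schedule|Pool|Storage/ { print FILENAME ":" $1, $2 }
    ' "$@"
}
```

Called as `list_fields *.cfg`, this should print lines in the same `file:  Key "value"` shape as the question's pipeline (minus the sort), which is why the filtering and the reshaping can live in one awk program.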

Reverse engineering the OP's grep|awk|sort output into some sample files:

$ head *.cfg
==> IRVLinuxDefault.cfg <==
  Name = "IRVLinuxDefault"
  Pool = "IRV_DD890_Full60"
  Schedule = "IRV_Backups"
  Storage = "IRV_SD_DD890"
  Type = "Backup"

==> LVS_60Day_NDMP_Defs.cfg <==
  Name = "LVS_60Day_NDMP_Defs"
  Pool = "LVS_DD_AV_NDMP"
  Schedule = "LVS_NDMP_Monthly"
  Storage = "LVS_SD_DD990_AV_NDMP"
  Type = "Backup"

==> LVS_60Day_NDMP_NOFileSet_Defs.cfg <==                   # NOTE: missing an entry for "Pool"
  Name = "LVS_60Day_NDMP_NOFileSet_Defs"
  Schedule = "LVS_NDMP_Monthly"
  Storage = "LVS_SD_DD990_AV_NDMP"
  Type = "Backup"

==> LVS_Datalake2_Defs.cfg <==
  Name = "LVS_Datalake2_Defs"
  Pool = "LVS_WAS_SD101_13Mo-cloud"
  Schedule = "WeeklyCycle"
  Storage = "LVS_WAS_SD101_13Mo-cloud"
  Type = "Backup"

One awk idea:

awk '

function print_record(  ) {
    if (fname)
        print fname,record["name"],record["pool"],record["schedule"],record["storage"],record["type"]

    delete record                                                   # clear previous line contents
}

BEGIN         { OFS=","

                hdr="FILE,NAME,POOL,SCHEDULE,STORAGE,TYPE"
                print hdr

                n=split(tolower(hdr),a,",")                         # build array of field names
                for (i=2;i<=n;i++)                                  # convert field names to ...
                    fields[a[i]]                                    # associative array indices
              }

FNR==1        { print_record()                                      # print previous file contents
                fname=FILENAME
              }

              { split($0,a,"\"")                                    # split line on double quotes
                key=tolower($1)                                     # need lowercase field name to match fields[] array indices
              }

key in fields { record[key]=a[2] }                                  # if 1st field is an index in fields[] array then save the 2nd double-quote delimited field

END           { print_record()   }                                  # flush last file contents to stdout
' *cfg > all.csv

This generates:

$ cat all.csv
FILE,NAME,POOL,SCHEDULE,STORAGE,TYPE
IRVLinuxDefault.cfg,IRVLinuxDefault,IRV_DD890_Full60,IRV_Backups,IRV_SD_DD890,Backup
LVS_60Day_NDMP_Defs.cfg,LVS_60Day_NDMP_Defs,LVS_DD_AV_NDMP,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup
LVS_60Day_NDMP_NOFileSet_Defs.cfg,LVS_60Day_NDMP_NOFileSet_Defs,,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup
LVS_Datalake2_Defs.cfg,LVS_Datalake2_Defs,LVS_WAS_SD101_13Mo-cloud,WeeklyCycle,LVS_WAS_SD101_13Mo-cloud,Backup
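
Since the point of the exercise is that a missing value still leaves an empty cell, a quick check that every row has the same number of columns can catch a dropped field. The following is only a sketch: `check_columns` is a made-up name, and it assumes no value contains a comma, since it counts fields by splitting on `,`.

```shell
#!/bin/bash
# Hypothetical check: report rows whose field count differs from the expected
# one and return non-zero if any are found.
# usage: check_columns FILE EXPECTED_COUNT
check_columns() {
    awk -F',' -v want="$2" '
    NF != want { printf "%s:%d: %d fields\n", FILENAME, FNR, NF; bad = 1 }
    END        { exit bad + 0 }
    ' "$1"
}
```

For the file produced above, `check_columns all.csv 6` should print nothing and succeed, because the row with the missing Pool still has six comma-separated fields (one of them empty).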
