Format multiline output of grep command into columns adding/substituting filename as an output field
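I'm trying to format the output of a multi-line egrep query into a CSV-compatible format.
I need to pull a few values from a large number of files (some of which may not contain the values I'm looking for).
The grep command I'm using is:
grep -e Name -e Type -e Schedule -e Pool -e Storage *|awk -F' = ' '{print $1,$2}'|sort
This returns output such as: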
IRVLinuxDefault.cfg: Name "IRVLinuxDefault"
IRVLinuxDefault.cfg: Pool "IRV_DD890_Full60"
IRVLinuxDefault.cfg: Schedule "IRV_Backups"
IRVLinuxDefault.cfg: Storage "IRV_SD_DD890"
IRVLinuxDefault.cfg: Type "Backup"
LVS_60Day_NDMP_Defs.cfg: Name "LVS_60Day_NDMP_Defs"
LVS_60Day_NDMP_Defs.cfg: Pool "LVS_DD_AV_NDMP"
LVS_60Day_NDMP_Defs.cfg: Schedule "LVS_NDMP_Monthly"
LVS_60Day_NDMP_Defs.cfg: Storage "LVS_SD_DD990_AV_NDMP"
LVS_60Day_NDMP_Defs.cfg: Type "Backup"
LVS_60Day_NDMP_NOFileSet_Defs.cfg: Name "LVS_60Day_NDMP_NOFileSet_Defs"
LVS_60Day_NDMP_NOFileSet_Defs.cfg: Pool "LVS_DD_AV_NDMP"
LVS_60Day_NDMP_NOFileSet_Defs.cfg: Schedule "LVS_NDMP_Monthly"
LVS_60Day_NDMP_NOFileSet_Defs.cfg: Storage "LVS_SD_DD990_AV_NDMP"
LVS_60Day_NDMP_NOFileSet_Defs.cfg: Type "Backup"
LVS_Datalake2_Defs.cfg: Name "LVS_Datalake2_Defs"
LVS_Datalake2_Defs.cfg: Pool "LVS_WAS_SD101_13Mo-cloud"
LVS_Datalake2_Defs.cfg: Schedule "WeeklyCycle"
LVS_Datalake2_Defs.cfg: Storage "LVS_WAS_SD101_13Mo-cloud"
LVS_Datalake2_Defs.cfg: Type "Backup"
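I'm trying to output these values as the fields FILE,NAME,POOL,SCHEDULE,STORAGE,TYPE, with a header for each column. If one of the files doesn't contain one of the grepped-for values, I'd like the output to leave an empty record in that position.
My desired output would look like CSV (sample below), with any "s or :s stripped out (note that the last row of the desired output is missing the Pool field, so there are 2 commas to hold the empty cell):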
FILE,NAME,POOL,SCHEDULE,STORAGE,TYPE
IRVLinuxDefault.cfg,IRVLinuxDefault,IRV_DD890_Full60,IRV_Backups,IRV_SD_DD890,Backup
LVS_60Day_NDMP_Defs.cfg,LVS_60Day_NDMP_Defs,LVS_DD_AV_NDMP,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup
LVS_60Day_NDMP_NOFileSet_Defs.cfg,LVS_60Day_NDMP_NOFileSet_Defs,,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup
I've tried many approaches with awk, sed and GNU datamash (transpose), but haven't had any luck.
Any suggestions?
grep -e Name -e Type -e Schedule -e Pool -e Storage *|awk -F' = ' '{print $1,$2}'|sort|datamash transpose
WindowsDefault_open.cfg: Name "WindowsDefault_Open" WindowsDefault_open.cfg: Pool "BO3_SD01_DD_OPEN1_60Day" WindowsDefault_open.cfg: Schedule "WeeklyCycle"
WindowsDefault_open.cfg: Storage "BO3_SD01_DD990_OPEN1" WindowsDefault_open.cfg: Type "Backup"
Windows_MBS_SD01_Default.cfg: Name "Windows_MBS_SD01_Default" Windows_MBS_SD01_Default.cfg: Pool "BO3_SD03_DD_V164_OPEN1_60day" Windows_MBS_SD01_Default.cfg: Schedule "MonthlyCycle" Windows_MBS_SD01_Default.cfg: Storage "BO3_SD03_DD990_OPEN1_V164" Windows_MBS_SD01_Default.cfg: Type "Backup"
grep -e Name -e Type -e Schedule -e Pool -e Storage *|awk -F' = ' '{print $1,$2}'|sort|awk '{print $1,$2,$3,$4,$5,$6};'|datamash transpose
1: Name "BO3_Isi_gda_spark_60day_NDMP_Defs" 1!: Name "BO3_vg8-2_2_ucqa-ws_60day_NDMP_Defs" 1: Pool "File" 1!: Pool "File" 1: Schedule "BO3_Prod_Schedule" 1!: Schedule "BO3_Prod_Schedule" 1: Storage "BO3_SD01DD990_NDMP1" 1!: Storage "BO3_SD01_DD990_NDMP1" 1: Type "Backup" 1!: Type "Backup" AM4WS3LinuxDefault.cfg: Name "AM4WS3LinuxDefault" AM4WS3LinuxDefault.cfg: Pool "AM4_SD01_WasabiS3-cloud" AM4WS3LinuxDefault.cfg: Schedule "MonthlyCycle" AM4WS3LinuxDefault.cfg: Storage "AM4_SD01_WasabiS3-cloud" AM4WS3LinuxDefault.cfg: Type "Backup" AM4WS3WindowsDefault.cfg: Name "AM4WS3WindowsDefault" AM4WS3WindowsDefault.cfg: Pool "AM4SD01_WasabiS3-cloud" AM4WS3WindowsDefault.cfg: Schedule "MonthlyCycle" AM4WS3WindowsDefault.cfg: Storage "AM4_SD01_WasabiS3-cloud" AM4WS3WindowsDefault.cfg: Type "Backup" backups_BO3LinuxDefault.cfg: Name "backups_BO3LinuxDefault" backups_BO3LinuxDefault.cfg: Pool "BO3_SD01_2MO" backups_BO3LinuxDefault.cfg: Schedule "MonthlyCycle"
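This script lets you specify a string to substitute when an expected value is not provided.
It also lets you specify the delimiter (for the input) used to extract the desired values.
Note: a single or double quote is awkward to pass as the delimiter for the split function because it clashes with the shell and awk quoting, so I use sed between the input you provided and the awk script that converts it into the desired output.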
#!/bin/bash
### Original command
#grep -e Name -e Type -e Schedule -e Pool -e Storage *|awk -F' = ' '{print $1,$2}'|sort
sample="grepOutput.txt"
cat >"${sample}" <<"EnDoFiNpUt"
IRVLinuxDefault.cfg: Name "IRVLinuxDefault"
IRVLinuxDefault.cfg: Pool "IRV_DD890_Full60"
IRVLinuxDefault.cfg: Schedule "IRV_Backups"
IRVLinuxDefault.cfg: Storage "IRV_SD_DD890"
IRVLinuxDefault.cfg: Type "Backup"
LVS_60Day_NDMP_Defs.cfg: Name "LVS_60Day_NDMP_Defs"
LVS_60Day_NDMP_Defs.cfg: Pool "LVS_DD_AV_NDMP"
LVS_60Day_NDMP_Defs.cfg: Schedule "LVS_NDMP_Monthly"
LVS_60Day_NDMP_Defs.cfg: Storage "LVS_SD_DD990_AV_NDMP"
LVS_60Day_NDMP_Defs.cfg: Type "Backup"
LVS_60Day_NDMP_NOFileSet_Defs.cfg: Name "LVS_60Day_NDMP_NOFileSet_Defs"
LVS_60Day_NDMP_NOFileSet_Defs.cfg: Pool "LVS_DD_AV_NDMP"
LVS_60Day_NDMP_NOFileSet_Defs.cfg: Schedule "LVS_NDMP_Monthly"
LVS_60Day_NDMP_NOFileSet_Defs.cfg: Storage "LVS_SD_DD990_AV_NDMP"
LVS_60Day_NDMP_NOFileSet_Defs.cfg: Type "Backup"
LVS_Datalake2_Defs.cfg: Name "LVS_Datalake2_Defs"
LVS_Datalake2_Defs.cfg: Pool "LVS_WAS_SD101_13Mo-cloud"
LVS_Datalake2_Defs.cfg: Schedule "WeeklyCycle"
LVS_Datalake2_Defs.cfg: Storage "LVS_WAS_SD101_13Mo-cloud"
LVS_Datalake2_Defs.cfg: Type "Backup"
EnDoFiNpUt
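### Reading ${sample} emulates the original grep command output;
### sed turns every double quote into the delimiter awk will split on
sed 's/"/|/g' "${sample}" |
awk -v delim='|' -v defval="" '
BEGIN {
    # header row, no trailing newline: each new file starts its own line
    printf("FILENAME,NAME,POOL,SCHEDULE,STORAGE,TYPE")
    lastFN = ""
}
{
    pos = index($0, ":")
    if (pos > 0) {
        FN = substr($0, 1, pos - 1)     # file name is everything before the first colon
        split($0, vals, delim)          # vals[2] is the text that was between the quotes
        if (FN != lastFN) {             # new file: start a new CSV row
            printf("\n%s", FN)
            lastFN = FN
        }
        if (vals[2] == "") {
            printf(",%s", defval)       # empty value: substitute the default string
        } else {
            printf(",%s", vals[2])
        }
    }
}
END {
    print ""                            # terminate the last row
}'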
The output looks like this:
FILENAME,NAME,POOL,SCHEDULE,STORAGE,TYPE
IRVLinuxDefault.cfg,IRVLinuxDefault,IRV_DD890_Full60,IRV_Backups,IRV_SD_DD890,Backup
LVS_60Day_NDMP_Defs.cfg,LVS_60Day_NDMP_Defs,LVS_DD_AV_NDMP,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup
LVS_60Day_NDMP_NOFileSet_Defs.cfg,LVS_60Day_NDMP_NOFileSet_Defs,LVS_DD_AV_NDMP,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup
LVS_Datalake2_Defs.cfg,LVS_Datalake2_Defs,LVS_WAS_SD101_13Mo-cloud,WeeklyCycle,LVS_WAS_SD101_13Mo-cloud,Backup
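One limitation worth noting: the script above prints values in the order they arrive, so a file that lacks a key altogether (rather than having an empty value) would get a short row with its later columns shifted left. A minimal sketch of a variant that places each value by its key name, assuming the same `file: Key "value"` input shape; the function name `emit_row` and the trimmed two-file sample (with the Pool line removed from the second file) are made up for illustration:

```shell
#!/bin/bash
# Trimmed sample of the grep|awk|sort output; the second file has no Pool line.
sample=$(mktemp)
cat >"$sample" <<'EOF'
IRVLinuxDefault.cfg: Name "IRVLinuxDefault"
IRVLinuxDefault.cfg: Pool "IRV_DD890_Full60"
IRVLinuxDefault.cfg: Schedule "IRV_Backups"
IRVLinuxDefault.cfg: Storage "IRV_SD_DD890"
IRVLinuxDefault.cfg: Type "Backup"
LVS_60Day_NDMP_NOFileSet_Defs.cfg: Name "LVS_60Day_NDMP_NOFileSet_Defs"
LVS_60Day_NDMP_NOFileSet_Defs.cfg: Schedule "LVS_NDMP_Monthly"
LVS_60Day_NDMP_NOFileSet_Defs.cfg: Storage "LVS_SD_DD990_AV_NDMP"
LVS_60Day_NDMP_NOFileSet_Defs.cfg: Type "Backup"
EOF

csv=$(awk -v defval="" '
# Print one CSV row for the current file, one column per expected key.
function emit_row(   i, line) {
    if (fn == "") return
    line = fn
    for (i = 1; i <= n; i++) {
        if ((fn, cols[i]) in val) line = line "," val[fn, cols[i]]
        else                      line = line "," defval
    }
    print line
}
BEGIN {
    n = split("Name,Pool,Schedule,Storage,Type", cols, ",")
    print "FILE,NAME,POOL,SCHEDULE,STORAGE,TYPE"
}
{
    pos = index($0, ":")
    if (pos == 0) next
    cur = substr($0, 1, pos - 1)          # file name before the first colon
    if (cur != fn) { emit_row(); fn = cur }
    split($0, q, "\"")                    # q[2] is the quoted value
    split(substr($0, pos + 1), w, " ")    # w[1] is the key name
    val[fn, w[1]] = q[2]
}
END { emit_row() }
' "$sample")

printf '%s\n' "$csv"
rm -f "$sample"
```

Because the double quote sits inside an awk string within a single-quoted shell program, it can be used as the split delimiter directly, so the sed step is not needed in this variant. It still relies on the input being grouped by file name, which the sort in the original pipeline provides.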
Once awk is part of the solution, grep is usually unnecessary.
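As a small illustration of that point, an awk pattern can do the filtering that grep did, and the built-in FILENAME variable supplies the prefix grep would have added. This is only a sketch, assuming config lines of the form `Key = "value"`; the temporary directory and the file `demo.cfg` are made up for the example:

```shell
#!/bin/bash
dir=$(mktemp -d)
cat >"$dir/demo.cfg" <<'EOF'
Name = "Demo"
Level = "Full"
Pool = "DemoPool"
EOF

# Similar in spirit to: grep -e Name -e Pool *.cfg | awk -F' = ' '{print $1,$2}'
# but done in one awk process; the key is matched exactly rather than as a substring.
matched=$(cd "$dir" && awk -F' = ' '$1 ~ /^(Name|Pool)$/ { print FILENAME ": " $1, $2 }' *.cfg)

printf '%s\n' "$matched"
rm -rf "$dir"
```

Note that `grep -e Name` matches the word anywhere on the line, while the anchored pattern here only accepts lines whose key is exactly Name or Pool, so the two are not strictly equivalent.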
Reverse engineering OP's grep|awk|sort output into some sample files:
$ head *.cfg
==> IRVLinuxDefault.cfg <==
Name = "IRVLinuxDefault"
Pool = "IRV_DD890_Full60"
Schedule = "IRV_Backups"
Storage = "IRV_SD_DD890"
Type = "Backup"
==> LVS_60Day_NDMP_Defs.cfg <==
Name = "LVS_60Day_NDMP_Defs"
Pool = "LVS_DD_AV_NDMP"
Schedule = "LVS_NDMP_Monthly"
Storage = "LVS_SD_DD990_AV_NDMP"
Type = "Backup"
==> LVS_60Day_NDMP_NOFileSet_Defs.cfg <== # NOTE: missing an entry for "Pool"
Name = "LVS_60Day_NDMP_NOFileSet_Defs"
Schedule = "LVS_NDMP_Monthly"
Storage = "LVS_SD_DD990_AV_NDMP"
Type = "Backup"
==> LVS_Datalake2_Defs.cfg <==
Name = "LVS_Datalake2_Defs"
Pool = "LVS_WAS_SD101_13Mo-cloud"
Schedule = "WeeklyCycle"
Storage = "LVS_WAS_SD101_13Mo-cloud"
Type = "Backup"
One awk idea:
awk '
function print_record() {
    if (fname)
        print fname,record["name"],record["pool"],record["schedule"],record["storage"],record["type"]
    delete record                       # clear previous file contents
}
BEGIN   { OFS=","
          hdr="FILE,NAME,POOL,SCHEDULE,STORAGE,TYPE"
          print hdr
          n=split(tolower(hdr),a,",")   # build array of field names
          for (i=2;i<=n;i++)            # convert field names to ...
              fields[a[i]]              # associative array indices
        }
FNR==1  { print_record()                # print previous file contents
          fname=FILENAME
        }
        { split($0,a,"\"")              # split line on double quotes
          key=tolower($1)               # need lowercase field name to match fields[] array indices
        }
key in fields { record[key]=a[2] }      # if 1st field is an index in fields[] array then save the 2nd double-quote delimited field
END     { print_record() }              # flush last file contents to stdout
' *cfg > all.csv
This generates:
$ cat all.csv
FILE,NAME,POOL,SCHEDULE,STORAGE,TYPE
IRVLinuxDefault.cfg,IRVLinuxDefault,IRV_DD890_Full60,IRV_Backups,IRV_SD_DD890,Backup
LVS_60Day_NDMP_Defs.cfg,LVS_60Day_NDMP_Defs,LVS_DD_AV_NDMP,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup
LVS_60Day_NDMP_NOFileSet_Defs.cfg,LVS_60Day_NDMP_NOFileSet_Defs,,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup
LVS_Datalake2_Defs.cfg,LVS_Datalake2_Defs,LVS_WAS_SD101_13Mo-cloud,WeeklyCycle,LVS_WAS_SD101_13Mo-cloud,Backup
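A quick way to sanity-check a result like this is to confirm that every row has as many fields as the header and to list the rows where POOL came out empty. The following is only a sketch, assuming that no value contains a comma; the file here is a trimmed copy of the output above written to a temporary path:

```shell
#!/bin/bash
csv=$(mktemp)
cat >"$csv" <<'EOF'
FILE,NAME,POOL,SCHEDULE,STORAGE,TYPE
IRVLinuxDefault.cfg,IRVLinuxDefault,IRV_DD890_Full60,IRV_Backups,IRV_SD_DD890,Backup
LVS_60Day_NDMP_NOFileSet_Defs.cfg,LVS_60Day_NDMP_NOFileSet_Defs,,LVS_NDMP_Monthly,LVS_SD_DD990_AV_NDMP,Backup
EOF

report=$(awk -F, '
NR == 1    { want = NF; next }     # remember the header field count
NF != want { printf "line %d: %d fields, expected %d\n", NR, NF, want }
$3 == ""   { printf "line %d: %s has no POOL\n", NR, $1 }
' "$csv")

printf '%s\n' "$report"
rm -f "$csv"
```

With a single-character field separator such as the comma, awk keeps empty fields, so the row with the missing Pool still counts six fields and is only reported by the empty-POOL rule.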