Processing logs with sed/awk and regular expressions

Question

I have thousands of log files generated by a very verbose PHP script. The general structure is as follows:

###Unknown no of lines, which I want to ignore###
=================================================
$insert_vars['cdr_pkey']=17568
$id<TAB>$g1<TAB>$i1<TAB>$rating1<TAB>$g2<TAB>$i2<TAB>$rating2 #<TAB>more $gX,$iX,$ratingX
#numerical values of $id $g1 $i1 etc. separated by tab
#numerical values of ---""---
#I do not know how many lines will be there (unique column is $id)
=================================================
###Unknown no of lines, which I want to ignore###

I have to process these log files, create an Excel sheet (I'm thinking CSV format), and report the data back. I'm really bad at Excel, but I thought of outputting something like this:

cdr_pkey<TAB>id<TAB>g1<TAB>i1<TAB>rating1<TAB>g2<TAB>i2<TAB>rating2 #and so on
17568<TAB>1349<TAB>0.0004532<TAB>0.01320<TAB>2.014E-4<TAB>...#rest of numerical values
17568<TAB>1364<TAB>...#values for id=1364
17568<TAB>1321<TAB>...#values for id=1321
...
17569<TAB>1048<TAB>...#values for id=1048
17569<TAB>1426<TAB>...#values for id=1426
...
...

So cdr_pkey is the unique column in the sheet, and for each $cdr_pkey I have multiple $ids, each with its own set of $g1, $i1, $rating1, ...
I tested this format and Excel can read it. Now I just want to extend it to all of those thousands of files.
I'm just not sure how to proceed. What's the next step?

Answer 1

The following bash script does something that might be related to what you want. It is parameterized by what you meant by <TAB>: I assume you mean the ASCII tab character, but if your logs are so verbose that they literally spell out <TAB>, you will need to change the variable $WHAT_DID_YOU_MEAN_BY_TAB accordingly. Note that very little about this script does The Right Thing™; for example, it reads the entire file into a string variable, which might not even be possible depending on how big your log files are. On the upside, the script could easily be modified to make two passes instead, if you think that's better.

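#!/bin/bash

# Separator inserted after the cdr_pkey value; GNU sed expands '\t' to a tab.
WHAT_DID_YOU_MEAN_BY_TAB='\t'

if [[ $# -ne 1 ]] ; then echo "Requires one argument: the file to process" ; exit 1 ; fi

FILENAME="$1"

# Keep the lines between the two '=====' separators, dropping the separators
# themselves ('head -n -1' is a GNU extension).
RELEVANT=$(sed -n '/^==*$/,/^==*$/p' "$FILENAME" | sed '1d' | head -n '-1')
# Take the value after the last '=' on the $insert_vars['cdr_pkey'] line.
CDR_PKEY=$(echo "$RELEVANT" | \
    grep '$insert_vars\['"'cdr_pkey'\]" | \
    sed 's/.*=\(.*\)/\1/')
# Drop the cdr_pkey line and the column-header line, then prefix each data row.
echo "$RELEVANT" | sed '1,2d' | \
    sed "s/.*/${CDR_PKEY}$WHAT_DID_YOU_MEAN_BY_TAB&/"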

The following find command is an example of how to run the script on every log, but the exact invocation will depend on how your logs are organized.

find . -name 'LOG_PATTERN' -exec THIS_SCRIPT '{}' \;

Lastly, I have ignored the issue of putting the CSV headers on the output. This is easily done out-of-band.
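One way to add the header out-of-band is sketched below. It assumes every log uses the same columns and that the column-header line is the second line after the first '=' separator (as in the question's sample), so the header can be built from a single file and written before the rows. The function name print_header and the script and file names in the usage comment are hypothetical.

```shell
#!/bin/bash
# Sketch: print the report header once, taken from one log file.
# Assumes the column-header line is the 2nd line after the first '=====' line.

print_header() {
    awk -v OFS='\t' '
        /^=+$/ { if (++state > 1) exit; next }    # stop at the closing separator
        state == 1 && ++n == 2 {                  # header line inside the block
            print "cdr_pkey", $0
            exit
        }
    ' "$1"
}

# Usage (hypothetical names): header first, then the per-file rows.
# { print_header first.log
#   find . -name '*.log' -exec ./this_script.sh '{}' \; ; } > report.tsv
```

Because the header comes from the log itself, any renamed or extra columns show up automatically, as long as all files share the same layout.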

(Edit: updated the script to reflect the discussion in the comments.)
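The script above reads each whole file into a shell variable. As an alternative technique (awk instead of the sed/grep pipeline), here is a minimal single-pass sketch that streams the files and never holds a whole file in memory. It assumes, as in the question, that the data block is the first one delimited by lines made only of '=' characters, that its first line is the $insert_vars['cdr_pkey'] assignment, and that its second line is the column header. The function name extract_rows is hypothetical.

```shell
#!/bin/bash
# Sketch: one awk pass per batch of files, printing "cdr_pkey<TAB>row" for
# every data row of the first '='-delimited block in each file.

extract_rows() {
    awk -v OFS='\t' '
        FNR == 1 { state = 0; n = 0 }          # new file: reset per-file state
        /^=+$/   { state++; next }             # count separator lines, skip them
        state == 1 {                           # inside the first delimited block
            n++
            if (n == 1) { sub(/.*=/, ""); key = $0; next }  # cdr_pkey line
            if (n == 2) next                   # column-header line
            print key, $0                      # data row prefixed with cdr_pkey
        }
    ' "$@"
}

# Only run when executed directly with file arguments.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
    extract_rows "$@"
fi
```

Since the state is reset whenever FNR is 1, many files can be handled by one awk process, for example `find . -name '*.log' -exec ./extract_rows.sh '{}' + > rows.tsv` (script name hypothetical).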

Answer 2

EDIT: James tells me that changing the sed in the last echo from '1d' to '1,2d' and dropping the grep -v 'id' should do the trick.
Confirmed that it works, so I've changed it below. Thanks again to James Wilcox.

Based on @James's script, this is what I came up with. I just piped the final echo to grep -v 'id'.
Thanks again, James Wilcox.

#!/bin/bash
WHAT_DID_YOU_MEAN_BY_TAB='\t'

if [[ $# -lt 1 ]]; then echo "Requires at least one argument: the files to process"; exit 1; fi

echo -e "key\tid\tg1\ti1\td1\tc1\tr1\tg2\ti2\td2\tc2\tr2\tg3\ti3\td3\tc3\tr3"

for i in "$@"
do
    FILENAME="$i"
    RELEVANT=$(sed -n '/^==*$/,/^==*$/p' "$FILENAME" | sed '1d' | head -n '-1')
    CDR_PKEY=$(echo "$RELEVANT" | \
        grep '$insert_vars\['"'cdr_pkey'\]" | \
        sed 's/.*=\(.*\)/\1/')
    echo "$RELEVANT" | sed '1,2d' | \
        sed "s/.*/${CDR_PKEY}$WHAT_DID_YOU_MEAN_BY_TAB&/"
    #the one with grep looked like:-
    #echo "$RELEVANT" | sed '1d' | \
    #sed "s/.*/${CDR_PKEY}$WHAT_DID_YOU_MEAN_BY_TAB\0/" | grep -v 'id'
done
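With thousands of files, this script could be driven by find with -exec ... '{}' +, which passes many file names per call. find may still split the list into several calls to stay under the system's argument-length limit, and each call prints the header again (the same happens with -exec ... \;, one header per file). A minimal sketch, assuming the header is the first line of the combined output and never equals a data row, filters out the repeats; dedupe_header and answer2.sh are hypothetical names.

```shell
#!/bin/bash
# Sketch: keep the first line (the header) and drop any later copies of it.
dedupe_header() {
    awk 'NR == 1 { hdr = $0; print; next } $0 != hdr'
}

# Usage (hypothetical script name):
# find . -name '*.log' -exec ./answer2.sh '{}' + | dedupe_header > report.tsv
```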

Answer 1: score 3, accepted, 2011-08-10 07:30:59

Answer 2: score 1, 2011-08-10 08:09:07
