How does AWK define a line?

I was trying to parse a log file using AWK.

test.log:

[12/12/18 11:54:54:321 PST] 0000077c WC_SERVER     < com.ibm.commerce.server.HttpRequestWrapper setAttribute(String,Object) Exit
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO < -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextserviceimpl.BusinessContextServiceImpl.createContextSPI(ActivityToken, ActivityData, String) Exit
                                 com.ibm.commerce.context.base.BaseContext : [bInitialize = false][bRecalibrate = false][inCallerId = null][inRunAsId = null][inStoreId = null][istrChannelId = null][bDirty = false][bRequestStarted = false][iOriginalSerializedString = null][iToken = null]
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO > -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextserviceimpl.BusinessContextServiceImpl.loadContextData(ActivityToken, String) Entry
                                 68884:false:false:0
                                 com.ibm.commerce.context.base.BaseContext
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO > -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextservice.commands.ContextDataSerValueCacheCmdImpl.myPerformExecute() Entry
                                 68884

The idea is: if a line starts with [ and matches the pattern, print that line and also the following lines that do not start with [ .

Expected result:

[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO < -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextserviceimpl.BusinessContextServiceImpl.createContextSPI(ActivityToken, ActivityData, String) Exit
                                 com.ibm.commerce.context.base.BaseContext : [bInitialize = false][bRecalibrate = false][inCallerId = null][inRunAsId = null][inStoreId = null][istrChannelId = null][bDirty = false][bRequestStarted = false][iOriginalSerializedString = null][iToken = null]
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO > -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextserviceimpl.BusinessContextServiceImpl.loadContextData(ActivityToken, String) Entry
                                 68884:false:false:0
                                 com.ibm.commerce.context.base.BaseContext
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO > -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextservice.commands.ContextDataSerValueCacheCmdImpl.myPerformExecute() Entry
                                 68884

AWK:

awk 'BEGIN{IGNORECASE = 1; flag = 0;}{ if($0 ~ /^\[/){if($0 ~ /WC_BUSINESSCO/){flag=1}else{flag = 0}; if(flag==1){print $0}}}' test.log

Current output:

[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO < -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextserviceimpl.BusinessContextServiceImpl.createContextSPI(ActivityToken, ActivityData, String) Exit
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO > -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextserviceimpl.BusinessContextServiceImpl.loadContextData(ActivityToken, String) Entry
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO > -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextservice.commands.ContextDataSerValueCacheCmdImpl.myPerformExecute() Entry

As you can see, lines that don't start with [ aren't printed. After some debugging, it seems that AWK treats the problem lines as part of the pattern-matched line. I guess they are not printed because of a line-wrap issue.

How can I fix this?

You're overdoing it.

awk '/^\[/{p=/WC_BUSINESSCO/}p' test.log
  • /^\[/ means: perform the following action ( {...} ) if the current record begins with a [ ,
  • p=/WC_BUSINESSCO/ sets p to true if the current record contains WC_BUSINESSCO and to false otherwise,
  • p at the end means: print the current record if p is true,
  • if the current line does not start with [ , the p value from the previous line is kept.

For further information, see man awk .

For clarity, the same program with some additional whitespace:

awk '
    /^\[/ { p = /WC_BUSINESSCO/ }
    p
' test.log
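
To see how the flag carries over from one line to the next, here is a small sketch on made-up input. The tags KEEP and SKIP, the sample lines, and the function name demo are hypothetical and only stand in for the real log:

```shell
# Hypothetical sample: three records, two of which carry the tag KEEP.
demo() {
    printf '%s\n' \
        '[1] KEEP first' \
        '    continuation of first' \
        '[2] SKIP second' \
        '    continuation of second' \
        '[3] KEEP third' |
    awk '
        /^\[/ { p = /KEEP/ }   # re-evaluate the flag only on lines that start with [
        p                      # print the current line while the flag is set
    '
}
demo
```

The continuation line of record 1 is printed because p is still true from the line before it, while the continuation line of record 2 is dropped because p was reset to false on the [2] line.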

How does awk define a line?

Awk does not have any knowledge of what a line is. Awk knows the concepts of records and fields.

Files are split into records, where consecutive records are separated by the record separator RS . Each record is split into fields, where consecutive fields are separated by the field separator FS .

By default, the record separator RS is set to the <newline> character ( \n ), and thus each record is a line. The record separator has the following definition:

RS : The first character of the string value of RS shall be the input record separator; a <newline> by default. If RS contains more than one character, the results are unspecified. If RS is null, then records are separated by sequences consisting of a <newline> plus one or more blank lines, leading or trailing blank lines shall not result in empty records at the beginning or end of the input, and a <newline> shall always be a field separator, no matter what the value of FS is.
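
As a rough illustration of the definition above, the following sketch uses made-up input to show a single-character RS together with FS, and the special case of an empty RS. The function names split_demo and paragraph_demo and the sample data are hypothetical:

```shell
# Records separated by ";" and fields separated by ",".
split_demo() {
    printf 'a,b;c,d;e,f' |
    awk 'BEGIN { RS = ";"; FS = "," } { print NR ": " $2 }'
}

# Empty RS: records are separated by blank lines, and a newline
# inside a record acts as a field separator.
paragraph_demo() {
    printf 'a b\nc d\n\ne f\n' |
    awk 'BEGIN { RS = "" } { print NR ": " NF " fields, first=" $1 }'
}

split_demo
paragraph_demo
```

In the second call the first record spans two lines, so it is reported with four fields.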

How can I now define a multi-line record?

For multi-line records where the start of a record cannot be uniquely identified by a single character, you might want to use gawk or any awk version where RS can be multiple characters (or a regular expression). In the case of the OP, you can define RS as the regular expression \n\[ :

awk 'BEGIN { RS="\n\\[" } /WC_BUSINESSCO/ { print (NR==1 ? "" : "[") $0 }' file

If you do not have access to such a version of awk and you have to stick to POSIX, you can do:

awk '/^\[/ && (rec ~ /WC_BUSINESSCO/) { printf "%s", rec } # process record
     /^\[/ { rec="" }                                      # initialise record
     { rec = rec $0 ORS }                                  # build record
     END { if (rec ~ /WC_BUSINESSCO/) printf "%s", rec }   # process last record
    ' file

This matches "WC_BUSINESSCO" anywhere in the full record, not only in its first line as most solutions here do. For the OP, matching the first line might be enough, but more general cases might run into problems with that.
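
The difference shows up when the pattern occurs only on a continuation line. The sketch below wraps the same record-building idea in a shell function and runs it on made-up input. The function name records_matching, the variable pat, and the word NEEDLE are hypothetical:

```shell
# Print every [-delimited record whose text matches the pattern in $1.
records_matching() {
    awk -v pat="$1" '
        /^\[/ && rec ~ pat { printf "%s", rec }   # previous record is complete: print it if it matches
        /^\[/ { rec = "" }                        # a line starting with [ begins a new record
        { rec = rec $0 ORS }                      # append the current line to the record
        END { if (rec ~ pat) printf "%s", rec }   # the last record has no [ line after it
    '
}

printf '%s\n' \
    '[1] alpha' \
    '    detail NEEDLE' \
    '[2] beta' \
    '    detail other' |
records_matching NEEDLE
```

Record 1 is printed in full even though NEEDLE appears only on its second line. A test against the [ line alone would have skipped it.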

You said: "then print out the line and also the following line".

Try this instead:

awk '/^\[.*WC_BUSINESSCO/{print;getline;print}' test.log

The flow is quite simple: when the pattern matches, print the line, get the next one, and print that too.

To get all the lines after the one that starts with [ :
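
One thing to keep in mind with getline is that it reads exactly one more line. The sketch below, on made-up input, shows what that means for a record with two continuation lines, and adds a guarded variant for the case where the matching line is the last line of the file. The function names with_getline and with_guard and the tag KEEP are hypothetical:

```shell
# Print the matching line plus exactly one following line.
with_getline() {
    awk '/^\[.*KEEP/ { print; getline; print }'
}

# Same idea, but print the second time only if getline actually read a line.
with_guard() {
    awk '/^\[.*KEEP/ { print; if ((getline) > 0) print }'
}

printf '%s\n' '[1] KEEP' '    first extra' '    second extra' '[2] SKIP' | with_getline
printf '%s\n' '[1] KEEP' | with_guard
```

In the first call the line "second extra" is not printed, since only one line is fetched after the match. In the second call the guard avoids printing the matching line twice when there is no next line.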

awk '/^\[/{i=0}/WC_BUSINESSCO/{i=1}i' test.log

With GNU awk you can define the record separator as you specified:

$ awk -v RS='(^|\n)\\[' '/WC_BUSINESSCO/{print RT $0}' file

When the pattern matches, this prints the record (possibly multi-line), with the record separator prefixed to the record.

With other awks, the workaround is:

$ awk '/^\[/{if(/WC_BUSINESSCO/){print; p=1} else p=0} p&&!/^\[/' file

If you are considering Perl, this is a generic solution based on your requirements. Note that it doesn't hard-code any text from the file (e.g. WC_BUSINESSCO) in the solution.

/tmp> cat test.log
[12/12/18 11:54:54:321 PST] 0000077c WC_SERVER     < com.ibm.commerce.server.HttpRequestWrapper setAttribute(String,Object) Exit
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO < -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextserviceimpl.BusinessContextServiceImpl.createContextSPI(ActivityToken, ActivityData, String) Exit
                                 com.ibm.commerce.context.base.BaseContext : [bInitialize = false][bRecalibrate = false][inCallerId = null][inRunAsId = null][inStoreId = null][istrChannelId = null][bDirty = false][bRequestStarted = false][iOriginalSerializedString = null][iToken = null]
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO > -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextserviceimpl.BusinessContextServiceImpl.loadContextData(ActivityToken, String) Entry
                                 68884:false:false:0
                                 com.ibm.commerce.context.base.BaseContext
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO > -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextservice.commands.ContextDataSerValueCacheCmdImpl.myPerformExecute() Entry
                                 68884
/tmp> perl -ne ' print "$t$p" if $x and /^\[/ ;if(!/^\[/) { $x++;$t.=$p} if(/^\[/) { $x=0;$t=""} $p=$_;END { print "$t$p" if $x }' test.log
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO < -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextserviceimpl.BusinessContextServiceImpl.createContextSPI(ActivityToken, ActivityData, String) Exit
                                 com.ibm.commerce.context.base.BaseContext : [bInitialize = false][bRecalibrate = false][inCallerId = null][inRunAsId = null][inStoreId = null][istrChannelId = null][bDirty = false][bRequestStarted = false][iOriginalSerializedString = null][iToken = null]
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO > -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextserviceimpl.BusinessContextServiceImpl.loadContextData(ActivityToken, String) Entry
                                 68884:false:false:0
                                 com.ibm.commerce.context.base.BaseContext
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO > -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextservice.commands.ContextDataSerValueCacheCmdImpl.myPerformExecute() Entry
                                 68884
/tmp>
