
Perl one-liner to simulate an awk script

I'm new to both awk and Perl, so please bear with me. I have the following awk script:

awk '/regex1/{p = 0;} /regex2/{p = 1;} p'

What this basically does is print all lines starting from a line that matches regex2 until a line that matches regex1 is found (the regex1 line itself is not printed).

Example:

 regex1
 regex2
 line 1
 line 2
 regex1
 regex2
 regex1

Output:

 regex2
 line 1
 line 2
 regex2

Is it possible to simulate this using a Perl one-liner? I know I can do it with a script saved in a file.

Edit:

A practical example:

24 May 2017 17:00:06,827 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content

24 May 2017 17:00:06,828 [INFO] 567890 (Blah : Blah1) Service-name:: Content( May span multiple lines)

24 May 2017 17:00:06,829 [INFO] 123456 (Blah : Blah2) Service-name: Multiple line content. Printing Object[ ID1=fac-adasd ID2=123231
ID3=123108 Status=Unknown
Code=530007 Dest=CA
]

24 May 2017 17:00:06,830 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content

24 May 2017 17:00:06,831 [INFO] 567890 (Blah : Blah2) Service-name:: Content( May span multiple lines)

Given the search key 123456, I want to extract the following:

24 May 2017 17:00:06,827 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content

24 May 2017 17:00:06,829 [INFO] 123456 (Blah : Blah2) Service-name: Multiple line content. Printing Object[ ID1=fac-adasd ID2=123231
ID3=123108 Status=Unknown
Code=530007 Dest=CA
]

24 May 2017 17:00:06,830 [INFO] 123456 (Blah : Blah1) Service-name:: Single line content

The following awk script does the job (it relies on \s and \w, which are GNU awk regex operators):
awk '/[0-9]{2}\s\w+\s[0-9]{4}/{n = 0} /123456/ {n = 1} n' file

perl -ne 'print if (/regex2/ .. /regex1/) =~ /^\d+$/'

This is slightly crazy, but here's how it works:

  • -n adds an implicit loop over the input lines
  • the current line is in $_
  • the two bare regex matches (/regex2/, /regex1/) implicitly test against $_
  • we use .. in scalar context, which turns it into a stateful flip-flop operator

    By that I mean: X .. Y starts out in the "false" state. In the "false" state it only evaluates X. If X returns a false value, it remains in the "false" state (and itself returns false). Once X returns a true value, it moves into the "true" state and returns true. (With the two-dot form, Y is also tested on that same line, so a line matching both X and Y opens and closes the range at once; the three-dot form ... waits until the next line before testing Y.)

    In the "true" state it only evaluates Y. If Y returns false, it remains in the "true" state (and itself returns true). Once Y returns a true value, it moves into the "false" state, but it still returns true.

  • had we just used print if /regex2/ .. /regex1/, it would have printed all the terminating regex1 lines, too

  • a close reading of "Range Operators" in perldoc perlop reveals that you can distinguish the end points of the range
  • the "true" value returned by .. is actually a sequence number starting from 1, so the start of a range can be identified by checking for 1
  • when the end of the range is reached (i.e. we're about to move from the "true" state back to the "false" state), the return value has "E0" appended to it

    Appending "E0" to an integer doesn't affect its numeric value. Perl implicitly converts strings to numbers when needed, and something like "5E0" is just scientific notation (meaning 5 * 10**0, which is 5 * 1, which is 5).

  • the "false" value returned by .. is the empty string, ""

We check that the result of .. matches the regex /^\d+$/, i.e. is all digits. This excludes the empty string (because we require at least one digit to match), so we don't print lines outside of the range. It also excludes the last line of each range, because E is not a digit.

Unlike the question's awk script, which does not print the terminating regex1 line, this Perl version prints both the start and the end line of each range:

perl -ne 'if(/regex2/ ... /regex1/){print}' file

Edit: awk also has range patterns (they are part of standard POSIX awk, not only GNU awk), so the same start-to-end range can be printed in awk with:

awk '/regex2/,/regex1/' file
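A quick comparison on the question's sample input (a sketch; variable names are illustrative) shows that the range pattern, like Perl's range operator, keeps the line that ends each range, whereas the original flag script drops it:

```shell
#!/bin/sh
input='regex1
regex2
line 1
line 2
regex1
regex2
regex1'

# The question's flag script: the regex1 line that ends a block is not printed.
flag=$(printf '%s\n' "$input" | awk '/regex1/{p = 0;} /regex2/{p = 1;} p')
# Range pattern: prints from a regex2 line through the next regex1 line, inclusive.
range=$(printf '%s\n' "$input" | awk '/regex2/,/regex1/')
printf 'flag:\n%s\nrange:\n%s\n' "$flag" "$range"
```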

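Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM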