awk: end of line vs end of record

I am trying to match a number at the end of a line ($), print the matching paragraphs, and ignore the third paragraph. Here is the data:

this is first paragraph
number 200
with some text

this is second paragraph
with some text
number 200

this is third paragraph
with some text
number 2001

This command matches only the first paragraph:

awk -v RS="" -v ORS="\n\n" "/number 200\n/" file

This command matches only the second paragraph:

awk -v RS="" -v ORS="\n\n" "/number 200$/" file

It seems the problem is that awk treats the character "$" as the end of the record rather than the end of the line. Is there an elegant way to work around this? Unfortunately, I do not have a grep that can work with paragraphs.

UPDATE:

Expected output:

this is first paragraph
number 200
with some text

this is second paragraph
with some text
number 200

You should include the desired output in your question.

However, if I understand correctly, you may want records 1 and 2 but not 3:

awk -v RS='' -v ORS='\n\n' '{s=$0; gsub("\n", " ", s); if (s ~ /number 200( |$)/) print}' file

Using any awk:

$ awk -v RS= -v ORS='\n\n' '/(^|\n)number 200(\n|$)/' file
this is first paragraph
number 200
with some text

this is second paragraph
with some text
number 200
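
A variation on the same idea, offered only as a sketch and not taken from the answers here: in paragraph mode a newline always separates fields, so setting FS to a newline makes each line of the record its own field. Each field is then a single-line string, and $ behaves like an end-of-line anchor when tested against it. The scratch file created with mktemp and the variable names are assumptions made for illustration.

```shell
# Write the sample data from the question to a scratch file (path is arbitrary).
sample=$(mktemp)
cat > "$sample" <<'EOF'
this is first paragraph
number 200
with some text

this is second paragraph
with some text
number 200

this is third paragraph
with some text
number 2001
EOF

# RS= selects paragraph mode; FS='\n' turns every line of a record into a field.
result=$(awk -v RS= -v FS='\n' -v ORS='\n\n' '
{
    for (i = 1; i <= NF; i++)
        if ($i ~ /number 200$/) {   # $ anchors to the end of this one line
            print                   # print the whole, unmodified paragraph
            next
        }
}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

This trades the single regexp for an explicit loop, which may read more naturally when the per-line test grows more complicated than one pattern.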

Regarding "Seems the problem is that awk understands character "$" as end of record instead of line": that's not a problem, that's the definition of $. In a regexp, $ means end of string; it only appears to mean end of line when the string you're matching against happens to be a single line, e.g. as read by grep, sed, and awk by default. When you're matching against a string containing multiple lines (e.g. using -z in GNU grep or GNU sed, RS="" in awk, or RS='^$' in GNU awk), you should expect $ to match just once at the end of that string (and ^ just once at the start of it). There's nothing special about newlines versus any other character in the string, and no regexp metacharacter to match them.
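
To make the point concrete, here is a small sketch that counts matches inside one two-line record. It relies on gsub() returning the number of substitutions it made, and on "&" in the replacement standing for the matched text, so the record is left unchanged. The variable names are chosen only for illustration.

```shell
# One paragraph-mode record made of two lines that both end in "200".
# $ matches only at the end of the whole string, so there is a single match.
end_only=$(printf 'number 200\nnumber 200\n' |
    awk -v RS= '{ print gsub(/200$/, "&") }')

# Spelling out "newline or end of string" finds the end of both lines.
per_line=$(printf 'number 200\nnumber 200\n' |
    awk -v RS= '{ print gsub(/200(\n|$)/, "&") }')

echo "matches with \$ alone:  $end_only"
echo "matches with (\\n|\$):   $per_line"
```

The first count should be 1 and the second 2, which is the difference between the two commands in the question.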

Regarding "Unfortunately I do not have grep that can work with paragraphs": no one does since, unlike awk, grep doesn't have a paragraph mode.
