
Match Multiple Strings In Awk Command Using RS And RT

I have the following data:

Example line 0</span>
<tag>Example line 1</tag>
<span>Example line 1.5</span>
--Hello Example line 1.7
<tag>
Example line 2
</tag>
--Hello Example line 2.7
<span>Example line 4</span>

Using this command awk -v RS='</tag>' 'RT {gsub(/.*?<tag>|\n/, ""); print "<tag>" $0 RT}' I get:

<tag>Example line 1</tag>
<tag>Example line 2</tag>

However, I want the output to be:

<tag>Example line 1</tag>
--Hello Example line 1.7
<tag>Example line 2</tag>
--Hello Example line 2.7

Question:

I would like to know how to add an "or" condition so that the command also matches any line that begins with --Hello . What would be the proper way to implement this in my code?

Other options:

Another option would be to use grep -o '<tag.*tag>\|^--.*' but I would also need a way to match newlines (as asked here: Match Anything In Between Strings For Linux Grep Command).

Any help is highly appreciated.

You can modify your earlier awk command to this:

awk -v RS='</tag>' '/\n--Hello /{print gensub(/.*\n(--Hello [^\n]*).*/, "\\1", "1")}
       RT{gsub(/.*<tag>|\n/, ""); print "<tag>" $0 RT}' file

<tag>Example line 1</tag>
--Hello Example line 1.7
<tag>Example line 2</tag>
--Hello Example line 2.7

$ cat tst.awk
BEGIN { RS="--Hello[^\\n]+|<\\/tag>" }
RT { print (RT~/^--/ ? "" : gensub(/.*(<tag>)/,"\\1",1)) RT }

$ awk -f tst.awk file
<tag>Example line 1</tag>
--Hello Example line 1.7
<tag>
Example line 2
</tag>
--Hello Example line 2.7

The above uses GNU awk for multi-char RS, RT, and gensub().
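
Since both answers depend on GNU awk, here is a minimal sketch of a line-by-line alternative that uses only POSIX awk features (no multi-char RS, RT, or gensub()). It is not taken from either answer: the function name extract_tags and the variables intag and buf are hypothetical, and the sketch assumes at most one <tag> block starts on any given line and that --Hello lines of interest sit outside a block.

```shell
#!/bin/sh
# Hypothetical helper (not from the answers above): print each
# <tag>...</tag> block joined onto one line, plus every line that
# starts with "--Hello ", using only POSIX awk features.
extract_tags() {
    awk '
        # Outside a block, print marker lines unchanged.
        !intag && /^--Hello / { print; next }

        # A line containing <tag> opens a block; drop any text before it.
        !intag && /<tag>/ { sub(/^.*<tag>/, "<tag>"); buf = ""; intag = 1 }

        # Inside a block, append lines without newlines until </tag> shows up.
        intag {
            buf = buf $0
            if (buf ~ /<\/tag>/) {
                sub(/<\/tag>.*$/, "</tag>", buf)   # drop any text after the close
                print buf
                intag = 0
            }
        }
    ' "$1"
}

# Sample data copied from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Example line 0</span>
<tag>Example line 1</tag>
<span>Example line 1.5</span>
--Hello Example line 1.7
<tag>
Example line 2
</tag>
--Hello Example line 2.7
<span>Example line 4</span>
EOF

extract_tags "$sample"
```

Unlike the second answer, this joins a multi-line block onto a single line, which matches the output requested in the question. The trade-off is that it tracks state by hand instead of letting RS split the records.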
