How do I find and print all AWK matches in bash?

Question

I have a lot of text stored in a variable.

text="This is sentence! this is not sentence! This is sentence. this is not sencence."

I am looking for sentences with this command:

echo $text | awk 'match($0,/([A-Z])([^!?.]*)([!?.])/) { print substr($0,RSTART,RLENGTH) }'

My output is:

This is sentence!

Expected output:

This is sentence!
This is sentence.

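Another example: the text contains both correct and incorrect sentences. A correct sentence starts with an uppercase letter and ends with a terminating character (. ? !). I only want to print the correct sentences.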

text="incorrect sentence! this is not sentence! This is sentence. this is not sencence. This is correct sentence."

Expected output:

This is sentence.
This is correct sentence.

I am able to find the first match, but not all of them. Thanks for your help :)

Answer 1

With GNU awk you can use a multi-character (regex) RS; RT then holds the text that matched RS:

$ echo "$text" | awk -v RS='[A-Z][^!?.]*[!?.]' 'RT{print RT}'
This is sentence!
This is sentence.

or GNU awk with FPAT:

$ echo "$text" | awk -v FPAT='[A-Z][^!?.]*[!?.]' '{for (i=1; i<=NF; i++) print $i}'
This is sentence!
This is sentence.

or GNU grep with -o:

$ echo "$text" | grep -o '[A-Z][^!?.]*[!?.]'
This is sentence!
This is sentence.

If a sentence can contain newlines, only the first of the above will work.
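If GNU awk is not available, the newline case can probably still be handled by reading the whole input into one string before matching. The following is only a minimal sketch under that assumption: the function name `all_sentences` is hypothetical, the input is assumed to be small enough to hold in memory, and `LC_ALL=C` is set so that `[A-Z]` means ASCII uppercase letters only.

```shell
# Hypothetical helper (not from the answer): print every sentence read from
# stdin, even when a sentence spans several lines. Uses only POSIX awk.
all_sentences() {
    LC_ALL=C awk '
    { buf = buf $0 "\n" }                 # collect the whole input, keeping newlines
    END {
        while (match(buf, /[A-Z][^!?.]*[!?.]/)) {
            s = substr(buf, RSTART, RLENGTH)
            gsub(/\n/, " ", s)            # join a wrapped sentence onto one line
            print s
            buf = substr(buf, RSTART + RLENGTH)   # continue after this match
        }
    }'
}

# Usage: two sentences that each wrap across a line break.
printf '%s\n' 'This is a' 'wrapped sentence! this is not. This one' 'also wraps?' | all_sentences
```

Replacing the embedded newline with a space is a presentation choice made here; drop the `gsub` call to keep the original line breaks.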

Answer 2

You need a while() loop together with match():

$ echo "$text" | awk '
{
    while(match($0,/([A-Z])([^!?.]*)([!?.])/)) {   # while there are matches
        print substr($0,RSTART,RLENGTH)            # output them
        $0=substr($0,RSTART+RLENGTH)               # and move forward
    }
}'

Output:

This is sentence!
This is sentence.
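To see why a single match() call reports only the first sentence, and how the loop moves forward, it may help to print RSTART and RLENGTH on each pass. This is a sketch rather than part of the answer: the function name `trace_matches` and the variables `s` and `off` are made up for illustration, and the line is copied into `s` so that `$0` is left untouched.

```shell
# Hypothetical helper: same loop as above, but it also reports where each
# match starts in the original line (1-based) and how long it is.
trace_matches() {
    printf '%s\n' "$1" | LC_ALL=C awk '
    {
        s = $0; off = 0
        while (match(s, /[A-Z][^!?.]*[!?.]/)) {
            print off + RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
            off += RSTART + RLENGTH - 1       # characters consumed so far
            s = substr(s, RSTART + RLENGTH)   # search only the unread tail
        }
    }'
}

text="This is sentence! this is not sentence! This is sentence. this is not sencence."
trace_matches "$text"
```

match() always searches from the start of the string it is given and sets RSTART and RLENGTH for the leftmost match only, so each pass has to cut off everything up to the end of the previous match before searching again.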

2 solutions

Solution 1
5 2020-10-09 13:10:44

Solution 2
3 accepted 2020-10-09 10:02:00