Linux Ubuntu Bash - Find words containing more than 2 vowels using AWK regular expressions
I want to print all the words containing more than 2 vowels from a file using awk.
This is my code so far:
#!/bin/bash
cat $1 | awk '{ #Default file separator is space
    for (i=1;i<=NF;i++) #for every word
    {
        if ($i ~ /([aeiojy]){2,}/)
        {
            print $i
        }
    }
}'
The regular expression is the problem.
/([aeiojy]){2,}/ is my current idea, but it doesn't work.
This should work with GNU grep:
grep -Poi '([^[:space:]]*?[aeiou]){3,}[^[:space:]]*' file
Options:
-P perl compatible regular expressions
-o output every match on a single line
-i case insensitive match
The regex:
( start of subpattern
[^[:space:]]* zero or more arbitrary non whitespace characters
? ungreedy quantifier for the previous expression (perl specific)
[aeiou] vowel
) end of subpattern
{3,} the previous expression appears 3 or more times
[^[:space:]]* zero or more other characters until word boundary.
Btw, Perl-compatible regular expressions are actually not required here. With plain grep you can use:
grep -oi '\([^[:space:]aeiou]*[aeiou]\)\{3,\}[^[:space:]]*' file
Note: I've left punctuation handling out of the examples above (punctuation attached to a word is matched as part of it), but it can be added if required.
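As a quick sanity check, here is a minimal sketch that runs the plain grep pattern against a made-up sample line. The `sample` text and the `result` variable are assumptions for illustration only; any file or stream can be used instead.

```shell
# Hypothetical sample text, not taken from the question.
sample='The quick education system is beautiful but odd'

# BRE: three or more groups of "non-vowel characters followed by one vowel",
# then whatever remains of the word. -o prints each match on its own line.
result=$(printf '%s\n' "$sample" |
  grep -oi '\([^[:space:]aeiou]*[aeiou]\)\{3,\}[^[:space:]]*')

printf '%s\n' "$result"
# prints:
# education
# beautiful
```

Words such as "quick" (two vowels) are skipped because the group has to repeat at least three times within a single run of non-whitespace characters.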
You can use the split function in awk:
awk -v RS=' ' 'split($0, a, /[aeiouAEIOU]/) > 3' file
-v RS=' ' makes awk treat each space-separated word as a separate record. Note that words separated only by a newline end up in the same record.
split returns the number of pieces the word is cut into, which is one more than the number of vowels. A return value greater than 3 therefore means the word contains at least 3 vowels, i.e. more than 2.
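If you would rather keep the per-field loop from the question, one possible sketch is to count the vowels with gsub, which returns the number of substitutions it made. The function name `words_with_vowels` is a hypothetical helper introduced here, not something from the original posts, and the sketch assumes that only a, e, i, o and u (in either case) count as vowels.

```shell
# Hypothetical helper: reads text on stdin and prints every word
# that contains more than 2 vowels, one word per line.
words_with_vowels() {
  awk '{
    for (i = 1; i <= NF; i++) {   # each line is a record, split into fields on whitespace
      word = $i                   # work on a copy so the field itself stays intact
      # gsub returns how many vowels it replaced in the copy
      if (gsub(/[aeiouAEIOU]/, "", word) > 2)
        print $i
    }
  }'
}

printf 'The quick education\nsystem is beautiful\n' | words_with_vowels
# prints:
# education
# beautiful
```

Because this relies on the default record and field splitting, words at line ends are handled the same way as words separated by spaces.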