Linux Ubuntu Bash - Find words containing more than two vowels using an AWK regular expression

Question

I want to print all the words containing more than 2 vowels from a file using awk.

This is my code so far:

#!/bin/bash
cat $1 | awk '{   #Default file separator is space 
for (i=1;i<=NF;i++)  #for every word          
  {
  if ($i ~ /([aeiojy]){2,}/)            
    {
      print $i
    }
}}'

The regular expression is the problem.

My current idea is /([aeiojy]){2,}/, but it doesn't work.

Answer 1

This should work with GNU grep:

grep -Poi '([^[:space:]]*?[aeiou]){3,}[^[:space:]]*' file

Options:

-P perl compatible regular expressions
-o print only the matched parts, each match on its own line
-i case insensitive match

The regex:

(                start of subpattern
  [^[:space:]]*  zero or more arbitrary non whitespace characters
  ?              ungreedy quantifier for the previous expression (perl specific)
  [aeiou]        vowel
)                end of subpattern
{3,}             the previous expression appears 3 or more times
[^[:space:]]*    zero or more other characters until word boundary.

Btw, Perl-compatible regular expressions are not actually required here. With plain grep you can use:

grep -oi '\([^[:space:]aeiou]*[aeiou]\)\{3,\}[^[:space:]]*' file

Note: I've left punctuation handling out of the above examples, but it can be added if required.
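
As a quick check of the plain-grep pattern above, the following sketch wraps it in a shell function and runs it on a sample line. The function name words_with_3_vowels and the sample text are made up for illustration, and it assumes a grep that supports -o (GNU and BSD grep do).

```shell
#!/bin/bash
# Hypothetical helper: print every whitespace-delimited word that
# contains at least three vowels (a, e, i, o, u; case-insensitive).
# Reads the files given as arguments, or stdin if there are none.
words_with_3_vowels() {
    grep -oi '\([^[:space:]aeiou]*[aeiou]\)\{3,\}[^[:space:]]*' "$@"
}

# Sample input invented for this example.
printf 'the quick education of a beautiful sky\n' | words_with_3_vowels
# prints:
# education
# beautiful
```

"quick" is not printed because it has only two vowels, and a match cannot run from one word into the next because the pattern never consumes whitespace.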

Answer 2

You can use the split function in awk:

awk -v RS=' ' 'split($0, a, /[aeiouAEIOU]/) > 2' file

-v RS=' ' makes awk treat each space-separated word as a separate record.
split returns a value greater than 2 if there are at least 2 vowels in the word, since n vowels split the word into n + 1 pieces. To require more than two vowels, as the question asks, compare with > 3 instead.
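
Another way to stay in awk, as the question asked, is to loop over the fields of each line and count the vowels with gsub, which returns the number of substitutions it made. This is only a sketch: the function name and sample text are illustrative assumptions, and the vowel set is taken to be a, e, i, o, u. It avoids interval expressions such as {2,}, which some awk implementations have historically not supported, and because it relies on default field splitting it also handles words separated by tabs or newlines.

```shell
#!/bin/bash
# Hypothetical helper: print each word that has more than two vowels.
# Reads the files given as arguments, or stdin if there are none.
words_with_more_than_2_vowels() {
    awk '{
        for (i = 1; i <= NF; i++) {
            w = $i    # work on a copy so the field itself is left untouched
            # gsub returns how many vowels it removed from w
            if (gsub(/[aeiouAEIOU]/, "", w) > 2)
                print $i
        }
    }' "$@"
}

# Sample input invented for this example.
printf 'the quick education\nof a beautiful sky\n' | words_with_more_than_2_vowels
# prints:
# education
# beautiful
```

Unlike the regex in the question, this counts vowels anywhere in the word, so they do not have to be adjacent.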

Answer 1
2 2016-04-27 22:19:31

Answer 2
0 2016-04-27 22:00:02
