How can I find the number of 8-letter words that do not contain the letter "e", using the grep command?

I want to find the number of 8-letter words that do not contain the letter "e" in a number of text files (*.txt). In the process I ran into two issues: my lack of understanding of quantifiers, and how to exclude characters.

I'm quite new to the Unix terminal, but this is what I have tried:

cat *.txt | grep -Eo "\w+" | grep -i ".*[^e].*"

I need to include the cat command because otherwise the names of the text files end up in the pipe. The second pipe is there to put all the words in a list, and it works, but the last pipe, which was meant to find all the words that do not have the letter "e" in them, doesn't seem to work. (I thought: ".*" for zero or more of any character, followed by a character that is not an "e", followed by another ".*" for zero or more of any character.)

cat *.txt | grep -Eo "\w+" | grep -wi "[a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z]"

This command works to find the words that contain 8 characters, but it is quite inefficient, because I have to repeat "[a-z]" 8 times. I thought it could also be "[a-z]{8}", but that doesn't seem to work.

cat *.txt | grep -Eo "\w+" | grep -wi "[a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z]" | grep -i ".*[^e].*"

So finally, this would be my best guess; however, the third pipe is inefficient and the last pipe doesn't work.

You may use this grep:

grep -hEiwo '[a-df-z]{8}' *.txt

Here:

  • [a-df-z]{8} : Matches exactly 8 letters, none of which is e
  • -E : Use extended regular expressions, so that {8} is read as a repetition count (without -E it has to be written \{8\})
  • -h : Don't print filenames in the output
  • -i : Ignore case when matching
  • -o : Print only the matched text
  • -w : Match complete words only
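
The grep command prints the matching words, one per line, while the question asks for their number. A minimal sketch of the counting step, assuming GNU grep and a throwaway sample file (the name sample.txt and its contents are made up for illustration):

```shell
# Hypothetical sample data in a temporary directory (not from the original post).
tmpdir=$(mktemp -d)
printf '%s\n' 'The mountain airstrip was blocking traffic.' \
              'Elephant sandwich, Backdrop and notebook.' > "$tmpdir/sample.txt"

# The grep from the answer: each matching word is printed on its own line.
words=$(grep -hEiwo '[a-df-z]{8}' "$tmpdir"/*.txt)

# Count the matches by counting output lines (tr strips padding some wc versions add).
count=$(grep -hEiwo '[a-df-z]{8}' "$tmpdir"/*.txt | wc -l | tr -d ' ')

printf '%s\n' "$words"
echo "count: $count"

rm -rf "$tmpdir"
```

Counting lines with wc -l is used here instead of grep -c because -c counts matching lines, not individual matches, so a line with several 8-letter words would be counted only once.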

If you are OK with GNU awk, and assuming that you want to print only the exact words (of which there may be several in a line), you could try the following.

awk -v IGNORECASE="1" '{for(i=1;i<=NF;i++){if($i~/^[a-df-z]{8}$/){print $i}}}' *.txt

Or, without the use of IGNORECASE, one could try:

awk '{for(i=1;i<=NF;i++){if(tolower($i)~/^[a-df-z]{8}$/){print $i}}}' *.txt

NOTE: This assumes that you only want whitespace-separated fields that consist of exactly 8 letters. 8-letter words followed by a punctuation mark will be excluded.
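
If words followed by punctuation should be included as well, one option is to trim non-letters from both ends of each field before testing it. The following is only a sketch: it assumes ASCII letters, uses made-up sample input, and replaces the {8} interval with a length check so that it does not depend on GNU awk.

```shell
# Hypothetical sample input (not from the original post).
input='A sandwich, a notebook; Backdrop! (airstrip)'

matches=$(printf '%s\n' "$input" | awk '{
  for (i = 1; i <= NF; i++) {
    w = $i
    # Trim leading and trailing characters that are not ASCII letters.
    gsub(/^[^A-Za-z]+|[^A-Za-z]+$/, "", w)
    # Keep words of exactly 8 letters, none of them "e" or "E".
    if (length(w) == 8 && tolower(w) ~ /^[a-df-z]+$/) print w
  }
}')

printf '%s\n' "$matches"
```

Only the ends of each field are trimmed, so a field with punctuation in the middle (an apostrophe or a hyphen, for instance) still fails the letters-only test and is skipped.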

Here is a crazy thought with GNU awk:

awk 'BEGIN{FPAT="\\<\\w{8}\\>"}{c+=NF}END{print c}' file

Or, if you want to make it work only on a selected set of characters:

awk 'BEGIN{FPAT="\\<[a-df-z]{8}\\>"}{c+=NF}END{print c}' file

What this does is define the fields to be a sequence of 8 characters ( \\w as a word constituent, or [a-df-z] as a selected set) enclosed by word boundaries ( \\< and \\> ). This is done with FPAT (note the Gory details about escaping).

Sometimes you might also have words which contain diacritics, so you have to expand the character set. Then this might be the best solution:

awk 'BEGIN{FPAT="\\<\\w{8}\\>"}{for(i=1;i<=NF;++i) if($i !~ /e/) c++}END{print c}' file
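
The same two-step idea (first pick out the 8-character words, then discard those containing an "e") also works with grep alone, which is what the last pipe in the question was aiming for. A pattern such as [^e] only requires that some character is not an "e", whereas -v inverts the match and drops every line that contains one. A sketch, assuming GNU grep (for \w and -o) and made-up sample input:

```shell
# Hypothetical sample input (not from the original post).
input='Elephant sandwich, Backdrop and notebook near the mountain.'

# Step 1: print every 8-character word on its own line.
# Step 2: -v drops the words containing "e" (either case, due to -i); -c counts the rest.
count=$(printf '%s\n' "$input" | grep -Eow '\w{8}' | grep -vic 'e')

echo "$count"
```

Note that \w also matches digits and underscores, so this counts 8-character "words" in that wider sense, just like the FPAT version does.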
