
How to find the 10 most frequent words in a file in Unix/Linux

How do I find the 10 most frequent words in a file in Unix/Linux?

I tried using this command in Unix:

$ sort file.txt | uniq -c | sort -nr | head -10

However, I am not sure whether it is correct, or whether it actually shows the 10 most frequent words in a large file.

Here is a shell script that handles your problem, even when a file has more than one word per line. (Your pipeline compares whole lines, so it only counts words correctly if each line contains exactly one word.)

wordcount.sh

#!/bin/bash

# filename: wordcount.sh
# usage: ./wordcount.sh filename

# handle positional arguments
if [ $# -ne 1 ]
then
    echo "Usage: $0 filename"
    exit 1    # exit codes must be 0-255; -1 wraps around to 255
fi

# print a header, then count the words
printf "%-14s%s\n" "Word" "Count"

tr '[:upper:]' '[:lower:]' < "$1" | \
grep -Eo "\b[[:alpha:]]+\b" | \
awk '{ count[$0]++ }
END {
    for (word in count)
        printf("%-14s%d\n", word, count[word])
}' | sort -k2,2 -n -r | head -n 10

Just run ./wordcount.sh filename.txt (make the script executable first with chmod +x wordcount.sh).
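
For example, with a hypothetical sample.txt containing the two lines below, the script prints something like this (GNU sort breaks count ties by comparing whole lines, so the order of the one-count words may differ on other systems):

$ cat sample.txt
the quick brown fox jumps over the lazy dog
the dog barks

$ ./wordcount.sh sample.txt
Word          Count
the           3
dog           2
barks         1
brown         1
fox           1
jumps         1
lazy          1
over          1
quick         1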

Explanation
The tr command converts all uppercase letters to lowercase. grep -E then extracts every word from the text and outputs them one per line. Finally, awk uses an associative array to count how often each word occurs, and sort orders the results by count in descending order before head keeps the top 10.
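
If you don't need the formatted table, a shorter variant (a sketch of the usual tr/sort/uniq approach, not part of the script above) fixes your original pipeline by first breaking the input into one word per line; the grep -v '^$' drops the empty line that tr can emit when the file starts with a non-letter:

tr -cs '[:alpha:]' '\n' < file.txt \
  | tr '[:upper:]' '[:lower:]' \
  | grep -v '^$' \
  | sort | uniq -c | sort -nr | head -10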
