
Find the most frequent word pair from a list of messages in Python

I have a list of 100 messages, and I can already find the most frequent words in it. Now I want to find the pairs of words that occur together most frequently. For example, 'key' and 'board' both show up among the most frequent words, but I need the number of occurrences where 'key board' is used as a pair, using NLTK. Here abstracts is the list of sentences and abstract_words is the list of words.

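import seaborn as sns
from nltk import FreqDist

# preprocessing() is my own cleaning function (not shown here)
abstracts = [preprocessing(document) for document in abstracts]

# Flatten all messages into a single list of words
abstract_words = " ".join(abstracts)
abstract_words = abstract_words.split()

def plot_word_frequency(words, top_n=10):
    word_freq = FreqDist(words)
    most_common = word_freq.most_common(top_n)
    labels = [element[0] for element in most_common]
    counts = [element[1] for element in most_common]
    # Keyword arguments: recent seaborn versions reject positional x and y
    plot = sns.barplot(x=labels, y=counts)
    return plot

plot_word_frequency(abstract_words, 10)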

Here I am able to plot the top 10 individual words, but I need to plot the most frequent combinations of words.

Use n-grams (see the related question "n-grams in python, four, five, six grams?"), e.g.:

>>> from collections import Counter
>>> from nltk import ngrams
>>> tokens = "this is a sentence with some of this words this is meh ".split()
>>> Counter(list(ngrams(tokens, 2)))
Counter({('this', 'is'): 2, ('is', 'a'): 1, ('a', 'sentence'): 1, ('sentence', 'with'): 1, ('with', 'some'): 1, ('some', 'of'): 1, ('of', 'this'): 1, ('this', 'words'): 1, ('words', 'this'): 1, ('is', 'meh'): 1})
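Applied to the list of messages from the question, the same idea could look roughly like the sketch below. It counts pairs one message at a time so that the last word of one message is never paired with the first word of the next. This is only an illustration under some assumptions: top_word_pairs and sample_messages are made-up names, the sample data stands in for the preprocessed abstracts list, and zip(tokens, tokens[1:]) is used in place of nltk.ngrams(tokens, 2), which yields the same pairs.

```python
from collections import Counter

def top_word_pairs(messages, top_n=10):
    """Return the top_n most frequent adjacent word pairs across messages."""
    pair_counts = Counter()
    for message in messages:
        tokens = message.split()
        # Same pairs as nltk.ngrams(tokens, 2), counted within one message only
        pair_counts.update(zip(tokens, tokens[1:]))
    return pair_counts.most_common(top_n)

# Hypothetical sample data standing in for the preprocessed `abstracts` list
sample_messages = [
    "the key board is broken",
    "new key board arrived",
    "key board and mouse",
]

top_pairs = top_word_pairs(sample_messages, top_n=3)

# Turn each pair into a single label so it can be used on a bar plot axis
labels = [" ".join(pair) for pair, _ in top_pairs]
counts = [count for _, count in top_pairs]
print(top_pairs[0])
```

The resulting labels and counts lists can then be passed to sns.barplot(x=labels, y=counts) in the same way as in plot_word_frequency.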


 