

Count frequency of list of words in corpus using NLTK

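I have downloaded a corpus and tokenized the words. I have a list of the main characters, and I want to know how many times each name appears in the corpus. I've tried using the frequency function and a dictionary, but I can't work out how to get the name counts.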

character_list = ['Myriel', 'Bishop', 'Baptistine', 'Magloire', 'Cravatte', 'Valjean', 'Gervais',
                  'Fantine', 'Tholomyès', 'Blachevelle', 'Dahlia', 'Fameuil', 'Favourite',
                  'Listolier', 'Zéphine', 'Cosette', 'Thénardier', 'Éponine', 'Azelma', 'Javert',
                  'Fauchelevent', 'Bamatabois', 'Champmathieu', 'Brevet', 'Simplice', 'Chenildieu',
                  'Cochepaille', 'Innocente', 'Reverend', 'Ascension', 'Crucifixion', 'Gavroche',
                  'Magnon', 'Gillenormand', 'Marius', 'Colonel', 'Mabeuf', 'Enjolras', 'Combeferre',
                  'Prouvaire', 'Feuilly', 'Courfeyrac', 'Bahorel', 'Lesgle', 'Joly', 'Grantaire',
                  'Patron-Minette', 'Brujon', 'Toussaint']


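from nltk import FreqDist

# word_tokens is the tokenized corpus from the earlier step
fdist_mis = FreqDist(word_tokens)

# Attempt to build a {character name: count} dictionary
filtered_word_freqt = dict((character_list, freq) for character_list, freq in fdist_mis.items())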

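When I inspect filtered_word_freqt, it just returns all the word tokens rather than a dictionary of the unique characters and their occurrences. Any help? Many thanks.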

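How do you want to see the frequencies? You can get the number of times each word was seen, its frequency as a ratio of the total text, or even a nicely formatted table. The relevant FreqDist methods are copied here: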

N()
Return the total number of sample outcomes that have been recorded by this FreqDist. For the number of unique sample values (or bins) with counts greater than zero, use FreqDist.B().
Return type: int
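As a quick illustration on a small made-up token list (a stand-in for the question's `word_tokens`, not real corpus data), `N()` counts every recorded token while `B()` counts the distinct ones:

```python
from nltk import FreqDist

# Made-up tokens standing in for the question's word_tokens.
tokens = ["Valjean", "met", "Javert", ".", "Valjean", "saved", "Cosette", "."]
fdist = FreqDist(tokens)

total_tokens = fdist.N()     # every recorded token, duplicates included
distinct_tokens = fdist.B()  # distinct tokens with a nonzero count
print(total_tokens, distinct_tokens)  # 8 6
```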

freq(sample)
Return the frequency of a given sample. The frequency of a sample is defined as the count of that sample divided by the total number of sample outcomes that have been recorded by this FreqDist. The count of a sample is defined as the number of times that sample outcome was recorded by this FreqDist. Frequencies are always real numbers in the range [0, 1].
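Because `freq()` divides a sample's count by `N()`, it gives each character's share of all tokens. A sketch on the same kind of made-up token list (the names looked up are just examples); a name that never occurs gets 0.0:

```python
from nltk import FreqDist

# Made-up tokens standing in for the question's word_tokens (8 tokens in total).
tokens = ["Valjean", "met", "Javert", ".", "Valjean", "saved", "Cosette", "."]
fdist = FreqDist(tokens)

# fdist.freq(name) == fdist[name] / fdist.N()
character_freqs = {name: fdist.freq(name) for name in ["Valjean", "Javert", "Fantine"]}
print(character_freqs)  # {'Valjean': 0.25, 'Javert': 0.125, 'Fantine': 0.0}
```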

tabulate(*args, **kwargs)
Tabulate the given samples from the frequency distribution (cumulative), displaying the most frequent sample first. If an integer parameter is supplied, stop after this many samples have been plotted.
Parameters: samples (list) – The samples to plot (default is all samples)
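As for the code in the question: `dict((character_list, freq) for character_list, freq in fdist_mis.items())` iterates over every `(word, count)` pair in the distribution and rebinds `character_list` to each word, so nothing is filtered. A minimal sketch of a fix, assuming the names appear in `word_tokens` with the same capitalization as in `character_list`, looks each name up in the `FreqDist` instead. Since `FreqDist` subclasses `collections.Counter`, a name that never occurs yields 0 rather than a `KeyError`. The token list and the shortened name list below are made up for illustration.

```python
from nltk import FreqDist

# Made-up stand-ins for the question's word_tokens and (shortened) character_list.
word_tokens = ["Valjean", "met", "Javert", ".", "Valjean", "saved", "Cosette", "."]
character_list = ["Valjean", "Javert", "Cosette", "Fantine"]

fdist_mis = FreqDist(word_tokens)

# Look up each character name directly; missing names get a count of 0.
filtered_word_freqt = {name: fdist_mis[name] for name in character_list}
print(filtered_word_freqt)  # {'Valjean': 2, 'Javert': 1, 'Cosette': 1, 'Fantine': 0}

# Print the same counts as a table, restricted to the character names.
fdist_mis.tabulate(samples=character_list)
```

Passing `samples` to `tabulate()` limits the table to the listed names instead of every token in the corpus.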


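Statement: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.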

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM