How to use NLTK to find the frequency distribution of specific words in a csv file
I am just starting out with Python and NLTK, and I am trying to read records from a csv file and determine the frequency of specific words across all records. I can do something like this:
import csv
import nltk

with f:
    reader = csv.reader(f)
    # Skip the header
    next(reader)
    for row in reader:
        note = row[4]
        tokens = note.split()
        # Calculate the row's frequency distribution
        freq = nltk.FreqDist(tokens)
        for key, val in freq.items():
            print(str(key) + ':' + str(val))
    # Plot the results
    freq.plot(20, cumulative=False)
I am not sure how to modify this so that the frequency is computed across all records and only the words I am interested in are included. Apologies if this is a really simple question.
You can define the counter outside the loop with freq_all = nltk.FreqDist(), then update it on each row with freq_all.update(tokens):
with f:
    reader = csv.reader(f)
    # Skip the header
    next(reader)
    freq_all = nltk.FreqDist()
    for row in reader:
        note = row[4]
        tokens = note.split()
        # Calculate the per-row frequency distribution
        freq = nltk.FreqDist(tokens)
        freq_all.update(tokens)
        for key, val in freq.items():
            print(str(key) + ':' + str(val))
        # Plot the row's results
        freq.plot(20, cumulative=False)
    # Plot the overall results
    freq_all.plot(20, cumulative=False)
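To also restrict the counts to "only the words that I am interested in", you can filter the tokens before updating the counter. A minimal sketch of that idea, where words_of_interest and the inline sample rows are placeholders you would replace with your own word list and csv reader:

```python
import nltk

# Hypothetical set of target words; replace with your own list.
words_of_interest = {'pain', 'fever', 'cough'}

# Stand-in for rows read from the csv file (header plus two records,
# with the note text in column index 4, as in the question).
rows = [
    ['id', 'a', 'b', 'c', 'note'],
    ['1', '', '', '', 'mild fever and cough'],
    ['2', '', '', '', 'cough persists no fever'],
]

freq_all = nltk.FreqDist()
for row in rows[1:]:          # skip the header
    tokens = row[4].lower().split()
    # Keep only the words of interest before updating the counter.
    freq_all.update(t for t in tokens if t in words_of_interest)

for word in sorted(words_of_interest):
    print(word + ':' + str(freq_all[word]))
```

Because FreqDist is a subclass of collections.Counter, looking up a word that never occurred simply returns 0, so every word of interest can be printed even if it was absent from all records.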