Word Frequency from a CSV Column in Python

I have a .csv file with a column of messages I have collected, and I want to get a word frequency list for every word in that column. Here is what I have so far. I am not sure where I have made a mistake, so any help would be appreciated. Edit: The expected output is the entire list of words and their counts (without duplicates), written out to another .csv file.

import csv
from collections import Counter
from collections import defaultdict

output_file = 'comments_word_freqency.csv'
input_stream = open('comments.csv')
reader = csv.reader(input_stream, delimiter=',')
reader.next() #skip header
csvrow = [row[3] for row in reader] #Get the fourth column only

with open(output_file, 'rb') as csvfile:
    for row in reader:
        freq_dict = defaultdict(int) # the "int" part
                                    # means that the VALUES of the dictionary are integers.
        for line in csvrow:
            words = line.split(" ")
            for word in words:
                word = word.lower() # ignores case type
                freq_dict[word] += 1

        writer = csv.writer(open(output_file, "wb+")) # this is what lets you write the csv file.
        for key, value in freq_dict.items():
                        # this iterates through your dictionary and writes each pair as its own line.
            writer.writerow([key, value])

I recently ran the code proposed by SAMO and ran into some issues with Python 3.6. I am therefore posting working code (a few lines changed from SAMO's code), which may help others and save them time.

import csv

words = []
with open('data.csv', 'rt', newline='') as csvfile:
    reader = csv.reader(csvfile)
    next(reader)  # skip the header row
    for col in reader:
        csv_words = col[0].split(" ")  # words in the first column
        for i in csv_words:
            words.append(i)

words_counted = []  # (word, count) pairs; must exist before appending to it
for i in words:
    x = words.count(i)
    words_counted.append((i, x))

with open('frequency_result.csv', 'a+', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    # set() drops the repeated pairs; writerows writes one pair per row
    writer.writerows(set(words_counted))
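
Calling words.count() inside a loop rescans the whole list for every word, which gets slow on large files. As an alternative, here is a minimal sketch that uses collections.Counter from the standard library instead, assuming Python 3. The helper names count_words and write_counts and the in-memory sample data are hypothetical and only there for illustration; with a real file you would pass open('data.csv', newline='') instead of the StringIO object.

```python
import csv
import io
from collections import Counter


def count_words(csv_file, column=0):
    """Count lowercase, whitespace-separated words in one CSV column."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row, if there is one
    counts = Counter()
    for row in reader:
        if len(row) > column:  # ignore rows that are too short
            counts.update(row[column].lower().split())
    return counts


def write_counts(counts, out_file):
    """Write one "word,count" row per distinct word, most frequent first."""
    writer = csv.writer(out_file)
    writer.writerows(counts.most_common())


# In-memory stand-in for a real input file (hypothetical sample data).
sample = io.StringIO("message\nthis is a test\nThis is output\n")
counts = count_words(sample)

# In-memory stand-in for the output file.
out = io.StringIO()
write_counts(counts, out)
```

Because a Counter keeps a single entry per word, no separate step is needed to remove duplicates before writing.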

The code you uploaded is all over the place, but I think this is what you're getting at. It builds a list of each word paired with the number of times it appears in the original file.

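import csv

words = []
# Python 2: the csv module expects the input file opened in binary mode
with open('comments.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile)
    reader.next()  # skip the header row
    for row in reader:
        csv_words = row[3].split(" ")  # fourth column only
        for i in csv_words:
            words.append(i)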

words_counted = []
for i in words:
    x = words.count(i)
    words_counted.append((i,x))

# write this to csv file (Python 2: binary mode)
with open('output.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(words_counted)

Then, to get rid of the duplicates in the list, just call set() on it (do this before writing if the output file itself should be free of duplicates):

set(words_counted)
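
To illustrate why set() is enough here, consider the following small sketch. The word list is made-up sample data, and the sorting step is an optional addition that is not part of the answer above. Each occurrence of a word produces the same (word, count) tuple, so the set collapses them into one entry.

```python
words = ["this", "is", "this", "output"]  # hypothetical sample data

# Same approach as above: one (word, count) pair per occurrence,
# so "this" contributes the pair ("this", 2) twice.
words_counted = [(w, words.count(w)) for w in words]

# set() collapses the identical pairs. A set has no defined order,
# so sort by descending count, then alphabetically, for stable output.
unique_counts = sorted(set(words_counted), key=lambda pair: (-pair[1], pair[0]))

print(unique_counts)  # [('this', 2), ('is', 1), ('output', 1)]
```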

Your output will look like this:

'this', 2
'is', 1
'your', 3
'output', 5
