Word Frequency from a CSV Column in Python
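I have a .csv file with a column of messages I have collected, and I want a word-frequency list covering every word in that column. Here is what I have so far. I'm not sure where I went wrong, and any help would be appreciated. Edit: the expected output is the whole list of words and their counts (no duplicates), written to another .csv file.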
import csv
from collections import Counter
from collections import defaultdict

output_file = 'comments_word_freqency.csv'
input_stream = open('comments.csv')
reader = csv.reader(input_stream, delimiter=',')
reader.next() #skip header
csvrow = [row[3] for row in reader] #Get the fourth column only

with open(output_file, 'rb') as csvfile:
    for row in reader:
        freq_dict = defaultdict(int) # the "int" part
        # means that the VALUES of the dictionary are integers.
        for line in csvrow:
            words = line.split(" ")
            for word in words:
                word = word.lower() # ignores case type
                freq_dict[word] += 1

        writer = csv.writer(open(output_file, "wb+")) # this is what lets you write the csv file.
        for key, value in freq_dict.items():
            # this iterates through your dictionary and writes each pair as its own line.
            writer.writerow([key, value])
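I recently ran the code proposed by SAMO and hit some problems under Python 3.6. So I am posting working code (a few lines changed from SAMO's version), which may help others and save them time.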
import csv

words = []
# newline='' is what the csv module expects for files in Python 3
with open('data.csv', 'rt', newline='') as csvfile:
    reader = csv.reader(csvfile)
    next(reader)  # skip the header row
    for col in reader:
        csv_words = col[0].split(" ")  # words from the first column
        for i in csv_words:
            words.append(i)

words_counted = []  # must exist before it is appended to
with open('frequency_result.csv', 'a+', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    for i in words:
        x = words.count(i)
        words_counted.append((i, x))
    # set() drops the repeated pairs; writerows writes one pair per line
    writer.writerows(sorted(set(words_counted)))
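In Python 3 the counting can also be done in one pass with `collections.Counter`, which holds each distinct word once. The following is only a minimal sketch under stated assumptions, not the code from the answers: the function name `word_frequencies`, the in-memory sample data, and the choice to lowercase and split on any whitespace are all made up for illustration.

```python
import csv
import io
from collections import Counter


def word_frequencies(csv_file, column=0):
    """Count words in one column of an open CSV file (header row skipped)."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header; the default avoids an error on an empty file
    counts = Counter()
    for row in reader:
        if len(row) > column:  # ignore rows that are too short
            # split() with no argument splits on any run of whitespace
            counts.update(row[column].lower().split())
    return counts


# Hypothetical sample data standing in for the real file
sample = io.StringIO("message\nThis is your output\nyour output is here\n")
counts = word_frequencies(sample)

# Write one "word,count" row per distinct word, most frequent first
out = io.StringIO()
csv.writer(out).writerows(counts.most_common())
print(out.getvalue())
```

With a real file, open it with `newline=''` and pass the column index you need, for example `column=3` for the fourth column used in the question.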
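The code you posted is all over the place, but I think this is what you were going for. It returns a list of the words and the number of times each one appears in the original file.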
import csv

words = []
# Python 2 code; reads the input file named in the question
with open('comments.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile)
    reader.next()  # skip the header row
    for row in reader:
        csv_words = row[3].split(" ")  # fourth column only
        for i in csv_words:
            words.append(i)

words_counted = []
for i in words:
    x = words.count(i)
    words_counted.append((i, x))

#write this to csv file
with open('output.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(words_counted)
Then, to get rid of the duplicates in the list, just call set() on it (do this before writing the rows):
set(words_counted)
Your output will look like this:
'this', 2
'is', 1
'your', 3
'output', 5
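The reason set() removes the duplicates is that `words.count(i)` returns the same number every time a given word comes up, so the repeated entries are identical tuples and a set keeps only one of each. Here is a small sketch with a made-up word list; the sample data and the helper name `unique_counts` are illustrative assumptions, not part of the answer above.

```python
def unique_counts(words):
    """Return sorted, de-duplicated (word, count) pairs for a list of words."""
    # One pair per occurrence, so repeated words produce identical tuples
    words_counted = [(w, words.count(w)) for w in words]
    # Identical tuples collapse in a set; sorting gives a predictable order
    return sorted(set(words_counted))


words = ["your", "output", "is", "your", "output", "output"]
pairs = unique_counts(words)
print(pairs)  # [('is', 1), ('output', 3), ('your', 2)]
```

Note that `words.count` rescans the whole list for every word, so this approach is likely to get slow on large files.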