How to convert dictionary values into a csv file?
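I am an absolute beginner in Python. I am doing a textual analysis of a Greek play and counting the frequency of each word. Because the play is very long, I cannot see the full data set: there is not enough space in the Python window, so only the words with the lowest frequencies are shown. I am thinking of converting the output to a .csv file. My full code is below: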
#read the file as one string and split the string into a list of separate words
input = open('Aeschylus.txt', 'r')
text = input.read()
wordlist = text.split()
#read file containing stopwords and split the string into a list of separate words
stopwords = open("stopwords .txt", 'r').read().split()
#remove stopwords
wordsFiltered = []
for w in wordlist:
    if w not in stopwords:
        wordsFiltered.append(w)
#create dictionary by counting no of occurences of each word in list
wordfreq = [wordsFiltered.count(x) for x in wordsFiltered]
#create word-frequency pairs and create a dictionary
dictionary = dict(zip(wordsFiltered,wordfreq))
#sort by decreasing frequency and print
aux = [(dictionary[word], word) for word in dictionary]
aux.sort()
aux.reverse()
for y in aux: print y
import csv
with open('Aeschylus.csv', 'w') as csvfile:
    fieldnames = ['dictionary[word]', 'word']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({'dictionary[word]': '1', 'word': 'inherited'})
    writer.writerow({'dictionary[word]': '1', 'word': 'inheritance'})
    writer.writerow({'dictionary[word]': '1', 'word': 'inherit'})
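I found the csv code on the Internet. What I am hoping to get is the full list of data, from the highest frequency to the lowest. With the code I have right now, python seems to ignore the csv part completely and just prints the data, as if I had not written any code for the csv.
What code should I write to get the result I intend?
Thank you.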
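Since you have a dictionary where the words are the keys and their frequencies are the values, a DictWriter is ill suited. It is useful for sequences of mappings that share some common set of keys, which are used as the columns of the csv. For example, if you had a list of dicts such as the ones you create manually: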
a_list = [{'dictionary[word]': '1', 'word': 'inherited'},
          {'dictionary[word]': '1', 'word': 'inheritance'},
          {'dictionary[word]': '1', 'word': 'inherit'}]
then a DictWriter would be the tool for the job. But instead you have a single dictionary like:
dictionary = {'inherited': 1,
              'inheritance': 1,
              'inherit': 1,
              ...: ...}
However, you have already built a sorted list of (freq, word) pairs as aux, which is perfect for writing to csv:
with open('Aeschylus.csv', 'wb') as csvfile:
    header = ['frequency', 'word']
    writer = csv.writer(csvfile)
    writer.writerow(header)
    # Note the plural method name
    writer.writerows(aux)
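The snippet above targets Python 2. On Python 3 the csv module expects a text-mode file opened with newline='' rather than 'wb'. A minimal sketch of the same idea, assuming a small made-up aux list in place of the real data; the helper name write_pairs and the temporary output path are hypothetical:

```python
import csv
import os
import tempfile


def write_pairs(path, pairs, header=("frequency", "word")):
    """Write a header row followed by one CSV row per (freq, word) pair."""
    # Python 3: text mode with newline='' so the csv module controls line endings.
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(header)
        writer.writerows(pairs)  # plural: writes every pair in one call


# Made-up sample standing in for the sorted (freq, word) list from the question.
aux = [(3, "zeus"), (2, "hera"), (1, "inherit")]
out_path = os.path.join(tempfile.mkdtemp(), "Aeschylus.csv")
write_pairs(out_path, aux)

# Read the file back to confirm what was written.
with open(out_path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
```

Note that values read back through csv.reader are strings, so the frequencies come back as '3', '2', '1' and need int() if you want to do arithmetic with them.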
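"python seems to ignore the csv part completely and just prints the data, as if I had not written any code for the csv."
That sounds odd. At the very least you should have gotten a file Aeschylus.csv containing: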
dictionary[word],word
1,inherited
1,inheritance
1,inherit
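If the file does not seem to appear, one possible explanation (an assumption, since the question does not say where the script was run) is that it was written to a different working directory than the one you are looking in. A small Python 3 sketch for checking where a relative filename ends up; describe_output is a hypothetical helper:

```python
import os


def describe_output(filename):
    """Return the absolute path a relative filename resolves to, and whether it exists."""
    full_path = os.path.abspath(filename)
    return full_path, os.path.exists(full_path)


full_path, exists = describe_output("Aeschylus.csv")
print("Looking for:", full_path)
print("Found:", exists)
```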
Your frequency counting method could also be improved. At the moment
#create dictionary by counting no of occurences of each word in list
wordfreq = [wordsFiltered.count(x) for x in wordsFiltered]
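has to loop through the whole list wordsFiltered for each word in wordsFiltered, so it is O(n²). Instead, you could iterate over the words in the file, filtering and counting as you go. Python has a specialized dictionary for counting hashable objects, called Counter: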
from __future__ import print_function
from collections import Counter
import csv

# Many ways to go about this, could for example yield from (<gen expr>)
def words(filelike):
    for line in filelike:
        for word in line.split():
            yield word

def remove(iterable, stopwords):
    stopwords = set(stopwords)  # O(1) lookups instead of O(n)
    for word in iterable:
        if word not in stopwords:
            yield word

if __name__ == '__main__':
    with open("stopwords.txt") as f:
        stopwords = f.read().split()

    with open('Aeschylus.txt') as wordfile:
        wordfreq = Counter(remove(words(wordfile), stopwords))
Then, as before, print the words and their frequencies, beginning with the most common:
for word, freq in wordfreq.most_common():
    print(word, freq)
And/or write it out as csv:
# Since you're using python 2, 'wb' and no newline=''
with open('Aeschylus.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['word', 'freq'])
    # If you want to keep most common order in CSV as well. Otherwise
    # wordfreq.items() would do as well.
    writer.writerows(wordfreq.most_common())
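For readers on Python 3, the whole pipeline could look roughly like the sketch below. It is an illustration under stated assumptions, not a drop-in replacement: the sample text and stopword list are made up, and in-memory io.StringIO objects stand in for Aeschylus.txt and the output file so that the example runs on its own.

```python
import csv
import io
from collections import Counter


def words(filelike):
    """Yield every whitespace-separated word, reading one line at a time."""
    for line in filelike:
        yield from line.split()


def remove(iterable, stopwords):
    """Yield only the words that are not in the stopword collection."""
    stopwords = set(stopwords)  # O(1) membership tests
    return (word for word in iterable if word not in stopwords)


# Made-up sample data standing in for the real input files.
sample_text = io.StringIO("the sea the sea and the gods\nthe gods speak\n")
sample_stopwords = ["the", "and"]

wordfreq = Counter(remove(words(sample_text), sample_stopwords))

# newline='' keeps the csv module in charge of line endings, just as
# open('Aeschylus.csv', 'w', newline='') would for a real file.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["word", "freq"])
writer.writerows(wordfreq.most_common())  # most frequent words first
csv_text = buffer.getvalue()
```

With a real file, replace the buffer with open('Aeschylus.csv', 'w', newline='') inside a with block; the writer calls stay the same.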