How to convert dictionary values into a csv file?

I am an absolute beginner in Python. I am doing a textual analysis of Greek plays and counting the frequency of each word. Because the plays are very long, I cannot see my full set of data: the Python window only has room for the words with the lowest frequencies. I am thinking of writing the results to a .csv file instead. My full code is below:

#read the file as one string and split the string into a list of separate words
input = open('Aeschylus.txt', 'r')
text = input.read()
wordlist = text.split()

#read file containing stopwords and split the string into a list of separate words
stopwords = open("stopwords .txt", 'r').read().split()

#remove stopwords
wordsFiltered = []

for w in wordlist:
    if w not in stopwords:
        wordsFiltered.append(w)

#create dictionary by counting no of occurrences of each word in list
wordfreq = [wordsFiltered.count(x) for x in wordsFiltered]

#create word-frequency pairs and create a dictionary 
dictionary = dict(zip(wordsFiltered,wordfreq))

#sort by decreasing frequency and print
aux = [(dictionary[word], word) for word in dictionary]
aux.sort()
aux.reverse()
for y in aux: print y

import csv


with open('Aeschylus.csv', 'w') as csvfile:
    fieldnames = ['dictionary[word]', 'word']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)


    writer.writeheader()
    writer.writerow({'dictionary[word]': '1', 'word': 'inherited'})
    writer.writerow({'dictionary[word]': '1', 'word': 'inheritance'})
    writer.writerow({'dictionary[word]': '1', 'word': 'inherit'})

I found the code for the csv on the internet. What I'm hoping to get is the full list of data from the highest to lowest frequency. Using this code I have right now, python seems to be totally ignoring the csv part and just printing the data as if I didn't code for the csv.

Any idea on what I should code to see my intended result?

Thank you.

Since you have a dictionary where the words are the keys and their frequencies are the values, a DictWriter is ill-suited. It is good for sequences of mappings that share a common set of keys, which become the columns of the csv. For example, if you had a list of dicts like the ones you create manually:

a_list = [{'dictionary[word]': '1', 'word': 'inherited'},
          {'dictionary[word]': '1', 'word': 'inheritance'},
          {'dictionary[word]': '1', 'word': 'inherit'}]

then a DictWriter would be the tool for the job. But instead you have a single dictionary like:

dictionary = {'inherited': 1,
              'inheritance': 1,
              'inherit': 1,
              ...: ...}

But you've already built a sorted list of (freq, word) pairs as aux, which is perfect for writing to csv:

with open('Aeschylus.csv', 'wb') as csvfile:
    header = ['frequency', 'word']
    writer = csv.writer(csvfile)
    writer.writerow(header)
    # Note the plural method name
    writer.writerows(aux)
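
To make the difference between writerow and writerows concrete, here is a minimal sketch that writes to an in-memory buffer instead of a file. It assumes Python 3 (the snippet above targets Python 2), and the sample dictionary is made up for illustration, not taken from the plays:

```python
import csv
import io

# Hypothetical sample data standing in for the real word counts.
dictionary = {'inherited': 1, 'zeus': 3, 'inherit': 2}

# The same (frequency, word) pairs as aux, sorted by decreasing frequency.
aux = sorted(((freq, word) for word, freq in dictionary.items()), reverse=True)

buffer = io.StringIO()  # in-memory stand-in for the csv file
writer = csv.writer(buffer)
writer.writerow(['frequency', 'word'])  # singular: writes one row
writer.writerows(aux)                   # plural: writes one row per pair

csv_text = buffer.getvalue()
print(csv_text)
```

Each tuple in aux becomes one line of the csv, with the frequency in the first column and the word in the second.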

"python seems to be totally ignoring the csv part and just printing the data as if I didn't code for the csv."

That sounds rather odd. At the very least you should have gotten a file Aeschylus.csv containing:

dictionary[word],word
1,inherited
1,inheritance
1,inherit
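
One way to check what the DictWriter part produces is to run it against an in-memory buffer rather than a file. This is only a sketch, assuming Python 3 and io.StringIO as a stand-in for the file; the rows are the three from the question:

```python
import csv
import io

# The three rows written by hand in the question.
rows = [{'dictionary[word]': '1', 'word': 'inherited'},
        {'dictionary[word]': '1', 'word': 'inheritance'},
        {'dictionary[word]': '1', 'word': 'inherit'}]

buffer = io.StringIO()  # in-memory stand-in for Aeschylus.csv
writer = csv.DictWriter(buffer, fieldnames=['dictionary[word]', 'word'])
writer.writeheader()    # header row built from fieldnames
writer.writerows(rows)  # one csv line per dict

lines = buffer.getvalue().splitlines()
for line in lines:
    print(line)
```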

Your frequency counting method could also be improved. At the moment

#create dictionary by counting no of occurrences of each word in list
wordfreq = [wordsFiltered.count(x) for x in wordsFiltered]

has to loop through the whole list wordsFiltered once for each word in wordsFiltered, so it is O(n²). You could instead iterate through the words in the file, filtering and counting as you go. Python has a specialized dictionary for counting hashable objects called Counter:

from __future__ import print_function
from collections import Counter
import csv

# Many ways to go about this, could for example yield from (<gen expr>)
def words(filelike):
    for line in filelike:
        for word in line.split():
            yield word

def remove(iterable, stopwords):
    stopwords = set(stopwords)  # O(1) lookups instead of O(n)
    for word in iterable:
        if word not in stopwords:
            yield word

if __name__ == '__main__':
    with open("stopwords.txt") as f:
        stopwords = f.read().split()

    with open('Aeschylus.txt') as wordfile:
        wordfreq = Counter(remove(words(wordfile), stopwords))

Then, as before, print the words and their frequencies, beginning from most common:

    for word, freq in wordfreq.most_common():
        print(word, freq)

And/or write as csv:

    # Since you're using Python 2, open with 'wb' and no newline=''
    with open('Aeschylus.csv', 'wb') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['word', 'freq'])
        # If you want to keep most common order in CSV as well. Otherwise
        # wordfreq.items() would do as well.
        writer.writerows(wordfreq.most_common())
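
If you later move to Python 3, the file handling changes: the csv file is opened in text mode with newline='' instead of 'wb'. Below is a rough sketch of that variant. The helper name write_frequencies, the sample words and the temporary file path are all made up for illustration:

```python
import csv
import os
import tempfile
from collections import Counter

def write_frequencies(words, stopwords, path):
    """Count the non-stopwords and write them to path, most common first."""
    stopwords = set(stopwords)  # O(1) membership tests
    wordfreq = Counter(w for w in words if w not in stopwords)
    # Python 3: text mode plus newline='' so the csv module controls line endings
    with open(path, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['word', 'freq'])
        writer.writerows(wordfreq.most_common())
    return wordfreq

# Hypothetical sample input instead of Aeschylus.txt and stopwords.txt.
sample_words = 'the gods the sea the gods wrath'.split()
path = os.path.join(tempfile.mkdtemp(), 'sample.csv')
counts = write_frequencies(sample_words, ['the'], path)

# Read the file back to see what was written.
with open(path, newline='', encoding='utf-8') as f:
    rows = list(csv.reader(f))
print(rows)
```

Note that csv.reader returns every field as a string, so the frequencies come back as '2' rather than 2.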
