I want to save word frequency lists as CSV files for several corpora. Is there a way to make Python generate the filenames automatically from the variable name (e.g. corpus_a -> corpus_a_typefrequency.csv)?
I have the following code, which already works for individual corpora:
from collections import Counter
import csv
counts = Counter(corpus_a)
counts = dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))
with open('corpus_a_typefrequency.csv', 'w') as csv_file:
    writer = csv.writer(csv_file)
    for key, value in counts.items():
        writer.writerow([key, value])
PS: It would be great if I could count only words (no punctuation), and do so case-insensitively. I haven't figured out how to do that yet. I'm using data from the Brown Corpus as follows:
import nltk
from nltk.corpus import brown
corpus_a = brown.words()
I tried brown.words().lower().isalpha(), but that doesn't work.
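The attempt above fails because lower() and isalpha() are string methods, while brown.words() returns a sequence of tokens, so they have to be applied to each token individually. Below is a minimal sketch, assuming every token is a str. The names count_words and sample_tokens are illustrative and not part of NLTK, and the short list stands in for brown.words().

```python
from collections import Counter


def count_words(tokens):
    """Count alphabetic tokens, ignoring case (hypothetical helper)."""
    # isalpha() drops punctuation-only tokens; lower() merges "The" and "the".
    return Counter(token.lower() for token in tokens if token.isalpha())


# Small stand-in for brown.words(); any iterable of string tokens should work.
sample_tokens = ["The", "jury", "said", ",", "the", "Jury", "."]
counts = count_words(sample_tokens)
print(counts.most_common())
```

Note that str.isalpha() also rejects tokens containing apostrophes, hyphens, or digits (for example "don't" or "well-known"), so a looser test may be preferable depending on what should count as a word.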
You should have a look at this answer: https://stackoverflow.com/a/40536047/5289234. It lets you retrieve a variable's name, which you can then use to build the CSV filename.
import inspect
def retrieve_name(var):
    """
    Gets the name of var, searching from the outermost frame inwards.
    :param var: variable to get name from.
    :return: string
    """
    for fi in reversed(inspect.stack()):
        names = [var_name for var_name, var_val in fi.frame.f_locals.items() if var_val is var]
        if len(names) > 0:
            return names[0]
from collections import Counter
import csv
counts = Counter(corpus_a)
counts = dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))
with open(retrieve_name(corpus_a) + '_typefrequency.csv', 'w') as csv_file:
    writer = csv.writer(csv_file)
    for key, value in counts.items():
        writer.writerow([key, value])
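One caveat with retrieve_name is that it compares objects by identity, so if two variables refer to the same object it returns whichever name it finds first. As an alternative to the approach above (not the method from the linked answer), the corpora can be kept in a dictionary whose keys supply the filenames. This is a rough sketch under that assumption. The function write_type_frequencies and the corpora dict with its toy token lists are hypothetical, and in practice the values would be something like brown.words().

```python
import csv
from collections import Counter
from pathlib import Path


def write_type_frequencies(name, tokens, out_dir="."):
    """Write token counts for one corpus to <name>_typefrequency.csv."""
    counts = Counter(tokens)
    path = Path(out_dir) / f"{name}_typefrequency.csv"
    # newline="" is what the csv module documentation recommends for csv.writer.
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        # most_common() yields (token, count) pairs, highest count first.
        writer.writerows(counts.most_common())
    return path


# Hypothetical corpora: the dictionary key doubles as the filename prefix.
corpora = {
    "corpus_a": ["a", "b", "a"],
    "corpus_b": ["x", "y", "y", "y"],
}

for name, tokens in corpora.items():
    write_type_frequencies(name, tokens)
```

Running this writes corpus_a_typefrequency.csv and corpus_b_typefrequency.csv to the current directory. Adding another corpus only requires a new dictionary entry, and the name no longer depends on inspecting stack frames.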