Extracting the most common words and then appending them to a CSV file with Python
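I am trying to extract the most common words from a .txt file and then put the 4 most common words into a CSV file (appending more rows later as needed). At the moment the script extracts the most common words and appends them to a CSV file, but each letter ends up in its own cell.
Python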
import collections
import pandas as pd
import matplotlib.pyplot as plt
import csv
fields=['first','second','third']
# Read input file, note the encoding is specified here
# It may be different in your text file
file = open('pyTest.txt', encoding="utf8")
a= file.read()
# Stopwords
stopwords = set(line.strip() for line in open('stopwords.txt'))
stopwords = stopwords.union(set(['mr','mrs','one','two','said']))
# Instantiate a dictionary, and for every word in the file,
# Add to the dictionary if it doesn't exist. If it does, increase the count.
wordcount = {}
# To eliminate duplicates, remember to split by punctuation, and use case delimiters.
for word in a.lower().split():
    word = word.replace(".","")
    word = word.replace(",","")
    word = word.replace(":","")
    word = word.replace("\"","")
    word = word.replace("!","")
    word = word.replace("“","")
    word = word.replace("‘","")
    word = word.replace("*","")
    if word not in stopwords:
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
# Print most common word
n_print = int(input("How many most common words to print: "))
print("\nOK. The {} most common words are as follows\n".format(n_print))
word_counter = collections.Counter(wordcount)
for word in word_counter.most_common(n_print):
    print(word[0])
# Close the file
file.close()
with open('old.csv', 'a') as out_file:
    writer = csv.writer(out_file)
    for word in word_counter.most_common(4):
        print(word)
        writer.writerow(word[0])
Output CSV file
p,i,p,e
d,i,a,m,e,t,e,r
f,i,t,t,i,n,g,s
o,u,t,s,i,d,e
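The output above is consistent with how `csv.writer.writerow` treats its argument: it expects an iterable of fields, and a plain string is an iterable of characters. The following is a minimal sketch that reproduces the effect in memory; the helper name `row_as_text` is hypothetical, and `lineterminator="\n"` is set only to keep the printed result easy to read.

```python
import csv
import io

def row_as_text(row):
    """Write one row with csv.writer into an in-memory buffer and return the text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(row)
    return buffer.getvalue()

# A bare string is iterated character by character, so each letter becomes a field.
print(repr(row_as_text("pipe")))  # 'p,i,p,e\n'

# A list of strings yields one field per word.
print(repr(row_as_text(["pipe", "diameter"])))  # 'pipe,diameter\n'
```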
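writerow expects a sequence of fields, so passing it a single string writes one character per cell. Instead, you can use a generator expression to extract the first item of each (word, count) tuple in the list returned by the most_common method, and write all four words as one row: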
with open('old.csv', 'a') as out_file:
    writer = csv.writer(out_file)
    writer.writerow(word for word, _ in word_counter.most_common(4))
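For reference, here is a compact end-to-end sketch of the same idea. The function names `top_words` and `append_row` are hypothetical, not part of the original post. It assumes that stripping punctuation from the ends of each token is acceptable, which differs slightly from the `replace` calls in the question (those remove the characters anywhere in the word). It also opens the file with `newline=''`, as the `csv` module documentation recommends, to avoid blank lines between rows on Windows.

```python
import collections
import csv
import string

def top_words(text, stopwords, n):
    """Return the n most frequent tokens in text that are not stopwords."""
    # Assumption: only leading/trailing punctuation is stripped from each token.
    punctuation = string.punctuation + "“”‘’"
    words = (token.strip(punctuation) for token in text.lower().split())
    counts = collections.Counter(w for w in words if w and w not in stopwords)
    return [word for word, _ in counts.most_common(n)]

def append_row(path, row):
    """Append one row (an iterable of fields) to the CSV file at path."""
    with open(path, "a", newline="", encoding="utf8") as out_file:
        csv.writer(out_file).writerow(row)
```

With the variables from the question, a call such as `append_row('old.csv', top_words(a, stopwords, 4))` would append the four most common words as a single row.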