I am trying to print a list of the most common words in a file, while ignoring some very common words. I currently have this code:
import csv
import collections
from collections import Counter
with open('billboardtop1002015lyrics.txt',encoding='ISO-8859-1') as csv_file:
    mostcommonword = []
    counter = Counter(csv_file.read().strip().split())
    commonwords = (counter.most_common(30))
    ignore_words = ['i','you','me','the','that','on','is','when','if','in','dont','for','when']
    if commonwords not in ignore_words:
        mostcommonword.append(commonwords)
    print(mostcommonword)
This is not working: the output still contains words like 'i' and 'you'. I am very new to Python, and this is my first project.
Is there something I am missing or an easier way to approach this?
Thanks!
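The if check in the question never filters anything, and a tiny made-up sample (standing in for the lyrics file) shows why: most_common() returns a list of (word, count) tuples, so "commonwords not in ignore_words" asks whether that entire list is one of the strings. It never is, so the condition is always True and the whole unfiltered list gets appended.

```python
from collections import Counter

# Hypothetical sample text standing in for the lyrics file
counter = Counter("i love you i love you i need you baby".split())
commonwords = counter.most_common(3)
print(commonwords)  # [('i', 3), ('you', 3), ('love', 2)]

ignore_words = ['i', 'you', 'me']
# The whole list is compared against each string in ignore_words;
# it never equals one of them, so this is always True
print(commonwords not in ignore_words)  # True
```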
You should remove the ignored words first and then count, so the top 30 are chosen only from the words you actually want:
from collections import Counter

# A set gives fast membership tests; the duplicate 'when' is dropped
ignore_words = {'i', 'you', 'me', 'the', 'that', 'on', 'is', 'when', 'if', 'in', 'dont', 'for'}

with open('billboardtop1002015lyrics.txt', encoding='ISO-8859-1') as csv_file:
    lyrics = csv_file.read().split()  # split() with no arguments already ignores surrounding whitespace

# Drop the ignored words before counting
lyrics_ignored = [t for t in lyrics if t not in ignore_words]
counter = Counter(lyrics_ignored)
mostcommonwords = counter.most_common(30)
print(mostcommonwords)
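If you would rather count everything first and filter afterwards (closer to what the question's code attempted), the check has to look at the word inside each (word, count) pair rather than at the whole list. Below is a minimal sketch of that approach; most_common_ignoring is a hypothetical helper name, and the sample sentence is made up in place of the lyrics file.

```python
from collections import Counter
from itertools import islice


def most_common_ignoring(words, ignore_words, n=30):
    """Return up to n (word, count) pairs, skipping ignored words.

    Hypothetical helper: counts every word, then filters the pairs
    by their word, which is what the question's if check aimed for.
    """
    ignore = set(ignore_words)  # set membership tests are fast
    counter = Counter(words)
    # most_common() with no argument yields all pairs, highest count first
    kept = ((word, count) for word, count in counter.most_common()
            if word not in ignore)
    # Stop after the first n pairs that survive the filter
    return list(islice(kept, n))


sample = "i love you i love you i need you baby".split()
print(most_common_ignoring(sample, ['i', 'you', 'me'], n=2))
```

With the real file you would pass csv_file.read().split() as words. Filtering before or after counting gives the same top results; counting first is mainly useful if you also want the unfiltered counts around.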