
Trying to remove common words from a file

I am trying to print a list of the most common words in a file while ignoring very common words such as 'i' and 'the'. Here is the code I currently have:

import csv
import collections
from collections import Counter

with open('billboardtop1002015lyrics.txt',encoding='ISO-8859-1') as csv_file:
    mostcommonword = []

    counter = Counter(csv_file.read().strip().split())

    commonwords = (counter.most_common(30))

    ignore_words = ['i','you','me','the','that','on','is','when','if','in','dont','for','when']

    if commonwords not in ignore_words:
        mostcommonword.append(commonwords)
        print(mostcommonword)

This is not working: the output still contains words such as 'i' and 'you'. I am very new to Python and this is the first project I am working on.

Is there something I am missing or an easier way to approach this?

Thanks!

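In your code, commonwords is the whole list of (word, count) tuples returned by most_common(30). That list is never an element of ignore_words, so the "not in" check is always true and nothing gets filtered. You should first eliminate the ignored words, then find the most common ones among what remains.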

from collections import Counter

# A set gives fast membership tests and drops the duplicate 'when'.
ignore_words = {'i', 'you', 'me', 'the', 'that', 'on', 'is', 'when', 'if', 'in', 'dont', 'for'}

with open('billboardtop1002015lyrics.txt', encoding='ISO-8859-1') as csv_file:
    # Split the whole file into whitespace-separated words.
    lyrics = csv_file.read().strip().split()

# Drop the ignored words before counting, so they cannot take up
# any of the 30 slots returned by most_common().
lyrics_ignored = [t for t in lyrics if t not in ignore_words]
counter = Counter(lyrics_ignored)
mostcommonwords = counter.most_common(30)
print(mostcommonwords)
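
One thing the filter above does not cover: the comparison is exact, so 'The', 'you,' or "don't" would not match the lowercase entries in ignore_words. If the lyrics file contains capital letters or punctuation (an assumption, since the file itself is not shown), a rough sketch of a normalizing version could look like this. The helper name top_words and the sample string are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical helper, not part of the original answer.
def top_words(text, ignore_words, n=30):
    """Return the n most common words in text, skipping ignore_words."""
    # Lowercase first so 'The' and 'the' count as the same word.
    # The pattern keeps letters and apostrophes, so "don't" stays one token.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Drop apostrophes so "don't" matches the 'dont' entry in the ignore list.
    words = (t.replace("'", "") for t in tokens)
    # Skip empty strings (a stray apostrophe) and ignored words, then count.
    counter = Counter(w for w in words if w and w not in ignore_words)
    return counter.most_common(n)

ignore_words = {'i', 'you', 'me', 'the', 'that', 'on', 'is', 'when', 'if', 'in', 'dont', 'for'}

# Made-up sample text standing in for the lyrics file.
sample = "The love you love, I don't love. Baby, love is love, baby!"
print(top_words(sample, ignore_words, n=2))  # [('love', 5), ('baby', 2)]
```

To run it on the real file, pass the result of csv_file.read() as text. Note that the regular expression only keeps the letters a to z, so accented characters would split words; widen the pattern if the lyrics need that.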
