Show the 7 most common words in a text, excluding words that appear in a list of common words

I would really appreciate some help solving this, or a pointer in the right direction. Thanks!

Show the 7 most common words found in the text, but filter out the words that are common words. You can find a list of common words in common-words.txt.

common-words.txt contains lots of different words.

First I found the 7 most common words in the text. This is what my code looks like:

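    import re
    from collections import Counter

    print("The 7 most frequently used words are:")
    print("\n")

    # Read the whole chapter into one string.
    with open("alice-ch1.txt") as f:
        passage = f.read()

    # Split into words and upper-case them so counting is case-insensitive.
    words = re.findall(r'\w+', passage)
    cap_words = [word.upper() for word in words]

    # Keep only the 7 most frequent (word, count) pairs.
    word_counts = Counter(cap_words).most_common(7)

    print(word_counts)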

This works, and I get the output:

[('THE', 93), ('SHE', 80), ('TO', 75), ('IT', 67), ('AND', 65), ('WAS', 53), ('A', 52)]

Now I want to compare these two text files: if any word from my text file (alice-ch1.txt) is in common-words.txt, I want it removed from the result.

I tried running this code:

    dic_no_cw = dict(word_counts)
    with open("common-words.txt", 'r') as cw:
        commonwords = list(cw.read().split())
        for key, value in list(dic_no_cw.items()):
            for line in commonwords:
                if key == line:
                    del dic_no_cw[key]

    dict_copy = dict(dic_no_cw)

    dic_no_cw7 = Counter(dic_no_cw).most_common(7)
    sorted(dic_no_cw7)

    print(dic_no_cw7)

but I get the same output as before:

[('THE', 93), ('SHE', 80), ('TO', 75), ('IT', 67), ('AND', 65), ('WAS', 53), ('A', 52)]

I could really use some help solving this, or a hint so I can maybe figure it out by myself.

Thanks,

Can you try replacing these lines of your code:

    for line in commonwords:
        if key == line:
            del dic_no_cw[key]

with

    for line in commonwords:
        if key.strip() == line.upper().strip():
            del dic_no_cw[key]
            break
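
A slightly fuller sketch of the same idea, in case it helps. I can't see your common-words.txt, so this assumes it is a whitespace-separated list of words that may be in lower case. It upper-cases both sides before comparing, and it filters before calling most_common(7) so that seven words are still left afterwards. The function name top_words and the inline sample data are made up for illustration; in your script you would pass in the contents of alice-ch1.txt and common-words.txt instead.

```python
import re
from collections import Counter


def top_words(text, common_words, n=7):
    """Return the n most frequent words in text, skipping common_words."""
    # Upper-case the common words once so the comparison is case-insensitive.
    stop = {w.strip().upper() for w in common_words}
    words = (w.upper() for w in re.findall(r"\w+", text))
    # Filter first, then count, so n words can remain after filtering.
    counts = Counter(w for w in words if w not in stop)
    return counts.most_common(n)


# Tiny inline sample standing in for the two text files (hypothetical data).
sample_text = "The cat and the hat. The cat sat; a cat ran."
sample_common = "the and a".split()
result = top_words(sample_text, sample_common, n=2)
print(result)
```

Using a set for the common words also replaces the inner for loop with a single membership test.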

I haven't checked it, but I think the problem may be that you're comparing the value in the dict (the number of times the word appears) instead of the key (the actual word itself) against the words in the commonwords list.

I believe "if value == line:" should read "if key == line:".
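
To make the key/value distinction concrete, here is a tiny sketch with made-up counts (not taken from your files). Iterating over items() gives (word, count) pairs, and a count is an int, so it can never equal a word read from a file.

```python
counts = {"THE": 93, "ALICE": 28}  # made-up sample counts
common = ["THE", "AND"]            # made-up common words

# Iterate over a copy of the items so deleting while looping is safe.
for key, value in list(counts.items()):
    # value is an int, so value == "THE" is always False;
    # key is the word itself, so this is the comparison that can match.
    if key in common:
        del counts[key]

print(counts)
```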
