How to print words that are not in the list

I have 2 files. The first is a list of tweets, and the second is a list of standard words that looks like this:

acoustics
acquaint
acquaintable
tbc....

I want to iterate through the list of tweets and print the words that are not found in the standard words list.

This is what I tried:

dk = open('wordslist.txt','r')
dlist = []
for x in dk.readlines():
    dlist.append(x.replace('\n',''))

dlist
length = len(tokenized_tweets)
for i in range(length):
    print(tokenized_tweets[i])
for x in range(len(tokenized_tweets)):
    if x[0] not in dlist:
        print(tokenized_tweets[x])

and I got this error: 'int' object is not subscriptable

Read and follow the error message and you'll figure out what the problem is.

In the traceback you would see an arrow pointing to the line for x in (len(tokenized_tweets)): . The error message says: 'int' object is not iterable . What is the iterable in that for loop? (len(tokenized_tweets)) Is this really an iterable? No, it's an int . The output of len() is always an int (unless you shadow the built-in len).

You are supposed to pass the length of tokenized_tweets to range(), which gives you an iterable. Once range() is in place, x is an int, so x[0] raises 'int' object is not subscriptable . Use x as an index instead: tokenized_tweets[x].
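To make the two error messages concrete, here is a minimal sketch that triggers each one and then applies the fix. The sample data and the helper error_message are made up for illustration, and it assumes tokenized_tweets is a flat list of words.

```python
# Hypothetical sample data: a flat list of already-tokenized words.
tokenized_tweets = ["acoustics", "lol", "acquaint"]
dlist = ["acoustics", "acquaint", "acquaintable"]


def error_message(func):
    """Run func and return the message of the TypeError it raises, or None."""
    try:
        func()
    except TypeError as exc:
        return str(exc)
    return None


def loop_over_int():
    # len() returns an int, and a for loop cannot iterate over an int.
    for x in len(tokenized_tweets):
        pass


def subscript_an_int():
    # range() yields ints, so x[0] tries to subscript an int.
    for x in range(len(tokenized_tweets)):
        x[0]


not_iterable = error_message(loop_over_int)
not_subscriptable = error_message(subscript_an_int)
print(not_iterable)
print(not_subscriptable)

# Fix: use x as an index into the list.
missing = []
for x in range(len(tokenized_tweets)):
    if tokenized_tweets[x] not in dlist:
        missing.append(tokenized_tweets[x])
print(missing)
```

If your tokenized_tweets is a list of token lists rather than a flat list, you would need an inner loop over each tweet's tokens.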

Extra tip:

Since you're looking up words for every tweet, make a set out of your word list. Membership testing on a set is much faster than on a list: O(1) on average versus O(n).

It also removes duplicates if there are any.
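A small sketch of both points, using a made-up word list that contains one duplicate:

```python
# Hypothetical word list with a duplicate entry.
dlist = ["acoustics", "acquaint", "acquaintable", "acquaint"]

# Building a set keeps one copy of each word.
words = set(dlist)
print(len(dlist), len(words))

# Membership tests use the same `in` operator as with a list,
# but a set answers by hash lookup instead of scanning every element.
print("acquaint" in words)
print("lol" in words)
```

The speed difference only becomes noticeable with a large word list and many lookups, which is the situation here.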

Solution:

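# str.removesuffix requires Python 3.9+
with open("wordslist.txt") as f:
    # A set gives fast membership tests and drops duplicate words
    words_list = {word.removesuffix("\n") for word in f}

with open("tweets.txt") as g:
    for tweet in g:
        # split() breaks the tweet on whitespace
        for word in tweet.split():
            if word not in words_list:
                print(word)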
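One thing the solution above does not handle: real tweets contain capital letters and punctuation, so "Acoustics" or "acquaint," would be reported as missing. Below is a rough sketch of one way to normalize the tokens first. The function unknown_words and the sample data are hypothetical, and lowercasing plus stripping punctuation is an assumption about what you want, not part of the original answer.

```python
import string


def unknown_words(tweets, vocabulary):
    """Return the words from tweets that are not in vocabulary, in first-seen order."""
    seen = set()
    result = []
    for tweet in tweets:
        for token in tweet.split():
            # Assumption: drop leading/trailing punctuation and ignore case.
            word = token.strip(string.punctuation).lower()
            if word and word not in vocabulary and word not in seen:
                seen.add(word)
                result.append(word)
    return result


# Hypothetical in-memory data standing in for the two files.
vocabulary = {"acoustics", "acquaint", "acquaintable"}
tweets = ["Acoustics are great!", "acquaint yourself, pls"]
print(unknown_words(tweets, vocabulary))
```

Note that stripping punctuation also removes the leading # and @ from hashtags and mentions, so adjust that step if you want to keep them.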

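Simply use this; you were missing the call to range(). Note that x is an index, so look the word up with tokenized_tweets[x] (this assumes tokenized_tweets is a flat list of words):

for x in range(len(tokenized_tweets)):
    # x is an int index, so fetch the item from the list before testing it
    if tokenized_tweets[x] not in dlist:
        print(tokenized_tweets[x])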
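You can also skip range(len(...)) and iterate over the list directly, which avoids mixing up indexes and items. Here is a sketch for the case where each tweet is itself a list of tokens. That layout is an assumption, since the question does not show what tokenized_tweets contains, and the sample data is made up.

```python
# Hypothetical data: each tweet is a list of tokens.
tokenized_tweets = [["acoustics", "rock"], ["acquaint", "acquaintable"], ["lol"]]
dlist = {"acoustics", "acquaint", "acquaintable"}

# For every tweet, collect the tokens that are not in the word list.
missing_per_tweet = [
    [word for word in tokens if word not in dlist]
    for tokens in tokenized_tweets
]

for tokens, missing in zip(tokenized_tweets, missing_per_tweet):
    if missing:
        print(tokens, "->", missing)
```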

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
