How to check if a list item within a nested list exists in a set?

Question

I have a nested list containing every tokenized sentence from a corpus, and a set of all the words that occur more than once. How would I check whether each word in the nested list is in that set? I then need to replace every word that occurs more than once with the string '<UNK>'.

I tried:

for sent in tokenized_sents:
    for word in sent:
        if word in set:
            word = '<UNK>'

Answer 1

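Your loop leaves the lists unchanged because word = '<UNK>' only rebinds the loop variable; to modify a sentence you have to assign to it by index. You can use collections.Counter to build a dictionary that keeps track of how many times each word occurs in your corpus: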

from collections import Counter

corpus = [['Hello', ',', 'my', 'name', 'is', 'Walter'], ['I', 'like', 'my', 'cats']]

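# Flatten the nested list so every word can be counted in one pass
corpus_unnested = []
for sentence in corpus:
    corpus_unnested += sentence
my_dict = Counter(corpus_unnested)

# Replace in place by index; rebinding the loop variable would not change the list
for i, sentence in enumerate(corpus):
    for j, word in enumerate(sentence):
        if my_dict[word] > 1:
            corpus[i][j] = '<UNK>'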

>>> print(corpus)
[['Hello', ',', '<UNK>', 'name', 'is', 'Walter'], ['I', 'like', '<UNK>', 'cats']]
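
If you would rather keep the set from the question and leave the original corpus untouched, the same idea can be sketched with a set and a nested list comprehension. This is only a minimal sketch: the function name replace_repeated and its unk parameter are made up for illustration, and it assumes each sentence is already a list of string tokens.

```python
from collections import Counter
from itertools import chain


def replace_repeated(tokenized_sents, unk="<UNK>"):
    """Return a new nested list with every word that occurs more than once replaced by unk.

    Hypothetical helper for illustration; not part of any library.
    """
    # Count every word across all sentences without building an intermediate flat list
    counts = Counter(chain.from_iterable(tokenized_sents))
    # The set from the question: all words that occur more than once
    repeated = {word for word, n in counts.items() if n > 1}
    # Build new sentences instead of mutating the input; set membership is O(1) on average
    return [
        [unk if word in repeated else word for word in sent]
        for sent in tokenized_sents
    ]


corpus = [['Hello', ',', 'my', 'name', 'is', 'Walter'], ['I', 'like', 'my', 'cats']]
result = replace_repeated(corpus)
print(result)
```

Because the function returns a new nested list, corpus itself is not modified; assign the result back to corpus if you want the in-place behaviour of the loop above.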

solution1
0 ACCEPTED 2022-02-22 01:18:52