How to remove row if there are repeated words in the sentence

Question

I have a list

top = ['GME', 'MVIS', 'TSLA', 'AMC']

and I have a dataset

                            dt  ...                                               text
0       2021-03-19 20:59:49+06  ...  I only need TSLA TSLA TSLA TSLA to hit 20 eod to make up for a...
1       2021-03-19 20:59:51+06  ...                                 Oh this isn’t good
2       2021-03-19 20:59:51+06  ...  lads why is my account covered in more GME ...
3       2021-03-19 20:59:51+06  ...  I'm tempted to drop my last 800 into some TSLA...

What I want to do is check each row: if the sentence in that row contains more than 3 words from the list, I want to remove the row.

Thank you for your help.

Answer 1

Let's write a function that determines whether a given sentence contains more than 3 words from the list "top":

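def check_words(sentence, top):
    # Split on whitespace and count the words that appear in `top`.
    words = sentence.split()
    count = 0
    for word in words:
        if word in top:
            count += 1
    # True when more than 3 words from `top` were found.
    return count > 3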
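
One caveat with the function above: sentence.split() only splits on whitespace, so a token such as "TSLA," or "TSLA..." is not equal to "TSLA" and is not counted. Below is a minimal sketch of a punctuation-tolerant variant, assuming the tickers consist of plain letters. The names count_top_words and check_words_tolerant and the regular expression are illustrative assumptions, not part of the original answer.

```python
import re

top = ['GME', 'MVIS', 'TSLA', 'AMC']

def count_top_words(sentence, top):
    """Count alphabetic tokens of `sentence` that appear in `top` (hypothetical helper)."""
    top_set = set(top)                            # set membership is fast per token
    tokens = re.findall(r"[A-Za-z]+", sentence)   # keeps letter runs, drops punctuation and digits
    return sum(token in top_set for token in tokens)

def check_words_tolerant(sentence, top):
    """True when more than 3 tokens from `top` occur in `sentence`."""
    return count_top_words(sentence, top) > 3
```

Whether stripping punctuation is desirable depends on the data; if the exact whitespace-separated match is what you want, the original function is enough.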

Next, create a True/False column that records whether each sentence contains more than 3 words from the list. With a pandas DataFrame:

dataframe['Contains_3+_words'] = dataframe.apply(lambda r : check_words(r.text,top), axis=1)

Then keep only the rows whose sentence does not contain more than 3 words from the list:

dataframe = dataframe[~dataframe['Contains_3+_words']]

Finally, you can remove the helper column we created:

dataframe = dataframe.drop(columns=['Contains_3+_words'])
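
The same steps can also be sketched end to end without a helper column, by building a boolean mask and indexing with it directly. This is an alternative arrangement of the approach above, not the answer's exact code; the sample rows are made up for illustration and the variable names too_many and filtered are assumptions.

```python
import pandas as pd

top = ['GME', 'MVIS', 'TSLA', 'AMC']

def check_words(sentence, top):
    """Return True if more than 3 whitespace-separated words are in `top`."""
    return sum(word in top for word in sentence.split()) > 3

# Made-up sample rows, loosely modelled on the question's data.
dataframe = pd.DataFrame({
    'text': [
        'I only need TSLA TSLA TSLA TSLA to hit 20 eod',
        "Oh this isn't good",
        'lads why is my account covered in more GME',
        "I'm tempted to drop my last 800 into some TSLA",
    ]
})

# Boolean Series: True where the row should be removed.
too_many = dataframe['text'].apply(check_words, args=(top,))

# Keep the remaining rows; no helper column was added, so there is nothing to drop.
filtered = dataframe[~too_many].reset_index(drop=True)
```

Here only the first sample row is removed, because it is the only one with more than 3 words from top.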

solution1
2 ACCEPTED 2021-07-27 12:15:21
