I have a list
top = ['GME', 'MVIS', 'TSLA', 'AMC']
and I have a dataset
dt ... text
0 2021-03-19 20:59:49+06 ... I only need TSLA TSLA TSLA TSLA to hit 20 eod to make up for a...
1 2021-03-19 20:59:51+06 ... Oh this isn’t good
2 2021-03-19 20:59:51+06 ... lads why is my account covered in more GME ...
3 2021-03-19 20:59:51+06 ... I'm tempted to drop my last 800 into some TSLA...
So what I want to do is check whether the sentence in a row contains more than 3 words from the list; if it does, I want to remove that row.
Thank you for your help.
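For anyone reproducing this, the sample rows can be rebuilt as a small DataFrame. This is a sketch under assumptions: only the dt and text columns are kept (the columns hidden behind "..." are unknown), and the strings are just the visible, truncated excerpts from the printout above.

```python
import pandas as pd

top = ['GME', 'MVIS', 'TSLA', 'AMC']

# Toy data: visible (truncated) prefixes of the sample rows.
# The elided columns are omitted; only 'dt' and 'text' are kept.
dataframe = pd.DataFrame({
    'dt': pd.to_datetime([
        '2021-03-19 20:59:49+06:00',
        '2021-03-19 20:59:51+06:00',
        '2021-03-19 20:59:51+06:00',
        '2021-03-19 20:59:51+06:00',
    ]),
    'text': [
        'I only need TSLA TSLA TSLA TSLA to hit 20 eod to make up for a',
        'Oh this isn’t good',
        'lads why is my account covered in more GME',
        "I'm tempted to drop my last 800 into some TSLA",
    ],
})

# Desired result: row 0 (four words from `top`) is dropped, rows 1-3 are kept.
```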
Let's write a function that determines whether a given sentence contains more than 3 words from the list "top":
def check_words(sentence, top):
    # Split on whitespace and count the tokens that appear in `top`
    words = sentence.split()
    count = 0
    for word in words:
        if word in top:
            count += 1
    return count > 3
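A quick check of the function, plus a hypothetical variant (check_words_normalized, not part of the original answer) that strips surrounding punctuation and ignores case. It is offered only as an assumption about what "word" should mean here, since sentence.split() leaves tokens like "TSLA," that will not match the list:

```python
import string

top = ['GME', 'MVIS', 'TSLA', 'AMC']

def check_words(sentence, top):
    """Same rule as above: count exact whitespace-separated matches."""
    return sum(word in top for word in sentence.split()) > 3

def check_words_normalized(sentence, top, limit=3):
    """Hypothetical variant: strip surrounding punctuation and compare
    case-insensitively, so 'TSLA,' or 'tsla' also count."""
    tickers = {t.upper() for t in top}  # set for fast membership tests
    count = sum(
        word.strip(string.punctuation).upper() in tickers
        for word in sentence.split()
    )
    return count > limit

print(check_words('I only need TSLA TSLA TSLA TSLA to hit 20 eod', top))  # True
print(check_words('TSLA, TSLA, TSLA, TSLA!', top))                      # False
print(check_words_normalized('TSLA, TSLA, TSLA, TSLA!', top))           # True
```

Whether the stricter or the looser rule is right depends on how the tickers actually appear in your text.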
Then create a True/False column indicating whether the sentence contains more than 3 words from the list, using the pandas DataFrame structure:
dataframe['Contains_3+_words'] = dataframe.apply(lambda r: check_words(r.text, top), axis=1)
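On a large frame, a vectorized alternative swaps the per-row Python function for a regular expression counted with pandas' Series.str.count. This is a different technique with a slightly different matching rule: the \b word boundaries also count a ticker followed by punctuation (e.g. "TSLA,"), which the whitespace split does not. The n_top column name is hypothetical.

```python
import re
import pandas as pd

top = ['GME', 'MVIS', 'TSLA', 'AMC']
dataframe = pd.DataFrame({'text': [
    'I only need TSLA TSLA TSLA TSLA to hit 20 eod',
    'Oh this isn’t good',
    'lads why is my account covered in more GME',
]})

# One alternation pattern for all tickers; re.escape guards special characters.
pattern = r'\b(?:' + '|'.join(map(re.escape, top)) + r')\b'

# Series.str.count returns the number of regex matches in each row.
dataframe['n_top'] = dataframe['text'].str.count(pattern)
dataframe['Contains_3+_words'] = dataframe['n_top'] > 3
print(dataframe[['n_top', 'Contains_3+_words']])
```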
Then keep only the rows whose sentences do not contain more than 3 words from the list:
dataframe = dataframe[~dataframe['Contains_3+_words']]  # ~ negates the boolean mask
Additionally, you can remove the column we created :
dataframe.drop(['Contains_3+_words'], axis=1, inplace=True)
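If the helper column is not needed afterwards, the filter can be done in one pass without creating and then dropping it. A minimal sketch, assuming the sentences live in a column named text as in the question:

```python
import pandas as pd

top = ['GME', 'MVIS', 'TSLA', 'AMC']

def check_words(sentence, top):
    # True when more than 3 whitespace-separated tokens are in `top`
    return sum(word in top for word in sentence.split()) > 3

dataframe = pd.DataFrame({'text': [
    'I only need TSLA TSLA TSLA TSLA to hit 20 eod',
    'Oh this isn’t good',
    'lads why is my account covered in more GME',
    "I'm tempted to drop my last 800 into some TSLA",
]})

# Boolean mask from the 'text' column alone; args passes `top` to check_words.
too_many = dataframe['text'].apply(check_words, args=(top,))

# Keep rows with 3 or fewer list words, then renumber the index.
dataframe = dataframe.loc[~too_many].reset_index(drop=True)
print(dataframe)
```

Applying the function to the text Series avoids axis=1, which builds a Series for every row and is typically slower on large frames.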