Say I have a DataFrame like this:
Words1 Score
The Man 10
Right Hand 7
Bad Boy Company 7
Seven Deadly Sins 11
What I was hoping to do was create a user input like this:
var = input("Enter the Words That Can Never Appear Together in the same phrase: ")
Here the user enters words that should never appear together in a phrase. So let's say that var = 'Bad Company'; once the matching row is dropped, the DF becomes:
Words1 Score
The Man 10
Right Hand 7
Seven Deadly Sins 11
So I have two questions: Is there any way to actually do this? And if so, is there a way to support multiple queries, for example if someone wanted to remove any row where 'Bad' and 'Company' appear in the same phrase, and also any row where 'Seven' and 'Sins' appear in the same phrase?
Hopefully someone can help me!
You can use the vectorized string methods on the 'Words' column (called 'Words1' in the question) and apply a regex:
>>> df
Score Words
0 10 The Man
1 7 Right Hand
2 7 Bad Boy Company
3 11 Seven Deadly Sins
>>> df['Words'].str.contains('Bad')
0 False
1 False
2 True
3 False
Name: Words, dtype: bool
>>> df['Words'].str.contains('^(?=.*Bad)(?=.*Company)')
0 False
1 False
2 True
3 False
Name: Words, dtype: bool
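To see why the lookahead pattern works regardless of word order, here is a minimal sketch using only the standard `re` module. The sample phrases and the `\b` word-boundary variant are my own additions for illustration, not part of the original answer; the variant may be worth considering if 'Bad' should not match inside a longer word such as 'Badge'.

```python
import re

# Each (?=.*word) is a lookahead: it checks that `word` occurs somewhere
# in the string without consuming any characters, so order does not matter.
pattern = re.compile(r'^(?=.*Bad)(?=.*Company)')

phrases = ['Bad Boy Company', 'Company Gone Bad', 'Bad Boy', 'The Man']
matches = [bool(pattern.search(p)) for p in phrases]
print(matches)  # [True, True, False, False]

# Assumption: whole-word matching is wanted. Without \b, 'Bad' also
# matches inside 'Badge'.
whole_word = re.compile(r'^(?=.*\bBad\b)(?=.*\bCompany\b)')
substring_hit = bool(pattern.search('Badge Company'))      # True
whole_word_hit = bool(whole_word.search('Badge Company'))  # False
```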
Then use the boolean Series returned by str.contains as a mask to remove the rows you do not want:
>>> df = df[~df['Words'].str.contains('^(?=.*Bad)(?=.*Company)')]
>>> df
Score Words
0 10 The Man
1 7 Right Hand
3 11 Seven Deadly Sins
[3 rows x 2 columns]
>>> df = df[~df['Words'].str.contains('^(?=.*Sins)(?=.*Seven)')]
>>> df
Score Words
0 10 The Man
1 7 Right Hand
[2 rows x 2 columns]
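For the "multiple queries" part of the question, the two filtering steps above could be folded into a single pass. The following is a sketch under assumptions: the `forbidden` list and the `drop` accumulator are hypothetical names of mine, and the frame is rebuilt here only so the example runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Words': ['The Man', 'Right Hand', 'Bad Boy Company', 'Seven Deadly Sins'],
    'Score': [10, 7, 7, 11],
})

# Hypothetical input: each inner list is one group of words that must
# never appear together in the same phrase.
forbidden = [['Bad', 'Company'], ['Seven', 'Sins']]

# Start with "drop nothing", then OR in the mask for each group.
drop = pd.Series(False, index=df.index)
for words in forbidden:
    pattern = '^' + ''.join('(?=.*{})'.format(w) for w in words)
    drop |= df['Words'].str.contains(pattern)

# Keep only the rows that matched none of the groups.
result = df[~drop]
print(result)
```

With the sample data, only the 'The Man' and 'Right Hand' rows should remain, and the original index labels (0 and 1) are preserved.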
To split user input into patterns:
>>> s = input('Words: ')  # raw_input() on Python 2
Words: Seven Sins
>>> s
'Seven Sins'
>>> pattern='^'+''.join('(?=.*{})'.format(word) for word in s.split())
>>> pattern
'^(?=.*Seven)(?=.*Sins)'
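Because the words come straight from the user, it may be safer to escape them before placing them in a regex, and one prompt could carry several queries. The sketch below assumes a comma separates the queries; that convention, and the helper names `build_pattern` and `parse_queries`, are hypothetical and not part of the original answer.

```python
import re

def build_pattern(words):
    """Return a regex matching strings that contain every word, in any order."""
    # re.escape keeps characters such as '.' or '+' in user input literal.
    return '^' + ''.join('(?=.*{})'.format(re.escape(w)) for w in words)

def parse_queries(raw):
    """Split 'Bad Company, Seven Sins' into [['Bad', 'Company'], ['Seven', 'Sins']]."""
    groups = (chunk.split() for chunk in raw.split(','))
    # Skip empty groups caused by stray or trailing commas.
    return [g for g in groups if g]

raw = 'Bad Company, Seven Sins'  # stands in for the value returned by input(...)
patterns = [build_pattern(g) for g in parse_queries(raw)]
print(patterns)  # ['^(?=.*Bad)(?=.*Company)', '^(?=.*Seven)(?=.*Sins)']
```

Each resulting pattern can then be passed to df['Words'].str.contains(...) in the same way as the hand-written patterns.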