Remove Rows Based on User-Input Conditions (Pandas, Python 3)

Question

Say I have a DataFrame like this:

 Words1              Score
 The Man             10
 Right Hand          7
 Bad Boy Company     7
 Seven Deadly Sins   11

What I was hoping to do was create a user input like this:

var = input("Enter the Words That Can Never Appear Together in the same phrase: ")

Here the user enters words that should never appear together in a phrase. So let's say var = 'Bad Company'. The DF then becomes (after df.dropna()):

 Words1              Score
 The Man             10
 Right Hand          7
 Seven Deadly Sins   11

So I have two questions: Is there any way to actually do this? And if so, is there a way to support multiple queries, for example if someone wanted to remove any row where 'Bad' and 'Company' appear in the phrase and also any row where 'Seven' and 'Sins' appear in the phrase?

Hopefully someone can help me!

Answer 1

You can treat 'Words1' as a Series (called 'Words' in the example below) and apply a regex to it with the vectorized str.contains:

>>> df
   Score              Words
0     10            The Man
1      7         Right Hand
2      7    Bad Boy Company
3     11  Seven Deadly Sins
>>> df['Words'].str.contains('Bad')
0    False
1    False
2     True
3    False
Name: Words, dtype: bool
>>> df['Words'].str.contains('^(?=.*Bad)(?=.*Company)')
0    False
1    False
2     True
3    False
Name: Words, dtype: bool
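
For what it's worth, here is a small sketch of why the lookahead pattern matches regardless of word order, using only the standard `re` module. The sample phrases are made up for illustration and are not from the question's DataFrame.

```python
import re

# Each (?=.*word) is a lookahead: it checks that "word" occurs somewhere
# ahead of the current position without consuming any characters.
# Because of the ^ anchor, every lookahead is checked from the start of
# the string, so the order of the words in the phrase does not matter.
pattern = re.compile(r'^(?=.*Bad)(?=.*Company)')

# Hypothetical sample phrases, for illustration only.
phrases = ['Bad Boy Company', 'Company of Bad Boys', 'Bad Boy', 'Good Company']

results = {p: bool(pattern.search(p)) for p in phrases}
for phrase, hit in results.items():
    print(phrase, hit)
```

Note that this is a substring match, so 'Bad' would also hit a phrase containing 'Badminton'. Wrapping each word in `\b` word boundaries would restrict it to whole words.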

Then use those booleans to remove the rows you do not want with boolean masking:

>>> df=df[df['Words'].str.contains('^(?=.*Bad)(?=.*Company)')==False]
>>> df
   Score              Words
0     10            The Man
1      7         Right Hand
3     11  Seven Deadly Sins

[3 rows x 2 columns]
>>> df=df[df['Words'].str.contains('^(?=.*Sins)(?=.*Seven)')==False]
>>> df
   Score       Words
0     10     The Man
1      7  Right Hand

[2 rows x 2 columns]
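
A slightly more defensive version of the same masking step, as a sketch: `~` negates the boolean Series, and the `na=False` argument of `str.contains` treats missing values as non-matches, so the mask stays purely boolean. The DataFrame below is rebuilt by hand from the question's data.

```python
import pandas as pd

# Data copied from the question; column names follow the answer's example.
df = pd.DataFrame({
    'Words': ['The Man', 'Right Hand', 'Bad Boy Company', 'Seven Deadly Sins'],
    'Score': [10, 7, 7, 11],
})

# True for rows whose phrase contains both words, in any order.
mask = df['Words'].str.contains(r'^(?=.*Bad)(?=.*Company)', na=False)

# Keep only the rows that do NOT match.
filtered = df[~mask]
print(filtered)
```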

To split user input into patterns:

>>> s=input('Words: ')
Words: Seven Sins
>>> s
'Seven Sins'
>>> pattern='^'+''.join('(?=.*{})'.format(word) for word in s.split())
>>> pattern
'^(?=.*Seven)(?=.*Sins)'
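
To cover the multiple-query part of the question, one possible sketch: let the user separate word groups with commas (an assumed input convention, not something from the question), escape each word with `re.escape` so regex metacharacters in the input are taken literally, and join the groups with `|`. The helper name `build_pattern` is hypothetical.

```python
import re

def build_pattern(user_input):
    """Build one regex matching phrases that contain ALL words of ANY group.

    Groups are separated by commas, words within a group by whitespace
    (an assumed convention for this sketch).
    """
    alternatives = []
    for group in user_input.split(','):
        words = group.split()
        if not words:
            continue  # skip empty groups such as a trailing comma
        # One lookahead per word; re.escape makes the word a literal.
        alternatives.append(
            ''.join('(?=.*{})'.format(re.escape(w)) for w in words)
        )
    if not alternatives:
        return None  # nothing to filter on
    return '^(?:' + '|'.join(alternatives) + ')'

pattern = build_pattern('Bad Company, Seven Sins')
print(pattern)
```

The result could then be passed to `df['Words'].str.contains(pattern, na=False)` and negated with `~`, so all word groups are handled in a single pass over the column.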

solution1
3 ACCEPTED 2014-09-30 02:37:17
