Exclude certain values when checking for a condition with pandas

Question

I have a following dataframe.(Actual columns in the dataframe are more than 30)

ID  col1    col2    col3   col4
 1  a##      b##    HO      HO
 2  c##      d23   c##      HO
 3  a##      k##    z##     s#
 4  c##      b12    a12     c12
 5   b##     HO     z##      HO

I want to make a new dataframe filtering rows that have all the strings ending with '##' and if one of the cells in the row contains 'HO',I want to skip it and return a row if the rest of the columns contain string ending with ##

In the above case new dataframe would be like this:

ID  col1    col2    col3  col4
1   a##      b##    HO    HO
3   a##      k##    z##   s#
5   b##      HO     z##   HO

currently I am doing

m = np.logical_and.reduce([
[x.endswith('##')  for x in df[c] ] for c in df.columns if x!='HO'] )

But df[m] is only giving one following row and not the other two

3   a##      k##    z##

How can I fix this?

Answer 1

You can combine Boolean filters as below. Note are only 2 valid scenarios: either you have 2 strings ending in '##' and one equal to 'HO' , or you have all 3 strings ending in '##' .

suffix = df.iloc[:, 1:].apply(lambda s: s.str[-2:].eq('##')).sum(1)
value = df.iloc[:, 1:].eq('HO').sum(1)

res = df[(suffix.eq(2) & value.eq(1)) | suffix.eq(3)]

print(res)

   ID col1 col2 col3
0   1  a##  b##   HO
2   3  a##  k##  z##
4   5  b##   HO  z##

Answer 2

You can replace "HO" with a dummy string '##', store this result as a temporary DataFrame, compute the mask on the temp, and use that to index back into df .

# df = df.set_index('ID')  # Run this if not already done.
df

  col1 col2 col3
ID               
1   a##  b##   HO
2   c##  d23  c##
3   a##  k##  z##
4   c##  b12  a12
5   b##   HO  z##

u = df.replace('HO', '##')
m = np.logical_and.reduce([
   [x.endswith('##')  for x in u[c]] for c in u.columns]
)
df[m]

   col1 col2 col3
ID               
1   a##  b##   HO
3   a##  k##  z##
5   b##   HO  z##

Answer 3

You can try this:

import pandas as pd
import numpy as np

data = {'col1':['a##', 'c##', 'a##', 'c##', 'b##'], 'col2':['b##', 'd23', 'k##', 'b12', 'HO'], 'col3': ['HO', 'c##', 'z##', 'a12', 'z##']}

df = pd.DataFrame(data = data)

m = np.logical_and.reduce([[('##' in x) | (x=='HO') for x in df[c] ]for c in df.columns  ])

print(df, '\n\n',df[m])


 col1 col2 col3
0  a##  b##   HO
1  c##  d23  c##
2  a##  k##  z##
3  c##  b12  a12
4  b##   HO  z## 

   col1 col2 col3
0  a##  b##   HO
2  a##  k##  z##
4  b##   HO  z##

Exclude certain values when checking for a condition with pandas

Question

3 answers

solution1
1 2019-01-18 00:58:24

solution2
1 ACCPTED 2019-01-18 01:12:00

solution3
0 2019-01-18 01:54:06

Exclude certain values when checking for a condition with pandas

Question

3 answers

solution1 1 2019-01-18 00:58:24

solution2 1 ACCPTED 2019-01-18 01:12:00

solution3 0 2019-01-18 01:54:06

solution1
1 2019-01-18 00:58:24

solution2
1 ACCPTED 2019-01-18 01:12:00

solution3
0 2019-01-18 01:54:06