
Deleting DataFrame column in Pandas based on value

I have a dataframe something like this:

    Col0    Col1    Col2    Col3
1   a       b       g       a
2   a       d       z       a
3   a       g       x       a
4   a       h       p       a
5   a       b       c       a

I need to remove the columns where the value is 'a'. No other cells contain the value 'a' (e.g. here Col1 and Col2 have no cells with the value 'a'). I have around 1000 columns and I'm not really sure which columns contain the value 'a'. The required dataframe should look something like this:

    Col1    Col2
1   b       g   
2   d       z    
3   g       x    
4   h       p    
5   b       c    

What's the best way to do this?
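
For reference, the example frame can be reconstructed like this (a minimal sketch built from the table above; the variable name df matches the snippets below):

import pandas as pd

# rebuild the example frame shown above (index 1..5 as in the question)
df = pd.DataFrame(
    {'Col0': ['a', 'a', 'a', 'a', 'a'],
     'Col1': ['b', 'd', 'g', 'h', 'b'],
     'Col2': ['g', 'z', 'x', 'p', 'c'],
     'Col3': ['a', 'a', 'a', 'a', 'a']},
    index=[1, 2, 3, 4, 5])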

Use any if you need to check whether at least one value is True, or all if you need to check that all values are True, together with boolean indexing and loc for filtering columns:

print (df)
  Col0 Col1 Col2 Col3
0    a    a    g    a
1    a    d    z    a
2    a    g    x    a
3    a    h    p    a
4    a    b    c    a


df2 = df.loc[:, ~(df == 'a').any()]   # drop columns that contain at least one 'a'
print (df2)
  Col2
0    g
1    z
2    x
3    p
4    c

df1 = df.loc[:, ~(df == 'a').all()]   # drop columns where every value is 'a'
print (df1)
  Col1 Col2
0    a    g
1    d    z
2    g    x
3    h    p
4    b    c

Detail:

print (df == 'a')

   Col0   Col1   Col2  Col3
0  True   True  False  True
1  True  False  False  True
2  True  False  False  True
3  True  False  False  True
4  True  False  False  True

df2 = df.loc[:, (df != 'a').any()]   # keep columns with at least one value different from 'a'
print (df2)
  Col1 Col2
0    a    g
1    d    z
2    g    x
3    h    p
4    b    c

df1 = df.loc[:, (df != 'a').all()]   # keep columns where no value is 'a'
print (df1)
  Col2
0    g
1    z
2    x
3    p
4    c

print (df != 'a')

    Col0   Col1  Col2   Col3
0  False  False  True  False
1  False   True  True  False
2  False   True  True  False
3  False   True  True  False
4  False   True  True  False

EDIT:

To check mixed types (numeric values together with strings), there are two possible solutions: convert everything to strings, or compare the underlying numpy array:

df.astype(str) == 'a'

Or:

df.values == 'a'
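
For example, with a hypothetical frame that mixes a numeric column and a string column, casting to string keeps the comparison uniform across dtypes (on some pandas versions the plain comparison may even raise a TypeError):

dfm = pd.DataFrame({'Col0': [1, 2, 3], 'Col1': ['a', 'b', 'a']})

mask = dfm.astype(str) == 'a'       # numeric values become '1', '2', ... so they never match 'a'
print(dfm.loc[:, ~mask.any()])
#    Col0
# 0     1
# 1     2
# 2     3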

Option 1
Using pd.DataFrame.dropna with pd.DataFrame.mask
The concept is that I replace 'a' with np.nan and then conveniently use dropna.

This drops a column even if it has only one 'a':

df.mask(df.astype(str).eq('a')).dropna(axis=1)

  Col1 Col2
1    b    g
2    d    z
3    g    x
4    h    p
5    b    c

This only drops a column when all of its elements are 'a':

df.mask(df.astype(str).eq('a')).dropna(axis=1, how='all')

  Col1 Col2
1    b    g
2    d    z
3    g    x
4    h    p
5    b    c

Option 2
Creative way using np.where to find the unique column positions that have 'a'
This is cool because np.where will return a tuple of arrays that give the positions of all True values in an array. The second array of the tuple will be all the column positions. I grab a unique set of those and find the other column names.

df[df.columns.difference(
    df.columns[np.unique(np.where(df.astype(str).eq('a'))[1])]
)]

  Col1 Col2
1    b    g
2    d    z
3    g    x
4    h    p
5    b    c
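
For illustration, the intermediate pieces look like this on the example frame (a sketch; the output is written out from the frame above):

import numpy as np

mask = df.astype(str).eq('a')          # True where a cell equals 'a'
rows, cols = np.where(mask)            # row and column positions of every True value
print(np.unique(cols))                 # [0 3] -> positions of Col0 and Col3
print(df.columns[np.unique(cols)])     # Index(['Col0', 'Col3'], dtype='object')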

Or similarly with pd.DataFrame.drop

df.drop(df.columns[np.unique(np.where(df.astype(str).eq('a'))[1])], axis=1)

  Col1 Col2
1    b    g
2    d    z
3    g    x
4    h    p
5    b    c

Option 3
Probably a bad way of doing it, because sum() concatenates each column's strings and str.contains('a') then matches the letter 'a' anywhere in that concatenation, not only cells that are exactly 'a'.

df.loc[:, ~df.astype(str).sum().str.contains('a')]

  Col1 Col2
1    b    g
2    d    z
3    g    x
4    h    p
5    b    c
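
For example, a column whose values merely contain the letter 'a' would be dropped even though no cell equals 'a' (a hypothetical frame):

demo = pd.DataFrame({'Col1': ['apple', 'pear'], 'Col2': ['x', 'y']})

print(demo.loc[:, ~demo.astype(str).sum().str.contains('a')])
#   Col2
# 0    x
# 1    y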
