Drop column if all values equal a string value

Question

Say I have a dataframe, df, like this:

         Date      Time  Black Carbon  Carbon monoxide  PM10 Particulate matter
0  19/10/2015  01:00:00       No data          No data                  No data
1  19/10/2015  02:00:00       No data          No data                  No data
2  19/10/2015  03:00:00            10          No data                  No data
3  19/10/2015  04:00:00       No data             11 .                  No data
4  19/10/2015  05:00:00       No data          No data                  No data

I can remove the columns that contain only NaN values via:

tmp_df = df.dropna(axis=1, how='all')

However, I want to drop a column only if every row in it contains the string No data.

In this case, that would remove the Particulate matter column.

Answer 1

You want to keep the columns in which not all values equal No data.

df.loc[:, ~(df.astype(str) == 'No data').all()]

Output

                  Date Time Black Carbon Carbon monoxide     PM10
0 19/10/2015  01:00:00                           No data  No data
1 19/10/2015  02:00:00                           No data  No data
2 19/10/2015  03:00:00                                10  No data
3 19/10/2015  04:00:00                           No data     11 .
4 19/10/2015  05:00:00                           No data  No data
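
To make the steps in that one-liner easier to follow, here is a minimal sketch that splits it into named intermediate results. The frame below is a hypothetical, shortened stand-in for the question's data (three rows, illustrative values), and the names is_placeholder, all_placeholder and result are mine, not part of the answer.

```python
import pandas as pd

# Hypothetical frame modelled on the question; values are illustrative only.
df = pd.DataFrame({
    "Date": ["19/10/2015"] * 3,
    "Black Carbon": ["No data", "No data", 10],
    "Carbon monoxide": ["No data", 11, "No data"],
    "Particulate matter": ["No data"] * 3,
})

# True where a cell, viewed as a string, equals the placeholder.
is_placeholder = df.astype(str) == "No data"

# .all() reduces down each column: True only if every row matched.
all_placeholder = is_placeholder.all()

# Keep the columns that are not entirely placeholder.
result = df.loc[:, ~all_placeholder]
print(list(result.columns))
```

Casting with astype(str) means numeric cells are compared as text, so columns of any dtype can go through the same comparison.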

Answer 2

Alternatively, you can do:

df.loc[:, ~df.apply(lambda x: x.nunique() == 1 and x.iloc[0] == 'No data', axis=0)]

This checks i) whether the column contains only one unique value, using nunique, and ii) whether the first element of the column is equal to your string.

Demonstration:

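import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(3, 3), columns=list('abc'))
df1['d'] = 'No data'                   # every row is the placeholder
df1['e'] = ['No data', 0, 'No data']   # placeholder mixed with a real value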

          a         b         c        d        e
0 -0.441122  3.499830 -0.161578  No data  No data
1  1.683904  0.217083 -1.167220  No data        0
2 -1.143193 -0.386444 -0.403479  No data  No data

Then

df1.loc[:, ~df1.apply(lambda x: x.nunique() == 1 and x.iloc[0] == 'No data', axis=0)]

returns

          a         b         c        e
0 -0.441122  3.499830 -0.161578  No data
1  1.683904  0.217083 -1.167220        0
2 -1.143193 -0.386444 -0.403479  No data
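
Two details of this check are worth a sketch. Looking up the first element by position (.iloc[0]) rather than by label keeps it working when the index has no label 0, for instance after rows have been filtered out. Also, nunique skips NaN by default, so a column mixing NaN with No data would count as having a single unique value; passing dropna=False avoids that. The helper name only_placeholder and the row-sliced frame subset are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(3, 3), columns=list("abc"))
df1["d"] = "No data"
df1["e"] = ["No data", 0, "No data"]

# Drop the first row so the index is [1, 2] and has no label 0.
subset = df1.iloc[1:]

def only_placeholder(col, placeholder="No data"):
    """Return True if every value in the column is the placeholder."""
    # dropna=False counts NaN as a value of its own, so a column that
    # mixes NaN with the placeholder is not treated as constant.
    return col.nunique(dropna=False) == 1 and col.iloc[0] == placeholder

mask = subset.apply(only_placeholder, axis=0)
result = subset.loc[:, ~mask]
print(list(result.columns))
```

Whether NaN should count as "no data" depends on the dataset; if it should, keeping the default nunique() is the simpler choice.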

EDIT:

As an alternative to @Ted Petrou's answer:

df1.loc[:, ~(df1.values == 'No data').all(axis=0)]

I don't know, however, whether it is more efficient to convert all values to strings, as in his answer, or to compare the underlying array from .values directly.
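
One way to settle the efficiency question for a particular dataset is to time both expressions. The following is a rough sketch rather than a benchmark result: the frame size, the column names and the helper names via_astype and via_values are all hypothetical, and the timings will vary with the data and the pandas/NumPy versions in use.

```python
import timeit

import pandas as pd

# Hypothetical test frame: mixed string/int columns plus one column
# that holds only the placeholder.
n_rows = 10_000
df = pd.DataFrame({f"c{i}": ["No data", i] * (n_rows // 2) for i in range(5)})
df["empty"] = "No data"

def via_astype(frame):
    # Cast every cell to str, then compare (the first answer's approach).
    return (frame.astype(str) == "No data").all()

def via_values(frame):
    # Compare the underlying array directly, reducing over rows.
    return (frame.values == "No data").all(axis=0)

# Sanity check: both approaches flag the same columns on this frame.
same = via_astype(df).to_numpy().tolist() == via_values(df).tolist()

for fn in (via_astype, via_values):
    seconds = timeit.timeit(lambda: fn(df), number=10)
    print(f"{fn.__name__}: {seconds / 10:.6f} s per call")
```

The .values comparison here operates on an object-dtype array, which is what a frame with mixed column types produces. For a frame that is entirely numeric, how NumPy compares the array against a string is something I would check on the installed version before relying on it.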
