
Drop column if all values equal a string value

Say I have a DataFrame like this, df:

Date      Time Black Carbon Carbon monoxide  PM10                    Particulate matter
0  19/10/2015  01:00:00      No data         No data                 No data   
1  19/10/2015  02:00:00      No data         No data                 No data   
2  19/10/2015  03:00:00      10              No data                 No data   
3  19/10/2015  04:00:00      No data         11 .                    No data   
4  19/10/2015  05:00:00      No data         No data                 No data 

I can remove columns that are entirely NaN via:

tmp_df = df.dropna(axis=1, how='all')

However, I want to drop a column only if every row in it contains the string No data.

In this case, we would remove the Particulate matter column.

You want to keep the columns in which not all values equal No data.

df.loc[:, ~(df.astype(str) == 'No data').all()]

Output

                  Date Time Black Carbon Carbon monoxide     PM10
0 19/10/2015  01:00:00                           No data  No data
1 19/10/2015  02:00:00                           No data  No data
2 19/10/2015  03:00:00                                10  No data
3 19/10/2015  04:00:00                           No data     11 .
4 19/10/2015  05:00:00                           No data  No data
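
For a version that can be run as-is, here is a minimal sketch of the same approach. The frame below is a made-up stand-in for the question's data (the column selection and values are assumptions for illustration), not the asker's actual file.

```python
import pandas as pd

# Hypothetical stand-in for the question's data; values are illustrative only.
df = pd.DataFrame({
    "Date": ["19/10/2015"] * 3,
    "Time": ["01:00:00", "02:00:00", "03:00:00"],
    "Black Carbon": ["No data", "No data", 10],
    "Particulate matter": ["No data"] * 3,
})

# True for each column whose every value, rendered as a string, is 'No data'.
all_no_data = (df.astype(str) == "No data").all()

# Keep only the columns that are not entirely 'No data'.
result = df.loc[:, ~all_no_data]
print(result.columns.tolist())
```

Splitting the mask into its own variable makes it easy to inspect which columns would be dropped before actually dropping them.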

Alternatively, you can do:

df.loc[:, ~df.apply(lambda x: x.nunique() == 1 and x.iloc[0] == 'No data', axis=0)]

That checks i) whether the column holds only one unique value, using nunique, and ii) whether the first element of the column equals your string.

Demonstration:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(3, 3), columns=list('abc'))
df1['d'] = 'No data'
df1['e'] = ['No data', 0, 'No data']

          a         b         c        d        e
0 -0.441122  3.499830 -0.161578  No data  No data
1  1.683904  0.217083 -1.167220  No data        0
2 -1.143193 -0.386444 -0.403479  No data  No data

Then

df1.loc[:, ~df1.apply(lambda x: x.nunique() == 1 and x.iloc[0] == 'No data', axis=0)]

returns

          a         b         c        e
0 -0.441122  3.499830 -0.161578  No data
1  1.683904  0.217083 -1.167220        0
2 -1.143193 -0.386444 -0.403479  No data
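
One caveat with the nunique check: by default it ignores missing values, so a column that mixes No data with NaN still counts as having a single unique value. The sketch below illustrates this on a hypothetical Series (not taken from the question) and shows a stricter alternative using eq.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing the placeholder string with a missing value.
s = pd.Series(["No data", np.nan, "No data"], dtype=object)

# nunique() drops NaN by default, so only 'No data' is counted here.
n_default = s.nunique()

# With dropna=False, NaN is counted as its own value.
n_with_nan = s.nunique(dropna=False)

# Stricter check: every element must literally equal 'No data'.
strict = s.eq("No data").all()

print(n_default, n_with_nan, strict)
```

Whether a column of No data plus NaN should be dropped depends on the data, so either behaviour may be the one you want.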

EDIT:

As an alternative to @Ted Petrou's answer:

df1.loc[:, ~(df1.values == 'No data').all(axis=0)]

I don't know, however, whether it is more efficient to convert all values to strings, as in his answer, or to just use .values.
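
One way to settle the efficiency question on your own data is to time both variants. The following is only a rough sketch: the frame size, the helper names via_astype and via_values, and the repeat count are arbitrary assumptions, and the timings will vary with the machine, the pandas version and the shape of the data.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test frame: numeric columns plus two placeholder columns.
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.standard_normal((1000, 3)), columns=list("abc"))
df1["d"] = "No data"
df1["e"] = ["No data", 0] * 500


def via_astype(frame):
    # Convert every value to a string, then compare.
    return frame.loc[:, ~(frame.astype(str) == "No data").all()]


def via_values(frame):
    # Compare the underlying NumPy array directly.
    return frame.loc[:, ~(frame.values == "No data").all(axis=0)]


# Check that both variants keep the same columns on this frame.
same = via_astype(df1).columns.equals(via_values(df1).columns)

t_astype = timeit.timeit(lambda: via_astype(df1), number=20)
t_values = timeit.timeit(lambda: via_values(df1), number=20)
print(f"same columns: {same}")
print(f"astype(str): {t_astype:.4f}s  .values: {t_values:.4f}s")
```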
