
Conditionally grabbing column headings in pandas dataframe

I have a pandas DataFrame with many columns, indexed by probability. The code below generates a sample df:

import numpy as N
from pandas import DataFrame

probs = N.arange(0, 1, .1)
# randint excludes its upper bound, so 501 keeps 500 as a possible value
data = N.random.randint(0, 501, (10, 3))
df = DataFrame(data, index=probs, columns=['col1', 'col2', 'col3'])

I want to grab the column headings where the counts in the cells reach some threshold for specific probabilities. For example, if I care about probabilities >= .75 and cell values of at least 100, I can do the following:

df[df.index >= .75] >= 100

But based on that indexing, how do I get the column headings where at least one entry is True? (That is, 'col1' qualifies if at least one of its values at a probability of .75 or higher is at least 100; not all of them need to be.)

You can pass a boolean vector to the columns axis of .loc. For example, if you want the columns where at least one value in the selected rows is 100 or more, your mask would be:

In [111]: mask = (df[df.index >= .75] >= 100).any()

Then you can pass this to .loc to filter.

In [112]: df.loc[:, mask]
Out[112]: 
     col1  col2  col3
0.0   358    30   241
0.1   330    71   119
0.2   311    92   204
0.3   347   245   344
0.4   214   219   347
0.5   152   241    65
0.6   232   487    61
0.7   478   314   196
0.8   477   317   291
0.9   303    99   342
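
In the output above every column happens to pass, so the filtering is hard to see. The following is a minimal sketch of the same approach on a small hand-written frame (the values are hypothetical, chosen so that only some columns qualify):

```python
import pandas as pd

# Hypothetical sample data: only col1 and col3 reach 100 at probabilities >= .75
df = pd.DataFrame(
    {
        "col1": [10, 20, 150, 30],
        "col2": [500, 400, 50, 60],
        "col3": [5, 5, 99, 100],
    },
    index=[0.5, 0.7, 0.8, 0.9],
)

rows = df.index >= 0.75          # boolean array selecting rows 0.8 and 0.9
mask = (df[rows] >= 100).any()   # per column: does any selected row reach 100?
filtered = df.loc[:, mask]       # keep only the columns where the mask is True

print(mask)
print(filtered)
```

Note that col2 is dropped even though it holds large values, because those values sit at probabilities below .75.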

If you just want the column headings, you can index the mask with itself, which keeps only the True entries, and then take its index.

In [119]: mask[mask].index
Out[119]: Index([u'col1', u'col2', u'col3'], dtype='object')
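
If a plain Python list of names is more convenient than an Index, something like the following should work. This is a sketch on hypothetical data, and the variable names (names_from_mask, names_from_columns) are illustrative only:

```python
import pandas as pd

# Hypothetical sample data: only col1 and col3 reach 100 at probabilities >= .75
df = pd.DataFrame(
    {
        "col1": [10, 20, 150, 30],
        "col2": [500, 400, 50, 60],
        "col3": [5, 5, 99, 100],
    },
    index=[0.5, 0.7, 0.8, 0.9],
)

mask = (df[df.index >= 0.75] >= 100).any()

# Option 1: index the mask with itself, then convert its index to a list
names_from_mask = mask[mask].index.tolist()

# Option 2: index the column labels with the mask's underlying boolean array
names_from_columns = df.columns[mask.to_numpy()].tolist()

print(names_from_mask)
print(names_from_columns)
```

Both options rely on the mask having one entry per column, in the same order as df.columns, which is what .any() returns here.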


 