I have a pandas DataFrame with many columns, indexed by probability. The code below generates a sample df:
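import numpy as N
from pandas import DataFrame
probs = N.arange(0, 1, .1)
# counts from 0 to 500 inclusive; random_integers is deprecated, and randint excludes its upper bound
data = N.random.randint(0, 501, (10, 3))
df = DataFrame(data, index=probs, columns=['col1', 'col2', 'col3'])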
I want to get the column headings where the counts in the cells are above some threshold for specific probabilities. For example, if I care about probabilities >= .75 and cell values of at least 100, I can do the following:
df[df.index >= .75] >= 100
But based on that indexing, how do I get the column headings where at least one entry is True? (i.e. 'col1' qualifies if at least one of its values, among the rows with probability >= .75, is at least 100; not necessarily all of them)
You can pass a boolean vector to the columns axis of .loc. For example, if you want the columns where at least one value among the high-probability rows is 100 or more, your mask would be:
In [111]: mask = (df[df.index >= .75] >= 100).any()
Then you can pass this to .loc to filter the columns:
In [112]: df.loc[:, mask]
Out[112]:
     col1  col2  col3
0.0   358    30   241
0.1   330    71   119
0.2   311    92   204
0.3   347   245   344
0.4   214   219   347
0.5   152   241    65
0.6   232   487    61
0.7   478   314   196
0.8   477   317   291
0.9   303    99   342
If you just want the column headings, you can apply the mask to itself.
In [119]: mask[mask].index
Out[119]: Index([u'col1', u'col2', u'col3'], dtype='object')
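In the random sample above every column happens to pass, so the filter is hard to see. Here is a minimal, self-contained sketch of the same approach using hand-picked counts (my own illustrative data, not the asker's) where only one column qualifies. The names rows and selected are just for illustration.

```python
import numpy as np
import pandas as pd

# Hand-picked counts (an assumption for illustration, not the random data above),
# chosen so that exactly one column passes the filter.
probs = np.round(np.arange(0, 1, 0.1), 1)          # 0.0, 0.1, ..., 0.9
data = np.column_stack([
    np.full(10, 50),                               # col1: never reaches 100
    np.r_[np.zeros(9), 250],                       # col2: only the 0.9 row reaches 100
    np.r_[np.full(7, 300), np.zeros(3)],           # col3: large counts, but only below 0.75
]).astype(int)
df = pd.DataFrame(data, index=probs, columns=["col1", "col2", "col3"])

rows = df.index >= 0.75                            # boolean array selecting the 0.8 and 0.9 rows
mask = (df.loc[rows] >= 100).any()                 # per column: is any selected value >= 100?

selected = mask[mask].index.tolist()               # keep only the True entries of the mask
print(selected)                                    # ['col2']
```

Swapping .any() for .all() would instead require every selected row in a column to meet the threshold.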