How to find the `True` values' corresponding index and column in a large Pandas DataFrame?
I have a large DataFrame `df` whose values are mostly `False`.
About 1% of the values of `df` are `True`.
How can I display the index and column that correspond to each `True` value?
Here's the index of `df`:
df.index
DatetimeIndex(['2007-04-23', '2007-04-24', '2007-04-25', '2007-04-26',
'2007-04-27', '2007-04-30', '2007-05-02', '2007-05-03',
'2007-05-04', '2007-05-07',
...
'2021-02-24', '2021-02-25', '2021-02-26', '2021-03-02',
'2021-03-03', '2021-03-04', '2021-03-05', '2021-03-08',
'2021-03-09', '2021-03-10'],
dtype='datetime64[ns]', name='date', length=3426, freq=None)
Here are the columns of `df`:
df.columns
Index(['0015', '0050', '0051', '0052', '0053', '0054', '0055', '0056', '0057',
'0058',
...
'9944', '9945', '9946', '9949', '9950', '9951', '9955', '9958', '9960',
'9962'],
dtype='object', name='stock_id', length=1947)
And `df.shape` returns `(3426, 1947)`.
Suppose only the values of `df['1234']['2020-01-05']` and `df['4321']['2020-03-07']` are `True`.
How can I write a function whose input is `df` and whose output identifies `df['1234']['2020-01-05']` and `df['4321']['2020-03-07']`?
If you need a DataFrame, use `DataFrame.stack` to get a Series with a MultiIndex, keep only the `True` entries, and then convert the remaining MultiIndex to a DataFrame with `Index.to_frame`:
# data from @Quang Hoang's answer
s = df.stack()
df1 = s[s].index.to_frame(index=False).set_axis(['idx','cols'], axis=1)
print(df1)
    idx cols
0  2010    a
1  2011    c
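The question asks for a function, so the `stack` approach above could be wrapped roughly as follows. This is only a sketch: the function name `true_locations` and the inline test frame are assumptions made for illustration, chosen to mirror the small test data used in this thread.

```python
import pandas as pd

# Hypothetical test data mirroring the small example used in this thread.
df = pd.DataFrame(
    {"a": [True, False], "b": [False, False], "c": [False, True]},
    index=["2010", "2011"],
)


def true_locations(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per True cell, holding its index label and column label."""
    s = frame.stack()                 # Series keyed by (row label, column label)
    hits = s[s].index                 # keep only the keys whose value is True
    out = hits.to_frame(index=False)  # MultiIndex -> two-column DataFrame
    out.columns = ["idx", "cols"]     # illustrative column names
    return out


result = true_locations(df)
print(result)
```

On the DataFrame from the question, the `idx` column would hold the matching dates and `cols` the matching `stock_id` values.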
Suppose we have this:
# Test data
          a      b      c
2010   True  False  False
2011  False  False   True
You can try `np.where`:
import numpy as np
x, y = np.where(df)  # row and column positions of the True cells
indexes = df.index[x]
columns = df.columns[y]
print(indexes, columns)
Output:
Index(['2010', '2011'], dtype='object') Index(['a', 'c'], dtype='object')
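If you would rather get the matches back as (index, column) pairs from a single function, the `np.where` idea can be sketched like this. The function name `true_pairs` and the inline test frame are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical test data mirroring the small example above.
df = pd.DataFrame(
    {"a": [True, False], "b": [False, False], "c": [False, True]},
    index=["2010", "2011"],
)


def true_pairs(frame: pd.DataFrame) -> list:
    """Return a list of (index label, column label) tuples for every True cell."""
    # np.where on a 2-D boolean array yields the row and column positions
    # of the True entries, in row-major order.
    rows, cols = np.where(frame.to_numpy())
    return list(zip(frame.index[rows], frame.columns[cols]))


pairs = true_pairs(df)
print(pairs)
```

With the DataFrame from the question, the first element of each pair would be a `Timestamp` from the `DatetimeIndex` and the second a `stock_id` string.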