Get the column names for each index (row) where the column value satisfies some condition in pandas
I have the following:
>>> import pandas as pd
>>> x = pd.DataFrame({'a':[1,3,5], 'b':[4,0,6]})
>>> x
   a  b
0  1  4
1  3  0
2  5  6
>>> # Desired result (how can I get this from x?):
>>> required = {0: ['b'], 1: ['a'], 2: ['a', 'b']}
>>> # keys   -> index of x
>>> # values -> list of column names whose value is > 2
How can we do this efficiently?
Here's a one-liner using the apply and to_dict methods.
In [162]: (x > 2).apply(lambda y: x.columns[y.tolist()].tolist(), axis=1).to_dict()
Out[162]: {0: ['b'], 1: ['a'], 2: ['a', 'b']}
Details:
In [173]: (x > 2)
Out[173]:
       a      b
0  False   True
1   True  False
2   True   True
In [174]: (x > 2).apply(lambda y: [y.tolist()], axis=1)
Out[174]:
0    [[False, True]]
1    [[True, False]]
2     [[True, True]]
dtype: object
In [175]: (x > 2).apply(lambda y: x.columns[y.tolist()].tolist(), axis=1)
Out[175]:
0       [b]
1       [a]
2    [a, b]
dtype: object
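To see why the lambda works, it can help to look at a single row in isolation. The following is a minimal sketch that reuses the x from the question; the names mask, row_mask and picked are illustrative and not part of the original answer. A list of booleans used as an indexer on x.columns keeps only the labels at the True positions.

```python
import pandas as pd

x = pd.DataFrame({'a': [1, 3, 5], 'b': [4, 0, 6]})
mask = x > 2

# Boolean values for row 0 only: 'a' is 1 (not > 2), 'b' is 4 (> 2).
row_mask = mask.loc[0].tolist()

# Indexing the column labels with a boolean list keeps the True positions.
picked = x.columns[row_mask].tolist()

print(row_mask)
print(picked)
```

The apply call with axis=1 repeats this step for every row, and to_dict turns the resulting Series into a dict keyed by the index.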
Here's another one-liner.
In [205]: {i: x.columns[y.tolist()].tolist() for i, y in (x > 2).iterrows()}
Out[205]: {0: ['b'], 1: ['a'], 2: ['a', 'b']}
Or:
In [122]: {i: y[y].index.tolist() for i, y in (x > 2).iterrows()}
Out[122]: {0: ['b'], 1: ['a'], 2: ['a', 'b']}
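If the same lookup is needed for different conditions, the comprehension can be wrapped in a small helper. This is only a sketch: the function name columns_where is hypothetical, and it assumes the argument is a DataFrame holding nothing but booleans.

```python
import pandas as pd


def columns_where(mask):
    """Map each row label of a boolean DataFrame to its True column names."""
    # row[row] keeps only the entries of the boolean Series that are True;
    # the remaining index labels are the matching column names.
    return {label: row[row].index.tolist() for label, row in mask.iterrows()}


x = pd.DataFrame({'a': [1, 3, 5], 'b': [4, 0, 6]})
required = columns_where(x > 2)
print(required)
```

Any elementwise comparison that yields a boolean frame, such as x == 0 or x.isna(), could be passed in the same way.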
Here are two efficient ideas. Both give the matching column names per row, though not yet as the final dict:
pd.DataFrame(x.columns.where(x > 2, ''))
Out:
        0
0   (, b)
1   (a, )
2  (a, b)
import numpy as np
np.where(x > 2, x.columns, '').T
Out:
array([['', 'a', 'a'],
       ['b', '', 'b']], dtype=object)
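The np.where result still contains empty strings as placeholders. One possible way to turn it into the requested dict is sketched below; it is not part of the original answer, it skips the transpose so that each array row lines up with a DataFrame row, and it assumes that no real column is named ''.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({'a': [1, 3, 5], 'b': [4, 0, 6]})

# Column names as an object array so they broadcast against the 3x2 mask.
col_names = np.array(x.columns, dtype=object)

# Each cell holds its column name where the value is > 2, otherwise ''.
names = np.where((x > 2).to_numpy(), col_names, '')

# Drop the '' placeholders row by row and key the result by the index.
result = {label: [n for n in row if n] for label, row in zip(x.index, names)}
print(result)
```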
I don't know about its efficiency, but this works:
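from collections import defaultdict
import pandas as pd

df = pd.DataFrame({'a':[1,3,5], 'b':[4,0,6]})
a = defaultdict(list)
for b, c in df.iterrows():      # b: row label, c: the row as a Series
    for d in c.items():         # d: (column name, value); iteritems() was removed in pandas 2.0
        if d[1] > 2:
            a[b].append(d[0])
print(dict(a))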
Output:
{0: ['b'], 1: ['a'], 2: ['a', 'b']}
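The nested loop above inspects every cell. As an alternative sketch, which is not one of the original answers, np.nonzero can locate only the cells that satisfy the condition, so the Python-level loop runs once per match. The variable names are illustrative. Unlike the defaultdict version, this one also keeps rows with no match, mapped to an empty list.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({'a': [1, 3, 5], 'b': [4, 0, 6]})

# Positions (row, column) of all True cells, returned in row-major order.
rows, cols = np.nonzero((x > 2).to_numpy())

# Start with an empty list per row so that rows without matches still appear.
result = {label: [] for label in x.index}
for r, c in zip(rows, cols):
    result[x.index[r]].append(x.columns[c])

print(result)
```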