Pandas pivoted dataframe and multi-column Boolean comparison
I have a pivoted dataframe of the form
      Price             Units
Buyer     B     G     S     B   G   S
Idx
1         0  1.51     0     0  11   0
2      2.32  1.32     0    21  13   0
3         0     0  1.44     0   0  14
I am trying to create another top-level column called "Flag" with B, G, S sub-columns, using logic that can be thought of as (cell-by-cell)
p['Flag'] = (p['Price'] < 2.0) & (p['Units'] > 13.5)
So the desired result (showing only the new columns) is
        Flag
Buyer      B      G      S
Idx
1      False  False  False
2      False  False  False
3      False  False   True
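The cell-by-cell logic can be illustrated on two plain numeric frames that share the same B, G, S columns. This is only a sketch: the names `price` and `units` are placeholders introduced here, holding the same values as the Price and Units blocks shown above.

```python
import pandas as pd

idx = pd.Index([1, 2, 3], name="Idx")
cols = pd.Index(["B", "G", "S"], name="Buyer")

# Hypothetical stand-ins for the 'Price' and 'Units' blocks of the pivot.
price = pd.DataFrame([[0, 1.51, 0], [2.32, 1.32, 0], [0, 0, 1.44]],
                     index=idx, columns=cols)
units = pd.DataFrame([[0, 11, 0], [21, 13, 0], [0, 0, 14]],
                     index=idx, columns=cols)

# Each comparison yields a Boolean frame; & combines them element-wise
# because both frames carry identical row and column labels.
flag = (price < 2.0) & (units > 13.5)
print(flag)
```

Only the cell at Idx 3, Buyer S satisfies both conditions (price 1.44 and 14 units), which matches the desired result.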
I have tried quite a few approaches, and the following comes closer than the others:
newp = p.join(((p['Price'] < 2.0) & (p['Units'] > 13.5)).rename(columns=dict(Price='Flag')))
but this has two issues
Any ideas on fixing the Boolean conditions and merging at the correct level?
The code for generating the initial dataframe is
from collections import OrderedDict
import pandas as pd
table = OrderedDict((
    ("Idx", [1, 2, 2, 3]),
    ('Buyer', ['G', 'B', 'G', 'S']),
    ('Price', ['1.51', '2.32', '1.32', '1.44']),
    ('Units', ['11', '21', '13', '14'])
))
d = pd.DataFrame(table)
p = d.pivot(index='Idx', columns='Buyer')
p.fillna(0, inplace=True)
I think you need to convert the string numbers to float with astype, and then use concat:
p = p.astype(float)
newp = pd.concat([p['Price'], p['Units'], (p['Price'] < 2.0) & (p['Units'] > 13.5)],
                 axis=1,
                 keys=['Price', 'Units', 'Flag'])
print(newp)
      Price             Units               Flag
Buyer     B     G     S     B     G     S      B      G      S
Idx
1      0.00  1.51  0.00   0.0  11.0   0.0  False  False  False
2      2.32  1.32  0.00  21.0  13.0   0.0  False  False  False
3      0.00  0.00  1.44   0.0   0.0  14.0  False  False   True
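As a variation, the conversion can also be done on the long-format frame before pivoting, so that the pivot never holds strings. The sketch below rebuilds the question's data with a plain dict instead of OrderedDict (an assumption that relies on dicts preserving insertion order in Python 3.7+) and checks the dtype first.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

d = pd.DataFrame({
    "Idx": [1, 2, 2, 3],
    "Buyer": ["G", "B", "G", "S"],
    "Price": ["1.51", "2.32", "1.32", "1.44"],
    "Units": ["11", "21", "13", "14"],
})

# The values were entered as strings, so the column is not numeric yet.
was_numeric = is_numeric_dtype(d["Price"])
print(was_numeric)

# Convert before pivoting; Idx/Buyer pairs with no data become NaN, then 0.
d[["Price", "Units"]] = d[["Price", "Units"]].astype(float)
p = d.pivot(index="Idx", columns="Buyer").fillna(0)

flag = (p["Price"] < 2.0) & (p["Units"] > 13.5)
newp = pd.concat([p["Price"], p["Units"], flag],
                 axis=1, keys=["Price", "Units", "Flag"])
print(newp)
```

Converting early keeps the fill value 0 and the data in the same numeric dtype, instead of mixing integers into columns of strings.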
Solution with join and MultiIndex.from_product to create a new level:
p = p.astype(float)
a = (p['Price'] < 2.0) & (p['Units'] > 13.5)
a.columns = pd.MultiIndex.from_product([['Flag'],a.columns])
p = p.join(a)
print(p)
      Price             Units               Flag
Buyer     B     G     S     B     G     S      B      G      S
Idx
1      0.00  1.51  0.00   0.0  11.0   0.0  False  False  False
2      2.32  1.32  0.00  21.0  13.0   0.0  False  False  False
3      0.00  0.00  1.44   0.0   0.0  14.0  False  False   True
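Another way to add the outer level, sketched here as an alternative rather than as part of the answer above, is to wrap the Boolean frame in a one-entry dict and pass it to pd.concat with axis=1; the dict key then becomes the new top-level label. The frame `p` is rebuilt from numeric values so that the block runs on its own.

```python
import pandas as pd

d = pd.DataFrame({
    "Idx": [1, 2, 2, 3],
    "Buyer": ["G", "B", "G", "S"],
    "Price": [1.51, 2.32, 1.32, 1.44],
    "Units": [11.0, 21.0, 13.0, 14.0],
})
p = d.pivot(index="Idx", columns="Buyer").fillna(0)

a = (p["Price"] < 2.0) & (p["Units"] > 13.5)

# The dict key 'Flag' becomes the outer column level of the result.
flag = pd.concat({"Flag": a}, axis=1)

# Both frames now have two column levels, so join lines them up cleanly.
result = p.join(flag)
print(result.columns.tolist())
```

This avoids assigning to `a.columns` in place, which may be preferable when `a` is reused elsewhere.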
Use double brackets on 'Price' to preserve the multi-index, and logically combine it with 'Units' after the first level of the multi-index has been removed from 'Units'. This way, the level that is left naturally combines with the 2nd level of the multi-index from 'Price'.
Enough talk. Observe:
p[['Price']].lt(2) & p.Units.gt(13.5)
       Price
Buyer      B      G      S
Idx
1      False  False  False
2      False  False  False
3      False  False   True
Now all that's left is to rename 'Price' and join:
p.join(
    (
        p[['Price']].lt(2) & p.Units.gt(13.5)
    ).rename(columns=dict(Price='Flag'))
)
      Price             Units               Flag
Buyer     B     G     S     B     G     S      B      G      S
Idx
1      0.00  1.51  0.00   0.0  11.0   0.0  False  False  False
2      2.32  1.32  0.00  21.0  13.0   0.0  False  False  False
3      0.00  0.00  1.44   0.0   0.0  14.0  False  False   True
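The difference between single and double brackets can be seen by inspecting the columns directly. The following sketch rebuilds `p` from numeric values (so that it runs standalone; the names `single`, `double` and `renamed` are introduced here for illustration) and compares the two selections. It also shows that rename with a dict relabels the top level.

```python
import pandas as pd

d = pd.DataFrame({
    "Idx": [1, 2, 2, 3],
    "Buyer": ["G", "B", "G", "S"],
    "Price": [1.51, 2.32, 1.32, 1.44],
    "Units": [11.0, 21.0, 13.0, 14.0],
})
p = d.pivot(index="Idx", columns="Buyer").fillna(0)

single = p["Price"]    # top level dropped: columns are B, G, S
double = p[["Price"]]  # both levels kept: ('Price', 'B'), ...

print(single.columns.nlevels, double.columns.nlevels)

# rename with a mapping relabels matching entries in the column labels,
# turning the top-level 'Price' into 'Flag'.
renamed = double.lt(2).rename(columns={"Price": "Flag"})
print(renamed.columns.tolist())
```

With a single bracket there is no 'Price' label left in the columns to rename, which is why the double-bracket selection is needed before calling rename.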