Pandas pivoted dataframe and multi-column Boolean comparison

I have a pivoted dataframe of the form

      Price             Units  
Buyer     B     G     S     B   G   S
Idx                                  
1         0  1.51     0     0  11   0
2      2.32  1.32     0    21  13   0
3         0     0  1.44     0   0  14

I am trying to create another major column called "Flag", with B, G, S sub-columns, using logic that can be thought of as (cell by cell)

p['Flag'] = (p['Price'] < 2.0) & (p['Units'] > 13.5)

So the desired result (showing only the new columns) is

       Flag
Buyer     B     G     S     
Idx                                  
1     False False False
2     False False False
3     False False  True

I have tried quite a few approaches, and the following comes closer than the others

newp = p.join(((p['Price'] < 2.0) & (p['Units'] > 13.5)).rename(columns=dict(Price='Flag')))

but this has two issues

  1. The Boolean output is incorrect for the bottom-right corner. It should be True, since the corresponding price is less than 2.0 and the corresponding units value is more than 13.5.
  2. It gives the warning "UserWarning: merging between different levels can give an unintended result (2 levels on the left, 1 on the right)". I can't seem to get the major column name "Flag" into the dataframe.

Any ideas on fixing the Boolean conditions and merging at the correct level?

The code for generating the initial dataframe is

from collections import OrderedDict
import pandas as pd

table = OrderedDict((
    ("Idx", [1, 2, 2, 3]),
    ('Buyer',['G', 'B', 'G', 'S']),
    ('Price',  ['1.51', '2.32', '1.32', '1.44']),
    ('Units',   ['11', '21', '13', '14'])
))
d = pd.DataFrame(table)
p = d.pivot(index='Idx', columns='Buyer')
p.fillna(0, inplace=True)

I think you need to convert the string numbers to float with astype and then use concat:

p = p.astype(float)

newp = pd.concat([p['Price'], p['Units'], (p['Price'] < 2.0) & (p['Units'] > 13.5)], 
                 axis=1, 
                 keys=['Price','Units','Flag'])
print(newp)

      Price             Units               Flag              
Buyer     B     G     S     B     G     S      B      G      S
Idx                                                           
1      0.00  1.51  0.00   0.0  11.0   0.0  False  False  False
2      2.32  1.32  0.00  21.0  13.0   0.0  False  False  False
3      0.00  0.00  1.44   0.0   0.0  14.0  False  False   True
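
For anyone who wants to run this end to end, here is a minimal sketch of the same idea built from the question's sample data. It assumes a reasonably recent pandas; the plain dict in place of OrderedDict and the order of the conversion (astype(float) before fillna(0)) are my own choices, not part of the answer above.

```python
import pandas as pd

# Same sample data as the question: Price and Units arrive as strings.
d = pd.DataFrame({
    "Idx": [1, 2, 2, 3],
    "Buyer": ["G", "B", "G", "S"],
    "Price": ["1.51", "2.32", "1.32", "1.44"],
    "Units": ["11", "21", "13", "14"],
})

# Convert to float first, then fill the gaps left by the pivot with 0.
p = d.pivot(index="Idx", columns="Buyer").astype(float).fillna(0)

# Cell-by-cell comparison of two frames that share the B/G/S columns.
flag = (p["Price"] < 2.0) & (p["Units"] > 13.5)

# keys= supplies the top column level for each of the three blocks.
newp = pd.concat(
    [p["Price"], p["Units"], flag],
    axis=1,
    keys=["Price", "Units", "Flag"],
)
print(newp["Flag"])
```

Once the values are numeric, the comparison is evaluated on numbers rather than strings, which is what the bottom-right cell needs in order to come out True.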

A solution with join and MultiIndex.from_product to create the new level:

p = p.astype(float)

a = (p['Price'] < 2.0) & (p['Units'] > 13.5)
a.columns = pd.MultiIndex.from_product([['Flag'], a.columns])
p = p.join(a)
print(p)

      Price             Units               Flag              
Buyer     B     G     S     B     G     S      B      G      S
Idx                                                           
1      0.00  1.51  0.00   0.0  11.0   0.0  False  False  False
2      2.32  1.32  0.00  21.0  13.0   0.0  False  False  False
3      0.00  0.00  1.44   0.0   0.0  14.0  False  False   True
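
If the thresholds are likely to change, the from_product step can be wrapped in a small function. This is only a sketch: add_flag and its parameters price_max and units_min are hypothetical names introduced for illustration, and the data setup repeats the question's sample values.

```python
import pandas as pd


def add_flag(p, price_max=2.0, units_min=13.5):
    """Hypothetical helper: append a 'Flag' block next to 'Price' and 'Units'."""
    flag = (p["Price"] < price_max) & (p["Units"] > units_min)
    # Put 'Flag' above the existing Buyer level, reusing p's level names.
    flag.columns = pd.MultiIndex.from_product(
        [["Flag"], flag.columns], names=p.columns.names
    )
    # Both frames now have two column levels, so join does not warn.
    return p.join(flag)


d = pd.DataFrame({
    "Idx": [1, 2, 2, 3],
    "Buyer": ["G", "B", "G", "S"],
    "Price": ["1.51", "2.32", "1.32", "1.44"],
    "Units": ["11", "21", "13", "14"],
})
p = d.pivot(index="Idx", columns="Buyer").astype(float).fillna(0)

result = add_flag(p)
print(result["Flag"])
```

Giving the flag frame the same number of column levels as p before joining is what avoids the "merging between different levels" warning from the question.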

Use double brackets on 'Price' to preserve the MultiIndex, and combine it logically with 'Units', whose first level is dropped by the single-label selection. That way the one level left on 'Units' lines up naturally with the second level of the MultiIndex from 'Price'.

Enough talk. Observe:

p[['Price']].lt(2) & p.Units.gt(13.5)

       Price              
Buyer      B      G      S
Idx                       
1      False  False  False
2      False  False  False
3      False  False   True

Now all that's left is to rename 'Price' and join

p.join(
    (
        p[['Price']].lt(2) & p.Units.gt(13.5)
    ).rename(columns=dict(Price='Flag'))
)

      Price             Units               Flag              
Buyer     B     G     S     B     G     S      B      G      S
Idx                                                           
1      0.00  1.51  0.00   0.0  11.0   0.0  False  False  False
2      2.32  1.32  0.00  21.0  13.0   0.0  False  False  False
3      0.00  0.00  1.44   0.0   0.0  14.0  False  False   True
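
As far as I can tell, the double-bracket version depends on pandas aligning the single-level 'Units' columns with the matching level of the 'Price' MultiIndex. If you would rather not rely on that alignment, a sketch of an explicit variant is to compute the flag on two single-level frames and wrap it in a dict passed to pd.concat, which adds 'Flag' as the top level. The data setup below is an assumption copied from the question.

```python
import pandas as pd

d = pd.DataFrame({
    "Idx": [1, 2, 2, 3],
    "Buyer": ["G", "B", "G", "S"],
    "Price": ["1.51", "2.32", "1.32", "1.44"],
    "Units": ["11", "21", "13", "14"],
})
p = d.pivot(index="Idx", columns="Buyer").astype(float).fillna(0)

# Both operands have plain B/G/S columns, so no level alignment is involved.
flag = p["Price"].lt(2.0) & p["Units"].gt(13.5)

# The dict key becomes the outer column level of the wrapped frame.
flagged = p.join(pd.concat({"Flag": flag}, axis=1))
print(flagged["Flag"])
```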
