Pandas Python - How to create new columns with MultiIndex from pivot table

I have created a pivot table with two different types of values: (i) the number of apples from 2017-2020, and (ii) the number of people from 2017-2020. I want to create additional columns to calculate (iii) apples per person from 2017-2020. How can I do so?

Current code for the pivot table:

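import pandas as pd

# One column per (measure, year); aggfunc counts the distinct values in each group
tdf = df.pivot_table(index="States",
                     columns="Year",
                     values=["Number of Apples", "Number of People"],
                     aggfunc=lambda x: len(x.unique()),
                     margins=True)  # margins=True also adds "All" totals
tdf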

Here is my current pivot table:

                Number of Apples                  Number of People
                2017    2018    2019    2020      2017    2018    2019    2020   
California        10      18      20      25         2       3       4       5
West Virginia      8      35      25      12         2       5       5       4
...
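To try the answers below without the original raw data, here is a minimal sketch that rebuilds the two rows shown above directly as a DataFrame with two-level columns. It assumes the rows hidden behind "..." can be ignored and does not reproduce the "All" row/column that margins=True would add.

```python
import pandas as pd

years = [2017, 2018, 2019, 2020]

# Two-level column index: (measure, year), like the pivot_table output
columns = pd.MultiIndex.from_product(
    [["Number of Apples", "Number of People"], years],
    names=[None, "Year"],
)

# Values copied from the two rows shown in the question
tdf = pd.DataFrame(
    [
        [10, 18, 20, 25, 2, 3, 4, 5],
        [8, 35, 25, 12, 2, 5, 5, 4],
    ],
    index=pd.Index(["California", "West Virginia"], name="States"),
    columns=columns,
)

print(tdf)
```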

I want my pivot table to look like this, with additional columns that divide Number of Apples by Number of People:

                Number of Apples                  Number of People                  Number of Apples per Person
                2017    2018    2019    2020      2017    2018    2019    2020      2017    2018    2019    2020   
California        10      18      20      25         2       3       4       5       5       6       5       5      
West Virginia      8      35      25      12         2       5       5       4       4       7       5       3

I've tried a few things, such as:

  • Creating a new column by assigning to new column names, which does not work with a multi-level column index: tdf["Number of Apples per Person"][2017] = tdf["Number of Apples"][2017] / tdf["Number of People"][2017]
  • Trying the other assignment method, tdf.assign(tdf["Number of Apples per Person"][2017] = tdf["Number of Apples"][2017] / tdf["Number of People"][2017]), which gave this error: SyntaxError: expression cannot contain assignment, perhaps you meant "=="?
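Both attempts fail for ordinary Python/pandas reasons. In a call, the name left of = must be a plain identifier, so tdf.assign(tdf[...][2017] = ...) is a syntax error. In the first attempt, tdf["Number of Apples per Person"] raises a KeyError because that top-level label does not exist yet, so the chained [2017] assignment is never reached. A minimal sketch of what does work on MultiIndex columns: assigning each new column with a complete (top level, year) tuple key. It rebuilds a small stand-in frame from the example rows so it runs on its own.

```python
import pandas as pd

years = [2017, 2018, 2019, 2020]
tdf = pd.DataFrame(
    [[10, 18, 20, 25, 2, 3, 4, 5], [8, 35, 25, 12, 2, 5, 5, 4]],
    index=["California", "West Virginia"],
    columns=pd.MultiIndex.from_product([["Number of Apples", "Number of People"], years]),
)

# A full tuple key addresses exactly one column, so a new one can be created.
for year in years:
    tdf[("Number of Apples per Person", year)] = (
        tdf[("Number of Apples", year)] / tdf[("Number of People", year)]
    )

print(tdf)
```

This works, but it loops over the years one column at a time; the answers below do the same thing in a single vectorized step.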

Appreciate any help! Thanks

What you can do here is stack(), do your calculation, and then unstack():

s = df.stack()  # move the Year level from the columns into the row index
s['Number of Apples per Person'] = s['Number of Apples'] / s['Number of People']
df = s.unstack()  # move Year back into the columns

Output:

>>> df
              Number of Apples                Number of People                Number of Apples per Person               
                          2017 2018 2019 2020             2017 2018 2019 2020                        2017 2018 2019 2020
California                  10   18   20   25                2    3    4    5                         5.0  6.0  5.0  5.0
West Virginia                8   35   25   12                2    5    5    4                         4.0  7.0  5.0  3.0
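A self-contained sketch of the stack/unstack approach on the two example rows, which also prints the intermediate stacked frame. stack() moves the innermost column level (the years) into the row index, leaving one plain column per measure, so the new column becomes an ordinary single-label assignment. Recent pandas versions may emit a FutureWarning about the stack implementation; it does not affect the result here.

```python
import pandas as pd

years = [2017, 2018, 2019, 2020]
df = pd.DataFrame(
    [[10, 18, 20, 25, 2, 3, 4, 5], [8, 35, 25, 12, 2, 5, 5, 4]],
    index=["California", "West Virginia"],
    columns=pd.MultiIndex.from_product([["Number of Apples", "Number of People"], years]),
)

# Long format: rows are (state, year), columns are the two measures.
s = df.stack()
print(s.head())

# Plain column arithmetic, one row per (state, year).
s["Number of Apples per Person"] = s["Number of Apples"] / s["Number of People"]

# Wide format again: the year level goes back under each measure.
result = s.unstack()
print(result)
```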

One-liner:

df = (
    df.stack()
      .pipe(lambda x: x.assign(**{'Number of Apples per Person':
                                  x['Number of Apples'] / x['Number of People']}))
      .unstack()
)
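The one-liner passes the new column through assign(**{...}) because assign takes column names as keyword arguments, and a name containing spaces is not a valid Python identifier; unpacking a dict sidesteps that. pipe lets the lambda refer to the intermediate stacked frame. A minimal sketch of just that pattern on a flat frame (hypothetical two-row data):

```python
import pandas as pd

flat = pd.DataFrame({"Number of Apples": [10, 18], "Number of People": [2, 3]})

# flat.assign(Number of Apples per Person=...) would be a SyntaxError,
# so the column name is supplied through an unpacked dict instead.
out = flat.pipe(
    lambda x: x.assign(
        **{"Number of Apples per Person": x["Number of Apples"] / x["Number of People"]}
    )
)
print(out)
```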

Given

df
              Number of Apples                Number of People               
                          2017 2018 2019 2020             2017 2018 2019 2020
California                  10   18   20   25                2    3    4    5
West Virginia                8   35   25   12                2    5    5    4

You can index on the first column level to get sub-frames and then divide them. The division is automatically aligned on the column labels (the years).

df['Number of Apples'] / df['Number of People']
               2017  2018  2019  2020
California      5.0   6.0   5.0   5.0
West Virginia   4.0   7.0   5.0   3.0
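The alignment is by label, not by position. In this small sketch (hypothetical frames reusing the example values), the people frame has its years in reverse order and the ratios still come out per year; a year present in only one operand becomes a NaN column.

```python
import pandas as pd

apples = pd.DataFrame(
    {2017: [10, 8], 2018: [18, 35], 2019: [20, 25], 2020: [25, 12]},
    index=["California", "West Virginia"],
)
# Same numbers as in the question, columns deliberately reversed.
people = pd.DataFrame(
    {2020: [5, 4], 2019: [4, 5], 2018: [3, 5], 2017: [2, 2]},
    index=["California", "West Virginia"],
)

ratio = apples / people  # matched on the year labels
print(ratio)

# 2020 is missing from the right-hand side, so that column is all NaN.
partial = apples / people.drop(columns=2020)
print(partial)
```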

Append this back to your DataFrame:

pd.concat([df,
           pd.concat([df['Number of Apples'] / df['Number of People']],
                     keys=['Result'], axis=1)],
          axis=1)
              Number of Apples                Number of People                Result               
                          2017 2018 2019 2020             2017 2018 2019 2020   2017 2018 2019 2020
California                  10   18   20   25                2    3    4    5    5.0  6.0  5.0  5.0
West Virginia                8   35   25   12                2    5    5    4    4.0  7.0  5.0  3.0
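The keys argument of the inner pd.concat sets the new top-level label, so passing the name from the question instead of 'Result' gives exactly the layout the asker wanted. A sketch on the example rows:

```python
import pandas as pd

years = [2017, 2018, 2019, 2020]
df = pd.DataFrame(
    [[10, 18, 20, 25, 2, 3, 4, 5], [8, 35, 25, 12, 2, 5, 5, 4]],
    index=["California", "West Virginia"],
    columns=pd.MultiIndex.from_product([["Number of Apples", "Number of People"], years]),
)

# Sub-frame division, aligned on the year labels.
ratio = df["Number of Apples"] / df["Number of People"]

# Put the ratios under a new top-level label, then append them column-wise.
result = pd.concat(
    [df, pd.concat([ratio], keys=["Number of Apples per Person"], axis=1)],
    axis=1,
)
print(result)
```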

This is fast because it is fully vectorized.
