Excel/Google Sheets pivot function vs. Python pivot function

I am trying to pivot a DataFrame in Python using the pivot_table function. The trouble is that when I tried to verify the result against a pivot table created in Google Sheets, the two results were different. The number of rows after pivoting is also different in the two cases. I am confused about what is going on here. Why are the results of the same function different? Aren't they supposed to do the same thing?

Here is the link to the data and the pivot table done in Google Sheets. Order item ID 2 should be the index, as it is the unique ID, and Settlement Value should be the values to aggregate.

Below is the code I use to create the pivot table in Python with the same data:

import pandas as pd

payout = pd.read_excel('combined aug-dec payout.xlsx')
# Build the key column by prefixing the stringified order item ID
payout['Order item ID'] = payout['Order item ID'].apply(str)
payout['Order item ID 2'] = 'OI:' + payout['Order item ID']
# Sum Settlement Value per key
pivot_settle = payout.pivot_table(index=['Order item ID 2'], values=['Settlement Value'], aggfunc='sum')

Any help is appreciated!

The problem comes from the fact that the Order item ID 2 column contains Google Sheets formulas, not raw data, and these are not parsed by Pandas as intended: OI:121314525072009 01 becomes OI:121314525072009 00 when the Excel file is imported. As a result, when you aggregate on this column, you end up with fewer rows than in the Google Sheets pivot table.
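To see why a misread key lowers the row count, here is a minimal sketch on made-up data. The IDs, the values, and the pivot_sum helper are all hypothetical and are not taken from the asker's file; the replace call only simulates two distinct IDs ending up identical after import.

```python
import pandas as pd

# Hypothetical toy data: two distinct order items, three settlement rows.
intended = pd.DataFrame(
    {
        "Order item ID 2": ["OI:1001", "OI:1002", "OI:1002"],
        "Settlement Value": [10.0, 20.0, 5.0],
    }
)

# Simulate the import problem: the first ID is misread and now
# collides with the second one.
misread = intended.copy()
misread["Order item ID 2"] = misread["Order item ID 2"].replace(
    {"OI:1001": "OI:1002"}
)


def pivot_sum(frame):
    """Sum Settlement Value per Order item ID 2 (hypothetical helper)."""
    return frame.pivot_table(
        index="Order item ID 2", values="Settlement Value", aggfunc="sum"
    )


rows_intended = pivot_sum(intended).shape[0]
rows_misread = pivot_sum(misread).shape[0]
total_intended = pivot_sum(intended)["Settlement Value"].sum()
total_misread = pivot_sum(misread)["Settlement Value"].sum()

print(rows_intended, rows_misread)    # row counts differ
print(total_intended, total_misread)  # grand totals stay the same
```

The grand total is unchanged in both cases; only the number of groups shrinks, which is why comparing row counts exposes the problem while comparing totals would not.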

So, first copy/paste values to overwrite the formulas in this column and save the file, then pivot your data with Pandas:

import pandas as pd

df = pd.read_excel("combined aug-dec payout.xlsx")

print(
    pd.pivot_table(
        df,
        index="Order item ID 2",
        values="Settlement Value",
        aggfunc="sum",
    ).shape[0]
)
# Output: 17,754 rows

And with Google Sheets: 17,754 rows.
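If you want a quick cross-check on the Pandas side before comparing with the spreadsheet, something along these lines may help. It is only a sketch: pivot_row_check is a hypothetical helper and the sample frame is made up. It compares the number of distinct keys with the number of pivot rows, so either figure can be held against the Google Sheets count.

```python
import pandas as pd


def pivot_row_check(frame, key, value):
    """Return (distinct key count, pivot row count) for a sum pivot.

    Hypothetical helper, not part of pandas.
    """
    unique_keys = frame[key].nunique()
    pivot_rows = frame.pivot_table(
        index=key, values=value, aggfunc="sum"
    ).shape[0]
    return unique_keys, pivot_rows


# Made-up sample data with one repeated key.
sample = pd.DataFrame(
    {
        "Order item ID 2": ["OI:1", "OI:2", "OI:2", "OI:3"],
        "Settlement Value": [1.5, 2.0, 3.0, 4.5],
    }
)

unique_keys, pivot_rows = pivot_row_check(
    sample, "Order item ID 2", "Settlement Value"
)
print(unique_keys, pivot_rows)
```

If the distinct key count in Pandas is already lower than the row count in the Google Sheets pivot table, the mismatch was introduced when the file was read, not by pivot_table itself.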
