How to calculate percentage difference between two data frames with Pandas?

I'm using pandas, and after some calculations and transformations I end up with two data frames that look more or less like this:

ID      'abc'     'def'
Total     4         5
Slow      0         0
Normal    1         2
Fast      3         3

ID      'abc'     'def'
Total     3         4
Slow      0         0
Normal    0         1
Fast      3         3

Now, given these two data frames, I want to generate a third data frame that shows what percentage of the first data frame the second one fulfills. I want the result to look like this:

ID      'abc'     'def'
Total   75.0%      80.0%
Slow     None      None
Normal   0.0%      50.0%
Fast    100.0%     100.0%

If there is a 0 in the first data frame, the corresponding cell in the resulting data frame should be set to None or something similar. The idea is that I will eventually write the results to an Excel file, so I want the cells that contain None to be empty in Excel. Any ideas how to do this in Python using pandas?

You can simply divide df2 by df1 on the columns of interest:

df2.loc[:,"'abc'":] = df2.loc[:,"'abc'":].div(df1.loc[:,"'abc'":]).mul(100)

     ID     'abc'  'def'
0   Total   75.0   80.0
1    Slow    NaN    NaN
2  Normal    0.0   50.0
3    Fast  100.0  100.0
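
A self-contained sketch of the same division, assuming the two frames are rebuilt from the sample data in the question with the ID values as the index (the names `rows`, `df1`, `df2`, and `pct` are illustrative, not from the original post). Zeros in the denominator are replaced with NaN first, so that a nonzero value divided by zero would also give NaN rather than inf:

```python
import numpy as np
import pandas as pd

rows = ["Total", "Slow", "Normal", "Fast"]
df1 = pd.DataFrame({"abc": [4, 0, 1, 3], "def": [5, 0, 2, 3]}, index=rows)
df2 = pd.DataFrame({"abc": [3, 0, 0, 3], "def": [4, 0, 1, 3]}, index=rows)

# Replace zero denominators with NaN so the result is NaN instead of inf,
# then divide element-wise (aligned on index and columns) and scale to percent
pct = df2.div(df1.replace(0, np.nan)).mul(100)
print(pct)
```

Keeping the result numeric at this stage leaves the choice of display format to a later step.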

Update

To format the values as specified, you can do:

df2.loc[:,"'abc'":] = df2.where(df2.loc[:,"'abc'":].isna(), 
                                df2.round(2).astype(str).add('%'))

      ID    'abc'   'def'
0   Total   75.0%   80.0%
1    Slow     NaN     NaN
2  Normal    0.0%   50.0%
3    Fast  100.0%  100.0%

Since there are no decimal places other than .0 in this example, round(2) has no visible effect on the displayed floats. However, as soon as the division produces a float with more decimal places, the values will be rounded to at most 2 decimal places.
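
An alternative sketch that formats only the non-missing values and leaves the missing cells empty. The helper `to_percent` and the sample frame `pct` are hypothetical and not part of the answer above. Note that the result holds strings, so a spreadsheet would store these cells as text rather than numbers:

```python
import numpy as np
import pandas as pd

# Sample percentages, as produced by dividing the two frames from the question
pct = pd.DataFrame(
    {"abc": [75.0, np.nan, 0.0, 100.0], "def": [80.0, np.nan, 50.0, 100.0]},
    index=["Total", "Slow", "Normal", "Fast"],
)


def to_percent(value, decimals=1):
    """Return e.g. '75.0%' for a number, or None for a missing value."""
    if pd.isna(value):
        return None
    return f"{value:.{decimals}f}%"


# Series.map is applied column by column, which works across pandas versions
formatted = pct.apply(lambda col: col.map(to_percent))
print(formatted)
```

A fixed number of decimals gives every cell the same width, unlike round(2) followed by astype(str), which keeps whatever digits remain after rounding.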

Pandas offers some options for specifying styling directly in the output Excel file. They are limited, but fortunately they include a number-format option.

import pandas as pd

# Initialize example dataframes
df1 = pd.DataFrame(
    data=[[4, 5], [0, 0], [1, 2], [3, 3], [3, 3]],
    index=['Total', 'Slow', 'Normal', 'Fast', 'Fast'],
    columns=['abc', 'def'],
)
df2 = pd.DataFrame(
    data=[[3, 4], [0, 0], [0, 1], [3, 3], [3, 3]],
    index=['Total', 'Slow', 'Normal', 'Fast', 'Fast'],
    columns=['abc', 'def'],
)

# Element-wise ratio; 0/0 yields NaN, which to_excel writes as an empty cell by default
result_df = df2 / df1

# Move the row index into a data column (to avoid any chance of non-unique row index values,
# since the pandas Styler can only handle a unique row index)
result_df = result_df.reset_index()

# Write the Excel output file with number-format styling applied
# (on pandas >= 2.1, Styler.applymap is deprecated in favor of Styler.map)
result_df.style.applymap(lambda _: 'number-format: 0.00%').to_excel('result.xlsx', engine='openpyxl', index=False)
