简体   繁体   English

当使用自定义 function 时,Pandas 应用/映射在某些行不起作用

[英]Pandas apply/map not working at some rows when custom function is used

I have the following DataFrame where the CMO _ ($ / MWH) is a finantial value with needs to be rounded using the old rounding method (eg: 754.275 must be rounded to 754.28):我有以下 DataFrame 其中CMO _ ($ / MWH)是一个需要使用旧舍入方法四舍五入的最终值(例如:754.275 必须四舍五入为 754.28):

    INIT_DATE  INIT_HOUR   CMO_($/MWH)
0   2020-12-01  00:00:00    754.275
1   2020-12-01  01:00:00    728.130
2   2020-12-01  02:00:00    722.575
3   2020-12-01  03:00:00    722.045
4   2020-12-01  04:00:00    721.950
5   2020-12-01  05:00:00    721.035
6   2020-12-01  06:00:00    722.100
7   2020-12-01  07:00:00    739.925
8   2020-12-01  08:00:00    771.390
9   2020-12-01  09:00:00    797.415
10  2020-12-01  10:00:00    796.585
11  2020-12-01  11:00:00    791.875
12  2020-12-01  12:00:00    782.225
13  2020-12-01  13:00:00    783.540
14  2020-12-01  14:00:00    790.980
15  2020-12-01  15:00:00    815.555
16  2020-12-01  16:00:00    824.760
17  2020-12-01  17:00:00    782.265
18  2020-12-01  18:00:00    779.970
19  2020-12-01  19:00:00    784.640
20  2020-12-01  20:00:00    785.380
21  2020-12-01  21:00:00    785.840
22  2020-12-01  22:00:00    779.775
23  2020-12-01  23:00:00    763.775

Since Python 3+ uses the Bank's Rounding rule, I implemented a custom round function that makes the old rounding method:由于 Python 3+ 使用银行的舍入规则,我实现了一个自定义舍入 function 使旧舍入方法:

def _old_round(n):
    """
        Since Python 3+ the rounding is made through the Banker's Round Rule,
        this function uses the old rounding method (round up if 0.5).
        
        Args:
            n (float): A number with 3 thousands values after decimal point
        
        Returns:
            (float)  : The rounded number now with 2 decimal places, but using
                        the old round method
        Example:
            >>> _old_round(782.225)
            >>> 782.23
        Additional:
            For more information about rounding at Python 3, go to:
                https://stackoverflow.com/questions/10825926/python-3-x-rounding-behavior

    """
    # * Multiply ou number by 100 since we want to round the 2 decimal place
    new_n = n * 100
    # * Round using the Decimal lib, but using the old method
    n_dec = Decimal(new_n).quantize(Decimal('1'), rounding=ROUND_HALF_UP)
    # * Roll back to original format
    return float(n_dec / 100)

The function works normally when I use only numbers (see function example above), but when I use the Pandas apply or map to perform the calculation in the CMO _ ($ / MWH) column, some values are not rounded correctly ( index 12 ), does anyone know why? The function works normally when I use only numbers (see function example above), but when I use the Pandas apply or map to perform the calculation in the CMO _ ($ / MWH) column, some values are not rounded correctly ( index 12 ) ,有人知道为什么吗? Find the DF after calculation bellow:在下面计算后找到 DF:

df['ROUNDED_CMO'] = df['CMO_($/MWH)'].apply(_old_round)
df

Results:结果:

    INIT_DATE  INIT_HOUR  CMO_($/MWH)   NEW_CMO
0   2020-12-01  00:00:00    754.275     754.28
1   2020-12-01  01:00:00    728.130     728.13
2   2020-12-01  02:00:00    722.575     722.58
3   2020-12-01  03:00:00    722.045     722.05
4   2020-12-01  04:00:00    721.950     721.95
5   2020-12-01  05:00:00    721.035     721.04
6   2020-12-01  06:00:00    722.100     722.10
7   2020-12-01  07:00:00    739.925     739.93
8   2020-12-01  08:00:00    771.390     771.39
9   2020-12-01  09:00:00    797.415     797.42
10  2020-12-01  10:00:00    796.585     796.59
11  2020-12-01  11:00:00    791.875     791.88
12  2020-12-01  12:00:00    782.225     782.22 <
13  2020-12-01  13:00:00    783.540     783.54
14  2020-12-01  14:00:00    790.980     790.98
15  2020-12-01  15:00:00    815.555     815.56
16  2020-12-01  16:00:00    824.760     824.76
17  2020-12-01  17:00:00    782.265     782.27
18  2020-12-01  18:00:00    779.970     779.97
19  2020-12-01  19:00:00    784.640     784.64
20  2020-12-01  20:00:00    785.380     785.38
21  2020-12-01  21:00:00    785.840     785.84
22  2020-12-01  22:00:00    779.775     779.78
23  2020-12-01  23:00:00    763.775     763.78

The value at index 12 should be: 782.23 and not 782.22 , the function works for all the data but not for this specific index for some reason... I'm using a Miniconda Environment (Windows 10) with pandas 1.1.3, Python 3.6.12, jupyter-notebook 6.1.4 (I'm running the code here). The value at index 12 should be: 782.23 and not 782.22 , the function works for all the data but not for this specific index for some reason... I'm using a Miniconda Environment (Windows 10) with pandas 1.1.3, Python 3.6.12,jupyter-notebook 6.1.4(我在这里运行代码)。

Obs: I already tried to use Math.Ceil, Numpy.Ceil, Decimal Package Round Options, and the solutions at: Question 1 , Question 2 , Question 3 , none of them work properly. Obs:我已经尝试过使用 Math.Ceil、Numpy.Ceil、Decimal Package Round Options,以及以下解决方案: 问题 1问题 2问题 3 ,它们都不能正常工作。 I also already tried (the index 12 still not rounding):我也已经尝试过(索引 12 仍未四舍五入):

df['ROUNDED_CMO'] = df['CMO_($/MWH)'].map(lambda x : _old_round(x))
df['ROUNDED_CMO'] = df['CMO_($/MWH)'].apply(lambda x : _old_round(x))

For me your code is working perfectly, maybe you have a version problem in your packages.对我来说,您的代码运行良好,也许您的包中有版本问题。 I will show you:我给你看:

  • My pandas version: '1.0.5'.我的pandas版本:'1.0.5'。
  • My decimal version: '1.70'我的decimal版本:'1.70'

First I create your dataframe:首先,我创建您的 dataframe:

import pandas as pd
from decimal import Decimal, ROUND_HALF_UP

df = pd.DataFrame({
    "CMO_($/MWH)" :
        [
        754.275,
        728.130,
        722.575,
        722.045,
        721.950,
        721.035,
        722.100,
        739.925,
        771.390,
        797.415,
        796.585,
        791.875,
        782.225,
        783.540,
        790.980,
        815.555,
        824.760,
        782.265,
        779.970,
        784.640,
        785.380,
        785.840,
        779.775,
        763.775
        ]
})

Then I apply your function:然后我申请你的 function:

df['ROUNDED_CMO'] = df['CMO_($/MWH)'].map(lambda x : _old_round(x))

The output is: output 是:

CMO_($/MWH) ROUNDED_CMO
0   754.275 754.28
1   728.130 728.13
2   722.575 722.58
3   722.045 722.05
4   721.950 721.95
5   721.035 721.04
6   722.100 722.10
7   739.925 739.93
8   771.390 771.39
9   797.415 797.42
10  796.585 796.59
11  791.875 791.88
12  782.225 782.23
13  783.540 783.54
14  790.980 790.98
15  815.555 815.56
16  824.760 824.76
17  782.265 782.27
18  779.970 779.97
19  784.640 784.64
20  785.380 785.38
21  785.840 785.84
22  779.775 779.78
23  763.775 763.78

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM