Getting NaN when Dividing Aligned DataFrame Columns

I have a DataFrame of the form:

            A             B               C
Cat-1    798.26        456.65          187.56
Cat-2 165165.53      45450.00         4897.57
Cat-3 488565.65      15198.56        15654.65
Cat-4      0.00      54256.35        49878.65
Cat-5   1156.61        789.05        89789.54
Cat-6      0.00       1644.78         6876.15

I am trying to compute a percentage by dividing B by A. To do this, I used the following:

if_condition = df['A'] != 0
then = (1 - df['B'].div(df['A']))
else_ = 0
df['New Col'] = np.where(if_condition, then, else_)

I expected the following result:

            A             B               C       New Col
Cat-1    798.26        456.65          187.56        .5720
Cat-2 165165.53      45450.00         4897.57        .2751 
Cat-3 488565.65      15198.56        15654.65        .0311
Cat-4      0.00      54256.35        49878.65        0
Cat-5   1156.61        789.05        89789.54        .6822
Cat-6      0.00       1644.78         6876.15        0

However, I got the following result:

            A             B               C        New Col
Cat-1    798.26        456.65          187.56        NaN
Cat-2 165165.53      45450.00         4897.57        0.2751 
Cat-3 488565.65      15198.56        15654.65        0.0311
Cat-4      0.00      54256.35        49878.65        0
Cat-5   1156.61        789.05        89789.54        NaN
Cat-6      0.00       1644.78         6876.15        0

I have tried other solutions that involved aligning the two columns, but that did not change the result. What could be generating these NaN values?

import pandas as pd
import numpy as np
import io

df = pd.read_csv(io.StringIO("""            A             B               C
Cat-1    798.26        456.65          187.56
Cat-2     165165.53      45450.00         4897.57
Cat-3     488565.65      15198.56        15654.65
Cat-4      0.00      54256.35        49878.65
Cat-5   1156.61        789.05        89789.54
Cat-6      0.00       1644.78         6876.15"""), sep=r"\s\s+", engine="python")

df

# output
               A         B         C
Cat-1     798.26    456.65    187.56
Cat-2  165165.53  45450.00   4897.57
Cat-3  488565.65  15198.56  15654.65
Cat-4       0.00  54256.35  49878.65
Cat-5    1156.61    789.05  89789.54
Cat-6       0.00   1644.78   6876.15

if_condition = df['A'] != 0
then = (1 - df['B'].div(df['A']))
else_ = 0
df['New Col'] = np.where(if_condition, then, else_)

# output
               A         B         C   New Col
Cat-1     798.26    456.65    187.56  0.427943
Cat-2  165165.53  45450.00   4897.57  0.724822
Cat-3  488565.65  15198.56  15654.65  0.968891
Cat-4       0.00  54256.35  49878.65  0.000000
Cat-5    1156.61    789.05  89789.54  0.317791
Cat-6       0.00   1644.78   6876.15  0.000000

This seems to be correct; I could not reproduce the NaN values. I am using pandas version 1.2.5.
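The thread never establishes what produced the NaN values in the original data. One possibility, offered here only as an assumption, is that the two columns actually came from objects whose index labels did not match exactly: pandas aligns on labels before dividing and yields NaN for any label present on only one side. A minimal sketch with a hypothetical `divisor` Series whose first label carries a trailing space:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [798.26, 165165.53, 0.00], "B": [456.65, 45450.00, 54256.35]},
    index=["Cat-1", "Cat-2", "Cat-4"],
)

# Hypothetical divisor: same values as column A, but the first label
# has a trailing space, so it no longer matches "Cat-1" in df.
divisor = df["A"].copy()
divisor.index = ["Cat-1 ", "Cat-2", "Cat-4"]

# Division aligns on index labels; unmatched labels produce NaN.
ratio = df["B"].div(divisor)
print(ratio)

# Labels that appear on only one side of the division.
mismatched = df.index.symmetric_difference(divisor.index)
print(mismatched)
```

If `symmetric_difference` comes back empty, `df[['A', 'B']].isna().sum()` and `df.dtypes` would be the next things to check.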

You could also write this if/else condition more compactly:

df["New Col"] = df.apply(lambda x: 1 - x["B"] / x["A"] if x["A"] != 0 else 0, axis=1)
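Note that `apply` with `axis=1` calls the lambda once per row. As an alternative that is not part of the original answer, the same rule can be sketched with the vectorized `Series.where`, which keeps values where the condition holds and substitutes 0 elsewhere. The small frame below is a subset of the question's data:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [798.26, 0.00, 1156.61], "B": [456.65, 54256.35, 789.05]},
    index=["Cat-1", "Cat-4", "Cat-5"],
)

# Row-wise version from the answer above.
row_wise = df.apply(lambda x: 1 - x["B"] / x["A"] if x["A"] != 0 else 0, axis=1)

# Vectorized version: compute everywhere, then overwrite rows where A == 0.
vectorized = (1 - df["B"] / df["A"]).where(df["A"] != 0, 0)

print(vectorized)
```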

You don't need a condition; just replace -np.inf with 0:

# df['New Col'] = (1 - df['B'] / df['A']).replace(-np.inf, 0)
df['New Col'] = ((1 - df['B'] / df['A']) * 100).round(2).replace(-np.inf, 0)
print(df)

# Output:
               A         B         C  New Col
Cat-1     798.26    456.65    187.56    42.79
Cat-2  165165.53  45450.00   4897.57    72.48
Cat-3  488565.65  15198.56  15654.65    96.89
Cat-4       0.00  54256.35  49878.65     0.00
Cat-5    1156.61    789.05  89789.54    31.78
Cat-6       0.00   1644.78   6876.15     0.00
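Replacing only `-np.inf` works for this data because B is positive wherever A is zero. As a hedged illustration on made-up values (the `demo` frame is not from the question), a zero divided by zero gives NaN and a negative B over a zero A ends up as `+inf`, so a more defensive variant might clear both infinities and NaN:

```python
import numpy as np
import pandas as pd

# Made-up values chosen to hit each zero-division case.
demo = pd.DataFrame({"A": [0.0, 0.0, 0.0, 4.0], "B": [5.0, 0.0, -5.0, 1.0]})

raw = 1 - demo["B"] / demo["A"]
# row 0:  5 / 0 is inf,  so 1 - inf is -inf
# row 1:  0 / 0 is NaN,  so the result stays NaN
# row 2: -5 / 0 is -inf, so 1 - (-inf) is +inf
# row 3:  1 - 1/4 is 0.75

# Turn both infinities into NaN, then fill every NaN with 0.
cleaned = raw.replace([np.inf, -np.inf], np.nan).fillna(0)
print(cleaned)
```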

I was able to resolve this issue by simply not dividing by 0 and then replacing the NaN values with 0. It produced the anticipated result:

df['New Col'] = (1 - df['B']/df['A'][df['A'] != 0]).fillna(0)

In other words, the division is carried out only for rows where A is non-zero. The rows where A is 0 are missing from the divisor, so they come out as NaN after alignment and can then be replaced with 0.
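The mechanism behind that one-liner can be sketched step by step on a subset of the question's data. The intermediate names `nonzero_a` and `ratio` are introduced here for illustration only:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [798.26, 0.00, 1156.61], "B": [456.65, 54256.35, 789.05]},
    index=["Cat-1", "Cat-4", "Cat-5"],
)

# Boolean indexing drops the rows where A is 0, so Cat-4 disappears.
nonzero_a = df["A"][df["A"] != 0]

# Division aligns on the index; Cat-4 has no divisor and becomes NaN.
ratio = df["B"] / nonzero_a

# 1 - NaN is still NaN, which fillna then turns into 0.
df["New Col"] = (1 - ratio).fillna(0)
print(df)
```

One caveat with this approach: `fillna(0)` also zeroes any NaN that comes from genuinely missing values in A or B, not only the rows skipped because A was 0.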
