
Replace NaN values in a pivot table from a dataframe without changing other values

I created a pivot table from a dataframe using the code below:

table = pd.pivot_table(
    df_table,
    values=['KPI Amount Convert to USD'],
    index=['Customer Nick', 'Customer', 'Customer Name', 'BSO Name', 'BSO Comment',
           'Pay Date, Recovery action, No pay schedule reason '],
    columns=['Range'],
    aggfunc={'KPI Amount Convert to USD': np.sum},
    margins=True,
    margins_name='Grand Total',
)

It works great, but some values are NaN (the others are regular numbers).

When I used

table = table.replace(np.nan, '', regex=True)

the NaN values became empty, BUT some values are now shown as 3.0176e+06, where before they were shown as 3017601.99.

Do you have any idea how to replace the NaN values but preserve the format of the other numbers?

Thanks for your advice.

The problem here is the dtype (data type) of the column, or more exactly of the underlying NumPy array. I assume that in your table dataframe, the columns containing NaN values have a floating point type (float64).

If you replaced NaN with 0., all would be fine, but if you write an empty string there, pandas changes the dtype to object.
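The dtype change can be checked directly. This is a minimal sketch, assuming a small one-column frame; the names df, filled_zero and filled_empty are illustrative and not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical one-column frame: one large float and one missing value.
df = pd.DataFrame({"A": [3017601.99, np.nan]})

filled_zero = df.fillna(0.0)   # numeric fill value: the column can stay float
filled_empty = df.fillna("")   # string fill value: the column is upcast to object

print(df["A"].dtype)
print(filled_zero["A"].dtype)
print(filled_empty["A"].dtype)

# The stored number is still the same float; only the column dtype differs.
same_value = filled_empty.loc[0, "A"] == df.loc[0, "A"]
print(same_value)
```

Comparing the three dtypes shows that only the fill with an empty string moves the column away from float64.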

BTW, 3.0176e+06 is just a different representation of 3017601.99; the value itself has not changed. pandas simply uses different representations for np.float64 columns and object columns.

You can ask it to use the default str conversion for float values in object columns by setting the relevant option: pd.set_option('display.float_format', str)

Demo:

>>> pd.set_option('display.float_format', None)                # reset option
>>> df = pd.DataFrame([[3017601.99], [np.nan]], columns=['A'])
>>> df
            A
0  3017601.99
1         NaN
>>> df1 = df.fillna('')
>>> df1
            A
0  3.0176e+06
1            
>>> pd.set_option('display.float_format', str)      # set the option
>>> df1
           A
0 3017601.99
1           
>>> df.loc[0,'A'] == df1.loc[0,'A']
True
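If the empty cells are only needed for output, another possibility is to leave the data as float and blank out NaN at rendering time. The sketch below uses DataFrame.to_string with its na_rep and float_format parameters; the two-decimal format is an assumption chosen for illustration, not something the question specifies.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the demo: one large float and one NaN.
df = pd.DataFrame({"A": [3017601.99, np.nan]})

# Render NaN as an empty string and floats with two decimals.
# The dataframe itself is not modified, so the column stays float64.
text = df.to_string(na_rep="", float_format="{:.2f}".format)
print(text)
print(df["A"].dtype)
```

Because the column keeps its float64 dtype, later numeric operations such as sums keep working on it.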

Have you tried

  table = table.fillna('')

  table = table.fillna('-')

or

  table = table.fillna(0)

It's an issue of formatting: basically, when a column is of a given type, its numbers are shown in a certain way.

If your column has only floats (numbers and np.nan both fit in that), it will display things one way.

If your column has floats and strings (numbers and ''), then the column dtype is set to "object" and it displays some things differently, such as large floats/ints.

This is why df.fillna(0) works (0 is stored as a float, so the dtype remains float) but df.fillna('') causes the same display change as replacing NaN with ''.

The actual value does not change, e.g.:

df.loc[2,0]
> 3017601.990
df.fillna('').loc[2,0] == df.fillna(0).loc[2,0]
> True
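A self-contained version of this check could look like the sketch below. It also scopes the display.float_format option with pd.option_context, so the setting applies only inside the with block; the frame and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one large float and one missing value.
df = pd.DataFrame({"A": [3017601.99, np.nan]})
filled = df.fillna("")  # object column holding a float and an empty string

# The stored value is unchanged by the fill.
same = filled.loc[0, "A"] == df.loc[0, "A"]
print(same)

# Use str() for floats only inside this block; the previous
# option value is restored when the block exits.
with pd.option_context("display.float_format", str):
    text = filled.to_string()
print(text)
```

Using option_context instead of set_option avoids changing how every other dataframe in the session is displayed.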
