How to replace NaN values by Zeroes in a column of a Pandas Dataframe?

I have a Pandas DataFrame as below:

      itm Date                  Amount 
67    420 2012-09-30 00:00:00   65211
68    421 2012-09-09 00:00:00   29424
69    421 2012-09-16 00:00:00   29877
70    421 2012-09-23 00:00:00   30990
71    421 2012-09-30 00:00:00   61303
72    485 2012-09-09 00:00:00   71781
73    485 2012-09-16 00:00:00     NaN
74    485 2012-09-23 00:00:00   11072
75    485 2012-09-30 00:00:00  113702
76    489 2012-09-09 00:00:00   64731
77    489 2012-09-16 00:00:00     NaN

When I try to apply a function to the Amount column, I get the following error:

ValueError: cannot convert float NaN to integer

Here is what I have tried so far:

  1. Applying a function using .isnan from the math module.
  2. The pandas .replace attribute.
  3. The .sparse data attribute from pandas 0.9.
  4. An if NaN == NaN statement in a function.

I have also looked at the article "How do I replace NA values with zeros in an R dataframe?" along with some others. None of the methods I have tried worked or recognised NaN. Any hints or solutions would be appreciated.
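
For context on why an `if NaN == NaN` check never matches: NaN compares unequal to everything, including itself, so it has to be detected with a helper such as `math.isnan` or `pd.isna`. A minimal sketch (the variable names are made up for illustration):

```python
import math

import numpy as np
import pandas as pd

nan = float("nan")

# NaN is never equal to itself, so an equality test cannot find it.
nan_equals_itself = (nan == nan)

# Dedicated helpers do recognise it.
detected_by_math = math.isnan(nan)
detected_by_pandas = bool(pd.isna(np.nan))

# Converting NaN to int is what raises the error shown in the question.
try:
    int(nan)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

This is why the NaNs have to be filled (or otherwise handled) before any integer conversion is applied to the column.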

I believe DataFrame.fillna() will do this for you.

Link to the docs for a DataFrame and for a Series.

Example:

In [7]: df
Out[7]: 
          0         1
0       NaN       NaN
1 -0.494375  0.570994
2       NaN       NaN
3  1.876360 -0.229738
4       NaN       NaN

In [8]: df.fillna(0)
Out[8]: 
          0         1
0  0.000000  0.000000
1 -0.494375  0.570994
2  0.000000  0.000000
3  1.876360 -0.229738
4  0.000000  0.000000

To fill the NaNs in only one column, select just that column. In this case I'm using inplace=True to actually change the contents of df.

In [12]: df[1].fillna(0, inplace=True)
Out[12]: 
0    0.000000
1    0.570994
2    0.000000
3   -0.229738
4    0.000000
Name: 1

In [13]: df
Out[13]: 
          0         1
0       NaN  0.000000
1 -0.494375  0.570994
2       NaN  0.000000
3  1.876360 -0.229738
4       NaN  0.000000

EDIT:

To avoid a SettingWithCopyWarning, use the built-in column-specific functionality:

df.fillna({1:0}, inplace=True)

It is not guaranteed whether slicing returns a view or a copy, so an in-place fill on a slice may not reach the original DataFrame. You can assign the result back instead:

df['column'] = df['column'].fillna(value)
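
As a small illustration of the assign-back pattern, here is a sketch on a cut-down stand-in for the frame in the question (only four rows, with values copied from the question's table):

```python
import numpy as np
import pandas as pd

# Small stand-in for the frame in the question.
df = pd.DataFrame(
    {
        "itm": [485, 485, 489, 489],
        "Amount": [71781, np.nan, 64731, np.nan],
    }
)

# Assigning the result back does not depend on whether
# df["Amount"] happens to be a view or a copy.
df["Amount"] = df["Amount"].fillna(0)
```

After this, the Amount column contains 0.0 where the NaNs were, and the other column is untouched.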

You could use replace to change NaN to 0:

import pandas as pd
import numpy as np

# for column
df['column'] = df['column'].replace(np.nan, 0)

# for whole dataframe
df = df.replace(np.nan, 0)

# inplace
df.replace(np.nan, 0, inplace=True)

The code below worked for me.

import pandas

df = pandas.read_csv('somefile.txt')

df = df.fillna(0)

I just wanted to provide a bit of an update/special case since it looks like people still come here. If you're using a multi-index or otherwise using an index-slicer, the inplace=True option may not be enough to update the slice you've chosen. For example, in a 2x2 level multi-index this will not change any values (as of pandas 0.15):

idx = pd.IndexSlice
df.loc[idx[:,mask_1],idx[mask_2,:]].fillna(value=0,inplace=True)

The "problem" is that the chaining breaks fillna's ability to update the original dataframe. I put "problem" in quotes because there are good reasons for the design decisions that led to not interpreting through these chains in certain situations. Also, this is a complex example (though I really ran into it), but the same may apply to fewer levels of indexes depending on how you slice.

The solution is DataFrame.update:

df.update(df.loc[idx[:,mask_1],idx[[mask_2],:]].fillna(value=0))

It's one line, reads reasonably well (sort of) and eliminates any unnecessary messing with intermediate variables or loops while allowing you to apply fillna to any multi-level slice you like!

If anybody can find places where this doesn't work, please post in the comments. I've been messing with it and looking at the source, and it seems to solve at least my multi-index slice problems.
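
To make the update pattern concrete, here is a minimal self-contained sketch. The index labels ("a", "b", "x", "y", "p", "q", "m", "n") are hypothetical, chosen only to build a small 2x2-level multi-index on both axes; plain labels stand in for the masks used above.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2-level multi-index on both rows and columns, all NaN.
rows = pd.MultiIndex.from_product([["a", "b"], ["x", "y"]])
cols = pd.MultiIndex.from_product([["p", "q"], ["m", "n"]])
df = pd.DataFrame(np.nan, index=rows, columns=cols)

idx = pd.IndexSlice

# Fill only the slice: rows whose second level is "x",
# columns whose first level is "p".
filled_slice = df.loc[idx[:, "x"], idx["p", :]].fillna(0)

# update() aligns on both axes and writes the non-NA values back into df.
df.update(filled_slice)
```

Only the four cells inside the slice become 0.0; everything outside it stays NaN.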

You can also use a dictionary to fill the NaN values of specific columns in the DataFrame, rather than filling the whole DataFrame with a single value.

import pandas as pd

df = pd.read_excel('example.xlsx')
df.fillna({
        'column1': 'Write your values here',
        'column2': 'Write your values here',
        'column3': 'Write your values here',
        'column4': 'Write your values here',
        # ... one entry per column you want to fill ...
        'column-n': 'Write your values here'}, inplace=True)
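
A runnable sketch of the dictionary form, using a small made-up frame instead of the Excel file (the column names and fill values here are placeholders):

```python
import numpy as np
import pandas as pd

# Made-up data: two columns we want to fill and one we leave alone.
df = pd.DataFrame(
    {
        "column1": [1.0, np.nan, 3.0],
        "column2": ["a", None, "c"],
        "column3": [np.nan, np.nan, 9.0],
    }
)

# Only the columns named in the dict are filled; column3 keeps its NaNs.
filled = df.fillna({"column1": 0, "column2": "missing"})
```

Columns that do not appear in the dictionary are returned unchanged, which is what makes this form useful when each column needs a different default.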

An easy way to fill missing values:

Filling string columns: when a string column has missing/NaN values, fill them with the most frequent value (the mode).

df['string column name'] = df['string column name'].fillna(df['string column name'].mode().values[0])

Filling numeric columns: when a numeric column has missing/NaN values, fill them with the mean.

df['numeric column name'] = df['numeric column name'].fillna(df['numeric column name'].mean())

Filling NaN with zero:

df['column name'] = df['column name'].fillna(0)
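
Here is a short sketch of the mode and mean fills on made-up data (the column names "city" and "score" are placeholders, not from the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "city": ["Oslo", "Oslo", None, "Rome"],
        "score": [1.0, np.nan, 3.0, 5.0],
    }
)

# String column: fill with the most frequent value.
df["city"] = df["city"].fillna(df["city"].mode().iloc[0])

# Numeric column: fill with the mean of the non-missing values.
df["score"] = df["score"].fillna(df["score"].mean())
```

Both mode() and mean() skip missing values by default, so the fill value is computed from the observed entries only.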

To replace NA values in pandas:

df['column_name'].fillna(value_to_be_replaced, inplace=True)

If inplace=False, it returns the modified values instead of updating the df (DataFrame).

Considering that the Amount column in the table above is meant to be of integer type, the following would be a solution:

df['Amount'] = df.Amount.fillna(0).astype(int)
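
A quick sketch of what the conversion does, on a three-value stand-in for the Amount column. The second half shows an alternative that is my own assumption about what may be useful here, not part of the answer above: pandas' nullable "Int64" dtype, which keeps integers while leaving missing entries as missing.

```python
import numpy as np
import pandas as pd

amount = pd.Series([71781, np.nan, 11072], name="Amount")

# Fill first, then convert: NaN cannot be cast to a plain integer dtype.
as_int = amount.fillna(0).astype(int)

# Alternative (assumption: a pandas version with nullable integer support):
# keep the gaps as <NA> instead of replacing them with 0.
as_nullable = amount.astype("Int64")
```

Which one fits depends on whether a zero is a meaningful stand-in for "no amount recorded" in your data.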

Similarly, you can fill it with various data types like float, str and so on.

In particular, I would consider the datatype when comparing various values of the same column.

To replace NaN in different columns in different ways:

replacement = {'column_A': 0, 'column_B': -999, 'column_C': -99999}
df.fillna(value=replacement)

Replace all NaN with 0:

df = df.fillna(0)

There have been many contributions already, but since I'm new here, I will still give my input.

There are two approaches to replacing NaN values with zeros in a Pandas DataFrame:

  1. fillna(): fills NA/NaN values using the specified method.
  2. replace(): df.replace() is a simple method used to replace a string, regex, list, or dictionary.

Example:

# NaN with zero on all columns
df2 = df.fillna(0)

# With inplace=True, df itself is modified and nothing is returned.
df.fillna(0, inplace=True)

# multiple columns approach
df[["Student", "ID"]] = df[["Student", "ID"]].fillna(0)

Finally, the replace() method:

df["Student"] = df["Student"].replace(np.nan, 0)

This works for me, but no one's mentioned it. Could there be something wrong with it?

df.loc[df['column_name'].isnull(), 'column_name'] = 0
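
As a sanity check, here is a small sketch comparing the boolean-mask assignment with fillna on made-up values; on this data the two give identical results:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column_name": [1.5, np.nan, 2.5, np.nan]})

# Reference result from fillna, computed before df is modified.
expected = df["column_name"].fillna(0)

# Boolean-mask assignment: set the rows where the column is null to 0.
df.loc[df["column_name"].isnull(), "column_name"] = 0

same_result = df["column_name"].equals(expected)
```

Because the assignment goes through a single .loc call on df itself, it writes to the original frame rather than to an intermediate slice.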

If you were to convert it to a pandas dataframe, you can also accomplish this by using fillna.

import numpy as np
df=np.array([[1,2,3, np.nan]])

import pandas as pd
df=pd.DataFrame(df)
df.fillna(0)

This will return the following:

     0    1    2   3
0  1.0  2.0  3.0 NaN
>>> df.fillna(0)
     0    1    2    3
0  1.0  2.0  3.0  0.0

There are primarily two options available. For imputing or filling missing values (NaN / np.nan) with numerical replacements only (across one or more columns),

df['Amount'].fillna(value=None, method= ,axis=1,) is sufficient.

From the documentation:

value : scalar, dict, Series, or DataFrame. Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (Values not in the dict/Series/DataFrame will not be filled.) This value cannot be a list.

Which means 'strings' or 'constants' are no longer permissible to be imputed.

For more specialized imputations, use SimpleImputer():

import numpy as np
from sklearn.impute import SimpleImputer

si = SimpleImputer(strategy='constant', missing_values=np.nan, fill_value='Replacement_Value')
df[['Col-1', 'Col-2']] = si.fit_transform(X=df[['C-1', 'C-2']])

If you want to fill NaN for a specific column, you can use loc:

d1 = {"Col1" : ['A', 'B', 'C'],
     "fruits": ['Avocado', 'Banana', 'NaN']}
d1= pd.DataFrame(d1)

output:

Col1    fruits
0   A   Avocado
1   B   Banana
2   C   NaN


d1.loc[ d1.Col1=='C', 'fruits' ] =  'Carrot'


output:

Col1    fruits
0   A   Avocado
1   B   Banana
2   C   Carrot

I think it's also worth mentioning and explaining the parameters of fillna(), such as method, axis, limit, etc.

From the documentation we have:

Series.fillna(value=None, method=None, axis=None, 
                 inplace=False, limit=None, downcast=None)
Fill NA/NaN values using the specified method.

Parameters

value [scalar, dict, Series, or DataFrame] Value to use to 
 fill holes (e.g. 0), alternately a dict/Series/DataFrame 
 of values specifying which value to use for each index 
 (for a Series) or column (for a DataFrame). Values not in 
 the dict/Series/DataFrame will not be filled. This 
 value cannot be a list.

method [{‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, 
 default None] Method to use for filling holes in 
 reindexed Series. pad / ffill: propagate last valid 
 observation forward to next valid. backfill / bfill: 
 use next valid observation to fill gap.

axis [{0 or ‘index’}] Axis along which to fill missing values.

inplace [bool, default False] If True, fill 
 in-place. Note: this will modify any other views
 on this object (e.g., a no-copy slice for a 
 column in a DataFrame).

limit [int, default None] If method is specified, 
 this is the maximum number of consecutive NaN 
 values to forward/backward fill. In other words, 
 if there is a gap with more than this number of 
 consecutive NaNs, it will only be partially filled. 
 If method is not specified, this is the maximum 
 number of entries along the entire axis where NaNs
 will be filled. Must be greater than 0 if not None.

downcast [dict, default is None] A dict of item->dtype 
 of what to downcast if possible, or the string ‘infer’ 
 which will try to downcast to an appropriate equal 
 type (e.g. float64 to int64 if possible).

OK. Let's start with the method= parameter. It offers forward fill (ffill) and backward fill (bfill): ffill copies the previous non-missing value forward, and bfill copies the next non-missing value backward.

e.g.:

import pandas as pd
import numpy as np
inp = [{'c1':10, 'c2':np.nan, 'c3':200}, {'c1':np.nan,'c2':110, 'c3':210}, {'c1':12,'c2':np.nan, 'c3':220},{'c1':12,'c2':130, 'c3':np.nan},{'c1':12,'c2':np.nan, 'c3':240}]
df = pd.DataFrame(inp)

     c1     c2     c3
0  10.0    NaN  200.0
1   NaN  110.0  210.0
2  12.0    NaN  220.0
3  12.0  130.0    NaN
4  12.0    NaN  240.0

Forward fill:

df.fillna(method="ffill")

     c1     c2     c3
0  10.0    NaN  200.0
1  10.0  110.0  210.0
2  12.0  110.0  220.0
3  12.0  130.0  220.0
4  12.0  130.0  240.0

Backward fill:

df.fillna(method="bfill")

    c1      c2     c3
0   10.0    110.0   200.0
1   12.0    110.0   210.0
2   12.0    130.0   220.0
3   12.0    130.0   240.0
4   12.0      NaN   240.0
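
As far as I know, recent pandas releases (2.1 and later) emit a deprecation warning for fillna(method=...). The dedicated DataFrame.ffill() and DataFrame.bfill() methods give the same forward and backward fills; here is a sketch on the same data as above:

```python
import numpy as np
import pandas as pd

inp = [
    {"c1": 10, "c2": np.nan, "c3": 200},
    {"c1": np.nan, "c2": 110, "c3": 210},
    {"c1": 12, "c2": np.nan, "c3": 220},
    {"c1": 12, "c2": 130, "c3": np.nan},
    {"c1": 12, "c2": np.nan, "c3": 240},
]
df = pd.DataFrame(inp)

# Forward fill: copy the previous non-missing value down each column.
forward = df.ffill()

# Backward fill: copy the next non-missing value up each column.
backward = df.bfill()
```

A leading NaN survives a forward fill and a trailing NaN survives a backward fill, because there is no neighbouring value to copy from in that direction.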

The axis parameter lets us choose the direction of the fill:

Fill directions:

ffill:

Axis = 1 
Method = 'ffill'
----------->
  direction 

df.fillna(method="ffill", axis=1)

     c1     c2     c3
0  10.0   10.0  200.0
1   NaN  110.0  210.0
2  12.0   12.0  220.0
3  12.0  130.0  130.0
4  12.0   12.0  240.0

Axis = 0 # by default 
Method = 'ffill'
|
|       # direction 
|
V
e.g: # This is the ffill default
df.fillna(method="ffill", axis=0)

    c1     c2      c3
0   10.0      NaN   200.0
1   10.0    110.0   210.0
2   12.0    110.0   220.0
3   12.0    130.0   220.0
4   12.0    130.0   240.0

bfill:

axis= 0
method = 'bfill'
^
|
|
|
df.fillna(method="bfill", axis=0)

    c1     c2      c3
0   10.0    110.0   200.0
1   12.0    110.0   210.0
2   12.0    130.0   220.0
3   12.0    130.0   240.0
4   12.0      NaN   240.0

axis = 1
method = 'bfill'
<-----------
df.fillna(method="bfill", axis=1)
        c1     c2       c3
0    10.0   200.0   200.0
1   110.0   110.0   210.0
2    12.0   220.0   220.0
3    12.0   130.0     NaN
4    12.0   240.0   240.0

# aliases:
#   ffill == pad
#   bfill == backfill
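
The row-wise direction can also be checked with the dedicated ffill/bfill methods, which accept the same axis argument. A sketch on the same data (the result variable names are made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "c1": [10, np.nan, 12, 12, 12],
        "c2": [np.nan, 110, np.nan, 130, np.nan],
        "c3": [200, 210, 220, np.nan, 240],
    }
)

# axis=1 fills along each row, left to right for ffill ...
row_forward = df.ffill(axis=1)

# ... and right to left for bfill.
row_backward = df.bfill(axis=1)
```

With axis=1 a NaN in the first column cannot be forward-filled, and a NaN in the last column cannot be back-filled, since there is no value on that side of the row.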

limit parameter:

df
    c1     c2      c3
0   10.0      NaN   200.0
1    NaN    110.0   210.0
2   12.0      NaN   220.0
3   12.0    130.0     NaN
4   12.0      NaN   240.0

Only replace the first NaN element in each column:

df.fillna(value = 'Unavailable', limit=1)
            c1           c2          c3
0          10.0 Unavailable       200.0
1   Unavailable       110.0       210.0
2          12.0         NaN       220.0
3          12.0       130.0 Unavailable
4          12.0         NaN       240.0

df.fillna(value = 'Unavailable', limit=2)

           c1            c2          c3
0          10.0 Unavailable       200.0
1   Unavailable       110.0       210.0
2          12.0 Unavailable       220.0
3          12.0       130.0 Unavailable
4          12.0         NaN       240.0
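
The same limit behaviour can be sketched with a numeric fill value, which keeps the columns as floats. This assumes the per-column counting shown in the outputs above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "c1": [10, np.nan, 12, 12, 12],
        "c2": [np.nan, 110, np.nan, 130, np.nan],
        "c3": [200, 210, 220, np.nan, 240],
    }
)

# Fill at most one NaN per column; later NaNs in the same column are left alone.
limited = df.fillna(0, limit=1)
```

Column c2 starts with three NaNs, so two of them remain after the call, while c1 and c3 (one NaN each) end up fully filled.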

downcast parameter:

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   c1      4 non-null      float64
 1   c2      2 non-null      float64
 2   c3      4 non-null      float64
dtypes: float64(3)
memory usage: 248.0 bytes

df.fillna(method="ffill",downcast='infer').info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   c1      5 non-null      int64  
 1   c2      4 non-null      float64
 2   c3      5 non-null      int64  
dtypes: float64(1), int64(2)
memory usage: 248.0 bytes


