Fill NaN values based on conditions on another column

Question

I have a database (pd.DataFrame) like this:

    condition     odometer
0    new           NaN
1    bad           1100
2    excellent     110
3    NaN           200
4    NaN           2000
5    new           20
6    bad           NaN

And I want to fill the NaN values in "condition" based on the values of "odometer":

new: odometer >0 and <= 100 
excellent: odometer >100 and <= 1000
bad: odometer >1000

I tried to do this, but it is not working:

for i in range(len(database)): 
   if math.isnan(database['condition'][i]) == True:
      odometer = database['odometer'][i] 
      if   odometer > 0 & odometer <= 100:       value = 'new'
      elif odometer > 100 & odometer <= 1000:    value = 'excellent'
      elif odometer > 1000:                      value = 'bad'
      database['condition'][i] = value

I also tried writing the first "if" condition as:

database['condition'][i] == np.nan

But that doesn't work either.
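A small standalone check that reproduces the pitfalls in the loop above, using plain Python and pandas behaviour: a missing value never compares equal to np.nan, math.isnan() rejects strings, and & binds more tightly than > and <=. The reading of 200 is just a sample value taken from the table above.

```python
import math

import numpy as np
import pandas as pd

# 1) NaN is never equal to anything, including itself, so "== np.nan" is always False
print(np.nan == np.nan)   # False
print(pd.isna(np.nan))    # True: the reliable missing-value test
print(pd.isna('new'))     # False: pd.isna also accepts strings

# 2) math.isnan() only accepts numbers, so it fails on the strings in 'condition'
try:
    math.isnan('new')
except TypeError as exc:
    print('TypeError:', exc)

# 3) & binds tighter than > and <=, so the test below parses as
#    200 > (0 & 200) <= 100, i.e. (200 > 0) and (0 <= 100)
odometer = 200
print(odometer > 0 & odometer <= 100)   # True, although 200 is not in (0, 100]
print(0 < odometer <= 100)              # False: the intended range check
```

With a float reading such as 200.0, the same expression raises a TypeError instead, because & is not defined between an int and a float.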

Answer 1

You can use DataFrame.apply() with axis=1 to build a new condition column from your function, and then assign it back. First, check how the missing values in condition are actually stored, because that decides how you test for them. If they are the literal string 'NaN', you need a direct comparison, == 'NaN'. If they are real missing values (np.nan or None), use pd.isna(): a comparison with == np.nan is always False, because NaN never equals anything, not even itself, and math.isnan() raises a TypeError on the string entries of the column. df['condition'].dtype does not distinguish the two cases (both are typically object), so check (df['condition'] == 'NaN').any() and df['condition'].isna().any() instead. I included a sample database for each case below. You might also want to check the dtype of your odometer column.

There is a second bug in your comparisons: & binds more tightly than > and <=, so odometer > 0 & odometer <= 100 is evaluated as odometer > (0 & odometer) <= 100. Use and, or a chained comparison such as 0 < odometer <= 100.

import numpy as np
import pandas as pd

# Case 1: missing conditions stored as the literal string 'NaN'
df = pd.DataFrame({'condition': ['new', 'bad', 'excellent', 'NaN', 'NaN', 'new', 'bad'],
                   'odometer': [np.nan, 1100, 110, 200, 2000, 20, np.nan]})
# Case 2: missing conditions stored as real missing values (np.nan)
# df = pd.DataFrame({'condition': ['new', 'bad', 'excellent', np.nan, np.nan, 'new', 'bad'],
#                    'odometer': [np.nan, 1100, 110, 200, 2000, 20, np.nan]})

def f(row):
    condition = row['condition']
    # Only fill rows whose condition is missing; this test covers both cases above
    if pd.isna(condition) or condition == 'NaN':
        odometer = row['odometer']
        # Chained comparisons instead of &, which binds tighter than > and <=
        if 0 < odometer <= 100:
            return 'new'
        elif 100 < odometer <= 1000:
            return 'excellent'
        elif odometer > 1000:
            return 'bad'
    # Keep the existing value (also when the odometer is missing or <= 0)
    return condition

df['condition'] = df.apply(f, axis=1)
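DataFrame.apply(..., axis=1) calls a Python function once per row, which is fine for small frames. As a different, vectorized technique (not what this answer uses), the same three rules can be written as boolean masks passed to np.select and then used only where condition is missing. This sketch assumes the missing conditions are real NaN values rather than the string 'NaN' and that odometer is numeric; the names odo, conditions, choices and by_odometer are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'condition': ['new', 'bad', 'excellent', np.nan, np.nan, 'new', 'bad'],
                   'odometer': [np.nan, 1100, 110, 200, 2000, 20, np.nan]})

odo = df['odometer']
# One boolean mask per rule; comparisons with NaN are False, so missing readings match none
conditions = [(odo > 0) & (odo <= 100),
              (odo > 100) & (odo <= 1000),
              odo > 1000]
choices = ['new', 'excellent', 'bad']

# np.select takes the first matching choice per row; rows matching no rule get None
by_odometer = pd.Series(np.select(conditions, choices, default=None), index=df.index)

# Fill only the missing conditions; existing labels are left untouched
df['condition'] = df['condition'].fillna(by_odometer)
print(df)
```

Note the parentheses around each comparison: on Series, & is the element-wise "and", and it binds more tightly than > and <=, so the parentheses are required.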

Answer 2

I have a nice one-liner solution for you:

Let's create a sample DataFrame:

import pandas as pd

df = pd.DataFrame({'condition':['new','bad',None,None,None], 'odometer':[None,1100,50,500,2000]})
df
Out:    
  condition  odometer
0       new       NaN
1       bad    1100.0
2      None      50.0
3      None     500.0
4      None    2000.0

Solution:

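# Chained comparisons handle non-integer readings and the 1000 boundary; a missing reading gives None
df.condition = df.condition.fillna(df.odometer.apply(lambda number: 'new' if 0 < number <= 100 else 'excellent' if 100 < number <= 1000 else 'bad' if number > 1000 else None))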
df
Out:    
  condition  odometer
0       new        NaN
1       bad     1100.0
2       new       50.0
3 excellent      500.0
4       bad     2000.0
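The three rules are right-closed intervals, (0, 100], (100, 1000] and (1000, inf), which is what pd.cut builds with its default right=True. Here is a sketch of the same fillna idea using pd.cut instead of a per-value lambda (a swapped-in technique, assuming odometer is numeric). Readings that are missing or <= 0 fall outside every bin, come back as NaN, and therefore leave condition missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'condition': ['new', 'bad', None, None, None],
                   'odometer': [None, 1100, 50, 500, 2000]})

# Bin edges give right-closed intervals: (0, 100], (100, 1000], (1000, inf)
labels = pd.cut(df['odometer'],
                bins=[0, 100, 1000, np.inf],
                labels=['new', 'excellent', 'bad'])

# pd.cut returns a categorical Series; convert it to plain objects (NaN stays NaN)
# before using it to fill the string column
df['condition'] = df['condition'].fillna(labels.astype(object))
print(df)
```

The bin edges and labels sit in two short lists, so changing a threshold means editing one number rather than a chain of conditional expressions.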

2 answers

Answer 1
1 vote, accepted, 2020-10-07 23:47:22

Answer 2
1 vote, 2020-10-08 03:46:45