
How to fill missing values based on column patterns using pandas?

I have a data frame as shown below:

import pandas as pd
import numpy as np
df = pd.DataFrame({'source_value': ['Male', 'Female', np.nan, np.nan, np.nan, 'M'],
                   'new_id': [1, 2, 3, 4, 5, 6],
                   'month_of_birth': [11, 12, 1, 3, 5, 6],
                   'day_of_birth': [11, 21, 23, 26, 10, 12],
                   'year_of_birth': [1967, 1987, 1956, 1999, 2005, 1987],
                   'datetime_off': ['11/11/1967', '21/12/1987', '23/01/1956', '26/03/1999', '10/05/2005', '12/06/1987'],
                   'test_id': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]})

I would like to fill the missing values in the columns whose names match the keywords id, value and datetime, using a different fill value for each keyword.

I tried the following, based on startswith, endswith and contains:

col = df.columns.str
c1 = col.endswith('id')
c2 = col.contains('value')
c3 = col.contains('datetime')
missing_value_filled = np.select([c1,c2,c3],[df.fillna(0),df.fillna(np.nan),df.fillna("01/01/2000 00:00:00")])
pd.DataFrame(missing_value_filled, columns=df.columns)
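The masks c1, c2 and c3 are plain boolean arrays with one entry per column. The illustrative sketch below (added here, rebuilding only the column Index from the example DataFrame) shows which columns each mask selects and which columns match none of the three patterns:

```python
import pandas as pd

# Column names copied from the example DataFrame above
cols = pd.Index(['source_value', 'new_id', 'month_of_birth', 'day_of_birth',
                 'year_of_birth', 'datetime_off', 'test_id'])

# Same masks as in the question: boolean arrays, one entry per column
c1 = cols.str.endswith('id')
c2 = cols.str.contains('value')
c3 = cols.str.contains('datetime')

print(list(cols[c1]))               # ['new_id', 'test_id']
print(list(cols[c2]))               # ['source_value']
print(list(cols[c3]))               # ['datetime_off']
# Columns that match none of the three patterns
print(list(cols[~(c1 | c2 | c3)]))  # ['month_of_birth', 'day_of_birth', 'year_of_birth']
```

The last line is the interesting one: exactly the three birth columns fall outside every mask.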

But the problem is that it sets month_of_birth, day_of_birth and year_of_birth to zero, even though they don't match any of the patterns mentioned above. Why does this happen?
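A likely reason, added here as an explanation rather than taken from the original thread: np.select has a default parameter (0 unless you pass something else) that it uses wherever none of the conditions is True. Because c1, c2 and c3 are 1-D column masks, they broadcast down every row, so a column that matches no pattern, such as the three birth columns, is filled entirely with that default. The toy arrays below are made up purely to reproduce the effect:

```python
import numpy as np

# One column mask over three columns: only the first column "matches"
col_mask = np.array([True, False, False])
# A 2x3 table standing in for df.fillna(...)
choice = np.array([[1, 2, 3],
                   [4, 5, 6]])

# The 1-D mask broadcasts across rows; columns where no condition holds
# receive np.select's default value, which is 0
out = np.select([col_mask], [choice])
print(out)
# [[1 0 0]
#  [4 0 0]]

# Passing another default shows where the zeros came from
out_neg = np.select([col_mask], [choice], default=-1)
print(out_neg)
# [[ 1 -1 -1]
#  [ 4 -1 -1]]
```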

How can I retain the original values of the month, day and year of birth columns?

I get the following output, which is incorrect:

[Image: actual (incorrect) output]

My expected output is given below:

[Image: expected output]

Let us write our own fillna function that takes as input the DataFrame df, a list of column masks (col_masks) and the corresponding fill values (fill_values):

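def fillna(df, col_masks, fill_values):
    # Work on a copy so the caller's DataFrame is left unchanged
    df = df.copy()
    # Fill each group of columns (selected by a boolean mask) with its own value;
    # columns matched by no mask are never touched
    for m, v in zip(col_masks, fill_values):
        df.loc[:, m] = df.loc[:, m].fillna(v)
    return df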

>>> fillna(df, [c1, c2, c3], [0, np.nan, '01/01/2000 00:00:00'])

  source_value  new_id  month_of_birth  day_of_birth  year_of_birth datetime_off  test_id
0         Male       1              11            11           1967   11/11/1967      0.0
1       Female       2              12            21           1987   21/12/1987      0.0
2          NaN       3               1            23           1956   23/01/1956      0.0
3          NaN       4               3            26           1999   26/03/1999      0.0
4          NaN       5               5            10           2005   10/05/2005      0.0
5            M       6               6            12           1987   12/06/1987      0.0
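Another possible approach, sketched here and not taken from the answer above: build a {column: fill value} dict from the name patterns and pass it to DataFrame.fillna, which fills only the columns listed in the dict and leaves every other column as it is. The rules are assumptions mirroring the question (names ending in id get 0, names containing datetime get '01/01/2000 00:00:00', names containing value are skipped because filling with NaN changes nothing), and fill_map is just an illustrative name:

```python
import numpy as np
import pandas as pd

# The question's DataFrame, rebuilt so this block runs on its own
df = pd.DataFrame({'source_value': ['Male', 'Female', np.nan, np.nan, np.nan, 'M'],
                   'new_id': [1, 2, 3, 4, 5, 6],
                   'month_of_birth': [11, 12, 1, 3, 5, 6],
                   'day_of_birth': [11, 21, 23, 26, 10, 12],
                   'year_of_birth': [1967, 1987, 1956, 1999, 2005, 1987],
                   'datetime_off': ['11/11/1967', '21/12/1987', '23/01/1956',
                                    '26/03/1999', '10/05/2005', '12/06/1987'],
                   'test_id': [np.nan] * 6})

# Map each matching column name to its fill value; the first matching rule wins
fill_map = {}
for col in df.columns:
    if col.endswith('id'):
        fill_map[col] = 0
    elif 'datetime' in col:
        fill_map[col] = '01/01/2000 00:00:00'
    # columns containing 'value' are left out, so their NaNs stay as they are

# fillna with a dict only touches the columns named in it
result = df.fillna(value=fill_map)
print(result)
```

Because fillna ignores columns that are absent from the dict, the birth columns keep their original values without any explicit masking.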

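Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.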

 
© 2020-2024 STACKOOM.COM