Pandas DataFrame - Fill a column's NaN values based on the values of other columns

Question

I have a wide data frame with several years:

import numpy as np
import pandas as pd

# np.nan is used instead of np.NaN, which was removed in NumPy 2.0
df = pd.DataFrame(index=pd.Index([29925, 223725, 280165, 813285, 956765], name='ID'),
                  columns=pd.Index([1991, 1992, 1993, 1994, 1995, 1996, '2010-2012'], name='Year'),
                  data = np.array([[np.nan, np.nan, 16, 17, 18, 19, np.nan],
                                   [16, 17, 18, 19, 20, 21, np.nan],
                                   [np.nan, np.nan, np.nan, np.nan, 16, 17, 31],
                                   [np.nan, 22, 23, 24, np.nan, 26, np.nan],
                                   [36, 36, 37, 38, 39, 40, 55]]))

Year     1991  1992  1993  1994  1995  1996  2010-2012
ID                                                    
29925     NaN   NaN  16.0  17.0  18.0  19.0        NaN
223725   16.0  17.0  18.0  19.0  20.0  21.0        NaN
280165    NaN   NaN   NaN   NaN  16.0  17.0       31.0
813285    NaN  22.0  23.0  24.0   NaN  26.0        NaN
956765   36.0  36.0  37.0  38.0  39.0  40.0       55.0

The values in each row are the ages of one person, and each person has a unique ID. I want to fill the NaN values in every year of every row, based on the ages already present in that row.

For example, ID 29925 is 16 in 1993, so we know they are 15 in 1992 and 14 in 1991, and we want to replace the NaN for 29925 in the columns 1992 and 1991 with those values. Similarly, I want to replace the NaN in the column 2010-2012 based on the existing age values for 29925. Let's assume that in the 2010-2012 column 29925 is 15 years older than in 1996. What is the fastest way to do this for the whole data frame, i.e. for all IDs?

Answer 1

# imports we need later
import numpy as np
import pandas as pd

This is not a particularly efficient method, but it works. I'll leave out your last column to make things more systematic.

The df:

# First column label is 1991, as in the question (a duplicated 1992 label would break the per-column fill)
df = pd.DataFrame(index=pd.Index([29925, 223725, 280165, 813285, 956765], name='ID'),
                  columns=pd.Index([1991, 1992, 1993, 1994, 1995, 1996], name='Year'),
                  data = np.array([[np.nan, np.nan, 16, 17, 18, 19],
                                   [16, 17, 18, 19, 20, 21],
                                   [np.nan, np.nan, np.nan, np.nan, 16, 17],
                                   [np.nan, 22, 23, 24, np.nan, 26],
                                   [35, 36, 37, 38, 39, 40]]))

Calculate the year of birth for everyone (column year minus age):

dob=[]
for irow, row in enumerate(df.iterrows()):
    dob.append(np.asarray([int(each) for each in df.columns]) - np.asarray(df.iloc[irow,:]))

Or, if you are into list comprehensions:

dob = [np.asarray([int(each) for each in df.columns]) - np.asarray(df.iloc[irow,:]) for irow, row in enumerate(df.iterrows())]

Now dob is like this:

[array([  nan,   nan, 1977., 1977., 1977., 1977.]),
 array([1975., 1975., 1975., 1975., 1975., 1975.]),
 array([  nan,   nan,   nan,   nan, 1979., 1979.]),
 array([  nan, 1970., 1970., 1970.,   nan, 1970.]),
 array([1956., 1956., 1956., 1956., 1956., 1956.])]
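The same table of birth years can also be obtained without a Python-level loop. The following is only a sketch under the assumption that every column label is a plain integer year, as in the df above; the names years and dob_matrix are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same data as the answer's df (the question's last column is left out).
df = pd.DataFrame(
    index=pd.Index([29925, 223725, 280165, 813285, 956765], name='ID'),
    columns=pd.Index([1991, 1992, 1993, 1994, 1995, 1996], name='Year'),
    data=np.array([[np.nan, np.nan, 16, 17, 18, 19],
                   [16, 17, 18, 19, 20, 21],
                   [np.nan, np.nan, np.nan, np.nan, 16, 17],
                   [np.nan, 22, 23, 24, np.nan, 26],
                   [35, 36, 37, 38, 39, 40]]))

# Column labels as a float vector of shape (6,).
years = df.columns.to_numpy(dtype=float)

# NumPy broadcasting subtracts each row of ages from the year vector,
# giving a (5, 6) array of implied birth years; NaN ages stay NaN.
dob_matrix = years - df.to_numpy()
print(dob_matrix)
```

Each row of dob_matrix corresponds to one entry of the dob list built by the loop.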

Reduce dob to one value per person: remove the NaNs and take the unique value with np.unique:

dob_filtered=[np.unique(each[~np.isnan(each)])[0] for each in dob]

dob_filtered now looks like this:

[1977.0, 1975.0, 1979.0, 1970.0, 1956.0]
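Taking element [0] of np.unique silently picks the smallest birth year when a row implies more than one. A possible sanity check, sketched here with the question's values exactly as posted (ID 956765 is listed as 36 in both 1991 and 1992), counts the distinct implied birth years per row. The names implied, n_distinct and inconsistent are illustrative only.

```python
import numpy as np
import pandas as pd

# The question's six single-year columns, values exactly as posted.
df = pd.DataFrame(
    index=pd.Index([29925, 223725, 280165, 813285, 956765], name='ID'),
    columns=pd.Index([1991, 1992, 1993, 1994, 1995, 1996], name='Year'),
    data=np.array([[np.nan, np.nan, 16, 17, 18, 19],
                   [16, 17, 18, 19, 20, 21],
                   [np.nan, np.nan, np.nan, np.nan, 16, 17],
                   [np.nan, 22, 23, 24, np.nan, 26],
                   [36, 36, 37, 38, 39, 40]]))

years = df.columns.to_numpy(dtype=float)

# Implied birth year for every known age (NaN where the age is missing).
implied = pd.DataFrame(years - df.to_numpy(), index=df.index, columns=df.columns)

# Number of distinct birth years per row; NaN is not counted by nunique.
n_distinct = implied.nunique(axis=1)
inconsistent = n_distinct[n_distinct > 1].index.tolist()
print(inconsistent)  # [956765]
```

Rows flagged this way need a decision (for example, correcting the source data) before any value derived from the birth year can be trusted.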

Attach this list to the dataframe:

df['dob']=dob_filtered

Fill in the NaNs of df using the dob column:

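for row in df.index:
    # df.columns[:-1] covers every year column; only the trailing dob column is skipped
    for col in df.columns[:-1]:
        # recomputes every cell (not only the NaNs) as year minus year of birth
        df.loc[row, col] = col - df.loc[row, 'dob']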
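If only the missing cells should change and the loops are too slow for a large frame, one alternative is to build the full table of expected ages by broadcasting and pass it to fillna. This is a sketch, not the method of the answer above; it assumes integer year labels and at least one known age per row, and the names birth_year, expected and filled are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    index=pd.Index([29925, 223725, 280165, 813285, 956765], name='ID'),
    columns=pd.Index([1991, 1992, 1993, 1994, 1995, 1996], name='Year'),
    data=np.array([[np.nan, np.nan, 16, 17, 18, 19],
                   [16, 17, 18, 19, 20, 21],
                   [np.nan, np.nan, np.nan, np.nan, 16, 17],
                   [np.nan, 22, 23, 24, np.nan, 26],
                   [35, 36, 37, 38, 39, 40]]))

years = df.columns.to_numpy(dtype=float)

# One birth year per row, ignoring NaN (assumes no row is entirely NaN).
birth_year = np.nanmin(years - df.to_numpy(), axis=1)

# Expected age in every cell: year (along columns) minus birth year (along rows).
expected = pd.DataFrame(years[None, :] - birth_year[:, None],
                        index=df.index, columns=df.columns)

# fillna with a DataFrame only touches cells that are NaN in df.
filled = df.fillna(expected)
print(filled)
```

Unlike the double loop, this leaves the ages that were already present exactly as they were and does not add a helper column to df.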

Delete the dob column (only to get back the original columns; otherwise not important):

df = df.drop(['dob'], axis=1)  # drop returns a new frame, so assign it back

Obtaining:

Year    1991    1992    1993    1994    1995    1996
ID                      
29925   14.0    15.0    16.0    17.0    18.0    19.0
223725  16.0    17.0    18.0    19.0    20.0    21.0
280165  12.0    13.0    14.0    15.0    16.0    17.0
813285  21.0    22.0    23.0    24.0    25.0    26.0
956765  35.0    36.0    37.0    38.0    39.0    40.0
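The answer leaves out the 2010-2012 column. One way to include it, sketched here under the question's stated assumption that this column is 15 years after 1996, is to treat the label as the single reference year 2011. That mapping, and the names ref_year, is_single, birth_year, expected and filled, are assumptions made for illustration. The birth year is derived from the single-year columns only, and existing values in 2010-2012 are kept as posted.

```python
import numpy as np
import pandas as pd

# The question's full frame, values exactly as posted.
df = pd.DataFrame(
    index=pd.Index([29925, 223725, 280165, 813285, 956765], name='ID'),
    columns=pd.Index([1991, 1992, 1993, 1994, 1995, 1996, '2010-2012'], name='Year'),
    data=np.array([[np.nan, np.nan, 16, 17, 18, 19, np.nan],
                   [16, 17, 18, 19, 20, 21, np.nan],
                   [np.nan, np.nan, np.nan, np.nan, 16, 17, 31],
                   [np.nan, 22, 23, 24, np.nan, 26, np.nan],
                   [36, 36, 37, 38, 39, 40, 55]]))

# Assumption: '2010-2012' stands for the year 2011 (1996 + 15).
ref_year = np.array([2011.0 if col == '2010-2012' else float(col)
                     for col in df.columns])
is_single = np.array([col != '2010-2012' for col in df.columns])

ages = df.to_numpy(dtype=float)

# Birth year per row from the single-year columns only, ignoring NaN.
birth_year = np.nanmin(ref_year[is_single] - ages[:, is_single], axis=1)

# Expected age in every cell, then fill only the cells that are NaN.
expected = pd.DataFrame(ref_year[None, :] - birth_year[:, None],
                        index=df.index, columns=df.columns)
filled = df.fillna(expected)
print(filled)
```

With this mapping, ID 29925 gets 34.0 in 2010-2012, which is 15 more than the 19 recorded for 1996. Values that were already present, such as the 31 for ID 280165, are not modified even where they disagree with the 15-year assumption.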

Solution 1 (accepted; score 2), 2020-07-24 18:54:06
