
Fill missing value based on value from another column in the same row

I have a DataFrame that looks like this:

ColA | ColB | ColC | ColD |
-----|------|------|------|
100  |   A  |  X1  |  NaN |
200  |   B  |  X2  |  AAA |
300  |   C  |  X3  |  NaN |

I want to fill the missing values in ColD based on the value in ColA of the same row. The result I need is:

if value in ColA = 100 then value in ColD = "BBB"
if value in ColA = 300 then value in ColD = "CCC"

ColA | ColB | ColC | ColD |
-----|------|------|------|
100  |   A  |  X1  |  BBB |
200  |   B  |  X2  |  AAA |
300  |   C  |  X3  |  CCC |

You can use combine_first or fillna. Passing another column fills each NaN in ColD with the ColA value from the same row:

df.ColD = df.ColD.combine_first(df.ColA)
print (df)
   ColA ColB ColC ColD
0   100    A   X1  100
1   200    B   X2  AAA
2   300    C   X3  300

Or:

df.ColD = df.ColD.fillna(df.ColA)
print (df)
   ColA ColB ColC ColD
0   100    A   X1  100
1   200    B   X2  AAA
2   300    C   X3  300
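
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the fillna variant. The frame is rebuilt from the table in the question; the explicit object dtype for ColD and the name `filled` are my own assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's table.
# ColD is forced to object dtype (an assumption) so it can hold
# both strings and the integers copied over from ColA.
df = pd.DataFrame({
    "ColA": [100, 200, 300],
    "ColB": ["A", "B", "C"],
    "ColC": ["X1", "X2", "X3"],
    "ColD": pd.Series([np.nan, "AAA", np.nan], dtype=object),
})

# fillna with a Series aligns on the index, so each NaN in ColD
# receives the ColA value from the same row.
filled = df["ColD"].fillna(df["ColA"])
print(filled.tolist())
```

Only the missing entries are touched; row 1 keeps "AAA".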

EDIT: First use map to build a Series s from ColA, then pass that Series to combine_first or fillna:

d = {100: "BBB", 300: "CCC"}
s = df.ColA.map(d)
print (s)
0    BBB
1    NaN
2    CCC
Name: ColA, dtype: object

df.ColD = df.ColD.combine_first(s)
print (df)
   ColA ColB ColC ColD
0   100    A   X1  BBB
1   200    B   X2  AAA
2   300    C   X3  CCC
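
The two steps can also be wrapped in a small helper. This is only a sketch: `fill_from_mapping` and its parameter names are hypothetical, and it uses fillna where the answer uses combine_first (both fill only the missing entries here).

```python
import numpy as np
import pandas as pd


def fill_from_mapping(frame, source, target, mapping):
    """Return a copy of `frame` where NaN in `target` is filled
    with mapping[value of `source`] from the same row.

    Hypothetical helper for illustration only."""
    out = frame.copy()
    # map() turns the source column into candidate fill values;
    # keys that are not in `mapping` become NaN.
    candidates = out[source].map(mapping)
    # fillna() only writes where `target` is currently missing.
    out[target] = out[target].fillna(candidates)
    return out


df = pd.DataFrame({
    "ColA": [100, 200, 300],
    "ColB": ["A", "B", "C"],
    "ColC": ["X1", "X2", "X3"],
    "ColD": pd.Series([np.nan, "AAA", np.nan], dtype=object),
})

result = fill_from_mapping(df, "ColA", "ColD", {100: "BBB", 300: "CCC"})
print(result)
```

Returning a copy keeps the original df unchanged, which makes it easier to compare before and after.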

It replaces only NaN values; existing values are left untouched:

print (df)
   ColA ColB ColC ColD
0   100    A   X1  EEE <- changed value to EEE
1   200    B   X2  AAA
2   300    C   X3  NaN

d = {100: "BBB", 300: "CCC"}
s = df.ColA.map(d)
df.ColD = df.ColD.combine_first(s)
print (df)
   ColA ColB ColC ColD
0   100    A   X1  EEE
1   200    B   X2  AAA
2   300    C   X3  CCC
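
A related edge case, sketched under an assumption: the extra row with ColA = 400 below is not in the question and is added only to show what happens when a key is missing from the dict. Because map returns NaN for unmapped keys, that row stays NaN.

```python
import numpy as np
import pandas as pd

# Row 0 already has a value (EEE), row 2 is missing and mapped,
# row 3 is missing but 400 is NOT a key in d (hypothetical extra row).
df = pd.DataFrame({
    "ColA": [100, 200, 300, 400],
    "ColD": pd.Series(["EEE", "AAA", np.nan, np.nan], dtype=object),
})

d = {100: "BBB", 300: "CCC"}
s = df["ColA"].map(d)             # BBB, NaN, CCC, NaN
df["ColD"] = df["ColD"].fillna(s)
print(df)
```

If unmapped rows need a fallback, a second fillna with a constant default is one option.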

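Define a mapping function. Note that it returns "CCC" for every value other than 100, which is fine here because the only rows with a missing ColD have ColA equal to 100 or 300: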

def my_map_func(x):
    return "BBB" if x==100 else "CCC"

Right now, df looks like:

ColA | ColB | ColC | ColD
-----|------|------|-----
100  |    A |   X1 |  NaN
200  |    B |   X2 |  AAA
300  |    C |   X3 |  NaN

Select the rows where ColD is NaN and fill them with the mapped value obtained from column ColA:

df.loc[df.ColD.isnull(), 'ColD'] = df.loc[df.ColD.isnull(), 'ColA'].apply(my_map_func)

Here we select only the rows for which ColD is NaN by indexing with a boolean Series, and pick the column we are interested in. In simple terms: df.loc[selected_rows, selected_columns].

Now, dataframe df looks like:

ColA | ColB | ColC | ColD
-----|------|------|-----
100  |    A |   X1 |  BBB
200  |    B |   X2 |  AAA
300  |    C |   X3 |  CCC
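
For reference, here is a self-contained sketch of the boolean-mask approach written with .loc, since .ix was removed in pandas 1.0. The variable name `mask` is my own; the frame is rebuilt from the table above.

```python
import numpy as np
import pandas as pd


def my_map_func(x):
    # Same function as in the answer: anything that is not 100 becomes "CCC".
    return "BBB" if x == 100 else "CCC"


df = pd.DataFrame({
    "ColA": [100, 200, 300],
    "ColB": ["A", "B", "C"],
    "ColC": ["X1", "X2", "X3"],
    "ColD": pd.Series([np.nan, "AAA", np.nan], dtype=object),
})

# Compute the mask once so both sides of the assignment use the same rows.
mask = df["ColD"].isna()
df.loc[mask, "ColD"] = df.loc[mask, "ColA"].apply(my_map_func)
print(df)
```

Storing the mask first avoids evaluating isnull() twice and keeps the two selections in sync.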


 