
Fill a new column of a pandas DataFrame based on if-else conditions on other columns

I want to create a new column in a pandas DataFrame and populate it according to conditions involving two other columns. For example:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.array([['value1','value2'],['value',np.nan],[np.nan,np.nan]]), columns=['col1','col2'])

I would like to create a new column, 'new col', that contains:
1) the value in 'col2' if it is not NaN; otherwise
2) the value in 'col1' if it is not NaN; otherwise
3) NaN.

I tried the following function with .apply(), but it does not return the desired result:

def singleval(row):
    if row['col2'] != np.nan:
        val = row['col2']
    elif row['col1'] != np.nan:
        val = row['col1']
    else:
        val = np.nan
    return val

df['new col'] = df.apply(singleval,axis=1)

I want the values in 'new col' to be ['value2', 'value', NaN].

Method 1 fillna

In this case, we can simply fill the missing values in col2 with the values from col1 using fillna:

df['new col'] = df['col2'].fillna(df['col1'])

     col1    col2 new col
0  value1  value2  value2
1   value     NaN   value
2     NaN     NaN     NaN
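fillna with a Series aligns on the index, so each missing value in col2 is replaced by col1 from the same row. If there were more than two columns in priority order, the same call could be chained. A minimal sketch, assuming a hypothetical helper `coalesce` and a hypothetical extra column `col0` that is not in the question:

```python
from functools import reduce

import numpy as np
import pandas as pd


def coalesce(frame, columns):
    """Hypothetical helper: per row, the first non-missing value among
    `columns`, where earlier columns take priority."""
    # Start from the highest-priority column and fill its gaps
    # from each following column in turn.
    return reduce(lambda acc, col: acc.fillna(frame[col]),
                  columns[1:], frame[columns[0]])


df = pd.DataFrame({'col0': [np.nan, np.nan, 'value0'],  # hypothetical column
                   'col1': ['value1', 'value', np.nan],
                   'col2': ['value2', np.nan, np.nan]})

df['new col'] = coalesce(df, ['col2', 'col1', 'col0'])
print(df['new col'].tolist())  # ['value2', 'value', 'value0']
```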

Method 2 np.select

If you have multiple conditions, use np.select: pass it a list of conditions and a list of choices of the same length, plus a default value for rows where no condition holds:

conditions = [
    df['col2'].notnull(),
    df['col1'].notnull(),
]

choices = [df['col2'], df['col1']]

df['new col'] = np.select(conditions, choices, default=np.nan)

     col1    col2 new col
0  value1  value2  value2
1   value     NaN   value
2     NaN     NaN     NaN
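For each row, np.select uses the choice paired with the first condition that is True, so the order of the lists sets the priority. A small sketch with numeric data; the column name `x` and the labels are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [5, 15, 25, -1]})

# 15 and 25 satisfy more than one condition; the first match in the
# list decides the label, so the strictest test comes first.
conditions = [df['x'] > 20, df['x'] > 10, df['x'] > 0]
choices = ['big', 'medium', 'small']

df['label'] = np.select(conditions, choices, default='none')
print(df['label'].tolist())  # ['small', 'medium', 'big', 'none']
```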

Note

Your DataFrame doesn't contain real NaN values: np.array turns the whole array into strings, so each NaN becomes the text 'nan'. Use this one instead to test:

df = pd.DataFrame({'col1':['value1', 'value', np.nan],
                   'col2':['value2', np.nan, np.nan]})
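A quick check of the difference between the two constructions, counting how many cells pandas treats as missing in each:

```python
import numpy as np
import pandas as pd

arr = np.array([['value1', 'value2'], ['value', np.nan], [np.nan, np.nan]])
print(arr.dtype.kind)   # 'U': a fixed-width Unicode string array
print(arr[1, 1])        # nan  (the string 'nan', not a float)

bad = pd.DataFrame(arr, columns=['col1', 'col2'])
good = pd.DataFrame({'col1': ['value1', 'value', np.nan],
                     'col2': ['value2', np.nan, np.nan]})

# Only the dict-built frame contains real missing values.
print(bad.isna().to_numpy().sum())   # 0
print(good.isna().to_numpy().sum())  # 3
```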

Edit: why was the function not working?

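np.nan == np.nan returns False, while np.nan is np.nan returns True.

See this question for an explanation.

Because NaN compares unequal to everything, including itself, row['col2'] != np.nan is True for every row, so your function always takes the first branch and returns col2.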
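A few checks of these comparisons, together with math.isnan and pd.isna, the usual ways to test for NaN instead of == or !=:

```python
import math

import numpy as np
import pandas as pd

x = np.nan

print(x == x)         # False: NaN is not equal to itself
print(x != x)         # True
print(x is np.nan)    # True, only because x is the very same object
print(math.isnan(x))  # True
print(pd.isna(x))     # True

# The comparison from the question is True for a string and for NaN alike,
# so it never distinguishes missing from present values.
for value in ('value2', np.nan):
    print(value != np.nan)  # True both times
```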

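So to fix your function, test for missing values explicitly. row['col2'] is not np.nan appears to work on the DataFrame above, but only when the cell holds that exact np.nan object; pd.notna is the more robust test:

def singleval(row):
    # pd.notna is False for missing markers such as NaN, None and pd.NA
    if pd.notna(row['col2']):
        val = row['col2']
    elif pd.notna(row['col1']):
        val = row['col1']
    else:
        val = np.nan
    return val

df['new col'] = df.apply(singleval, axis=1)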

     col1    col2 new col
0  value1  value2  value2
1   value     NaN   value
2     NaN     NaN     NaN
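To see where an identity check against np.nan breaks down: in a float64 column pandas hands back a missing value as a new np.float64 scalar, and None or pd.NA are different objects altogether. A short check, assuming pandas 1.0 or later (where pd.NA exists):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, np.nan])  # float64 column
missing = s[1]                # a np.float64 NaN, not the np.nan object

print(missing is np.nan)  # False, even though the value is missing
print(pd.isna(missing))   # True

# pd.isna treats all of these as missing; an identity test catches only np.nan.
markers = [None, pd.NA, np.nan, missing]
print([bool(pd.isna(m)) for m in markers])  # [True, True, True, True]
print([m is np.nan for m in markers])       # [False, False, True, False]
```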

Another option is to stack the two columns and keep the last value in each row:

df['col3'] = df[['col1','col2']].stack().groupby(level=0).last()

Output:

     col1    col2    col3
0  value1  value2  value2
1   value     NaN   value
2     NaN     NaN     NaN
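Step by step: stack() reshapes the two columns into one long Series indexed by (row label, column name), groupby(level=0) groups it back by the original row, and last() keeps the last non-missing value, which is col2 when present and col1 otherwise. A sketch of the intermediate steps; the explicit .dropna() is an addition, since whether stack() drops NaN by default depends on the pandas version:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['value1', 'value', np.nan],
                   'col2': ['value2', np.nan, np.nan]})

# Long format: one entry per (row label, column name) pair, NaNs removed.
long_form = df[['col1', 'col2']].stack().dropna()

# Per row label, keep the last value: col2 if it was present, else col1.
# Rows with no values at all (row 2) do not appear in this result.
per_row = long_form.groupby(level=0).last()

# Assignment aligns on the index, so the absent row 2 becomes NaN.
df['col3'] = per_row
print(df)
```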

Use df.ffill with axis=1 (pass axis as a keyword; recent pandas versions do not accept it positionally):

df['new_col'] = df.ffill(axis=1)['col2']

Output:
     col1    col2 new_col
0  value1  value2  value2
1   value     NaN   value
2     NaN     NaN     NaN
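ffill(axis=1) fills each missing cell from the nearest non-missing cell to its left in the same row, so every column to the left of col2 takes part. When the DataFrame has other columns, it is safer to select the relevant ones first, ordered from lowest to highest priority. A sketch with a hypothetical extra column `other`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'other': ['x', 'y', 'z'],  # hypothetical extra column
                   'col1': ['value1', 'value', np.nan],
                   'col2': ['value2', np.nan, np.nan]})

# Forward-filling the whole frame lets 'other' leak into row 2.
whole = df.ffill(axis=1)['col2']

# Restricting to the relevant columns keeps row 2 missing.
subset = df[['col1', 'col2']].ffill(axis=1)['col2']

print(whole.tolist())   # ['value2', 'value', 'z']
print(subset.tolist())  # row 2 stays missing
```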

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For questions, contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM