I have a situation where I want to create a new column in a Pandas DataFrame and populate it according to conditions involving 2 other columns. In this example:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([['value1','value2'],['value',np.nan],[np.nan,np.nan]]), columns=['col1','col2'])
I would like to create a new column, 'new col', which contains 1) the value in 'col2' if it is not NaN; otherwise 2) the value in 'col1' if it is not NaN; otherwise 3) NaN.
I am trying this function with .apply(), but it is not returning the desired result:
def singleval(row):
    if row['col2'] != np.nan:
        val = row['col2']
    elif row['col1'] != np.nan:
        val = row['col1']
    else:
        val = np.nan
    return val
df['new col'] = df.apply(singleval,axis=1)
I want the values in 'new col' to be ['value2', 'value', NaN].
fillna
In this case, we can simply use fillna on col2 with values from col1:
df['new col'] = df['col2'].fillna(df['col1'])
     col1    col2 new col
0  value1  value2  value2
1   value     NaN   value
2     NaN     NaN     NaN
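As a small self-contained check of the fillna approach, the sketch below rebuilds the example frame with real NaN values and compares fillna with combine_first, which expresses the same "prefer this column, else fall back to the other" rule. The names filled and combined are only illustrative.

```python
import numpy as np
import pandas as pd

# Example frame with real NaN values (not the string 'nan').
df = pd.DataFrame({'col1': ['value1', 'value', np.nan],
                   'col2': ['value2', np.nan, np.nan]})

# Take col2 where present, otherwise the row-aligned value from col1.
filled = df['col2'].fillna(df['col1'])

# combine_first applies the same rule: keep self, fill gaps from other.
combined = df['col2'].combine_first(df['col1'])

print(filled.tolist())
print(filled.equals(combined))
```

Both calls align on the index, so they only give the intended result when the two columns come from the same frame or share an index.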
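np.select
If you have multiple conditions, use np.select. You pass it a list of conditions and a matching list of choices; for each row it takes the choice belonging to the first condition that is True, and falls back to default when none is True: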
conditions = [
    df['col2'].notnull(),
    df['col1'].notnull(),
]
choices = [df['col2'], df['col1']]
df['new col'] = np.select(conditions, choices, default=np.nan)
     col1    col2 new col
0  value1  value2  value2
1   value     NaN   value
2     NaN     NaN     NaN
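To make the ordering rule concrete, here is a minimal sketch on the same example data. In row 0 both conditions are True, and the first one (col2) wins. The variable name result is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['value1', 'value', np.nan],
                   'col2': ['value2', np.nan, np.nan]})

# Conditions are checked in list order; the first True one decides the row.
conditions = [df['col2'].notna(), df['col1'].notna()]
choices = [df['col2'], df['col1']]

# np.select returns a NumPy array; default is used where no condition holds.
result = np.select(conditions, choices, default=np.nan)

print(result)
```

Because the result is a plain array, assigning it to a column relies on row order rather than on index alignment.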
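Note
Your DataFrame did not contain real NaN values: np.array coerces the mix of strings and NaN into a string array, so every NaN becomes the string 'nan'. Use this one instead to test: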
df = pd.DataFrame({'col1': ['value1', 'value', np.nan],
                   'col2': ['value2', np.nan, np.nan]})
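The difference between the two constructions can be checked directly. This is a small sketch; the names arr, broken and fixed are only illustrative.

```python
import numpy as np
import pandas as pd

# Mixing strings and NaN in np.array yields a string array,
# so NaN is stored as the text 'nan'.
arr = np.array([['value1', 'value2'], ['value', np.nan], [np.nan, np.nan]])
broken = pd.DataFrame(arr, columns=['col1', 'col2'])

# Building from a dict of lists keeps NaN as a real missing value.
fixed = pd.DataFrame({'col1': ['value1', 'value', np.nan],
                      'col2': ['value2', np.nan, np.nan]})

print(arr.dtype.kind)                    # 'U' means a Unicode string array
print(int(broken.isna().sum().sum()))    # number of missing values detected
print(int(fixed.isna().sum().sum()))
```

With the string version, isna, fillna and notnull see no missing values at all, which is why none of the approaches here would behave as intended on the original frame.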
Edit: why was the function not working?
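np.nan == np.nan returns False (so np.nan != np.nan is always True), while np.nan is np.nan returns True. That is why the first branch of your function was taken for every row.
See this question for the explanation of this.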
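The comparison rules can be verified in a few lines. The sketch below also shows that an identity check only recognises the np.nan object itself, not another NaN float; other_nan is a hypothetical name used for illustration.

```python
import numpy as np
import pandas as pd

eq = np.nan == np.nan        # False: NaN never compares equal, even to itself
ne = np.nan != np.nan        # True: so a "!= np.nan" test always passes
same_obj = np.nan is np.nan  # True: identity of one and the same object

# A NaN created elsewhere is a different object, so "is" does not match it.
other_nan = float('nan')
other_is = other_nan is np.nan   # False

# pd.isna checks the value rather than the identity.
other_isna = pd.isna(other_nan)  # True

print(eq, ne, same_obj, other_is, other_isna)
```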
So to fix your function you have to use is not:
def singleval(row):
    if row['col2'] is not np.nan:
        val = row['col2']
    elif row['col1'] is not np.nan:
        val = row['col1']
    else:
        val = np.nan
    return val
df['new col'] = df.apply(singleval, axis=1)
     col1    col2 new col
0  value1  value2  value2
1   value     NaN   value
2     NaN     NaN     NaN
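The identity check works here because every missing value in the frame is the np.nan object itself. As a possibly more robust variant (an assumption on my part, not part of the answer above), the same function can test with pd.notna, which also handles None and NaN floats created elsewhere. The name singleval_notna is hypothetical.

```python
import numpy as np
import pandas as pd

def singleval_notna(row):
    # Hypothetical variant of singleval that tests values with pd.notna
    # instead of comparing object identity against np.nan.
    if pd.notna(row['col2']):
        return row['col2']
    if pd.notna(row['col1']):
        return row['col1']
    return np.nan

# Missing values written three different ways on purpose.
df = pd.DataFrame({'col1': ['value1', 'value', None],
                   'col2': ['value2', float('nan'), np.nan]})
df['new col'] = df.apply(singleval_notna, axis=1)

print(df)
```

A row-wise apply is still much slower than fillna or np.select on large frames, so this is mainly useful when the per-row logic cannot be vectorised.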
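Try this. stack() reshapes the two columns into one long Series, and groupby(level=0).last() then takes the last non-null value for each original row: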
df['col3'] = df[['col1','col2']].stack().groupby(level=0).last()
output:
     col1    col2    col3
0  value1  value2  value2
1   value     nan   value
2     nan     nan     nan
Use df.ffill on axis=1 (pass axis as a keyword, since recent pandas versions no longer accept it positionally):
df['new_col'] = df.ffill(axis=1)['col2']
Output:
     col1    col2 new_col
0  value1  value2  value2
1   value     NaN   value
2     NaN     NaN     NaN
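The same idea extends to any number of columns. This is a sketch under the assumption that the columns are listed from lowest to highest priority; priority and new_col are illustrative names. Forward-filling across the row carries the last non-null value to the right, so the final column of the filled frame holds the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['value1', 'value', np.nan],
                   'col2': ['value2', np.nan, np.nan]})

# Columns ordered from lowest to highest priority (an assumption of this sketch).
priority = ['col1', 'col2']

# Restrict to the relevant columns, fill left to right, keep the last column.
new_col = df[priority].ffill(axis=1).iloc[:, -1]

print(new_col.tolist())
```

Selecting the columns explicitly also keeps unrelated columns in the frame from leaking into the result.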