I'm doing a university job with a pandas dataframe, as below:
import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': [np.NaN, 11.2, 10, np.NaN, 1000],
'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
'col2': [np.NaN, np.NaN, 30, 40, np.NaN],
'col3': [np.NaN, 2, 30, 44, np.NaN]
})
print(df)
col1 sensor_name col2 col3
NaN water NaN NaN
11.2 strain NaN 2.0
10.0 fog 30.0 30.0
NaN water 40.0 44.0
1000.0 fog NaN NaN
I would like to join columns 1, 2 and 3, to avoid the NaN values. The priority would be the value in col1, if any. Then the priority would be 'col2' and finally 'col3'.
I tried to do the following code:
df['new_column'] = df.ffill(axis=1)['col3']
The output is:
col1 sensor_name col2 col3 new_column
NaN water NaN NaN water
11.2 strain NaN 2.0 2.0
10.0 fog 30.0 30.0 30.0
NaN water 40.0 44.0 44.0
NaN fog NaN NaN fog
However, the desired output is:
col1 sensor_name col2 col3 new_column
NaN water NaN NaN NaN
11.2 strain NaN 2.0 11.2
10.0 fog 30.0 30.0 10.0
NaN water 40.0 44.0 40.0
1000.0 fog NaN NaN 1000.0
One of the many ways you can achieve is using pandas.apply
function.
import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': [np.NaN, 11.2, 10, np.NaN, 1000],
'sensor_name': ['water', 'strain', 'fog', 'water', 'fog'],
'col2': [np.NaN, np.NaN, 30, 40, np.NaN],
'col3': [np.NaN, 2, 30, 44, np.NaN]
})
print(df)
def apply_func(row):
if not pd.isna(row['col1']):
return row['col1']
elif not pd.isna(row['col2']):
return row['col2']
return row["col3"]
df["new_cols"]=df.apply(apply_func,axis=1)
print(df)
Output:
col1 sensor_name col2 col3 new_cols
0 NaN water NaN NaN NaN
1 11.2 strain NaN 2.0 11.2
2 10.0 fog 30.0 30.0 10.0
3 NaN water 40.0 44.0 40.0
4 1000.0 fog NaN NaN 1000.0
Filter the column
df['new'] = df.filter(like='col').bfill(1)['col1']
df
Out[324]:
col1 sensor_name col2 col3 new
0 NaN water NaN NaN NaN
1 11.2 strain NaN 2.0 11.2
2 10.0 fog 30.0 30.0 10.0
3 NaN water 40.0 44.0 40.0
4 1000.0 fog NaN NaN 1000.0
try this:
f = df.filter(regex='col\d')
res = df.assign(new_column=f.where(f.notnull().cumsum(axis=1).eq(1)).max(axis=1))
print(res)
>>>
col1 sensor_name col2 col3 new_column
0 NaN water NaN NaN NaN
1 11.2 strain NaN 2.0 11.2
2 10.0 fog 30.0 30.0 10.0
3 NaN water 40.0 44.0 40.0
4 1000.0 fog NaN NaN 1000.0
df['new_column'] = df['col1']
index = df[df['new_column'].isna()].index
df.iloc[index,4]= df.iloc[index,2]
index = df[df['new_column'].isna()].index
df.iloc[index,4]= df.iloc[index,3]
You could create a for loop if you had more columns to consider and/or wanted it to be more automated.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.