I have a data frame that looks like this:
boat_type    boat_type_2
Not Known    Not Known
Not Known    kayak
ship         Not Known
Not Known    Not Known
ship         Not Known
And I want to create a third column, boat_type_final, that should look like this:
boat_type    boat_type_2    boat_type_final
Not Known    Not Known      cruise
Not Known    kayak          kayak
ship         Not Known      ship
Not Known    Not Known      cruise
ship         Not Known      ship
So basically, if 'Not Known' is present in both boat_type and boat_type_2, then the value should be 'cruise'. But if there is a string other than 'Not Known' in either of the first two columns, then boat_type_final should be filled with that string, either 'kayak' or 'ship'.
What's the most elegant way to do this? I've seen several options, such as where, creating a function, and/or logic, and I'd like to know what a true Pythonista would do.
Here's my code so far:
import pandas as pd
import numpy as np
data = [{'boat_type': 'Not Known', 'boat_type_2': 'Not Known'},
{'boat_type': 'Not Known', 'boat_type_2': 'kayak'},
{'boat_type': 'ship', 'boat_type_2': 'Not Known'},
{'boat_type': 'Not Known', 'boat_type_2': 'Not Known'},
{'boat_type': 'ship', 'boat_type_2': 'Not Known'}]
df = pd.DataFrame(data)
df['boat_type_final'] = np.where(df.boat_type.str.contains('Not'))...
Use:
df['boat_type_final'] = (df.replace('Not Known', np.nan)
                           .ffill(axis=1)
                           .iloc[:, -1]
                           .fillna('cruise'))
print(df)
   boat_type boat_type_2 boat_type_final
0  Not Known   Not Known          cruise
1  Not Known       kayak           kayak
2       ship   Not Known            ship
3  Not Known   Not Known          cruise
4       ship   Not Known            ship
Explanation:
First, replace 'Not Known' with missing values:
print (df.replace('Not Known',np.nan))
   boat_type boat_type_2
0        NaN         NaN
1        NaN       kayak
2       ship         NaN
3        NaN         NaN
4       ship         NaN
Then fill the NaNs by forward filling along each row:
print (df.replace('Not Known',np.nan).ffill(axis=1))
   boat_type boat_type_2
0        NaN         NaN
1        NaN       kayak
2       ship        ship
3        NaN         NaN
4       ship        ship
Select the last column by position with iloc:
print (df.replace('Not Known',np.nan).ffill(axis=1).iloc[:, -1])
0 NaN
1 kayak
2 ship
3 NaN
4 ship
Name: boat_type_2, dtype: object
If any NaNs remain, replace them with fillna:
print (df.replace('Not Known',np.nan).ffill(axis=1).iloc[:, -1].fillna('cruise'))
0 cruise
1 kayak
2 ship
3 cruise
4 ship
Name: boat_type_2, dtype: object
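The chain above runs over every column of df, so it relies on the two boat columns being the only ones present. A minimal sketch of the same steps restricted to chosen columns follows; the helper name coalesce_columns and the extra length_m column are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd


def coalesce_columns(df, cols, missing="Not Known", default="cruise"):
    """Hypothetical helper: last non-missing value across `cols`, else `default`."""
    return (
        df[cols]
        .replace(missing, np.nan)  # treat the placeholder as a missing value
        .ffill(axis=1)             # carry known values to the right, row by row
        .iloc[:, -1]               # the last column now holds the carried value
        .fillna(default)           # rows where nothing was known
    )


df = pd.DataFrame({
    "boat_type":   ["Not Known", "Not Known", "ship", "Not Known", "ship"],
    "boat_type_2": ["Not Known", "kayak", "Not Known", "Not Known", "Not Known"],
    "length_m":    [10, 4, 120, 15, 90],  # made-up unrelated column
})

# Only the two boat columns take part in the forward fill.
df["boat_type_final"] = coalesce_columns(df, ["boat_type", "boat_type_2"])
print(df)
```

Selecting the columns first keeps unrelated data such as length_m out of the row-wise fill.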
Another solution, if there are only a few columns, is to use numpy.select:
m1 = df['boat_type'] == 'ship'
m2 = df['boat_type_2'] == 'kayak'
df['boat_type_final'] = np.select([m1, m2], ['ship','kayak'], default='cruise')
print (df)
   boat_type boat_type_2 boat_type_final
0  Not Known   Not Known          cruise
1  Not Known       kayak           kayak
2       ship   Not Known            ship
3  Not Known   Not Known          cruise
4       ship   Not Known            ship
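The masks above hard-code 'ship' and 'kayak'. If other boat names can appear, one possible variation is to test for "anything other than 'Not Known'" and pass the columns themselves as the choices. This is a sketch under that assumption, not part of the original answer; the mask names known_1 and known_2 are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "boat_type":   ["Not Known", "Not Known", "ship", "Not Known", "ship"],
    "boat_type_2": ["Not Known", "kayak", "Not Known", "Not Known", "Not Known"],
})

# True where the column holds a real value rather than the placeholder.
known_1 = df["boat_type"] != "Not Known"
known_2 = df["boat_type_2"] != "Not Known"

# np.select takes the first condition that is True in each row,
# so boat_type wins over boat_type_2 when both are known.
df["boat_type_final"] = np.select(
    [known_1, known_2],
    [df["boat_type"], df["boat_type_2"]],
    default="cruise",
)
print(df)
```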
Another solution is to define your own function that holds the mapping logic:
def my_func(row):
    if row['boat_type'] != 'Not Known':
        return row['boat_type']
    elif row['boat_type_2'] != 'Not Known':
        return row['boat_type_2']
    else:
        return 'cruise'
[Note: you did not mention what should happen when neither column is 'Not Known'. The function above returns the value from boat_type in that case.]
Then simply apply the function:
df.loc[:,'boat_type_final'] = df.apply(my_func, axis=1)
print(df)
Output:
   boat_type boat_type_2 boat_type_final
0  Not Known   Not Known          cruise
1  Not Known       kayak           kayak
2       ship   Not Known            ship
3  Not Known   Not Known          cruise
4       ship   Not Known            ship
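The approaches in this thread agree on the sample data, but they can differ when both columns hold a known value, a case the question does not specify. The sketch below uses a hypothetical one-row frame (not from the question) to show the difference; swapping ffill for bfill and taking the first column is one way to make the fill-based chain prefer boat_type.

```python
import numpy as np
import pandas as pd


def my_func(row):
    # Same logic as the row-wise function: boat_type has priority.
    if row["boat_type"] != "Not Known":
        return row["boat_type"]
    elif row["boat_type_2"] != "Not Known":
        return row["boat_type_2"]
    else:
        return "cruise"


# Hypothetical row where both columns are known.
edge = pd.DataFrame({"boat_type": ["ship"], "boat_type_2": ["kayak"]})
cleaned = edge.replace("Not Known", np.nan)

# Forward fill + last column: the rightmost known value wins.
via_ffill = cleaned.ffill(axis=1).iloc[:, -1].fillna("cruise")
# Backward fill + first column: the leftmost known value wins.
via_bfill = cleaned.bfill(axis=1).iloc[:, 0].fillna("cruise")
# Row-wise function: boat_type is checked first.
via_apply = edge.apply(my_func, axis=1)

print(via_ffill.iloc[0])  # kayak
print(via_bfill.iloc[0])  # ship
print(via_apply.iloc[0])  # ship
```

Which result is right depends on which column should take priority, so it is worth deciding that before picking an approach.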