Create pandas data frame column based on strings from two other columns

Question

I have a data frame that looks like this:

boat_type   boat_type_2
Not Known   Not Known
Not Known   kayak
ship        Not Known
Not Known   Not Known
ship        Not Known

I want to create a third column, boat_type_final, that looks like this:

boat_type   boat_type_2  boat_type_final
Not Known   Not Known    cruise
Not Known   kayak        kayak
ship        Not Known    ship  
Not Known   Not Known    cruise
ship        Not Known    ship

So basically, if 'Not Known' is present in both boat_type and boat_type_2, the value should be 'cruise'. But if either of the first two columns holds a string other than 'Not Known', boat_type_final should be filled with that string ('kayak' or 'ship').

What's the most elegant way to do this? I've seen several options, such as using where, writing a function, or combining conditions with and/or logic, and I'd like to know what a true Pythonista would do.

Here's my code so far:

import pandas as pd
import numpy as np

data = [{'boat_type': 'Not Known', 'boat_type_2': 'Not Known'},
        {'boat_type': 'Not Known', 'boat_type_2': 'kayak'},
        {'boat_type': 'ship', 'boat_type_2': 'Not Known'},
        {'boat_type': 'Not Known', 'boat_type_2': 'Not Known'},
        {'boat_type': 'ship', 'boat_type_2': 'Not Known'}]
df = pd.DataFrame(data)
# incomplete attempt:
# df['boat_type_final'] = np.where(df.boat_type.str.contains('Not'))...
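Since the question mentions where, here is a minimal sketch of how an np.where attempt could be completed with two nested calls. It assumes boat_type should take priority when both columns hold a known value, which the question does not specify.

```python
import numpy as np
import pandas as pd

data = [{'boat_type': 'Not Known', 'boat_type_2': 'Not Known'},
        {'boat_type': 'Not Known', 'boat_type_2': 'kayak'},
        {'boat_type': 'ship', 'boat_type_2': 'Not Known'},
        {'boat_type': 'Not Known', 'boat_type_2': 'Not Known'},
        {'boat_type': 'ship', 'boat_type_2': 'Not Known'}]
df = pd.DataFrame(data)

# Outer np.where: keep boat_type wherever it is known.
# Inner np.where: otherwise fall back to boat_type_2, or 'cruise' if that is unknown too.
df['boat_type_final'] = np.where(
    df['boat_type'] != 'Not Known',
    df['boat_type'],
    np.where(df['boat_type_2'] != 'Not Known', df['boat_type_2'], 'cruise'),
)
print(df)
```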

Answer 1

Use:

df['boat_type_final'] = (df.replace('Not Known',np.nan)
                           .ffill(axis=1)
                           .iloc[:, -1]
                           .fillna('cruise'))
print (df)
   boat_type boat_type_2 boat_type_final
0  Not Known   Not Known          cruise
1  Not Known       kayak           kayak
2       ship   Not Known            ship
3  Not Known   Not Known          cruise
4       ship   Not Known            ship

Explanation:

First, replace 'Not Known' with missing values (NaN):

print (df.replace('Not Known',np.nan))
  boat_type boat_type_2
0       NaN         NaN
1       NaN       kayak
2      ship         NaN
3       NaN         NaN
4      ship         NaN

Then fill the NaNs by forward filling along each row (axis=1):

print (df.replace('Not Known',np.nan).ffill(axis=1))
  boat_type boat_type_2
0       NaN         NaN
1       NaN       kayak
2      ship        ship
3       NaN         NaN
4      ship        ship

Select the last column by position with iloc:

print (df.replace('Not Known',np.nan).ffill(axis=1).iloc[:, -1])
0      NaN
1    kayak
2     ship
3      NaN
4     ship
Name: boat_type_2, dtype: object

Finally, use fillna to replace the remaining NaNs (rows where both columns were 'Not Known') with 'cruise':

print (df.replace('Not Known',np.nan).ffill(axis=1).iloc[:, -1].fillna('cruise'))
0    cruise
1     kayak
2      ship
3    cruise
4      ship
Name: boat_type_2, dtype: object
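One caveat with the chain above: df.replace and iloc[:, -1] act on the whole frame, so if df has other columns (including boat_type_final from an earlier run), the last column is not necessarily boat_type_2. The sketch below restricts the work to the two source columns. Because forward filling keeps the rightmost known value, boat_type_2 wins when both columns are known; back filling and taking the first column instead prefers boat_type. The cols list, the boat_type_final_alt column and the sixth row (both columns known) are hypothetical additions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'boat_type':   ['Not Known', 'Not Known', 'ship', 'Not Known', 'ship', 'ship'],
    'boat_type_2': ['Not Known', 'kayak', 'Not Known', 'Not Known', 'Not Known', 'kayak'],
})  # last row is a hypothetical case where both columns are known

cols = ['boat_type', 'boat_type_2']  # only the columns that should be combined
known = df[cols].replace('Not Known', np.nan)

# ffill + last column: boat_type_2 wins when both columns are known
df['boat_type_final'] = known.ffill(axis=1).iloc[:, -1].fillna('cruise')

# bfill + first column: boat_type wins when both columns are known
df['boat_type_final_alt'] = known.bfill(axis=1).iloc[:, 0].fillna('cruise')
print(df)
```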

Another solution, if there are only a few columns, is numpy.select:

m1 = df['boat_type'] == 'ship'
m2 = df['boat_type_2'] == 'kayak'

df['boat_type_final'] = np.select([m1, m2], ['ship','kayak'], default='cruise')
print (df)
   boat_type boat_type_2 boat_type_final
0  Not Known   Not Known          cruise
1  Not Known       kayak           kayak
2       ship   Not Known            ship
3  Not Known   Not Known          cruise
4       ship   Not Known            ship
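The masks above hard-code 'ship' and 'kayak', so any other boat type (for example a hypothetical 'canoe') would fall through to 'cruise'. A sketch that tests against 'Not Known' instead and takes the value from the matching column is below; np.select uses the first condition that is True, so boat_type takes priority when both columns are known.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'boat_type':   ['Not Known', 'Not Known', 'ship', 'Not Known', 'ship'],
    'boat_type_2': ['Not Known', 'kayak', 'Not Known', 'Not Known', 'Not Known'],
})

conditions = [df['boat_type'] != 'Not Known',    # checked first
              df['boat_type_2'] != 'Not Known']  # used only where the first is False
choices = [df['boat_type'], df['boat_type_2']]   # value comes from the matching column

df['boat_type_final'] = np.select(conditions, choices, default='cruise')
print(df)
```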

Answer 2

Another solution is to define a function that encodes the mapping:

def my_func(row):
    if row['boat_type']!='Not Known':
        return row['boat_type']
    elif row['boat_type_2']!='Not Known':
        return row['boat_type_2']
    else: 
        return 'cruise'

[Note: you did not mention what should happen when neither column is 'Not Known'; as written, the function returns boat_type in that case.]

Then simply apply the function:

df.loc[:,'boat_type_final'] = df.apply(my_func, axis=1)

print(df)

Output:

   boat_type boat_type_2 boat_type_final
0  Not Known   Not Known          cruise
1  Not Known       kayak           kayak
2       ship   Not Known            ship
3  Not Known   Not Known          cruise
4       ship   Not Known            ship
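df.apply with axis=1 calls my_func once per row in Python, which is generally slower than vectorized operations on large frames. A vectorized sketch with the same priority as my_func (boat_type first, then boat_type_2, then 'cruise') chains Series.where and Series.replace; the intermediate name first_known is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'boat_type':   ['Not Known', 'Not Known', 'ship', 'Not Known', 'ship'],
    'boat_type_2': ['Not Known', 'kayak', 'Not Known', 'Not Known', 'Not Known'],
})

# Keep boat_type where it is known; elsewhere take boat_type_2 ...
first_known = df['boat_type'].where(df['boat_type'] != 'Not Known', df['boat_type_2'])
# ... and rows that are still 'Not Known' (both columns unknown) become 'cruise'
df['boat_type_final'] = first_known.replace('Not Known', 'cruise')
print(df)
```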

Answer 1: score 4, accepted, 2018-07-25 09:45:36
Answer 2: score 2, 2018-07-25 10:25:18
