How to create a column in a Pandas dataframe based on a conditional substring search of one or more OTHER columns

Question

I have the following data frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Manufacturer':['Allen Edmonds', 'Louis Vuitton 23', 'Louis Vuitton 8', 'Gulfstream', 'Bombardier', '23 - Louis Vuitton', 'Louis Vuitton 20'],
                   'System':['None', 'None', '14 Platinum', 'Gold', 'None', 'Platinum 905', 'None']
                  })

I would like to create another column in the data frame named Pricing, which contains the value "East Coast" if both of the following conditions hold:

a) if a substring in the Manufacturer column matches "Louis",

AND

b) if a substring in the System column matches "Platinum"

The following code operates on a single column:

df['Pricing'] = np.where(df['Manufacturer'].str.contains('Louis'), 'East Coast', 'None')

I tried to chain this together using AND:

df['Pricing'] = np.where(df['Manufacturer'].str.contains('Louis'), 'East Coast', 'None') and np.where(df['System'].str.contains('Platinum'), 'East Coast', 'None')

But, I get the following error:

ValueError: The truth value of an array with more than one element is ambiguous. Use `a.any()` or `a.all()`

Can anyone help with how I would implement a.any() or a.all() given the two conditions "a" and "b" above? Or perhaps there is a more efficient way to create this column without using np.where?

Thanks in advance!

Answer 1

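Use .loc to select the rows that satisfy both conditions, combining the two boolean masks with & (element-wise AND) rather than Python's and, and assign the value to the new column. Rows that do not match are left as NaN: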

df.loc[(df['Manufacturer'].str.contains('Louis')) & 
       (df['System'].str.contains('Platinum')),
      'Pricing'] = 'East Coast'
df

    Manufacturer        System        Pricing
0   Allen Edmonds       None          NaN
1   Louis Vuitton 23    None          NaN
2   Louis Vuitton 8     14 Platinum   East Coast
3   Gulfstream          Gold          NaN
4   Bombardier          None          NaN
5   23 - Louis Vuitton  Platinum 905  East Coast
6   Louis Vuitton 20    None          NaN
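
The np.where approach from the question should also work once the two conditions are combined into a single boolean mask with & instead of joining two np.where results with and. The following is a minimal sketch, assuming the same data frame as in the question; the 'None' string default is carried over from the question's attempt, and the variable name mask is only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Manufacturer': ['Allen Edmonds', 'Louis Vuitton 23', 'Louis Vuitton 8', 'Gulfstream',
                     'Bombardier', '23 - Louis Vuitton', 'Louis Vuitton 20'],
    'System': ['None', 'None', '14 Platinum', 'Gold', 'None', 'Platinum 905', 'None'],
})

# Each str.contains call yields a boolean Series; & combines them element-wise.
mask = df['Manufacturer'].str.contains('Louis') & df['System'].str.contains('Platinum')

# np.where picks 'East Coast' where mask is True and the string 'None' elsewhere.
df['Pricing'] = np.where(mask, 'East Coast', 'None')
print(df)
```

Unlike the .loc assignment, this fills the non-matching rows with the string 'None' rather than NaN, so which one to prefer depends on how the column is used afterwards.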

Answer 2

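def contain(x):
    # x is one row; `in` does a plain substring test on each string cell.
    if 'Louis' in x.Manufacturer and 'Platinum' in x.System:
        return "East Coast"
    # Rows that do not match fall through and return None.

# Apply row by row (axis=1); name the column 'Pricing' as in the question.
df['Pricing'] = df.apply(contain, axis=1)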
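
One caveat for both answers: the question's data uses the string 'None', not real missing values. If a column held genuine NaN entries, the `in` test above would raise a TypeError on a float, and str.contains would return a missing value for those rows. A hedged sketch of one way to handle this, using the na=False and regex=False parameters of str.contains; the frame df2 below is hypothetical and is not the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with genuine missing values (not the frame from the question).
df2 = pd.DataFrame({
    'Manufacturer': ['Louis Vuitton 8', np.nan, '23 - Louis Vuitton', 'Gulfstream'],
    'System': ['14 Platinum', 'Platinum 905', np.nan, 'Gold'],
})

# na=False treats a missing cell as "no match"; regex=False does a literal substring search.
mask = (df2['Manufacturer'].str.contains('Louis', regex=False, na=False)
        & df2['System'].str.contains('Platinum', regex=False, na=False))

# Only rows where both columns match get the label; the rest stay NaN.
df2.loc[mask, 'Pricing'] = 'East Coast'
print(df2)
```

Only the first row satisfies both conditions here, since each of the other rows is missing a match in at least one column.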

2 answers

solution1
2 ACCEPTED 2020-11-15 00:21:06

solution2
1 2020-11-15 00:22:11