
How to create a column in a Pandas dataframe based on a conditional substring search of one or more OTHER columns

I have the following dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Manufacturer':['Allen Edmonds', 'Louis Vuitton 23', 'Louis Vuitton 8', 'Gulfstream', 'Bombardier', '23 - Louis Vuitton', 'Louis Vuitton 20'],
                   'System':['None', 'None', '14 Platinum', 'Gold', 'None', 'Platinum 905', 'None']
                  })

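I would like to create another column in the dataframe, named Pricing, that contains the value "East Coast" if the following conditions are met:

a) a substring in the Manufacturer column matches "Louis",

b) a substring in the System column matches "Platinum"

The following code works on a single column: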

df['Pricing'] = np.where(df['Manufacturer'].str.contains('Louis'), 'East Coast', 'None')

I tried chaining the two together with AND:

df['Pricing'] = np.where(df['Manufacturer'].str.contains('Louis'), 'East Coast', 'None') and np.where(df['Manufacturer'].str.contains('Platimum'), 'East Coast', 'None')

However, I get the following error:

ValueError: The truth value of an array with more than one element is ambiguous. Use `a.any()` or `a.all()`

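Given the two conditions "a" and "b" above, can anyone help me work out how to apply a.any() or a.all()? Or is there perhaps a more efficient way to create this column without using np.where?

Thanks in advance!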

Use .loc to select the rows that satisfy your conditions and assign to them:

df.loc[(df['Manufacturer'].str.contains('Louis')) & 
       (df['System'].str.contains('Platinum')),
      'Pricing'] = 'East Coast'
df

    Manufacturer        System        Pricing
0   Allen Edmonds       None          NaN
1   Louis Vuitton 23    None          NaN
2   Louis Vuitton 8     14 Platinum   East Coast
3   Gulfstream          Gold          NaN
4   Bombardier          None          NaN
5   23 - Louis Vuitton  Platinum 905  East Coast
6   Louis Vuitton 20    None          NaN
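
For readers who would rather stay with np.where, as in the question, here is a minimal sketch. It assumes both conditions must hold at once, which is how the .loc version above reads them. The error in the question arises because Python's `and` asks each array for a single truth value. The element-wise operator `&` combines the two boolean Series row by row instead. The name `both` is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Manufacturer': ['Allen Edmonds', 'Louis Vuitton 23', 'Louis Vuitton 8', 'Gulfstream', 'Bombardier', '23 - Louis Vuitton', 'Louis Vuitton 20'],
                   'System': ['None', 'None', '14 Platinum', 'Gold', 'None', 'Platinum 905', 'None']
                  })

# Element-wise AND of two boolean Series.
# Each condition sits in its own parentheses because & binds tighter than comparisons.
both = (df['Manufacturer'].str.contains('Louis')
        & df['System'].str.contains('Platinum'))

# Rows where both conditions hold get 'East Coast'; all others get the string 'None'.
df['Pricing'] = np.where(both, 'East Coast', 'None')
print(df)
```

If either condition alone should be enough, swapping `&` for `|` gives the element-wise OR.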
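
Alternatively, apply a row-wise function:

def contain(x):
    # Plain substring checks on the two fields of a single row
    if 'Louis' in x.Manufacturer and 'Platinum' in x.System:
        return 'East Coast'
    # Rows that do not match fall through and return None

df['Pricing'] = df.apply(contain, axis=1)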
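
One caveat that may matter on real data: the example frame stores the literal string 'None', so every cell is a string. If a column held genuinely missing values instead, the substring checks could fail or yield missing entries in the mask. The sketch below assumes such data (the frame `df_missing` is hypothetical and not part of the question) and passes `na=False` to `str.contains`, so missing cells count as non-matches.

```python
import pandas as pd

# Hypothetical frame: the None in 'System' is a real missing value, added for illustration.
df_missing = pd.DataFrame({
    'Manufacturer': ['Louis Vuitton 8', 'Louis Vuitton 23', 'Gulfstream'],
    'System': ['14 Platinum', None, 'Gold'],
})

# na=False makes missing cells evaluate to False, so the mask is purely boolean.
mask = (df_missing['Manufacturer'].str.contains('Louis', na=False)
        & df_missing['System'].str.contains('Platinum', na=False))

# Assign only to the matching rows; the remaining rows stay NaN.
df_missing.loc[mask, 'Pricing'] = 'East Coast'
print(df_missing)
```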

