
How to create a column in a Pandas dataframe based on a conditional substring search of one or more OTHER columns

I have the following dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Manufacturer':['Allen Edmonds', 'Louis Vuitton 23', 'Louis Vuitton 8', 'Gulfstream', 'Bombardier', '23 - Louis Vuitton', 'Louis Vuitton 20'],
                   'System':['None', 'None', '14 Platinum', 'Gold', 'None', 'Platinum 905', 'None']
                  })

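I want to create another column in the dataframe, named Pricing, that contains the value "East Coast" when both of the following conditions are met:

a) a substring in the Manufacturer column matches "Louis", and

b) a substring in the System column matches "Platinum".

The following code works for a single column: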

df['Pricing'] = np.where(df['Manufacturer'].str.contains('Louis'), 'East Coast', 'None')

I tried chaining the conditions together with AND:

df['Pricing'] = np.where(df['Manufacturer'].str.contains('Louis'), 'East Coast', 'None') and np.where(df['Manufacturer'].str.contains('Platimum'), 'East Coast', 'None')

However, I get the following error:

ValueError: The truth value of an array with more than one element is ambiguous. Use `a.any()` or `a.all()`

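Given the two conditions "a" and "b" above, can anyone help me with how to implement a.any() or a.all()? Or is there perhaps a more efficient way to create this column without using np.where?

Thanks in advance!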

Use .loc to select the rows that satisfy both conditions and assign the value to them:

df.loc[(df['Manufacturer'].str.contains('Louis')) & 
       (df['System'].str.contains('Platinum')),
      'Pricing'] = 'East Coast'
df

    Manufacturer        System        Pricing
0   Allen Edmonds       None          NaN
1   Louis Vuitton 23    None          NaN
2   Louis Vuitton 8     14 Platinum   East Coast
3   Gulfstream          Gold          NaN
4   Bombardier          None          NaN
5   23 - Louis Vuitton  Platinum 905  East Coast
6   Louis Vuitton 20    None          NaN
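
The ValueError in the question comes from Python's `and`, which tries to reduce each whole array to a single True/False value. A minimal sketch of the np.where version, assuming the same sample data as in the question: build the two boolean masks first, combine them with the element-wise `&` operator, and pass the combined mask to np.where once. Note that the attempt in the question also searched Manufacturer twice and misspelled "Platinum"; the sketch below searches System for the second condition. The names has_louis and has_platinum are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Manufacturer': ['Allen Edmonds', 'Louis Vuitton 23', 'Louis Vuitton 8', 'Gulfstream',
                     'Bombardier', '23 - Louis Vuitton', 'Louis Vuitton 20'],
    'System': ['None', 'None', '14 Platinum', 'Gold', 'None', 'Platinum 905', 'None'],
})

# Build each condition as a boolean Series, one value per row.
has_louis = df['Manufacturer'].str.contains('Louis')
has_platinum = df['System'].str.contains('Platinum')

# `&` is the element-wise AND; Python's `and` is what raises the ValueError.
df['Pricing'] = np.where(has_louis & has_platinum, 'East Coast', 'None')
print(df)
```

Unlike the .loc assignment, this fills non-matching rows with the string 'None' rather than NaN, which matches the single-column code in the question.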
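
Alternatively, apply a row-wise function:

def contain(row):
    # Return the label only when both substrings are present in this row.
    if 'Louis' in row.Manufacturer and 'Platinum' in row.System:
        return 'East Coast'
    return None  # non-matching rows are left empty

df['Pricing'] = df.apply(contain, axis=1)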
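
Both answers assume every cell holds a string, which is true of the sample data because it uses the text 'None'. If a column can contain real missing values, str.contains returns NaN for those rows and the `in` test in the row-wise function fails on a float. A minimal sketch under that assumption, using a small hypothetical frame with np.nan in System: passing na=False to str.contains treats missing values as non-matches, so the combined mask stays purely boolean.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a real missing value (np.nan) instead of the string 'None'.
df = pd.DataFrame({
    'Manufacturer': ['Louis Vuitton 8', 'Louis Vuitton 23', 'Gulfstream'],
    'System': ['14 Platinum', np.nan, 'Gold'],
})

# na=False makes str.contains report False for missing values,
# so both masks are plain booleans and can be combined with `&`.
mask = (df['Manufacturer'].str.contains('Louis', na=False)
        & df['System'].str.contains('Platinum', na=False))

# Only rows where both conditions hold receive the label; the rest stay NaN.
df.loc[mask, 'Pricing'] = 'East Coast'
print(df)
```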
