How to create a column in a Pandas dataframe based on a conditional substring search of one or more OTHER columns
I have the following dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Manufacturer': ['Allen Edmonds', 'Louis Vuitton 23', 'Louis Vuitton 8', 'Gulfstream', 'Bombardier', '23 - Louis Vuitton', 'Louis Vuitton 20'],
    'System': ['None', 'None', '14 Platinum', 'Gold', 'None', 'Platinum 905', 'None'],
})
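I want to create another column in the dataframe, named Pricing, that contains the value "East Coast" if both of the following conditions are met:
a) a substring in the Manufacturer column matches "Louis",
AND
b) a substring in the System column matches "Platinum".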
The following code works on a single column:
df['Pricing'] = np.where(df['Manufacturer'].str.contains('Louis'), 'East Coast', 'None')
I tried chaining two of these together with AND:
df['Pricing'] = np.where(df['Manufacturer'].str.contains('Louis'), 'East Coast', 'None') and np.where(df['Manufacturer'].str.contains('Platimum'), 'East Coast', 'None')
However, I get the following error:
ValueError: The truth value of an array with more than one element is ambiguous. Use `a.any()` or `a.all()`
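Given the two conditions "a" and "b" above, can anyone help me with how to implement a.any() or a.all()? Or is there perhaps a more efficient way to create this column without using np.where?
Thanks in advance!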
Use .loc to slice the dataframe according to your conditions:
df.loc[(df['Manufacturer'].str.contains('Louis')) &
       (df['System'].str.contains('Platinum')),
       'Pricing'] = 'East Coast'
df
         Manufacturer        System     Pricing
0       Allen Edmonds          None         NaN
1    Louis Vuitton 23          None         NaN
2     Louis Vuitton 8   14 Platinum  East Coast
3          Gulfstream          Gold         NaN
4          Bombardier          None         NaN
5  23 - Louis Vuitton  Platinum 905  East Coast
6    Louis Vuitton 20          None         NaN
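On the error in the question: Python's `and` asks for a single truth value from each operand, which an array with several elements cannot supply, so neither a.any() nor a.all() is really what is needed here. The element-wise `&` operator combines the two boolean Series row by row instead. A minimal sketch of the np.where form with both conditions, assuming the dataframe from the question (the names has_louis and has_platinum are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Manufacturer': ['Allen Edmonds', 'Louis Vuitton 23', 'Louis Vuitton 8', 'Gulfstream',
                     'Bombardier', '23 - Louis Vuitton', 'Louis Vuitton 20'],
    'System': ['None', 'None', '14 Platinum', 'Gold', 'None', 'Platinum 905', 'None'],
})

# One boolean Series per condition (illustrative names, not from the original post)
has_louis = df['Manufacturer'].str.contains('Louis')
has_platinum = df['System'].str.contains('Platinum')

# & is evaluated element-wise; np.where then picks a value for each row
df['Pricing'] = np.where(has_louis & has_platinum, 'East Coast', 'None')
```

Unlike the .loc assignment, this fills the non-matching rows with the string 'None' rather than NaN, which matches the placeholder already used in the System column.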
def contain(x):
    # Plain substring test on each row; rows that do not match get None
    if 'Louis' in x.Manufacturer and 'Platinum' in x.System:
        return "East Coast"
    return None

df['Pricing'] = df.apply(contain, axis=1)
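One caveat with the row-wise version: the `in` test raises a TypeError if a cell holds a real missing value (NaN) instead of the string 'None'. A hedged sketch of a vectorized alternative that tolerates missing values follows; the frame df_missing is hypothetical data invented for illustration, and na=False and regex=False are optional parameters of str.contains that treat missing cells as non-matches and match the pattern literally.

```python
import numpy as np
import pandas as pd

# Hypothetical data (not from the question): same columns, but with real NaN cells
df_missing = pd.DataFrame({
    'Manufacturer': ['Louis Vuitton 8', np.nan, '23 - Louis Vuitton', 'Gulfstream'],
    'System': ['14 Platinum', 'Platinum 905', np.nan, 'Gold'],
})

# na=False makes missing cells count as "no match"; regex=False does a literal search
mask = (
    df_missing['Manufacturer'].str.contains('Louis', na=False, regex=False)
    & df_missing['System'].str.contains('Platinum', na=False, regex=False)
)

df_missing['Pricing'] = np.where(mask, 'East Coast', 'None')
```

Only the first row satisfies both conditions here; the rows with a missing Manufacturer or System are simply treated as non-matches.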