How to create a column in a Pandas dataframe based on a conditional substring search of one or more OTHER columns
I have the following data frame:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Manufacturer': ['Allen Edmonds', 'Louis Vuitton 23', 'Louis Vuitton 8', 'Gulfstream', 'Bombardier', '23 - Louis Vuitton', 'Louis Vuitton 20'],
                   'System': ['None', 'None', '14 Platinum', 'Gold', 'None', 'Platinum 905', 'None']
                   })
I would like to create another column in the data frame named Pricing, which contains the value "East Coast" if both of the following conditions hold:
a) a substring in the Manufacturer column matches "Louis",
AND
b) a substring in the System column matches "Platinum".
The following code operates on a single column:
df['Pricing'] = np.where(df['Manufacturer'].str.contains('Louis'), 'East Coast', 'None')
I tried to chain this together using `and`:
df['Pricing'] = np.where(df['Manufacturer'].str.contains('Louis'), 'East Coast', 'None') and np.where(df['Manufacturer'].str.contains('Platimum'), 'East Coast', 'None')
But I get the following error:
ValueError: The truth value of an array with more than one element is ambiguous. Use `a.any()` or `a.all()`
Can anyone help with how I would implement a.any() or a.all() given the two conditions "a" and "b" above? Or perhaps there is a more efficient way to create this column without using np.where?
Thanks in advance!
Use .loc with both conditions combined by the element-wise & operator, and assign the value to the rows that match. Rows that do not match are left as NaN:
df.loc[(df['Manufacturer'].str.contains('Louis')) &
       (df['System'].str.contains('Platinum')),
       'Pricing'] = 'East Coast'
df
         Manufacturer        System     Pricing
0       Allen Edmonds          None         NaN
1    Louis Vuitton 23          None         NaN
2     Louis Vuitton 8   14 Platinum  East Coast
3          Gulfstream          Gold         NaN
4          Bombardier          None         NaN
5  23 - Louis Vuitton  Platinum 905  East Coast
6    Louis Vuitton 20          None         NaN
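If you would rather keep np.where, the same idea applies. The ValueError in the question comes from Python's `and`, which tries to reduce each whole array to a single True/False. Combining the two boolean Series with the element-wise `&` operator first, and then passing the result to np.where once, avoids that. The sketch below reuses the data frame from the question; the names `is_louis` and `is_platinum` are just illustrative, and the string 'None' as the fallback value is taken from the question's own np.where call.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Manufacturer': ['Allen Edmonds', 'Louis Vuitton 23', 'Louis Vuitton 8', 'Gulfstream',
                     'Bombardier', '23 - Louis Vuitton', 'Louis Vuitton 20'],
    'System': ['None', 'None', '14 Platinum', 'Gold', 'None', 'Platinum 905', 'None'],
})

# Each condition is a boolean Series with one entry per row.
is_louis = df['Manufacturer'].str.contains('Louis')
is_platinum = df['System'].str.contains('Platinum')

# & combines the two Series row by row; np.where then picks a value per row.
df['Pricing'] = np.where(is_louis & is_platinum, 'East Coast', 'None')
```

Unlike the .loc assignment, this fills every non-matching row with the string 'None' instead of leaving NaN.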
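def contain(x):
    # Return "East Coast" only when both substrings are present; other rows get None.
    if 'Louis' in x.Manufacturer and 'Platinum' in x.System:
        return "East Coast"

# Apply the function row by row (axis=1).
df['Pricing'] = df.apply(contain, axis=1)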
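One caveat: the `in` checks and `str.contains` calls above assume every cell is a string. If the real data can contain missing values, `'Louis' in x.Manufacturer` raises a TypeError on a NaN cell, and `str.contains` returns a missing value for that row by default, which can lead to errors when the result is used as a mask. A minimal sketch of one way to handle this is to pass `na=False`, so missing cells count as "no match". The small data frame here is hypothetical and only exists to show a missing value in each column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing Manufacturer and one missing System.
df = pd.DataFrame({
    'Manufacturer': ['Louis Vuitton 8', None, 'Louis Vuitton 20'],
    'System': ['14 Platinum', 'Platinum 905', None],
})

# na=False makes str.contains return False for missing cells,
# so the combined mask is a plain boolean Series.
mask = (
    df['Manufacturer'].str.contains('Louis', na=False)
    & df['System'].str.contains('Platinum', na=False)
)

df['Pricing'] = np.where(mask, 'East Coast', 'None')
```

Only the first row ends up as "East Coast" here, because each of the other two rows is missing one of the two values.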