Why does my if/elif/else code apply to all rows?
Can I get a solution for this? I have this:
df['Location']
When I run it, I get this:
0 New York, NY
1 Chantilly, VA
2 Boston, MA
3 Newton, MA
4 New York, NY
...
667 Fort Lee, NJ
668 San Francisco, CA
669 Irwindale, CA
670 San Francisco, CA
671 New York, NY
Name: Location, Length: 659, dtype: object
I want to simplify it: if a value contains New York, NY, it should become NY; if it contains Boston, MA, it should become MA; and so on.
So I wrote this code:
def clean_location_1(x):
    if 'CA':
        return 'CA'
    elif 'NY':
        return 'NY'
    elif 'DC':
        return 'DC'
    elif 'MA':
        return 'MA'
    elif 'IL':
        return 'IL'
    elif 'VA':
        return 'VA'
    else:
        return 'others'
df['Location'] = df['Location'].apply(clean_location_1)
But when I run my script, every Location becomes CA.
How can I solve this?
One possible solution that keeps your approach is as follows.
import pandas as pd
data = pd.DataFrame([{'location': 'New York, NY'},
                     {'location': 'Chantilly, VA'},
                     {'location': 'Boston, MA'},
                     {'location': 'Newton, MA'},
                     {'location': 'San Francisco, CA'}])
def clean_location_1(x):
    if 'CA' in x:
        return 'CA'
    elif 'NY' in x:
        return 'NY'
    elif 'DC' in x:
        return 'DC'
    elif 'MA' in x:
        return 'MA'
    elif 'IL' in x:
        return 'IL'
    elif 'VA' in x:
        return 'VA'
    else:
        return 'others'
data['location'].apply(clean_location_1)
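As a quick check, here is a self-contained sketch that runs the same kind of `in`-based function on the five sample rows and collects the result in a list. The if/elif chain is written as a loop over the codes in the same order; this is an equivalent restatement, not the answer's exact code, and the `result` name is only for illustration.

```python
import pandas as pd

# Same five sample rows as in the answer above
data = pd.DataFrame([{'location': 'New York, NY'},
                     {'location': 'Chantilly, VA'},
                     {'location': 'Boston, MA'},
                     {'location': 'Newton, MA'},
                     {'location': 'San Francisco, CA'}])

def clean_location_1(x):
    # x is one string per row; test the codes in the same order as the if/elif chain
    for code in ['CA', 'NY', 'DC', 'MA', 'IL', 'VA']:
        if code in x:
            return code  # first match wins, like the first true elif
    return 'others'

result = data['location'].apply(clean_location_1).tolist()
print(result)  # ['NY', 'VA', 'MA', 'MA', 'CA']
```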
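Your problem is that the conditions in your if/elif block are incorrect: none of them actually looks at x.
Another way to do this could be: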
list_states = ['CA', 'NY', 'DC', 'MA', 'IL', 'VA']
data['location'].apply(lambda x: x.split(' ')[-1] if x.split(' ')[-1] in list_states else 'others')
Then you won't need a huge if/else block.
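The same split-and-check logic can also run on the whole column with pandas' vectorized `.str` methods instead of a Python lambda per row. A minimal sketch on a few locations taken from the question; using `Series.where` to fall back to 'others' is a substitution for the lambda, not the answer's own code.

```python
import pandas as pd

list_states = ['CA', 'NY', 'DC', 'MA', 'IL', 'VA']
locations = pd.Series(['New York, NY', 'Chantilly, VA', 'Fort Lee, NJ', 'San Francisco, CA'])

# Take the text after the last space (the state code) for every row at once
state = locations.str.split(' ').str[-1]
# Keep the code where it is in the list, otherwise replace it with 'others'
cleaned = state.where(state.isin(list_states), 'others')
print(cleaned.tolist())  # ['NY', 'VA', 'others', 'CA']
```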
When you write if 'CA', the condition doesn't make much sense on its own: you have to check it against the value.
This should use pd.Series.str.contains:
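import numpy as np
states = ['CA', 'NY', 'DC', 'MA', 'IL', 'VA']
# Inside apply(), x is a single string, so x.str does not exist there, and an
# if on a whole boolean Series would be ambiguous. Call str.contains on the
# column instead: one boolean Series per state, and np.select picks the first
# match in each row, just like the if/elif chain.
conditions = [df['Location'].str.contains(state, na=False) for state in states]
df['Location'] = np.select(conditions, states, default='others')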
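`Series.str.contains` is a column-level method: called on a Series, it returns a boolean Series with one value per row rather than a single True/False. A minimal sketch on three of the question's locations; the `r', MA$'` pattern is an illustrative choice (not from the answer) that anchors the match to the state suffix.

```python
import pandas as pd

locations = pd.Series(['New York, NY', 'Boston, MA', 'Fort Lee, NJ'])

# str.contains runs on every row and returns a boolean Series (one value per row)
has_ny = locations.str.contains('NY')
print(has_ny.tolist())        # [True, False, False]

# The pattern is a regex by default; ", MA$" only matches ", MA" at the very end,
# so the check is restricted to the state suffix rather than anywhere in the text
ends_with_ma = locations.str.contains(r', MA$')
print(ends_with_ma.tolist())  # [False, True, False]
```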
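The problem is simple: you are not comparing the string with x at all. A bare 'CA' always evaluates as true, because non-empty strings are truthy. That's why everything becomes CA.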
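A small sketch of the truthiness point, using one location from the question: the literal alone is always true, while the `in` test is what actually inspects x. The variable names are just for illustration.

```python
x = 'San Francisco, CA'

# A non-empty string literal is truthy on its own, regardless of x
literal_is_true = bool('NY')   # True, so `if 'NY':` would always fire
empty_is_true = bool('')       # False, only the empty string is falsy

# What the conditions were meant to ask: does x contain the code?
ny_in_x = 'NY' in x            # False
ca_in_x = 'CA' in x            # True
print(literal_is_true, empty_is_true, ny_in_x, ca_in_x)  # True False False True
```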
Writing
if "<str>":
always evaluates to True, which means your code will always return CA. So you can try this instead, which checks whether each <word> is in x:
def clean_location_1(x):
    if 'CA' in x:
        return 'CA'
    elif 'NY' in x:
        return 'NY'
    elif 'DC' in x:
        return 'DC'
    elif 'MA' in x:
        return 'MA'
    elif 'IL' in x:
        return 'IL'
    elif 'VA' in x:
        return 'VA'
    else:
        return 'others'
df['Location'] = df['Location'].apply(clean_location_1)
Or you can try this, which is easy, clean and simple:
check = ["CA", "NY", "DC", "MA", "IL", "VA"]
def clean_location_1(x):
    y = x.rsplit(", ", 1)[1]
    if y in check:
        return y
    else:
        return "others"
df['Location'] = df['Location'].apply(clean_location_1)
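Here we create a list of the location short forms, the same codes you tested in each if/elif statement, store it in check, and then check whether the part of x after the last ", " is one of the check values.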
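One caveat, assuming some rows might not contain a ", " separator (the question's data doesn't show any; "Remote" below is a made-up value): `x.rsplit(", ", 1)[1]` raises IndexError for such a row. Indexing with `[-1]` instead sidesteps that, as in this small variant.

```python
check = ["CA", "NY", "DC", "MA", "IL", "VA"]

def clean_location_1(x):
    # rsplit(", ", 1)[-1] is the text after the last ", ";
    # if there is no ", " it is the whole string, so no IndexError
    y = x.rsplit(", ", 1)[-1]
    return y if y in check else "others"

print(clean_location_1("Boston, MA"))  # MA
print(clean_location_1("Remote"))      # others (hypothetical row without a comma)
```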
Or a one-line solution, identical to the second approach but written on a single line:
check=["CA","NY","DC","MA","IL","VA"]
df['Location'] = df['Location'].apply(lambda x: x.rsplit(", ",1)[1] if x.rsplit(", ",1)[1] in check else "others")
You can do:
states = ['CA', 'NY', 'DC', 'MA', 'IL', 'VA']
df['State'] = df['Location'].str.split(', ', expand=True)[1] \
    .rename('State').to_frame().query('State in @states')
df['State'] = df['State'].fillna('other')
>>> df
            Location  State
0       New York, NY     NY
1      Chantilly, VA     VA
2         Boston, MA     MA
3         Newton, MA     MA
4       New York, NY     NY
5       Fort Lee, NJ  other
6  San Francisco, CA     CA
7      Irwindale, CA     CA
8  San Francisco, CA     CA
9       New York, NY     NY
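The fillna step works because assigning a filtered result back to a column aligns on the index: rows dropped by the filter come back as NaN. A minimal sketch of that mechanism on three of the question's locations, using plain boolean indexing with `isin` in place of `query` (a substitution to keep the example short).

```python
import pandas as pd

df = pd.DataFrame({'Location': ['New York, NY', 'Fort Lee, NJ', 'San Francisco, CA']})
states = ['CA', 'NY', 'DC', 'MA', 'IL', 'VA']

# Column 1 of the split is the state code for each row
state = df['Location'].str.split(', ', expand=True)[1]
# Keep only rows whose code is in states (index labels 0 and 2)
kept = state[state.isin(states)]
# Assigning back aligns on the index, so the dropped row 1 becomes NaN ...
df['State'] = kept
# ... which fillna then replaces
df['State'] = df['State'].fillna('other')
print(df['State'].tolist())  # ['NY', 'other', 'CA']
```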