How to check for a specific string within a custom function for a pandas DataFrame column?
Suppose I have the following pandas DataFrame column:
import pandas as pd
import string
d = {'Name': ['Braund, Mr. Owen Harris','Cumings, Mrs. John Bradley (Florence Briggs Thayer)', 'Heikkinen, Miss.Laina']}
raw_df = pd.DataFrame(data=d)
I am trying to decode this column so that it returns married if Mrs is found in the row's string, and not_married otherwise:
def is_married_female(raw_df):
    if raw_df['Name'].str.contains('Mrs').any():
        return 'married'
    else:
        return 'not_married'
raw_df['is_married_female']=raw_df.apply(lambda x: is_married_female(x["Name"]), axis=1)
But I keep getting the following error:
TypeError: string indices must be integers
The expected output would look like this:
raw_df['is_married_female']
# not_married
# married
# not_married
What am I missing in the function?
Problem:
x['Name'] is a Python str, not a Series or DataFrame.
Inside the function is_married_female, the variable raw_df is a string such as:
'Braund, Mr. Owen Harris'
When raw_df['Name'] runs, it is equivalent to:
print('Braund, Mr. Owen Harris'['Name']) # TypeError: string indices must be integers
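That is an attempt to index a string with another string, which fails because strings only accept integer indices, for example: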
print('Braund, Mr. Owen Harris'[0]) # B
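To check this diagnosis, one can inspect the types that apply hands to the lambda. The following is a minimal sketch that reuses the question's data; the variable names row_types and value_types are only illustrative:

```python
import pandas as pd

raw_df = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris',
                                'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
                                'Heikkinen, Miss.Laina']})

# With axis=1, apply passes each row to the lambda as a Series.
row_types = raw_df.apply(lambda x: type(x).__name__, axis=1)

# Selecting x["Name"] from that row yields a plain Python str,
# which is what is_married_female actually receives.
value_types = raw_df.apply(lambda x: type(x["Name"]).__name__, axis=1)

print(row_types.tolist())    # ['Series', 'Series', 'Series']
print(value_types.tolist())  # ['str', 'str', 'str']
```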
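Fix:
Treat the argument as the plain string it is (str) and test for the substring with the in operator. Renaming the parameter raw_df to name also avoids future confusion:
import pandas as pd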
d = {'Name': ['Braund, Mr. Owen Harris',
              'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
              'Heikkinen, Miss.Laina']}
raw_df = pd.DataFrame(data=d)

def is_married_female(name):
    # name is a plain str, so `in` performs a substring test
    if 'Mrs' in name:
        return 'married'
    else:
        return 'not_married'

raw_df['is_married_female'] = raw_df.apply(
    lambda x: is_married_female(x["Name"]),
    axis=1
)
print(raw_df.to_string())
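Since the function only needs the Name value, a possible variation is to apply it to the column itself instead of going row by row with axis=1. This is a sketch of that alternative, not part of the original answer:

```python
import pandas as pd

raw_df = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris',
                                'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
                                'Heikkinen, Miss.Laina']})

def is_married_female(name):
    # name is a plain str, so `in` performs a substring test
    return 'married' if 'Mrs' in name else 'not_married'

# Series.apply calls the function once per element of the column,
# so no lambda and no axis argument are needed.
raw_df['is_married_female'] = raw_df['Name'].apply(is_married_female)

print(raw_df['is_married_female'].tolist())
# ['not_married', 'married', 'not_married']
```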
However, a more efficient solution is to use np.where:
import numpy as np
import pandas as pd
d = {'Name': ['Braund, Mr. Owen Harris',
'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
'Heikkinen, Miss.Laina']}
raw_df = pd.DataFrame(data=d)
raw_df['is_married_female'] = np.where(raw_df['Name'].str.contains('Mrs'),
                                       'married', 'not_married')
print(raw_df.to_string())
The output of both is:
                                                  Name  is_married_female
0                              Braund, Mr. Owen Harris        not_married
1  Cumings, Mrs. John Bradley (Florence Briggs Thayer)            married
2                                Heikkinen, Miss.Laina        not_married
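One caveat with the np.where approach: if the column can contain missing names, the mask from str.contains may hold missing values rather than booleans, depending on the pandas version and dtype. A hedged sketch, assuming a missing name should count as not_married, passes na=False so the mask is always boolean. The missing entry in this sample data is made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last name is missing.
names = pd.Series(['Braund, Mr. Owen Harris',
                   'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
                   None])

# na=False makes missing entries evaluate to False in the mask.
mask = names.str.contains('Mrs', na=False)
labels = np.where(mask, 'married', 'not_married')

print(labels.tolist())  # ['not_married', 'married', 'not_married']
```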