[英]Create Names column in Pandas DataFrame
我使用的是Python包names
以生成QA测试一些名字。
names
包中包含函数names.get_first_name(gender)
,它允许字符串male或female作为参数。 目前我有以下DataFrame:
Marital Gender
0 Single Female
1 Married Male
2 Married Male
3 Single Male
4 Married Female
我尝试过以下方法:
df.loc[df.Gender == 'Male', 'FirstName'] = names.get_first_name(gender = 'male')
df.loc[df.Gender == 'Female', 'FirstName'] = names.get_first_name(gender = 'female')
但我收到的只是两个名字:
Marital Gender FirstName
0 Single Female Kathleen
1 Married Male David
2 Married Male David
3 Single Male David
4 Married Female Kathleen
有没有办法分别为每一行调用此函数,因此并非所有男性/女性都具有相同的确切名称?
你需要申请
df['Firstname']=df['Gender'].str.lower().apply(names.get_first_name)
您可以使用列表理解:
df['Firstname']= [names.get_first_name(gender) for gender in df['Gender'].str.lower()]
听到是一个黑客,按性别(连同他们的概率)读取所有名称,然后随机抽样。
import names
def get_names(gender):
if not isinstance(gender, (str, unicode)) or gender.lower() not in ('male', 'female'):
raise ValueError('Invalid gender')
with open(names.FILES['first:{}'.format(gender.lower())], 'rb') as fin:
first_names = []
probs = []
for line in fin:
first_name, prob, dummy, dummy = line.strip().split()
first_names.append(first_name)
probs.append(float(prob) / 100)
return pd.DataFrame({'first_name': first_names, 'probability': probs})
def get_random_first_names(n, first_names_by_gender):
first_names = (
first_names_by_gender
.sample(n, replace=True, weights='probability')
.loc[:, 'first_name']
.tolist()
)
return first_names
first_names = {gender: get_names(gender) for gender in ('Male', 'Female')}
>>> get_random_first_names(3, first_names['Male'])
['RICHARD', 'EDWARD', 'HOMER']
>>> get_random_first_names(4, first_names['Female'])
['JANICE', 'CAROLINE', 'DOROTHY', 'DIANE']
如果使用map
速度很重要
list(map(names.get_first_name,df.Gender))
Out[51]: ['Harriett', 'Parker', 'Alfred', 'Debbie', 'Stanley']
#df['FN']=list(map(names.get_first_name,df.Gender))
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.