How to extract the status from a full name in a pd.DataFrame column?
I have a dataset. Here is the 'Name' column:
0 Braund, Mr. Owen Harris
1 Cumings, Mrs. John Bradley (Florence Briggs Th...
2 Heikkinen, Miss. Laina
3 Futrelle, Mrs. Jacques Heath (Lily May Peel)
4 Allen, Mr. William Henry
...
151 Pears, Mrs. Thomas (Edith Wearne)
152 Meo, Mr. Alfonzo
153 van Billiard, Mr. Austin Blyler
154 Olsen, Mr. Ole Martin
155 Williams, Mr. Charles Duane
I need to extract the first name, the status, and the second name. When I try this on a simple string, it works:
full_name="Braund, Mr. Owen Harris"
first_name=full_name.split(',')[0]
second_name=full_name.split('.')[1]
print('First name:',first_name)
print('Second name:',second_name)
status = full_name.replace(first_name, '').replace(',','').split('.')[0]
print('Status:',status)
>First name: Braund
>Second name: Owen Harris
>Status: Mr
But when I try to do this with pandas, the status step fails:
df['first_Name'] = df['Name'].str.split(',').str.get(0) # this is OK, works well
But after this:
status= df['Name'].str.replace(df['first_Name'], '').replace(',','').split('.').str.get(0)
I get an error:
>>TypeError: 'Series' objects are mutable, thus they cannot be hashed
What are possible solutions?
Edit: Thanks for the answers about extracting columns. I tried:
def extract_name_data(row):
    row.str.extract('(?P<first_name>[^,]+), (?P<status>\w+.) (?P<second_name>[^(]+\w) ?')
    last_name = row['second_name']
    title = row['status']
    first_name = row['first_name']
    return first_name, second_name, status
and get:
AttributeError: 'str' object has no attribute 'str'
What can be done? row is meant to be df['Name'].
You could use str.extract with named capturing groups:
df['Name'].str.extract(r'(?P<first_name>[^,]+), (?P<status>\w+\.) (?P<second_name>[^(]+\w) ?')
Output:
   first_name    status  second_name
0  Braund        Mr.     Owen Harris
1  Cumings       Mrs.    John Bradley
2  Heikkinen     Miss.   Laina
3  Futrelle      Mrs.    Jacques Heath
4  Allen         Mr.     William Henry
5  Pears         Mrs.    Thomas
6  Meo           Mr.     Alfonzo
7  van Billiard  Mr.     Austin Blyler
8  Olsen         Mr.     Ole Martin
9  Williams      Mr.     Charles Duane
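Regarding the follow-up in the question: the AttributeError suggests that the function received a single string (for example, one element of the column) rather than the whole column, and the .str accessor exists only on a Series. Below is a minimal sketch, assuming a small sample DataFrame built from a few of the names above. The function body is a hypothetical rewrite, not the asker's code: it takes the whole column, returns the extracted DataFrame, and the result is joined back onto df. The pattern is written as a raw string with the dot after the title escaped.

```python
import pandas as pd

# Hypothetical sample data taken from the names shown in the question.
df = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
        "Heikkinen, Miss. Laina",
        "van Billiard, Mr. Austin Blyler",
    ]
})

# Raw string; the dot after the title is escaped so it matches a literal '.'.
PATTERN = r"(?P<first_name>[^,]+), (?P<status>\w+\.) (?P<second_name>[^(]+\w) ?"


def extract_name_data(names: pd.Series) -> pd.DataFrame:
    """Take the whole 'Name' column and return one column per named group."""
    return names.str.extract(PATTERN)


parts = extract_name_data(df["Name"])  # pass the Series, not a single row
df = df.join(parts)                    # add the three new columns to df
print(df[["first_name", "status", "second_name"]])
```

Calling the function once on df['Name'] keeps the work vectorized; nothing needs to be applied row by row.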
You can also put your original code, slightly modified, into the pandas .apply() function to make it work, as follows:
Just replace your Python variable names with the pandas column names. For example, inside the lambda passed to .apply(), replace full_name with x['Name'] and first_name with x['first_Name']:
df['status'] = df.apply(lambda x: x['Name'].replace(x['first_Name'], '').replace(',','').split('.')[0], axis=1)
Though it may not be the most efficient way of doing it, it is an easy way to turn your existing Python code into a working pandas version.
Result:
print(df)
     Name                                                first_Name    status
0    Braund, Mr. Owen Harris                             Braund        Mr
1    Cumings, Mrs. John Bradley (Florence Briggs Th...   Cumings       Mrs
2    Heikkinen, Miss. Laina                              Heikkinen     Miss
3    Futrelle, Mrs. Jacques Heath (Lily May Peel)        Futrelle      Mrs
4    Allen, Mr. William Henry                            Allen         Mr
151  Pears, Mrs. Thomas (Edith Wearne)                   Pears         Mrs
152  Meo, Mr. Alfonzo                                    Meo           Mr
153  van Billiard, Mr. Austin Blyler                     van Billiard  Mr
154  Olsen, Mr. Ole Martin                               Olsen         Mr
155  Williams, Mr. Charles Duane                         Williams      Mr
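If a row-wise .apply() is too slow, the same status can probably be obtained with chained .str methods only. The original attempt failed because Series.str.replace expects a single string or compiled pattern as its first argument, not another Series, and because the later .replace and .split calls were made on the Series itself instead of through .str. This is a minimal sketch, assuming a small sample DataFrame built from names in the question; the final .str.strip() is my addition to drop the space left after the comma.

```python
import pandas as pd

# Hypothetical sample data taken from the names shown in the question.
df = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Heikkinen, Miss. Laina",
        "van Billiard, Mr. Austin Blyler",
    ]
})

# Everything before the first comma.
df["first_Name"] = df["Name"].str.split(",").str.get(0)

# Take the part after the comma, keep what precedes the first '.',
# then strip the leading space.
df["status"] = (
    df["Name"]
    .str.split(",").str.get(1)
    .str.split(".").str.get(0)
    .str.strip()
)
print(df)
```

Every step goes through the .str accessor, so no Series is ever passed where a plain string is expected.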