

Pandas split name column into first and last name if it contains one space

Let's say I have a pandas DataFrame containing names like so:

import pandas as pd
name_df = pd.DataFrame({'name': ['Jack Fine', 'Kim Q. Danger', 'Jane Smith', 'Juan de la Cruz']})

    name
0   Jack Fine
1   Kim Q. Danger
2   Jane Smith
3   Juan de la Cruz

and I want to split the name column into first_name and last_name IF there is exactly one space in the name. Otherwise I want the full name to be put into first_name.

So the final DataFrame should look like:

  first_name     last_name
0 Jack           Fine
1 Kim Q. Danger
2 Jane           Smith
3 Juan de la Cruz

I've tried to accomplish this by first applying the following function, which should return only the names that can be split into first and last name:

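import re
from typing import Optional

def validate_single_space_name(name: str) -> Optional[str]:
    # Note: `.*` can itself match spaces, so this pattern accepts any
    # name with at least one space, not exactly one.
    pattern = re.compile(r'^.*( ){1}.*$')
    match_obj = re.match(pattern, name)
    if match_obj:
        return name
    else:
        return None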

However, applying this function to my original name_df leads to an empty DataFrame, not one populated by the names that can be split and None for the rest.
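The post doesn't show how the function was applied, so here is a small sketch assuming Series.apply was the intent. It also contrasts the question's pattern with a stricter one, ^\S+ \S+$, which is an assumption about what "one space" should mean (exactly two non-blank tokens joined by a single space). The helper name single_space_or_none is hypothetical.

```python
import re
from typing import Optional

import pandas as pd

name_df = pd.DataFrame({'name': ['Jack Fine', 'Kim Q. Danger', 'Jane Smith', 'Juan de la Cruz']})

# The question's pattern: `.*` on either side may itself contain spaces,
# so every name with at least one space matches.
loose = re.compile(r'^.*( ){1}.*$')

# Hypothetical stricter pattern: two runs of non-whitespace joined by one space.
strict = re.compile(r'^\S+ \S+$')

loose_hits = [n for n in name_df['name'] if loose.match(n)]
strict_hits = [n for n in name_df['name'] if strict.match(n)]


def single_space_or_none(name: str) -> Optional[str]:
    """Hypothetical helper: return the name if it has exactly one space, else None."""
    return name if strict.match(name) else None


# Series.apply calls the function once per element and keeps the original
# index, so the result has one entry per row (missing where the check fails).
splittable = name_df['name'].apply(single_space_or_none)

print(loose_hits)   # all four names
print(strict_hits)  # ['Jack Fine', 'Jane Smith']
print(splittable)
```

Calling splittable.dropna() would then leave only the names that can be split.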

Help getting my current approach to work, or solutions involving a different approach, would be appreciated!

You can use str.split to split the strings, then count the resulting tokens with str.len and use that as a boolean mask, so that only those rows are assigned the last component of the split:

In [33]:
name_df.loc[name_df['name'].str.split().str.len() == 2, 'last name'] = name_df['name'].str.split().str[-1]
name_df

Out[33]:
              name last name
0        Jack Fine      Fine
1    Kim Q. Danger       NaN
2       Jane Smith     Smith
3  Juan de la Cruz       NaN
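A self-contained version of the same mask approach, built on the question's name_df. As a small variation (my choice, not the answer's), it splits once and reuses the token lists for both the length test and the assignment; the column name last_name follows the question.

```python
import pandas as pd

name_df = pd.DataFrame({'name': ['Jack Fine', 'Kim Q. Danger', 'Jane Smith', 'Juan de la Cruz']})

# Split once on whitespace; each element becomes a list of tokens.
parts = name_df['name'].str.split()

# True only for names with exactly two tokens.
two_words = parts.str.len() == 2

# Assign the last token for the masked rows; the new column is NaN elsewhere.
name_df.loc[two_words, 'last_name'] = parts[two_words].str[-1]
print(name_df)
```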

EDIT

Alternatively, call split with expand=True, which returns one column per token. Because the rows are first filtered to names with exactly two words, the result has two columns and only those rows are populated:

In [16]:
name_df[['first_name','last_name']] = name_df['name'].loc[name_df['name'].str.split().str.len() == 2].str.split(expand=True)
name_df

Out[16]:
              name first_name last_name
0        Jack Fine       Jack      Fine
1    Kim Q. Danger        NaN       NaN
2       Jane Smith       Jane     Smith
3  Juan de la Cruz        NaN       NaN
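To see why the rows are filtered before splitting, here is a quick sketch of str.split(expand=True) on the unfiltered column: it returns as many columns as the longest name has words and pads shorter rows with missing values, so the result would not line up with the two target columns.

```python
import pandas as pd

name_df = pd.DataFrame({'name': ['Jack Fine', 'Kim Q. Danger', 'Jane Smith', 'Juan de la Cruz']})

# Without filtering, expand=True yields columns 0..3 here,
# because 'Juan de la Cruz' has four words.
unfiltered = name_df['name'].str.split(expand=True)
print(unfiltered.shape)  # (4, 4)

# Filtering to two-word names first leaves exactly two columns.
mask = name_df['name'].str.split().str.len() == 2
filtered = name_df.loc[mask, 'name'].str.split(expand=True)
print(filtered.shape)  # (2, 2)
```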

You can then fill in the missing first names with the full name using fillna:

In [17]:
name_df['first_name'] = name_df['first_name'].fillna(name_df['name'])
name_df

Out[17]:
              name       first_name last_name
0        Jack Fine             Jack      Fine
1    Kim Q. Danger    Kim Q. Danger       NaN
2       Jane Smith             Jane     Smith
3  Juan de la Cruz  Juan de la Cruz       NaN
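Putting the EDIT steps together, a self-contained sketch. Here fillna is assigned back to the column rather than called with inplace=True on a column selection, since the latter is chained assignment, which recent pandas versions warn about and Copy-on-Write makes ineffective.

```python
import pandas as pd

name_df = pd.DataFrame({'name': ['Jack Fine', 'Kim Q. Danger', 'Jane Smith', 'Juan de la Cruz']})

# Rows with exactly two whitespace-separated tokens.
two_words = name_df['name'].str.split().str.len() == 2

# Split only those rows into two columns and assign them; other rows get NaN.
name_df[['first_name', 'last_name']] = (
    name_df.loc[two_words, 'name'].str.split(expand=True)
)

# Fall back to the full name where first_name is missing (assigned back, no inplace).
name_df['first_name'] = name_df['first_name'].fillna(name_df['name'])
print(name_df)
```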

I was running into IndexError: list index out of range because the names could be test, kk and other odd user input, so I ended up with something like this:

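# `items` is a DataFrame with a free-text 'fullName' column.
# Count whitespace-separated tokens; empty or blank names give 0.
items['fullNameSplitLength'] = items['fullName'].str.split().str.len()
items['firstName'] = items['lastName'] = ''
# First token becomes firstName for any non-empty name (e.g. 'test', 'kk').
has_first = items['fullNameSplitLength'] >= 1
items.loc[has_first, 'firstName'] = items.loc[has_first, 'fullName'].str.split().str[0]
# Last token becomes lastName only when there are at least two tokens;
# any middle tokens are dropped.
has_last = items['fullNameSplitLength'] >= 2
items.loc[has_last, 'lastName'] = items.loc[has_last, 'fullName'].str.split().str[-1]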
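A self-contained run of the same idea. The items frame and its sample names (single-word entries like test and kk plus a blank string) are made up for illustration; the column names follow the answer. As in the answer's code, middle words of three-word names are dropped.

```python
import pandas as pd

# Made-up sample input for illustration.
items = pd.DataFrame({'fullName': ['Jack Fine', 'Kim Q. Danger', 'test', 'kk', '']})

# Split once; ''.split() gives an empty list, so its length is 0.
tokens = items['fullName'].str.split()
n_tokens = tokens.str.len()

items['firstName'] = ''
items['lastName'] = ''

# First token for any non-empty name; last token only when there are 2+ tokens.
has_first = n_tokens >= 1
has_last = n_tokens >= 2
items.loc[has_first, 'firstName'] = tokens[has_first].str[0]
items.loc[has_last, 'lastName'] = tokens[has_last].str[-1]
print(items)
```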
