Splitting column value into 2 new columns - Python Pandas

I have a dataframe that has a column 'name', with values like 'James Cameron'. I'd like to split it out into 2 new columns, 'First_Name' and 'Last_Name', but there is no delimiter in the data, so I am not quite sure how. I realize that 'James' is in position [0] and 'Cameron' is in position [1], but I am not sure that can be recognized without the delimiter.

import pandas as pd

df = pd.DataFrame({'name': ['James Cameron', 'Martin Sheen'],
                   'Id': [1, 2]})
df

EDIT:

Vaishali's answer below worked perfectly for the dataframe I had provided. I created that dataframe as an example, though. My real code looks like this:

data[['First_Name','Last_Name']] = data.director_name.str.split(' ', expand = True)

and that, unfortunately, is throwing an error:

'Columns must be same length as key'

The column holds the same values as my example, though. Any suggestions?

Thanks

You can split on the space:

df[['Name', 'Lastname']] = df.name.str.split(' ', expand = True)

    Id  name            Name    Lastname
0   1   James Cameron   James   Cameron
1   2   Martin Sheen    Martin  Sheen

EDIT: Handling the error 'Columns must be same length as key'. The data might have some names with more than one space, e.g. George Martin Jr. With expand=True the split then produces three columns, which cannot be assigned to two column names. In that case, one way is to split on the space and use the first and second strings, ignoring the third if it exists:

df['First_Name'] = df.name.str.split(' ', expand = True)[0]
df['Last_Name'] = df.name.str.split(' ', expand = True)[1]
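As a rough sketch of an alternative (not part of the original answer), the split can be limited to the first space with n=1, so the result never has more than two columns and everything after the first name stays together. The sample names and the variable parts_per_name below are made up for illustration. The sketch also assumes at least one name contains a space, since otherwise only one column comes back.

```python
import pandas as pd

# Hypothetical sample data: the third name has three space-separated parts.
df = pd.DataFrame({'name': ['James Cameron', 'Martin Sheen', 'George Martin Jr.'],
                   'Id': [1, 2, 3]})

# Count the parts in each name to spot rows that would break a
# two-column assignment.
parts_per_name = df['name'].str.split().str.len()
print(parts_per_name.tolist())

# n=1 splits only on the first space, so each name yields at most two pieces:
# the first word and the remainder of the string.
df[['First_Name', 'Last_Name']] = df['name'].str.split(' ', n=1, expand=True)
print(df)
```

Whether 'Martin Jr.' belongs in Last_Name depends on the data. Splitting from the right with str.rsplit and n=1 would keep middle names with the first name instead.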

A slightly different way of doing this:

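# result_type='expand' turns each returned list into separate columns (pandas >= 0.23)
df[['first_name', 'last_name']] = df.apply(lambda row: row['name'].split(), axis=1, result_type='expand')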

df
   Id           name first_name last_name
0   1  James Cameron      James   Cameron
1   2   Martin Sheen     Martin     Sheen

I like this method. It is not as quick as simply splitting, but the named groups in the pattern supply the column names in a very convenient way.

df.join(df.name.str.extract(r'(?P<First>\S+)\s+(?P<Last>\S+)', expand=True))

   Id           name   First     Last
0   1  James Cameron   James  Cameron
1   2   Martin Sheen  Martin    Sheen
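A small sketch of how the same pattern seems to behave on less tidy names. The sample data is made up, and the names extracted and result are just for illustration. A name with three parts keeps only its first two words, and a single-word name does not match, so both new columns are left as NaN.

```python
import pandas as pd

# Hypothetical sample data with a three-part name and a single-word name.
df = pd.DataFrame({'name': ['James Cameron', 'George Martin Jr.', 'Cher'],
                   'Id': [1, 2, 3]})

# The named groups become the column names of the extracted frame.
# Rows where the pattern does not match get NaN in every group column.
extracted = df['name'].str.extract(r'(?P<First>\S+)\s+(?P<Last>\S+)', expand=True)
result = df.join(extracted)
print(result)
```

Because a non-matching row gives NaN rather than an error, this variant should not hit 'Columns must be same length as key', though the missing values may need handling afterwards.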
