Splitting column value into 2 new columns - Python Pandas
I have a dataframe with a column 'name' containing values like 'James Cameron'. I'd like to split it into 2 new columns, 'First_Name' and 'Last_Name', but there is no delimiter in the data, so I am not quite sure how. I realize that 'James' is in position [0] and 'Cameron' is in position [1], but I am not sure you can recognize that without a delimiter.
import pandas as pd

df = pd.DataFrame({'name': ['James Cameron', 'Martin Sheen'],
                   'Id': [1, 2]})
df
EDIT:
Vaishali's answer below worked perfectly for the dataframe I had provided. I created that dataframe as an example, though. My real code looks like this:
data[['First_Name','Last_Name']] = data.director_name.str.split(' ', expand = True)
Unfortunately, that throws an error:
'Columns must be same length as key'
The column holds the same kind of values as my example, though. Any suggestions?
Thanks
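Before choosing a fix, it can help to find which rows do not split into exactly two parts, since those are what make the split return more (or fewer) than two columns. This is a minimal sketch, assuming the real frame is `data` with a `director_name` column as in the question; the sample names below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample standing in for the real data
data = pd.DataFrame({'director_name': ['James Cameron', 'Martin Sheen',
                                       'George Martin Jr.', 'Cher']})

# Number of space-separated parts in each name
n_parts = data.director_name.str.split(' ').str.len()

# Rows that would not produce exactly two columns
odd_rows = data[n_parts != 2]
print(odd_rows)
```

If `odd_rows` is empty, the mismatch is coming from somewhere else (for example, the left-hand side of the assignment).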
You can split on the space:
df[['Name', 'Lastname']] = df.name.str.split(' ', expand = True)
   Id           name    Name  Lastname
0   1  James Cameron   James   Cameron
1   2   Martin Sheen  Martin     Sheen
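For reference, `str.split(' ', expand=True)` returns a new DataFrame with one column per piece, labelled 0, 1, and so on; the `df[[...]] = ...` assignment then maps those columns onto the two new names by position. A small sketch to inspect the intermediate result:

```python
import pandas as pd

df = pd.DataFrame({'name': ['James Cameron', 'Martin Sheen'], 'Id': [1, 2]})

# The expanded split is its own DataFrame with integer column labels
parts = df.name.str.split(' ', expand=True)
print(parts.columns.tolist())  # [0, 1]
print(parts.shape)             # (2, 2)

# Two columns on the right, two keys on the left, so the assignment succeeds
df[['Name', 'Lastname']] = parts
print(df)
```

This is also why the key list on the left must have exactly as many names as the split produces columns.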
EDIT: Handling the error 'Columns must be same length as key'. The data probably has some names with more than one space, e.g. George Martin Jr. Splitting those produces three columns, which cannot be assigned to the two keys 'First_Name' and 'Last_Name'. In that case, one way is to split on the space and use only the first and second strings, ignoring the third if it exists:
parts = df.name.str.split(' ', expand=True)  # split once and reuse the result
df['First_Name'] = parts[0]
df['Last_Name'] = parts[1]
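To reproduce the error, and to show one alternative to discarding the third part: passing `n=1` to `str.split` limits it to a single split, so everything after the first space stays in the last-name column. This is a sketch with a made-up three-word name; whether "Martin Jr." is the right last name for your data is a judgment call.

```python
import pandas as pd

df = pd.DataFrame({'name': ['James Cameron', 'George Martin Jr.'], 'Id': [1, 2]})

# Splitting on every space yields three columns because of the three-word name
print(df.name.str.split(' ', expand=True).shape)  # (2, 3)

try:
    df[['First_Name', 'Last_Name']] = df.name.str.split(' ', expand=True)
except ValueError as err:
    print('ValueError:', err)

# n=1: split only at the first space, so at most two columns come back
df[['First_Name', 'Last_Name']] = df.name.str.split(' ', n=1, expand=True)
print(df)
```

Compared with keeping only columns 0 and 1, this keeps suffixes such as "Jr." instead of silently dropping them.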
A slightly different way of doing this:
# result_type='expand' makes apply return one column per list element
df[['first_name', 'last_name']] = df.apply(lambda row: row['name'].split(), axis=1, result_type='expand')
df
   Id           name  first_name  last_name
0   1  James Cameron       James    Cameron
1   2   Martin Sheen      Martin      Sheen
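One difference worth noting: this answer calls Python's `str.split()` with no argument, which splits on runs of any whitespace and ignores leading and trailing whitespace, whereas `.str.split(' ')` splits at every single space. A stray double or trailing space is therefore another way to end up with three columns and the same 'Columns must be same length as key' error. A sketch with hypothetical messy values:

```python
import pandas as pd

df = pd.DataFrame({'name': ['James  Cameron', 'Martin Sheen '], 'Id': [1, 2]})

# Splitting on a single space leaves empty strings behind
print(df.name.str.split(' ').tolist())
# [['James', '', 'Cameron'], ['Martin', 'Sheen', '']]

# With no pattern, pandas splits on whitespace runs, like Python's str.split()
print(df.name.str.split().tolist())
# [['James', 'Cameron'], ['Martin', 'Sheen']]

df[['first_name', 'last_name']] = df.name.str.split(expand=True)
print(df)
```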
I like this method. It's not as quick as simply splitting, but it sets the column names in a very convenient way: the named groups in the regex, (?P<First>...) and (?P<Last>...), become the column names.
df.join(df.name.str.extract(r'(?P<First>\S+)\s+(?P<Last>\S+)', expand=True))
   Id           name   First     Last
0   1  James Cameron   James  Cameron
1   2   Martin Sheen  Martin    Sheen
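With `str.extract`, the regex also decides what happens to names that are not exactly two words. As a sketch (the pattern below is a variation of my own, not the one in the answer above), changing the last group to `.+` puts everything after the first whitespace run into `Last`, and a single-word name that does not match at all comes back as NaN in both columns instead of raising an error.

```python
import pandas as pd

df = pd.DataFrame({'name': ['James Cameron', 'George Martin Jr.', 'Cher'],
                   'Id': [1, 2, 3]})

# Named groups become the column names of the extracted frame;
# Last takes the rest of the string after the first whitespace run
pattern = r'(?P<First>\S+)\s+(?P<Last>.+)'
result = df.join(df.name.str.extract(pattern, expand=True))
print(result)
```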