Slicing strings in a pandas data frame and assigning a new column

Question

Hi, I have the following data frame:

import pandas as pd
df = pd.DataFrame()
df['Name'] = ['P. John','Merry','P. John travis']
df['First_Name'] = df.Name.str.split('.', expand = True)[0]
df['Last_Name'] = df.Name.str.split('.', expand = True)[1]

I want to split the Name column on the period "." and use the part after it as the last name. This works for every row except "Merry", which has no period and therefore shows None, as follows:

0            John
1            None
2     John travis

How can I replace every None in the last name with the first name? I searched the forum and could not find an answer.

My second question is that I have another data frame, as follows:

df1 = pd.DataFrame({'Name':['John','Merry','John travis'],"Position":['CEO','CTO','Engr']})

I am creating a new column **Position** for df by using the map function.

df['Position'] = df.Last_Name.map(df1.set_index('Name').Position)

but the new column in df shows some NaN values.

The data frame shown in this post replicates the real problem that I am solving. However, using the map function on the real problem gives me the following error:

Reindexing only valid with uniquely valued Index objects.

Can anyone advise me on that?

Thanks.

Answer 1

You can simplify your code by using a single split with the parameter n=1, so that the string is split on the first . only (in case there are several), and then replace None with fillna:

df = pd.DataFrame({'Name':['P. John','Merry','P. John travis']})

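# Escape the period: an unescaped '.' in a regex matches any character
df[['First_Name', 'Last_Name']] = df.Name.str.split(r'\.\s+', expand=True, n=1)
# if there is always only one '.', n=1 can be omitted
#df[['First_Name', 'Last_Name']] = df.Name.str.split(r'\.\s+', expand=True)
# Rows without a period have no second part, so fall back to the first name
df['Last_Name'] = df['Last_Name'].fillna(df['First_Name'])
print(df)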
             Name First_Name     Last_Name
0         P. John          P          John
1           Merry      Merry         Merry
2  P. John travis          P   John travis
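
One detail worth checking in the split pattern: in a regular expression, an unescaped . matches any character, while \. matches only a literal period. The sketch below compares the two patterns on a small Series; the name 'John travis' (without a period) is an assumed test value that is not in the original data, added only to make the difference visible.

```python
import pandas as pd

# 'John travis' is a hypothetical extra value used to expose the difference.
names = pd.Series(['P. John', 'Merry', 'John travis'])

# Unescaped dot: '.' matches any character, so 'n ' in 'John travis' is a separator.
loose = names.str.split(r'.\s+', n=1, expand=True)

# Escaped dot: only a literal period followed by whitespace is a separator.
strict = names.str.split(r'\.\s+', n=1, expand=True)

print(loose)
print(strict)
```

With the escaped pattern, names that contain no period are left whole, which is what the fillna step relies on.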

Or omit expand=True to get a Series of lists, and then select the first and last values:

# Escape the period so that only a literal '.' followed by whitespace splits the name
splitted = df.Name.str.split(r'\.\s+', n=1)
df['First_Name'] = splitted.str[0]
df['Last_Name'] = splitted.str[-1]
print(df)
             Name First_Name     Last_Name
0         P. John          P          John
1           Merry      Merry         Merry
2  P. John travis          P   John travis

Answer 2

Using fillna

Example:

import pandas as pd
df = pd.DataFrame()
df['Name'] = ['P. John','Merry','P. John travis']
df['First_Name'] = df.Name.str.split('.', expand = True)[0]
df['Last_Name'] = (df.Name.str.split('.', expand = True)[1]).fillna(df["First_Name"])
print(df)

Output:

             Name First_Name     Last_Name
0         P. John          P          John
1           Merry      Merry         Merry
2  P. John travis          P   John travis
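
Note that splitting on '.' alone keeps the space that follows the period, so the second part is ' John' rather than 'John'; the right-aligned printout hides this. A minimal sketch of one way to remove it, assuming that stripping surrounding whitespace is acceptable for these names (the variable name parts is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['P. John', 'Merry', 'P. John travis']})

# Split once and reuse the result for both columns.
parts = df['Name'].str.split('.', expand=True)

df['First_Name'] = parts[0]
# parts[1] still carries the space after the period (' John');
# fill the missing values first, then strip surrounding whitespace.
df['Last_Name'] = parts[1].fillna(parts[0]).str.strip()

print(df)
```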

Answer 3

You could use a list comprehension and negative indexing:

df['Last_Name'] = [x.split('.')[-1] for x in df.Name]

             Name     Last_Name
0         P. John          John
1           Merry         Merry
2  P. John travis   John travis

Here's an extension of the above technique that returns a whole new data frame with the name split as desired, in a single statement:

pd.DataFrame([(lambda x: (y, x[0], x[-1]))(y.split('.')) 
              for y in df.Name], 
             columns=['Name', 'First_Name', 'Last_Name'])

             Name First_Name     Last_Name
0         P. John          P          John
1           Merry      Merry         Merry
2  P. John travis          P   John travis
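
The second part of the question (the NaN values and the reindexing error) is not covered by the answers above. The following is only a sketch under two assumptions that the post does not confirm: that the NaN values come from the leading space left by split('.'), so ' John' does not match 'John', and that the error appears because the real df1 contains repeated names, which makes the lookup index non-unique. The duplicated 'John' row and the name lookup are hypothetical, added for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['P. John', 'Merry', 'P. John travis']})

# The last row is a hypothetical duplicate, mimicking a non-unique 'Name' column.
df1 = pd.DataFrame({'Name': ['John', 'Merry', 'John travis', 'John'],
                    'Position': ['CEO', 'CTO', 'Engr', 'CEO']})

# Build Last_Name without the leading space so it can match df1['Name'].
parts = df['Name'].str.split('.', expand=True)
df['Last_Name'] = parts[1].fillna(parts[0]).str.strip()

# Keep one row per name so that the lookup Series has a unique index.
lookup = df1.drop_duplicates(subset='Name').set_index('Name')['Position']

df['Position'] = df['Last_Name'].map(lookup)
print(df)
```

drop_duplicates keeps the first row for each name, which is an arbitrary choice here; if repeated names carry different positions in the real data, that conflict has to be resolved before mapping.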
