Slicing a string in a pandas data frame and assigning a new column
Hi, I have the following data frame:
df = pd.DataFrame()
df['Name'] = ['P. John','Merry','P. John travis']
df['First_Name'] = df.Name.str.split('.', expand = True)[0]
df['Last_Name'] = df.Name.str.split('.', expand = True)[1]
I want to slice the column based on the period "." and use the part after it as the last name. This works for every row except "Merry", where it shows None, as follows:
0 John
1 None
2 John travis
How can I replace every None in the last name with the first name? I searched the forum and could not find an answer.
My second question is that I have another data frame, as follows:
df1 = pd.DataFrame({'Name':['John','Merry','John travis'],"Position":['CEO','CTO','Engr']})
I am creating a new column Position for df by using the map function:
df['Position'] = df.Last_Name.map(df1.set_index('Name').Position)
But the new column in df shows some NaN values.
The data frame shown in this post replicates the real problem I am solving. However, using the map function on the real data gives me the following error:
Reindexing only valid with uniquely valued Index objects.
Can anyone advise me on this?
Thanks.
You can simplify your code to a single split with the parameter n=1, which splits on the first . only (in case there can be more than one), and then replace None with fillna:
df = pd.DataFrame({'Name':['P. John','Merry','P. John travis']})
# escape the dot so it matches a literal period followed by whitespace
df[['First_Name', 'Last_Name']] = df.Name.str.split(r'\.\s+', expand=True, n=1)
#if always only one .
#df[['First_Name', 'Last_Name']] = df.Name.str.split(r'\.\s+', expand=True)
df['Last_Name'] = df['Last_Name'].fillna(df['First_Name'])
print (df)
Name First_Name Last_Name
0 P. John P John
1 Merry Merry Merry
2 P. John travis P John travis
Or remove expand=True to get a Series of lists and select the first and last values:
splitted = df.Name.str.split(r'\.\s+', n=1)
df['first_Name'] = splitted.str[0]
df['Last_Name'] = splitted.str[-1]
print (df)
Name first_Name Last_Name
0 P. John P John
1 Merry Merry Merry
2 P. John travis P John travis
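To illustrate what n=1 changes, here is a minimal sketch. The third name is a hypothetical value (not from the question) that contains a second period, so the difference becomes visible; the pattern r'\.\s+' is assumed to match the intended separator.

```python
import pandas as pd

# Hypothetical sample: the third name contains a second ". " separator.
names = pd.Series(['P. John', 'Merry', 'P. John. travis'])

# n=1 stops after the first separator, so each row has at most two parts.
limited = names.str.split(r'\.\s+', n=1, expand=True)

# Without n, every separator is used and a third column appears.
unlimited = names.str.split(r'\.\s+', expand=True)

print(limited)
print(unlimited)
```

With n=1 the result always fits into two columns, which is what the assignment to ['First_Name', 'Last_Name'] relies on.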
Using fillna
Ex:
import pandas as pd
df = pd.DataFrame()
df['Name'] = ['P. John','Merry','P. John travis']
df['First_Name'] = df.Name.str.split('.', expand = True)[0]
df['Last_Name'] = (df.Name.str.split('.', expand = True)[1]).fillna(df["First_Name"])
print(df)
Output:
Name First_Name Last_Name
0 P. John P John
1 Merry Merry Merry
2 P. John travis P John travis
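One thing to watch for: splitting on '.' alone keeps the space that follows the period, so the last name is actually ' John' rather than 'John', which is hard to see in the printed frame. A small sketch, assuming you want the whitespace removed, adds str.strip before fillna:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['P. John', 'Merry', 'P. John travis']})

parts = df['Name'].str.split('.', expand=True)
df['First_Name'] = parts[0]
# parts[1] holds ' John' with a leading space, so strip it before filling the gaps.
df['Last_Name'] = parts[1].str.strip().fillna(df['First_Name'])

print(df['Last_Name'].tolist())
```

This matters if Last_Name is later used as a lookup key, since ' John' and 'John' are different strings.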
You could use a list comprehension and negative indexing:
df['Last_Name'] = [x.split('.')[-1] for x in df.Name]
Name Last_Name
0 P. John John
1 Merry Merry
2 P. John travis John travis
Here's an extension of the above technique that returns a whole new dataframe with the name split as desired, in a single statement:
pd.DataFrame([(lambda x: (y, x[0], x[-1]))(y.split('.'))
for y in df.Name],
columns=['Name', 'First_Name', 'Last_Name'])
Name First_Name Last_Name
0 P. John P John
1 Merry Merry Merry
2 P. John travis P John travis
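On the second question: "Reindexing only valid with uniquely valued Index objects" is typically raised when the Series passed to map has duplicate values in its index. The following is only a sketch under that assumption. The repeated 'John' row in df1 is hypothetical, added to mimic the real data, and keeping the first occurrence per name is a guess at what is wanted.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['P. John', 'Merry', 'P. John travis']})
# Hypothetical lookup table with a repeated Name, to mimic the real data.
df1 = pd.DataFrame({'Name': ['John', 'Merry', 'John travis', 'John'],
                    'Position': ['CEO', 'CTO', 'Engr', 'CEO']})

# Text after the first ". ", or the whole name when there is no period.
df['Last_Name'] = df['Name'].str.split(r'\.\s+', n=1).str[-1]

# Keep one row per Name so the lookup index is unique
# (assumption: the first occurrence is the one to keep).
lookup = df1.drop_duplicates('Name').set_index('Name')['Position']
df['Position'] = df['Last_Name'].map(lookup)

print(df)
```

Any remaining NaN values in Position would then point to keys in Last_Name that do not exactly match a Name in df1, for example because of stray whitespace.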