Replace part of pandas row and make a new column
I have the below pandas dataframe.
import numpy as np
import pandas as pd

d = {'col1': [1, 2, 3, 4, 5, 60, 0, 0, 6, 3, 2, 4], 'col3': [1, 22, 33, 44, 55, 60, 1, 5, 6, 3, 2, 4], 'Name': ['22a| df a1asd_V1', 'xcd a2a_sd_V3', '23vg aa_bsd_V1', '22a| df a1asd_V1|5mo', 'a3as d_V1', 'aa b_12mo', 'aasd_V4', 'aa_6mo_bsd', 'aa_adn sd_V15', np.nan, 'aasd_V12', 'aasd120Abs'], 'Date': ['2021-06-13', '2021-06-13', '2021-06-13', '2021-06-14', '2021-06-15', '2021-06-15', '2021-06-13', '2021-06-16', '2021-06-13', '2021-06-13', '2021-06-13', '2021-06-16']}
dff = pd.DataFrame(data=d)
dff
    col1  col3                  Name        Date
0      1     1      22a| df a1asd_V1  2021-06-13
1      2    22         xcd a2a_sd_V3  2021-06-13
2      3    33        23vg aa_bsd_V1  2021-06-13
3      4    44  22a| df a1asd_V1|5mo  2021-06-14
4      5    55             a3as d_V1  2021-06-15
5     60    60             aa b_12mo  2021-06-15
6      0     1               aasd_V4  2021-06-13
7      0     5            aa_6mo_bsd  2021-06-16
8      6     6         aa_adn sd_V15  2021-06-13
9      3     3                   NaN  2021-06-13
10     2     2              aasd_V12  2021-06-13
11     4     4            aasd120Abs  2021-06-16
I want to replace _ and | with a space, and expand tokens such as 5mo, 6mo, 12mo into 5 months, 6 months, 12 months, in the Name column, storing the result in a new column called NewName. The result should look like the data frame below.
    col1  col3                  Name        Date                   NewName
0      1     1      22a| df a1asd_V1  2021-06-13           22a df a1asd V1
1      2    22         xcd a2a_sd_V3  2021-06-13             xcd a2a sd V3
2      3    33        23vg aa_bsd_V1  2021-06-13            23vg aa bsd V1
3      4    44  22a| df a1asd_V1|5mo  2021-06-14  22a df a1asd V1 5 months
4      5    55             a3as d_V1  2021-06-15                 a3as d V1
5     60    60             aa b_12mo  2021-06-15            aa b 12 months
6      0     1               aasd_V4  2021-06-13                   aasd V4
7      0     5            aa_6mo_bsd  2021-06-16           aa 6 months bsd
8      6     6         aa_adn sd_V15  2021-06-13             aa adn sd V15
9      3     3                   NaN  2021-06-13                       NaN
10     2     2              aasd_V12  2021-06-13                  aasd V12
11     4     4            aasd120Abs  2021-06-16                aasd120Abs
Is it possible to do this with a lambda function? My actual data frame has more than 1 million records, so I need something efficient.
Thanks in advance. Any ideas would be appreciated.
This should work:
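# Guard against NaN: a float has no .replace method, so non-strings are passed through.
dff["NewName"] = dff["Name"].apply(lambda x: x.replace("|", " ").replace("_", " ") if isinstance(x, str) else x)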
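# Same idea as a list comprehension; replaces with a space and skips NaN.
dff['NewName'] = [x.replace('|', ' ').replace('_', ' ') if isinstance(x, str) else x for x in dff['Name']]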
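The two one-liners above only swap the separators. A minimal sketch of the same per-row approach, extended to cover the `mo` to `months` rewrite and to pass NaN through unchanged, could look like this. The helper name `clean_name` and the compiled pattern `MONTHS` are hypothetical names introduced for illustration, and the sample values are a subset of the question's Name column.

```python
import re

import numpy as np
import pandas as pd

# Small sample taken from the question's Name column.
names = pd.Series(["22a| df a1asd_V1|5mo", "aa_6mo_bsd", np.nan, "aasd120Abs"])

# Digits immediately followed by "mo" (hypothetical name, compiled once for reuse).
MONTHS = re.compile(r"(\d+)mo")


def clean_name(value):
    """Hypothetical helper: clean one value, passing non-strings (NaN) through."""
    if not isinstance(value, str):
        return value
    value = value.replace("|", " ").replace("_", " ")
    return MONTHS.sub(r"\1 months", value)


new_names = names.apply(clean_name)
print(new_names.tolist())
```

Note that `apply` calls the Python function once per row, so on a frame with more than a million rows it is likely to be slower than a vectorized string method.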
You can use pd.Series.replace:
# Raw strings avoid the invalid "\d" escape; dff is the frame from the question.
print(dff["Name"].replace({r"[|_]": " ", r"(\d+)mo": r"\1 months"}, regex=True))
0 22a df a1asd V1
1 xcd a2a sd V3
2 23vg aa bsd V1
3 22a df a1asd V1 5 months
4 a3as d V1
5 aa b 12 months
6 aasd V4
7 aa 6 months bsd
8 aa adn sd V15
9 NaN
10 aasd V12
11 aasd120Abs
Name: Name, dtype: object
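The output above is only printed. To store it as the NewName column the question asks for, one option is to chain the vectorized `Series.str.replace` calls, as in the sketch below. Two details are assumptions that go beyond the original answer: the `\b` word boundary after `mo`, which keeps the pattern from firing inside longer words, and the final whitespace collapse, which turns the double space left by `| ` into a single one. The separators are replaced first because `_` counts as a word character and would otherwise hide the boundary in `6mo_bsd`.

```python
import numpy as np
import pandas as pd

# Subset of the question's data, enough to exercise each case.
dff = pd.DataFrame(
    {"Name": ["22a| df a1asd_V1|5mo", "aa b_12mo", np.nan, "aasd120Abs"]}
)

dff["NewName"] = (
    dff["Name"]
    .str.replace(r"[|_]", " ", regex=True)                 # separators -> space
    .str.replace(r"(\d+)mo\b", r"\1 months", regex=True)   # 5mo -> 5 months (assumes \b is wanted)
    .str.replace(r"\s+", " ", regex=True)                  # optional: collapse repeated spaces
)

print(dff)
```

The `.str` methods skip missing values, so the NaN row stays NaN without an explicit guard.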