简体   繁体   中英

Replace part of pandas row and make a new column

I have the below pandas dataframe.

d = {'col1': [1, 2,3,4,5,60,0,0,6,3,2,4],'col3': [1, 22,33,44,55,60,1,5,6,3,2,4],'Name': ['22a| df a1asd_V1', 'xcd a2a_sd_V3','23vg aa_bsd_V1','22a| df a1asd_V1|5mo','a3as  d_V1','aa b_12mo','aasd_V4','aa_6mo_bsd','aa_adn sd_V15',np.nan,'aasd_V12','aasd120Abs'],'Date': ['2021-06-13', '2021-06-13','2021-06-13','2021-06-14','2021-06-15','2021-06-15','2021-06-13','2021-06-16','2021-06-13','2021-06-13','2021-06-13','2021-06-16']}
dff = pd.DataFrame(data=d)
dff
    col1    col3    Name                    Date        
0   1       1       22a| df a1asd_V1        2021-06-13
1   2       22      xcd a2a_sd_V3           2021-06-13
2   3       33      23vg aa_bsd_V1          2021-06-13
3   4       44      22a| df a1asd_V1|5mo    2021-06-14
4   5       55      a3as d_V1               2021-06-15
5   60      60      aa b_12mo               2021-06-15
6   0       1       aasd_V4                 2021-06-13
7   0       5       aa_6mo_bsd              2021-06-16
8   6       6       aa_adn sd_V15           2021-06-13
9   3       3       NaN                     2021-06-13
10  2       2       aasd_V12                2021-06-13
11  4       4       aasd120Abs              2021-06-16

I want to replace _, | into space and if there is like 5mo, 6mo, 12mo.. into 5 months, 6 months, 12, months like that for Name column and make a new column called New Name. Like below data frame.

    col1    col3    Name                    Date             NewName
0   1       1       22a| df a1asd_V1        2021-06-13      22a  df a1asd V1
1   2       22      xcd a2a_sd_V3           2021-06-13      xcd a2a sd V3
2   3       33      23vg aa_bsd_V1          2021-06-13      23vg aa bsd V1
3   4       44      22a| df a1asd_V1|5mo    2021-06-14      22a df a1asd V1 5 months
4   5       55      a3as d_V1               2021-06-15      a3as d V1
5   60      60      aa b_12mo               2021-06-15      aa b 12 months  
6   0       1       aasd_V4                 2021-06-13      aasd V4     
7   0       5       aa_6mo_bsd              2021-06-16      aa 6 months bsd 
8   6       6       aa_adn sd_V15           2021-06-13      aa adn sd V15
9   3       3       NaN                     2021-06-13      NaN 
10  2       2       aasd_V12                2021-06-13      aasd V12
11  4       4       aasd120Abs              2021-06-16      aasd120Abs

Is it possible to do it in the lambda function? Since my actual data frame has more than 1million records I need something much efficient to work.

Thanks in advance. Any idea would be appriciate.

This should work:

dff["NewName"] = dff["Name"].apply(lambda x: x.replace("|"," ").replace("_"," "))

df['NewName'] = [x.replace('|', '').replace('_', '') for x in df['Name']]

You can use pd.Series.replace :

print (df["Name"].replace({"[|_]":" ", "(\d+)mo":"\\1 months"}, regex=True))

0              22a  df a1asd V1
1                 xcd a2a sd V3
2                23vg aa bsd V1
3     22a  df a1asd V1 5 months
4                    a3as  d V1
5                aa b 12 months
6                       aasd V4
7               aa 6 months bsd
8                 aa adn sd V15
9                           NaN
10                     aasd V12
11                   aasd120Abs
Name: Name, dtype: object

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM