I have a column of strings like the one below that contain date information, and I need to add leading zeros to single-digit months and days. I've run into some issues trying to do this purely with pandas.DataFrame.replace and regular expressions.
import pandas as pd
df = pd.DataFrame({'Key':['0123456789_1/2/2019','0123456789_11/23/2019','0145892367_10/2/2019','0145892367_4/13/2019']})
df
Out[323]:
Key
0 0123456789_1/2/2019
1 0123456789_11/23/2019
2 0145892367_10/2/2019
3 0145892367_4/13/2019
For the above column, the output I'd want after reformatting would be:
Key
0 0123456789_01/02/2019
1 0123456789_11/23/2019
2 0145892367_10/02/2019
3 0145892367_04/13/2019
By now I've figured out I can do this by splitting the strings:
r = df['Key'].str.split('_|/', expand=True)
df2 = r[0] + '_' + r[1].str.zfill(2) + '/' + r[2].str.zfill(2) + '/' + r[3]
df2
Out[333]:
0 0123456789_01/02/2019
1 0123456789_11/23/2019
2 0145892367_10/02/2019
3 0145892367_04/13/2019
dtype: object
...But when I was initially trying to do it with pandas.DataFrame.replace, the closest I was able to get was:
df2 = df.replace(r'(_|/)([1-9]/)',r'\1 0\2',regex=True)
df2
Out[335]:
Key
0 0123456789_ 01/2/2019
1 0123456789_11/23/2019
2 0145892367_10/ 02/2019
3 0145892367_ 04/13/2019
There are two problems with this that I'd like to know more about:
If I remove the space and use r'\10\2', of course I get an error because it thinks I'm trying to substitute in group 10, and there is no such group in the first regex. If I try r'(\1)0\2', it works, except it prints the literal parentheses. Why does it do this, and how can I properly write this so that it prints group 1 immediately followed by a literal zero?
Edit for clarification: I'm aware I could also fix it by parsing the dates, but I'm specifically interested in the regex solution, as a learning exercise. Also because a single replace is much faster for large dataframes.
IIUC, you can use:
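# Split once into the ID and the date, then let strftime zero-pad month and day.
parts = df['Key'].str.split('_', expand=True)
df['Key'] = parts[0] + '_' + pd.to_datetime(parts[1], format='%m/%d/%Y').dt.strftime('%m/%d/%Y')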
print(df)
Key
0 0123456789_01/02/2019
1 0123456789_11/23/2019
2 0145892367_10/02/2019
3 0145892367_04/13/2019
Using the datetime module:
from datetime import datetime
df['Key'] = df.Key.str.split('_').apply(lambda x: x[0] + '_' + datetime.strptime(x[1], "%m/%d/%Y").strftime("%m/%d/%Y"))
Output
Key
0 0123456789_01/02/2019
1 0123456789_11/23/2019
2 0145892367_10/02/2019
3 0145892367_04/13/2019
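On the regex part of the question: in Python's re replacement syntax, \10 is read as "group 10", and parentheses in a replacement string are literal characters rather than grouping, which is why r'(\1)0\2' prints them. The unambiguous way to write "group 1 followed by a literal 0" is \g<1>0. Below is a minimal sketch using the standard re module on the sample keys from the question (the names keys, with_groups, PAD and padded are just for illustration). It also shows a lookaround variant, since the capture-group pattern consumes the trailing "/" and so skips a single-digit day that directly follows a single-digit month.

```python
import re

keys = [
    "0123456789_1/2/2019",
    "0123456789_11/23/2019",
    "0145892367_10/2/2019",
    "0145892367_4/13/2019",
]

# Option 1: keep the original capture groups, but reference group 1 as \g<1>.
# r'\10\2' would mean "group 10"; r'\g<1>0\2' means "group 1, literal 0, group 2".
# Limitation: the match consumes the trailing "/", so the next field cannot
# match its leading separator and the day in the first key stays unpadded.
with_groups = [re.sub(r"(_|/)([1-9]/)", r"\g<1>0\2", k) for k in keys]

# Option 2: lookarounds are zero-width, so nothing is consumed and no group
# reference is needed. Insert "0" at every position that is preceded by
# "_" or "/" and followed by a single digit plus "/".
PAD = re.compile(r"(?<=[_/])(?=[1-9]/)")
padded = [PAD.sub("0", k) for k in keys]

print(with_groups)
print(padded)
```

As far as I know, DataFrame.replace with regex=True hands the pattern and replacement to Python's re engine, so something like df.replace(r'(?<=[_/])(?=[1-9]/)', '0', regex=True) should behave the same way on the Key column, but I'd check it against your pandas version and string dtype before relying on it.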