I have a list of file names, shown below, and I need to sort them and process them in ascending order. The code I used works fine on the python3 command line but does not work in PySpark. The code I tried is:
from datetime import datetime
def sorted_paths(paths):
    paths.sort(key=lambda path: datetime.strptime(path.split('_')[2], '%Y%m%d'))
    return paths
Gives an error:
Error: time data daily doesn't match the format '%Y%m%d'
Input List is as below:
file_d_20190101_htp.csv
file_d_20180401_html.csv
file_d_20200701_ksh.csv
file_d_20190301_htp.csv
Required output:
file_d_20180401_html.csv
file_d_20190101_htp.csv
file_d_20190301_htp.csv
file_d_20200701_ksh.csv
You can use Python's built-in sorted function to resolve this:
import datetime
arr = ['file_d_20190101_htp.csv',
'file_d_20180401_html.csv',
'file_d_20200701_ksh.csv',
'file_d_20190301_htp.csv']
print(sorted(arr, key=lambda x: datetime.datetime.strptime(x.split("_")[2], '%Y%m%d')))
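The error message mentions `daily`, which does not appear in any of the sample names, so the paths seen under PySpark probably differ from the list above (for example, full paths whose directory names contain extra underscores). In that case `split("_")[2]` no longer lands on the date. Below is a minimal sketch that pulls the date out with a regular expression instead. It assumes each file name contains a standalone run of 8 digits in YYYYMMDD form; the helper names `date_key` and `sort_by_date` are hypothetical, not part of the original code.

```python
import re
from datetime import datetime

# Assumption: the date is the first standalone run of exactly 8 digits
# in the file name (not in the directory part of the path).
DATE_RE = re.compile(r'(?<!\d)(\d{8})(?!\d)')

def date_key(path):
    """Return the datetime embedded in the file name of `path`."""
    name = path.rsplit('/', 1)[-1]  # drop any directory prefix
    match = DATE_RE.search(name)
    if match is None:
        raise ValueError(f"no YYYYMMDD date found in {path!r}")
    return datetime.strptime(match.group(1), '%Y%m%d')

def sort_by_date(paths):
    """Return a new list of paths sorted by their embedded date."""
    return sorted(paths, key=date_key)
```

Calling `sort_by_date(arr)` on the list above should give the same order as the `split`-based key, and it keeps working if a directory prefix is added to each name.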
Just sort the list directly, which is convenient and quick:
paths = ['file_d_20180401_html.csv',
'file_d_20190301_htp.csv',
'file_d_20180401_html.csv',
'file_d_20200701_ksh.csv',
'file_d_20190101_htp.csv',
]
paths.sort() # in place sort
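A plain string sort gives the right order here only because every name shares the same `file_d_` prefix and the date is a zero-padded YYYYMMDD value, so lexicographic order and date order coincide. A small check, assuming the four sample names from the question, compares the two orderings:

```python
from datetime import datetime

paths = [
    'file_d_20190101_htp.csv',
    'file_d_20180401_html.csv',
    'file_d_20200701_ksh.csv',
    'file_d_20190301_htp.csv',
]

# Plain lexicographic sort of the names.
by_name = sorted(paths)

# Sort by the parsed date in the third underscore-separated field.
by_date = sorted(paths, key=lambda p: datetime.strptime(p.split('_')[2], '%Y%m%d'))

print(by_name == by_date)  # True for these sample names
```

If the prefixes differ between files, or the names carry a directory part that varies, the two orderings can diverge and a date-based key is the safer choice.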
One way is to use dateutil.parser:
import dateutil.parser as dparser
f = lambda x: dparser.parse(x, fuzzy=True)
sorted(paths, key=f)
Output:
['file_d_20180401_html.csv',
'file_d_20190101_htp.csv',
'file_d_20190301_htp.csv',
'file_d_20200701_ksh.csv']