I am trying to sort the files in my list by date, but in each name the date comes after a string element, [EQUIP-X]. First I used a regex to extract only the date and tried to sort on that. It doesn't work! I thought about splitting the string into [EQUIP-X] and the date.
files = [filename for root, dirs, files in os.walk(path) for filename in files for date in dateList if filename.endswith(date+".log")]
for item in files:
    reg = re.search(r"(.+]).(\d{2}.\d{2}.\d{4})",item)
    equip = reg.group(1)
    data = reg.group(2)
    namefile = data+'.'+equip
    print item
Sample String:
[EQUIP-4].02.05.2019.log
[EQUIP-2].01.05.2019.log
[EQUIP-1].30.04.2019.log
[EQUIP-3].29.04.2019.log
[EQUIP-1].01.05.2019.log
[EQUIP-5].30.04.2019.log
[EQUIP-1].29.04.2019.log
[EQUIP-5].30.04.2019.log
[EQUIP-3].30.04.2019.log
[EQUIP-1].29.04.2019.log
[EQUIP-2].02.05.2019.log
Following this tutorial, I get an error saying that a 'str' object has no attribute 'sort', since I'm manipulating a 'str' rather than a date. What is a better way to do it? The idea was to split the string, handle the date, and then join everything back together.
You can just sort based on the end of the string, minus the last 4 characters (the file extension), parsed as a date. Since the date format is zero-padded, it is always 10 characters long, hence the string slice starting from -14 (10 for the date + 4 for the extension).
from datetime import datetime
files = ['[EQUIP-4].02.05.2019.log',
'[EQUIP-2].01.05.2019.log',
'[EQUIP-1].30.04.2019.log',
'[EQUIP-3].29.04.2019.log',
'[EQUIP-1].01.05.2019.log',
'[EQUIP-5].30.04.2019.log',
'[EQUIP-1].29.04.2019.log',
'[EQUIP-5].30.04.2019.log',
'[EQUIP-3].30.04.2019.log',
'[EQUIP-1].29.04.2019.log',
'[EQUIP-2].02.05.2019.log']
files.sort(key=lambda x: datetime.strptime(x[-14:-4], '%d.%m.%Y'))
print(files)
['[EQUIP-3].29.04.2019.log',
'[EQUIP-1].29.04.2019.log',
'[EQUIP-1].29.04.2019.log',
'[EQUIP-1].30.04.2019.log',
'[EQUIP-5].30.04.2019.log',
'[EQUIP-5].30.04.2019.log',
'[EQUIP-3].30.04.2019.log',
'[EQUIP-2].01.05.2019.log',
'[EQUIP-1].01.05.2019.log',
'[EQUIP-4].02.05.2019.log',
'[EQUIP-2].02.05.2019.log']
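To see why the slice has to be parsed rather than compared as text, here is a small sketch. The `sample` list and the variable names are illustrative and not from the question. Comparing the raw dd.mm.yyyy text sorts by day first, so 01.05.2019 lands before 29.04.2019:

```python
from datetime import datetime

# Illustrative sample; the names here are hypothetical.
sample = [
    '[EQUIP-2].01.05.2019.log',
    '[EQUIP-3].29.04.2019.log',
    '[EQUIP-1].30.04.2019.log',
]

# Comparing the raw dd.mm.yyyy text orders by day first, which is not chronological.
by_text = sorted(sample, key=lambda name: name[-14:-4])

# Parsing the same slice into a datetime gives chronological order.
by_date = sorted(sample, key=lambda name: datetime.strptime(name[-14:-4], '%d.%m.%Y'))

print(by_text)
print(by_date)
```

Only the second list is in date order, which is the reason for the `strptime` call in the key.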
The Python sort function has a key parameter that lets you compute a value from each element and sort by that value; the elements themselves are not changed.
This example extracts the number from the end of the string and sorts by it.
a = ['hello 123', 'pumpkin 542', 'muffin 342']
def get_important_part(string):
    return int(string.split()[1])
print(sorted(a, key=get_important_part))
This prints:
['hello 123', 'muffin 342', 'pumpkin 542']
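The same idea can be applied to the filenames from the question. The following is only a sketch: the helper name `date_then_equipment` is made up for illustration, and it assumes every name looks like `[EQUIP-N].dd.mm.yyyy.log`. Returning a tuple from the key function sorts by date first and uses the equipment label to break ties:

```python
from datetime import datetime

def date_then_equipment(filename):
    # Hypothetical helper; assumes names look like '[EQUIP-N].dd.mm.yyyy.log'.
    equipment, _, rest = filename.partition('].')
    date = datetime.strptime(rest[:10], '%d.%m.%Y')
    # Tuples compare element by element: date first, then the equipment label.
    return (date, equipment)

names = [
    '[EQUIP-4].02.05.2019.log',
    '[EQUIP-2].02.05.2019.log',
    '[EQUIP-3].29.04.2019.log',
]
ordered = sorted(names, key=date_then_equipment)
print(ordered)
```

Note that the equipment label is compared as text here, so a label such as EQUIP-10 would sort before EQUIP-2.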
Why not work with strptime and strftime?
from datetime import datetime
dates = ['02.05.2019', '20.05.2019', '11.05.2019', '30.05.2019', '08.05.2019', '09.05.2019']
# Parse the strings into datetime objects so they compare chronologically.
dates_obj = [datetime.strptime(x,'%d.%m.%Y') for x in dates]
dates_sorted = sorted(dates_obj)
# Format the sorted datetimes back into dd.mm.yyyy strings.
dates_sorted = [x.strftime('%d.%m.%Y') for x in dates_sorted]
print(dates_sorted)
['02.05.2019', '08.05.2019', '09.05.2019', '11.05.2019', '20.05.2019', '30.05.2019']
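If the goal is only to get the original strings back in date order, a possible shortcut is to pass the parsing step as the `key`, so there is no need to format the dates again afterwards. This is a sketch of that variation, not the approach shown above:

```python
from datetime import datetime

dates = ['02.05.2019', '20.05.2019', '11.05.2019', '30.05.2019', '08.05.2019', '09.05.2019']

# The key is only used for comparison; sorted() returns the original strings.
dates_sorted = sorted(dates, key=lambda d: datetime.strptime(d, '%d.%m.%Y'))
print(dates_sorted)
```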
You can convert your list into a pandas DataFrame and do the sorting there. Parse the date in each filename, sort by it, and convert the resulting index to a list. Then display the rows in that order by position (iloc).
import pandas as pd
df = pd.DataFrame([('[EQUIP-4].02.05.2019.log')
,('[EQUIP-2].01.05.2019.log')
,('[EQUIP-1].30.04.2019.log')
,('[EQUIP-3].29.04.2019.log')
,('[EQUIP-1].01.05.2019.log')
,('[EQUIP-5].30.04.2019.log')
,('[EQUIP-1].29.04.2019.log')
,('[EQUIP-5].30.04.2019.log')
,('[EQUIP-3].30.04.2019.log')
,('[EQUIP-1].29.04.2019.log')
,('[EQUIP-2].02.05.2019.log')], columns = ['file'])
# Pass an explicit format: without it, 02.05.2019 is read month-first (5 February).
# kind='mergesort' is stable, so files with the same date keep their original order.
df.iloc[df['file'] \
    .map(lambda x: pd.to_datetime(x[-14:-4], format='%d.%m.%Y')) \
    .sort_values(kind='mergesort') \
    .index \
    .tolist()]
Result:
file
3 [EQUIP-3].29.04.2019.log
6 [EQUIP-1].29.04.2019.log
9 [EQUIP-1].29.04.2019.log
2 [EQUIP-1].30.04.2019.log
5 [EQUIP-5].30.04.2019.log
7 [EQUIP-5].30.04.2019.log
8 [EQUIP-3].30.04.2019.log
1 [EQUIP-2].01.05.2019.log
4 [EQUIP-1].01.05.2019.log
0 [EQUIP-4].02.05.2019.log
10 [EQUIP-2].02.05.2019.log
Combining @ddg's and @Sayse's suggestion, you can try:
import re
from datetime import datetime
files = ["[EQUIP-4].02.05.2019.log", ...]
files.sort(key=lambda item: datetime.strptime(re.search(r"\d{2}\.\d{2}\.\d{4}", item).group(0), '%d.%m.%Y'), reverse=False)
or in a more readable way:
def getSortValue(item):
    # Escape the dots so they match a literal '.' rather than any character.
    reg = re.search(r"\d{2}\.\d{2}\.\d{4}", item)
    data = reg.group(0)
    return datetime.strptime(data, '%d.%m.%Y')

files.sort(key=getSortValue, reverse=False)
Output:
print('\n'.join(files))
[EQUIP-3].29.04.2019.log
[EQUIP-1].29.04.2019.log
[EQUIP-1].29.04.2019.log
[EQUIP-1].30.04.2019.log
[EQUIP-5].30.04.2019.log
[EQUIP-5].30.04.2019.log
[EQUIP-3].30.04.2019.log
[EQUIP-2].01.05.2019.log
[EQUIP-1].01.05.2019.log
[EQUIP-4].02.05.2019.log
[EQUIP-2].02.05.2019.log
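One thing to watch with a regex-based key is that `re.search` returns `None` when a name contains no date, and calling `.group(0)` on it then raises an `AttributeError`. A minimal sketch of one way to handle that, assuming it is acceptable to push such names to the end of the list (the names `DATE_PATTERN`, `sort_value_or_max` and `mixed` are illustrative only):

```python
import re
from datetime import datetime

# Compiled once; the dots are escaped so they match a literal '.'.
DATE_PATTERN = re.compile(r"\d{2}\.\d{2}\.\d{4}")

def sort_value_or_max(item):
    # Hypothetical variant of the key function above.
    match = DATE_PATTERN.search(item)
    if match is None:
        # No date in the name: sort it after every real date.
        return datetime.max
    return datetime.strptime(match.group(0), '%d.%m.%Y')

mixed = ['[EQUIP-2].01.05.2019.log', 'notes.txt', '[EQUIP-3].29.04.2019.log']
mixed.sort(key=sort_value_or_max)
print(mixed)
```

A name whose digits match the pattern but do not form a real date (for example 99.99.2019) would still raise a `ValueError` in `strptime`; this sketch does not cover that case.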
You can sort the filenames by using the built-in list sort() method, like this:
from datetime import datetime
import os # Even though not used in example code.
from pprint import pprint
import re
#files = [filename for root, dirs, files in os.walk(path) for filename in files for date in dateList if filename.endswith(date+".log")]
files = [
'[EQUIP-4].02.05.2019.log',
'[EQUIP-2].01.05.2019.log',
'[EQUIP-1].30.04.2019.log',
'[EQUIP-3].29.04.2019.log',
'[EQUIP-1].01.05.2019.log',
'[EQUIP-5].30.04.2019.log',
'[EQUIP-1].29.04.2019.log',
'[EQUIP-5].30.04.2019.log',
'[EQUIP-3].30.04.2019.log',
'[EQUIP-1].29.04.2019.log',
'[EQUIP-2].02.05.2019.log',
]
def get_date(filename):
    # Capture the dd.mm.yyyy part that follows the '[EQUIP-X].' prefix.
    match = re.search(r".+\]\.(\d{2}\.\d{2}\.\d{4})", filename)
    date_str = match.group(1)
    return datetime.strptime(date_str, '%d.%m.%Y')

files.sort(key=get_date)
pprint(files)
Output:
['[EQUIP-3].29.04.2019.log',
'[EQUIP-1].29.04.2019.log',
'[EQUIP-1].29.04.2019.log',
'[EQUIP-1].30.04.2019.log',
'[EQUIP-5].30.04.2019.log',
'[EQUIP-5].30.04.2019.log',
'[EQUIP-3].30.04.2019.log',
'[EQUIP-2].01.05.2019.log',
'[EQUIP-1].01.05.2019.log',
'[EQUIP-4].02.05.2019.log',
'[EQUIP-2].02.05.2019.log']
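For completeness, here is a rough sketch of how the same key function might be combined with the `os.walk` collection step from the question. It is written under several assumptions: the `dateList` filter from the question is left out, every `.log` file found follows the `[EQUIP-X].dd.mm.yyyy.log` pattern, and the function name `sorted_log_names` plus the temporary directory used for the demonstration are made up for illustration.

```python
import os
import re
import tempfile
from datetime import datetime

def get_date(filename):
    # Assumes the name ends with '].dd.mm.yyyy.log'.
    match = re.search(r"\]\.(\d{2}\.\d{2}\.\d{4})\.log$", filename)
    return datetime.strptime(match.group(1), '%d.%m.%Y')

def sorted_log_names(path):
    # Hypothetical helper: collect every .log filename under path, oldest first.
    names = [name
             for _root, _dirs, files in os.walk(path)
             for name in files
             if name.endswith('.log')]
    return sorted(names, key=get_date)

# Demonstration with empty files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ['[EQUIP-4].02.05.2019.log', '[EQUIP-3].29.04.2019.log']:
        open(os.path.join(tmp, name), 'w').close()
    result = sorted_log_names(tmp)

print(result)
```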