Sorting a list doesn't produce the right result

python question here:

I'm running a sort function to sort some data by dates, and get incorrect output. I've prepared a short version of my code with some sample data to show the error (the full code is uninteresting and the full real data is proprietary).

Here is the code:

import operator

mylist = [['CustomerID_12345', 'TransactionID_1001', '12/31/2012'],
['CustomerID_12345', 'TransactionID_1002', '3/12/2013'],
['CustomerID_12345', 'TransactionID_1003', '1/7/2013'],
['CustomerID_12345', 'TransactionID_1004', '12/31/2012']]


sorted_list = sorted(mylist, key=operator.itemgetter(2))


print type(mylist)
print len(mylist)

for i in mylist:
    print i

print ""        # just for a line break for convenience

for i in sorted_list:
    print i

and the output is:

<type 'list'>
4
['CustomerID_12345', 'TransactionID_1001', '12/31/2012']
['CustomerID_12345', 'TransactionID_1002', '3/12/2013']
['CustomerID_12345', 'TransactionID_1003', '1/7/2013']
['CustomerID_12345', 'TransactionID_1004', '12/31/2012']

['CustomerID_12345', 'TransactionID_1003', '1/7/2013']
['CustomerID_12345', 'TransactionID_1001', '12/31/2012']
['CustomerID_12345', 'TransactionID_1004', '12/31/2012']
['CustomerID_12345', 'TransactionID_1002', '3/12/2013']

The first block is the original data and the second is the sorted output. Since I tried to sort by date, it's easy to see the sort didn't work properly.

Can someone help explain the error and suggest how to correct it? Thanks in advance :)

This is because Python compares them as strings, not as dates.

Strings are compared character by character: '1' is less than '2', which is less than '3', and '/' is less than any digit. That is why '1/7/2013' sorts before '12/31/2012', and both sort before '3/12/2013'.
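To make the character-by-character comparison concrete, here is a small sketch using the date strings from the question. It is written so it runs on Python 3 (the question uses Python 2 print statements); the variable names are only for illustration.

```python
# The three distinct date strings from the question, compared as plain text.
a = '1/7/2013'
b = '12/31/2012'
c = '3/12/2013'

# a and b both start with '1', so the second character decides the order.
print(ord('/'))  # 47
print(ord('2'))  # 50
print(a < b)     # True, because '/' sorts before '2'
print(b < c)     # True, because '1' sorts before '3'

# Sorting the strings gives text order, not chronological order.
text_order = sorted([b, c, a])
print(text_order)
```

The result matches the order in the question's output: the January 2013 date comes first, even though December 2012 is earlier.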

Instead, compare them as dates using the datetime module.

Here is a sample:

from datetime import datetime
your_date = datetime.strptime('1/1/2013', "%m/%d/%Y")
my_date = datetime.strptime('12/3/2011', "%m/%d/%Y")

print your_date > my_date
[Out]: True

Sort by date:

from datetime import datetime

mylist = [['CustomerID_12345', 'TransactionID_1001', '12/31/2012'],
        ['CustomerID_12345', 'TransactionID_1002', '3/12/2013'],
        ['CustomerID_12345', 'TransactionID_1003', '1/7/2013'],
        ['CustomerID_12345', 'TransactionID_1004', '12/31/2012']]


sorted_list = sorted(mylist, key=lambda x: datetime.strptime(x[2],'%m/%d/%Y'))
for item in sorted_list:
    print item
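For anyone on Python 3, a version of the same sort might look like the sketch below. The helper name parse_date is made up for this example; everything else comes from the code above. Since sorted() is stable, the two rows dated 12/31/2012 should keep their original relative order.

```python
from datetime import datetime

mylist = [['CustomerID_12345', 'TransactionID_1001', '12/31/2012'],
          ['CustomerID_12345', 'TransactionID_1002', '3/12/2013'],
          ['CustomerID_12345', 'TransactionID_1003', '1/7/2013'],
          ['CustomerID_12345', 'TransactionID_1004', '12/31/2012']]


def parse_date(row):
    # Hypothetical helper: parse the M/D/YYYY string in the third column.
    return datetime.strptime(row[2], '%m/%d/%Y')


# Sort by the parsed date rather than by the raw string.
sorted_list = sorted(mylist, key=parse_date)

for item in sorted_list:
    print(item)

# Collect the transaction IDs to make the resulting order easy to check.
transaction_order = [row[1] for row in sorted_list]
```

With the sample data, the transactions come out as 1001, 1004, 1003, 1002, which is chronological.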

Or you can store the date as datetime in the first place. If they are strings for good reason then you can first add a datetime column:

for item in mylist:
    item.append(datetime.strptime(item[2], '%m/%d/%Y'))
sorted_list = sorted(mylist, key=lambda x: x[3])
for item in sorted_list: print item[:3]

It's sorted correctly, as strings. You're sorting by the date field in a stupid format (M/D/YYYY) whose string order doesn't match the actual date order. If you use the standard ISO format (YYYY-MM-DD), it will sort as you expect. The same goes if you use a Python date type, e.g. from the datetime module.

import datetime

mylist = [
    ['CustomerID_12345', 'TransactionID_1001', datetime.date(2012, 12, 31)],
    ['CustomerID_12345', 'TransactionID_1002', datetime.date(2013, 3, 12)],
    ...
]

Or, borrowing from one of the other answers, parse the strings when you build the list. This could help if you're reading your data from somewhere and want to convert it from the original string format to the internal representation.

import datetime

mylist = [
    ['CustomerID_12345', 'TransactionID_1001',
        datetime.datetime.strptime('12/31/2012', '%m/%d/%Y').date()],
    ['CustomerID_12345', 'TransactionID_1002',
        datetime.datetime.strptime('3/12/2013', '%m/%d/%Y').date()],
    ...
]

Alternatively, using strings only...

mylist = [
    ['CustomerID_12345', 'TransactionID_1001', '2012-12-31'],
    ['CustomerID_12345', 'TransactionID_1002', '2013-03-12'],
    ...
]
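If the data arrives as M/D/YYYY strings, one way to get ISO-style strings is to parse and re-format them. The following is only a sketch: to_iso is a hypothetical helper name, and old_list stands in for the list from the question.

```python
from datetime import datetime


def to_iso(date_string):
    # Hypothetical helper: 'M/D/YYYY' -> 'YYYY-MM-DD' (zero-padded).
    return datetime.strptime(date_string, '%m/%d/%Y').date().isoformat()


old_list = [['CustomerID_12345', 'TransactionID_1001', '12/31/2012'],
            ['CustomerID_12345', 'TransactionID_1002', '3/12/2013'],
            ['CustomerID_12345', 'TransactionID_1003', '1/7/2013'],
            ['CustomerID_12345', 'TransactionID_1004', '12/31/2012']]

# Build new rows with the date column rewritten as an ISO string.
iso_list = [[f1, f2, to_iso(f3)] for f1, f2, f3 in old_list]

# Plain string sorting now agrees with chronological order.
sorted_list = sorted(iso_list, key=lambda row: row[2])
iso_dates = [row[2] for row in sorted_list]
print(iso_dates)
```

The dates stay strings, so they can still be written to a file or compared without the datetime module afterwards.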

If you already have a list like the one in your question, you can convert it easily:

new_list = [[f1, f2, datetime.datetime.strptime(f3, '%m/%d/%Y').date()]
    for f1, f2, f3 in old_list]

Just a sidenote, the M/D/YYYY (4/2/2014) format was one of the most stupid date formats ever created, only M/D/YY (4/2/14) being worse than that.

The best formats order units by descending size, as this is the direction we use for numbers as well. When proper zero padding is used, those can be sorted easily (2014-04-02), and that's why they found their place in computers and especially in file names. The not so great formats order units by ascending size, not respecting the way we write down numbers. This system is used in my country (today is 2.4.2014). But mishmash formats that order units in neither ascending nor descending size are something we should have killed centuries ago.
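The zero padding matters more than it might seem. A quick sketch, with sample dates chosen only for illustration (apart from 2014-04-02, which is the one mentioned above):

```python
# Year-month-day strings with zero padding.
padded = ['2014-04-02', '2014-11-15', '2013-12-31']

# The same dates with the padding left out.
unpadded = ['2014-4-2', '2014-11-15', '2013-12-31']

sorted_padded = sorted(padded)
sorted_unpadded = sorted(unpadded)

print(sorted_padded)    # chronological
print(sorted_unpadded)  # November ends up before April
```

Without padding, '2014-11-15' sorts before '2014-4-2' because '1' is less than '4', which is the same problem as in the question.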
