简体   繁体   中英

Sort list of tuples considering locale (swedish ordering)

Apparently PostgreSQL 8.4 and Ubuntu 10.04 cannot handle the updated way to sort W and V for Swedish alphabet. That is, it's still ordering them as the same letter like this (old definition for Swedish ordering):

  • Wa
  • Vb
  • Wc
  • Vd

it should be (new definition for Swedish ordering):

  • Vb
  • Vd
  • Wa
  • Wc

I need to order this correctly for a Python/Django website I'm building. I have tried various ways to just order a list of tuples created from a Django QuerySet using *values_list*. But since it's Swedish also å, ä and ö letters needs to be correctly ordered. Now I have either one or the other way both not both..

list_of_tuples = [(u'Wa', 1), (u'Vb',2), (u'Wc',3), (u'Vd',4), (u'Öa',5), (u'äa',6), (u'Åa',7)]

print '########## Ordering One ##############'
ordered_list_one = sorted(list_of_tuples, key=lambda t: tuple(t[0].lower()))
for item in ordered_list_one:
    print item[0]

print '########## Ordering Two ##############'
locale.setlocale(locale.LC_ALL, "sv_SE.utf8")
list_of_names = [u'Wa', u'Vb', u'Wc', u'Vd', u'Öa', u'äa', u'Åa']
ordered_list_two = sorted(list_of_names, cmp=locale.strcoll)
for item in ordered_list_two:
    print item

The examples gives:

########## Ordering One ##############
Vb
Vd
Wa
Wc
äa
Åa
Öa
########## Ordering Two ##############
Wa
Vb
Wc
Vd
Åa
äa
Öa

Now, What I want is a combination of these so that both V/W and å,ä,ö ordering are correct. To be more precise. I want Ordering One to respect locale. By then using the second item (object id) in each tuple I could fetch the correct object in Django.

I'm starting to doubt that this will be possible? Would upgrading PostgreSQL to a newer version that handles collations better and then use raw SQL in Django be possible?

When running LC_ALL=sv_SE.UTF-8 sort on your example on Ubuntu-10.04, it comes out with Wa before Vb (the "old way"), so Ubuntu does not seem to agree with the "new way". Since PostgreSQL relies on the operating system for this, it will behave just the same as the OS given the same lc_collate.

There is actually a patch in debian glibc related to this particular sort issue: http://sourceware.org/bugzilla/show_bug.cgi?id=9724 But it was objected to and not accepted. If you only need this behavior on a system you administer, you can still apply the change of the patch to /usr/share/i18n/locales/sv_SE and rebuild the se_SV locale by running locale-gen sv_SE.UTF-8 . Or better yet, create your own alternative locale derived from it to avoid messing with the original.

This solution is complex because key=locale.strxfrm works fine with single lists and dictionaries, but not with lists of lists or lists of tuples.

Changes from Py2 -> Py3 : use locale.setlocale(locale.LC_ALL, '') and key='locale.strxfrm' (instead of 'cmp=locale.strcoll').

list_of_tuples = [('Wa', 1), ('Vb',2), ('Wc',3), ('Vd',4), ('Öa',5), ('äa',6), ('Åa',7)]

def locTupSorter(uLot):
    "Locale-wise list of tuples sorter - works with most European languages"
    import locale
    locale.setlocale(locale.LC_ALL, '') # get current locale

    dicTups = dict(uLot)          # list of tups to unsorted dictionary
    ssList = sorted(dicTups, key=locale.strxfrm)
    sLot = []
    for i in range(len(ssList)):  # builds a sorted list of tups 
        tfLot = ()
        elem = ssList[i]          # creates tuples for list
        tfLot = (elem, dicTups[elem])
        sLot.append(tfLot)        # creates sorted list of tuples
    return(sLot)


print("list_of_tuples=\n", list_of_tuples)
sortedLot = locTupSorter(list_of_tuples)
print("sorted list of tuples=\n",sortedLot)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM