
Sorting Non-English Characters Alphabetically (ç, ş, ö) (Python 3.x)

I have a list like this: [[text1, number1], [text2, number2], ...]

I want to sort this list by the first element of each item (the text). The texts contain characters such as ı, ç, ö... I found the locale method but I couldn't get it to work. I would also like to be able to define my own ordering function.

Try using sorted() with a key that takes the first entry of each element in the list and strips its accents before comparing:

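import unicodedata

def strip_accents(text):
    # Decompose each character (NFKD), then drop the combining marks
    # (category 'Mn'), so that 'ç' becomes 'c' and 'ö' becomes 'o'.
    # Note: 'ı' (dotless i) has no decomposition, so it is left as-is.
    return ''.join(char for char in
                   unicodedata.normalize('NFKD', text)
                   if unicodedata.category(char) != 'Mn')

l = [['ç', 2], ['ç', 10], ['a', 3], ['b', 1], ['d', 7]]
# sorted() returns a new list; l itself is left unchanged.
print(sorted(l, key=lambda k: strip_accents(k[0])))
# [['a', 3], ['b', 1], ['ç', 2], ['ç', 10], ['d', 7]]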

References:

Syntax behind sorted(key=lambda: ...)

https://stackoverflow.com/a/4512721/8692977

Characters such as ç are usually sorted right after their unaccented counterparts (c). Comparing strings by their code points does not guarantee that order (in Unicode, for example, ç comes after z), so the encoding you are using may or may not help with alphabetical sorting.

I would suggest first defining your order relation:

[ a , á , à , b , c , ç , ...]

Then implement a < comparison function based on that relation, and use one of the many available sorting methods to sort your list.
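As a rough sketch of that idea in Python 3, a key function can stand in for the < comparison: each letter is mapped to its position in the chosen alphabet, and sorted() then compares those positions. The alphabet below is the Turkish one, which is only an assumption based on the characters in the question; the names TURKISH_ALPHABET, RANK, turkish_lower and turkish_key are made up for this example.

import sys

# Assumed order relation: the 29 letters of the Turkish alphabet.
TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"
RANK = {ch: i for i, ch in enumerate(TURKISH_ALPHABET)}

def turkish_lower(text):
    # str.lower() turns 'I' into 'i', but in Turkish 'I' pairs with 'ı'
    # and 'İ' pairs with 'i', so handle those two letters first.
    return text.replace("I", "ı").replace("İ", "i").lower()

def turkish_key(text):
    # One (rank, character) pair per character. Characters outside the
    # alphabet get the highest rank and fall back to code point order.
    return [(RANK.get(ch, len(RANK)), ch) for ch in turkish_lower(text)]

rows = [["çay", 2], ["cam", 10], ["şeker", 3], ["su", 1],
        ["ılık", 7], ["iz", 4], ["öz", 5], ["oy", 6]]
rows_sorted = sorted(rows, key=lambda row: turkish_key(row[0]))
print(rows_sorted)
# [['cam', 10], ['çay', 2], ['ılık', 7], ['iz', 4], ['oy', 6], ['öz', 5], ['su', 1], ['şeker', 3]]

Unlike stripping accents, this keeps ç strictly after c and ı strictly before i. If an explicit two-argument comparison is preferred, functools.cmp_to_key can wrap it for use with sorted().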
