
Sort list of real numbers mixed with letters

I have a list of data that I must sort, and sadly the naming scheme for these objects is not very consistent. The data is a list of strings that are most often real numbers, but sometimes have a letter on the end. Some examples of acceptable values in this list:

# this is how it should be sorted
['1', '1.1', '1.2', '2', '2.1A', '2.1B', '2.2A', '101.1', '101.2']

Since these are in a database, my first thought was to use the following Django query to return the results already sorted, but it returns them as follows.

#took out unneeded code
choices = [l.number for l in Locker.objects.extra(
               select={'asnumber': 'CAST(number as BYTEA)'}).order_by('asnumber')]
print choices
==> ['1', '1.1', '101.1', '101.2', '2', '2.1A', '2.1B', '2.2A']

Sadly, it did not sort them as needed. So my new plan is to write a key function to use with Python's sorted(), but I'm still not sure how to go about writing it. I need to sort by the real-number portion of the string and then, as a secondary sort, by the letter appended to the end.

Any advice on where to go with this?

Let the DBMS do the sorting; that is what it is very good at. You can hardly rival that performance in your application.

If all you have is fractional numbers with A or B appended, you can simply:

SELECT *
FROM  (
   SELECT unnest(
    ARRAY['1', '1.1', '1.2', '2', '2.1A', '2.1B', '2.2A', '101.1', '101.2']) AS s
   ) x
ORDER  BY rtrim(s, 'AB')::numeric, s;

This orders exactly as requested, and fast, too. The subselect with ARRAY and unnest() is just for building a quick test case. The ORDER BY clause is what matters; see rtrim() in the manual.
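For comparison, the same ordering rule can be mimicked in Python. This is only a rough sketch of what the ORDER BY expression does, assuming the only suffix letters are A and B. The name sql_like_key is made up for illustration; str.rstrip('AB') plays the role of rtrim(s, 'AB'), and Decimal stands in for the numeric cast.

```python
from decimal import Decimal

def sql_like_key(s):
    # Hypothetical helper mirroring: ORDER BY rtrim(s, 'AB')::numeric, s
    # First element: the numeric part; second element: the full string as tie-breaker.
    return (Decimal(s.rstrip('AB')), s)

values = ['101.2', '2.1B', '1', '2.2A', '1.1', '2', '101.1', '2.1A', '1.2']
ordered = sorted(values, key=sql_like_key)
print(ordered)
```

Doing the sort in the database remains the approach recommended here; the Python version is only meant to make the two-level ordering easier to see.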

If there are other characters involved, you might want to update your question to complete the picture.

import string

x = ['1', '1.1', '1.2', '2', '2.1A', '2.1B', '2.2A', '101.1', '101.2']

letters = tuple(string.ascii_letters)

def change(x):
    # sort by the real number portion first, then by the trailing letter (if any)
    if x.endswith(letters):
        return (float(x[:-1]), x[-1])
    else:
        return (float(x), '')

my_list = sorted(x, key=change)

Result:

>>> my_list
['1', '1.1', '1.2', '2', '2.1A', '2.1B', '2.2A', '101.1', '101.2']

I prematurely generalized to arbitrary amounts of letters on the end:

from itertools import takewhile

def sort_key(value):
    cut_point = len(value) - len(list(takewhile(str.isalpha, reversed(value))))
    return (float(value[:cut_point]), value[cut_point:])

sorted((
    l.number
    for l in Locker.objects.extra(select={'asnumber': 'CAST(number as BYTEA)'})
), key = sort_key)

Split the strings into tuples: a real number (convert it to float or Decimal) and an often-empty string of characters. If you sort the tuples with Python's built-in sort (Timsort), it should be really fast.
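One way this split might look, as a minimal sketch: it assumes every value is plain digits with an optional fractional part followed by zero or more ASCII letters, and it uses Decimal rather than float. The names _PATTERN and split_key are invented for this example.

```python
import re
from decimal import Decimal

# Assumed format: digits, optional ".digits", then optional trailing letters.
_PATTERN = re.compile(r'(\d+(?:\.\d+)?)([A-Za-z]*)')

def split_key(value):
    # Return a (number, suffix) tuple; reject anything outside the assumed format.
    match = _PATTERN.fullmatch(value)
    if match is None:
        raise ValueError('unexpected value: %r' % (value,))
    number, suffix = match.groups()
    return (Decimal(number), suffix)

data = ['2.1B', '101.1', '1', '2.1A', '1.2']
result = sorted(data, key=split_key)
print(result)
```

Raising on unexpected input is a design choice made here so that odd values surface early instead of being sorted silently into the wrong place.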

Be careful if scientific notation is allowed in your reals, e.g. 1e10.
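To illustrate the caveat: float() happily parses scientific notation, so a value such as '1e10' would be treated as a very large number rather than as "1" followed by something else. The check below is a hypothetical guard, assuming the only valid shape is digits, an optional fractional part, and optional trailing letters; STRICT and is_plain_value are names made up for this sketch.

```python
import re

# float() accepts scientific notation: this parses as ten billion.
parsed = float('1e10')

# Hypothetical strict format: digits, optional ".digits", optional trailing letters.
STRICT = re.compile(r'\d+(?:\.\d+)?[A-Za-z]*')

def is_plain_value(value):
    # True only when the whole string fits the assumed format.
    return STRICT.fullmatch(value) is not None

print(is_plain_value('2.1A'))  # fits the assumed format
print(is_plain_value('1e10'))  # digits after the letter, so it is rejected
```

Whether such values should be rejected or handled differently depends on the data, which the question does not specify.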

If there's any chance at all that the comparisons will become more complex later, use a class instead of a tuple, although the tuples will likely be faster. Then define one or more comparison methods (depending on whether you're on Python 2.x or 3.x).

Tuples compare element 0, then element 1, etc.
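A small illustration of that element-by-element comparison, using made-up (number, letter) pairs:

```python
# Tuples are compared on the first element; ties fall through to the second.
keys = [(2.1, 'B'), (101.1, ''), (2.1, 'A'), (2.0, '')]
ordered_keys = sorted(keys)
print(ordered_keys)

# An empty suffix sorts before any letter, since '' < 'A'.
no_suffix_first = (2.1, '') < (2.1, 'A')
```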

Your class alternative would need to have a __cmp__ method (Python 2.x) or the 3.x equivalent, i.e. rich comparison methods such as __lt__ and __eq__.
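A possible Python 3 shape for such a class, as a sketch only: LockerNumber is a hypothetical name, the parsing assumes a number followed by optional ASCII letters, and functools.total_ordering fills in the remaining comparison methods from __eq__ and __lt__.

```python
import string
from functools import total_ordering

@total_ordering
class LockerNumber:
    """Hypothetical wrapper that orders by numeric part, then by letter suffix."""

    def __init__(self, text):
        self.text = text
        # Strip trailing ASCII letters to find the numeric part.
        stripped = text.rstrip(string.ascii_letters)
        self.number = float(stripped)
        self.suffix = text[len(stripped):]

    def _key(self):
        return (self.number, self.suffix)

    def __eq__(self, other):
        if not isinstance(other, LockerNumber):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, LockerNumber):
            return NotImplemented
        return self._key() < other._key()

data = ['2.1B', '101.1', '1', '2.1A', '2']
sorted_numbers = [n.text for n in sorted(LockerNumber(s) for s in data)]
print(sorted_numbers)
```

Keeping the comparison logic in one _key() method means later changes to the ordering rules only need to touch that method.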

Storing the value as a string and then parsing it to sort it seems like the wrong approach. If what you really have there is

  • major number
  • minor number
  • optional revision

Then I would strongly suggest storing it as two integers and a text field. Sorting on major_number, minor_number, revision would work exactly as expected. You could define asnumber either as a view at the database level or as a class based on the three base fields with an associated __cmp__().
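As a rough sketch of that three-field idea in Python: LockerId, _PARTS and parse_locker_id are hypothetical names, and the parsing assumes values shaped like "2", "2.1" or "2.1A". A missing minor number is treated as 0 here, which is an assumption rather than something stated in the question.

```python
import re
from typing import NamedTuple

class LockerId(NamedTuple):
    # Hypothetical three-field representation: two integers and a text field.
    major: int
    minor: int
    revision: str

# Assumed format: major, optional ".minor", optional letter revision.
_PARTS = re.compile(r'(\d+)(?:\.(\d+))?([A-Za-z]*)')

def parse_locker_id(text):
    match = _PARTS.fullmatch(text)
    if match is None:
        raise ValueError('unexpected value: %r' % (text,))
    major, minor, revision = match.groups()
    # A missing minor part is treated as 0 (an assumption of this sketch).
    return LockerId(int(major), int(minor or 0), revision)

data = ['101.2', '2.1B', '1', '2.2A', '1.1', '2', '101.1', '2.1A', '1.2']
by_fields = sorted(data, key=parse_locker_id)
print(by_fields)
```

Note that with integer fields "1.10" sorts after "1.2", which differs from reading the value as a real number; which behavior is wanted depends on what the numbers actually mean.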
