
python sort strings with digits at the end

What is the easiest way to sort a list of strings with digits at the end, where some have 3 digits and some have 4?

>>> list = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> list.sort()
>>> print list
['asdf111', 'asdf123', 'asdf1234', 'asdf124']

It should put the 1234 one at the end. Is there an easy way to do this?

is there an easy way to do this?

Yes

You can use the natsort module.

>>> from natsort import natsorted
>>> natsorted(['asdf123', 'asdf1234', 'asdf111', 'asdf124'])
['asdf111', 'asdf123', 'asdf124', 'asdf1234']

Full disclosure, I am the package's author.

is there an easy way to do this?

No

It's perfectly unclear what the real rules are. "Some have 3 digits and some have 4" isn't really a very precise or complete specification. All your examples show 4 letters in front of the digits. Is this always true?

import re
key_pat = re.compile(r"^(\D+)(\d+)$")
def key(item):
    m = key_pat.match(item)
    return m.group(1), int(m.group(2))

That key function might do what you want. Or it might be too complex. Or maybe the pattern is really r"^(.*)(\d{3,4})$", or maybe the rules are even more obscure.

>>> data= ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> data.sort( key=key )
>>> data
['asdf111', 'asdf123', 'asdf124', 'asdf1234']
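
If some strings might not end in digits, the key function above fails, because key_pat.match returns None for them. A minimal sketch of one possible fallback, written for Python 3. It assumes (the question does not say this) that such strings should sort by their text with a numeric part of -1; suffix_key is a hypothetical name:

```python
import re

# Same pattern as above: a non-digit prefix followed by trailing digits.
key_pat = re.compile(r"^(\D+)(\d+)$")

def suffix_key(item):
    """Return (prefix, number); fall back to (item, -1) when nothing matches."""
    m = key_pat.match(item)
    if m is None:
        # Assumption: strings without a trailing number sort by text alone.
        return item, -1
    return m.group(1), int(m.group(2))

data = ['asdf123', 'asdf1234', 'asdf', 'asdf111', 'asdf124']
result = sorted(data, key=suffix_key)
print(result)
```

Whether a fallback or an exception is the better behaviour depends on the real rules for the data.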

The issue is that the sorting is alphabetical here, since they are strings. The strings are compared one character at a time, position by position, and the first differing character decides the order.

>>> 'a1234' < 'a124'  # positionally, '3' is less than '4'
True
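
To see where the comparison is decided, here is a small illustrative sketch (first_difference is a hypothetical helper, not a standard function) that finds the first position at which two strings differ:

```python
def first_difference(a, b):
    """Return (index, char_a, char_b) for the first differing position, or None."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i, ca, cb
    return None  # one string is a prefix of the other, or they are equal

diff = first_difference('asdf1234', 'asdf124')
print(diff)  # (6, '3', '4')
```

Everything after that position is ignored, which is why the extra digit in 'asdf1234' never gets a say.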

You will need to do numeric sorting to get the desired output.

>>> x = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> y = [ int(t[4:]) for t in x]
>>> z = sorted(y)
>>> z
[111, 123, 124, 1234]
>>> l = ['asdf'+str(t) for t in z]
>>> l
['asdf111', 'asdf123', 'asdf124', 'asdf1234']

l = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
# Python 2 only: the cmp= argument and cmp() were removed in Python 3.
l.sort(cmp=lambda x, y: cmp(int(x[4:]), int(y[4:])))
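
The cmp= argument exists only in Python 2. A sketch of the same idea for Python 3, still assuming every string has a prefix of exactly 4 characters followed only by digits. The key version sorts the original strings instead of rebuilding them; compare_suffix is a hypothetical name:

```python
from functools import cmp_to_key

l = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']

# Usually simplest: sort by the integer after the 4-character prefix.
by_key = sorted(l, key=lambda s: int(s[4:]))

# If a comparison function is really needed, wrap it with cmp_to_key.
def compare_suffix(x, y):
    a, b = int(x[4:]), int(y[4:])
    return (a > b) - (a < b)  # stands in for Python 2's built-in cmp()

by_cmp = sorted(l, key=cmp_to_key(compare_suffix))
print(by_key)
print(by_cmp)
```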

You need a key function. You're willing to specify 3 or 4 digits at the end, and I have a feeling that you want them to compare numerically.

sorted(list_, key=lambda s: (s[:-4], int(s[-4:])) if s[-4] in '0123456789' else (s[:-3], int(s[-3:]))) 

Without the lambda and conditional expression, that's:

def key(s):
    # A digit in the 4th-from-last position means a 4-digit suffix.
    if s[-4] in '0123456789':
        return (s[:-4], int(s[-4:]))
    else:
        return (s[:-3], int(s[-3:]))

sorted(list_, key=key)

This just takes advantage of the fact that tuples sort by the first element, then the second. Because the key function is called to get a value to compare, the elements are now compared as the tuples returned by the key function. For example, 'asdfbad123' will compare to 'asd7890' as ('asdfbad', 123) compares to ('asd', 7890). If the last 3 characters of a string aren't in fact digits, you'll get a ValueError, which is perfectly appropriate given that you passed it data that doesn't fit the specs it was designed for.
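
The tuple comparison and the failure mode can be checked directly. A short sketch for Python 3, restating the key function from this answer (indexing s, the string being keyed) so the block runs on its own:

```python
def key(s):
    # Split off a 4-digit suffix if the 4th-from-last character is a digit,
    # otherwise assume a 3-digit suffix.
    if s[-4] in '0123456789':
        return (s[:-4], int(s[-4:]))
    return (s[:-3], int(s[-3:]))

# Tuples compare element by element, so the prefixes decide first.
print(key('asdfbad123'))  # ('asdfbad', 123)
print(key('asd7890'))     # ('asd', 7890)
print(key('asdfbad123') > key('asd7890'))  # True, because 'asdfbad' > 'asd'

# Input that does not end in digits raises ValueError from int().
try:
    key('asdfxyz')
except ValueError as exc:
    print('ValueError:', exc)
```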

What you're probably describing is called a Natural Sort, or a Human Sort. If you're using Python, you can borrow from Ned's implementation.

The algorithm for a natural sort is approximately as follows:

  • Split each value into alphabetical "chunks" and numerical "chunks"
  • Sort by the first chunk of each value
    • If the chunk is alphabetical, sort it as usual
    • If the chunk is numerical, sort by the numerical value represented
  • Take the values that have the same first chunk and sort them by the second chunk
  • And so on
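
The steps above can be sketched as a key function for Python 3. This is one possible reading of the algorithm, not Ned's implementation, and natural_key is a hypothetical name. Each chunk is tagged so that numbers and text are never compared with each other directly, which Python 3 would reject; placing numeric chunks before text chunks is an assumption of this sketch:

```python
import re

def natural_key(value):
    """Split value into digit and non-digit chunks and make them comparable."""
    key = []
    # Each match fills exactly one of the two groups: (digits, text).
    for digits, text in re.findall(r'(\d+)|(\D+)', value):
        if digits:
            # Numerical chunk: compare by the integer it represents.
            key.append((0, int(digits), ''))
        else:
            # Alphabetical chunk: compare as ordinary text.
            key.append((1, 0, text))
    return key

names = ['asdf123', 'asdf1234', 'asdf111', 'asdf124', 'b10', 'b2']
ordered = sorted(names, key=natural_key)
print(ordered)
```

Because list comparison proceeds chunk by chunk, values that share a first chunk are ordered by their second chunk, and so on.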

Rather than splitting each line myself, I ask Python to do it for me with re.findall():

import re
import sys

def SortKey(line):
  result = []
  for part in re.findall(r'\D+|\d+', line):
    try:
      result.append(int(part, 10))
    except (TypeError, ValueError) as _:
      result.append(part)
  return result

# sys.stdout.write works in both Python 2 and 3; the lines keep their own newlines.
sys.stdout.write(''.join(sorted(sys.stdin.readlines(), key=SortKey)))

L.sort(key=lambda s: int(''.join(filter(str.isdigit, s[-4:]))))
