
How to sort a list of obfuscated digits within a string?

 list = ['a1b', 'a100b', 'a2b', 'a99b']

I would like to sort the list by the number sandwiched between the letters a and b, like below.

 ['a1b', 'a2b', 'a99b', 'a100b']

I tried

 list.sort()

But it didn't work well, because the strings are compared character by character rather than by the number they contain.

How can I sort it?

Option 1
natsort.natsorted
The natsort module works nicely here:

>>> from natsort import natsorted
>>> natsorted(['a1b','a100b','a2b','a99b'])
['a1b', 'a2b', 'a99b', 'a100b']
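If installing a third-party package is not an option, a rough standard-library approximation of natural ordering can be sketched as below. This is not how natsort is implemented; `natural_key` is a hypothetical helper name, and the sketch assumes that splitting each string into digit and non-digit chunks is good enough for your data.

```python
import re

def natural_key(s):
    """Hypothetical helper: split a string into text and integer chunks."""
    # A capturing group makes re.split keep the digit runs, so the result
    # alternates text, digits, text, ... (digit runs sit at odd indices).
    parts = re.split(r'(\d+)', s)
    # Tag every chunk so that ints and strs are never compared directly.
    return [(0, int(p)) if i % 2 else (1, p) for i, p in enumerate(parts)]

items = ['a1b', 'a100b', 'a2b', 'a99b']
result = sorted(items, key=natural_key)
print(result)  # ['a1b', 'a2b', 'a99b', 'a100b']
```

Unlike a key that looks only at the first number, this one also orders strings such as 'x10y1' and 'x10y2' by their later numbers.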

Option 2
sorted + re.search
With regex, I'd recommend defining a function that calls re.search to find and extract the number, with a little checking to ensure that no exception is thrown when the pattern is not found in the string.

import re

def f(x):
    m = re.search(r'\d+', x)  # first run of digits, if any
    # fall back to the string itself when no digits are found
    return int(m.group()) if m else x

>>> sorted(['a1b','a100b','a2b','a99b'], key=f)
['a1b', 'a2b', 'a99b', 'a100b']

You can achieve some speed gain if you have a preexisting list on which you call list.sort. list.sort performs an in-place sort and is going to be a bit faster than sorted because it does not generate a copy of the data.
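To make the difference concrete, here is a minimal sketch comparing the two calls. `first_number` is a hypothetical stand-in for the key function above, and the sample data is the list from the question.

```python
import re

def first_number(s):
    # Hypothetical stand-in for the key function: first run of digits as int.
    m = re.search(r'\d+', s)
    return int(m.group()) if m else s

data = ['a1b', 'a100b', 'a2b', 'a99b']

# sorted() builds and returns a new list; data is left untouched.
copy_sorted = sorted(data, key=first_number)

# list.sort() reorders data itself and returns None.
returned = data.sort(key=first_number)

print(copy_sorted)  # ['a1b', 'a2b', 'a99b', 'a100b']
print(data)         # ['a1b', 'a2b', 'a99b', 'a100b']
print(returned)     # None
```

Because list.sort returns None, writing `data = data.sort(...)` would discard the list, so the in-place form is only appropriate when you do not need the original order afterwards.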

Another thing to note is that this version of a regex-based sort is more robust than a lambda. It becomes possible to catch and handle exceptions, and you aren't constrained by the single-expression requirement of a lambda.
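One caveat worth sketching: in Python 3, a key function that returns an int for some items and a str for others makes the sort raise a TypeError, because int and str cannot be ordered against each other. A possible variant, assuming you want strings without digits to sort after the numbered ones, returns tuples of a consistent shape. `number_key` is a hypothetical name, not part of the answer above.

```python
import re

def number_key(s):
    """Hypothetical key: numbered strings first (by number), the rest after."""
    m = re.search(r'\d+', s)
    if m:
        # Group 0: items with a number, ordered by that number, then by text.
        return (0, int(m.group()), s)
    # Group 1: items without digits, ordered alphabetically.
    return (1, 0, s)

mixed = ['a100b', 'nodigits', 'a2b', 'a1b', 'abc']
result = sorted(mixed, key=number_key)
print(result)  # ['a1b', 'a2b', 'a100b', 'abc', 'nodigits']
```

Every key is a tuple of (int, int, str), so any two keys can be compared regardless of whether the pattern matched.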


Performance

l = ['a1b','a100b','a2b','a99b'] * 10000

%timeit natsorted(l)
1 loop, best of 3: 437 ms per loop

%timeit sorted(l, key=f)
10 loops, best of 3: 92.4 ms per loop

Note that actual timings differ by version, environment, and data. I have not benchmarked the other answers as they do not generalise well to arbitrarily structured input.
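The %timeit lines above are IPython magics. Outside IPython, a comparable measurement can be sketched with the standard timeit module. This sketch times only the regex-based key, since natsort is a third-party package; the repeat count is an arbitrary choice, and the numbers it prints will vary by machine.

```python
import re
import timeit

def f(x):
    # Same idea as the key function above: first run of digits as an int.
    m = re.search(r'\d+', x)
    return int(m.group()) if m else x

l = ['a1b', 'a100b', 'a2b', 'a99b'] * 10000

# Run the sort once per measurement, three measurements, and keep the best,
# which is roughly what "best of 3" reports.
timings = timeit.repeat(lambda: sorted(l, key=f), repeat=3, number=1)
best = min(timings)
print(f"sorted(l, key=f): best of 3: {best * 1000:.1f} ms")
```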

You can use regular expressions to isolate the digits for the key function that you pass to list.sort or sorted:

import re

pat = re.compile(r'a(\d+)b')  # capture group of digits between a and b
lst = ['a1b', 'a100b', 'a2b', 'a99b']
sorted(lst, key=lambda s: int(pat.search(s).group(1)))
# ['a1b', 'a2b', 'a99b', 'a100b']

You can simply extract the middle value with int(s[1:-1]) and use it as the sort key. This assumes every string has exactly one leading and one trailing character around the number:

>>> L = ['a1b','a100b','a2b','a99b']
>>> L.sort(key=lambda s: int(s[1:-1]))
>>> L
['a1b', 'a2b', 'a99b', 'a100b']
