Sort a list containing strings with digits at beginning and end of string

I need to sort a list of strings that contain digits at the beginning and end of the string, first by the beginning digits, then by the ending digits. So the beginning digits have priority over the ending digits.

For example:

    l = ['900abc5', '3000abc10', '1000abc5', '1000abc10', '900abc20']

Would become:

    l = ['900abc5', '900abc20','1000abc5','1000abc10','3000abc10']

I know that l.sort() will not work here as it sorts lexicographically. Any other methods I tried seemed to be excessively complicated (for example: splitting the strings by matching beginning digits, then splitting again by ending digits, sorting, concatenating, and then recombining the list). Even summarizing that method shows that it is not efficient!
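To make the problem concrete, here is a small sketch of what the default string sort does with the list above. It compares character by character, so every string starting with '1' lands before every string starting with '9', regardless of how long the number is.

```python
l = ['900abc5', '3000abc10', '1000abc5', '1000abc10', '900abc20']

# Plain string comparison goes character by character, so '1' < '3' < '9'
# decides the order before the magnitude of the number is considered.
lexicographic = sorted(l)
print(lexicographic)
# ['1000abc10', '1000abc5', '3000abc10', '900abc20', '900abc5']
```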

Edit: after playing around with the natsort module I found that natsorted(l) solves my particular issue.

You can create a custom function to extract the numbers from the string and use that function as the key to sorted().

For example, the function below uses a regex to extract the numbers:

import re

def get_nums(my_str):
    return list(map(int, re.findall(r'\d+', my_str)))

Refer to "Python: Extract numbers from a string" for more alternatives.

Then call sorted() using get_nums() as the key:

>>> l = ['900abc5', '3000abc10', '1000abc5', '1000abc10', '900abc20']

>>> sorted(l, key=get_nums)
['900abc5', '900abc20', '1000abc5', '1000abc10', '3000abc10']

Note: Based on your example, my regular expression assumes that numbers appear only at the start and the end of the string, and that all characters in between are non-numeric.
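To illustrate that assumption: if digits also appear in the middle of a string, get_nums() includes them in the key, which changes the order. The sketch below repeats get_nums() so it runs on its own, and shows one possible way to ignore interior digits. The helper first_and_last and the sample strings are hypothetical, not part of the original question.

```python
import re

def get_nums(my_str):
    return list(map(int, re.findall(r'\d+', my_str)))

def first_and_last(my_str):
    # Hypothetical variant: keep only the first and last runs of digits.
    nums = get_nums(my_str)
    return (nums[0], nums[-1]) if nums else ()

print(get_nums('900a7bc5'))        # [900, 7, 5]
print(first_and_last('900a7bc5'))  # (900, 5)

# Made-up inputs with digits in the middle of the string.
items = ['900a7bc5', '900abc20', '900a1bc30']

by_all_numbers = sorted(items, key=get_nums)
by_edges_only = sorted(items, key=first_and_last)

print(by_all_numbers)  # ['900a1bc30', '900a7bc5', '900abc20']
print(by_edges_only)   # ['900a7bc5', '900abc20', '900a1bc30']
```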

Here is an option that uses regex to find the leading digits and trailing digits and uses them as keys in the sorted function:

import re

# The key is a tuple: (leading number, trailing number)
sorted(l, key=lambda x: (int(re.findall(r"^\d+", x)[0]), int(re.findall(r"\d+$", x)[0])))

# ['900abc5', '900abc20', '1000abc5', '1000abc10', '3000abc10']
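One thing to be aware of: the lambda above raises IndexError for any string that lacks leading or trailing digits, because findall() returns an empty list in that case. A possible defensive variant is sketched below. The names edge_key and _EDGE_DIGITS are illustrative, and the policy of sorting non-matching strings last is an assumption, not something the question specifies.

```python
import re

# One compiled pattern capturing the leading and the trailing run of digits.
_EDGE_DIGITS = re.compile(r'^(\d+).*?(\d+)$')

def edge_key(s):
    m = _EDGE_DIGITS.match(s)
    if m is None:
        # Assumed policy: strings without both digit runs sort after the rest.
        return (1, 0, 0)
    return (0, int(m.group(1)), int(m.group(2)))

l = ['900abc5', '3000abc10', '1000abc5', '1000abc10', '900abc20']

# 'abc' has no digits at all; it is added here only to exercise the fallback.
safe_sorted = sorted(l + ['abc'], key=edge_key)
print(safe_sorted)
# ['900abc5', '900abc20', '1000abc5', '1000abc10', '3000abc10', 'abc']
```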

Python's sorted function accepts a key parameter, which should be a function that transforms a list element into a sorting value. In your case, you want to sort by the digits in the string. For example, for '900abc5' the key would be [900, 5], and so on. So you want to pass in a key function that transforms the string into a list of numbers.
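As a quick illustration of why such keys give the desired order: Python compares lists element by element, so the first number decides and the second only breaks ties. The name digit_key below is just a placeholder for this sketch.

```python
import re

def digit_key(s):
    # Placeholder key function: every run of digits, converted to int.
    return [int(n) for n in re.findall(r'\d+', s)]

l = ['900abc5', '3000abc10', '1000abc5', '1000abc10', '900abc20']

keys = {s: digit_key(s) for s in l}
for s, k in keys.items():
    print(s, '->', k)
# 900abc5 -> [900, 5]
# 3000abc10 -> [3000, 10]
# 1000abc5 -> [1000, 5]
# 1000abc10 -> [1000, 10]
# 900abc20 -> [900, 20]

# Element-wise comparison: the leading number has priority,
# the trailing number is only consulted when the leading ones are equal.
print([900, 20] < [1000, 5])  # True
print([900, 5] < [900, 20])   # True
```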

Using regular expressions, it's quite easy to extract the digits from the string. All you need to do is map the digit strings to actual numbers, since regular expressions return string matches.

I believe the code below should work:

import re

l = ['900abc5', '3000abc10', '1000abc5', '1000abc10', '900abc20']

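def by_digits(e):
    # findall returns the digit runs as strings, e.g. ['900', '5']
    digits_as_string = re.findall(r"\d+", e)
    # Return a list: in Python 3, map() yields an iterator that cannot be compared
    return list(map(int, digits_as_string))

print(sorted(l, key=by_digits))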
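A caveat for any key function built on map(): in Python 3, map() returns a lazy iterator rather than a list, and iterators do not support <, so the key has to be materialized as a list or tuple before sorted() can compare it. A small sketch of the difference, where lazy_key and list_key are illustrative names:

```python
import re

def lazy_key(e):
    # In Python 3 this is a map object, which has no ordering.
    return map(int, re.findall(r"\d+", e))

def list_key(e):
    # Materializing the result gives a list, which compares element by element.
    return list(map(int, re.findall(r"\d+", e)))

l = ['900abc5', '3000abc10', '1000abc5', '1000abc10', '900abc20']

try:
    sorted(l, key=lazy_key)
    lazy_ok = True
except TypeError as exc:
    lazy_ok = False
    print("map objects cannot be ordered:", exc)

result = sorted(l, key=list_key)
print(result)
# ['900abc5', '900abc20', '1000abc5', '1000abc10', '3000abc10']
```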
