Sort strings in Python list using another list

Say I have the following lists:

List1=['Name1','Name3','Color1','Size2','Color3','Color2','Name2','Size1', 'ID']
List2=['ID','Color1','Color2','Size1','Size2','Name1','Name2']

Each list will have an element named "ID" and then 3 other categories (Name, Color, and Size), each of which contains an unpredetermined number of elements.

I want to sort these variables, without knowing how many there will be in each category, according to the following 'sort list':

SortList=['ID','Name','Size','Color']

I can get the desired output (see below), although I imagine there is a better / more pythonic way of doing so.

>>> def SortMyList(MyList, SortList):
...     SortedList = []
...     for SortItem in SortList:
...         SortItemList = []
...         for Item in MyList:
...             # drop the digits to get the category, e.g. 'Name3' -> 'Name'
...             ItemWithoutNum = "".join([char for char in Item if char.isalpha()])
...             if SortItem == ItemWithoutNum:
...                 SortItemList.append(Item)
...         # order the category's own items by numeric suffix ('' counts as 0)
...         SortItemList.sort(key=lambda s: int(s[len(SortItem):] or 0))
...         SortedList.extend(SortItemList)
...     return SortedList
... 
>>> 
>>> SortMyList(List1, SortList)
['ID', 'Name1', 'Name2', 'Name3', 'Size1', 'Size2', 'Color1', 'Color2', 'Color3']
>>> SortMyList(List2, SortList)
['ID', 'Name1', 'Name2', 'Size1', 'Size2', 'Color1', 'Color2']
>>> 

Any suggestions as to how my methodology or my code can be improved?

You can sort the list using a custom key function that returns a 2-tuple: the first item drives the primary sort and the second drives the secondary sort.

Primary sorting is by the order of your "tags" (ID first, then Name, etc.). Secondary sorting is by the numeric value following the tag.

tags = ['ID', 'Name', 'Size', 'Color']
sort_order = {tag: i for i, tag in enumerate(tags)}

def elem_key(x):
    for tag in tags:
        if x.startswith(tag):
            suffix = x[len(tag):]
            # a bare tag gets -1 so it sorts first (None can't be compared with int in Python 3)
            return (sort_order[tag], int(suffix) if suffix else -1)
    raise ValueError("element %s is not prefixed by a known tag; order is not defined" % x)

List1.sort(key=elem_key)
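Python compares tuples item by item, so the tag position decides the order first and the numeric suffix only breaks ties within a tag. A minimal self-contained sketch of the key above (mapping a bare tag to -1 is an assumption made so the key also works in Python 3), run on both lists from the question; 'Name10' is a hypothetical extra element added to check multi-digit suffixes:

```python
tags = ['ID', 'Name', 'Size', 'Color']
sort_order = {tag: i for i, tag in enumerate(tags)}

def elem_key(x):
    for tag in tags:
        if x.startswith(tag):
            suffix = x[len(tag):]
            # (tag position, numeric suffix); a bare tag like 'ID' gets -1
            return (sort_order[tag], int(suffix) if suffix else -1)
    raise ValueError("element %s is not prefixed by a known tag" % x)

List1 = ['Name1', 'Name3', 'Color1', 'Size2', 'Color3', 'Color2', 'Name2', 'Size1', 'ID']
List2 = ['ID', 'Color1', 'Color2', 'Size1', 'Size2', 'Name1', 'Name2']

print(sorted(List1, key=elem_key))
# ['ID', 'Name1', 'Name2', 'Name3', 'Size1', 'Size2', 'Color1', 'Color2', 'Color3']

# hypothetical 'Name10' added to check that suffixes compare as integers
print(sorted(List2 + ['Name10'], key=elem_key))
# ['ID', 'Name1', 'Name2', 'Name10', 'Size1', 'Size2', 'Color1', 'Color2']
```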

You can just provide an adequate key:

List1.sort(key=lambda x: ('INSC'.index(x[0]), x[-1]))
# ['ID', 'Name1', 'Name2', 'Name3', 'Size1', 'Size2', 'Color1', 'Color2', 'Color3']

The elements will be sorted by their first letter, then by their last character (the digit, if one exists). It works here because all first letters are different and all numbers have at most one digit.
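To illustrate the one-digit caveat: x[-1] is a single character compared as a string, so a hypothetical 'Name10' (last character '0') sorts before 'Name1'. A quick check, with the lambda written as a named function (first_letter_last_char is an illustrative name, not from the answer):

```python
def first_letter_last_char(x):
    # position of the first letter in 'INSC', then the last character (a string)
    return ('INSC'.index(x[0]), x[-1])

names = ['Name2', 'Name10', 'Name1', 'ID']
print(sorted(names, key=first_letter_last_char))
# ['ID', 'Name10', 'Name1', 'Name2']
```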

EDIT

For multi-digit numbers, a more obfuscated solution:

import re
List1.sort(key=lambda x: ('INSC'.index(x[0]), int("0" + "".join(re.findall(r'\d+', x)))))
# with 'Name3' replaced by 'Name10' in List1:
# ['ID', 'Name1', 'Name2', 'Name10', 'Size1', 'Size2', 'Color1', 'Color2', 'Color3']

Is there (in this case) an easier way to extract data from a string than simple regexes?

import re

def keygen(sort_list):
    return lambda elem: (
        sort_list.index(re.findall(r'^[a-zA-Z]+', elem)[0]),
        re.findall(r'\d+$', elem)
    )

Usage:

   SortList = ['ID', 'Name', 'Size', 'Color']
   List1 = ['Name1', 'Name3', 'Color1', 'Size2', 'Color3', 'Color2','Name2', 'Size1', 'ID']
   List2 = ['ID', 'Color1', 'Color2', 'Size1', 'Size2', 'Name1', 'Name2']
   sorted(List1, key=keygen(SortList))
=> ['ID', 'Name1', 'Name2', 'Name3', 'Size1', 'Size2', 'Color1', 'Color2', 'Color3']
   sorted(List2, key=keygen(SortList))
=> ['ID', 'Name1', 'Name2', 'Size1', 'Size2', 'Color1', 'Color2']

Explanation:

^[a-zA-Z]+ matches the alphabetic part at the beginning of the string, and \d+$ matches the numeric part at the end.
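A small sketch of what the two patterns return on a few sample strings; split_elem is a hypothetical helper that just wraps the two re.findall calls keygen uses, and 'Color12' is an illustrative input:

```python
import re

def split_elem(elem):
    # hypothetical helper: the two findall calls used by keygen
    alpha = re.findall(r'^[a-zA-Z]+', elem)   # leading letters
    digits = re.findall(r'\d+$', elem)        # trailing digits, if any
    return alpha, digits

for elem in ['Name3', 'Color12', 'ID']:
    print(elem, split_elem(elem))
# Name3 (['Name'], ['3'])
# Color12 (['Color'], ['12'])
# ID (['ID'], [])
```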

keygen returns a lambda that takes a string and returns a two-item tuple:
  • the first item is the position of the alphabetic part in sort_list (if it is not in the list, index raises ValueError),
  • the second is a one-item list containing the numeric part at the end, or an empty list if the string doesn't end with a digit.
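To make the tuple concrete, a sketch that builds the key with the original keygen and prints it for a few elements ('Color12' and 'Weight1' are hypothetical inputs, the latter chosen to trigger the ValueError):

```python
import re

def keygen(sort_list):
    return lambda elem: (
        sort_list.index(re.findall(r'^[a-zA-Z]+', elem)[0]),
        re.findall(r'\d+$', elem)
    )

sort_key = keygen(['ID', 'Name', 'Size', 'Color'])
print(sort_key('ID'))       # (0, [])
print(sort_key('Name3'))    # (1, ['3'])
print(sort_key('Color12'))  # (3, ['12'])

try:
    sort_key('Weight1')     # 'Weight' is not in the sort list
except ValueError as exc:
    print('ValueError:', exc)
```

Since an empty list compares less than any non-empty list, an element without a trailing number sorts ahead of numbered elements with the same prefix.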

Some possible improvements:

  • the sort_list.index call is O(n), and it is made for every element in the list; it can be replaced with an O(1) dict lookup to speed up sorting (I didn't do that to keep things simple),
  • the numeric part can be converted into actual integers (1 < 2 < 10, but '1' < '10' < '2').

After applying those:

import re

def keygen(sort_list):
    index = {word: i for i, word in enumerate(sort_list)}
    return lambda elem: (
        index[re.findall(r'^[a-zA-Z]+', elem)[0]],
        [int(s) for s in re.findall(r'\d+$', elem)]
    )
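A quick usage check of the improved version, with the index line written as a dict comprehension over sort_list; the input list is hypothetical and contains a two-digit 'Name10' to show that the int conversion puts it after 'Name2':

```python
import re

def keygen(sort_list):
    # dict lookup: word -> position in sort_list
    index = {word: i for i, word in enumerate(sort_list)}
    return lambda elem: (
        index[re.findall(r'^[a-zA-Z]+', elem)[0]],
        [int(s) for s in re.findall(r'\d+$', elem)]
    )

SortList = ['ID', 'Name', 'Size', 'Color']
# hypothetical list with a two-digit suffix
items = ['Name10', 'Color1', 'Name2', 'ID', 'Name1']
print(sorted(items, key=keygen(SortList)))
# ['ID', 'Name1', 'Name2', 'Name10', 'Color1']
```

One side effect of the dict: an element whose prefix is not in SortList now raises KeyError rather than ValueError.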

This works as long as you know that List2 only contains strings that start with one of the entries in sortList:

List2=['ID','Color4','Color2','Size1','Size2','Name2','Name1']
sortList=['ID','Name','Size','Color']
def sort_fun(x):
    for i, thing in enumerate(sortList):
        if x.startswith(thing):
            return (i, x[len(thing):])

print(sorted(List2, key=sort_fun))
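To make the stated precondition concrete: sort_fun implicitly returns None for an element that doesn't start with any entry of sortList, and in Python 3 None can't be compared with a tuple, so sorted raises TypeError. A small check, using a hypothetical 'Weight1' element:

```python
sortList = ['ID', 'Name', 'Size', 'Color']

def sort_fun(x):
    for i, thing in enumerate(sortList):
        if x.startswith(thing):
            return (i, x[len(thing):])
    # falls through: implicitly returns None

print(sorted(['Name2', 'ID', 'Name1'], key=sort_fun))
# ['ID', 'Name1', 'Name2']

try:
    sorted(['ID', 'Weight1'], key=sort_fun)
except TypeError:
    print('TypeError: an element matched no entry of sortList')
```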

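Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.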

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM