Python: sort a list of strings using a regex with a varying number of pattern matches

Question

I'm just starting out with Python and am having trouble sorting a list of strings by a varying number of regex matches. Given a list of strings, I need to split each string using a user-provided regex and then sort by a given list of keys (locations). The key can be either a single integer or a list of integers in the order in which they should be applied. For example:

regex = r'.*(FF|TT|SS)_([-\.\d]+v)_([-\.\d]+c)_(FF|TT|SS).*'

key = [2,1,3]

This would sort the list of strings by location 2, then location 1, then location 3.

I have the following, which works for a fixed number of locations/keys, but I can't figure out how to make it work for a varying number of keys:

import re

strlist = ["synopsys_SS_2v_-40c_SS.lib","synopsys_SS_1v_-40c_SS.lib","synopsys_SS_2v_-40c_TT.lib","synopsys_FF_3v_-40c_FF.lib", "synopsys_TT_4v_125c_TT.lib", "synopsys_TT_1v_-40c_TT.lib"]
regex = r'.*(FF|TT|SS)_([-\.\d]+v)_([-\.\d]+c)_(FF|TT|SS).*'
key = [2,1,3]

sfids_single = sorted(strlist, key=lambda name: ( 
  re.findall(regex,name)[0][key[0]], 
  re.findall(regex,name)[0][key[1]],
  re.findall(regex,name)[0][key[2]]))

I tried the following, but it does not seem to work:

fids_single = sorted(strlist, key=lambda name: (re.findall(regex,name)[0][i] for i in key))

I also tried this, without success:

for i in key:
  strlist.sort(key=lambda name: re.findall(regex,name)[0][key[i]])

Expected result:

['synopsys_SS_1v_-40c_SS.lib', 'synopsys_TT_1v_-40c_TT.lib', 'synopsys_SS_2v_-40c_SS.lib', 'synopsys_SS_2v_-40c_TT.lib', 'synopsys_FF_3v_-40c_FF.lib', 'synopsys_TT_4v_125c_TT.lib']

Am I completely on the wrong track? Any guidance is greatly appreciated.

Answer 1

Write a key function that returns the relevant portions of each string, in order of precedence, and use that function as the sort key.

one = ["synopsys_SS_2v_-40c_SS.lib","synopsys_SS_1v_-40c_SS.lib",
       "synopsys_SS_2v_-40c_TT.lib","synopsys_FF_3v_-40c_FF.lib",
       "synopsys_TT_4v_125c_TT.lib", "synopsys_TT_1v_-40c_TT.lib"]    

expected = ['synopsys_SS_1v_-40c_SS.lib', 'synopsys_TT_1v_-40c_TT.lib',
            'synopsys_SS_2v_-40c_SS.lib', 'synopsys_SS_2v_-40c_TT.lib',
            'synopsys_FF_3v_-40c_FF.lib', 'synopsys_TT_4v_125c_TT.lib']

Using your regular expression to split the string:

import operator, re
pattern = r'.*(FF|TT|SS)_([-\.\d]+v)_([-\.\d]+c)_(FF|TT|SS).*'
rx = re.compile(pattern)
seq = [2,1,3]
def key(item, seq = seq):
    getter = operator.itemgetter(*seq)
    # findall returns a list with one tuple of groups per match; take the first
    a, b, c, d = rx.findall(item)[0]
    return getter([a, b, c, d])


one.sort(key = key)
assert one == expected
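
As a small illustration of why `operator.itemgetter(*seq)` copes with a varying number of positions: it returns a tuple when given several indices and the bare item when given only one. The `groups` tuple below is a made-up sample of captured groups, not output taken from the code above.

```python
import operator

# Illustrative captured groups for one file name (assumed sample data).
groups = ('SS', '2v', '-40c', 'SS')

# Several indices: the result is a tuple, which sorts element by element.
multi = operator.itemgetter(2, 1, 3)(groups)

# A single index: the result is the bare item, not a 1-tuple.
single = operator.itemgetter(1)(groups)

print(multi)   # ('-40c', '2v', 'SS')
print(single)  # 2v
```

Either return value is usable as a sort key, so the same key function should work whether `seq` holds one position or several.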

The key function can also be written without a regular expression, which may make it a bit less complicated.

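def key(item, seq = seq):
    getter = operator.itemgetter(*seq)
    # 'synopsys_SS_2v_-40c_SS.lib' -> a prefix plus four underscore-separated fields
    _, a, b, c, d = item.split('_')
    d, _ = d.split('.')  # drop the file extension
    print(a, b, c, d)  # debug output
    return getter([a, b, c, d])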

You may want to use names that are more descriptive than a, b, c, d. This approach relies on all the strings having the same pattern.
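
If some strings might not follow the pattern, one possible way to keep the sort from raising is sketched below. The name `safe_key`, the `readme.txt` entry, and the choice to push non-matching names to the end are all assumptions made for illustration, not part of the original answer.

```python
import re

rx = re.compile(r'.*(FF|TT|SS)_([-\.\d]+v)_([-\.\d]+c)_(FF|TT|SS).*')

def safe_key(item, seq=(2, 1, 3)):
    """Hypothetical key: matching names first, non-matching names last."""
    m = rx.match(item)
    if m is None:
        # The leading 1 places non-matching names after all matching ones.
        return (1, (item,))
    groups = m.groups()
    return (0, tuple(groups[i] for i in seq))

names = ["readme.txt",
         "synopsys_TT_4v_125c_TT.lib",
         "synopsys_SS_1v_-40c_SS.lib"]
result = sorted(names, key=safe_key)
print(result)
```

The leading 0 or 1 decides the comparison between matching and non-matching names, so the differently shaped second elements are never compared with each other.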

Answer 2

Many thanks to @a_guest for providing the missing piece of the puzzle. Here's the working solution:

fids_single = sorted(strlist, key=lambda name: tuple(re.findall(regex,name)[0][i] for i in key))
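
The `tuple(...)` call matters because a bare generator expression is not compared by its contents, so it cannot serve as a sort key. A minimal sketch of how this could be wrapped to accept either a single integer or a list of positions, as the question asks, is shown below. The helper names `make_key` and `sort_key` are hypothetical, and the sketch assumes every name matches the regex. Positions are zero-based indices into the captured groups, as in the question's code.

```python
import re

def make_key(regex, key):
    """Hypothetical helper: build a sort key from an int or a list of positions."""
    rx = re.compile(regex)
    positions = [key] if isinstance(key, int) else list(key)

    def sort_key(name):
        groups = rx.findall(name)[0]  # assumes every name matches the regex
        return tuple(groups[i] for i in positions)

    return sort_key

strlist = ["synopsys_SS_2v_-40c_SS.lib", "synopsys_SS_1v_-40c_SS.lib",
           "synopsys_SS_2v_-40c_TT.lib", "synopsys_FF_3v_-40c_FF.lib",
           "synopsys_TT_4v_125c_TT.lib", "synopsys_TT_1v_-40c_TT.lib"]
regex = r'.*(FF|TT|SS)_([-\.\d]+v)_([-\.\d]+c)_(FF|TT|SS).*'

by_three = sorted(strlist, key=make_key(regex, [2, 1, 3]))
by_one = sorted(strlist, key=make_key(regex, 1))

# Multi-pass alternative relying on sort stability:
# sort by the least significant position first.
multi_pass = list(strlist)
for i in reversed([2, 1, 3]):
    multi_pass.sort(key=lambda name, i=i: re.findall(regex, name)[0][i])
```

Note that the captured groups are compared as strings, so a value such as `10v` would sort before `2v`. Converting the groups to numbers inside the key function would be needed for numeric ordering.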

Answer 1
2 2017-06-28 19:03:02

Answer 2
1 accepted 2017-06-29 02:25:38
