How to sort a list of strings following a certain pattern
I want to sort each of several lists of strings, for example:
list1 = ['3DT1_PN_DIS3D_S001', '3DT1_PN_noDIS3D_S001', '3DT1_S001', '3DT1_noPN_DIS3D_S001']
list2 = ['3DT1_noPN_DIS3D_S002', '3DT1_PN_noDIS3D_S002', '3DT1_PN_DIS3D_S002']
following the pattern [ '3DT1_S##', '3DT1_noPN_DIS3D_S##', '3DT1_PN_noDIS3D_S##', '3DT1_PN_DIS3D_S##']
The result should be:
list1 = [ '3DT1_S001', '3DT1_noPN_DIS3D_S001', '3DT1_PN_noDIS3D_S001', '3DT1_PN_DIS3D_S001']
list2 = [ '3DT1_noPN_DIS3D_S002', '3DT1_PN_noDIS3D_S002', '3DT1_PN_DIS3D_S002']
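I tried playing around with sorted, but had no luck!
Any help?
You can define a key function that returns a tuple in the desired order, then pass that function to sorted as the key argument.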
>>> def key_fn(x):
...     tags = x.split('_')
...     if tags[1][0] == 'S':
...         return (0, int(tags[1][1:]))
...     elif tags[1] == 'noPN':
...         return (1, int(tags[3][1:]))
...     elif tags[1] == 'PN':
...         if tags[2] == 'noDIS3D':
...             return (2, int(tags[3][1:]))
...         else:
...             return (3, int(tags[3][1:]))
...
>>> list1 = ['3DT1_PN_DIS3D_S001', '3DT1_PN_noDIS3D_S001', '3DT1_S001', '3DT1_noPN_DIS3D_S001']
>>> sorted(list1, key=key_fn)
['3DT1_S001', '3DT1_noPN_DIS3D_S001', '3DT1_PN_noDIS3D_S001', '3DT1_PN_DIS3D_S001']
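The if/elif chain in key_fn can also be written as a lookup table. The following is only a sketch of that variation, not part of the answer above: the names ORDER, table_key and mixed are made up for illustration, and it assumes every name has the form 3DT1_<tags>_S<digits> with one of the four tag combinations from the question (any other combination raises KeyError).

```python
# Rank of each tag combination found between the '3DT1' prefix and the 'S###' suffix.
# ORDER and table_key are hypothetical names used only for this sketch.
ORDER = {
    (): 0,                    # 3DT1_S##
    ('noPN', 'DIS3D'): 1,     # 3DT1_noPN_DIS3D_S##
    ('PN', 'noDIS3D'): 2,     # 3DT1_PN_noDIS3D_S##
    ('PN', 'DIS3D'): 3,       # 3DT1_PN_DIS3D_S##
}

def table_key(name):
    tags = name.split('_')
    middle = tuple(tags[1:-1])       # tags between the prefix and the subject ID
    subject = int(tags[-1][1:])      # 'S001' -> 1
    return (ORDER[middle], subject)  # sort by pattern rank, then by subject number

list1 = ['3DT1_PN_DIS3D_S001', '3DT1_PN_noDIS3D_S001', '3DT1_S001', '3DT1_noPN_DIS3D_S001']
mixed = ['3DT1_PN_DIS3D_S010', '3DT1_S002', '3DT1_PN_DIS3D_S002', '3DT1_S010']

print(sorted(list1, key=table_key))
print(sorted(mixed, key=table_key))
```

Because the key is a tuple, names that share a pattern are ordered by their numeric subject ID, as the mixed list shows.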
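My two cents... It uses a 'patternsList' variable to define the order. This may be the simplest (most readable, most extensible) way to do it: no messy if-elses. In addition, list items with the same starting pattern are sorted by the rest of the string.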
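list1.sort(key = myKey) means that the myKey function is called on each list item before sorting. The myKey function only transforms each item into a sort key, chosen so that a normal sort produces the order you want. The sorted output list contains the original list items (not the values returned by myKey).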
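In the example below, the myKey function splits a list item into two parts and tags the first part with an integer according to the patternsList variable. A normal sort can then handle the returned tuples the way you want.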
list1 = ['3DT1_PN_DIS3D_S001', '3DT1_PN_noDIS3D_S001', '3DT1_S001', '3DT1_noPN_DIS3D_S001']
list2 = ['3DT1_noPN_DIS3D_S002', '3DT1_PN_noDIS3D_S002', '3DT1_PN_DIS3D_S002', '3DT1_PN_DIS3D_S003', '3DT1_PN_DIS3D_S001']

def myKey(x):
    # create the 'order list' for starting pattern
    patternsList = [ '3DT1_S', '3DT1_noPN_DIS3D_S', '3DT1_PN_noDIS3D_S', '3DT1_PN_DIS3D_S']
    for i, pattern in enumerate(patternsList):  # iterate patterns in order
        if x.startswith(pattern):  # check if x starts with pattern
            # return order value i and x without the leading pattern
            return (i, x[len(pattern):])
    # if an undefined pattern is found, put it first
    return (-1, x)
    # alternatively, if you want undefined items to be last
    # return (len(patternsList)+1, x)

print(list1)
list1.sort(key = myKey)
print(list1)
print(list2)
list2.sort(key = myKey)
print(list2)
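One thing to keep in mind: myKey compares the remainder of the name as a string, which works for zero-padded IDs such as S001 but would place S10 before S9. A minimal sketch of a numeric variant follows; PATTERNS, numeric_key and names are illustrative names, and the code assumes everything after the pattern is made of digits (int() raises ValueError otherwise).

```python
# Same order list as in myKey; PATTERNS and numeric_key are hypothetical names.
PATTERNS = ['3DT1_S', '3DT1_noPN_DIS3D_S', '3DT1_PN_noDIS3D_S', '3DT1_PN_DIS3D_S']

def numeric_key(x):
    for i, pattern in enumerate(PATTERNS):
        if x.startswith(pattern):
            # Assumption: the remainder after the pattern is all digits.
            return (i, int(x[len(pattern):]))
    # Unknown names go first, as in myKey.
    return (-1, 0)

names = ['3DT1_S10', '3DT1_S9', '3DT1_PN_DIS3D_S2']
print(sorted(names, key=numeric_key))
```

With this key, 3DT1_S9 sorts before 3DT1_S10 because the IDs are compared as the integers 9 and 10 rather than as the strings '9' and '10'.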
This method works by sorting on the index of the first pattern found.
>>> import re
>>> list1 = ['3DT1_PN_DIS3D_S001', '3DT1_PN_noDIS3D_S001', '3DT1_S001', '3DT1_noPN_DIS3D_S001']
>>> list2 = ['3DT1_noPN_DIS3D_S002', '3DT1_PN_noDIS3D_S002', '3DT1_PN_DIS3D_S002']
>>> patterns = [ '3DT1_S', '3DT1_noPN_DIS3D_S', '3DT1_PN_noDIS3D_S', '3DT1_PN_DIS3D_S']
>>> pattern = '|'.join('(%s)'%x for x in patterns)
>>> pattern #Creates a regex pattern with each pattern as a group in order
'(3DT1_S)|(3DT1_noPN_DIS3D_S)|(3DT1_PN_noDIS3D_S)|(3DT1_PN_DIS3D_S)'
>>> def sort_key(x):
...     return re.match(pattern, x).lastindex
...
>>> list1, list2 = [sorted(l, key=sort_key) for l in (list1,list2)]
>>> list1
['3DT1_S001', '3DT1_noPN_DIS3D_S001', '3DT1_PN_noDIS3D_S001', '3DT1_PN_DIS3D_S001']
>>> list2
['3DT1_noPN_DIS3D_S002', '3DT1_PN_noDIS3D_S002', '3DT1_PN_DIS3D_S002']
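Two caveats apply to the regex approach: re.match returns None for a string that matches none of the patterns, so sort_key would raise AttributeError, and patterns containing regex metacharacters would need escaping. The sketch below shows one way to handle both; regex, safe_sort_key and items are names invented for this example, and sending unmatched items to the end is an assumption, not something the question specifies.

```python
import re

patterns = ['3DT1_S', '3DT1_noPN_DIS3D_S', '3DT1_PN_noDIS3D_S', '3DT1_PN_DIS3D_S']

# re.escape guards against metacharacters in the patterns (none here, but safer in general).
regex = re.compile('|'.join('(%s)' % re.escape(p) for p in patterns))

def safe_sort_key(x):
    m = regex.match(x)
    # lastindex is the 1-based number of the group that matched.
    # Assumption: unmatched items should sort after all known patterns.
    return m.lastindex if m else len(patterns) + 1

items = ['unknown_S001', '3DT1_PN_DIS3D_S001', '3DT1_S001']
print(sorted(items, key=safe_sort_key))
```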
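Here is an approach that takes a sequence of "prefixes" used to group the list before sorting. Each item is added to the group of the first prefix it matches, and only to that group.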
list1 = ['3DT1_PN_DIS3D_S001', '3DT1_PN_noDIS3D_S001', '3DT1_S001', '3DT1_noPN_DIS3D_S001']
list2 = ['3DT1_noPN_DIS3D_S002', '3DT1_PN_noDIS3D_S002', '3DT1_PN_DIS3D_S002', '3DT1_S002']
prefixes = [ '3DT1_S', '3DT1_noPN_DIS3D_S', '3DT1_PN_noDIS3D_S', '3DT1_PN_DIS3D_S']

def f(l):
    result = []
    for p in prefixes:  # for each prefix, in order
        a = []  # items in the group
        b = []  # items not in the group
        for x in l:  # for each item
            if x.startswith(p):  # does the item match the prefix?
                a.append(x)  # add it to the group
            else:
                b.append(x)  # add it to the "rest"
        result.append(sorted(a))  # sort the group and save it for the result
        l = b  # continue with the non-group elements
    return result

Here are the results:
>>> f(list1)
[['3DT1_S001'], ['3DT1_noPN_DIS3D_S001'], ['3DT1_PN_noDIS3D_S001'], ['3DT1_PN_DIS3D_S001']]
>>> f(list2)
[['3DT1_S002'], ['3DT1_noPN_DIS3D_S002'], ['3DT1_PN_noDIS3D_S002'], ['3DT1_PN_DIS3D_S002']]
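Note that f returns one sub-list per prefix rather than a flat list. If a flat list like the one in the question is wanted, the groups can be concatenated, for example with itertools.chain. This is just a sketch: f is restated in compact form so the block runs on its own, and flat is a name chosen for illustration.

```python
from itertools import chain

prefixes = ['3DT1_S', '3DT1_noPN_DIS3D_S', '3DT1_PN_noDIS3D_S', '3DT1_PN_DIS3D_S']

def f(l):
    # Compact restatement of the grouping function above.
    result = []
    for p in prefixes:
        group = [x for x in l if x.startswith(p)]
        l = [x for x in l if not x.startswith(p)]  # keep only unmatched items
        result.append(sorted(group))
    return result

list1 = ['3DT1_PN_DIS3D_S001', '3DT1_PN_noDIS3D_S001', '3DT1_S001', '3DT1_noPN_DIS3D_S001']

# Join the per-prefix groups into a single list, preserving prefix order.
flat = list(chain.from_iterable(f(list1)))
print(flat)
```

As with the original f, items that match no prefix are left out of the result.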