
Sorting and Matching a Python list

I recently asked a similar question but need to go a little deeper.

Essentially, I am reading a directory of files and appending everything to a list called filelistname.

I am trying to sort this list by the disk count (-#disk-) and then run a function against the sorted list.

Thanks for your help.


Here is an example:

 In []: filelistname
Out []: ['C:\Test3\ARRAY05-2NODE-RAID1-12disk-128k-0-segmented.xlsx',
         'C:\Test1\ARRAY05-2NODE-RAID1-17disk-128k-0-segmented.xlsx',
         'C:\Test4\ARRAY05-2NODE-RAID1-25disk-128k-0-segmented.xlsx',
         'C:\Test2\ARRAY05-2NODE-RAID1-18disk-128k-0-segmented.xlsx',
         'C:\Test1\ARRAY05-2NODE-RAID1-12disk-32k-0-segmented.xlsx',
         'C:\Test6\ARRAY05-2NODE-RAID1-25disk-32k-0-segmented.xlsx',
         'C:\Test2\ARRAY05-2NODE-RAID1-12disk-64k-0-segmented.xlsx',
         'C:\Test5\ARRAY05-2NODE-RAID1-12disk-64k-100-segmented.xlsx']

The desired output would look something like this:

A group

  C:\Test3\ARRAY05-2NODE-RAID1-12disk-128k-0-segmented.xlsx
  C:\Test1\ARRAY05-2NODE-RAID1-17disk-128k-0-segmented.xlsx
  C:\Test2\ARRAY05-2NODE-RAID1-18disk-128k-0-segmented.xlsx

Another group

  C:\Test4\ARRAY05-4NODE-RAID1-25disk-128k-0-segmented.xlsx

Another group

  C:\Test1\ARRAY05-2NODE-RAID1-12disk-32k-0-segmented.xlsx
  C:\Test6\ARRAY05-2NODE-RAID1-25disk-32k-0-segmented.xlsx

Another group

  C:\Test2\ARRAY05-2NODE-RAID1-12disk-64k-0-segmented.xlsx

Another group

  C:\Test5\ARRAY05-2NODE-RAID1-12disk-64k-100-segmented.xlsx

I'm currently experimenting with this, but I'm having trouble identifying a correct key.

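from itertools import groupby

# Attempted key: everything before the disk count. For the first path this is
# 'C:\Test3\ARRAY05-2NODE-RAID1', so the directory is still part of the key.
key_fn = lambda s: s.rsplit('-', 4)[0]

# groupby only merges adjacent items, so sort with the same key first.
filelistname = sorted(filelistname, key=key_fn)

for key, grouped_file_names in groupby(filelistname, key=key_fn):
    print('\n'.join(grouped_file_names))
    print("")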

You seem to be grouping by the \d+k-\d+ portion of the name (for example 128k-0), so split the file name from the right and use those two fields as the key:

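from collections import defaultdict

d = defaultdict(list)

for sub in filelistname:
    # Split from the right: [..., '128k', '0', 'segmented.xlsx']
    spl = sub.rsplit("-", 3)
    # Key on the two fields just before '-segmented.xlsx', e.g. ('128k', '0')
    k = spl[-3], spl[-2]
    d[k].append(sub)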

Output:

from pprint import pprint as pp

pp(d)

{ ('128k', '0'): [ 'C:\\Test3\\ARRAY05-2NODE-RAID1-12disk-128k-0-segmented.xlsx',
                   'C:\\Test1\\ARRAY05-2NODE-RAID1-17disk-128k-0-segmented.xlsx',
                   'C:\\Test4\\ARRAY05-2NODE-RAID1-25disk-128k-0-segmented.xlsx',
                   'C:\\Test2\\ARRAY05-2NODE-RAID1-18disk-128k-0-segmented.xlsx'],
  ('32k', '0'): [ 'C:\\Test1\\ARRAY05-2NODE-RAID1-12disk-32k-0-segmented.xlsx',
                  'C:\\Test6\\ARRAY05-2NODE-RAID1-25disk-32k-0-segmented.xlsx'],
  ('64k', '0'): ['C:\\Test2\\ARRAY05-2NODE-RAID1-12disk-64k-0-segmented.xlsx'],
  ('64k', '100'): [ 'C:\\Test5\\ARRAY05-2NODE-RAID1-12disk-64k-100-segmented.xlsx']}
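
Since the question started from itertools.groupby, the same grouping can also be sketched that way. This is only an illustration under assumptions: the names KEY_RE and size_key are hypothetical, the pattern assumes every file name ends in "-<digits>k-<digits>-segmented.xlsx", and the sample list is a shortened copy of the paths from the question. Converting the fields to integers makes the groups come out in numeric order (32, 64, 128) rather than string order.

```python
import re
from itertools import groupby

# Shortened sample of the paths from the question; raw strings keep the
# backslashes literal.
filelistname = [
    r'C:\Test3\ARRAY05-2NODE-RAID1-12disk-128k-0-segmented.xlsx',
    r'C:\Test1\ARRAY05-2NODE-RAID1-17disk-128k-0-segmented.xlsx',
    r'C:\Test1\ARRAY05-2NODE-RAID1-12disk-32k-0-segmented.xlsx',
    r'C:\Test5\ARRAY05-2NODE-RAID1-12disk-64k-100-segmented.xlsx',
]

# Assumed naming scheme: "<digits>k-<digits>" directly before "-segmented".
KEY_RE = re.compile(r'-(\d+)k-(\d+)-segmented')


def size_key(path):
    """Return the two numeric fields as integers, e.g. (128, 0)."""
    m = KEY_RE.search(path)
    if m is None:
        raise ValueError(f'unexpected file name: {path}')
    return int(m.group(1)), int(m.group(2))


# groupby only merges adjacent items, so sort with the same key first.
ordered = sorted(filelistname, key=size_key)
groups = {k: list(g) for k, g in groupby(ordered, key=size_key)}

for key, paths in groups.items():
    print(key)
    print('\n'.join(paths))
    print()
```

The defaultdict version above needs no sorting step, so it is usually the simpler choice unless the groups must come out in a particular order.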

If you want the key to include everything in the base name except the RAID level and the disk part:

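from collections import defaultdict
from ntpath import basename  # parses Windows-style paths on any platform

d = defaultdict(list)

for sub in filelistname:
    # ['ARRAY05-2NODE', 'RAID1', '12disk', '128k', '0', 'segmented.xlsx']
    spl = basename(sub).rsplit("-", 5)
    # Join the prefix with the two fields after the disk count
    k = spl[0] + "-" + "-".join(spl[3:5])
    d[k].append(sub)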

Output:

{'ARRAY05-2NODE-128k-0': ['C:\\Test3\\ARRAY05-2NODE-RAID1-12disk-128k-0-segmented.xlsx',
                          'C:\\Test1\\ARRAY05-2NODE-RAID1-17disk-128k-0-segmented.xlsx',
                          'C:\\Test4\\ARRAY05-2NODE-RAID1-25disk-128k-0-segmented.xlsx',
                          'C:\\Test2\\ARRAY05-2NODE-RAID1-18disk-128k-0-segmented.xlsx'],
 'ARRAY05-2NODE-32k-0': ['C:\\Test1\\ARRAY05-2NODE-RAID1-12disk-32k-0-segmented.xlsx',
                         'C:\\Test6\\ARRAY05-2NODE-RAID1-25disk-32k-0-segmented.xlsx'],
 'ARRAY05-2NODE-64k-0': ['C:\\Test2\\ARRAY05-2NODE-RAID1-12disk-64k-0-segmented.xlsx'],
 'ARRAY05-2NODE-64k-100': ['C:\\Test5\\ARRAY05-2NODE-RAID1-12disk-64k-100-segmented.xlsx']}
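
The question also asks for the files to be sorted by disk count before a function is run against them. A minimal sketch of that last step, assuming the disk count always appears as "-<digits>disk-" in the base name: group_key, disk_count and process are hypothetical names, and process is only a placeholder for whatever function should actually run on each sorted group.

```python
import re
from collections import defaultdict
from ntpath import basename  # parses Windows-style paths on any platform

# Sample of the paths from the question; raw strings keep backslashes literal.
filelistname = [
    r'C:\Test3\ARRAY05-2NODE-RAID1-12disk-128k-0-segmented.xlsx',
    r'C:\Test4\ARRAY05-2NODE-RAID1-25disk-128k-0-segmented.xlsx',
    r'C:\Test2\ARRAY05-2NODE-RAID1-18disk-128k-0-segmented.xlsx',
    r'C:\Test1\ARRAY05-2NODE-RAID1-17disk-128k-0-segmented.xlsx',
    r'C:\Test6\ARRAY05-2NODE-RAID1-25disk-32k-0-segmented.xlsx',
    r'C:\Test1\ARRAY05-2NODE-RAID1-12disk-32k-0-segmented.xlsx',
]


def group_key(path):
    """Same key as in the answer above, e.g. 'ARRAY05-2NODE-128k-0'."""
    spl = basename(path).rsplit("-", 5)
    return spl[0] + "-" + "-".join(spl[3:5])


def disk_count(path):
    """Return the disk count as an integer (assumes '-<digits>disk-')."""
    m = re.search(r'-(\d+)disk-', basename(path))
    if m is None:
        raise ValueError(f'no disk count in: {path}')
    return int(m.group(1))


def process(paths):
    """Placeholder for the real per-group function; returns the disk counts."""
    return [disk_count(p) for p in paths]


groups = defaultdict(list)
for path in filelistname:
    groups[group_key(path)].append(path)

# Sort each group numerically by disk count, then run the function on it.
results = {key: process(sorted(paths, key=disk_count))
           for key, paths in groups.items()}
```

Sorting on the integer value rather than the raw string matters once disk counts have different numbers of digits, since '9disk' would otherwise sort after '12disk'.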
