Python: How to find most common elements of a list of files
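First of all, sorry for the simple question, but I can't figure out the simplest way to solve this.
I have a directory with several different files that share common elements (the suffixes _25, _26, _28, etc.), like this: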
xxxxx_25.txt
xxxxx_26.txt
xxxxx_27.txt
xxxxx_28.txt
yyyyy_25.txt
yyyyy_26.txt
yyyyy_27.txt
yyyyy_29.txt
mmmmm_25.txt
mmmmm_26.txt
mmmmm_27.txt
mmmmm_30.txt
I would like to get this list:
xxxxx_25.txt
yyyyy_25.txt
mmmmm_25.txt
xxxxx_26.txt
yyyyy_26.txt
mmmmm_26.txt
xxxxx_27.txt
yyyyy_27.txt
mmmmm_27.txt
xxxxx_28.txt
yyyyy_29.txt
mmmmm_30.txt
import re

list_with_file_names = 'xxxxx_25.txt xxxxx_26.txt xxxxx_27.txt xxxxx_28.txt yyyyy_25.txt yyyyy_26.txt yyyyy_27.txt yyyyy_29.txt mmmmm_25.txt mmmmm_26.txt mmmmm_27.txt mmmmm_30.txt'.split()

def get_number_and_prefix(text):
    # Capture the prefix before the underscore and the full number after it.
    g = re.match(r'(\S+)_(\d+)', text)
    return (int(g.group(2)), g.group(1))

nice_list = sorted(list_with_file_names, key=get_number_and_prefix)
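The tuple returned by get_number_and_prefix makes the list sort by number first and then by prefix.
If you would rather group the files by the number in the file name, you can use something like this: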
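The list requested in the question keeps the prefixes in their original order (xxxxx, yyyyy, mmmmm) rather than alphabetical order. A minimal sketch of one way to get that, assuming the input list is already in the desired prefix order: sort on the number alone and rely on the fact that Python's sorted() is stable. The helper name get_number is hypothetical and not part of the answer above.

```python
import re

file_names = [
    'xxxxx_25.txt', 'xxxxx_26.txt', 'xxxxx_27.txt', 'xxxxx_28.txt',
    'yyyyy_25.txt', 'yyyyy_26.txt', 'yyyyy_27.txt', 'yyyyy_29.txt',
    'mmmmm_25.txt', 'mmmmm_26.txt', 'mmmmm_27.txt', 'mmmmm_30.txt',
]

def get_number(name):
    # Hypothetical helper: the digits between the last underscore and ".txt".
    match = re.search(r'_(\d+)\.txt$', name)
    return int(match.group(1))

# sorted() is stable, so names with the same number keep their input order.
by_number = sorted(file_names, key=get_number)
print(by_number)
```

With this key, ties are never reordered, so the prefix order comes entirely from the input list.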
def update_dict_with_file(dict_, filename):
    # Key on the full run of digits that follows the underscore.
    g = re.match(r'.*_(\d+)', filename)
    key = g.group(1)
    t = dict_.setdefault(key, [])
    t.append(filename)

mydict = {}
for filename in list_with_file_names:
    update_dict_with_file(mydict, filename)
mydict now contains the numbers from the file names (as strings) as keys, and lists of the matching file names as values.
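Edit
To summarize all the answers so far: you only need to build a sorted list from your list, using a key getter function that extracts whatever you want from the file name. You can do that with a fancy one-liner using itertools plus a list comprehension, or with a longer for loop (no yield anywhere?). Basically, they are all the same thing. No rocket science.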
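For comparison, the "longer for loop" variant could look roughly like the sketch below, this time with a yield in it. The names number_of and iter_groups are hypothetical, and returning -1 for names without a numeric suffix is an assumption made only so that every name has a sortable key.

```python
import re

def number_of(name):
    # Hypothetical key function: trailing number before ".txt", or -1 if absent.
    match = re.search(r'_(\d+)\.txt$', name)
    return int(match.group(1)) if match else -1

def iter_groups(names):
    """Yield (number, [names]) pairs in ascending numeric order."""
    current_key = None
    current_group = []
    for name in sorted(names, key=number_of):
        key = number_of(name)
        if current_group and key != current_key:
            # The key changed, so the previous group is complete.
            yield current_key, current_group
            current_group = []
        current_key = key
        current_group.append(name)
    if current_group:
        yield current_key, current_group

names = ['xxxxx_25.txt', 'yyyyy_25.txt', 'xxxxx_28.txt', 'yyyyy_26.txt']
for number, group in iter_groups(names):
    print(number, group)
```

This does by hand what itertools.groupby does on a sorted list, which is why the shorter versions are usually preferred.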
Do it like this:
list_of_files = [
    'xxxxx_25.txt',
    'xxxxx_26.txt',
    'xxxxx_27.txt',
    'xxxxx_28.txt',
    'yyyyy_25.txt',
    'yyyyy_26.txt',
    'yyyyy_27.txt',
    'yyyyy_29.txt',
    'mmmmm_25.txt',
    'mmmmm_26.txt',
    'mmmmm_27.txt',
    'mmmmm_30.txt',
]
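import itertools
import re

regex = re.compile(r'_([0-9]+)\.txt$')

def keyfn(name):
    # Return the number before ".txt" as a string, or '' when the name does
    # not match (a None key cannot be compared with strings in Python 3).
    match = regex.search(name)
    if match is None:
        return ''
    else:
        return match.group(1)

for (key, group) in itertools.groupby(sorted(list_of_files, key=keyfn), keyfn):
    print(list(group))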
Or, if you want a single flat list (the form shown in the question), replace the for loop with:
[x for g in itertools.groupby(sorted(list_of_files,key=keyfn),keyfn) for x in g[1]]
#Considering your list of files is as follows
ur_file_list = """xxxxx_25.txt
xxxxx_26.txt
xxxxx_27.txt
xxxxx_28.txt
yyyyy_25.txt
yyyyy_26.txt
yyyyy_27.txt
yyyyy_29.txt
mmmmm_25.txt
mmmmm_26.txt
mmmmm_27.txt
mmmmm_30.txt"""
# Based on the pattern, the key is the part of the file name (without the
# extension) after the last underscore, so no regex is needed.
import os
key = lambda e: os.path.splitext(e)[0].split("_")[-1]
from itertools import groupby
# Sort the list with the key function above, group on the same key,
# and build a list of those groups.
[list(group) for _, group in groupby(sorted(ur_file_list.splitlines(), key=key), key=key)]
[['xxxxx_25.txt', 'yyyyy_25.txt', 'mmmmm_25.txt'], ['xxxxx_26.txt', 'yyyyy_26.txt', 'mmmmm_26.txt'], ['xxxxx_27.txt', 'yyyyy_27.txt', 'mmmmm_27.txt'], ['xxxxx_28.txt'], ['yyyyy_29.txt'], ['mmmmm_30.txt']]
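One thing to keep in mind with a string key like the one above: strings compare character by character, so '100' sorts before '25'. That does not matter for the two-digit numbers in the question, but a sketch of a numeric variant is shown below. The names text_key and numeric_key and the sample file names are made up for illustration.

```python
import os
from itertools import groupby

def text_key(name):
    # Same idea as the lambda above: the part after the last underscore, as a string.
    return os.path.splitext(name)[0].split("_")[-1]

def numeric_key(name):
    # Assumed variant: convert that part to int so that 100 sorts after 25.
    return int(text_key(name))

names = ["a_100.txt", "b_25.txt", "c_9.txt", "d_25.txt"]

as_text = sorted(names, key=text_key)        # lexicographic order of the suffix
as_numbers = sorted(names, key=numeric_key)  # numeric order of the suffix
groups = [list(g) for _, g in groupby(as_numbers, key=numeric_key)]

print(as_text)
print(as_numbers)
print(groups)
```

The numeric key raises ValueError if the part after the underscore is not a number, so it only fits directories where every file follows the pattern.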
Using collections.defaultdict is very convenient here.
In [1]: import re; from collections import defaultdict
In [2]: filenames
Out[2]:
['xxxxx_25.txt',
'xxxxx_26.txt',
'xxxxx_27.txt',
'xxxxx_28.txt',
'yyyyy_25.txt',
'yyyyy_26.txt',
'yyyyy_27.txt',
'yyyyy_29.txt',
'mmmmm_25.txt',
'mmmmm_26.txt',
'mmmmm_27.txt',
'mmmmm_30.txt']
In [3]: d = defaultdict(list)
In [4]: for filename in filenames:
   ....:     m = re.search(r'_(\d+)\.txt$', filename)
   ....:     if m:
   ....:         d[m.group(1)].append(filename)
In [5]: [sorted(filename_list) for filename_list in d.values()]
Out[5]:
[['mmmmm_25.txt', 'xxxxx_25.txt', 'yyyyy_25.txt'],
 ['mmmmm_26.txt', 'xxxxx_26.txt', 'yyyyy_26.txt'],
 ['mmmmm_27.txt', 'xxxxx_27.txt', 'yyyyy_27.txt'],
 ['xxxxx_28.txt'],
 ['yyyyy_29.txt'],
 ['mmmmm_30.txt']]
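The same idea can be wrapped in a small function. This is only a sketch: the name group_by_number and the module-level SUFFIX pattern are hypothetical, and converting the key to int is an added assumption so that the groups come back in numeric order regardless of dict ordering.

```python
import re
from collections import defaultdict

SUFFIX = re.compile(r'_(\d+)\.txt$')

def group_by_number(filenames):
    """Return lists of file names, one list per number, in ascending numeric order."""
    groups = defaultdict(list)
    for filename in filenames:
        m = SUFFIX.search(filename)
        if m:  # names without a numeric suffix are skipped
            groups[int(m.group(1))].append(filename)
    return [sorted(groups[number]) for number in sorted(groups)]

filenames = ['xxxxx_25.txt', 'yyyyy_25.txt', 'mmmmm_25.txt', 'xxxxx_28.txt', 'notes.md']
print(group_by_number(filenames))
```

For the directory described in the question, the names could come from os.listdir on that directory instead of a hard-coded list.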