Python: cut a list by a certain ID
I have a list of file names that contain a date and an ID, for example:
olist = ['20191101_01.csv','20191101_02.csv','20191101_03.csv','20191101_04.csv','20191102_01.csv','20191102_02.csv','20191102_03.csv','20191102_04.csv','20191103_01.csv','20191103_02.csv','20191103_03.csv','20191103_04.csv']
I want to split them by ID, for example:
nlist = [['20191101_01.csv','20191102_01.csv','20191103_01.csv','20191104_01.csv'],['20191101_02.csv','20191102_02.csv','20191103_02.csv','20191104_02.csv']......]
Is there a simple and clean way to do this?
I suggest using a dictionary. That lets you do it in O(n) time:
olist = ['20191101_01.csv','20191101_02.csv','20191101_03.csv','20191101_04.csv','20191102_01.csv','20191102_02.csv','20191102_03.csv','20191102_04.csv','20191103_01.csv','20191103_02.csv','20191103_03.csv','20191103_04.csv']
parsed_dict = {}
for el in olist:
    # The key is everything after the underscore, e.g. '01.csv'.
    key = el.split('_')[1]
    if key not in parsed_dict:
        parsed_dict[key] = [el]
    else:
        parsed_dict[key].append(el)
print(parsed_dict)
Edit, updated following wwii's comment:
from collections import defaultdict
olist = ['20191101_01.csv','20191101_02.csv','20191101_03.csv','20191101_04.csv','20191102_01.csv','20191102_02.csv','20191102_03.csv','20191102_04.csv','20191103_01.csv','20191103_02.csv','20191103_03.csv','20191103_04.csv']
parsed_dict = defaultdict(list)
for el in olist:
    # defaultdict(list) creates the empty list on first access to a key.
    key = el.split('_')[1]
    parsed_dict[key].append(el)
print(parsed_dict)
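The dictionary above is keyed by strings such as '01.csv', while the question asks for a list of lists. A minimal sketch of that last step, assuming the groups should come out ordered by ID; the helper name `group_by_id` is hypothetical and not part of the original answer:

```python
from collections import defaultdict

def group_by_id(filenames):
    """Group names like '20191101_01.csv' by the ID after the underscore."""
    groups = defaultdict(list)
    for name in filenames:
        # '20191101_01.csv' -> '01.csv' -> '01'
        key = name.split('_')[1].split('.')[0]
        groups[key].append(name)
    # Sort the IDs (as strings) so the result does not depend on input order.
    return [groups[key] for key in sorted(groups)]

olist = ['20191101_01.csv', '20191101_02.csv',
         '20191102_01.csv', '20191102_02.csv']
nlist = group_by_id(olist)
print(nlist)
# [['20191101_01.csv', '20191102_01.csv'], ['20191101_02.csv', '20191102_02.csv']]
```

Within each group the files keep the order they had in the input list, because each name is appended as it is encountered.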
I would use collections.defaultdict and a list comprehension, i.e.:
from collections import defaultdict
olist = ['20191101_01.csv','20191101_02.csv','20191101_03.csv','20191101_04.csv','20191102_01.csv','20191102_02.csv','20191102_03.csv','20191102_04.csv','20191103_01.csv','20191103_02.csv','20191103_03.csv','20191103_04.csv']
d = defaultdict(list)
[d[x.split("_")[1].split(".")[0]].append(x) for x in olist]
print(dict(d))
{'01': ['20191101_01.csv', '20191102_01.csv', '20191103_01.csv'], '02': ['20191101_02.csv', '20191102_02.csv', '20191103_02.csv'], '03': ['20191101_03.csv', '20191102_03.csv', '20191103_03.csv'], '04': ['20191101_04.csv', '20191102_04.csv', '20191103_04.csv']}
You can also use pandas for this:
import pandas as pd
df = pd.DataFrame({'files':olist})
df['grouper'] = df['files'].str.split('_',expand=True)[1]
nlist = df.groupby('grouper')['files'].agg(list).tolist()
output:
[['20191101_01.csv', '20191102_01.csv', '20191103_01.csv'], ['20191101_02.csv', '20191102_02.csv', '20191103_02.csv'], ['20191101_03.csv', '20191102_03.csv', '20191103_03.csv'], ['20191101_04.csv', '20191102_04.csv', '20191103_04.csv']]
You can sort the list by the two-character ID and then group it with itertools.groupby.
from itertools import groupby
olist = ['20191101_01.csv','20191101_02.csv','20191101_03.csv','20191101_04.csv',
'20191102_01.csv','20191102_02.csv','20191102_03.csv','20191102_04.csv',
'20191103_01.csv','20191103_02.csv','20191103_03.csv','20191103_04.csv']
def file_id(filename):
    # The two characters just before '.csv' are the ID.
    return filename[-6:-4]
slist = sorted(olist, key=file_id)
result = [list(value) for key, value in groupby(slist, key=file_id)]
print(result)
output:
[['20191101_01.csv', '20191102_01.csv', '20191103_01.csv'],
['20191101_02.csv', '20191102_02.csv', '20191103_02.csv'],
['20191101_03.csv', '20191102_03.csv', '20191103_03.csv'],
['20191101_04.csv', '20191102_04.csv', '20191103_04.csv']]
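The slice `filename[-6:-4]` relies on a two-character ID followed by a four-character extension. A sketch of a key function that drops that assumption, provided the ID is still whatever follows the last underscore; the name `file_id_flexible` and the sample file names are hypothetical and not from the original answer:

```python
from itertools import groupby
from os.path import splitext

def file_id_flexible(filename):
    """Return the text between the last underscore and the extension."""
    stem, _ext = splitext(filename)   # '20191101_10.txt' -> '20191101_10'
    return stem.rpartition('_')[2]    # '20191101_10' -> '10'

files = ['20191101_1.csv', '20191101_10.txt',
         '20191102_1.csv', '20191102_10.txt']

# groupby only merges adjacent items, so the list must be sorted by the same key.
ordered = sorted(files, key=file_id_flexible)
grouped = [list(group) for _, group in groupby(ordered, key=file_id_flexible)]
print(grouped)
# [['20191101_1.csv', '20191102_1.csv'], ['20191101_10.txt', '20191102_10.txt']]
```

Note that the IDs are compared as strings here, so '10' sorts before '2'; convert the key to `int` if numeric order matters.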