
Python: split a list by a certain id

I have a list of filenames that each contain a date and an id, for example:

olist = ['20191101_01.csv','20191101_02.csv','20191101_03.csv','20191101_04.csv','20191102_01.csv','20191102_02.csv','20191102_03.csv','20191102_04.csv','20191103_01.csv','20191103_02.csv','20191103_03.csv','20191103_04.csv']

I want to group them by id, for example:

nlist = [['20191101_01.csv','20191102_01.csv','20191103_01.csv'],['20191101_02.csv','20191102_02.csv','20191103_02.csv']......]

Is there a simple and clean way to do it?

I would suggest using a dict. You can then do it in O(n) time:

olist = ['20191101_01.csv','20191101_02.csv','20191101_03.csv','20191101_04.csv','20191102_01.csv','20191102_02.csv','20191102_03.csv','20191102_04.csv','20191103_01.csv','20191103_02.csv','20191103_03.csv','20191103_04.csv']
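parsed_dict = {}
for el in olist:
    # The part after the underscore is the key, e.g. '01.csv'
    key = el.split('_')[1]
    if key not in parsed_dict:
        # First file seen for this id: start a new group
        parsed_dict[key] = [el]
    else:
        parsed_dict[key].append(el)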

print(parsed_dict)
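The question asks for a list of lists rather than a dict, so the grouped values still need to be pulled out. A minimal sketch, assuming a shortened version of `olist` and that the groups may appear in first-seen order (dicts preserve insertion order in Python 3.7+); the names `groups` and `nlist` are only illustrative:

```python
# Shortened sample input (an assumption, to keep the example small)
olist = ['20191101_01.csv', '20191101_02.csv',
         '20191102_01.csv', '20191102_02.csv']

groups = {}
for el in olist:
    key = el.split('_')[1]                 # e.g. '01.csv'
    # setdefault creates the empty list the first time a key is seen
    groups.setdefault(key, []).append(el)

# Drop the keys and keep only the grouped filenames
nlist = list(groups.values())
print(nlist)
# [['20191101_01.csv', '20191102_01.csv'], ['20191101_02.csv', '20191102_02.csv']]
```

`dict.setdefault` does the same job as the explicit `if`/`else` branch, in one line.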

Edit, updated according to wwii's comment:

from collections import defaultdict

olist = ['20191101_01.csv','20191101_02.csv','20191101_03.csv','20191101_04.csv','20191102_01.csv','20191102_02.csv','20191102_03.csv','20191102_04.csv','20191103_01.csv','20191103_02.csv','20191103_03.csv','20191103_04.csv']
parsed_dict = defaultdict(list)
for el in olist:
    key = el.split('_')[1]
    parsed_dict[key].append(el)

print(parsed_dict)

I'd use collections.defaultdict and a list comprehension, i.e.:

from collections import defaultdict
olist = ['20191101_01.csv','20191101_02.csv','20191101_03.csv','20191101_04.csv','20191102_01.csv','20191102_02.csv','20191102_03.csv','20191102_04.csv','20191103_01.csv','20191103_02.csv','20191103_03.csv','20191103_04.csv']
d = defaultdict(list)
[d[x.split("_")[1].split(".")[0]].append(x) for x in olist]
print(dict(d))

{'01': ['20191101_01.csv', '20191102_01.csv', '20191103_01.csv'], '02': ['20191101_02.csv', '20191102_02.csv', '20191103_02.csv'], '03': ['20191101_03.csv', '20191102_03.csv', '20191103_03.csv'], '04': ['20191101_04.csv', '20191102_04.csv', '20191103_04.csv']}
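If the filenames might carry different extensions, the id can be taken from the filename stem instead of by splitting on the dot. The following is only a sketch under that assumption; `group_by_id` is a hypothetical helper name, and it returns the groups as a list of lists ordered by id, as the question asked:

```python
from collections import defaultdict
from pathlib import PurePath


def group_by_id(filenames):
    """Group filenames of the form '<date>_<id>.<ext>' by their id."""
    groups = defaultdict(list)
    for name in filenames:
        stem = PurePath(name).stem           # '20191101_01.csv' -> '20191101_01'
        file_id = stem.rsplit('_', 1)[-1]    # '20191101_01' -> '01'
        groups[file_id].append(name)
    # One list per id, ordered by the id string
    return [groups[key] for key in sorted(groups)]


print(group_by_id(['20191102_02.csv', '20191101_01.csv', '20191102_01.txt']))
# [['20191101_01.csv', '20191102_01.txt'], ['20191102_02.csv']]
```

Within each group the files keep the order in which they appeared in the input.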


You could also use pandas for this:

import pandas as pd
df = pd.DataFrame({'files':olist})

df['grouper'] = df['files'].str.split('_',expand=True)[1]
nlist = df.groupby('grouper')['files'].agg(list).tolist()

Output:

[['20191101_01.csv', '20191102_01.csv', '20191103_01.csv'], ['20191101_02.csv', '20191102_02.csv', '20191103_02.csv'], ['20191101_03.csv', '20191102_03.csv', '20191103_03.csv'], ['20191101_04.csv', '20191102_04.csv', '20191103_04.csv']]

You could sort the list by the two-character id and then group it using itertools.groupby.

from itertools import groupby

olist = ['20191101_01.csv','20191101_02.csv','20191101_03.csv','20191101_04.csv',
         '20191102_01.csv','20191102_02.csv','20191102_03.csv','20191102_04.csv',
         '20191103_01.csv','20191103_02.csv','20191103_03.csv','20191103_04.csv']

def file_id(filename):
    # The two characters before the '.csv' extension, e.g. '01'
    return filename[-6:-4]

slist = sorted(olist, key=file_id)

result = [list(value) for key, value in groupby(slist, key=file_id)]

print(result)

The output:

[['20191101_01.csv', '20191102_01.csv', '20191103_01.csv'],
 ['20191101_02.csv', '20191102_02.csv', '20191103_02.csv'],
 ['20191101_03.csv', '20191102_03.csv', '20191103_03.csv'],
 ['20191101_04.csv', '20191102_04.csv', '20191103_04.csv']]
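The sort step matters because `itertools.groupby` only merges consecutive items that share a key. A small sketch on a shortened input (the names `sample`, `without_sort` and `with_sort` are illustrative) shows the difference:

```python
from itertools import groupby


def file_id(filename):
    # Two-character id before the '.csv' extension
    return filename[-6:-4]


sample = ['20191101_01.csv', '20191101_02.csv',
          '20191102_01.csv', '20191102_02.csv']

# Without sorting, equal ids are not adjacent, so each file ends up alone
without_sort = [list(group) for _, group in groupby(sample, key=file_id)]

# Sorting by id first puts equal ids next to each other
with_sort = [list(group)
             for _, group in groupby(sorted(sample, key=file_id), key=file_id)]

print(without_sort)
# [['20191101_01.csv'], ['20191101_02.csv'], ['20191102_01.csv'], ['20191102_02.csv']]
print(with_sort)
# [['20191101_01.csv', '20191102_01.csv'], ['20191101_02.csv', '20191102_02.csv']]
```

Note that the sort makes this approach O(n log n), whereas the dict-based grouping is O(n).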
