
Split list into chunks by condition

I have a list like:

["asdf-1-bhd","uuu-2-ggg","asdf-2-bhd","uuu-1-ggg","asdf-3-bhd"]

that I want to split into two groups whose elements are equal after I remove the number:

"asdf-1-bhd", "asdf-2-bhd", "asdf-3-bhd"
"uuu-2-ggg", "uuu-1-ggg"

I have been using itertools.groupby with

for key, group in itertools.groupby(elements, key=removeIndexNumber):

but this does not work when the elements to be grouped are not consecutive.
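To make the problem concrete, here is a small sketch of what `groupby` does with the unsorted list. It assumes a hypothetical `remove_index_number` that simply deletes the digits, standing in for the asker's `removeIndexNumber`, whose definition isn't shown:

```python
import itertools
import re


def remove_index_number(s):
    # Hypothetical stand-in for the asker's removeIndexNumber helper:
    # delete every run of digits, e.g. "asdf-1-bhd" -> "asdf--bhd".
    return re.sub(r"\d+", "", s)


elements = ["asdf-1-bhd", "uuu-2-ggg", "asdf-2-bhd", "uuu-1-ggg", "asdf-3-bhd"]

# groupby only merges *consecutive* items with equal keys, so every change
# of key starts a new group, even if that key was already seen earlier.
runs = [(key, list(group))
        for key, group in itertools.groupby(elements, key=remove_index_number)]

for key, group in runs:
    print(key, group)
```

Because the keys alternate, the five elements come back as five one-element groups instead of two.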

I have thought about using list comprehensions, but this seems impossible since the number of groups is not fixed.

tl;dr:

I want to group items, but there are two problems:

  1. I don't know how many chunks I will get.
  2. The elements that belong in the same chunk might not be consecutive.

Why not think about it a bit differently? You can map everything into a dict:

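import re
from collections import defaultdict

# Capture the text before and after the number, e.g. "asdf-" and "-bhd"
regex = re.compile(r'([a-z]+-)\d+(-[a-z]+)')

t = ["asdf-1-bhd","uuu-2-ggg","asdf-2-bhd","uuu-1-ggg","asdf-3-bhd"]

# Map each number-free key (e.g. "asdf--bhd") to the original strings
maps = defaultdict(list)

for x in t:
    parts = regex.match(x).groups()
    maps[parts[0]+parts[1]].append(x)

print(list(maps.values()))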

Output:

[['asdf-1-bhd', 'asdf-2-bhd', 'asdf-3-bhd'], ['uuu-2-ggg', 'uuu-1-ggg']]

This is fast because no element is ever compared with another: each string is processed once (one regex match and one dict insertion), so the whole pass is linear in the length of the list.

Edit:

On thinking differently

Your original approach was to iterate through the items and compare them to one another. This is overcomplicated and unnecessary.

Let's consider what my code does. First, it computes a stripped-down version of each string:

"asdf-1-bhd" -> "asdf--bhd"
"uuu-2-ggg" -> "uuu--ggg"
"asdf-2-bhd" -> "asdf--bhd"
"uuu-1-ggg" -> "uuu--ggg"
"asdf-3-bhd" -> "asdf--bhd"

You can already start to see the groups, and we haven't compared anything yet!
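The stripping step can also be sketched with `re.sub` instead of capture groups. This assumes, as in the example data, that the index number is the only run of digits in each string:

```python
import re

t = ["asdf-1-bhd", "uuu-2-ggg", "asdf-2-bhd", "uuu-1-ggg", "asdf-3-bhd"]

# Assumption: the index number is the only run of digits in each string,
# so deleting every run of digits leaves exactly the part the items share.
stripped = {x: re.sub(r"\d+", "", x) for x in t}

for original, key in stripped.items():
    print(f'"{original}" -> "{key}"')
```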

We now do a sort of reverse mapping: each stripped-down value on the right becomes a key, and each original string on the left is appended to the list stored under that key:

'asdf--bhd' -> ['asdf-1-bhd', 'asdf-2-bhd', 'asdf-3-bhd']
'uuu--ggg' -> ['uuu-2-ggg', 'uuu-1-ggg']

And there we have our groups, defined by their common computed value (the key). This works for any number of elements and groups.
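The same idea works with any key function, not just the regex. A minimal sketch as a reusable helper; the name `group_by_key` is hypothetical, and the digit-stripping key assumes the number is the only run of digits in each string:

```python
import re
from collections import defaultdict


def group_by_key(items, key):
    """Group items by key(item), regardless of their order in the input.

    Returns a dict mapping each computed key to the list of items that
    produced it; items keep their original relative order within a group.
    """
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)


t = ["asdf-1-bhd", "uuu-2-ggg", "asdf-2-bhd", "uuu-1-ggg", "asdf-3-bhd"]
# Assumed key: strip all digits, e.g. "uuu-2-ggg" -> "uuu--ggg"
groups = group_by_key(t, key=lambda s: re.sub(r"\d+", "", s))
print(groups)
print(list(groups.values()))
```

Because `dict` preserves insertion order (Python 3.7+), the groups come out in the order their keys were first seen.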

OK, here is a simple solution (it must be too late over here):

Use itertools.groupby, but sort the list first.

For the example given above:

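import itertools
import re
def removeIndexNumber(x):  # stand-in for the asker's helper (assumed): drop the digits
    return re.sub(r'\d+', '', x)

elements = ["asdf-1-bhd","uuu-2-ggg","asdf-2-bhd","uuu-1-ggg","asdf-3-bhd"]
elements.sort(key=removeIndexNumber)  # same key as groupby: equal keys become adjacent
for key, group in itertools.groupby(elements, key=removeIndexNumber):
    for element in group:
        print(key, element)  # do stuff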

As you can see, the key used for sorting is the same as the one used for grouping. That way, the elements that have to end up in the same group are first placed next to each other. After this has been done, itertools.groupby works properly, because it only merges runs of consecutive elements with equal keys.
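If you need the groups themselves rather than a loop body, they can be collected into a dict. A sketch, again using a hypothetical digit-stripping `remove_index_number` in place of `removeIndexNumber`, and using `sorted()` so the input list is left unchanged:

```python
import itertools
import re


def remove_index_number(s):
    # Hypothetical stand-in for removeIndexNumber: drop all runs of digits.
    return re.sub(r"\d+", "", s)


elements = ["asdf-1-bhd", "uuu-2-ggg", "asdf-2-bhd", "uuu-1-ggg", "asdf-3-bhd"]

# sorted() returns a new list, leaving `elements` untouched. Python's sort is
# stable, so items with equal keys keep their original relative order.
ordered = sorted(elements, key=remove_index_number)

# Equal keys are now adjacent, so each key produces exactly one group.
groups = {key: list(group)
          for key, group in itertools.groupby(ordered, key=remove_index_number)}
print(groups)
```

Compared with the dict-of-lists approach, this costs an O(n log n) sort instead of a single O(n) pass, and the groups come out in key order rather than first-seen order.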
