How to group similar items in a list?

I am looking to group similar items in a list based on the first three characters in the string. For example:

test = ['abc_1_2', 'abc_2_2', 'hij_1_1', 'xyz_1_2', 'xyz_2_2']

How can I group the items of the above list based on their first group of letters (e.g. 'abc')? The following is the intended output:

output = {1: ('abc_1_2', 'abc_2_2'), 2: ('hij_1_1',), 3: ('xyz_1_2', 'xyz_2_2')}

or

output = [['abc_1_2', 'abc_2_2'], ['hij_1_1'], ['xyz_1_2', 'xyz_2_2']]

I have tried using itertools.groupby to accomplish this without success:

>>> import os, itertools
>>> test = ['abc_1_2', 'abc_2_2', 'hij_1_1', 'xyz_1_2', 'xyz_2_2']
>>> [list(g) for k.split("_")[0], g in itertools.groupby(test)]
[['abc_1_2'], ['abc_2_2'], ['hij_1_1'], ['xyz_1_2'], ['xyz_2_2']]

I have looked at the following posts without success:

How to merge similar items in a list. The example groups similar items (e.g. 'house' and 'Hose') using an approach that is overly complicated for my example.

How can I group equivalent items together in a Python list? This is where I found the idea for the list comprehension.

The .split("_")[0] part should be inside a single-argument function (the key function) that you pass as the second argument to itertools.groupby:

>>> import itertools
>>> test = ['abc_1_2', 'abc_2_2', 'hij_1_1', 'xyz_1_2', 'xyz_2_2']
>>> [list(g) for _, g in itertools.groupby(test, lambda x: x.split('_')[0])]
[['abc_1_2', 'abc_2_2'], ['hij_1_1'], ['xyz_1_2', 'xyz_2_2']]
>>>

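Having it in the for ... part does nothing useful: there, k.split("_")[0] is an assignment target, so each key is written into a throwaway list that is immediately discarded. Without a key function, groupby compares the whole strings, which are all different, so every item ends up in its own group.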
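One caveat worth adding: itertools.groupby only merges consecutive items that share a key. The sample list is already ordered by prefix, which is why the version above works; for input in arbitrary order, sort by the same key first. A minimal sketch, where prefix() is a hypothetical helper and the list is deliberately shuffled:

```python
import itertools

def prefix(s):
    # Hypothetical key helper: everything before the first underscore
    return s.partition('_')[0]

test = ['xyz_1_2', 'abc_1_2', 'hij_1_1', 'abc_2_2', 'xyz_2_2']  # deliberately unsorted

# Without sorting, equal prefixes that are not adjacent land in separate groups
unsorted_groups = [list(g) for _, g in itertools.groupby(test, key=prefix)]
# [['xyz_1_2'], ['abc_1_2'], ['hij_1_1'], ['abc_2_2'], ['xyz_2_2']]

# Sorting by the same key first makes equal prefixes adjacent
sorted_groups = [list(g) for _, g in itertools.groupby(sorted(test, key=prefix), key=prefix)]
# [['abc_1_2', 'abc_2_2'], ['hij_1_1'], ['xyz_1_2', 'xyz_2_2']]
```

Sorting adds O(n log n) work. If the prefix is always exactly three characters, key=lambda s: s[:3] follows the question's wording more literally.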


Also, it would be slightly more efficient to use str.partition when you only want a single split:

[list(g) for _, g in itertools.groupby(test, lambda x: x.partition('_')[0])]
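The comprehension above yields the list-of-lists form. The question's other requested shape, a dict numbered from 1 with tuple values, can be built from the same groupby call with enumerate(..., start=1). A sketch, again assuming the input is already ordered by prefix; by_prefix is an illustrative variant keyed by the prefix itself:

```python
import itertools

test = ['abc_1_2', 'abc_2_2', 'hij_1_1', 'xyz_1_2', 'xyz_2_2']

# Number the groups from 1; each group iterator is consumed by tuple()
# before groupby advances to the next key
output = {
    i: tuple(g)
    for i, (_, g) in enumerate(
        itertools.groupby(test, lambda x: x.partition('_')[0]), start=1
    )
}
# {1: ('abc_1_2', 'abc_2_2'), 2: ('hij_1_1',), 3: ('xyz_1_2', 'xyz_2_2')}

# Variant: use the prefix itself as the dict key
by_prefix = {k: tuple(g) for k, g in itertools.groupby(test, lambda x: x.partition('_')[0])}
# {'abc': ('abc_1_2', 'abc_2_2'), 'hij': ('hij_1_1',), 'xyz': ('xyz_1_2', 'xyz_2_2')}
```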

Demo:

>>> from timeit import timeit
>>> timeit("'hij_1_1'.split('_')")
1.3149855638076913
>>> timeit("'hij_1_1'.partition('_')")
0.7576401470019234
>>>

This isn't a major concern, since both methods are fast on short strings (timeit runs each statement 1,000,000 times by default, so the figures above are total seconds for a million calls), but I figured I'd mention it.
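As a swapped-in alternative that sidesteps groupby's need for ordered input, a single pass with collections.defaultdict groups the items in O(n) and keeps the groups in first-seen order (dicts preserve insertion order from Python 3.7). group_by_prefix is a hypothetical helper name:

```python
from collections import defaultdict

def group_by_prefix(items):
    # One pass: append each item to the list for its prefix (text before the first '_')
    groups = defaultdict(list)
    for item in items:
        groups[item.partition('_')[0]].append(item)
    # Groups come out in the order their prefix was first seen
    return list(groups.values())

print(group_by_prefix(['xyz_1_2', 'abc_1_2', 'hij_1_1', 'abc_2_2', 'xyz_2_2']))
# [['xyz_1_2', 'xyz_2_2'], ['abc_1_2', 'abc_2_2'], ['hij_1_1']]
```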

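Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.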

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM