
Python Combine Repeating Elements

I have a list of strings that has some repeating elements that I want to combine into a shorter list.

The original list contents look something like this:

lst = [['0.1', '0', 'RC', '100'],
        ['0.2', '10', 'RC', '100'],
        ['0.3', '5', 'HC', '20'],
        ['0.4', '5', 'HC', '20'],
        ['0.5', '5', 'HC', '20'],
        ['0.6', '5', 'HC', '20'],
        ['0.7', '5', 'HC', '20'],
        ['0.8', '5', 'HC', '20'],
        ['0.9', '10', 'RC', '100'],
        ['1.0', '0', 'RC', '100']]

After running it through the function it would become:

lst = [['0.1', '0', 'RC', '100'],
        ['0.2', '10', 'RC', '100'],
        ['0.3', '5', 'HC', '20'],
        ['0.9', '10', 'RC', '100'],
        ['1.0', '0', 'RC', '100']]

The list will always have this general structure, so essentially I want to combine the list based on whether or not the last 3 columns are exactly the same.

I want it to be a callable function, so it would look something like:

def combine_list(lst):
    # compare each sublist's last three columns with the next one's
    if sublist[1:] == next_sublist[1:]:
        lst.remove(next_sublist)

My initial research showed many methods to remove a sublist based on its index, but the index is not necessarily known beforehand. I also found the re module, but I have never used it and am unsure how to apply it here. Thank you in advance.

If you want to remove sub-lists whose last three elements are the same and that are consecutive, you can use itertools.groupby keyed on the last three elements:

from itertools import groupby
[next(g) for _, g in groupby(lst, key=lambda x: x[1:])]

#[['0.1', '0', 'RC', '100'],
# ['0.2', '10', 'RC', '100'],
# ['0.3', '5', 'HC', '20'],
# ['0.9', '10', 'RC', '100'],
# ['1.0', '0', 'RC', '100']]
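The one-liner above can be wrapped into the callable function the question asks for; a minimal sketch (the function name comes from the question, and the sample data is abridged):

```python
from itertools import groupby

def combine_list(lst):
    # group consecutive sub-lists by their last three columns and
    # keep only the first sub-list of each group
    return [next(g) for _, g in groupby(lst, key=lambda x: x[1:])]

lst = [['0.1', '0', 'RC', '100'],
       ['0.2', '10', 'RC', '100'],
       ['0.3', '5', 'HC', '20'],
       ['0.4', '5', 'HC', '20'],
       ['0.9', '10', 'RC', '100'],
       ['1.0', '0', 'RC', '100']]
print(combine_list(lst))
# [['0.1', '0', 'RC', '100'], ['0.2', '10', 'RC', '100'],
#  ['0.3', '5', 'HC', '20'], ['0.9', '10', 'RC', '100'],
#  ['1.0', '0', 'RC', '100']]
```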

Maybe just use a set to keep track of duplicates? (Note that a set de-duplicates globally, not just consecutive runs: for the sample input this would also drop '0.9' and '1.0', whose last three columns already appeared earlier in the list.)

def combine_list(lst):
    out = []
    seen = set()
    for item in lst:
        key = tuple(item[1:])  # last three columns, as a hashable key
        if key not in seen:
            out.append(item)
            seen.add(key)
    return out

Lists are a mutable data structure, so there is no guarantee that the contents of a list do not change over time. That means a list cannot be used with a hashing function (which the set relies on). A tuple, on the other hand, is immutable, and hence hashable.
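A quick demonstration of the point about hashability (an illustrative snippet, not part of the original answer):

```python
seen = set()
seen.add(('5', 'HC', '20'))       # tuples are hashable: fine
print(('5', 'HC', '20') in seen)  # True

try:
    seen.add(['5', 'HC', '20'])   # lists are mutable, hence unhashable
except TypeError as err:
    print(err)                    # unhashable type: 'list'
```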

for index in range(len(lst) - 1, 0, -1):
    # compare the last three columns with the previous sub-list
    if lst[index][1:] == lst[index - 1][1:]:
        lst.pop(index)

By going through the list backwards, we avoid the problem of indices shifting as elements are removed. This results in an in-place reduction.
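The backward loop can be packaged as a function; a sketch (the wrapper and the name dedupe_in_place are mine, the loop body is the answer's, and the sample data is abridged):

```python
def dedupe_in_place(lst):
    # walk backwards so pops do not shift the indices still to visit
    for index in range(len(lst) - 1, 0, -1):
        if lst[index][1:] == lst[index - 1][1:]:
            lst.pop(index)

data = [['0.3', '5', 'HC', '20'],
        ['0.4', '5', 'HC', '20'],
        ['0.5', '5', 'HC', '20'],
        ['0.9', '10', 'RC', '100']]
dedupe_in_place(data)
print(data)
# [['0.3', '5', 'HC', '20'], ['0.9', '10', 'RC', '100']]
```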

If you'd like to make a new list instead, this can be done via a list comprehension following the same idea, but since we're not working in place, we don't have to go in reverse:

[lst[0]] + [lst[ind] for ind in range(1, len(lst)) if lst[ind][1:] != lst[ind-1][1:]]

Again, lst[0] is trivially non-duplicate and therefore automatically included.
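A short usage sketch of the comprehension (note that the first sub-list is wrapped in its own list so the result stays a list of lists; sample data abridged):

```python
lst = [['0.1', '0', 'RC', '100'],
       ['0.2', '10', 'RC', '100'],
       ['0.3', '5', 'HC', '20'],
       ['0.4', '5', 'HC', '20']]
# keep lst[0], then every sub-list whose last three columns differ
# from the previous sub-list's
result = [lst[0]] + [lst[i] for i in range(1, len(lst))
                     if lst[i][1:] != lst[i - 1][1:]]
print(result)
# [['0.1', '0', 'RC', '100'], ['0.2', '10', 'RC', '100'], ['0.3', '5', 'HC', '20']]
```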

def combine_list(ls):
    cpy = ls[:]  # shallow copy so the original list is untouched

    # compare each sub-list (except the last) with the one after it
    for i, sub in enumerate(ls[:len(ls) - 1]):
        if sub[1:] == ls[i + 1][1:]:
            cpy.remove(ls[i + 1])

    return cpy

This function should work. It creates a copy of the list to avoid modifying the original, then iterates over the original list (except the last value, which always stays).

It checks whether the last values of each sub-list are equal to the last values of the next sub-list. If they are, the next sub-list is removed from the copy.

The function then returns the new list.
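A usage sketch for this answer's combine_list (the definition is repeated so the snippet runs on its own, with abridged sample data):

```python
def combine_list(ls):
    cpy = ls[:]  # shallow copy so the original list is untouched
    for i, sub in enumerate(ls[:len(ls) - 1]):
        if sub[1:] == ls[i + 1][1:]:
            cpy.remove(ls[i + 1])
    return cpy

lst = [['0.3', '5', 'HC', '20'],
       ['0.4', '5', 'HC', '20'],
       ['0.5', '5', 'HC', '20'],
       ['0.9', '10', 'RC', '100']]
print(combine_list(lst))
# [['0.3', '5', 'HC', '20'], ['0.9', '10', 'RC', '100']]
```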


 