
Collect elements from a list of lists based on the first elements of each group

I have a list:

mainlist = [['a','online',20],
            ['a','online',22],
            ['a','offline',26],
            ['a','online',28],
            ['a','offline',31],
            ['a','online',32],
            ['a','online',33],
            ['a','offline',34]]

I want to get the min of the 3rd element if the 2nd element is 'online', and the next 'offline' value as the 4th element. The iteration should continue until the end of the list.

The final output should be:

outputlist = [['a', 'online', 20, 26], ['a', 'online', 28, 31], ['a', 'online', 32, 34]]

I tried the code below, but it didn't help:

from itertools import product

for a, b in product(mainlist,mainlist):
    if a[1] == 'online':
        minvalue=min(a, key=lambda x:x[2])
    if b[1] == 'offline' and b[2] >=minvalue[2]:
        maxvalue=min(b, key=lambda x:x[2])

It seems like you're looking for consecutive streaks of 'online'.

Just iterate over the list from start to finish, remember where the 'online' streak started, and at the next 'offline', add this streak to the result:

mainlist = [['a', 'online', 20], ['a', 'online', 22], ['a', 'offline', 26], ['a', 'online', 28], ['a', 'offline', 31], ['a', 'online', 32], ['a', 'online', 33], ['a', 'offline', 34]]

output = []
first_online = -1
for item, status, num in mainlist:
    if status == 'online':
        if first_online == -1:
            first_online = num
    elif status == 'offline':
        output.append([item, 'online', first_online, num])
        first_online = -1

print(output)
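A minimal variant of the same streak idea, assuming you would rather skip an 'offline' row that has no preceding 'online' row (the loop above would emit -1 as the start in that case). It uses None as the "no open streak" marker; the function name collect_streaks is hypothetical.

```python
def collect_streaks(rows):
    """Pair the first 'online' value of each streak with the next 'offline' value."""
    output = []
    first_online = None  # None means "no open 'online' streak"
    for item, status, num in rows:
        if status == 'online':
            if first_online is None:
                first_online = num  # remember where the streak started
        elif status == 'offline' and first_online is not None:
            output.append([item, 'online', first_online, num])
            first_online = None  # close the streak


    return output


mainlist = [['a', 'online', 20], ['a', 'online', 22], ['a', 'offline', 26],
            ['a', 'online', 28], ['a', 'offline', 31], ['a', 'online', 32],
            ['a', 'online', 33], ['a', 'offline', 34]]

print(collect_streaks(mainlist))
# [['a', 'online', 20, 26], ['a', 'online', 28, 31], ['a', 'online', 32, 34]]

# A leading 'offline' row is ignored instead of producing a -1 start
print(collect_streaks([['a', 'offline', 5]] + mainlist))
# same result as above
```

Like the original, this takes the first value of each streak, which equals the minimum only when the data is sorted.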

This is one approach using iter.

Ex:

mainlist=iter([['a','online',20],['a','online',22],['a','offline',26],['a','online',28],['a','offline',31],['a','online',32],['a','online',33],['a','offline',34]])

result = []
for i in mainlist:
    if i[1] == 'online':
        result.append(i)
        # advance the shared iterator past the rest of this 'online' streak
        while True:
            i = next(mainlist)
            if i[1] == "offline":
                result[-1].append(i[-1])  # the offline value becomes the 4th element
                break

print(result)

Output:

[['a', 'online', 20, 26], ['a', 'online', 28, 31], ['a', 'online', 32, 34]]
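The loop above raises StopIteration if the list ends with an 'online' row that is never followed by an 'offline' one, and it appends to the original inner lists. A sketch that avoids both, assuming a trailing unmatched streak should simply be dropped, passes a default to next() and builds a new list instead of extending the input row (pair_with_next_offline is a hypothetical name):

```python
def pair_with_next_offline(rows):
    it = iter(rows)
    result = []
    for row in it:
        if row[1] != 'online':
            continue
        # Skip the remaining 'online' rows of this streak; next(it, None)
        # returns None instead of raising StopIteration when rows run out.
        nxt = next(it, None)
        while nxt is not None and nxt[1] != 'offline':
            nxt = next(it, None)
        if nxt is None:
            break  # trailing 'online' streak with no 'offline': dropped
        result.append(row + [nxt[-1]])  # new list; input rows stay unchanged
    return result


mainlist = [['a', 'online', 20], ['a', 'online', 22], ['a', 'offline', 26],
            ['a', 'online', 28], ['a', 'offline', 31], ['a', 'online', 32],
            ['a', 'online', 33], ['a', 'offline', 34]]

print(pair_with_next_offline(mainlist))
# [['a', 'online', 20, 26], ['a', 'online', 28, 31], ['a', 'online', 32, 34]]

# The trailing 'online' row has no 'offline' partner, so the result is unchanged
print(pair_with_next_offline(mainlist + [['a', 'online', 40]]))
```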

We can use itertools.groupby to group consecutive lists that have the same 2nd element, 'online' or 'offline', with the help of operator.itemgetter, and then just collect the necessary output lists:

from itertools import groupby
from operator import itemgetter

mainlist = [['a', 'online', 20],
            ['a', 'online', 22],
            ['a', 'offline', 26],
            ['a', 'online', 28],
            ['a', 'offline', 31],
            ['a', 'online', 32],
            ['a', 'online', 33],
            ['a', 'offline', 34]]
result = []
for key, group in groupby(mainlist, key=itemgetter(1)):
    if key == 'online':
        output = min(group, key=itemgetter(2)).copy()
        # or `output = next(group).copy()` if data is always sorted
    else:
        next_offline = next(group)
        output.append(next_offline[2])
        result.append(output)
print(result)
# [['a', 'online', 20, 26], ['a', 'online', 28, 31], ['a', 'online', 32, 34]]

I find this version better than the others presented here, as the code is not deeply nested and doesn't use "flag" variables.
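To see why the groupby loop works, it can help to look at what groupby actually produces for this input: each key comes with the run of consecutive rows sharing that status (groupby only merges adjacent items, it does not collect all 'online' rows together). Only the third elements are shown here for brevity.

```python
from itertools import groupby
from operator import itemgetter

mainlist = [['a', 'online', 20], ['a', 'online', 22], ['a', 'offline', 26],
            ['a', 'online', 28], ['a', 'offline', 31], ['a', 'online', 32],
            ['a', 'online', 33], ['a', 'offline', 34]]

# Each (key, group) pair is one run of consecutive rows with the same status
runs = [(key, [row[2] for row in group])
        for key, group in groupby(mainlist, key=itemgetter(1))]
print(runs)
# [('online', [20, 22]), ('offline', [26]), ('online', [28]),
#  ('offline', [31]), ('online', [32, 33]), ('offline', [34])]
```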


Further improvements:

As Guido van Rossum said: "Tuples are for heterogeneous data, list are for homogeneous data." But right now your lists hold heterogeneous data. I suggest using a namedtuple, which makes it easier to distinguish between the fields. I'm going to use the typed version from the typing module, but you are free to use the one from collections. For example, it could look like this:

from typing import NamedTuple


class Record(NamedTuple):
    process: str
    status: str
    time: int


class FullRecord(NamedTuple):
    process: str
    status: str
    start: int
    end: int
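For comparison, a sketch of the same two types built with collections.namedtuple instead; for this use the instances behave the same way, just without the type annotations.

```python
from collections import namedtuple

# Untyped equivalents of the typing.NamedTuple classes above
Record = namedtuple('Record', ['process', 'status', 'time'])
FullRecord = namedtuple('FullRecord', ['process', 'status', 'start', 'end'])

r = Record('a', 'online', 20)
print(r.status, r.time)          # online 20
print(r == ('a', 'online', 20))  # True: still a tuple underneath
```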

We can easily get the list of Records from your list of lists by using itertools.starmap:

from itertools import starmap

records = list(starmap(Record, mainlist))
# [Record(process='a', status='online', time=20),
#  Record(process='a', status='online', time=22),
#  Record(process='a', status='offline', time=26),
#  Record(process='a', status='online', time=28),
#  Record(process='a', status='offline', time=31),
#  Record(process='a', status='online', time=32),
#  Record(process='a', status='online', time=33),
#  Record(process='a', status='offline', time=34)]

Then let's wrap the first code example in a generator function, and replace some parts of it to reflect the changes in the input data:

from itertools import groupby
from operator import attrgetter


def collect_times(values):
    # attrgetter('status') is the key function that reads each record's status field
    for key, group in groupby(values, key=attrgetter('status')):
        if key == 'online':
            min_online_record = next(group)
        else:
            next_offline_record = next(group)
            yield FullRecord(process=min_online_record.process,
                             status='online',
                             start=min_online_record.time,
                             end=next_offline_record.time)


result = list(collect_times(records))
# [FullRecord(process='a', status='online', start=20, end=26),
#  FullRecord(process='a', status='online', start=28, end=31),
#  FullRecord(process='a', status='online', start=32, end=34)]

That's it: now the code is more self-explanatory than before. We can see which field goes where, and fields are referenced by name, not by index.

Note that since your data is sorted, I write min_online_record = next(group), but if that is not always the case, you should write min_online_record = min(group, key=attrgetter('time')) instead.
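As a quick check of that note, here is a sketch with a hypothetical unsorted 'online' run (33 before 32): taking the first record of the run gives 33, while min(..., key=attrgetter('time')) from the operator module gives 32. The run is materialised with list() only so both strategies can be compared on the same data.

```python
from itertools import groupby
from operator import attrgetter
from typing import NamedTuple


class Record(NamedTuple):
    process: str
    status: str
    time: int


# Hypothetical unsorted input: the second 'online' run has 33 before 32
records = [Record('a', 'online', 28), Record('a', 'offline', 31),
           Record('a', 'online', 33), Record('a', 'online', 32),
           Record('a', 'offline', 34)]

starts_first, starts_min = [], []
for key, group in groupby(records, key=attrgetter('status')):
    if key == 'online':
        run = list(group)  # materialise so both strategies see the same run
        starts_first.append(run[0].time)
        starts_min.append(min(run, key=attrgetter('time')).time)

print(starts_first)  # [28, 33]
print(starts_min)    # [28, 32]
```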

Also, if you are interested, note that the fields are duplicated between Record and FullRecord. You could avoid that by inheriting from a parent class with the two fields process and status, but inheriting from a namedtuple is not really pretty. So, if you do that, use a dataclass instead.
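A minimal sketch of that dataclass suggestion, assuming frozen dataclasses are acceptable in place of the namedtuples (the parent class name BaseRecord is hypothetical). The shared fields have no defaults, because a subclass cannot add non-default fields after an inherited field that has a default.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass(frozen=True)
class BaseRecord:
    # Fields shared by both record types are declared once here
    process: str
    status: str


@dataclass(frozen=True)
class Record(BaseRecord):
    time: int


@dataclass(frozen=True)
class FullRecord(BaseRecord):
    start: int
    end: int


def collect_times(values):
    # Same groupby logic as above, assuming each 'online' run is sorted by time
    for key, group in groupby(values, key=attrgetter('status')):
        if key == 'online':
            min_online_record = next(group)
        else:
            next_offline_record = next(group)
            yield FullRecord(process=min_online_record.process,
                             status='online',
                             start=min_online_record.time,
                             end=next_offline_record.time)


mainlist = [['a', 'online', 20], ['a', 'online', 22], ['a', 'offline', 26],
            ['a', 'online', 28], ['a', 'offline', 31], ['a', 'online', 32],
            ['a', 'online', 33], ['a', 'offline', 34]]

records = [Record(*row) for row in mainlist]
result = list(collect_times(records))
print(result)
# [FullRecord(process='a', status='online', start=20, end=26),
#  FullRecord(process='a', status='online', start=28, end=31),
#  FullRecord(process='a', status='online', start=32, end=34)]
```

Inherited fields come first in the generated __init__, so FullRecord takes (process, status, start, end) positionally.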

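Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.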

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM