How to convert a list of 3-tuples to a list of tuple pairs using groupby

I have the string below:

test = '''AWS-1 - opened at Jan 23 2010 10:30:08AM 
AWS-2 - opened at Jan 23 2010 11:04:56AM 
AWS-2 - closed at Jan 23 2010 1:18:32PM 
AWS-1 - closed at Jan 23 2010 9:43:44PM 
AWS-1 - opened at Feb 1 2010 12:40:28AM
AWS-1 - closed at Jan 23 2010 9:43:44PM
'''

My code:

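import re
from itertools import groupby

y = re.findall(r'\b(\w+-\d+)\s+-\s+(\w+[-.\w]+)\s+at\s+(\w+[\s:.\w]+)\n', test)
print(y)
# The key function ignores its argument z and always returns y[2],
# so groupby sees one constant key and puts every record into a single group.
for key, time in groupby(y, lambda z: y[2]):
    for thing in y:
        # Prints y[1] (the AWS-2 "opened" record) on every pass, not `thing`.
        print((y[1], key))
    print(" ")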

My output:

(('AWS-2', 'opened', 'Jan 23 2010 11:04:56AM '), ('AWS-2', 'closed', 'Jan 23 2010 1:18:32PM '))
(('AWS-2', 'opened', 'Jan 23 2010 11:04:56AM '), ('AWS-2', 'closed', 'Jan 23 2010 1:18:32PM '))
(('AWS-2', 'opened', 'Jan 23 2010 11:04:56AM '), ('AWS-2', 'closed', 'Jan 23 2010 1:18:32PM '))
(('AWS-2', 'opened', 'Jan 23 2010 11:04:56AM '), ('AWS-2', 'closed', 'Jan 23 2010 1:18:32PM '))
(('AWS-2', 'opened', 'Jan 23 2010 11:04:56AM '), ('AWS-2', 'closed', 'Jan 23 2010 1:18:32PM '))

AWS-1 never appears in the output; AWS-2 shows up everywhere instead. Expected output:

(('AWS-1', 'opened', 'Jan 23 2010 10:30:08AM '), ('AWS-1', 'closed', 'Jan 23 2010 9:43:44PM '))
(('AWS-1', 'opened', 'Feb 1 2010 12:40:28AM'), ('AWS-1', 'closed', 'Feb 23 2010 9:43:44PM'))
(('AWS-2', 'opened', 'Jan 23 2010 11:04:56AM '), ('AWS-2', 'closed', 'Jan 23 2010 1:18:32PM '))
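The reason every line shows AWS-2 can be reproduced in isolation. The sketch below uses a small hand-written sample shaped like the question's re.findall output (hypothetical data, not the OP's full input). With the key `lambda z: y[2]`, groupby sees the same key for every record and yields a single group; a key that reads each record's own ID, applied to a list sorted by that same key, yields one group per ID.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample shaped like the question's re.findall output.
y = [
    ("AWS-1", "opened", "Jan 23 2010 10:30:08AM"),
    ("AWS-2", "opened", "Jan 23 2010 11:04:56AM"),
    ("AWS-2", "closed", "Jan 23 2010 1:18:32PM"),
]

# Buggy key from the question: it ignores z and always returns y[2],
# so groupby sees one constant key and yields a single group of 3 records.
buggy = [(key, len(list(grp))) for key, grp in groupby(y, lambda z: y[2])]
print(buggy)  # [(('AWS-2', 'closed', 'Jan 23 2010 1:18:32PM'), 3)]

# Key that looks at each record's own ID. groupby only merges *consecutive*
# items with equal keys, so sort by the same key first.
by_id = itemgetter(0)
groups = {key: list(grp) for key, grp in groupby(sorted(y, key=by_id), key=by_id)}
print(sorted(groups))  # ['AWS-1', 'AWS-2']
```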

Your request is unclear, but it appears you want to build opened/closed pairs for each ID, ordered by timestamp.

Given

import re

import dateutil.parser


records = """\
AWS-1 - opened at Jan 23 2010 10:30:08AM
AWS-2 - opened at Jan 23 2010 11:03:56AM
AWS-2 - closed at Jan 23 2010 1:18:32PM 
AWS-1 - closed at Feb 27 2010 9:32:50PM
AWS-1 - opened at Feb 1 2010 12:50:28AM
AWS-1 - closed at Jan 23 2010 9:32:50PM
"""

Code

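def splitlines(s: str) -> tuple:
    """Return tuples of parsed lines: id, status, time."""
    res = []

    for line in s.split("\n"):

        if not line:
            continue
        # Split on " - " and on "at". The capturing groups keep the separators
        # (plus None for the group that did not match); filter() drops the Nones.
        # Note: the bare "at" would also match inside words such as "created".
        parsed = tuple(map(str.strip, filter(None, re.split(r"(\s-\s)|(at)", line))))
        id_, _, status, _, time = parsed
        data = id_, status, dateutil.parser.parse(time)
        res.append(data)

    return tuple(res)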


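def pairwise_records(s: str) -> list:
    """Return records paired as (opened, closed), ordered by id and time."""
    # Sort by id, then by timestamp; status only breaks ties.
    key = lambda x: (x[0], x[2], x[1])

    # One generator, i.e. a single iterator, of (id, status, time-as-string).
    sorted_recs = ((id_, status, str(t)) for id_, status, t in sorted(splitlines(s), key=key))

    # Passing the *same* iterator twice makes zip take two consecutive
    # records per output tuple: (rec0, rec1), (rec2, rec3), ...
    return list(zip(sorted_recs, sorted_recs))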

Demo

pairwise_records(records)

Output

[(('AWS-1', 'opened', '2010-01-23 10:30:08'),
  ('AWS-1', 'closed', '2010-01-23 21:32:50')),
 (('AWS-1', 'opened', '2010-02-01 00:50:28'),
  ('AWS-1', 'closed', '2010-02-27 21:32:50')),
 (('AWS-2', 'opened', '2010-01-23 11:03:56'),
  ('AWS-2', 'closed', '2010-01-23 13:18:32'))]

Details

Kudos to the OP for getting a partial answer with an elaborate regex. It turns out you can do this more explicitly, with a clearer regex and without groupby.
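Since the question's title asks about groupby, here is a sketch that does use it, for comparison. It is not part of the original answer: it swaps dateutil for the standard library's datetime.strptime with an assumed time format, parses each line with one named-group regex, and the names LINE_RE, TIME_FMT and pairwise_with_groupby are hypothetical. Like pairwise_records, it assumes each ID's records alternate opened/closed once sorted by time.

```python
import re
from datetime import datetime
from itertools import groupby

# Sample input: the records string from the answer above.
records = """\
AWS-1 - opened at Jan 23 2010 10:30:08AM
AWS-2 - opened at Jan 23 2010 11:03:56AM
AWS-2 - closed at Jan 23 2010 1:18:32PM 
AWS-1 - closed at Feb 27 2010 9:32:50PM
AWS-1 - opened at Feb 1 2010 12:50:28AM
AWS-1 - closed at Jan 23 2010 9:32:50PM
"""

# Assumed line shape: "<id> - <status> at <time>", trailing spaces allowed.
LINE_RE = re.compile(r"^(?P<id>\S+) - (?P<status>\w+) at (?P<time>.+?)\s*$", re.MULTILINE)
TIME_FMT = "%b %d %Y %I:%M:%S%p"  # assumed format, e.g. "Jan 23 2010 10:30:08AM"


def pairwise_with_groupby(text):
    """Return (opened, closed) record pairs, grouped by id and ordered by time."""
    recs = [
        (m["id"], m["status"], datetime.strptime(m["time"], TIME_FMT))
        for m in LINE_RE.finditer(text)
    ]
    # groupby only merges consecutive items, so sort by (id, time) first.
    recs.sort(key=lambda r: (r[0], r[2]))
    pairs = []
    for _, grp in groupby(recs, key=lambda r: r[0]):
        it = iter(grp)
        # Within one id, consecutive records form an opened/closed pair.
        pairs.extend(zip(it, it))
    return pairs


for opened, closed in pairwise_with_groupby(records):
    print(opened[0], opened[2], "->", closed[2])
```

Grouping by ID before pairing keeps one ID's leftover record from being paired with the next ID's first record, which the flat zip in pairwise_records would do if an ID had an odd number of records.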

splitlines

We attempt to parse the input string into tuples. We do this with re.split, which leaves behind extra elements we don't need. These extras are cleaned up with filter and by unpacking into (id_, status, time), where time is parsed as a datetime object. The result is a tuple of parsed lines. Example:
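To see concretely which extras re.split leaves behind, here is a minimal sketch on one sample line, using the same pattern as splitlines (written as a raw string). It stops before the dateutil parsing step so it needs only the standard library.

```python
import re

line = "AWS-1 - opened at Jan 23 2010 10:30:08AM"

# With capturing groups, re.split keeps the separators; whichever group did
# not take part in a given match shows up as None.
raw = re.split(r"(\s-\s)|(at)", line)
print(raw)  # ['AWS-1', ' - ', None, 'opened ', None, 'at', ' Jan 23 2010 10:30:08AM']

# filter(None, ...) drops the None entries (and any empty strings);
# str.strip then trims the surrounding spaces.
cleaned = tuple(map(str.strip, filter(None, raw)))
print(cleaned)  # ('AWS-1', '-', 'opened', 'at', 'Jan 23 2010 10:30:08AM')

# Unpacking discards the two separator slots.
id_, _, status, _, time = cleaned
```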

splitlines("AWS-1 - opened at Jan 23 2010 10:30:08AM")
# (('AWS-1', 'opened', datetime.datetime(2010, 1, 23, 10, 30, 8)),)

pairwise_records

We sort the tuples by id and datetime. Within each id, sorting by time naturally lines the records up in opened/closed order: if something opens at 9 AM, it must close at some later time. Finally, we use a "trick" with iterators to pair the results together. This assumes every opened record has a matching closed record before the next opened record for the same id.
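The iterator "trick" can be shown on its own. The sketch below uses placeholder strings (hypothetical stand-ins for the sorted record tuples) and an explicit iter() call; pairwise_records gets the same effect by passing one generator to zip twice.

```python
# Hypothetical stand-ins for records already sorted by (id, time).
recs = ["open-1", "close-1", "open-2", "close-2", "open-3", "close-3"]

it = iter(recs)
# zip takes one item from each argument per step; because both arguments are
# the *same* iterator, each output tuple consumes two consecutive items.
pairs = list(zip(it, it))
print(pairs)  # [('open-1', 'close-1'), ('open-2', 'close-2'), ('open-3', 'close-3')]

# Passing the list itself twice creates two independent iterators instead.
print(list(zip(recs, recs))[0])  # ('open-1', 'open-1')

# zip stops at the shortest input, so an odd trailing record is dropped.
odd = iter(["open-1", "close-1", "open-2"])
dropped = list(zip(odd, odd))
print(dropped)  # [('open-1', 'close-1')]
```

Because zip stops silently, an unmatched trailing opened record disappears from the result rather than raising an error.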
