How to convert a list of 3-tuples into a list of tuple pairs using groupby
I have the string below:
test = '''AWS-1 - opened at Jan 23 2010 10:30:08AM
AWS-2 - opened at Jan 23 2010 11:04:56AM
AWS-2 - closed at Jan 23 2010 1:18:32PM
AWS-1 - closed at Jan 23 2010 9:43:44PM
AWS-1 - opened at Feb 1 2010 12:40:28AM
AWS-1 - closed at Jan 23 2010 9:43:44PM
'''
My code:
import re
from itertools import groupby
y = re.findall(r'\b(\w+-\d+)\s+-\s+(\w+[-.\w]+)\s+at\s+(\w+[\s:.\w]+)\n', test)
print (y)
for key, time in groupby(y,lambda z: y[2]):
    for thing in y:
        print( (y[1], key))
    print (" ")
My output:
(('AWS-2', 'opened', 'Jan 23 2010 11:04:56AM '), ('AWS-2', 'closed', 'Jan 23 2010 1:18:32PM ')) (('AWS-2', 'opened', 'Jan 23 2010 11:04:56AM '), ('AWS-2', 'closed', 'Jan 23 2010 1:18:32PM ')) (('AWS-2', 'opened', 'Jan 23 2010 11:04:56AM '), ('AWS-2', 'closed', 'Jan 23 2010 1:18:32PM ')) (('AWS-2', 'opened', 'Jan 23 2010 11:04:56AM '), ('AWS-2', 'closed', 'Jan 23 2010 1:18:32PM ')) (('AWS-2', 'opened', 'Jan 23 2010 11:04:56AM '), ('AWS-2', 'closed', 'Jan 23 2010 1:18:32PM '))
AWS-1 never shows up in my output; AWS-2 appears everywhere instead. Expected output:
(('AWS-1', 'opened', 'Jan 23 2010 10:30:08AM '), ('AWS-1', 'closed', 'Jan 23 2010 9:43:44PM '))
(('AWS-1', 'opened', 'Feb 1 2010 12:40:28AM'), ('AWS-1', 'closed', 'Feb 23 2010 9:43:44PM'))
(('AWS-2', 'opened', 'Jan 23 2010 11:04:56AM '), ('AWS-2', 'closed', 'Jan 23 2010 1:18:32PM '))
Your request is unclear, but it appears you want to pair each "opened" record with its matching "closed" record for the same id.
Given
import re
import dateutil.parser
records = """\
AWS-1 - opened at Jan 23 2010 10:30:08AM
AWS-2 - opened at Jan 23 2010 11:03:56AM
AWS-2 - closed at Jan 23 2010 1:18:32PM
AWS-1 - closed at Feb 27 2010 9:32:50PM
AWS-1 - opened at Feb 1 2010 12:50:28AM
AWS-1 - closed at Jan 23 2010 9:32:50PM
"""
Code
def splitlines(s: str) -> tuple:
    """Return tuples of parsed lines: id, status, time."""
    res = []
    for line in s.split("\n"):
        if not line:
            continue
        # re.split keeps the captured separators (and None for the unmatched group)
        parsed = tuple(map(str.strip, filter(None, re.split(r"(\s-\s)|(at)", line))))
        id_, _, status, _, time = parsed
        data = id_, status, dateutil.parser.parse(time)
        res.append(data)
    return tuple(res)
def pairwise_records(s: str) -> list:
    """Return paired records according to id, status and time."""
    key = lambda x: (x[0], x[2], x[1])  # sort by id, then time, then status
    sorted_recs = ((i, status, str(t)) for i, status, t in sorted(splitlines(s), key=key))
    return list(zip(sorted_recs, sorted_recs))
Demo
pairwise_records(records)
Output
[(('AWS-1', 'opened', '2010-01-23 10:30:08'),
('AWS-1', 'closed', '2010-01-23 21:32:50')),
(('AWS-1', 'opened', '2010-02-01 00:50:28'),
('AWS-1', 'closed', '2010-02-27 21:32:50')),
(('AWS-2', 'opened', '2010-01-23 11:03:56'),
('AWS-2', 'closed', '2010-01-23 13:18:32'))]
Details
Kudos to the OP for getting a partial answer with an elaborate regex. It turns out you can do this more explicitly, with a clearer regex and without groupby.
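For comparison, here is a sketch of how the question's groupby approach could work, assuming each id's records appear in the input in open/close order (it pairs by position, not by time). Two changes matter: the key function must use its argument (the question's lambda z: y[2] ignores z and returns the same tuple for every element, so everything lands in a single group), and the input must be sorted by that key, because groupby only merges adjacent items. The simplified regex and the names rows, by_id and pairs are illustrative, not from the original.

```python
import re
from itertools import groupby
from operator import itemgetter

# The question's sample input.
test = """AWS-1 - opened at Jan 23 2010 10:30:08AM
AWS-2 - opened at Jan 23 2010 11:04:56AM
AWS-2 - closed at Jan 23 2010 1:18:32PM
AWS-1 - closed at Jan 23 2010 9:43:44PM
AWS-1 - opened at Feb 1 2010 12:40:28AM
AWS-1 - closed at Jan 23 2010 9:43:44PM
"""

# Simplified pattern (an assumption): id, status, and the rest of the line as the time.
rows = re.findall(r"^(\S+)\s+-\s+(\w+)\s+at\s+(.+?)\s*$", test, flags=re.MULTILINE)

by_id = itemgetter(0)  # the key uses its argument (each row), not the outer list

pairs = []
# groupby only merges *adjacent* rows with equal keys, so sort by id first.
# sorted() is stable, so each id's rows keep their input order.
for _, group in groupby(sorted(rows, key=by_id), key=by_id):
    events = list(group)
    # Pair positionally: (1st, 2nd), (3rd, 4th), ... assumed to be (opened, closed).
    pairs.extend(zip(events[0::2], events[1::2]))

for pair in pairs:
    print(pair)
```

With the question's data this yields three pairs: two for AWS-1, then one for AWS-2. Unlike pairwise_records, it keeps the timestamps as strings and does not reorder records by time.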
splitlines
We parse each line of the input string into a tuple with re.split. Because the pattern has capturing groups, re.split also returns the separators (" - " and "at") plus None for whichever group did not match, so it leaves behind extra elements we don't need. filter(None, ...) drops the None entries, str.strip trims the whitespace, and unpacking into id_, _, status, _, time discards the separators; time is then parsed into a datetime object. The result is a tuple of parsed lines. Example:
splitlines("AWS-1 - opened at Jan 23 2010 10:30:08AM")
# (('AWS-1', 'opened', datetime.datetime(2010, 1, 23, 10, 30, 8)),)
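To see the extra elements concretely, the sketch below prints the intermediate values for a single line, using the same pattern as splitlines (written as a raw string). The names raw and cleaned are illustrative.

```python
import re

line = "AWS-1 - opened at Jan 23 2010 10:30:08AM"

# Capturing groups make re.split return the separators too; the alternative
# that did not take part in a given match contributes None.
raw = re.split(r"(\s-\s)|(at)", line)
print(raw)
# ['AWS-1', ' - ', None, 'opened ', None, 'at', ' Jan 23 2010 10:30:08AM']

# filter(None, ...) drops the None entries; str.strip trims the padding.
cleaned = tuple(map(str.strip, filter(None, raw)))
print(cleaned)
# ('AWS-1', '-', 'opened', 'at', 'Jan 23 2010 10:30:08AM')

# Unpacking discards the two separators.
id_, _, status, _, time = cleaned
```

Because (at) has no word boundaries, a status containing "at" (for example a hypothetical "updated") would also be split; \bat\b would avoid that.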
pairwise_records
We sort the tuples by id and then by datetime (status only breaks ties). Sorting by time naturally puts each pair in order: if something opens at 9 AM, it must close at some later time, so as long as an id's open/close intervals don't overlap, each "opened" record is immediately followed by its "closed" record. Finally, we use a "trick" with iterators to pair the results: zip(sorted_recs, sorted_recs) pulls two consecutive items from the same generator for each tuple.
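A minimal sketch of the iterator trick, using the sorted values from the Output of pairwise_records (sorted_rows, pairs and naive are illustrative names). zip takes one item from each argument per step; when both arguments are the same iterator, each output tuple consumes two consecutive items. In pairwise_records, sorted_recs is a generator expression, which is already an iterator; passing a list twice instead would pair every item with itself.

```python
# Sorted (id, status, time) rows, as in the Output of pairwise_records.
sorted_rows = [
    ("AWS-1", "opened", "2010-01-23 10:30:08"),
    ("AWS-1", "closed", "2010-01-23 21:32:50"),
    ("AWS-1", "opened", "2010-02-01 00:50:28"),
    ("AWS-1", "closed", "2010-02-27 21:32:50"),
    ("AWS-2", "opened", "2010-01-23 11:03:56"),
    ("AWS-2", "closed", "2010-01-23 13:18:32"),
]

# Both zip arguments are the SAME iterator, so each output tuple consumes
# two consecutive rows: (row0, row1), (row2, row3), ...
it = iter(sorted_rows)
pairs = list(zip(it, it))

# Passing the list itself twice pairs every row with itself instead.
naive = list(zip(sorted_rows, sorted_rows))

# Check the assumption that each pair is (opened, closed) for a single id.
for opened, closed in pairs:
    assert opened[0] == closed[0]
    assert (opened[1], closed[1]) == ("opened", "closed")

print(pairs)
```

If the record count is odd, the unpaired last record is silently dropped; a missing "closed" record in the middle shifts every later pair, which the assertion loop would catch.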
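Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.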