Finding the min() and max() for each day in a log file

Question

I'm new to Python and I'm stumped.

I have a text log file in which each entry starts with a timestamp, like so:

03/17/2020 01:38:20 PM
03/18/2020 09:21:28 AM

I want to go through it and build a two-dimensional list with one entry per day, holding that day's earliest and latest timestamps. For instance, the list would contain [3/17/2020, 09:00:00 AM, 01:26:16 PM], [4/28/2020, 10:14:00 AM, 03:16:16 PM], and so on, with one entry for each day.

Here is what I have so far (I blew away a previous attempt):

lActDays = []
lActDayTimes = []
for item in lUAData:
    # Find the first space in the Time column.
    ispaceindx = item[0].find(' ')
    # Use the space as a delimiter, print everything before that - should be the date.
    sRecDay = item[0][0:ispaceindx]
    # Use the space as a delimiter, print everything after - should be the time.
    sRecTime = item[0][ispaceindx:].strip()
    if not sRecDay in lActDays:
        lActDays.append([sRecDay, [sRecTime]])

When I run this, it appends [sRecDay, [sRecTime]] on every pass through the for loop. It's as if the 'if not' condition isn't being evaluated. However, if I change the last line to lActDays.append(sRecDay), it works fine: I get a list of unique days (but without the times).

Answer 1

If you want to sort dates, take advantage of the fact that datetime objects support comparison and can therefore be sorted. Since you're reading what looks like a predictable format, you can also use datetime's parsing and output formatting:

from datetime import datetime

entries = ['06/05/2020 09:21:00 AM log file line 1 text',
           '06/15/2020 10:59:59 PM log file line 2 text',
           '06/25/2020 04:12:58 AM log file line 3 text',
           '06/05/2020 07:24:11 AM log file line 4 text',
           '06/15/2020 08:18:56 PM log file line 5 text',
           '06/25/2020 03:46:00 AM log file line 6 text',
           '06/05/2020 09:40:57 PM log file line 7 text',
           '06/15/2020 08:50:35 PM log file line 8 text',
           '06/25/2020 09:30:45 PM log file line 9 text',
           '06/05/2020 01:40:14 AM log file line 10 text']

fp = 'dummyfile.txt'
with open(fp, 'w') as dfile:
    dfile.write("\n".join(entries))


def get_daily_first_last(file_path):
    fmt = '%m/%d/%Y %I:%M:%S %p'  # %I (12-hour clock) is needed for %p (AM/PM) to take effect
    with open(file_path, 'r') as infile:
        data = {}
        for line in infile:
            dt, txt = datetime.strptime(line[:22], fmt), line.strip()
            day = dt.date().isoformat()
            if day in data.keys():
                data[day].append((dt, txt))
            else:
                data[day] = [(dt, txt)]

    for k, v in data.items():
        v = sorted(v)
        print(f"Day: {k}\nFirst entry: {v[0][1]}\nLast entry: {v[-1][1]}")


if __name__ == "__main__":
    get_daily_first_last(fp)

You should get the output:

Day: 2020-06-05
First entry: 06/05/2020 01:40:14 AM log file line 10 text
Last entry: 06/05/2020 09:40:57 PM log file line 7 text
Day: 2020-06-15
First entry: 06/15/2020 08:18:56 PM log file line 5 text
Last entry: 06/15/2020 10:59:59 PM log file line 2 text
Day: 2020-06-25
First entry: 06/25/2020 03:46:00 AM log file line 6 text
Last entry: 06/25/2020 09:30:45 PM log file line 9 text

The "entries" list and "dummyfile.txt" are just an example to show that it works. You asked for a list, but I really think you want a dictionary for this problem, so you can group data as you parse the file. I store the entire line as the second item of each tuple in the data dictionary, so I can just print it out after sorting. The first item in the tuple is the datetime object, which supports comparisons (e.g. sorting). The v = sorted(v) line returns a list sorted by the first item in each tuple (the datetime object).
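If only the earliest and latest timestamps are needed, as in the original question, the same dictionary grouping can track them in a single pass without storing or sorting every line. The following is a minimal sketch under that assumption; the function name `daily_min_max`, the dictionary `bounds`, and the sample `lines` are hypothetical and not part of the code above.

```python
from datetime import datetime

FMT = '%m/%d/%Y %I:%M:%S %p'  # 12-hour clock with AM/PM


def daily_min_max(lines):
    """Return [[date, earliest_time, latest_time], ...] in order of first appearance."""
    bounds = {}  # date -> [earliest datetime, latest datetime]
    for line in lines:
        dt = datetime.strptime(line[:22], FMT)  # the timestamp is the first 22 characters
        day = dt.date()
        if day not in bounds:
            bounds[day] = [dt, dt]
        else:
            bounds[day][0] = min(bounds[day][0], dt)
            bounds[day][1] = max(bounds[day][1], dt)
    # Convert to the list-of-lists shape asked for in the question
    return [[day.strftime('%m/%d/%Y'),
             earliest.strftime('%I:%M:%S %p'),
             latest.strftime('%I:%M:%S %p')]
            for day, (earliest, latest) in bounds.items()]


# Hypothetical sample lines, deliberately out of order
lines = ['03/17/2020 01:26:16 PM second entry',
         '03/17/2020 09:00:00 AM first entry',
         '03/18/2020 09:21:28 AM only entry']
result = daily_min_max(lines)
print(result)
```

A day with a single entry gets the same time as both its earliest and latest value.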

Answer 2

You have a bunch of questions in one. You really should break things down into separate tasks. That way, when you ask a question, it covers the one issue you're working on, and you have code and data to demonstrate the issue and your progress so far.

I put together this small demo based on the data you gave. Normally, you would want a regex to parse the date and separate it from the rest of the log data (see "Extracting a date from a log file?").
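For reference, the regex approach could look roughly like the following. This is only a sketch: the pattern `TIMESTAMP_RE` and the helper `split_line` are assumptions made for illustration, written for the `MM/DD/YYYY HH:MM:SS AM/PM` layout shown in the question.

```python
import re
from datetime import datetime

# Hypothetical pattern for a leading 'MM/DD/YYYY HH:MM:SS AM|PM' timestamp,
# followed by optional whitespace and the rest of the line.
TIMESTAMP_RE = re.compile(r'^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M)\s*(.*)$')


def split_line(line):
    """Return (datetime, message), or None if the line has no leading timestamp."""
    match = TIMESTAMP_RE.match(line)
    if match is None:
        return None
    stamp, message = match.groups()
    return datetime.strptime(stamp, '%m/%d/%Y %I:%M:%S %p'), message


parsed = split_line('03/18/2020 09:21:28 AM more stuff in this line')
skipped = split_line('continuation line without a timestamp')
print(parsed)
print(skipped)
```

Unlike fixed-width slicing, this lets lines that do not begin with a timestamp be detected and skipped.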

But since your date is a fixed length, here's a super quick and dirty way to parse it, assuming each line is date-space-text:

import datetime
from collections import defaultdict
from pprint import pprint

logfile = [
    '03/17/2020 01:38:20 PM stuff goes here',
    '03/18/2020 08:21:28 AM earlier',
    '03/18/2020 09:21:28 AM more stuff in this line',
    '03/18/2020 11:21:28 AM later',
]

print('parsing debug prints:')
mydata = defaultdict(list)
for line in logfile:
    # The timestamp is the first 22 characters; the message follows one space
    timestamp, message = line[:22], line[23:]
    dt = datetime.datetime.strptime(timestamp, '%m/%d/%Y %I:%M:%S %p')
    date_string = dt.strftime('%m/%d/%Y')
    print(date_string)
    print(message)
    mydata[date_string].append((dt, message))
print()

print('The full data structure:')
pprint(mydata)
print()

There's nothing special about finding the min/max items, as long as those items are of a type that supports comparison. That is why I have put your timestamps into datetime objects.

day = '03/18/2020'
list_of_records_from_day = mydata[day]
list_of_datetime_objects = [r[0] for r in list_of_records_from_day]
print('The earliest timestamp on', day, 'is', min(list_of_datetime_objects))
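The same idea extends to the latest timestamp and to every day in the log. A minimal, self-contained sketch follows; it rebuilds a small `mydata` dictionary from sample lines so that it runs on its own, and the name `summary` is hypothetical.

```python
import datetime
from collections import defaultdict

logfile = [
    '03/17/2020 01:38:20 PM stuff goes here',
    '03/18/2020 08:21:28 AM earlier',
    '03/18/2020 09:21:28 AM more stuff in this line',
    '03/18/2020 11:21:28 AM later',
]

# Group (datetime, message) tuples by day
mydata = defaultdict(list)
for line in logfile:
    dt = datetime.datetime.strptime(line[:22], '%m/%d/%Y %I:%M:%S %p')
    mydata[dt.strftime('%m/%d/%Y')].append((dt, line[23:]))

# Build one [day, earliest, latest] row per day
summary = []
for day, records in mydata.items():
    stamps = [r[0] for r in records]
    summary.append([day,
                    min(stamps).strftime('%I:%M:%S %p'),
                    max(stamps).strftime('%I:%M:%S %p')])

for row in summary:
    print(row)
```

Because min() and max() compare the datetime objects rather than the text, AM and PM times are ordered correctly.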
