Find min() and max() for each day in log file

I'm new to Python, and I'm stumped.

I have a text log file in which each entry starts with a timestamp. For example:

03/17/2020 01:38:20 PM
03/18/2020 09:21:28 AM

I want to go through the file and create a two-dimensional list that has one entry for each day, along with the earliest and latest timestamp found on that day. For instance, the list would contain [3/17/2020, 09:00:00 AM, 01:26:16 PM], [4/28/2020, 10:14:00 AM, 03:16:16 PM], with an additional entry for each day.

Here is what I have so far (I blew away a previous attempt):

lActDays = []
lActDayTimes = []
for item in lUAData:
    # Find the first space in the Time column.
    ispaceindx = item[0].find(' ')
    # Use the space as a delimiter, print everything before that - should be the date.
    sRecDay = item[0][0:ispaceindx]
    # Use the space as a delimiter, print everything after - should be the time.
    sRecTime = item[0][ispaceindx:].strip()
    if not sRecDay in lActDays:
        lActDays.append([sRecDay, [sRecTime]])

When I run this, it appends [sRecDay, [sRecTime]] on every iteration of the for loop. It's as if the `if not` condition isn't being run. However, if I change the last line to lActDays.append(sRecDay), it works fine: I get a list of unique days (but without the times).
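The condition is evaluated; it just never matches. `lActDays` holds two-item lists such as `['03/17/2020', ['01:38:20 PM']]`, and `in` compares the string `sRecDay` against each of those lists with `==`, which is always False for a string and a list, so every line gets appended. With `lActDays.append(sRecDay)` the comparison is string-to-string, which is why that version works. A minimal sketch of one fix, assuming each timestamp is a plain `'MM/DD/YYYY HH:MM:SS AM'` string (the question reads it from `item[0]`): key a dictionary on the day, so the membership test and the grouping use the same value. `group_times_by_day` is a hypothetical name.

```python
def group_times_by_day(timestamps):
    """Group 'MM/DD/YYYY HH:MM:SS AM' strings into {day: [time, ...]}."""
    days = {}
    for stamp in timestamps:
        # Split on the first space: the date before it, the time (with AM/PM) after it.
        day, time = stamp.split(' ', 1)
        if day not in days:  # tests against the dict's keys, which are day strings
            days[day] = []
        days[day].append(time)
    return days


# Why the original test fails: a string never equals a [day, [time]] list.
lActDays = [['03/17/2020', ['01:38:20 PM']]]
print('03/17/2020' in lActDays)                   # False
print('03/17/2020' in [d[0] for d in lActDays])   # True

print(group_times_by_day(['03/17/2020 01:38:20 PM',
                          '03/18/2020 09:21:28 AM',
                          '03/18/2020 11:02:00 AM']))
```

The times are still strings here, so taking min()/max() of them would compare text rather than clock time; the answers below address that by converting to datetime.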

You want to sort dates, so you'll want to take advantage of datetime objects, which support comparison and can therefore be sorted. Since you're reading what looks like a predictable format, you can also take advantage of datetime parsing and output formatting:

from datetime import datetime

entries = ['06/05/2020 09:21:00 AM log file line 1 text',
           '06/15/2020 10:59:59 PM log file line 2 text',
           '06/25/2020 04:12:58 AM log file line 3 text',
           '06/05/2020 07:24:11 AM log file line 4 text',
           '06/15/2020 08:18:56 PM log file line 5 text',
           '06/25/2020 03:46:00 AM log file line 6 text',
           '06/05/2020 09:40:57 PM log file line 7 text',
           '06/15/2020 08:50:35 PM log file line 8 text',
           '06/25/2020 09:30:45 PM log file line 9 text',
           '06/05/2020 01:40:14 AM log file line 10 text']

fp = 'dummyfile.txt'
with open(fp, 'w') as dfile:
    dfile.write("\n".join(entries))


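def get_daily_first_last(file_path):
    # %I (12-hour clock) is needed for %p to take effect; with %H the AM/PM marker is ignored.
    fmt = '%m/%d/%Y %I:%M:%S %p'
    data = {}
    with open(file_path, 'r') as infile:
        for line in infile:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # The timestamp is the first 22 characters: "MM/DD/YYYY HH:MM:SS AM".
            dt = datetime.strptime(line[:22], fmt)
            day = dt.date().isoformat()
            # Group (datetime, full line) tuples under their ISO date.
            data.setdefault(day, []).append((dt, line))

    for k, v in data.items():
        v = sorted(v)  # tuples sort by their first item, the datetime
        print(f"Day: {k}\nFirst entry: {v[0][1]}\nLast entry: {v[-1][1]}")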


if __name__ == "__main__":
    get_daily_first_last(fp)

You should get this output:

Day: 2020-06-05
First entry: 06/05/2020 01:40:14 AM log file line 10 text
Last entry: 06/05/2020 09:40:57 PM log file line 7 text
Day: 2020-06-15
First entry: 06/15/2020 08:18:56 PM log file line 5 text
Last entry: 06/15/2020 10:59:59 PM log file line 2 text
Day: 2020-06-25
First entry: 06/25/2020 03:46:00 AM log file line 6 text
Last entry: 06/25/2020 09:30:45 PM log file line 9 text
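One detail worth checking in any format string like this: in `datetime.strptime`, `%p` only adjusts the hour when it is paired with `%I` (12-hour clock). Paired with `%H`, the AM/PM marker is matched but ignored, so `01:40:14 PM` parses as 01:40. For the ten sample lines above the first/last picks happen to come out the same either way, but a day containing, say, 11:00 AM and 01:00 PM would be ordered wrongly with `%H`. A small check, assuming an English/C locale for the AM/PM names:

```python
from datetime import datetime

stamp = '06/05/2020 01:40:14 PM'

# With %H (24-hour clock), the %p field is matched but does not change the hour.
with_h = datetime.strptime(stamp, '%m/%d/%Y %H:%M:%S %p')
# With %I (12-hour clock), %p is applied, so 01 PM becomes hour 13.
with_i = datetime.strptime(stamp, '%m/%d/%Y %I:%M:%S %p')

print(with_h.time())  # 01:40:14
print(with_i.time())  # 13:40:14
```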

The "entries" list and "dummyfile.txt" are just an example to show that it works. You asked for a list, but I really think you want a dictionary for this problem, so you can group data as you parse the file. I'm storing the entire line as the second item of a tuple in the data dictionary, so I can just print it out after it's sorted. The first item in the tuple is the datetime object, which supports comparisons (e.g. sorting). The v = sorted(v) line returns a list sorted by the first item in each tuple (the datetime object); if two timestamps are identical, the line text breaks the tie.
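If you still want the two-dimensional list from the question ([day, earliest time, latest time] per day), it is a short step from the grouped dictionary. A sketch, assuming a dictionary that maps each day to a list of (datetime, line) tuples as above; `first_last_table` is a hypothetical helper, and it uses min()/max() instead of a full sort because only the two ends are needed.

```python
from datetime import datetime

FMT = '%m/%d/%Y %I:%M:%S %p'


def first_last_table(data):
    """Turn {day: [(datetime, line), ...]} into [[day, earliest, latest], ...]."""
    table = []
    for day, records in data.items():
        # One pass each over the datetimes; the line text is never compared.
        first_dt = min(dt for dt, _ in records)
        last_dt = max(dt for dt, _ in records)
        table.append([day,
                      first_dt.strftime('%I:%M:%S %p'),
                      last_dt.strftime('%I:%M:%S %p')])
    return table


# Hypothetical sample lines in the question's format.
lines = ['03/17/2020 01:38:20 PM a',
         '03/17/2020 09:00:00 AM b',
         '03/18/2020 09:21:28 AM c']
data = {}
for line in lines:
    dt = datetime.strptime(line[:22], FMT)
    data.setdefault(dt.strftime('%m/%d/%Y'), []).append((dt, line))

print(first_last_table(data))
```

Rows come out in the order days were first seen, since dictionaries preserve insertion order (Python 3.7+); wrap the result in sorted() if the log is not chronological.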

You have a bunch of questions in one. You really should break things down into separate tasks. That way, when you ask a question, it's about one issue you're working on, and you have code and data to demonstrate that issue and your progress so far.

I put together this small demo based on the data you gave. Normally, you would want a regex to parse the date and separate it from the rest of the log data (see "Extracting a date from a log file?").
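A minimal sketch of the regex route, assuming every timestamp has zero-padded two-digit month, day, hour, minute and second fields as in the sample. Unlike slicing by position, it tolerates extra whitespace after the timestamp and lets you detect lines that don't start with a timestamp (for example, continuation lines of a multi-line entry). `LINE_RE` and `parse_line` are hypothetical names.

```python
import re
from datetime import datetime

# Anchored pattern: "MM/DD/YYYY HH:MM:SS AM|PM", optional whitespace, then the message.
LINE_RE = re.compile(r'^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M)\s*(.*)$')


def parse_line(line):
    """Return (datetime, message), or None if the line doesn't start with a timestamp."""
    m = LINE_RE.match(line.rstrip('\n'))
    if m is None:
        return None
    stamp, message = m.groups()
    return datetime.strptime(stamp, '%m/%d/%Y %I:%M:%S %p'), message


print(parse_line('03/18/2020 09:21:28 AM more stuff in this line'))
print(parse_line('continuation line without a timestamp'))
```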

But since your date is a fixed length, here's a super quick and dirty way to parse it, assuming each line is date, space, text:

import datetime
from collections import defaultdict
from pprint import pprint

logfile = [
    '03/17/2020 01:38:20 PM stuff goes here',
    '03/18/2020 08:21:28 AM earlier',
    '03/18/2020 09:21:28 AM more stuff in this line',
    '03/18/2020 11:21:28 AM later',
]

print('parsing debug prints:')
# Map each date string to a list of (datetime, message) tuples.
mydata = defaultdict(list)
for line in logfile:
    # The timestamp is the first 22 characters; the message starts after the following space.
    timestamp, message = line[:22], line[23:]
    # %I together with %p handles the 12-hour clock (01:38:20 PM -> 13:38:20).
    dt = datetime.datetime.strptime(timestamp, '%m/%d/%Y %I:%M:%S %p')
    date_string = dt.strftime('%m/%d/%Y')
    print(date_string)
    print(message)
    mydata[date_string].append((dt, message))
print()

print('The full data structure:')
pprint(mydata)
print()

There's nothing special about finding the min/max items, as long as those items are of a type that supports comparison. That is why I have put your timestamps into datetime objects.

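day = '03/18/2020'
list_of_records_from_day = mydata[day]
# Keep only the datetime from each (datetime, message) tuple.
list_of_datetime_objects = [r[0] for r in list_of_records_from_day]
print('The earliest timestamp on', day, 'is', min(list_of_datetime_objects))
print('The latest timestamp on', day, 'is', max(list_of_datetime_objects))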
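To see why the conversion to datetime matters, compare two time-of-day strings in the question's format. As plain strings, min() compares character by character, so a PM time that starts with "01" sorts before an AM time that starts with "09". Parsed with `%I` and `%p`, the comparison uses the actual clock time. A small illustration:

```python
from datetime import datetime

times = ['01:38:20 PM', '09:21:28 AM']

# As plain strings, '1' < '9' at the second character, so the PM time looks "earliest".
print(min(times))  # 01:38:20 PM

# Parsed with %I and %p, 01:38:20 PM becomes 13:38:20 and compares correctly.
parsed = [datetime.strptime(t, '%I:%M:%S %p') for t in times]
print(min(parsed).strftime('%I:%M:%S %p'))  # 09:21:28 AM
```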
