
Find the first line containing a time stamp after a specific line in a file

I'm trying to add time stamps to my search results from a file.

My code is:

def findIcommingStats():
    #read the result file
    replication_file = open("result.log", "r")

    #create a new temp file for all the prints we will find
    tempFile = open("incomingTemp.txt", "w")

    #loop over the file and move all relevant lines to another temp file
    for line in replication_file:
            if ((line.find('STATISTICS') >= 0) & ( line.find('DeltaMarkerIncomingData') > 0 ) & ( line.find('Counter') == -1  ) &
                     ( line.find('0.00e+00') == -1 ) & ( line.find('0.00') == -1 ) & ( line.find('description') == -1 ) ):
                            tempFile.write(line)
    #cleanup
    replication_file.close()
    tempFile.close()

This gives me the strings I'm searching for in my file, which look like: "STATISTICS: name=gridDeltaMarkerIncomingData kVolSlot=0 GroupCopy(26764 SiteUID(0x3d1d0445) 0) 0 8582 sec window: Rate: 3.53e-06 MB/sec"

The time stamps are ~20-30 lines before that. How can I get them printed on the same line, before the strings?

The time stamps look like "2015/07/08 10:08:00.079"

The file looks like:

2015/07/08 10:14:46.971 - #2 - 4080/4064 - AccumulatorManager: ProcessID= RAW STATS:

<statistics>

STATISTICS: name=gridDeltaMarkerIncomingData kVolSlot=0 GroupCopy(26764 SiteUID(0x3d1d0445) 0) 0 924 sec window: Rate: 0.00e+00 MB/sec
STATISTICS: name=gridDeltaMarkerIncomingData kVolSlot=0 GroupCopy(26764 SiteUID(0x3d1d0445) 0) 0 8582 sec window: Rate: 3.53e-06 MB/sec
STATISTICS: name=gridDeltaMarkerIncomingData kVolSlot=0 GroupCopy(26764 SiteUID(0x3d1d0445) 0) 0 63612 sec window: Rate: 4.23e-06 MB/sec

<more statistics>

I want to take the time stamp from the RAW STATS line and put it in front of each statistics line, so the output will look like:

2015/07/08 10:14:46.971 STATISTICS: name=gridDeltaMarkerIncomingData kVolSlot=0 GroupCopy(26764 SiteUID(0x3d1d0445) 0) 0 924 sec window: Rate: 0.00e+00 MB/sec

2015/07/08 10:14:46.971 STATISTICS: name=gridDeltaMarkerIncomingData kVolSlot=0 GroupCopy(26764 SiteUID(0x3d1d0445) 0) 0 8582 sec window: Rate: 3.53e-06 MB/sec

This should basically do the job:

def stat_entry(line):
    return line.startswith('STATISTICS')

def date_entry(line):
    return line.startswith('20')

def findIcommingStats():
    date = ''
    with open("result.log", "r") as replication_file:
        with open("incomingTemp.txt", "w") as tempFile:
            for line in replication_file:
                if date_entry(line):
                    date = ' '.join(line.split(' ')[:2]) # set new date
                elif stat_entry(line):
                    tempFile.write(date  + ' ' + line) # write to tempfile

findIcommingStats()

Output:

2015/07/08 10:14:46.971 STATISTICS: name=gridDeltaMarkerIncomingData...
2015/07/08 10:14:46.971 STATISTICS: name=gridDeltaMarkerIncomingData...
2015/07/08 10:14:46.971 STATISTICS: name=gridDeltaMarkerIncomingData...

As you can see, I factored out the stat_entry and date_entry functions. You might want to change those and add stricter criteria for deciding whether a given line is a date or a statistics entry; as written, any line starting with '20' counts as a date.
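As one possible way to tighten those criteria, the sketch below swaps the `startswith` checks for a regular expression and a few substring tests. The names `extract_timestamp` and `is_incoming_stat` are hypothetical, the time stamp layout is assumed from the sample log, and the filter is only loosely based on the one in the question.

```python
import re

# Assumed layout, taken from the sample log: YYYY/MM/DD HH:MM:SS.mmm
TIMESTAMP_RE = re.compile(r'\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}')

def extract_timestamp(line):
    """Return the time stamp at the start of the line, or None if absent."""
    match = TIMESTAMP_RE.match(line)  # match() only succeeds at position 0
    return match.group() if match else None

def is_incoming_stat(line):
    """True for DeltaMarkerIncomingData statistics lines with a non-zero rate."""
    return (line.startswith('STATISTICS:')
            and 'DeltaMarkerIncomingData' in line
            and 'Counter' not in line
            and 'description' not in line
            and '0.00e+00' not in line)  # assumption: skip zero-rate windows
```

Because `extract_timestamp` returns the time stamp itself rather than a boolean, the caller can store its result directly instead of splitting the line a second time.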

You can solve this, and other problems like it, using regular expressions.

First you need to find the time stamp:

import re
regexTimeStamp = re.compile(r'\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}')

Then you can use

match = regexTimeStamp.match(Str)

Here Str is one line of the file. Then use TimeStamp = match.group() to get your time stamp. Note that match is None when the line does not start with a time stamp, so check it before calling group().

Now similarly use a regular expression to find the statistics entry:

regexStat = re.compile('STATISTICS:')

# search() finds the pattern anywhere in the line; match() only at the start
match1 = regexStat.search(Str)
match1.start()

This gives you the index where STATISTICS: begins in the line; you can insert your TimeStamp just before it.
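Put together, the regular-expression approach might look like the sketch below. It is only an illustration: the function name `prefix_stats_with_timestamp` is hypothetical, it works on any iterable of lines rather than on the result.log file directly, and it assumes each statistics block is preceded by a line that starts with a time stamp.

```python
import re

# Assumed layout, taken from the sample log: YYYY/MM/DD HH:MM:SS.mmm
TIMESTAMP_RE = re.compile(r'\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}')
STAT_RE = re.compile(r'STATISTICS:')

def prefix_stats_with_timestamp(lines):
    """Yield each STATISTICS line prefixed with the most recent time stamp."""
    timestamp = ''
    for line in lines:
        ts_match = TIMESTAMP_RE.match(line)   # anchored at the line start
        if ts_match:
            timestamp = ts_match.group()      # remember it for later lines
            continue
        stat_match = STAT_RE.search(line)     # may appear anywhere in the line
        if stat_match:
            stat = line[stat_match.start():]
            # Only add the separator when a time stamp has been seen
            yield f'{timestamp} {stat}' if timestamp else stat

sample = [
    '2015/07/08 10:14:46.971 - #2 - 4080/4064 - AccumulatorManager: ProcessID= RAW STATS:\n',
    '<statistics>\n',
    'STATISTICS: name=gridDeltaMarkerIncomingData kVolSlot=0 Rate: 3.53e-06 MB/sec\n',
]
for out_line in prefix_stats_with_timestamp(sample):
    print(out_line, end='')
```

To process the real file, pass the open file object instead of `sample` and write each yielded line to the temp file.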

Here is a guide on regex

and here is a place to experiment by trial and error
