Find the first line containing a time stamp after a specific line in a file

Question

I'm trying to add time stamps to my search results from a file.

My code is:

def findIcommingStats():
    # read the result file
    replication_file = open("result.log", "r")

    # create a new temp file for all the prints we will find
    tempFile = open("incomingTemp.txt", "w")

    # loop over the file and move all relevant lines to another temp file
    for line in replication_file:
        if (line.find('STATISTICS') >= 0 and line.find('DeltaMarkerIncomingData') > 0
                and line.find('Counter') == -1 and line.find('0.00e+00') == -1
                and line.find('0.00') == -1 and line.find('description') == -1):
            tempFile.write(line)

    # cleanup
    replication_file.close()
    tempFile.close()

This gives me the strings I'm searching for in my file, which look like: "STATISTICS: name=gridDeltaMarkerIncomingData kVolSlot=0 GroupCopy(26764 SiteUID(0x3d1d0445) 0) 0 8582 sec window: Rate: 3.53e-06 MB/sec"

The time stamps are ~20-30 lines before that. How can I get them printed on the same line, before the strings?

The time stamps look like "2015/07/08 10:08:00.079".

The file looks like:

2015/07/08 10:14:46.971 - #2 - 4080/4064 - AccumulatorManager: ProcessID= RAW STATS:

<statistics>

STATISTICS: name=gridDeltaMarkerIncomingData kVolSlot=0 GroupCopy(26764 SiteUID(0x3d1d0445) 0) 0 924 sec window: Rate: 0.00e+00 MB/sec
STATISTICS: name=gridDeltaMarkerIncomingData kVolSlot=0 GroupCopy(26764 SiteUID(0x3d1d0445) 0) 0 8582 sec window: Rate: 3.53e-06 MB/sec
STATISTICS: name=gridDeltaMarkerIncomingData kVolSlot=0 GroupCopy(26764 SiteUID(0x3d1d0445) 0) 0 63612 sec window: Rate: 4.23e-06 MB/sec

<more statistics>

I want to take the time stamp from the RAW STATS line, so the output will look like:

2015/07/08 10:14:46.971 STATISTICS: name=gridDeltaMarkerIncomingData kVolSlot=0 GroupCopy(26764 SiteUID(0x3d1d0445) 0) 0 924 sec window: Rate: 0.00e+00 MB/sec

2015/07/08 10:14:46.971 STATISTICS: name=gridDeltaMarkerIncomingData kVolSlot=0 GroupCopy(26764 SiteUID(0x3d1d0445) 0) 0 8582 sec window: Rate: 3.53e-06 MB/sec

Answer 1

This should basically do the job:

def stat_entry(line):
    return line.startswith('STATISTICS')

def date_entry(line):
    return line.startswith('20')

def findIcommingStats():
    date = ''
    with open("result.log", "r") as replication_file:
        with open("incomingTemp.txt", "w") as tempFile:
            for line in replication_file:
                if date_entry(line):
                    date = ' '.join(line.split(' ')[:2]) # set new date
                elif stat_entry(line):
                    tempFile.write(date  + ' ' + line) # write to tempfile

findIcommingStats()

Output:

2015/07/08 10:14:46.971 STATISTICS: name=gridDeltaMarkerIncomingData...
2015/07/08 10:14:46.971 STATISTICS: name=gridDeltaMarkerIncomingData...
2015/07/08 10:14:46.971 STATISTICS: name=gridDeltaMarkerIncomingData...

As you can see, I factored out the stat_entry and date_entry functions; you might want to change those and add better criteria for checking whether a given line is a date or a statistics entry.
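One possible way to tighten those checks is sketched below. The time stamp pattern is an assumption based on the sample "2015/07/08 10:14:46.971", the excluded substrings mirror the filter in the question's original code, and the name add_time_stamps is hypothetical; none of this is part of the answer above.

```python
import re

# Assumed time stamp layout, based on the sample "2015/07/08 10:14:46.971".
DATE_RE = re.compile(r'\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}')

# Substrings rejected by the filter in the question's code
# ('0.00' also covers '0.00e+00').
EXCLUDED = ('Counter', '0.00', 'description')


def date_entry(line):
    """Return the time stamp at the start of the line, or None if there is none."""
    match = DATE_RE.match(line)
    return match.group() if match else None


def stat_entry(line):
    """True for the DeltaMarkerIncomingData statistics lines the question keeps."""
    return (line.startswith('STATISTICS')
            and 'DeltaMarkerIncomingData' in line
            and not any(text in line for text in EXCLUDED))


def add_time_stamps(lines):
    """Yield each wanted statistics line prefixed with the latest time stamp seen."""
    date = ''
    for line in lines:
        stamp = date_entry(line)
        if stamp is not None:
            date = stamp  # remember the most recent time stamp
        elif stat_entry(line):
            yield date + ' ' + line
```

Because add_time_stamps accepts any iterable of lines, it can be fed an open file object directly, for example tempFile.writelines(add_time_stamps(replication_file)).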

Answer 2

You can solve this, and other problems like it, using regular expressions.

First you need to find the time stamp:

import re
regexTimeStamp = re.compile(r'\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}')

Then you can use

match = regexTimeStamp.match(Str)

Here I am using Str as one line of the file. match is None when the line does not start with a time stamp; otherwise use TimeStamp = match.group() to get your time stamp.

Now similarly use a regular expression to find the statistics entry:

regexStat = re.compile('STATISTICS:')

match1 = regexStat.search(Str)
match1.start()

This gives you the start index of STATISTICS: in the line; you can then insert your TimeStamp before it.
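Putting these steps together, a minimal sketch might look like the following. The function name prefix_time_stamps and the choice to work on a list of lines are assumptions made for illustration; search is used rather than match so that STATISTICS: is found wherever it sits on the line.

```python
import re

regexTimeStamp = re.compile(r'\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}')
regexStat = re.compile('STATISTICS:')


def prefix_time_stamps(lines):
    """Return the statistics lines with the most recent time stamp inserted."""
    TimeStamp = ''
    result = []
    for Str in lines:
        match = regexTimeStamp.match(Str)  # time stamp at the start of the line
        if match:
            TimeStamp = match.group()
            continue
        match1 = regexStat.search(Str)     # STATISTICS: anywhere in the line
        if match1:
            index = match1.start()
            # insert the time stamp right before STATISTICS:
            result.append(Str[:index] + TimeStamp + ' ' + Str[index:])
    return result
```

Lines that match neither pattern are skipped, so a statistics line always receives the time stamp of the nearest preceding RAW STATS header.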

Here is a guide on regex,

and here is a place to experiment by trial and error.

Answer 1: 2, accepted, 2015-07-16 11:51:16

Answer 2: 1, 2015-07-17 11:18:55
