Removing date/timestamp from file using regex re.compile and re.sub

First off, I'm new and just learning Python, so thanks for entertaining my question. I'm trying to compare a file with another file that should have the same content apart from the timestamp. I'm trying to remove the timestamp from each line with a regex and re.sub, but I'm obviously missing something. I've researched this but haven't been able to get anything to work quite how I want. Ultimately I'd like to remove both the date and the time, but I wanted to get the date portion working first. Here's how the log file looks:

15/03/2019  18:25:35 0446: Successful Compile (Script file: C:\PodTools\Automation\TL000635 - Serial Interface Tool Gen2_Automation Script\Script_Pair.txt)
15/03/2019  18:25:35 0448: Pairing with the Pod
15/03/2019  18:25:35 0448: V 82 2952790016 10051
15/03/2019  18:25:35 0550: I  52 B0 00 00 00 00 00 27 43
15/03/2019  18:25:40 0974: O  3D 02
15/03/2019  18:25:40 0976: SCRIPT COMPLETE

Code in question:

import re
import datetime

today = datetime.date.today()

with open('C:\\PodTools\\Automation\\TL000635 - Serial Interface Tool Gen2_Automation Script\\OutputFolder\\'+str(today)+'\\Output_'+str(today)+'.txt') as f:
    outputFile_contents = f.readlines()

newOutputFileContents = []

pat = re.compile(r'\d{2}[-/]\d{2}[-/]\d{4}')

for line in outputFile_contents:
    [re.sub(pat, '', line)]
    newOutputFileContents.append(line)
    print(newOutputFileContents)

For your purpose, it's much easier to split each line into three columns and keep only the third. This prints the result; to produce the new file, write each piece out instead:

with open('file.txt') as f:
    for line in f:
        print(line.split(maxsplit=2)[2], end='')
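
If the result should go to a new file rather than to the screen, the same split can be wrapped in a small helper. This is only a sketch: the names strip_timestamp and strip_file, and the choice to pass short or blank lines through unchanged, are assumptions and not part of the original answer.

```python
def strip_timestamp(line):
    """Return the line without its leading date and time columns."""
    # Split on runs of whitespace at most twice: date, time, rest of the line.
    parts = line.split(maxsplit=2)
    # Lines with fewer than three columns (e.g. blank lines) are returned unchanged.
    return parts[2] if len(parts) == 3 else line


def strip_file(src_path, dst_path):
    """Copy src_path to dst_path, dropping the date and time from each line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            # The remainder keeps its trailing newline, so no extra one is added.
            dst.write(strip_timestamp(line))
```

Running strip_file on both logs would give two files that can be compared directly. Note that the numeric field after the time (0446:, 0448:, ...) is kept; if it also differs between runs, it would need to be split off as well.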

To answer your specific question about what the problem is in the code you posted, let's look at the line

[re.sub(pat, '', line)]

I suspect this is the problem: the lines that follow assume the value of line has changed, but this statement doesn't change it. re.sub returns a new string, and here that result is wrapped in a list and then discarded. You should assign it instead, e.g.:

line = re.sub(pat, '', line)
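
Putting that fix back into the loop might look like the sketch below. The pattern is extended to cover the time as well as the date, which is an assumption based on the sample log lines in the question (DD/MM/YYYY, whitespace, HH:MM:SS, whitespace); the names TIMESTAMP and remove_timestamps are made up for illustration.

```python
import re

# Assumed layout, taken from the sample log: date, whitespace, time, whitespace.
TIMESTAMP = re.compile(r'^\d{2}[-/]\d{2}[-/]\d{4}\s+\d{2}:\d{2}:\d{2}\s+')


def remove_timestamps(lines):
    """Return a new list with the leading date and time removed from each line."""
    cleaned = []
    for line in lines:
        # sub() returns a new string; it has no effect unless the result is used.
        cleaned.append(TIMESTAMP.sub('', line))
    return cleaned


sample = [
    "15/03/2019  18:25:35 0448: Pairing with the Pod\n",
    "15/03/2019  18:25:40 0976: SCRIPT COMPLETE\n",
]
for cleaned_line in remove_timestamps(sample):
    print(cleaned_line, end='')
```

Lines that do not start with a date and time in this form are left untouched, since the pattern is anchored to the start of the line.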

However, if every line of your log file has the same format, I'd recommend @blhsing's answer as a simpler way to strip off the timestamp.
