I have an input file in which every test name starts with the word "Test" and every error starts with the word "Error". It looks like this:
Test1
Error1
Error1
Error2
Test1
Error3
Test2
Error1
Error4
Test2
Error5
Error1
Test3
Error1
I want it in the format:
Test1
Error1
Error1
Error2
Error3 // Removed test1
Test2
Error1
Error4
Error5
Error1
Test3
Error1
Basically, while going through the file, it should delete repeated test names and write everything else, in the same order, to an output file. Here is my code:
    import os
    import sys
    import optparse

    def delete_duplicate(inputfile,outputfile):
        output = open(outputfile, "w")
        from collections import OrderedDict
        input = open(inputfile, "r")
        lines = (line.strip() for line in input)
        unique_lines = OrderedDict.fromkeys((line for line in lines if line))
        for unique_line in unique_lines:
            output.write(unique_line)
            output.write("\n")
My code removes every duplicate line and gives the following result:
Test1
Error1
Error2
Error3
Test2
Error4
Error5
Test3
It works fine for the test names, but it also removes the repeated errors, which I want to keep. Can anybody help?
All you need is to keep the lines that start with Test in a set, and if a line is already in the set, simply don't write it to the output file:
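    def delete_duplicate(inputfile, outputfile):
        seen = set()  # test names that have already been written
        with open(outputfile, "w") as output, open(inputfile, "r") as infile:
            for line in infile:
                line = line.strip()  # drop the trailing newline before comparing
                if not line or line in seen:
                    continue  # skip blank lines and repeated test names
                output.write(line + '\n')
                if line.startswith('Test'):
                    seen.add(line)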
The advantage of a set is that checking membership and adding items both take O(1) time on average.
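To try the idea without touching the file system, here is a minimal sketch of the same set-based filter written as a generator over any iterable of lines. The name `drop_repeated_tests` and the `sample` list are assumptions made for illustration; the data simply mirrors the input shown in the question.

```python
def drop_repeated_tests(lines):
    """Yield lines in order, skipping test names that were already seen.

    Hypothetical helper, for illustration only.
    """
    seen = set()  # test names already emitted
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("Test"):
            if line in seen:
                continue  # repeated test name: drop it
            seen.add(line)
        yield line  # errors are always kept, even when repeated


sample = [
    "Test1", "Error1", "Error1", "Error2",
    "Test1", "Error3",
    "Test2", "Error1", "Error4",
    "Test2", "Error5", "Error1",
    "Test3", "Error1",
]
result = list(drop_repeated_tests(sample))
print("\n".join(result))
```

Because the generator handles one line at a time, passing an open file object instead of the list should work for large inputs as well.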
At the moment it looks like your code is simply inserting each line into the dictionary if it hasn't come across it before. It also seems like you want to track the errors independently for each test. You could do this with an OrderedDict that would look a bit like this:
    output_dict = {
        'Test1' : ['Error1','Error1','Error2','Error3'],
        'Test2' : ['Error1','Error4','Error5','Error1'],
        'Test3' : ['Error1']
    }
The code to handle this would look like the following.
    from collections import OrderedDict

    def delete_duplicate(inputfile, outputfile):
        output_dict = OrderedDict()  # Maps each test to its errors, in first-seen order
        currentTest = ''  # Used to keep track of which test we are working with
        # Read the input and collect the errors under their test
        with open(inputfile, "r") as infile:
            for line in (raw.strip() for raw in infile):
                if line.startswith('Test'):  # A new test is starting
                    currentTest = line
                    if currentTest not in output_dict:
                        output_dict[currentTest] = []
                elif line.startswith('Error'):  # Add the error to the current test
                    output_dict[currentTest].append(line)
        # Write each test once, followed by all of its errors
        with open(outputfile, "w") as outfile:
            for test, errors in output_dict.items():
                outfile.write(test + '\n')  # Write the test name
                for error in errors:
                    outfile.write(error + '\n')  # Write the errors for that test
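One difference between grouping and simply dropping repeated test names shows up when a repeated test is not adjacent to its first occurrence. The sketch below uses a hypothetical in-memory helper, `group_by_test`, and made-up input to suggest how grouping moves the later errors back under the first occurrence of their test; a filter that only drops the repeated name would leave them after whichever test came last. It assumes Python 3.7+, where a plain dict preserves insertion order, so OrderedDict is not strictly needed.

```python
def group_by_test(lines):
    """Group error lines under the test they belong to.

    Hypothetical helper, for illustration only.
    """
    groups = {}     # test name -> list of errors (insertion-ordered on 3.7+)
    current = None  # test currently being read
    for raw in lines:
        line = raw.strip()
        if line.startswith("Test"):
            current = line
            groups.setdefault(current, [])  # create the list on first sight only
        elif line.startswith("Error") and current is not None:
            groups[current].append(line)    # errors before any test are ignored
    return groups


# Made-up input: Test1 reappears after Test2.
interleaved = ["Test1", "Error1", "Test2", "Error4", "Test1", "Error3"]
grouped = group_by_test(interleaved)

# Flatten back into output order: each test followed by its errors.
flat = [item for test, errors in grouped.items() for item in [test, *errors]]
print(flat)  # ['Test1', 'Error1', 'Error3', 'Test2', 'Error4']
```

Which behaviour is preferable depends on whether the output should keep the original line order or keep each test's errors together.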