I am trying to parse a huge file (about 50K lines) and remove every line that starts with a word from a predefined list.
I have tried the code below, but the output file (DB12_NEW) does not come out as desired:
rem = ['remove', 'remove1', 'remove2'....., 'removen']
inputFile = open(r"C:\file", "r")
outputFile = open(r"C:\file_12", "w")
lines = inputFile.readlines()
inputFile.close()
for line in lines:
    for i in rem:
        if line.startswith(i):
            outputFile.write('\n')
        else:
            outputFile.write(line)
I am getting the same file as output that I put in; the script is not removing the lines that start with any of the strings in the list.
Can you please help me understand how to achieve this?
Use a tuple instead of a list for str.startswith. It accepts a tuple of prefixes and returns True if the string starts with any of them.
# rem = ['remove', 'rem-ove', 'rem ove']
rem = ('remove', 'rem-ove', 'rem ove')
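with open('DB12', 'r') as inputFile, open('DB12_NEW', 'w') as outputFile:
    # Iterate over the file directly instead of loading every line with readlines()
    for line in inputFile:
        # startswith accepts a tuple and is True if any prefix matches
        if not line.startswith(rem):
            outputFile.write(line)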
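To try the tuple form without touching any files, here is a minimal sketch on an in-memory list. The helper name `drop_prefixed` and the sample lines are made up for illustration; they are not part of the original script.

```python
def drop_prefixed(lines, prefixes):
    """Return the lines that do not start with any of the given prefixes."""
    # str.startswith accepts a str or a tuple of str; a list raises TypeError
    prefixes = tuple(prefixes)
    return [line for line in lines if not line.startswith(prefixes)]


# Hypothetical sample data standing in for the contents of the input file
sample = ['remove me\n', 'keep me\n', 'rem-ove me too\n', 'also keep\n']
kept = drop_prefixed(sample, ['remove', 'rem-ove', 'rem ove'])
print(kept)  # ['keep me\n', 'also keep\n']
```

Converting with tuple() inside the helper means the caller can keep the prefixes in a list, as in the question.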
Currently you check whether the line starts with a word from the remove list one word at a time, and you write to the output file after every single check. For example:
If the line starts with "rem ABCDF..." and the loop is currently checking 'remove', the if-statement is false and the line is written to your output file, even if another entry in the list would have matched.
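To make the effect visible, here is a small sketch that runs the same nested loop on an in-memory list instead of files. The sample lines and the `written` list (which stands in for the calls to outputFile.write) are assumptions for illustration only.

```python
rem = ['remove', 'rem-ove', 'rem ove']
lines = ['remove this\n', 'keep this\n']  # hypothetical input lines

written = []  # records what the original loop would write, in order
for line in lines:
    for i in rem:
        if line.startswith(i):
            written.append('\n')
        else:
            # Runs once for every prefix that does NOT match,
            # so even a line that should be removed is written
            written.append(line)

print(written)
# ['\n', 'remove this\n', 'remove this\n',
#  'keep this\n', 'keep this\n', 'keep this\n']
```

The line that should be removed still ends up in the output, because only one of the three prefix checks matches it.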
You could try something like this:
remove = ['remove', 'rem-ove', 'rem', 'rem ove' ...... 'n']
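with open(r"C:\DB12", "r") as inputFile, open(r"C:\DB12_NEW", "w") as outputFile:
    # Iterate over the file object; file objects have no splitlines() method
    for line in inputFile:
        # Write the line only if it starts with none of the prefixes
        if not any(line.startswith(i) for i in remove):
            outputFile.write(line)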
The built-in any() function returns False only if every element is False (or the iterable is empty), so the line is written only when none of the prefixes match.
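As a quick illustration of how any() behaves here, the following sketch checks two made-up lines against a prefix list. The variable names are hypothetical.

```python
prefixes = ['remove', 'rem-ove', 'rem ove']

line = 'rem-ove this line'
checks = [line.startswith(p) for p in prefixes]  # one result per prefix
print(checks)       # [False, True, False]
print(any(checks))  # True, because at least one prefix matched

other = 'keep this line'
other_matches = any(other.startswith(p) for p in prefixes)
print(other_matches)  # False, because no prefix matched

print(any([]))  # False for an empty iterable
```

With a generator expression, as in the second check, any() stops at the first match instead of testing every prefix.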
Sometimes this is caused by leading or trailing whitespace.
Try stripping it off with strip() and check again.
rem = [x.strip() for x in rem]
lines = [line.strip() for line in lines]
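Note that strip() also removes the trailing newline, so the stripped lines would need '\n' added back when written out. One possible variation, sketched here with made-up sample lines, is to strip only for the comparison and keep the original line untouched:

```python
rem = ('remove', 'rem-ove', 'rem ove')

# Hypothetical lines, some with leading whitespace
lines = ['  remove this\n', '\tkeep this\n', 'rem ove that\n', 'keep that too\n']

# lstrip() is applied only for the prefix test; the original line is kept as-is
kept = [line for line in lines if not line.lstrip().startswith(rem)]
print(kept)  # ['\tkeep this\n', 'keep that too\n']
```

This assumes that indentation in front of a prefix should not prevent the line from being removed.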