Python - Remove all the lines starting with word/string present in a list

Question

I am trying to parse a large file (about 50K lines) and need to remove every line that starts with a word from a predefined list.

Here is what I have tried so far, but the output file (DB12_NEW) is not what I expect:

rem = ['remove', 'remove1', 'remove2'....., 'removen']

inputFile = open(r"C:\file", "r")
outputFile = open(r"C:\file_12", "w")
lines = inputFile.readlines()
inputFile.close()
for line in lines:
    for i in rem:
        if line.startswith(i):
            outputFile.write('\n')
        else:
            outputFile.write(line)

The output file is the same as the file I put in: the script is not removing the lines that start with any of the strings in the list.

Can you please help me understand how to achieve this?

Answer 1

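Use a tuple instead of a list: str.startswith accepts either a single string or a tuple of prefixes, and returns True if the line starts with any of them.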

# rem = ['remove', 'rem-ove', 'rem ove']
rem = ('remove', 'rem-ove', 'rem ove')

with open('DB12', 'r') as inputFile, open('DB12_NEW', 'w') as outputFile:
    for line in inputFile:  # iterate lazily instead of loading every line with readlines()
        if not line.startswith(rem):
            outputFile.write(line)  # line is a single string, so write() is enough
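
To illustrate the tuple requirement, here is a minimal sketch. The helper name starts_with_any and the sample lines are made up for this example; only str.startswith itself comes from the answer. Passing a list directly raises TypeError, so the helper converts its argument first.

```python
def starts_with_any(line, prefixes):
    """Return True if line starts with any of the given prefixes."""
    # str.startswith accepts a str or a tuple of str, not a list,
    # so convert whatever iterable was passed in.
    return line.startswith(tuple(prefixes))


rem = ['remove', 'rem-ove', 'rem ove']  # sample prefixes from the answer

print(starts_with_any('remove this line\n', rem))  # True
print(starts_with_any('keep this line\n', rem))    # False

try:
    'remove this line'.startswith(rem)  # passing the list directly
except TypeError as exc:
    print('TypeError:', exc)
```

Converting inside the helper means the caller can keep the prefixes in a list, as in the question.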

Answer 2

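Currently you check whether the line starts with a word from the remove list one word at a time, and you write to the output file inside that loop. For example:

If the line starts with "rem ABCDF..." and the loop checks whether it starts with 'remove', the if-statement is false and the line is written to your output file. The same happens for every other word that does not match, so a line is written once for each word it does not match, even if it does match one of the others.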
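
The effect is easier to see on a tiny input. The sketch below replays the question's nested loop but collects the writes in a list instead of a file. The function name buggy_filter and the sample data are hypothetical.

```python
def buggy_filter(lines, prefixes):
    """Replay the question's nested loop and collect what it would write."""
    written = []
    for line in lines:
        for prefix in prefixes:
            if line.startswith(prefix):
                written.append('\n')   # one blank line per matching prefix
            else:
                written.append(line)   # one copy per non-matching prefix
    return written


sample_lines = ['remove me\n', 'keep me\n']    # hypothetical input
sample_rem = ['remove', 'remove1', 'remove2']  # hypothetical prefixes

print(buggy_filter(sample_lines, sample_rem))
# ['\n', 'remove me\n', 'remove me\n', 'keep me\n', 'keep me\n', 'keep me\n']
```

The line that should be removed still appears twice, and the line that should be kept appears three times, because the decision is made per prefix rather than per line.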

You could try something like this:

remove = ['remove', 'rem-ove', 'rem', 'rem ove']  # ... and so on, up to 'n'
with open(r"C:\DB12", "r") as inputFile, open(r"C:\DB12_NEW", "w") as outputFile:
    for line in inputFile:  # a file object has no splitlines(); iterating it yields each line with its newline
        if not any(line.startswith(i) for i in remove):
            outputFile.write(line)

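any is a built-in function, not a keyword. It returns True as soon as one element is true, and returns False only if every element is false (or the iterable is empty).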
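
As a rough sketch of how any() fits into the whole filter, the example below uses in-memory io.StringIO objects in place of the real files so it can be run anywhere. The function name filter_lines and the sample text are assumptions made for illustration.

```python
import io


def filter_lines(src, dst, prefixes):
    """Copy lines from src to dst, skipping lines that start with any prefix."""
    for line in src:
        # any() stops at the first True, so later prefixes are not checked.
        if not any(line.startswith(p) for p in prefixes):
            dst.write(line)


print(any([]))             # False: no element is true
print(any([False, True]))  # True: at least one element is true

src = io.StringIO('remove me\nkeep me\nrem-ove me too\n')  # stands in for the input file
dst = io.StringIO()                                        # stands in for the output file
filter_lines(src, dst, ['remove', 'rem-ove'])
print(repr(dst.getvalue()))  # 'keep me\n'
```

Because the function only needs objects that can be iterated and written to, the same call should work with files opened via open().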

Answer 3

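Sometimes this is caused by leading or trailing whitespace in the lines or in the list entries.

Try stripping the whitespace with strip() and check again.

rem = [x.strip() for x in rem]
lines = [line.strip() for line in lines]  # also strips the trailing newline, so add '\n' back when writing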
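
One possible way to combine this with the prefix check, sketched under the assumption that only leading whitespace should be ignored: compare against line.lstrip() but keep the original line, so the newline survives in the output. The name drop_prefixed and the sample data are hypothetical.

```python
def drop_prefixed(lines, prefixes):
    """Return the lines that do not start with any prefix, ignoring leading whitespace."""
    # Clean the prefixes and skip empty ones: an empty prefix matches every line.
    cleaned = tuple(p.strip() for p in prefixes if p.strip())
    kept = []
    for line in lines:
        # Compare against the left-stripped text, but keep the original line
        # so its indentation and trailing newline are preserved.
        if not line.lstrip().startswith(cleaned):
            kept.append(line)
    return kept


sample = ['  remove me\n', 'keep me\n', '\tremove1 x\n']  # hypothetical input
print(drop_prefixed(sample, [' remove ', 'remove1']))
# ['keep me\n']
```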

3 answers

solution1
1 2021-07-08 09:38:29

solution2
0 ACCEPTED 2021-07-08 09:24:55

solution3
0 2021-07-08 09:48:54
