
Python - Remove all the lines starting with word/string present in a list

I am trying to parse a huge 50K-line file, in which I have to remove every line that starts with a word present in a predefined list.

I have tried the code below, but the output file (DB12_NEW) is not what I expected:

rem = ['remove', 'remove1', 'remove2'....., 'removen']

inputFile = open(r"C:\file", "r")
outputFile = open(r"C:\file_12", "w")
lines = inputFile.readlines()
inputFile.close()
for line in lines:
    for i in rem:
        if line.startswith(i):
            outputFile.write('\n')
        else:
            outputFile.write(line)

The output file is the same as the input file: the script does not remove the lines that start with any of the strings in the list.

Can you please help me understand how to achieve this?

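Use a tuple instead of a list: str.startswith accepts either a single string or a tuple of strings (passing a list raises TypeError), and with a tuple it returns True if the line starts with any of them.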

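# rem = ['remove', 'rem-ove', 'rem ove']  # a list would raise TypeError in startswith
rem = ('remove', 'rem-ove', 'rem ove')

with open('DB12', 'r') as inputFile, open('DB12_NEW', 'w') as outputFile:
    # Iterate over the file lazily instead of loading all lines with readlines()
    for line in inputFile:
        # startswith(tuple) is True if the line begins with any of the prefixes
        if not line.startswith(rem):
            outputFile.write(line)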
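As a sketch of the same idea that can be tested without real files, the hypothetical helper filter_lines below accepts any iterable of lines (an open file works, as does io.StringIO) and converts the prefix list to a tuple before calling str.startswith. The sample lines are made up for illustration.

```python
import io
from typing import Iterable, Iterator, Sequence


def filter_lines(lines: Iterable[str], prefixes: Sequence[str]) -> Iterator[str]:
    """Yield only the lines that do not start with any of the prefixes.

    filter_lines is a hypothetical helper, not part of any library.
    """
    prefix_tuple = tuple(prefixes)  # startswith accepts a str or a tuple of str
    for line in lines:
        if not line.startswith(prefix_tuple):
            yield line


rem = ['remove', 'rem-ove', 'rem ove']  # a list, as in the question

# io.StringIO stands in for the real input and output files
source = io.StringIO("keep me\nremove this\nrem-ove this too\nalso keep\n")
target = io.StringIO()
target.writelines(filter_lines(source, rem))
print(target.getvalue(), end="")

# Passing the list itself to startswith fails
try:
    "remove this".startswith(rem)
except TypeError as exc:
    print("TypeError:", exc)
```

With real files the call would be outputFile.writelines(filter_lines(inputFile, rem)) inside the same with statement as above; because filter_lines is a generator, the 50K-line file is streamed line by line rather than read into memory at once.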

Currently you check whether the line starts with each word from the remove list one at a time, and you write something to the output on every one of those checks. For example:

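If the line starts with "rem ABCDF..." and your loop checks whether it starts with 'remove', the if-condition is False, so the else branch writes the line to your output file, even if another word in the list would have matched. In effect, each line is written once for every prefix it does not match.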
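To make this concrete, the sketch below replays the question's nested loop on a small in-memory list and collects what it would write instead of writing to a file. The function name buggy_filter and the sample lines are made up for illustration.

```python
def buggy_filter(lines, rem):
    """Replay of the question's loop; returns the chunks it would write."""
    written = []
    for line in lines:
        for i in rem:
            if line.startswith(i):
                written.append('\n')   # writes a blank line, does not skip
            else:
                written.append(line)   # runs for every prefix that does not match
    return written


rem = ['remove', 'remove1', 'remove2']
lines = ['keep\n', 'remove1 this\n']
for chunk in buggy_filter(lines, rem):
    print(repr(chunk))
```

Here 'keep\n' is written three times, and 'remove1 this\n' still ends up in the output because it does not start with 'remove2'.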

You could try something like this:

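remove = ['remove', 'rem-ove', 'rem', 'rem ove']  # ...... add the rest of your words
with open(r"C:\DB12", "r") as inputFile, open(r"C:\DB12_NEW", "w") as outputFile:
    # File objects have no splitlines(); iterate over the file directly
    for line in inputFile:
        # any() is True as soon as one prefix matches, so matching lines are skipped
        if not any(line.startswith(i) for i in remove):
            outputFile.write(line)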

The built-in any() function returns True as soon as one element is True, and returns False only if every element is False (or the iterable is empty).
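A small check of that behaviour: for prefix tests, any() over a generator gives the same answer as str.startswith with a tuple. The prefixes and sample lines are illustrative only; note that startswith is case-sensitive.

```python
remove = ['remove', 'rem-ove', 'rem ove']
samples = ['remove me\n', 'rem ove me\n', 'keep me\n', 'REMOVE is uppercase\n']

for line in samples:
    via_any = any(line.startswith(p) for p in remove)
    via_tuple = line.startswith(tuple(remove))
    # Both express "the line starts with at least one of the prefixes"
    assert via_any == via_tuple
    print(repr(line), via_any)

# any() of an empty iterable is False, so an empty remove list keeps every line
print(any([]))
```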

Sometimes the problem is caused by leading or trailing whitespace, either in the lines or in the words of the list.

Try stripping the whitespace with strip() and check again.

rem = [x.strip() for x in rem]            # remove stray spaces around each word
lines = [line.strip() for line in lines]  # remove leading/trailing whitespace from each line
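One caveat with stripping the lines themselves: strip() also removes the trailing newline, so the lines would need '\n' added back when they are written. A sketch that strips only for the comparison and writes the original line unchanged (the sample data is hypothetical, and io.StringIO stands in for the files):

```python
import io

rem = ['  remove ', 'remove1']        # stray spaces, e.g. from a copied list
rem = tuple(x.strip() for x in rem)   # clean words, as a tuple for startswith

source = io.StringIO("keep\n   remove me\nremove1 too\n\tkeep this\n")
target = io.StringIO()
for line in source:
    # lstrip() is used only for the test; the original line (with its
    # indentation and newline) is what gets written
    if not line.lstrip().startswith(rem):
        target.write(line)

print(target.getvalue(), end="")
```

Using lstrip() rather than strip() is enough here, since only leading whitespace can stop startswith from matching.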

The technical posts on this site are published under the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM