Using Python to Remove All Lines Matching Regex

I'm attempting to remove all lines where my regex matches (the regex simply looks for any line that has yahoo in it). Each match is on its own line, so there's no need for the multiline option.

This is what I have so far...

import re
inputfile = open('C:\\temp\\Scripts\\remove.txt','w',encoding="utf8")

inputfile.write(re.sub("\[(.*?)yahoo(.*?)\n","",inputfile))

inputfile.close()

I'm receiving the following error:

Traceback (most recent call last):
  line 170, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer
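The traceback is consistent with re.sub receiving the file object itself instead of a string. A minimal reproduction, using io.StringIO as a stand-in for the file returned by open() (the sample lines are made up):

```python
import io
import re

# io.StringIO stands in for the file object returned by open()
fake_file = io.StringIO("[1] mail.yahoo.com\n[2] example.org\n")

try:
    # Passing the file object, as in the question, raises TypeError
    re.sub(r"\[(.*?)yahoo(.*?)\n", "", fake_file)
    error_raised = False
except TypeError:
    error_raised = True

# Passing the file's contents (a str) works
cleaned = re.sub(r"\[(.*?)yahoo(.*?)\n", "", fake_file.getvalue())
print(error_raised)      # True
print(cleaned, end="")   # [2] example.org
```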

Use the fileinput module if you want to modify the original file in place:

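import re
import fileinput

# Python 2: with inplace=True, stdout is redirected into the file being read,
# so only the lines printed here end up in the file
for line in fileinput.input(r'C:\temp\Scripts\remove.txt', inplace=True):
    if not re.search(r'\byahoo\b', line):
        # trailing comma suppresses print's newline (line already ends with one)
        print line,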
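Note that \b turns this into a whole-word match, which is stricter than "any line that has yahoo in it". A small sketch, on made-up sample lines, of which lines the \byahoo\b filter keeps; drop the \b anchors (or pass re.IGNORECASE) if that is not the behaviour you want:

```python
import re

# Same test as the answer above: keep a line only if \byahoo\b does NOT match it
word_yahoo = re.compile(r"\byahoo\b")

lines = [
    "user@yahoo.com\n",     # "@" and "." are word boundaries -> matched, dropped
    "visit yahoo today\n",  # whole word -> matched, dropped
    "myyahoo account\n",    # no boundary before "yahoo" -> kept
    "Yahoo News\n",         # different case, no re.IGNORECASE -> kept
    "plain line\n",         # no match -> kept
]

kept = [line for line in lines if not word_yahoo.search(line)]
print("".join(kept), end="")
```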

Here's a Python 3 variant of @Ashwini Chaudhary's answer that removes all lines matching a regex pattern from a given file:

#!/usr/bin/env python3
"""Usage: remove-pattern <pattern> <file>"""
import fileinput
import re
import sys

def main():
    pattern, filename = sys.argv[1:] # get pattern, filename from command-line
    matched = re.compile(pattern).search
    with fileinput.FileInput(filename, inplace=1, backup='.bak') as file:
        for line in file:
            if not matched(line): # save lines that do not match
                print(line, end='') # this goes to filename due to inplace=1

main()
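To see what inplace=1 and backup='.bak' do without touching a real file, here is a sketch that wraps the same loop in a function (remove_pattern is a hypothetical name used only for illustration) and runs it on a temporary file. The original contents survive in a file with the same name plus ".bak":

```python
import fileinput
import os
import re
import tempfile

def remove_pattern(pattern, filename):
    """Same loop as main() above, with the arguments passed in directly."""
    matched = re.compile(pattern).search
    with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
        for line in file:
            if not matched(line):    # keep lines that do not match
                print(line, end='')  # stdout is redirected into filename

# Demo on a throwaway file in a temporary directory
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "remove.txt")
    with open(path, "w") as f:
        f.write("keep me\nmail.yahoo.com\nkeep me too\n")

    remove_pattern("yahoo", path)

    with open(path) as f:
        result = f.read()
    with open(path + ".bak") as f:   # backup kept because backup= was given
        backup = f.read()

print(result, end="")
```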

This fileinput-based version assumes that locale.getpreferredencoding(False) matches the input file's encoding; otherwise it might break on non-ASCII characters.
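What "break" looks like depends on the locale. A sketch that simulates two mismatched locale encodings (ASCII and cp1252, picked only as examples) decoding the same UTF-8 bytes: one raises an error, the other silently garbles the text:

```python
import locale

# Typically the encoding open() falls back to when no encoding= is given
print(locale.getpreferredencoding(False))  # system-dependent, e.g. 'UTF-8' or 'cp1252'

# The bytes a UTF-8 file containing "café" holds on disk
data = "café\n".encode("utf-8")

# An ASCII locale encoding cannot decode the é at all
try:
    data.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# A cp1252 locale encoding decodes without error, but into the wrong characters
mojibake = data.decode("cp1252")
print(ascii_failed, repr(mojibake))  # True 'cafÃ©\n'
```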

To make the script work regardless of the current locale, or for input files in a different encoding:

#!/usr/bin/env python3
import os
import re
import sys
from tempfile import NamedTemporaryFile

def main():
    encoding = 'utf-8'
    pattern, filename = sys.argv[1:]
    matched = re.compile(pattern).search
    with open(filename, encoding=encoding) as input_file:
        with NamedTemporaryFile(mode='w', encoding=encoding,
                                dir=os.path.dirname(filename),
                                delete=False) as outfile:
            for line in input_file:
                if not matched(line):
                    print(line, end='', file=outfile)
    os.replace(outfile.name, input_file.name)

main()
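Because main() reads its arguments from sys.argv, it is awkward to try out directly. A sketch of the same temp-file-then-os.replace() approach as a function (remove_matching_lines is a hypothetical name, not part of any library), exercised on a throwaway UTF-8 file with non-ASCII lines:

```python
import os
import re
import tempfile

def remove_matching_lines(filename, pattern, encoding='utf-8'):
    """Same approach as main() above: copy non-matching lines to a temporary
    file in the same directory, then move it over the original."""
    matched = re.compile(pattern).search
    with open(filename, encoding=encoding) as input_file:
        with tempfile.NamedTemporaryFile(mode='w', encoding=encoding,
                                         dir=os.path.dirname(filename),
                                         delete=False) as outfile:
            for line in input_file:
                if not matched(line):
                    outfile.write(line)
    # both files are closed here, so the rename also works on Windows
    os.replace(outfile.name, filename)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "remove.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("café\nmail.yahoo.com\nnaïve\n")
    remove_matching_lines(path, "yahoo")
    with open(path, encoding="utf-8") as f:
        result = f.read()

print(result, end="")
```

Creating the temporary file in the same directory keeps it on the same filesystem, and on POSIX os.replace() renames atomically, so the file is never left half-written. One side effect: the temporary file is created with owner-only permissions, and on POSIX the replaced file keeps those rather than the original's.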

You have to read the file first. Try something like:

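import re

path = 'C:\\temp\\Scripts\\remove.txt'

# Read the whole file first; opening it with 'w' would truncate it immediately
with open(path, encoding="utf8") as inputfile:
    text = inputfile.read()

# re.sub needs a string, not a file object; write the result back
with open(path, 'w', encoding="utf8") as outputfile:
    outputfile.write(re.sub(r"\[(.*?)yahoo(.*?)\n", "", text))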
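The pattern carried over from the question only starts matching at a literal "[", so any text before the bracket survives and lines without a bracket are never touched. If the goal is to drop every line containing yahoo, an anchored pattern is one alternative; once the whole file is passed to re.sub as a single string, ^ needs re.MULTILINE to match at the start of each line. A comparison on made-up sample text:

```python
import re

text = "[1] mail.yahoo.com\nfoo [2] yahoo\nplain line\nyahoo without brackets\n"

# Pattern from the question: starts at a literal "[", so "foo " is left behind
# (and glued onto the next line), and the bracket-less yahoo line is kept
bracket = re.sub(r"\[(.*?)yahoo(.*?)\n", "", text)

# Whole-line alternative: with re.MULTILINE, ^ matches at the start of each line;
# .* never crosses a newline, and \n? also handles a last line without one
whole_line = re.sub(r"^.*yahoo.*\n?", "", text, flags=re.MULTILINE)

print(repr(bracket))     # 'foo plain line\nyahoo without brackets\n'
print(repr(whole_line))  # 'plain line\n'
```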
