Search for a string and delete the line containing it along with the line below

Question

I have a file that contains:

### 174.10.150.10 on 2018-06-20 12:19:47.533613 ###
IP : 174.10.150.10 : 

IP : ALL :

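I currently have code that searches for the date/time string using a regular expression. How do I delete the line that contains the string that was found? I want to delete that line and also the line underneath it.

So both of these lines would be deleted: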

### 174.10.150.10 on 2018-06-20 12:19:47.533613 ###
IP : 174.10.150.10 :

My code currently just appends "None" to the bottom of the text file.

import re

def run():  
    try:
        with open('file.txt', 'r') as f:
            with open('file.txt', 'a') as f2:
                reg = re.compile('###\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}.+(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{0,})\s###')
                for line in f:
                    m = reg.match(line)
                answer = raw_input("Delete line? ")
                if answer == "y":

                    # delete line that contains "###" and line underneath
                    f2.write(str(m))

                else:
                    print("You chose no.")
    except OSError as e:
        print (e)

run()

Answer 1

After some basic refactoring, here is the result...

import re
valid_lines = []

def run():  
    try:
        with open('file.txt', 'r') as f:
            # Raw string so the backslashes reach the regex engine unchanged.
            reg = re.compile(r'###\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}.+(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{0,})\s###\s?')
            lines = f.readlines()
            invalid_index = -10

            for a in range(len(lines)):
                reg_result = reg.match(lines[a])

                if invalid_index == (a - 1):
                    # Skip the line underneath the invalid line
                    continue

                if reg_result is not None:
                    # The line matches the regexp: ask before deleting.
                    answer = raw_input("Delete line? ")

                    if answer.lower() == 'y':
                        # Remember the index so the line underneath is skipped too.
                        invalid_index = a
                    else:
                        print("You chose no.")
                        valid_lines.append(lines[a])
                else:
                    valid_lines.append(lines[a])

        with open('file.txt', 'w') as f:
            # Override the file...
            f.writelines(valid_lines)

    except OSError as e:
        print (e)

run()

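If you want to delete any line that starts with ###, then perhaps you should consider this as the regex: ###.*

Edit: In your regex you should add a \s? at the end to optionally match the \n, since the file contains newlines. Also, use fullmatch() instead of match() (fullmatch() is available in Python 3.4 and later).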
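To illustrate the difference between the two calls, here is a minimal sketch for Python 3.4 or later. The sample line is taken from the question; the variable names are only for illustration, and the pattern is the one above written as a raw string.

```python
import re

# Same pattern as above, written as raw strings.
base = (r'###\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
        r'.+(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{0,})\s###')

# A line as it comes out of the file, trailing newline included.
line = '### 174.10.150.10 on 2018-06-20 12:19:47.533613 ###\n'

without_ws = re.compile(base)
with_ws = re.compile(base + r'\s?')

# match() only anchors at the start, so the trailing newline does not matter.
matches_prefix = without_ws.match(line) is not None

# fullmatch() must consume the whole string, so the newline has to be matched.
full_without_ws = without_ws.fullmatch(line) is not None
full_with_ws = with_ws.fullmatch(line) is not None

print(matches_prefix, full_without_ws, full_with_ws)  # True False True
```

In other words, the trailing \s? only matters once you switch to fullmatch(); with match() the pattern already succeeds on a line that ends in a newline.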

Answer 2

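(Edit: I now understand from your comment that there is a blank line after the two data lines, so when you delete a line you also want to delete the next two lines. My code has been adjusted accordingly.)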

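Here is some code that makes various changes to yours. For safety, I write a new file rather than overwriting the old one, and I avoid holding the entire file in memory at once. For readability, I combined the with lines into one. Similarly, I split the regex string to allow shorter lines of code. To avoid keeping several lines in memory at once, I use a countdown variable, skipline, to record whether a line should be left out of the new file. I also show the matching line before asking whether to delete it. Note that lines without a date and time are copied, because the regexp match variable is None for them. Finally, I changed raw_input to input so that this code runs in Python 3. For Python 2, change it back to raw_input.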

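By the way, the reason your code just adds 'None' at the end of the file is that you put the write line outside the main loop over the file's lines. You therefore write only the regex match object for the last line of the file. Since the last line of your file has no date and time, the regex does not match, and the string representation of a failed match is 'None'. In your second with statement you opened file.txt in append mode, so 'None' is appended to the file.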
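The effect can be reproduced without touching a file. This is a small sketch in which the list of lines stands in for file.txt and the pattern is a simplified stand-in for the one in the question (both are assumptions made for illustration):

```python
import re

# Simplified stand-in for the pattern in the question.
reg = re.compile(r'###\s.+\s###')

# Stand-in for the contents of file.txt.
lines = [
    '### 174.10.150.10 on 2018-06-20 12:19:47.533613 ###\n',
    'IP : 174.10.150.10 :\n',
    '\n',
    'IP : ALL :\n',
]

m = None
for line in lines:
    m = reg.match(line)  # m is overwritten on every iteration

# After the loop, m only reflects the last line, which does not match.
written = str(m)
print(repr(written))  # 'None'
```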

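I want to emphasize that you should create a new file. If you really do want to overwrite the old file, the safe approach is to first create a new file with a slightly different name. Then, if that file is written successfully, replace the old file with the new one and keep a copy of the old one renamed to file.bak. This accounts for possible OS errors, as your code attempts to do. Without something like this, an error could end up deleting or corrupting your file entirely. I leave that part of the code to you.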
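One possible shape for that step is sketched below, assuming Python 3.3 or later for os.replace(). The function name replace_safely and the .tmp and .bak suffixes are illustrative choices, not part of the original code, so the backup here ends up as file.txt.bak rather than file.bak.

```python
import os
import shutil

def replace_safely(path, new_lines):
    """Write new_lines to a temporary file, back up the original, then swap."""
    tmp_path = path + '.tmp'   # hypothetical naming scheme
    bak_path = path + '.bak'   # hypothetical naming scheme

    # 1. Write the new content to a separate file first.
    with open(tmp_path, 'w') as tmp:
        tmp.writelines(new_lines)

    # 2. Keep a copy of the original before touching it.
    shutil.copy2(path, bak_path)

    # 3. Replace the original with the new file in a single step.
    os.replace(tmp_path, path)
```

If writing the temporary file fails, the exception is raised before the original is touched, so file.txt stays as it was.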

import re

def run():  
    try:
        with open('file.txt', 'r') as f, open('file.tmp', 'w') as f2:
            # Raw strings keep the backslashes intact and avoid
            # invalid-escape warnings in Python 3.
            reg = re.compile(r'###\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
                             r'.+(\d{4}-\d{2}-\d{2}\s\d{2}'
                             r':\d{2}:\d{2}.\d{0,})\s###')
            skipline = 0  # do not skip lines
            for line in f:
                if skipline:
                    skipline -= 1
                    continue  # Don't write or process this line
                m = reg.match(line)
                if m:
                    answer = input("Delete line {} ? ".format(m.group()))
                    if answer == "y":
                        skipline = 2 # leave out this and next 2 lines
                    else:
                        print("You chose no.")
                if not skipline:
                    f2.write(line)
    except OSError as e:
        print(e)

run()

Answer 3

I refactored the filtering part into a function called filter_lines and moved the regex to a module-level variable. This approach makes use of iterators.

import re

regex = re.compile(r'###\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}.+(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{0,})\s###')

def filter_lines(lines):
    it = iter(lines)

    try:
        while True:
            line = next(it)
            m = regex.match(line)

            if m:
                # You may add the question-answer code here to ask the user whether to delete the matched line.
                next(it)  # Consume the line following the matched "###" line
                continue

            yield line
    except StopIteration:
        # Under PEP 479, a StopIteration that escapes a generator function is converted to RuntimeError, so it has to be caught here.
        # https://www.python.org/dev/peps/pep-0479/
        pass

def run():  
    try:
        with open('file.txt', 'r') as f:
            filtered_lines = list(filter_lines(f.readlines()))
        print(*filtered_lines, sep='')
        # You may use the following lines to actually write the result to the file
        # with open('file.txt', 'w') as f2:
        #     f2.writelines(filtered_lines)
    except OSError as e:
        print(e)

run()

This program should print the resulting content.
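If you want the prompt back, one option is to pass the decision in as a function. The following is only a sketch under that assumption: the confirm parameter is hypothetical and not part of the code above, and the sample list stands in for the file contents. It uses next(it, None) so that a matched line at the very end of the input does not raise StopIteration.

```python
import re

regex = re.compile(r'###\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}.+'
                   r'(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{0,})\s###')

def filter_lines(lines, confirm):
    """Yield lines, dropping each matched line and the one below it
    whenever confirm(matched_line) returns True."""
    it = iter(lines)
    for line in it:
        if regex.match(line) and confirm(line):
            next(it, None)  # consume the line below; the default avoids StopIteration
            continue
        yield line

# Stand-in for the contents of file.txt.
sample = [
    '### 174.10.150.10 on 2018-06-20 12:19:47.533613 ###\n',
    'IP : 174.10.150.10 :\n',
    '\n',
    'IP : ALL :\n',
]

# Non-interactive stand-in that always answers "yes".
kept = list(filter_lines(sample, confirm=lambda header: True))
print(kept)  # ['\n', 'IP : ALL :\n']
```

For an interactive run, confirm could be something like lambda header: input("Delete line {} ? ".format(header.strip())) == "y".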
