How to use pattern matching in Python to split lines of text and store them in separate text files

Question

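Below is a sample. It is a very long log, but I have pasted only a fragment of it. I need to extract the lines between the pattern ---------------------------------- and store each piece of information in its own separate text file.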

For example:
------------------
info1 
------------------
info2
------------------
info3
------------------

Output:

fetch info1 and store it into file1.txt
fetch info2 and store it into file2.txt
fetch info3 and store it into file3.txt
And so on...

++++++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++++++++++++++

**This is the text data:**
------------------------------------------------------------------------

revision88 106 | rohit | 2018-06-08 13:41:46 +0530 (Fri, 08 Jun 2018) | 1 line

initial code import from FinanavialAnalytics branch


------------------------------------------------------------------------
revision88 99 | dhammdip.sawate | 2018-06-04 20:59:48 +0530 (Mon, 04 Jun 2018) | 1 line

Added Little Bit Java Support.!

Index: resources.properties
===================================================================
--- resources.properties    (revision 98)
+++ resources.properties    (revision 99)
@@ -1,15 +1,15 @@
 ####################Elastsic Search#########################
 ElasticClusterName=UProbe
-ElasticHost=192.168.0.91
+ElasticHost=192.168.0.73
 ElasticPort=19300
 
-esSQLURL=http://192.168.0.91:19200/_sql?sql=
+esSQLURL=http://192.168.0.73:19200/_sql?sql=
 resultsize =1024

@@ -72,45 +72,65 @@
 secfile /home/sandeep/Desktop/LIC/Uprobe-LIC/Uprobe-Dev.seed
 licfile /home/sandeep/Desktop/LIC/Uprobe-LIC/Uprobe-Dev.lic
 
------------------------------------------------------------------------
revision88 | sandeep.yadav | 2018-05-31 15:31:26 +0530 (Thu, 31 May 2018) | 1 line

Acc_Ref Data front-end side functionality with validation done.

------------------------------------------------------------------------

Answer 1

Try this:

with open("log.txt") as lg:
    fl = open("temp.txt", 'w')  # collects any text before the first separator
    cnt = 0

    for i in lg:
        # rstrip so a separator on the last line (with no trailing "\n") still matches
        if i.rstrip("\n") == "------------------------------------------------------------------------":
            fl.close()
            cnt += 1
            fl = open("file{}.txt".format(cnt), 'w')
        else:
            fl.write(i)

    fl.close()

This can be done without regular expressions at all.
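The code above relies on each separator being exactly 72 dashes. Here is a minimal sketch of the same no-regex idea. It assumes that any line consisting only of dashes and at least 20 characters long is a separator. The threshold and the helper names `is_separator`, `iter_chunks` and `split_log_file` are hypothetical. The sketch also skips blank chunks, so no empty files are created.

```python
from typing import Iterable, Iterator, List


def is_separator(line: str, min_len: int = 20) -> bool:
    """Return True if the line, minus its line ending, is only dashes and at least
    min_len characters long (min_len=20 is an assumed threshold)."""
    stripped = line.rstrip("\r\n")
    return len(stripped) >= min_len and set(stripped) == {"-"}


def iter_chunks(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the lines found between separator lines, skipping blank chunks."""
    chunk: List[str] = []
    for line in lines:
        if is_separator(line):
            if any(ln.strip() for ln in chunk):
                yield chunk
            chunk = []
        else:
            chunk.append(line)
    if any(ln.strip() for ln in chunk):  # text after the last separator, if any
        yield chunk


def split_log_file(path: str, prefix: str = "file") -> int:
    """Write chunk n to <prefix><n>.txt (n starts at 1); return how many were written."""
    count = 0
    with open(path) as lg:
        for count, chunk in enumerate(iter_chunks(lg), start=1):
            with open("{}{}.txt".format(prefix, count), "w") as out:
                out.writelines(chunk)
    return count
```

For example, `split_log_file("log.txt")` would write `file1.txt`, `file2.txt`, ... for the sample log. Because `iter_chunks` is a generator, only one log entry is held in memory at a time.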

Answer 2

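I am assuming that the main text file is named "text.txt" and sits in the same directory, and that you want the new files saved in that same directory. Change the file paths as needed. This should work for you: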

with open('./text.txt', 'r') as content:
    # split on the separator and drop chunks that are empty or whitespace-only
    paragraphs = [p for p in content.read().split('------------------------------------------------------------------------') if p.strip()]
    for index, para in enumerate(paragraphs):
        filepath = './new_file' + str(index) + '.txt'
        with open(filepath, 'w') as file:
            file.write(para)
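Because the paths depend on your setup, one option is to pass them in as arguments. This is a sketch using `pathlib`. It assumes the separator is the 72-dash line that `svn log` prints between entries. `split_to_files` and its default prefix are illustrative names and are not part of the answer.

```python
from pathlib import Path
from typing import List

# Assumption: the separator is the 72-dash line that `svn log` prints between entries.
SEPARATOR = "-" * 72


def split_to_files(src: Path, out_dir: Path, prefix: str = "new_file") -> List[Path]:
    """Split src on SEPARATOR and write each non-blank chunk to out_dir/<prefix><n>.txt."""
    out_dir.mkdir(parents=True, exist_ok=True)  # create the output folder if needed
    chunks = [c for c in src.read_text().split(SEPARATOR) if c.strip()]
    written: List[Path] = []
    for index, chunk in enumerate(chunks):
        target = out_dir / "{}{}.txt".format(prefix, index)
        target.write_text(chunk)
        written.append(target)
    return written
```

Calling `split_to_files(Path("text.txt"), Path("out"))` would create `out/new_file0.txt`, `out/new_file1.txt`, and so on. Here `out` is just an example directory name.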

Answer 3

If the log file isn't too large (e.g. nowhere near 1 GB, since this reads the whole file into memory), you can do this:

with open('log.log') as f:
    content = f.read()
content = content.split('------------------------------------------------------------------------')
for idx, info in enumerate(content):
    with open('info{}.txt'.format(idx + 1), 'w') as f:
        f.write(info)

Answer 4

I assumed that "========..." is also one of the patterns, so I used the re module. That way you can add more patterns if needed [the original version had re.compile("-+|=+")].

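import re

with open("file.txt", "r") as input_file:
    text = input_file.read()
    regex = re.compile("-+")  # originally "-+|=+" to also treat "=====" lines as separators
    mo = regex.findall(text)

lines = text.split("\n")
# keep only dash runs longer than 5 characters, so short runs
# (hyphens in dates, "-"/"---" diff markers) are not treated as separators
mo_wanted_patterns = {pattern for pattern in mo if len(pattern) > 5}
print(mo_wanted_patterns)

output_text = []
for index, line in enumerate(lines):
    # a line that is exactly one of those long dash runs is a separator
    if line in mo_wanted_patterns:
        # the file is named after the separator's line index, not a running count
        filepath = 'new_file' + str(index) + '.txt'

        with open(filepath, 'w') as file:
            file.write("\n".join(output_text))

        output_text = []

    else:
        output_text.append(line)
# note: lines after the last separator are never written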

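Edit: I noticed that my code is much more complicated than what the others provided. Using regular expressions makes things more complex, but I'd like to know whether it works for you.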
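The regex approach can be much shorter if the pattern matches whole lines. This sketch uses `re.split` with `re.MULTILINE` and assumes a separator is a line of six or more dashes, which mirrors the `len(pattern) > 5` filter above. `split_by_pattern` and `write_chunks` are hypothetical helpers. Unlike the answer, these files are numbered 1, 2, 3, ... rather than by line index.

```python
import re
from typing import List

# Assumption: a separator is a whole line of 6+ dashes (optionally followed by spaces).
# Use r"^(?:-{6,}|={6,})[ \t]*$" to also split on "=====" lines.
SEPARATOR_RE = re.compile(r"^(?:-{6,})[ \t]*$", re.MULTILINE)


def split_by_pattern(text: str) -> List[str]:
    """Split text at separator lines; return the non-blank chunks without
    their surrounding newlines."""
    parts = SEPARATOR_RE.split(text)
    return [part.strip("\n") for part in parts if part.strip()]


def write_chunks(chunks: List[str], prefix: str = "new_file") -> None:
    """Write chunk n (counting from 1) to <prefix><n>.txt."""
    for n, chunk in enumerate(chunks, start=1):
        with open("{}{}.txt".format(prefix, n), "w") as f:
            f.write(chunk + "\n")
```

Because the pattern is anchored to whole lines, diff headers such as `--- resources.properties` and hyphens inside dates are not split on. This means the findall-then-filter step is no longer needed.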

4 solutions

Solution 1: score 1, accepted, 2021-05-10 07:24:56

Solution 2: score 0, 2021-05-10 07:43:07

Solution 3: score 0, 2021-05-10 07:48:11

Solution 4: score 0, 2021-05-10 08:07:37
