Using Python, how can I split lines of text on a separator pattern and store each section in a different text file?

Below is an example of the data. It's a long log, but I have pasted just a snippet of it. I need to extract the lines that come between the pattern ---------------------------------- and store each block of information in its own text file.

Like:
------------------
info1 
------------------
info2
------------------
info3
------------------

Output:

fetch info1 and store it into file1.txt
fetch info2 and store it into file2.txt
fetch info3 and store it into file3.txt
And so on...

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

**This is the text data:**
------------------------------------------------------------------------

revision88 106 | rohit | 2018-06-08 13:41:46 +0530 (Fri, 08 Jun 2018) | 1 line

initial code import from FinanavialAnalytics branch


------------------------------------------------------------------------
revision88 99 | dhammdip.sawate | 2018-06-04 20:59:48 +0530 (Mon, 04 Jun 2018) | 1 line

Added Little Bit Java Support.!

Index: resources.properties
===================================================================
--- resources.properties    (revision 98)
+++ resources.properties    (revision 99)
@@ -1,15 +1,15 @@
 ####################Elastsic Search#########################
 ElasticClusterName=UProbe
-ElasticHost=192.168.0.91
+ElasticHost=192.168.0.73
 ElasticPort=19300
 
-esSQLURL=http://192.168.0.91:19200/_sql?sql=
+esSQLURL=http://192.168.0.73:19200/_sql?sql=
 resultsize =1024

@@ -72,45 +72,65 @@
 secfile /home/sandeep/Desktop/LIC/Uprobe-LIC/Uprobe-Dev.seed
 licfile /home/sandeep/Desktop/LIC/Uprobe-LIC/Uprobe-Dev.lic
 
------------------------------------------------------------------------
revision88 | sandeep.yadav | 2018-05-31 15:31:26 +0530 (Thu, 31 May 2018) | 1 line

Acc_Ref Data front-end side functionality with validation done.

------------------------------------------------------------------------

Try this:

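cnt = 0

# "temp.txt" catches anything that appears before the first separator.
fl = open("temp.txt", 'w')
with open("log.txt") as lg:
    for line in lg:
        if line == "------------------------------------------------------------------------\n":
            # A separator ends the current section: start the next numbered file.
            fl.close()
            cnt += 1
            fl = open("file{}.txt".format(cnt), 'w')
        else:
            fl.write(line)
fl.close()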
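
Note that the loop above opens a new file at every separator, so a log that ends with a separator leaves an empty last file. A possible variant, only a sketch: it buffers one section at a time and writes it out only if it contains something other than blank lines. The function name `split_log_streaming`, the `out_pattern` argument and the `min_dashes` threshold are my own assumptions, not anything from the question.

```python
def split_log_streaming(log_path, out_pattern="file{}.txt", min_dashes=6):
    """Write each non-empty section of log_path to its own numbered file.

    A separator is assumed to be a line made only of '-' characters,
    at least min_dashes long (hypothetical threshold).
    """
    written = []   # paths of the files created so far
    section = []   # lines of the section currently being collected

    def flush():
        # Only write sections that contain something besides blank lines.
        if any(line.strip() for line in section):
            path = out_pattern.format(len(written) + 1)
            with open(path, "w") as out:
                out.writelines(section)
            written.append(path)
        section.clear()

    with open(log_path) as log:
        for line in log:
            stripped = line.strip()
            if len(stripped) >= min_dashes and set(stripped) == {"-"}:
                flush()
            else:
                section.append(line)
    flush()  # keep a trailing section that has no closing separator
    return written
```

Diff lines such as "--- resources.properties" are not treated as separators, because they contain characters other than dashes.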

This can be done without even using regex.

I have assumed that the main text file is in the same directory, named 'text.txt', and that you want to save the output files in that directory too. Please change the file paths according to your needs. This should work for you:

separator = '------------------------------------------------------------------------'

with open('./text.txt', 'r') as content:
    # Split on the separator and drop chunks that contain only whitespace.
    paragraphs = [para for para in content.read().split(separator) if para.strip()]

for index, para in enumerate(paragraphs):
    filepath = './new_file' + str(index) + '.txt'
    with open(filepath, 'w') as out_file:
        out_file.write(para)

If the log file is small enough to read into memory in one go (i.e. not something like 1 GB), you can do it with:

with open('log.log') as f:
    content = f.read()
content = content.split('------------------------------------------------------------------------')
for idx, info in enumerate(content):
    with open('info{}.txt'.format(idx + 1), 'w') as f:
        f.write(info)
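
Because the log starts and ends with a separator, a plain split() also yields empty or whitespace-only chunks at both ends, and those would become empty files. A rough sketch of the same read-everything approach that skips them; `split_log_in_memory`, `out_dir` and the default 72-dash `separator` are assumptions for illustration, so adjust the separator to match your log.

```python
from pathlib import Path

def split_log_in_memory(log_path, out_dir=".", separator="-" * 72):
    """Split the whole log on `separator` and write one file per non-empty chunk.

    The default separator (72 dashes) is an assumption; pass the exact
    separator string your log uses.
    """
    text = Path(log_path).read_text()
    # Strip surrounding whitespace and drop chunks that are empty afterwards.
    chunks = [chunk.strip() for chunk in text.split(separator)]
    chunks = [chunk for chunk in chunks if chunk]

    out_paths = []
    for idx, chunk in enumerate(chunks, start=1):
        out_path = Path(out_dir) / "info{}.txt".format(idx)
        out_path.write_text(chunk + "\n")
        out_paths.append(out_path)
    return out_paths
```

This still holds the whole log in memory, so it only suits files that comfortably fit in RAM.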

I thought that the "========..." line was also one of the patterns, hence I used the re module. With this approach you can add more patterns if need be (I originally had re.compile("-+|=+")).

import re

with open("file.txt", "r") as input_file:
    text = input_file.read()

# Collect every run of dashes; keep only runs longer than 5 characters as separators.
regex = re.compile("-+")
mo_wanted_patterns = {pattern for pattern in regex.findall(text) if len(pattern) > 5}
print(mo_wanted_patterns)

output_text = []
for index, line in enumerate(text.split("\n")):
    if line in mo_wanted_patterns:
        # Separator line: write out what has been collected so far.
        # Files are named after the separator's line index, so numbering is not consecutive.
        filepath = 'new_file' + str(index) + '.txt'
        with open(filepath, 'w') as file:
            file.write("\n".join(output_text))
        output_text = []
    else:
        output_text.append(line)

EDIT: I noticed it's far more complicated than the code others have provided. Using regex made things more complicated, but I would be curious to know whether it works for you.
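
If regex is wanted anyway, a shorter route might be re.split with a pattern anchored to whole lines. This is only a sketch: `SEPARATOR_LINE` and `split_sections` are names I made up, and the "six or more dashes" threshold is an assumption mirroring the len(pattern) > 5 check above.

```python
import re

# A separator is assumed to be a whole line of six or more dashes.
# To also split on lines of '=', something like r"^(?:-{6,}|={6,})[ \t]*$"
# could be used instead.
SEPARATOR_LINE = re.compile(r"^-{6,}[ \t]*$", re.MULTILINE)

def split_sections(text):
    """Return the non-empty sections of text found between separator lines."""
    parts = SEPARATOR_LINE.split(text)
    # Drop whitespace-only parts and trim the newlines left around each section.
    return [part.strip("\n") for part in parts if part.strip()]
```

Each returned section can then be written to its own file with enumerate(), as in the other answers. Because the pattern is anchored with ^ and $, diff lines like "--- resources.properties" stay inside their section.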
