
Parsing MIME Files from TXT file using Python

I'm trying to parse a MIME message stored in a TXT file. The format looks like this:

Content-Type: multipart/related; boundary="MIMEBoundary"
MIME-Version: 1.0
Content-Description: This Transmission File is created with Pegasus Test Suite
X-eFileRoutingCode: MEF
Content-Transfer-Encoding: Binary
Content-Location: manifest_xml

--MIMEBoundary
Content-Type: text/xml; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

<?xml version='1.0' encoding='UTF-8'?>
<SOAP:Envelope xmlns="http://www.efiles.id/efile" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/" xmlns:efile="http://www.efiles.id/efile" xsi:schemaLocation="http://schemas.xmlsoap.org/soap/envelope/ ../message/SOAP.xsd http://www.files./efile ../message/efileMessage.xsd"><SOAP:Header><IFATransmissionHeader><MessageId>012342018ABCDEFGHIJK</MessageId><TransmissionTs>2018-07-01T09:51:56-05:00</TransmissionTs><TransmitterDetail><ETIN>XXXXX</ETIN></TransmitterDetail></IFATransmissionHeader></SOAP:Header><SOAP:Body><TransmissionManifest><SubmissionDataList><Cnt>1</Cnt><SubmissionData><SubmissionId>0123452018OPQRSTUVWX</SubmissionId><ElectronicPostmarkTs>2018-07-01T09:51:56-05:00</ElectronicPostmarkTs></SubmissionData></SubmissionDataList></TransmissionManifest></SOAP:Body></SOAP:Envelope>
--MIMEBoundary
Content-Type: application/octet-stream
MIME-Version: 1.0
Content-Transfer-Encoding: base64

UEsDBBQAAAAAAOh2lFQAAAAAAAAAAAAAAAALAAAAYXR0YWNobWVudC9QSwMEFAAAAAAA6HaUVAAA
AAAAAAAAAAAAAAQAAAB4bWwvUEsDBBQAAAAAAOh2lFT0i/87pT0AAKU9AAAVAAAAYXR0YWNobWVu
dC9zYW1wbGUucGRmJVBERi0yLjAKJbq63toKMSAwIG9iajw8L1R5cGUvQ2F0YWxvZy9QYWdlcyAy
IDAgUi9NZXRhZGF0YSAxMiAwIFI+PgplbmRvYmoKMiAwIG9iajw8L1R5cGUvUGFnZXMvS2lkc1sz
IDAgUl0vQ291bnQgMT4+CmVuZG9iagoxMiAwIG9iajw8L1R5cGUvTWV0YWRhdGEvU3VidHlwZS9Y

--MIMEBoundary--

I want to extract the XML and the encoded attachment as separate strings. So far I have written a decoder like this:

import email.parser
from io import TextIOWrapper

class NewlineSafeBytesParser(email.parser.BytesParser):
    def parse(self, fp, headersonly=False):
        # newline='' disables newline translation, so CRLF sequences are kept as-is
        fp = TextIOWrapper(fp, encoding='ascii', errors='surrogateescape', newline='')
        try:
            return self.parser.parse(fp, headersonly)
        finally:
            fp.detach()

# Instantiate the subclass:
parser = NewlineSafeBytesParser()

but I get a corrupt decoded result. How do I separate the Base64-encoded zip file from the TXT file so that it can be decoded separately?

Parsing the file on Python 3 and printing only the part whose Content-Type is text/xml:

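import email

# Open in binary mode and pass the file object itself (not its contents) to the parser
with open("list1.txt", "rb") as mail_file:
    msg = email.message_from_binary_file(mail_file)

for part in msg.walk():
    print(part.get_content_type())
    if part.get_content_type() == 'text/xml':
        print(part.get_payload())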

<?xml version='1.0' encoding='UTF-8'?>
<SOAP:Envelope xmlns="http://www.efiles.id/efile" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/" xmlns:efile="http://www.efiles.id/efile" xsi:schemaLocation="http://schemas.xmlsoap.org/soap/envelope/ ../message/SOAP.xsd http://www.files./efile ../message/efileMessage.xsd"><SOAP:Header><IFATransmissionHeader><MessageId>012342018ABCDEFGHIJK</MessageId><TransmissionTs>2018-07-01T09:51:56-05:00</TransmissionTs><TransmitterDetail><ETIN>XXXXX</ETIN></TransmitterDetail></IFATransmissionHeader></SOAP:Header><SOAP:Body><TransmissionManifest><SubmissionDataList><Cnt>1</Cnt><SubmissionData><SubmissionId>0123452018OPQRSTUVWX</SubmissionId><ElectronicPostmarkTs>2018-07-01T09:51:56-05:00</ElectronicPostmarkTs></SubmissionData></SubmissionDataList></TransmissionManifest></SOAP:Body></SOAP:Envelope>
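
To also pull out the Base64 attachment, one option is to ask the email package to undo the Content-Transfer-Encoding with get_payload(decode=True). The following is a minimal sketch, not a definitive solution: the function name split_mime_parts and the tiny sample message are made up for illustration, and it assumes there is exactly one text/xml part and one application/octet-stream part, as in the file above.

```python
import base64
import email


def split_mime_parts(raw_bytes):
    """Return (xml_text, attachment_bytes) from a multipart message.

    Hypothetical helper: assumes one text/xml part and one
    application/octet-stream part, as in the question's file.
    """
    msg = email.message_from_bytes(raw_bytes)
    xml_text = None
    attachment = None
    for part in msg.walk():
        if part.is_multipart():
            continue  # skip the multipart/related container itself
        ctype = part.get_content_type()
        if ctype == "text/xml":
            # decode=True returns bytes with the transfer encoding undone
            charset = part.get_content_charset() or "utf-8"
            xml_text = part.get_payload(decode=True).decode(charset, errors="replace")
        elif ctype == "application/octet-stream":
            # For a base64 part this yields the raw binary (e.g. zip) bytes
            attachment = part.get_payload(decode=True)
    return xml_text, attachment


# Small made-up message with the same layout as the file in the question
sample = (
    b'Content-Type: multipart/related; boundary="MIMEBoundary"\r\n'
    b'MIME-Version: 1.0\r\n'
    b'\r\n'
    b'--MIMEBoundary\r\n'
    b'Content-Type: text/xml; charset="us-ascii"\r\n'
    b'Content-Transfer-Encoding: 7bit\r\n'
    b'\r\n'
    b'<a>1</a>\r\n'
    b'--MIMEBoundary\r\n'
    b'Content-Type: application/octet-stream\r\n'
    b'Content-Transfer-Encoding: base64\r\n'
    b'\r\n'
    + base64.b64encode(b'PK\x03\x04demo') + b'\r\n'
    b'--MIMEBoundary--\r\n'
)

xml_text, attachment = split_mime_parts(sample)
```

For the real file, the bytes read from list1.txt in binary mode could be passed to the same function and the returned attachment written out with open(..., "wb"). Note that the Base64 block quoted in the question appears to be only an excerpt, so decoding that excerpt alone would not give a complete zip archive.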
