
Convert encoded strings to normal printable characters

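I am trying to extract details from an MBOX file and have written the sample program shown below.

It works, but some headers are printed as encoded strings, for example: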

 =?UTF-8?B?QVJNIE1hY3MgYXJlIGNvbWluZywgdGhyZWUgeWVhcnMgYWZ0ZXIgQXBwbGU=?=
 =?UTF-8?B?4oCZcyBhdHRpdHVkZSBjaGFuZ2U=?=

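I gather that "=?UTF-8?B?" indicates Base64 encoding, so I assume there must be a two-step process: decode the Base64 payload, then decode the resulting bytes as UTF-8.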
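To make the structure of these strings explicit, here is a minimal sketch that splits one RFC 2047 encoded word of the form =?charset?encoding?payload?= into its three parts. The helper name split_encoded_word and the regular expression are illustrative assumptions, not part of the original program; the pattern only covers the simple shape shown above.

```python
import re

# One encoded word: =?charset?encoding?payload?=
# (illustrative pattern; "B" means Base64, "Q" means a Quoted-Printable variant)
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")

def split_encoded_word(word):
    """Return (charset, encoding, payload), or None if word is not an encoded word."""
    match = ENCODED_WORD.fullmatch(word.strip())
    if match is None:
        return None
    charset, encoding, payload = match.groups()
    return charset.lower(), encoding.upper(), payload

print(split_encoded_word("=?UTF-8?B?4oCZcyBhdHRpdHVkZSBjaGFuZ2U=?="))
# ('utf-8', 'B', '4oCZcyBhdHRpdHVkZSBjaGFuZ2U=')
```

The charset says how to turn the decoded bytes into text, and the encoding letter says how the payload was made ASCII-safe, which is why two decoding steps are needed.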

Can anyone point me to a way of converting these strings to normal printable characters?

#! /usr/bin/env python3
#2020-02-27

"""
Extract Subject from MBOX file
"""

import mailbox

for message in mailbox.mbox('~/temp/Inbox'):
    # Header values come back as stored in the file, so encoded words are not decoded here
    subject = message['subject']
    sender = message['from']
    ddate = message['Delivery-date']
    print(subject, sender)

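I have made some progress: if I strip off the leading

=?UTF-8?B?

and the trailing

?=

and then call base64.b64decode(), I get readable text.

The second string above becomes b'\xe2\x80\x99s attitude change'

=?UTF-8?B?QVJNIE1hY3MgYXJlIGNvbWluZywgdGhyZWUgeWVhcnMgYWZ0ZXIgQXBwbGU=?=

becomes "ARM Macs are coming, three years after Apple"

Joining these together gives the subject

ARM Macs are coming, three years after Apple’s attitude change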
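The manual steps just described can be sketched as follows. This assumes every piece is a UTF-8, Base64 encoded word exactly like the two in the question; the names PREFIX, SUFFIX and decode_b64_word are made up for illustration.

```python
import base64

# The two encoded words from the folded Subject header in the question
raw_subject = [
    "=?UTF-8?B?QVJNIE1hY3MgYXJlIGNvbWluZywgdGhyZWUgeWVhcnMgYWZ0ZXIgQXBwbGU=?=",
    "=?UTF-8?B?4oCZcyBhdHRpdHVkZSBjaGFuZ2U=?=",
]

PREFIX = "=?UTF-8?B?"
SUFFIX = "?="

def decode_b64_word(word):
    """Strip the wrapper, Base64-decode the payload, then decode the bytes as UTF-8."""
    payload = word[len(PREFIX):-len(SUFFIX)]
    return base64.b64decode(payload).decode("utf-8")

# Adjacent encoded words are joined without the whitespace that separated them
subject = "".join(decode_b64_word(word) for word in raw_subject)
print(subject)
# ARM Macs are coming, three years after Apple’s attitude change
```

Note that base64.b64decode() returns bytes, which is why the second string first appears as b'\xe2\x80\x99s attitude change'; the final .decode("utf-8") turns \xe2\x80\x99 into the apostrophe.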

Does this work?

#! /usr/bin/env python3
"""
Extract Subject from MBOX file
"""

import mailbox

def to_text(value):
    # In Python 3 header values are normally str, which has no .decode();
    # only bytes are decoded here, anything else is converted with str()
    if isinstance(value, bytes):
        return value.decode('utf-8', 'ignore')
    return str(value)

for message in mailbox.mbox('~/temp/Inbox'):
    subject = message['subject']
    sender = message['from']
    ddate = message['Delivery-date']
    print(to_text(subject), to_text(sender))


I wrote a function to convert UTF-8 Base64 or Quoted-Printable strings, although I am surprised that I could not find an existing method.

#! /usr/bin/env python3
#2020-02-27

"""
Extract Subject from MBOX file
"""

import mailbox
import base64, quopri

def bdecode(s):
    """
    Convert UTF-8 Base64 or Quoted Printable strings to str
    """
    if s is None:
        return ""
    parts = []
    prev_encoded = False   # was the previous token an encoded word?
    for token in str(s).split():   # split on any whitespace, including folded lines
        head = token[:10].upper()
        is_encoded = head in ('=?UTF-8?B?', '=?UTF-8?Q?') and token.endswith('?=')
        if is_encoded and head == '=?UTF-8?B?':
            text = base64.b64decode(token[10:-2]).decode("utf-8")
        elif is_encoded:
            # header=True turns '_' into a space, as the Q encoding requires
            text = quopri.decodestring(token[10:-2], header=True).decode("utf-8")
        else:
            text = token
        # keep spaces between plain words, but not between two adjacent encoded words
        if parts and not (is_encoded and prev_encoded):
            parts.append(' ')
        parts.append(text)
        prev_encoded = is_encoded
    return "".join(parts)

for message in mailbox.mbox('~/temp/Inbox'):
    subject = message['subject']
    print(bdecode(subject))
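For comparison, the standard library's email.header module provides decode_header() and make_header(), which handle both B and Q encoded words and charsets other than UTF-8. The following is a minimal sketch rather than a drop-in replacement; decode_subject is a hypothetical helper name, and the sample header values are the ones from the question plus one made-up Q-encoded string.

```python
from email.header import decode_header, make_header

def decode_subject(value):
    """Decode RFC 2047 encoded words in a header value to a plain str."""
    if value is None:
        return ""
    # decode_header() yields (bytes_or_str, charset) pairs;
    # make_header() reassembles them and str() renders the text
    return str(make_header(decode_header(str(value))))

# Folded Subject header as it would appear in the mbox file
folded = (
    "=?UTF-8?B?QVJNIE1hY3MgYXJlIGNvbWluZywgdGhyZWUgeWVhcnMgYWZ0ZXIgQXBwbGU=?=\n"
    " =?UTF-8?B?4oCZcyBhdHRpdHVkZSBjaGFuZ2U=?="
)
print(decode_subject(folded))
print(decode_subject("=?UTF-8?Q?caf=C3=A9_menu?="))   # made-up Q-encoded example
```

In the loop over mailbox.mbox(...), this would be called as decode_subject(message['subject']) in place of bdecode(subject).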
