Beautiful Soup won't let me parse a variable containing HTML

I'm trying to pretty-print an HTML email stored in a variable, but I keep getting an error from BS4 saying it expected a string.

Here is my code:

from bs4 import BeautifulSoup
import imaplib
import email


mail = imaplib.IMAP4_SSL('imap.gmail.com')

username = raw_input('USERNAME (email):')
password = raw_input('PASSWORD: ')

try:
    mail.login(username, password)
    print "Logged in as %r !" % username
except imaplib.IMAP4.error:
    # imaplib raises IMAP4.error when the server rejects the login.
    print "Log in failed."

mail.list()
# Out: list of "folders" aka labels in gmail.
mail.select("inbox") # connect to inbox.

result, data = mail.uid('search', None, '(FROM "tiffany@e.tiffany.com")')
latest_email_uid = data[0].split()[1]
result, data = mail.uid('fetch', latest_email_uid, '(RFC822)')
raw_email = data[0][1]

email_message = email.message_from_string(raw_email)

print email_message

html = email_message
soup = BeautifulSoup(html)
print soup.prettify()

Here is the printed HTML email I'm working with: http://pastebin.com/qfAHwkdV

Here is the error I get:

Traceback (most recent call last):
  File "tiff.py", line 34, in <module>
    soup = BeautifulSoup(html)
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/bs4/__init__.py", line 169, in __init__
    self.builder.prepare_markup(markup, from_encoding))
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/bs4/builder/_htmlparser.py", line 139, in prepare_markup
    dammit = UnicodeDammit(markup, try_encodings, is_html=True)
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/bs4/dammit.py", line 203, in __init__
    self._detectEncoding(markup, is_html)
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/bs4/dammit.py", line 372, in _detectEncoding
    xml_encoding_match = xml_encoding_re.match(xml_data)
TypeError: expected string or buffer

Why can't I pass the HTML in as a variable for BS4 to parse?

Thanks

According to the documentation for .message_from_string, it does not return a string but a Message object. BeautifulSoup() needs a string (or buffer).

Perhaps try soup = BeautifulSoup(str(html)) or soup = BeautifulSoup(unicode(html)).
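
Note that str() on a Message gives back the whole message, headers included, so the parser would see the header lines as well as the markup. Below is a minimal sketch of one alternative, assuming Python 3 and only the standard library: walk the message parts and decode the first text/html part. The helper name extract_html and the sample message are made up for illustration and are not from the original post.

```python
import email


def extract_html(msg):
    """Return the first text/html part of a Message as text, or None if absent."""
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # decode=True undoes base64 / quoted-printable and returns bytes.
            payload = part.get_payload(decode=True)
            # Fall back to UTF-8 when the part declares no charset (an assumption).
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return None


# Hand-written stand-in for the fetched email (hypothetical data).
raw_email = (
    "From: sender@example.com\n"
    "Subject: Demo\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/html; charset=utf-8\n"
    "\n"
    "<html><body><p>Hello</p></body></html>\n"
)

email_message = email.message_from_string(raw_email)
html = extract_html(email_message)

print(type(email_message).__name__)  # a Message object, not a string
print(html)
```

The resulting string can then be handed to Beautiful Soup, for example BeautifulSoup(html, "html.parser").prettify(). With data fetched through imaplib on Python 3, raw_email is bytes, so email.message_from_bytes would be the matching call.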
