Python: How do I find URLs in an HTML email read from Gmail?

Question

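I'm trying to automate a script that downloads the PDFs I regularly receive. When the PDF is attached, I have a procedure that works (I think).

My problem (I believe) is that some emails arrive with embedded HTML, and the URL is inside that HTML. For example:

This one is from the spam folder, but it helps illustrate the problem...

I have the following code: mail.py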

import pickle,os.path,base64,time
from datetime import datetime

from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

def get_credentials(token_path,credentials_path,scopes):
    creds = None
    if os.path.exists(token_path):
        with open(token_path, 'rb') as token:
            creds = pickle.load(token)

    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(credentials_path, scopes)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open(token_path, 'wb') as token:
            pickle.dump(creds, token)
    return creds

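def get_labels(service):
    # Labels are listed through the labels() resource, not messages();
    # the original also referenced an undefined `labels` variable.
    return service.users()\
                  .labels()\
                  .list(userId='me')\
                  .execute()\
                  .get('labels',[])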

def get_all_messages_id(service,labels=["INBOX"]):
    return service.users()\
                  .messages()\
                  .list(userId='me',labelIds = labels)\
                  .execute()\
                  .get("messages")

def get_message(message_id,service):
    return service.users()\
                  .messages()\
                  .get(userId='me', id=message_id)\
                  .execute()

def get_subject_of_message(message):
    # Each header is a dict of the form {"name": ..., "value": ...}
    for header in message.get("payload").get("headers"):
        if header.get("name") == 'Subject':
            return header.get("value")

So, if I run...

 >>> service = mail.login("token.pickle","credentials.json")
 >>> message_id = mail.get_all_messages_id(service)[0]
 >>> mail.get_message(message_id.get("id"),service)

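I can see "Original Xiaomi Mi Band 4 ..." in the string output (so message_id is fine), but I can't see its URL.

Instead, I see a very long, ugly string.

I think the "text/html" part is what is blocking me, but I don't know how to proceed. If I had the HTML as a document with its tags, I could analyze it with BeautifulSoup. But all I have is this ugly string...

Has anyone run into this problem before?

Thanks for your help.

PS: If you want to know how I generated token.pickle and credentials.json in order to reproduce this, see Google's API documentation. I followed their instructions and it was very simple.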

Answer 1

That ugly string is the base64-encoded content.

All you have to do is decode it and parse it.

Try something like this:

str(base64.urlsafe_b64decode(encoded_string_here), "utf-8")
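To connect that one-liner back to the question, here is a minimal sketch of the whole path: find the "text/html" part in the message returned by get_message, decode it, and collect the link targets. It assumes the message was fetched in the default full format, so that each part carries "mimeType", "body" and optionally nested "parts". The helper names (find_html_data, decode_body, LinkCollector, extract_links) are made up for illustration, and the standard library's html.parser is used only to keep the example dependency-free; BeautifulSoup would do the same job on the decoded string.

```python
import base64
from html.parser import HTMLParser


def find_html_data(payload):
    """Return the encoded body of the first text/html part, or None."""
    if payload.get("mimeType") == "text/html":
        data = payload.get("body", {}).get("data")
        if data:
            return data
    # Multipart messages nest their parts, so search recursively.
    for part in payload.get("parts", []):
        data = find_html_data(part)
        if data:
            return data
    return None


def decode_body(data):
    """Decode URL-safe base64 text; re-add '=' padding in case it is missing."""
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(message):
    """Return all link targets found in the HTML part of a message dict."""
    data = find_html_data(message.get("payload", {}))
    if data is None:
        return []
    collector = LinkCollector()
    collector.feed(decode_body(data))
    return collector.links
```

With the code from the question, this could be called as extract_links(mail.get_message(message_id.get("id"), service)), and the returned list filtered for the URLs that point to PDFs.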

Reference

Python base64 documentation
