Unable to write HTML content to an Excel file using openpyxl

Question

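I've created a script in Python that scrapes the first title and its description from a website, and then writes both to an Excel file using the openpyxl library. The important point is that I want to save the title as text, but the description as raw HTML content rather than as text.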
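The two requirements map onto two different BeautifulSoup operations: get_text() produces plain text, while select_one() returns a Tag object whose str() is its raw HTML. The sketch below runs offline against SAMPLE_HTML, a hypothetical hand-written stand-in for one question summary, and uses the built-in "html.parser" instead of "lxml" only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical hand-written markup standing in for one question summary.
SAMPLE_HTML = (
    '<div id="questions"><div class="summary">'
    '<h3><a class="question-hyperlink" href="/q/1">How to scrape data</a></h3>'
    '<div class="excerpt">Some <b>bold</b> text</div>'
    '</div></div>'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# get_text() flattens an element to plain text: what the Title column needs.
title = soup.select_one("#questions .summary .question-hyperlink").get_text(strip=True)

# select_one() returns a bs4 Tag object, not a string.
desc_tag = soup.select_one("#questions .summary")

# str(tag) serialises the element, outer tag included, back to HTML markup.
desc_html = str(desc_tag)

# decode_contents() returns only the inner HTML, without the outer <div>.
inner_html = desc_tag.decode_contents()
```

Whether the outer <div class="summary"> belongs in the cell is a design choice: str() keeps it, decode_contents() drops it.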

Here is what I tried:

import requests
from bs4 import BeautifulSoup
from openpyxl import Workbook

link = "https://stackoverflow.com/questions/tagged/web-scraping"
wb = Workbook()
wb.remove(wb['Sheet'])

def fetch_content(link):
    req = requests.get(link)
    soup = BeautifulSoup(req.text,"lxml")
    title = soup.select_one("#questions .summary .question-hyperlink").get_text(strip=True)
    desc = soup.select_one("#questions .summary")

    ws.append([title,desc])
    print(title,desc)

if __name__ == '__main__':
    ws = wb.create_sheet("output")
    ws.append(['Title','Description'])
    fetch_content(link)
    wb.save("SO.xlsx")

When I run the script, I get the following error:

raise ValueError("Cannot convert {0!r} to Excel".format(value))
ValueError: Cannot convert <div class="summary"> ... (and so on)

Expected output in the Excel file (both values truncated):

How to scrape data   <div class="summary">
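The ValueError above is raised by openpyxl itself: a cell value has to be a type openpyxl knows how to store (such as str, numbers, bool, dates or None), and select_one() returns a BeautifulSoup Tag object, which is none of these. A minimal offline reproduction, assuming a one-line hypothetical HTML snippet in place of the live page:

```python
from bs4 import BeautifulSoup
from openpyxl import Workbook

# Hypothetical one-line markup standing in for the scraped page.
soup = BeautifulSoup(
    '<div class="summary"><a class="question-hyperlink">How to scrape data</a></div>',
    "html.parser",
)
title = soup.select_one(".summary .question-hyperlink").get_text(strip=True)
desc = soup.select_one(".summary")  # a bs4 Tag, not a str

wb = Workbook()
ws = wb.active

try:
    # openpyxl cannot map a Tag object to any Excel cell type.
    ws.append([title, desc])
    error_message = None
except ValueError as exc:
    error_message = str(exc)  # "Cannot convert <div class=...> to Excel"

# Passing the serialised markup (a plain str) is accepted.
ws.append([title, str(desc)])
```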

Answer 1

The solution from stovfl and robot.txt is exactly right. I took the liberty of posting it as an answer because I often forget this approach.

def fetch_content(link):
    req = requests.get(link)
    soup = BeautifulSoup(req.text,"lxml")
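    title = soup.select_one("#questions .summary .question-hyperlink").get_text(strip=True)
    # select_one() returns a bs4 Tag object, which openpyxl cannot store
    desc = soup.select_one("#questions .summary")

    ws.append([title, str(desc)])  # cast the Tag to str so the cell holds its raw HTML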
    print(title,desc)
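A self-contained sketch of the fixed flow, assuming a hypothetical SAMPLE_HTML string instead of the live page and an in-memory BytesIO buffer instead of SO.xlsx, so it runs without network or disk access. It writes the title as text and the description as raw HTML, then reloads the workbook to check that the markup survives unchanged. The cell_safe helper is an optional, hypothetical addition: openpyxl raises IllegalCharacterError for certain control characters, and its ILLEGAL_CHARACTERS_RE pattern can strip them from scraped markup first.

```python
from io import BytesIO

from bs4 import BeautifulSoup
from openpyxl import Workbook, load_workbook
from openpyxl.cell.cell import ILLEGAL_CHARACTERS_RE

# Hypothetical sample markup standing in for the live listing page.
SAMPLE_HTML = (
    '<div id="questions"><div class="summary">'
    '<h3><a class="question-hyperlink" href="/q/1">How to scrape data</a></h3>'
    '</div></div>'
)


def cell_safe(text):
    """Remove control characters that openpyxl refuses to store in a cell."""
    return ILLEGAL_CHARACTERS_RE.sub("", text)


def build_workbook(html):
    """Hypothetical helper: one header row plus one (title, raw HTML) row."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.select_one("#questions .summary .question-hyperlink").get_text(strip=True)
    desc = soup.select_one("#questions .summary")

    wb = Workbook()
    ws = wb.active
    ws.title = "output"
    ws.append(["Title", "Description"])
    # Title as plain text, description as its raw HTML markup (Tag cast to str).
    ws.append([title, cell_safe(str(desc))])
    return wb, str(desc)


wb, desc_html = build_workbook(SAMPLE_HTML)

# Save to an in-memory buffer instead of "SO.xlsx", then read it back.
buffer = BytesIO()
wb.save(buffer)
buffer.seek(0)
ws = load_workbook(buffer)["output"]
```

With the live page, build_workbook(requests.get(link).text) would take the place of SAMPLE_HTML, provided the selectors still match the site's current markup.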

Solution 1 (score 2, accepted on 2019-12-28 11:25:43)