Unable to write html content in an excel file using openpyxl
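I've created a script in Python to scrape the first title and its description from a website and write them to an Excel file using the openpyxl library. The important thing to note here is that I want to save the title as text but the description as raw HTML content, not text.
I've tried this: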
import requests
from bs4 import BeautifulSoup
from openpyxl import Workbook

link = "https://stackoverflow.com/questions/tagged/web-scraping"

wb = Workbook()
wb.remove(wb['Sheet'])

def fetch_content(link):
    req = requests.get(link)
    soup = BeautifulSoup(req.text,"lxml")
    title = soup.select_one("#questions .summary .question-hyperlink").get_text(strip=True)
    desc = soup.select_one("#questions .summary")
    ws.append([title,desc])
    print(title,desc)

if __name__ == '__main__':
    ws = wb.create_sheet("output")
    ws.append(['Title','Description'])
    fetch_content(link)
    wb.save("SO.xlsx")
When I run the script, I get the following error:
raise ValueError("Cannot convert {0!r} to Excel".format(value))
ValueError: Cannot convert <div class="summary"> -----so on
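For reference, the error can be reproduced without touching the network. The sketch below is only an illustration: the inline HTML is an invented stand-in for the real page, the built-in "html.parser" replaces lxml, and the helper name try_append_tag is hypothetical. It shows that select_one returns a bs4 Tag object, which openpyxl refuses to store in a cell.

```python
from bs4 import BeautifulSoup
from openpyxl import Workbook

# Hypothetical stand-in for the scraped page (not the real Stack Overflow markup).
SAMPLE_HTML = """
<div id="questions">
  <div class="summary">
    <a class="question-hyperlink" href="#">How to scrape data</a>
  </div>
</div>
"""


def try_append_tag(html):
    """Append a bs4 Tag to a worksheet; return (type name of the value, error text or None)."""
    soup = BeautifulSoup(html, "html.parser")
    desc = soup.select_one("#questions .summary")  # a Tag object, not a string
    ws = Workbook().active
    try:
        ws.append(["title", desc])
    except ValueError as exc:
        return type(desc).__name__, str(exc)
    return type(desc).__name__, None


type_name, error = try_append_tag(SAMPLE_HTML)
print(type_name)
print(error)
```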
Expected output in that Excel file (both values are truncated):
How to scrape data <div class="summary">
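The suggestions from stovfl and robot.txt are the perfect solution. I took the liberty of posting them as an answer because I often forget this approach myself.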
def fetch_content(link):
    req = requests.get(link)
    soup = BeautifulSoup(req.text,"lxml")
    title = soup.select_one("#questions .summary .question-hyperlink").get_text(strip=True)
    desc = soup.select_one("#questions .summary")
    ws.append([title,str(desc)])  # cast desc to str so openpyxl can store it
    print(title,desc)
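To check the str() cast without hitting the live site, something like the following sketch may help. It rests on a few assumptions: the inline HTML is invented for illustration, the built-in "html.parser" is used instead of lxml, the workbook is saved to memory rather than to SO.xlsx, and the helper name html_row is hypothetical.

```python
from io import BytesIO

from bs4 import BeautifulSoup
from openpyxl import Workbook, load_workbook

# Hypothetical stand-in for the scraped page (not the real Stack Overflow markup).
SAMPLE_HTML = (
    '<div id="questions"><div class="summary">'
    '<a class="question-hyperlink" href="#">How to scrape data</a>'
    "</div></div>"
)


def html_row(html):
    """Return [title text, raw HTML of the summary block] (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    summary = soup.select_one("#questions .summary")
    title = summary.select_one(".question-hyperlink").get_text(strip=True)
    return [title, str(summary)]  # str() turns the Tag into its HTML markup


wb = Workbook()
ws = wb.active
ws.title = "output"
ws.append(["Title", "Description"])
ws.append(html_row(SAMPLE_HTML))

# Save to an in-memory buffer and load it again to see what landed in the cells.
buffer = BytesIO()
wb.save(buffer)
buffer.seek(0)
saved = load_workbook(buffer)["output"]
title_cell, desc_cell = saved["A2"].value, saved["B2"].value
print(title_cell)
print(desc_cell)
```

The description is stored as an ordinary string, so Excel shows the markup literally in the cell rather than rendering it.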