Storing scraped data in a text file in Python

Question

I am able to scrape data with Beautiful Soup, and now I want to generate a file that contains everything I scraped.

file = open("copy.txt", "w") 
data = soup.get_text()
data
file.write(soup.get_text()) 
file.close()

The text file does not contain the tags or the full content of the page. Any ideas on how to achieve this?

Answer 1

You can use:

with open("copy.txt", "w") as file:
    file.write(str(soup))
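
The reason this works where the question's code did not: `soup.get_text()` returns only the text nodes and drops every tag, while `str(soup)` serializes the whole parsed document, markup included. A minimal sketch of the difference, using a small inline HTML string (an assumption for illustration, not the asker's page):

```python
from bs4 import BeautifulSoup

# Stand-in HTML for illustration; any scraped page behaves the same way.
html = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

text_only = soup.get_text()   # text nodes only, every tag is dropped
full_markup = str(soup)       # the whole document, tags included

print(text_only)
print(full_markup)
```

If only the readable text is wanted, `get_text()` is the right call; to keep the tags, write `str(soup)`.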

If you have a list of URLs to scrape and want to store the result for each URL in a separate file, you can try:

my_urls = [url_1, url_2, ..., url_n]
for index, url in enumerate(my_urls):
    # ...
    # code that scrapes `url` and builds `soup` goes here
    with open(f"scraped_{index}.txt", "w") as file:
        file.write(str(soup))
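
The loop above could be fleshed out roughly as follows. This is a sketch under stated assumptions: `save_pages` is a hypothetical helper, and the pages are inline HTML strings standing in for downloaded content so the example runs without network access.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup


def save_pages(pages, out_dir):
    """Write each HTML string in `pages` to out_dir/scraped_<index>.txt.

    Hypothetical helper for illustration; returns the paths written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, html in enumerate(pages):
        soup = BeautifulSoup(html, "html.parser")
        path = out_dir / f"scraped_{index}.txt"
        # An explicit encoding avoids depending on the platform default.
        path.write_text(str(soup), encoding="utf-8")
        written.append(path)
    return written


# Stand-in HTML; in real use each string would be a downloaded page.
pages = ["<p>first page</p>", "<p>second page</p>"]

with tempfile.TemporaryDirectory() as tmp:
    paths = save_pages(pages, tmp)
    names = [p.name for p in paths]
    contents = [p.read_text(encoding="utf-8") for p in paths]

print(names)
```

Using `enumerate` for the file names keeps them unique even when two URLs would otherwise map to the same name.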

Answer 2

Quick solution:

You just need to convert the soup to a string. Using a test site, in case anyone else wants to follow along:

from bs4 import BeautifulSoup as BS
import requests

r = requests.get("https://webscraper.io/test-sites/e-commerce/allinone")
soup = BS(r.content, "html.parser")  # name the parser explicitly to avoid bs4's "no parser specified" warning

file = open("copy.txt", "w") 
file.write(str(soup))
file.close()

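A slightly better solution:

It is better practice to use a context manager (the with statement) for file I/O, so the file is closed even if the write fails: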

from bs4 import BeautifulSoup as BS
import requests

r = requests.get("https://webscraper.io/test-sites/e-commerce/allinone")
soup = BS(r.content, "html.parser")  # name the parser explicitly to avoid bs4's "no parser specified" warning

with open("copy.txt", "w") as file:
    file.write(str(soup))
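
One thing the snippets above leave to the platform is the file encoding. On systems whose default encoding is not UTF-8, writing a page that contains characters outside that encoding can raise UnicodeEncodeError. A minimal sketch of passing the encoding explicitly, assuming a small inline HTML string and a file in the system temp directory (both chosen for illustration):

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Stand-in HTML containing non-ASCII characters.
html = "<p>Price: 10 € for café crème</p>"
soup = BeautifulSoup(html, "html.parser")

# Written to the temp directory here so the example leaves the working directory alone.
out_path = Path(tempfile.gettempdir()) / "copy.txt"

with open(out_path, "w", encoding="utf-8") as file:
    file.write(str(soup))

# Read the file back with the same encoding to confirm nothing was lost.
restored = out_path.read_text(encoding="utf-8")
print(restored)
```

If a more readable file is preferred, `soup.prettify()` can be written instead of `str(soup)`; it returns the same document indented one tag per line.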


Solution 1
1 2019-12-28 00:13:01

Solution 2
0 2019-12-28 00:18:51
