How to save scraped data to a CSV using pandas

Question

I want to use pandas to save the data I scraped to a CSV file, but I keep running into an error.

Here is my code:

import requests
from bs4 import BeautifulSoup
import pandas as pd

link = ("https://sofifa.com/team/1/arsenal/?&showCol%5B%5D=ae&showCol%5B%5D=hi&showCol%5B%5D=le&showCol%5B%5D=vl&showCol%5B%5D=wg&showCol%5B%5D=rc")
get_text = requests.get(link)
soup = BeautifulSoup(get_text.content, "lxml") 
table = soup.find("table", {"class":"table table-hover persist-area"})
table1 = table.get_text()

table1.to_csv("Arsenal_players.csv")

Answer 1

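You should include more detail when you ask, such as the kind of error you are getting; that makes it much easier to give an answer. In any case, I ran your code and saw the error, as expected. The table1 variable holds only a string, because of this line: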

table1 = table.get_text()

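A plain string has no to_csv method, so in your case there is nothing that can write all the data to a CSV, but you can find help here. Next time, remember to state your problem precisely.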
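To make the failure concrete, here is a minimal sketch on a small inline table. The HTML and the player names are made up for illustration and are not taken from the real page. It shows that get_text() yields a plain str with no to_csv method, and one possible way to build a DataFrame from the table rows by hand instead.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for the scraped page; not the real sofifa markup.
html = """
<table class="table table-hover persist-area">
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Player A</td><td>24</td></tr>
  <tr><td>Player B</td><td>31</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "table table-hover persist-area"})

# get_text() flattens the table into one string, so the DataFrame API is gone.
text = table.get_text()
print(type(text).__name__)      # str
print(hasattr(text, "to_csv"))  # False

# Collect the cell text row by row, then treat the first row as the header.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in table.find_all("tr")
]
df = pd.DataFrame(rows[1:], columns=rows[0])

# With no path given, to_csv returns the CSV as a string.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a filename such as "Arsenal_players.csv" to to_csv would write the same content to disk.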

Answer 2

You need to first read the HTML into a pandas DataFrame with read_html, then write it to a file with to_csv. Here is an example:

import requests
from bs4 import BeautifulSoup
import pandas as pd

link = ("https://sofifa.com/team/1/arsenal/?&showCol%5B%5D=ae&showCol%5B%5D=hi&showCol%5B%5D=le&showCol%5B%5D=vl&showCol%5B%5D=wg&showCol%5B%5D=rc")
get_text = requests.get(link)
soup = BeautifulSoup(get_text.content, "lxml")
table = soup.find("table", {"class":"table table-hover persist-area"})

# produces a list of dataframes from the html, see docs for more options
dfs = pd.read_html(str(table)) 
dfs[0].to_csv("Arsenal_players.csv")

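The read_html method has many options that change its behavior. You can also point it directly at your link instead of going through requests/BeautifulSoup first (it can do the fetching and parsing behind the scenes).

Reading the link directly might look like the following, but it is untested because the link returned 403 Forbidden when I tried it (perhaps they block requests based on the user agent):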

dfs = pd.read_html(link, attrs={"class":"table table-hover persist-area"})
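Separately from the network call, a couple of read_html's selection options can be tried offline. The sketch below uses a small made-up HTML snippet with two tables (the markup and values are assumptions for the example). It uses attrs to pick a table by its attributes and match to pick one by text it contains. The HTML is wrapped in io.StringIO because recent pandas versions deprecate passing a literal HTML string.

```python
from io import StringIO

import pandas as pd

# Hypothetical markup with two tables, used only to exercise the options.
html = """
<table class="squad">
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Player A</td><td>24</td></tr>
</table>
<table class="staff">
  <tr><th>Name</th><th>Role</th></tr>
  <tr><td>Coach C</td><td>Manager</td></tr>
</table>
"""

# With no filters, every table in the document comes back as a DataFrame.
all_tables = pd.read_html(StringIO(html))

# attrs keeps only tables whose HTML attributes match.
squad = pd.read_html(StringIO(html), attrs={"class": "squad"})[0]

# match keeps only tables containing text that matches the given pattern.
staff = pd.read_html(StringIO(html), match="Manager")[0]

print(len(all_tables))
print(squad)
print(staff)
```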

Edit: since read_html does not let you specify a user agent, I believe this ends up being the most concise approach for this particular link:

dfs = pd.read_html(
    requests.get(link).text,
    attrs={"class":"table table-hover persist-area"}
)
dfs[0].to_csv("Arsenal_players.csv")
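If the 403 really is caused by user-agent filtering, which is only a guess here, one option is to send a browser-like User-Agent header with requests and hand the response text to read_html. This is a sketch that has not been tested against the real site: the header value and the helper names fetch_html and html_table_to_csv are illustrative assumptions, and the site may reject requests for other reasons.

```python
from io import StringIO

import pandas as pd
import requests

# Assumed header value; a browser-like string may or may not be accepted.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_html(url, timeout=10):
    """Download a page and raise on HTTP errors such as 403."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()
    return response.text


def html_table_to_csv(html, csv_path, table_class):
    """Parse the first table with the given class attribute and write it to CSV."""
    dfs = pd.read_html(StringIO(html), attrs={"class": table_class})
    dfs[0].to_csv(csv_path, index=False)
    return dfs[0]


# Usage (needs network access, so it is left commented out):
# html = fetch_html(link)
# html_table_to_csv(html, "Arsenal_players.csv", "table table-hover persist-area")
```

Keeping the download and the parsing in separate functions means the parsing step can be checked on saved or inline HTML without touching the network.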

Answer 1
1 2020-03-04 12:33:13

Answer 2
1 Accepted 2020-03-04 12:52:07
