I want to save my scraped data to a CSV file using pandas, but I keep getting an error.
Here's my code:
import requests
from bs4 import BeautifulSoup
import pandas as pd
link = ("https://sofifa.com/team/1/arsenal/?&showCol%5B%5D=ae&showCol%5B%5D=hi&showCol%5B%5D=le&showCol%5B%5D=vl&showCol%5B%5D=wg&showCol%5B%5D=rc")
get_text = requests.get(link)
soup = BeautifulSoup(get_text.content, "lxml")
table = soup.find("table", {"class":"table table-hover persist-area"})
table1 = table.get_text()
table1.to_csv("Arsenal_players.csv")
You should include more detail when asking a question, such as the exact error message you get; that makes it much easier to answer. Anyway, I ran your code and saw the error I expected. The table1 variable is just a plain string, because of
table1 = table.get_text()
and strings have no to_csv method, so there is no built-in way to write the data to a CSV from there. You could instead extract the rows yourself and write them out with Python's built-in csv module. But please remember to be precise about your problem next time.
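To illustrate the csv-module route: this is a minimal sketch that extracts each row as a list of cell strings instead of one flattened string. The inline HTML here is a stand-in for the scraped page (the player names and columns are made up); with the real site you would use the soup.find(...) call from the question instead.

```python
import csv
from bs4 import BeautifulSoup

# Stand-in for the real page; replace with the soup.find(...) result in practice
html = """<table class="table table-hover persist-area">
<tr><th>Name</th><th>Age</th></tr>
<tr><td>Saka</td><td>21</td></tr>
<tr><td>Odegaard</td><td>24</td></tr>
</table>"""

table = BeautifulSoup(html, "html.parser").find("table")

# One list of cell strings per <tr>, instead of table.get_text()'s single blob
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in table.find_all("tr")
]

with open("players.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```

This avoids pandas entirely, which is fine for a single table but loses the type inference and convenience that read_html gives you.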
You need to first read the HTML into a pandas DataFrame using read_html, and then use to_csv to write it to a file. Here is an example:
import requests
from bs4 import BeautifulSoup
import pandas as pd
link = ("https://sofifa.com/team/1/arsenal/?&showCol%5B%5D=ae&showCol%5B%5D=hi&showCol%5B%5D=le&showCol%5B%5D=vl&showCol%5B%5D=wg&showCol%5B%5D=rc")
get_text = requests.get(link)
soup = BeautifulSoup(get_text.content, "lxml")
table = soup.find("table", {"class":"table table-hover persist-area"})
# read_html produces a list of dataframes from the html, see docs for more options;
# wrapping the string in StringIO avoids the deprecation warning that newer
# pandas versions raise for literal HTML strings
from io import StringIO
dfs = pd.read_html(StringIO(str(table)))
dfs[0].to_csv("Arsenal_players.csv")
The read_html method has quite a few options that change its behavior. You can also point it at your link directly instead of going through requests/BeautifulSoup first (it fetches the page under the hood).
It might look something like this, but this is untested because that link gives a 403 forbidden when I do this (perhaps they are blocking based on user agent):
dfs = pd.read_html(link, attrs={"class":"table table-hover persist-area"})
EDIT: since read_html doesn't allow you to specify a user agent, I believe this will end up being the most concise way for this particular link:
dfs = pd.read_html(
requests.get(link).text,
attrs={"class":"table table-hover persist-area"}
)
dfs[0].to_csv("Arsenal_players.csv")
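To make the read_html → to_csv flow concrete without depending on the live site (which returns 403 here), this sketch runs the same attrs filter against a small inline table. The HTML and player data are stand-ins; note that read_html always returns a *list* of DataFrames, even when only one table matches.

```python
from io import StringIO
import pandas as pd

# Stand-in HTML with the same class attribute the answer filters on
html = """<table class="table table-hover persist-area">
<tr><th>Name</th><th>Age</th></tr>
<tr><td>Saka</td><td>21</td></tr>
</table>"""

# attrs restricts parsing to tables whose attributes match
dfs = pd.read_html(StringIO(html), attrs={"class": "table table-hover persist-area"})

# dfs is a list; take the first match, and drop the row-number index column
dfs[0].to_csv("Arsenal_players.csv", index=False)
```

Passing index=False keeps the CSV free of the unnamed index column that to_csv writes by default, which is usually what you want for scraped tables.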