
How to save scraped data to CSV using pandas

I want to save my scraped data to a CSV file using pandas, but I keep getting an error.

Here's my code:

import requests
from bs4 import BeautifulSoup
import pandas as pd

link = ("https://sofifa.com/team/1/arsenal/?&showCol%5B%5D=ae&showCol%5B%5D=hi&showCol%5B%5D=le&showCol%5B%5D=vl&showCol%5B%5D=wg&showCol%5B%5D=rc")
get_text = requests.get(link)
soup = BeautifulSoup(get_text.content, "lxml") 
table = soup.find("table", {"class":"table table-hover persist-area"})
table1 = table.get_text()

table1.to_csv("Arsenal_players.csv")

You should give more detail when asking a question, such as the exact error message you get; that makes it much easier to answer. Anyway, I ran your code and saw the error I expected. The table1 variable now contains only a string, because of

table1 = table.get_text()

so in your situation there is no method on it that will write all the data to a CSV file, but you can find help here. And remember, next time be precise about your problem.
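
For what it's worth, here is a minimal sketch of how the rows could be written without pandas, walking the table with BeautifulSoup and writing it with Python's built-in csv module. It assumes the usual tr/td row layout and is untested against that particular page:

import csv
import requests
from bs4 import BeautifulSoup

link = "https://sofifa.com/team/1/arsenal/?&showCol%5B%5D=ae&showCol%5B%5D=hi&showCol%5B%5D=le&showCol%5B%5D=vl&showCol%5B%5D=wg&showCol%5B%5D=rc"
soup = BeautifulSoup(requests.get(link).content, "lxml")
table = soup.find("table", {"class": "table table-hover persist-area"})

# Iterate over the table row by row instead of flattening it with get_text(),
# and write each row's cell texts as one CSV record.
with open("Arsenal_players.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    for row in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        if cells:
            writer.writerow(cells)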

You need to first read the HTML into a pandas dataframe using read_html, and then use to_csv to write it to a file. Here is an example:

import requests
from bs4 import BeautifulSoup
import pandas as pd

link = ("https://sofifa.com/team/1/arsenal/?&showCol%5B%5D=ae&showCol%5B%5D=hi&showCol%5B%5D=le&showCol%5B%5D=vl&showCol%5B%5D=wg&showCol%5B%5D=rc")
get_text = requests.get(link)
soup = BeautifulSoup(get_text.content, "lxml")
table = soup.find("table", {"class":"table table-hover persist-area"})

# produces a list of dataframes from the html, see docs for more options
dfs = pd.read_html(str(table)) 
dfs[0].to_csv("Arsenal_players.csv")
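
If you want to see what read_html extracted before saving, you could preview the first dataframe and drop the pandas row index when writing; this is just an optional usage note:

print(dfs[0].head())  # preview the first few parsed rows
dfs[0].to_csv("Arsenal_players.csv", index=False)  # index=False omits the pandas row numbers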

The read_html method has quite a few options that can change the behavior. You can also use it to read your link directly instead of first using requests/BeautifulSoup (it can do that under the hood).
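
For instance (a sketch only; the parameter choices here are illustrative, not taken from the answer above), you could pass some of those options when parsing the extracted table:

# header=0 uses the first table row as the column names;
# flavor selects the underlying HTML parser backend.
dfs = pd.read_html(str(table), flavor="lxml", header=0)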

It might look something like this, but this is untested because that link gives a 403 forbidden when I do this (perhaps they are blocking based on user agent):

dfs = pd.read_html(link, attrs={"class":"table table-hover persist-area"})

EDIT: since read_html doesn't allow you to specify a user agent, I believe this will end up being the most concise way for this particular link:

dfs = pd.read_html(
    requests.get(link).text,
    attrs={"class":"table table-hover persist-area"}
)
dfs[0].to_csv("Arsenal_players.csv")
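
If the plain requests call is ever blocked as well, you could try sending a browser-like User-Agent header. This is only a guess at why the site returns 403, and the header value is an arbitrary example:

# Assumption: the 403 is based on the user agent, so spoof a browser-like one.
headers = {"User-Agent": "Mozilla/5.0"}
dfs = pd.read_html(
    requests.get(link, headers=headers).text,
    attrs={"class": "table table-hover persist-area"}
)
dfs[0].to_csv("Arsenal_players.csv")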
