I'm downloading football data with the pandas read_html function, but I'm struggling to clean the player names because of all the accented characters. This is what I have so far:
import pandas as pd
from unidecode import unidecode
shooting = pd.read_html("https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F9%2Fshooting%2FPremier-League-Stats&div=div_stats_shooting")
for idx,table in enumerate(shooting):
    print("***************************")
    print(idx)
    print(table)
shooting = table
for col in [('Unnamed: 1_level_0', 'Player')]:
    shooting[col] = shooting[col].apply(unidecode)
shooting
shooting = table
#print(shooting.droplevel(1))
table.to_csv (r'C:\Users\khabs\OneDrive\Documents\Python Testing\shooting.csv', index = False, header=True)
print (shooting)
I think the issue is that the encoding is already messed up before I even do the cleaning, but I'm really not sure.
Any help would be greatly appreciated!!
Just use the encoding parameter in pandas.
import pandas as pd
url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F9%2Fshooting%2FPremier-League-Stats&div=div_stats_shooting"
shooting = pd.read_html(url, header=1, encoding='utf8')[0]
However, I'm assuming that still will not get you what you want, as there are extra HTML characters in the response returned from that widget.
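As a side illustration of why the encoding parameter matters, here is a minimal sketch of how accented names get garbled. It assumes the symptom in the question is UTF-8 bytes being decoded with a single-byte codec such as Latin-1; that is a guess about the cause, not something confirmed for this widget. The sample name is taken from the output further down.

```python
# A player name encoded as UTF-8 bytes, as a server would typically send it.
name = "Rayan Aït Nouri"
raw_bytes = name.encode("utf-8")

# Decoding with the wrong codec (assumed here: Latin-1) gives mojibake:
# the single accented character becomes two unrelated characters.
garbled = raw_bytes.decode("latin-1")

# Decoding with the matching codec recovers the original text.
correct = raw_bytes.decode("utf-8")

print(garbled)  # Rayan AÃ¯t Nouri
print(correct)  # Rayan Aït Nouri
```

If the text is already garbled like this when unidecode sees it, unidecode has no way to recover the intended letters, which is why the decoding has to be fixed first.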
Instead, go after the actual HTML of the page. The table is inside HTML comments, so strip the comment markers before parsing.
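To see why the comment markers get in the way, here is a small standard-library sketch. The `sample` fragment and the `TableCounter` helper are made up for illustration; they only mimic a page that wraps its table in a comment.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts the <table> start tags the parser actually sees."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1


def count_tables(html):
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.tables


# Hypothetical page fragment: the table is wrapped in an HTML comment.
sample = "<div><!--<table><tr><td>Che Adams</td></tr></table>--></div>"

# Inside a comment, the table markup is treated as comment text, not as tags.
hidden = count_tables(sample)

# Removing the comment markers turns the same markup into a real table.
exposed = count_tables(sample.replace("<!--", "").replace("-->", ""))

print(hidden, exposed)  # 0 1
```

The same two replace calls are what expose the table on the real page: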
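import requests
import pandas as pd
from io import StringIO
url = 'https://fbref.com/en/comps/9/shooting/Premier-League-Stats'
# The stats table is inside an HTML comment, so remove the comment markers first.
html = requests.get(url).text.replace('<!--', '').replace('-->', '')
# Wrap the string in StringIO: newer pandas versions deprecate passing literal HTML.
shooting = pd.read_html(StringIO(html), header=1)[-1]
# Drop the header rows that are repeated inside the table body.
shooting = shooting[shooting['Rk'].ne('Rk')]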
Output:
print(shooting.head(10))
   Rk                  Player   Nation    Pos  ...  npxG/Sh  G-xG  np:G-xG  Matches
0   1        Brenden Aaronson   us USA  FW,MF  ...     0.03  -0.1     -0.1  Matches
1   2               Che Adams  sct SCO     FW  ...     0.09  +1.6     +1.6  Matches
2   3             Tyler Adams   us USA     MF  ...     0.01   0.0      0.0  Matches
3   4        Tosin Adarabioyo  eng ENG     DF  ...      NaN   0.0      0.0  Matches
4   5         Rayan Aït Nouri   fr FRA     DF  ...     0.08  -0.1     -0.1  Matches
5   6              Nathan Aké   nl NED     DF  ...     0.05  -0.2     -0.2  Matches
6   7        Thiago Alcántara   es ESP     MF  ...      NaN   0.0      0.0  Matches
7   8  Trent Alexander-Arnold  eng ENG     DF  ...     0.05  -0.2     -0.2  Matches
8   9                 Alisson   br BRA     GK  ...      NaN   0.0      0.0  Matches
9  10               Dele Alli  eng ENG  FW,MF  ...      NaN   0.0      0.0  Matches
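Once the names are decoded correctly, the accents can be stripped if plain ASCII names are still wanted. The question uses unidecode for this; the sketch below swaps in the standard library's unicodedata module instead, so it runs without extra packages. The `strip_accents` helper is a hypothetical name, and the sample names are taken from the output above.

```python
import unicodedata


def strip_accents(text):
    """Remove combining accent marks, keeping the base letters."""
    # NFKD splits an accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep every character that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


names = ["Rayan Aït Nouri", "Nathan Aké", "Thiago Alcántara"]
cleaned = [strip_accents(n) for n in names]
print(cleaned)  # ['Rayan Ait Nouri', 'Nathan Ake', 'Thiago Alcantara']
```

With the DataFrame above, something like shooting['Player'].apply(strip_accents) should do the same per row (or .apply(unidecode), as in the question). Note that this approach only removes combining marks; letters with no decomposition, such as "ø", pass through unchanged, whereas unidecode transliterates them.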