
How to scrape data from a specific table in HTML using BeautifulSoup, Requests, Python?

Here is the code I currently have:

from bs4 import BeautifulSoup

import requests

url  = requests.get("http://eiupanthers.com/boxscore.aspx?path=baseball&id=5065").content

soup = BeautifulSoup(url, 'html.parser')

table = soup.find('table', {'class': 'sidearm-table play-by-play'})

My table variable always comes back empty (None). This may merely be a syntax issue. I am very proficient in Matlab, but I am fairly new to Python, BeautifulSoup, Requests, etc.

Any pointers would be much appreciated.

I am mainly attempting to extract the data from the play-by-play tables so that I can parse this data in an alternative program and assemble data structures for individual players. This part I am quite confident I can accomplish once I assemble the data.

Thanks for any help!

from bs4 import BeautifulSoup

import requests

# Send a browser-like User-Agent with the request.
header = {'User-agent' : 'Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5'}

# .text holds the decoded HTML of the response.
html = requests.get("http://eiupanthers.com/boxscore.aspx?path=baseball&id=5065", headers=header).text

soup = BeautifulSoup(html, 'html.parser')
# Find the first table whose class attribute is exactly "sidearm-table play-by-play".
table = soup.find('table', {'class': 'sidearm-table play-by-play'})

print(table)

The issue seems to be that the website expects browser-like headers. Requests sends its own default User-Agent (python-requests/...), so even though the module is otherwise well supported, you have to pass a custom header such as the one in the code above.
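Once the table is found, the rows still need to be pulled out before they can be parsed elsewhere. Below is a minimal sketch of one way to do that. The SAMPLE_HTML string and the extract_play_by_play helper are hypothetical and only stand in for the downloaded page; the real markup on the site may be laid out differently, so treat this as a starting point rather than a tested solution for that page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<table class="sidearm-table play-by-play">
  <caption>Top of 1st</caption>
  <tr><th>Play</th><th>Score</th></tr>
  <tr><td>Smith singled to left field.</td><td>0-0</td></tr>
  <tr><td>Jones struck out swinging.</td><td>0-0</td></tr>
</table>
<table class="sidearm-table">
  <tr><td>Not a play-by-play table</td></tr>
</table>
"""


def extract_play_by_play(html):
    """Return one list of rows per play-by-play table.

    Each row is a list of the text of its <th>/<td> cells.
    """
    soup = BeautifulSoup(html, "html.parser")
    tables = []
    # The CSS selector matches tables that carry both classes, in any order.
    for table in soup.select("table.sidearm-table.play-by-play"):
        rows = []
        for tr in table.find_all("tr"):
            cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
            if cells:  # skip rows that contain no cells
                rows.append(cells)
        tables.append(rows)
    return tables


if __name__ == "__main__":
    for rows in extract_play_by_play(SAMPLE_HTML):
        for row in rows:
            print(row)
```

With the real page, you would pass the text of the response (fetched with the header shown above) to extract_play_by_play instead of SAMPLE_HTML. Using select with a CSS selector instead of find with a class string means the match does not depend on the order of the class names.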

