I am trying to scrape table data from this link:
http://bet.hkjc.com/racing/pages/odds_wp.aspx?date=30-01-2017&venue=ST&raceno=2&lang=en
Here is my code:
from lxml import html
import webbrowser
import re
import xlwt
import requests
import bs4
content = requests.get("http://bet.hkjc.com/racing/pages/odds_wp.aspx?date=30-01-2017&venue=ST&raceno=1&lang=en").text # Get page content
soup = bs4.BeautifulSoup(content, 'lxml') # Parse page content
table = soup.find('div', {'id': 'detailWPTable'}) # Locate that table tag
rows = table.find_all('tr') # Find all row tags in that table
for row in rows:
    columns = row.find_all('td') # Find all data tags in each column
    print ('\n')
    for column in columns:
        print (column.text.strip(),end=' ') # Output data in each column
It is not giving any output. Please help!
I just wanted to mention that the id you are using is for the wrapping div, not for the child table element.
Maybe you could try something like:
wrapper = soup.find('div', {'id': 'detailWPTable'})
table_body = wrapper.table.tbody
rows = table_body.find_all('tr')
But thinking about it, the tr elements are also descendants of the wrapping div, so find_all should still find them.
Update: adding tbody
Update: Sorry, I'm not allowed to comment yet :). Are you sure you have the correct document? Have you checked the whole soup to confirm that the tags are actually there?
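For that check, something like the following could help: it reports whether the wrapper div exists and how many tr tags sit inside it. This is only a sketch; count_rows is a made-up helper name, and the two HTML strings are invented stand-ins for the raw response and the browser-rendered page, not the real markup.

```python
from bs4 import BeautifulSoup

def count_rows(html, wrapper_id="detailWPTable"):
    """Return the number of <tr> tags inside the wrapper div, or None if the div is missing."""
    soup = BeautifulSoup(html, "html.parser")
    wrapper = soup.find("div", {"id": wrapper_id})
    if wrapper is None:
        return None
    return len(wrapper.find_all("tr"))

# Invented sample markup (assumption), not copied from the real page
raw_html = '<div id="detailWPTable"></div>'
rendered_html = ('<div id="detailWPTable"><table><tbody>'
                 '<tr><td>1</td><td>4.5</td></tr>'
                 '</tbody></table></div>')

print(count_rows(raw_html))             # wrapper present but empty
print(count_rows(rendered_html))        # wrapper with one row
print(count_rows("<p>no wrapper</p>"))  # wrapper missing altogether
```

With the real page you would pass in requests.get(url).text. A count of 0 would mean the div is there but its rows are filled in later by a script.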
And I guess all those lines could be written as:
rows = soup.find('div', {'id': 'detailWPTable'}).find('tbody').find_all('tr')
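One caveat with chaining like that: find returns None when a tag is missing, so the next call in the chain raises AttributeError. A CSS selector through select is one way around it, since it simply returns an empty list when nothing matches. A sketch with invented sample HTML (find_rows is a hypothetical name):

```python
from bs4 import BeautifulSoup

def find_rows(html):
    """Return every <tr> nested anywhere under the element with id detailWPTable."""
    soup = BeautifulSoup(html, "html.parser")
    # select never raises on a missing tag; it returns an empty list instead
    return soup.select("#detailWPTable tr")

# Invented sample markup (assumption), not the real page
empty_html = '<div id="detailWPTable"></div>'
filled_html = ('<div id="detailWPTable"><table>'
               '<tr><td>1</td></tr><tr><td>2</td></tr>'
               '</table></div>')

print(len(find_rows(empty_html)))
print(len(find_rows(filled_html)))
```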
Update: Yeah, the wrapper div is empty. So it seems that you don't get what's being generated by JavaScript, like the other guy said. Maybe you should try Selenium as he suggested? Possibly PhantomJS as well.
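A rough idea of what the Selenium route could look like, untested against this site. It assumes Firefox with geckodriver is installed and that the table sits in the top-level document rather than inside a frame, which I have not confirmed. fetch_rendered_html and extract_rows are made-up helper names.

```python
from bs4 import BeautifulSoup

URL = "http://bet.hkjc.com/racing/pages/odds_wp.aspx?date=30-01-2017&venue=ST&raceno=1&lang=en"

def fetch_rendered_html(url, timeout=15):
    """Load the page in a real browser so its JavaScript runs, then return the resulting HTML."""
    # Imported here so extract_rows below can be used without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumption: Firefox and geckodriver are available
    try:
        driver.get(url)
        # Wait until at least one row appears inside the wrapper div
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "#detailWPTable tr"))
        )
        return driver.page_source
    finally:
        driver.quit()

def extract_rows(html, wrapper_id="detailWPTable"):
    """Return the table as a list of rows, each row a list of stripped cell texts."""
    soup = BeautifulSoup(html, "html.parser")
    wrapper = soup.find("div", {"id": wrapper_id})
    if wrapper is None:
        return []
    rows = []
    for tr in wrapper.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # skip header rows that contain only <th> cells
            rows.append(cells)
    return rows
```

You would then call extract_rows(fetch_rendered_html(URL)) and print each row. The parsing part is the same as in the question; only the way the HTML is obtained changes.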
I'm looking at the last line of your code:
print (column.text.strip(),end=' ') # Output data in each column
Are you sure that should read column.text? Maybe you could try column.strings or column.get_text(), or even column.stripped_strings.
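For comparison, here is how those accessors behave on a single cell. In Beautiful Soup, text is a property that returns the same string as get_text(), so switching between the two should not change the output by itself. The cell markup below is invented for illustration.

```python
from bs4 import BeautifulSoup

# Invented sample cell (assumption), with nested tags and stray whitespace
cell = BeautifulSoup("<td> <b>3</b> <span> 4.5 </span></td>", "html.parser").td

print(repr(cell.text))                  # all text joined, whitespace kept
print(repr(cell.get_text()))            # same string as cell.text
print(list(cell.strings))               # every text fragment, unstripped
print(list(cell.stripped_strings))      # fragments stripped, blank ones dropped
print(cell.get_text(" ", strip=True))   # stripped fragments joined by a space
```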
You can try it with dryscrape, like so:
import dryscrape
from bs4 import BeautifulSoup as BS
import re
import xlwt
ses=dryscrape.Session()
ses.visit("http://bet.hkjc.com/racing/pages/odds_wp.aspx?date=30-01-2017&venue=ST&raceno=1&lang=en")
soup = BS(ses.body(), 'lxml') # Parse page content
table = soup.find('div', {'id': 'detailWPTable'}) # Locate that table tag
rows = table.find_all('tr') # Find all row tags in that table
for row in rows:
    columns = row.find_all('td') # Find all data tags in each column
    print ('\n')
    for column in columns:
        print (column.text.strip())