
Getting the child element of a particular div element using Beautiful Soup

I am trying to scrape table data from this link

http://bet.hkjc.com/racing/pages/odds_wp.aspx?date=30-01-2017&venue=ST&raceno=2&lang=en

Here is my code

from lxml import html
import webbrowser
import re
import xlwt
import requests
import bs4

content = requests.get("http://bet.hkjc.com/racing/pages/odds_wp.aspx?date=30-01-2017&venue=ST&raceno=1&lang=en").text # Get page content
soup = bs4.BeautifulSoup(content, 'lxml') # Parse page content 

table = soup.find('div', {'id': 'detailWPTable'}) # Locate that table tag

rows = table.find_all('tr') # Find all row tags in that table

for row in rows:
    columns = row.find_all('td') # Find all data tags in each column
    print ('\n')
    for column in columns:
        print (column.text.strip(),end=' ') # Output data in each column

It is not giving any output. Please help!

(screenshot: the raw HTML returned by requests)

The table is generated by JavaScript, and requests will only return the raw HTML source, as the picture shows.

Use Selenium.

I just wanted to mention that the id you are using is for the wrapping div, not for the child table element.

Maybe you could try something like:

wrapper = soup.find('div', {'id': 'detailWPTable'})
table_body = wrapper.table.tbody
rows = table_body.find_all('tr')

But thinking about it, the tr elements are also descendants of the wrapping div, so find_all should still find them.
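This can be checked with a toy snippet (the markup and cell values below are made up for illustration, not taken from the real page):

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the page's structure: a wrapping div
# with a table/tbody inside it.
html = """
<div id="detailWPTable">
  <table>
    <tbody>
      <tr><td>1</td><td>Horse A</td><td>1.5</td></tr>
      <tr><td>2</td><td>Horse B</td><td>2.8</td></tr>
    </tbody>
  </table>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
wrapper = soup.find('div', {'id': 'detailWPTable'})

rows_direct = wrapper.find_all('tr')               # searching the wrapper div
rows_chained = wrapper.table.tbody.find_all('tr')  # descending via table/tbody

# Both ways reach the same two rows, since find_all searches all descendants.
print(len(rows_direct), len(rows_chained))
```

So if the rows aren't found via the wrapper div, the problem is the document itself, not the traversal.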

Update: adding tbody

Update: sorry, I'm not allowed to comment yet :). Are you sure you have the correct document? Have you checked the whole soup to confirm the tags are actually there?

And I guess all those lines could be written as:

rows = soup.find('div', {'id': 'detailWPTable'}).find('tbody').find_all('tr')

Update: yeah, the wrapper div is empty, so it seems you don't get what's being generated by JavaScript, as the other answer said. Maybe you should try Selenium as suggested? Possibly PhantomJS as well.

I'm looking at the last line of your code:

print (column.text.strip(),end=' ') # Output data in each column

Are you sure that should read column.text? Maybe you could try column.strings or column.get_text(), or even column.stripped_strings.
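For reference, here is a small sketch of how those accessors differ on a cell with a nested tag and stray whitespace (the markup is made up):

```python
from bs4 import BeautifulSoup

# A hypothetical <td> with a nested tag and extra whitespace.
cell = BeautifulSoup('<td> <span>1</span>\n  Horse A </td>', 'html.parser').td

print(repr(cell.text))              # raw concatenation, whitespace kept
print(repr(cell.get_text()))        # same result as .text
print(list(cell.strings))           # the individual string pieces
print(list(cell.stripped_strings))  # pieces with surrounding whitespace removed
```

stripped_strings is handy for table cells because it drops the layout whitespace and yields only the visible text pieces.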

You can try it with dryscrape like so:

import dryscrape
from bs4 import BeautifulSoup as BS

ses = dryscrape.Session()
ses.visit("http://bet.hkjc.com/racing/pages/odds_wp.aspx?date=30-01-2017&venue=ST&raceno=1&lang=en")
soup = BS(ses.body(), 'lxml')  # Parse the rendered page content

table = soup.find('div', {'id': 'detailWPTable'})  # Locate the table's wrapping div

rows = table.find_all('tr')  # Find all row tags in that table

for row in rows:
    columns = row.find_all('td')  # Find all data tags in each row
    print('\n')
    for column in columns:
        print(column.text.strip())
