BeautifulSoup empty result

I'm currently running this code:

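    # Python 3: urlopen lives in urllib.request
    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    response = urlopen("http://www.fifacoin.com/")
    html = response.read()

    # Name the parser explicitly to avoid bs4's "no parser specified" warning
    soup = BeautifulSoup(html, "html.parser")
    # Print the data-price attribute of every <tr> that has one
    for item in soup.find_all('tr', {'data-price': True}):
        print(item['data-price'])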

When I run this code I get no output at all, even though I know that page contains HTML tags matching these search parameters. I'm probably making an obvious mistake here; I'm new to Python and BeautifulSoup.

The problem is that the price list table is loaded through JavaScript, and urllib does not include a JavaScript engine. The scripts on that page, which a normal browser executes, are never run on the HTML that urllib fetches, so the tr elements with a data-price attribute do not exist in what BeautifulSoup parses. To get them you need to emulate a real browser. Solutions that come to mind are PhantomJS and Node.js.
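To see the effect without touching the network, here is a minimal sketch. Everything in it is made up for illustration: the two HTML strings, the class PriceRowCollector, and the helper find_prices are hypothetical, and the real page's markup may look quite different. It uses the standard library's html.parser so it runs without extra packages; BeautifulSoup would see the same thing, since it also parses only the markup it is given.

```python
from html.parser import HTMLParser

# Hypothetical page as the server sends it: the table body is empty and a
# script is supposed to add the <tr data-price="..."> row in the browser.
RAW_HTML = """
<html><body>
<table id="prices"><tbody></tbody></table>
<script>
  var row = document.createElement("tr");
  row.setAttribute("data-price", "9.99");
  document.querySelector("#prices tbody").appendChild(row);
</script>
</body></html>
"""

# Hypothetical DOM a browser would hold after running that script.
RENDERED_HTML = """
<html><body>
<table id="prices"><tbody>
<tr data-price="9.99"></tr>
</tbody></table>
</body></html>
"""


class PriceRowCollector(HTMLParser):
    """Collect the data-price attribute of every <tr> start tag."""

    def __init__(self):
        super().__init__()
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            for name, value in attrs:
                if name == "data-price":
                    self.prices.append(value)


def find_prices(html):
    """Return the data-price values found in the given markup."""
    parser = PriceRowCollector()
    parser.feed(html)
    parser.close()
    return parser.prices


print(find_prices(RAW_HTML))       # [] : the script is never executed
print(find_prices(RENDERED_HTML))  # ['9.99']
```

The first call corresponds to what urllib hands to the parser; the second corresponds to what a browser (or a headless browser) would expose after the scripts have run.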

I recently did a similar thing with Node.js (although I am a Python fan as well) and was pleasantly surprised. I did it a little differently, but this page seems to explain quite well what you would want to do: http://liamkaufman.com/blog/2012/03/08/scraping-web-pages-with-jquery-nodejs-and-jsdom/
