
Cannot scrape with BeautifulSoup and urllib because of a JavaScript variable

Unfortunately I am a newbie with BeautifulSoup and urllib, so I might not even be asking this correctly. I need to extract some data from a website, www.example.com, which displays a random message.

The problem is that the message is displayed only after the user presses a button; otherwise the page shows a general message like "press the button to see the message".

After searching Stack Overflow I realised that there is probably no way to change the variables by calling the URL from my browser like this: www.example.com/?showRandomMsg='true'

In some threads I read that maybe I can do it with bookmarklets.

Is there any way to use bookmarklets with BeautifulSoup or urllib in order to access the website and make it display a random message?

Thanks in advance! :D

I came back after a long time just to quickly answer my own question.

I found many solutions and tutorials on the web, and most of them suggested using Selenium and XPath, but this method was more complex than I needed.

So I ended up using Selenium ONLY to emulate the browser (Firefox in my case) and grab the HTML after the page had loaded completely.

After that I still used BeautifulSoup to parse the HTML (which now included the JavaScript-generated data too).
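For anyone who wants to try the same thing, here is a rough sketch of that approach, not my exact code. The function names, the `button_selector` parameter and the CSS selectors are made up for illustration, and the button click is an assumption on my part for pages that only reveal the message after a click. It assumes Selenium 4, Firefox with geckodriver, and the `beautifulsoup4` package.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, button_selector=None):
    """Load url in Firefox via Selenium and return the HTML after JavaScript has run."""
    # Imported here so the parsing helper below also works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()  # needs Firefox and geckodriver to be available
    try:
        driver.get(url)  # with default settings, blocks until the page has loaded
        if button_selector is not None:
            # Hypothetical step: press the button that reveals the message.
            driver.find_element(By.CSS_SELECTOR, button_selector).click()
        return driver.page_source  # the DOM as the browser currently sees it
    finally:
        driver.quit()  # always close the browser, even if something failed


def extract_message(html, selector):
    """Return the stripped text of the first element matching selector, or None."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.select_one(selector)
    return element.get_text(strip=True) if element is not None else None
```

Usage would be something like `extract_message(fetch_rendered_html("http://www.example.com", button_selector="#show-message"), "#message")`, where both selectors are placeholders to be replaced with whatever the real page uses. If the message only appears some time after the click, an explicit wait (for example Selenium's `WebDriverWait`) would probably be needed before reading `page_source`.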
