
Python: Extract text from website that is not in the raw HTML

I am scraping data from webpages and need to store that data (a bunch of strings) in a txt file. I already have code that does this for many websites, but I have hit a roadblock on a page where BeautifulSoup does not seem to work.

Take this website for example: http://www.vucommodores.com/gametracker/launch/gt_mbasebl.html?event=1530990&school=vand&sport=mbasebl&camefrom=&startschool=&

I want to be able to click on the play-by-play button and then extract the text for the 1st inning, 2nd inning, etc. Is anyone aware of a method to do so? Unlike all of my other examples, the text is not available in the raw HTML.

Thanks!

I don't think this is what BeautifulSoup is meant for. You can use Selenium for Python to interact with the page as if from a browser and simulate the click, then extract the text from the resulting HTML.

@Lgiro is right. If you want to manipulate page elements, for example to switch tabs or click buttons, you need to simulate a browser and inject JavaScript into the window. The best tool for this is Selenium. See the python-selenium docs.
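A rough sketch of that approach is below; it may need adjusting for the actual page. It assumes Selenium 4 and a local Chrome install. The function names, the `link_text` argument and the `container_css` selector are all hypothetical: inspect the live page in your browser's developer tools to find the real link text and the element that holds the inning text.

```python
def clean_lines(text):
    """Split rendered text into stripped, non-empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def scrape_after_click(url, link_text, container_css, timeout=15):
    """Open url in a real browser, click a link, and return the text that appears.

    link_text and container_css are placeholders (assumptions), not values
    taken from the page in the question.
    """
    # Imported here so clean_lines/save_lines work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Chrome is available locally
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        # PARTIAL_LINK_TEXT only matches <a> elements; use By.CSS_SELECTOR
        # or By.XPATH instead if the button is some other kind of element.
        wait.until(
            EC.element_to_be_clickable((By.PARTIAL_LINK_TEXT, link_text))
        ).click()
        # Wait until the (hypothetical) container is visible, then read
        # its text as rendered by the browser.
        container = wait.until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, container_css))
        )
        return clean_lines(container.text)
    finally:
        driver.quit()  # always close the browser, even on errors


def save_lines(lines, path):
    """Write one string per line to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

You would call something like `save_lines(scrape_after_click(url, "Play-by-Play", "#plays"), "game.txt")`, where both the link text and the `#plays` selector are guesses to be replaced. If you would rather keep your existing BeautifulSoup code, you could return `driver.page_source` instead and parse that, since it reflects the page after the JavaScript has run.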
