I am currently working on a project to find empty classrooms in our school in real time. For that purpose, I need to extract the substitutions published on our school page (https://ssnovohradska.edupage.org/substitution/), since they may contain additional changes.
But when I fetch the HTML source and parse it with bs4, it cannot find the divs (class: "section print-nobreak") that contain the substitution text. When I looked at the page source (Ctrl+U), I found that it contains only JavaScript that prints everything directly.
Is there any way to extract the HTML after the JavaScript output has already been rendered?
Thanks for the help!
Parsing HTML is unfortunately necessary to solve your problem. But I will explain how to find ways to avoid that in your future projects (not based on this website).
ReactDOM.render(React.createElement(
in the code. They're providing an HTML string to the createElement call, so I would suggest looking into the AJAX way of doing things.__gsh
) to not fail. So, going back to step 1: it seems our only solution is to use regular expressions to find the text between "report_html":"<div class and </div></div></div> in the source code, if you're interested in today's date only. If you want the contents for tomorrow or any other date, you will need to either fetch the page, save the cookies, find the token to supply to the request and then make that request, or use something like puppeteer or pyppeteer (since you've mentioned BS4) and load the webpage in that. If you aren't fetching the data that often, you should be fine overall.
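The regular-expression approach for today's date could be sketched roughly as follows. This is only a minimal sketch under a few assumptions: that the page source embeds the markup as a JSON string literal under the "report_html" key (as the snippet above suggests), and that the value can therefore be unescaped with json.loads. It is a slight variation on matching up to </div></div></div>: it matches the whole quoted value instead. The function name extract_report_html and the sample string are made up for illustration; in practice page_source would be the text you downloaded from the substitution page.

```python
import json
import re

# Matches "report_html":"..." and captures the value as a JSON string literal.
# Escaped characters such as \" or \/ are consumed by the \\. alternative.
REPORT_RE = re.compile(r'"report_html"\s*:\s*("(?:[^"\\]|\\.)*")')


def extract_report_html(page_source):
    """Return the unescaped report HTML, or None if the key is not found."""
    match = REPORT_RE.search(page_source)
    if match is None:
        return None
    # json.loads turns the escaped literal back into plain HTML;
    # strict=False tolerates raw control characters inside the string.
    return json.loads(match.group(1), strict=False)


# Hypothetical stand-in for the downloaded page source.
sample = r'ReactDOM.render(React.createElement(X, {"report_html":"<div class=\"section print-nobreak\">Room 12<\/div>","other":1}))'

print(extract_report_html(sample))
```

The returned string is ordinary HTML, so it can be passed to BeautifulSoup and searched for the "section print-nobreak" divs in the usual way.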