
When using CasperJS, is it possible to interact with the DOM of a loaded page before any inline or external Javascript is executed?

The situation I have is that I'm opening a page using CasperJS.

The page in question has some JavaScript (both inline and external) that removes several HTML elements from the document.

However, I want to be able to retrieve those elements using something like getElementsByXPath() within CasperJS before they are removed. Is this possible?

When I dump the value of getPageContent(), the elements are not in there. However, if I set casper.page.settings.javascriptEnabled = false; before opening the page, getPageContent() shows the raw HTML as it was before any JavaScript ran, and the missing HTML tags are there. The problem now is that disabling JavaScript also prevents any use of evaluate(), so I still can't retrieve the elements. I could probably do it with a regex of some sort on the raw content, but I was hoping for a cleaner method.

Any suggestions welcome!

I've never heard of anyone doing this, but I wouldn't say using a regex is a bad idea. I usually scrape with a combination of CasperJS XPath and Python regex. It works extremely well, and I personally don't think it's any messier than trying to intercept JavaScript before the page is loaded.
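For what it's worth, here is a minimal sketch of the regex route in plain JavaScript. It assumes you already have the raw HTML as a string (for example from getPageContent() with JavaScript disabled). The helper name extractElements and the sample markup are made up for illustration, and the pattern only copes with elements that are not nested inside another element of the same tag name.

```javascript
// Hypothetical helper: pull every <tag ...>...</tag> block out of a raw HTML string.
// Assumption: tagName is a plain tag name and matching elements are not
// nested inside elements of the same tag name.
function extractElements(html, tagName) {
    var pattern = new RegExp(
        '<' + tagName + '\\b[^>]*>([\\s\\S]*?)<\\/' + tagName + '>', 'gi');
    var results = [];
    var match;
    while ((match = pattern.exec(html)) !== null) {
        // match[0] is the whole element, match[1] is what sits between the tags
        results.push({ outerHTML: match[0], innerHTML: match[1] });
    }
    return results;
}

// Stand-in for the raw content you would get back with JavaScript disabled.
var rawHtml = '<html><body>' +
    '<div id="keep">stays</div>' +
    '<span class="promo">removed later</span>' +
    '<script>/* the page script that removes .promo */</script>' +
    '</body></html>';

var spans = extractElements(rawHtml, 'span');
console.log(spans[0].innerHTML);
```

A regex like this is brittle against nested or malformed markup, so treat it as a quick extraction for a page whose structure you already know, not as a general HTML parser.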

That being said, CasperJS lets you inject JavaScript into the page, and that code could use jQuery if it's available on the page you're requesting. The code below fires before anything is loaded. You actually have to go out of your way to add code that prevents it from firing before the page loads.

<script type='text/javascript'>
    alert("Stop that parsing!");
</script>
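If injected code really does run ahead of the page's own scripts, one thing it could do is keep a reference to each node before it is removed. The sketch below is only an illustration of that idea, not something I have run against a real page. installRemovalRecorder and FakeNode are hypothetical names, and FakeNode is just a stand-in so the snippet runs outside a browser. In a real page you would pass Node.prototype instead, and it would only catch removals that go through removeChild().

```javascript
// Hypothetical helper: wrap removeChild on a prototype so every removed
// node is pushed onto `log` before the original removeChild runs.
function installRemovalRecorder(proto, log) {
    var original = proto.removeChild;
    proto.removeChild = function (child) {
        log.push(child);                    // keep a reference before detaching
        return original.call(this, child);  // then remove as usual
    };
    // Return a function that undoes the wrapping.
    return function restore() { proto.removeChild = original; };
}

// Minimal stand-in for a DOM node (assumption: for demonstration only).
function FakeNode(name) { this.name = name; this.children = []; }
FakeNode.prototype.appendChild = function (child) {
    this.children.push(child);
    return child;
};
FakeNode.prototype.removeChild = function (child) {
    var index = this.children.indexOf(child);
    if (index !== -1) { this.children.splice(index, 1); }
    return child;
};

var removed = [];
var restore = installRemovalRecorder(FakeNode.prototype, removed);

var body = new FakeNode('body');
var banner = body.appendChild(new FakeNode('banner'));
body.removeChild(banner);   // what the page's own script would do

console.log(removed.length, body.children.length);
```

With CasperJS, the wrapping would presumably have to happen inside a page.evaluate() call made from a page.initialized handler, with the function body inlined because evaluate() serializes what it is given. I haven't verified that end to end, and it needs JavaScript left enabled.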


 