
How to crawl websites that use a front-end JS framework

We are currently trying Swiftype and want to know how to crawl our website, which uses a JavaScript framework and loads its content through async calls.

I created an engine and was able to run a crawl based on my sitemap, but instead of reading the actual content, it is reading my AngularJS code.

For example:

if I have Angular markup like <div ng-class='grey title'> {{ctrl.title}}</div>

and this data is bound on page load, the crawler does not read the title. It reads the literal code {{ctrl.title}}, so when I search, the result shows something like "This article is about {{ctrl.title}} . We take you through.... "

Any idea how to make the crawler work with JS frameworks?

You can use a "headless" browser such as PhantomJS. "Headless" means it has no GUI. Because it is a real browser, it executes the page's JavaScript, so the framework's bindings are resolved before you read the content. It can be started from server-side JavaScript. Check out "Web Scraping: Handling AJAX website part I" and the code on GitHub: introWebScraping.
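As a rough illustration of the idea, here is a minimal sketch of a PhantomJS script that loads a page, waits until no `{{...}}` placeholders remain in the DOM, and prints the rendered HTML. The helper names (`hasUnrenderedBindings`, `waitForRender`), the 5-second timeout, and the "no remaining `{{...}}`" check are my own assumptions, not part of PhantomJS or Swiftype; the check is only a heuristic and will time out on pages that legitimately contain `{{...}}` text. It is written in ES5 because PhantomJS does not support newer syntax. Also note that PhantomJS is no longer actively developed, so treat this as a sketch of the approach rather than a recommended setup.

```javascript
// Sketch only. Run inside PhantomJS, e.g.: phantomjs render.js <url>
// (the file name render.js is a placeholder).

// Heuristic: true when the HTML still contains an unrendered
// Angular-style binding such as {{ctrl.title}}.
function hasUnrenderedBindings(html) {
  return /\{\{[^{}]*\}\}/.test(html);
}

// Poll getHtml() until no bindings remain or timeoutMs has elapsed,
// then call done(html, ok). ok is false if we gave up on a timeout.
function waitForRender(getHtml, timeoutMs, intervalMs, done) {
  var start = Date.now();
  (function check() {
    var html = getHtml();
    var ok = !hasUnrenderedBindings(html);
    if (ok || Date.now() - start >= timeoutMs) {
      done(html, ok);
    } else {
      setTimeout(check, intervalMs);
    }
  })();
}

// This part only runs under PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var page = require('webpage').create();
  var url = system.args[1];

  page.open(url, function (status) {
    if (status !== 'success') {
      console.log('Failed to load ' + url);
      phantom.exit(1);
      return;
    }
    // page.content is the current DOM serialized as HTML,
    // i.e. what the page looks like after scripts have run.
    waitForRender(function () { return page.content; }, 5000, 100,
      function (html, ok) {
        console.log(html);
        phantom.exit(ok ? 0 : 2); // exit code 2: bindings still present
      });
  });
}
```

What you do with the rendered HTML, for example serving it to the crawler as a pre-rendered snapshot, depends on your setup and is outside this sketch.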


 