
Web Scraping with Node.js

Can someone give me an example of how to scrape multiple pages with Node.js? I have found examples online, but I want to scrape a search results page. Then, if there is a "next" button that leads to more results, I want to move to the next page and scrape it as well.

Has anyone done something similar?

Thanks!

I got something like this working with nightmare.js. It lets you call click('#someElement') and wait('#someElement'), and read the page content between those actions using evaluate. This only works on websites that allow that kind of automation. Note that you may need a while loop driven by nightmare.exists, or a for loop driven by the page count. For the count, you can use a query selector that returns all matching elements, for example document.querySelectorAll('.nextPageElement').length. Keep your variables in an outer scope where needed and avoid falling into callback hell, and nightmare.js will do the job.
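The loop described above could look roughly like the sketch below. It is not a definitive implementation: the function name scrapeAllPages, the option names, and the maxPages safety limit are made up for illustration, and the selectors depend entirely on the target site. The browser argument is assumed to be a Nightmare instance that has already been pointed at the first results page with goto; only wait, evaluate, exists, and click are called on it.

```javascript
// Sketch only: `browser` is assumed to be a Nightmare instance that has
// already navigated to the first results page. The function name, option
// names, and selectors are hypothetical.
async function scrapeAllPages(browser, { itemSelector, nextSelector, maxPages = 10 }) {
  const results = []; // declared outside the loop so every page can append to it

  for (let page = 1; page <= maxPages; page++) {
    // Wait until at least one result element is present on the page.
    await browser.wait(itemSelector);

    // Runs inside the page: collect the trimmed text of every matching element.
    const items = await browser.evaluate(function (sel) {
      return Array.from(document.querySelectorAll(sel)).map(function (el) {
        return el.textContent.trim();
      });
    }, itemSelector);
    results.push(...items);

    // Stop when there is no "next" button left to click.
    const hasNext = await browser.exists(nextSelector);
    if (!hasNext) break;

    await browser.click(nextSelector);
  }

  return results;
}
```

With Nightmare installed, this would be called with something like Nightmare().goto(searchUrl) as the first argument, followed by end() on the instance once the results are returned. One caveat: if the site swaps results in place without a full page load, wait(itemSelector) may return before the new results appear, so a more specific wait condition may be needed.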
