
Web Scraping with Node.js

Posted 2018-03-06 18:09:38 · Tags: javascript, node.js, web-scraping

Could someone give me an example of how to scrape multiple pages with Node.js? I have found examples online, but I specifically want to scrape a search-results page. If that page has a "next" button that leads to more results, I want to move to the next page and scrape it as well.

Has anyone done something similar?

Thanks!

1 answer

I got something like this working with nightmare.js. It lets you click('#someElement') and wait('#someElement'), and you can read the page content between those actions with evaluate. This only works on sites that let you navigate this way, i.e. where the next page is reached by clicking an element such as a "next" link or button.

To walk through all the pages you will probably need either a while loop that checks nightmare.exists for the next-page element, or a for loop over the page count. To get that count, run a selector that matches every pagination element inside evaluate, e.g. document.querySelectorAll('.nextPageElement').length.

Keep shared state in variables lifted to an outer scope when you need it across steps, and avoid callback hell; with that, nightmare.js will do the job.
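A minimal sketch of the two loop styles described above, written with async/await to stay out of callback hell. It does not call Nightmare directly: it assumes a hypothetical page adapter with extractItems, hasNext, goNext and countPages, so the control flow can run without a browser. With Nightmare, those methods would presumably wrap nightmare.evaluate(...), nightmare.exists('.nextPageElement'), nightmare.click(...).wait(...) and an evaluate returning document.querySelectorAll(...).length. createFakePage is likewise a hypothetical in-memory stand-in for a real search-results site.

```javascript
// ASSUMPTION: `page` is a hypothetical adapter, not part of Nightmare's API.
//   extractItems() -> results on the current page
//   hasNext()      -> whether a "next" control exists
//   goNext()       -> navigate to the next page
//   countPages()   -> total number of pages

// Strategy 1: keep going while a "next" control exists (the while + exists approach).
async function scrapeWhileNextExists(page) {
  const results = []; // declared outside the loop so every page appends to it
  results.push(...await page.extractItems());
  while (await page.hasNext()) {
    await page.goNext();
    results.push(...await page.extractItems());
  }
  return results;
}

// Strategy 2: read the page count once, then loop a fixed number of times.
async function scrapeByPageCount(page) {
  const results = [];
  const pageCount = await page.countPages();
  for (let i = 0; i < pageCount; i++) {
    if (i > 0) await page.goNext(); // the first page is already loaded
    results.push(...await page.extractItems());
  }
  return results;
}

// Hypothetical in-memory site: each inner array is one page of results.
function createFakePage(pages) {
  let current = 0;
  return {
    extractItems: async () => pages[current].slice(),
    hasNext: async () => current < pages.length - 1,
    goNext: async () => { current += 1; },
    countPages: async () => pages.length,
  };
}

// Usage: both strategies collect the results from all three pages.
const site = [['a', 'b'], ['c'], ['d', 'e']];
scrapeWhileNextExists(createFakePage(site)).then((r) => console.log(r)); // [ 'a', 'b', 'c', 'd', 'e' ]
scrapeByPageCount(createFakePage(site)).then((r) => console.log(r));     // [ 'a', 'b', 'c', 'd', 'e' ]
```

With a real browser, goNext should wait for something that actually changes between pages, since waiting for a selector that is already present on the current page will likely return right away.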