
Recursive Facebook Page Webscraper with Selenium & Node.js

What I'm trying to do is loop through an array of Facebook page IDs and return the source code of each event page. Unfortunately, I only get the source of the last page ID in the array, repeated as many times as there are elements in the array. For example, with 3 IDs in the array I get the source of the last page 3 times.

I have already experimented with async/await, but without success.

The expected outcome would be the source of each page. Thank you for any help and examples.

    //Looping through pages
    pages.forEach(function (page) {
        //Creating URL
        let url = "https://mbasic.facebook.com/" + page + "?v=events";
        //Getting URL
        driver.get(url).then(function () {
            //Page loaded
            driver.getPageSource().then(function (result) {
                console.log(result);
            });
        });
    });
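For context, a sequential version of the same loop using async/await would look roughly like the sketch below (assuming the selenium-webdriver Node bindings and a locally installed ChromeDriver; the page IDs are placeholders). Awaiting each navigation before reading the source ensures that the source belongs to the page that was just loaded.

    const { Builder } = require("selenium-webdriver");

    //Hypothetical page IDs - replace with real ones
    const pages = ["pageid1", "pageid2", "pageid3"];

    (async function scrapeEventPages() {
        //Assumes chromedriver is installed and on the PATH
        const driver = await new Builder().forBrowser("chrome").build();
        try {
            for (const page of pages) {
                const url = "https://mbasic.facebook.com/" + page + "?v=events";
                //Wait for navigation to finish before reading the source
                await driver.get(url);
                const source = await driver.getPageSource();
                console.log(source);
            }
        } finally {
            await driver.quit();
        }
    })();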

You are facing the same issue I did when I created a scraper using Python and Selenium. Facebook has countermeasures against manual URL changes, so you cannot switch pages that way; I received the same data again and again even though it was automated. To get a good result you need access to Facebook's Graph API, which provides a complete Facebook Page object along with its pagination URL.
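For reference, a request to the Graph API events edge might look roughly like the sketch below in Node.js. The API version, the page ID, and the access token are placeholders, and the /{page-id}/events edge only returns data with a suitably permissioned token.

    const https = require("https");

    //Hypothetical values - replace with a real page ID and access token
    const pageId = "PAGE_ID";
    const accessToken = "ACCESS_TOKEN";
    const url = "https://graph.facebook.com/v12.0/" + pageId + "/events?access_token=" + accessToken;

    https.get(url, function (res) {
        let body = "";
        res.on("data", function (chunk) { body += chunk; });
        res.on("end", function () {
            const result = JSON.parse(body);
            console.log(result.data);                          //array of event objects
            console.log(result.paging && result.paging.next);  //pagination URL, if present
        });
    }).on("error", console.error);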

The second way I got it to work was to use Selenium's click automation to move on to the next page instead of typing a new URL. It won't work the way you are doing it by changing the URL; I prefer using the Graph API.
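If you stay with Selenium, the click-based approach could look roughly like the sketch below. The link text used to locate the pagination element is a hypothetical placeholder, since the markup on mbasic.facebook.com can change at any time.

    const { By } = require("selenium-webdriver");

    //Hypothetical helper: after driver.get(url), keep clicking a "See More"-style link
    async function clickThroughEvents(driver) {
        while (true) {
            const links = await driver.findElements(By.partialLinkText("See More"));
            if (links.length === 0) break;   //no further pages
            await links[0].click();          //load the next batch of events
            console.log(await driver.getPageSource());
        }
    }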
