
Web scraping using Apify

I'm trying to scrape URLs from https://en.wikipedia.org/wiki/List_of_hedge_funds

Specifically, I'm trying to use Apify to scrape that page and return a list of URLs from the anchor tags present in the HTML. In my console, I expect to see the href values of one or more anchor tags from the target page in a property called myValue. I also expect to see the page title in a property called title. Instead, I just see the following URL property and its value.

[Screenshot: console output]

My Apify actor uses the Puppeteer platform, so I'm using a pageFunction similar to the way Puppeteer uses it.

Below is a screenshot of the Apify UI just before I run it.

[Screenshot: Apify UI]

Page function
function pageFunction(context) {
    // called on every page the crawler visits, use it to extract data from it
    var $ = context.jQuery;
    var result = {
        title: $('.wikitable').text,
        myValue: $('a[href]').text,
    };
    return result;
}

What am I doing wrong?

You have a typo in your code: text is a function, so you need to call it with parentheses:

var result = {
    title: $('.wikitable').text(),
    myValue: $('a[href]').text(),
};

Note, however, that this will probably still not do what you expect: text() returns the combined text of all matched elements as a single string, not their href values. You probably need to use jQuery's each() function (https://api.jquery.com/jquery.each/) to iterate over the matched elements, push the value you want from each one (for example, its href attribute) into an array, and return that array from your page function.
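The each() approach described above might look roughly like the following. This is only a sketch: it assumes context.jQuery is available as in the question, and reading the page title from the title element is my assumption about what the asker wants in the title property.

```javascript
function pageFunction(context) {
    var $ = context.jQuery;
    var links = [];

    // .each() runs the callback once per matched anchor, so every href
    // becomes its own array entry instead of one concatenated string.
    $('a[href]').each(function (index, element) {
        links.push($(element).attr('href'));
    });

    return {
        // Assumption: the page title is read from the <title> element.
        title: $('title').text(),
        myValue: links,
    };
}
```

With this shape, myValue is an array of href strings, one per anchor, rather than a single block of text.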

The page appears to be loaded by JavaScript, so I actually had to use asynchronous code.
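For reference, an asynchronous version could be sketched as below. It assumes the actor passes a Puppeteer page object as context.page, which the question does not show; page.$$eval() and page.title() are standard Puppeteer methods, but the exact context shape depends on which Apify actor is in use.

```javascript
async function pageFunction(context) {
    // Assumption: context.page is a Puppeteer Page for the current URL.
    var page = context.page;

    // $$eval runs the callback in the browser against all matched anchors
    // and returns the serialized result, here an array of href strings.
    var links = await page.$$eval('a[href]', function (anchors) {
        return anchors.map(function (a) { return a.href; });
    });

    return {
        title: await page.title(),
        myValue: links,
    };
}
```

Because the function awaits the page calls, the result is returned only after the values have been read from the rendered page.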
