Web scraping using Apify

Question

I'm trying to scrape URLs from https://en.wikipedia.org/wiki/List_of_hedge_funds

Specifically, I'm trying to use Apify to scrape that page and return a list of URLs from the anchor tags present in the HTML. In my console, I expect to see the href values of one or more anchor tags from the target page in a property called myValue. I also expect to see the page title in a property called title. Instead, I just see the following URL property and its value.

My Apify actor uses the Puppeteer platform, so I'm using a pageFunction similar to the way Puppeteer uses it.

Below is a screen shot of the Apify UI just before I run it.

Page function

function pageFunction( context ) {
    // called on every page the crawler visits, use it to extract data from it
    var $ = context.jQuery;
    var result = {
        title: $('.wikitable').text,
        myValue: $('a[href]').text,
    };
    return result;
}

What am I doing wrong?

Answer 1

You have a typo in your code: text is a function, so you need to add parentheses:

var result = {
    title: $('.wikitable').text(),
    myValue: $('a[href]').text(),
};

But note that this will probably not do what you expect anyway: it will return the combined text of all matched elements. You probably need to use jQuery's each() function (https://api.jquery.com/jquery.each/) to iterate over the found elements, push a value from each one into an array, and return that array from your page function.
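
A minimal sketch of that approach follows. It assumes context.jQuery is available in the page function, as in the question, and that the href attribute is the value wanted in myValue. Reading the page title from the title element is also an assumption, since the original selector targeted .wikitable.

```javascript
// Sketch of a page function that collects link targets with jQuery's each().
// Assumes context.jQuery is provided by the crawler, as in the question.
function pageFunction(context) {
    var $ = context.jQuery;
    var links = [];

    // Visit every anchor that has an href attribute and store its value.
    $('a[href]').each(function (index, element) {
        links.push($(element).attr('href'));
    });

    return {
        // Assumption: the page title is taken from the <title> element.
        title: $('title').text(),
        myValue: links,
    };
}
```

With this shape, myValue is an array of href strings rather than one concatenated text value.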

Answer 2

The page appears to be loaded by JavaScript, so I actually had to use asynchronous code.
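
One way this could look is sketched below. The waitForElements helper is hypothetical (it is not part of Apify or jQuery), the timeout and polling values are arbitrary, and the sketch assumes the crawler accepts an async page function and exposes context.jQuery.

```javascript
// Hypothetical helper: polls until the selector matches at least one element,
// and rejects if nothing matches within timeoutMs.
function waitForElements($, selector, timeoutMs, intervalMs) {
    return new Promise(function (resolve, reject) {
        var started = Date.now();
        (function check() {
            var found = $(selector);
            if (found.length > 0) {
                resolve(found);
            } else if (Date.now() - started >= timeoutMs) {
                reject(new Error('Timed out waiting for ' + selector));
            } else {
                setTimeout(check, intervalMs);
            }
        })();
    });
}

// Assumes the crawler awaits the Promise returned by an async page function.
async function pageFunction(context) {
    var $ = context.jQuery;

    // Wait up to 10 seconds for anchors to appear, checking every 250 ms.
    var anchors = await waitForElements($, 'a[href]', 10000, 250);

    var links = [];
    anchors.each(function (index, element) {
        links.push($(element).attr('href'));
    });
    return { myValue: links };
}
```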
