
Possible to dump AJAX content from webpage?

I would like to dump all the names on this page and all the remaining 146 pages.

The red/orange previous/next buttons seem to use JavaScript, and the names are fetched via AJAX.

Question

Is it possible to write a script to crawl the 146 pages and dump the names?

Are there Perl modules for this kind of thing?

You can use WWW::Mechanize or another crawler for this. Web::Scraper might also be a good idea:

use Web::Scraper;
use URI;
use Data::Dump;

# First, create your scraper block
my $scraper = scraper {
    # grab the text nodes from all elements with class type_firstname (that way you could also classify them by type)
    process ".type_firstname", "list[]" => 'TEXT';
};

my @names;
foreach my $page (1 .. 146) {
    # Fetch the page (add the page number parameter)
    my $res = $scraper->scrape( URI->new("http://www.familiestyrelsen.dk/samliv/navne/soeginavnelister/godkendtefornavne/drengenavne/?tx_lfnamelists_pi2[gotopage]=" . $page) );

    # Add the scraped names to our list
    push @names, @{ $res->{list} };
}

dd \@names;

It will give you a very long list of all the names. Running it may take some time, so try with 1..1 first.

In general, try using WWW::Mechanize::Firefox which will essentially remote-control Firefox.
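A minimal sketch of that approach (this assumes Firefox is running with the MozRepl add-on, which WWW::Mechanize::Firefox connects to; the URL is the listing page from the question):

use WWW::Mechanize::Firefox;

# Connect to a running Firefox instance via MozRepl
my $mech = WWW::Mechanize::Firefox->new();

# Load the page inside the real browser, so JavaScript/AJAX runs
# and the rendered content includes the dynamically loaded names
$mech->get('http://www.familiestyrelsen.dk/samliv/navne/soeginavnelister/godkendtefornavne/drengenavne/');

# Dump the fully rendered HTML for further scraping
print $mech->content;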

For that particular page, though, you can use something as simple as HTTP::Tiny.

Just make POST requests to the URL and pass the parameter tx_lfnamelists_pi2[gotopage] with values from 1 to 146.

Example at http://hackst.com/#4sslc for page #30.
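A minimal sketch of that approach using HTTP::Tiny's post_form (the form field name is taken from the description above; extracting the names from the returned HTML would still need Web::Scraper or similar):

use HTTP::Tiny;

my $http = HTTP::Tiny->new;
my $url  = 'http://www.familiestyrelsen.dk/samliv/navne/soeginavnelister/godkendtefornavne/drengenavne/';

foreach my $page (1 .. 146) {
    # POST the page number as a form parameter
    my $res = $http->post_form($url, { 'tx_lfnamelists_pi2[gotopage]' => $page });
    die "Request for page $page failed: $res->{status}" unless $res->{success};

    # $res->{content} now holds the HTML for that page;
    # parse the names out of it, e.g. with Web::Scraper as shown above
    print length($res->{content}), " bytes for page $page\n";
}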

Moral of the story: always look in Chrome's Network tab and see what requests the web page makes.
