
How to enable 'wget' to download the whole content of HTML with Javascript

I have a site which I want to download using Unix wget. If you look at the source code of the page, it contains a section called SUMMARY. However, after issuing a wget command like this:

wget   -O downdloadedtext.txt  http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?db=mouse&c=gene&a=fiche&l=2610008E11Rik 

The content of downdloadedtext.txt is incomplete and differs from the source code of that page. For example, it doesn't contain the SUMMARY section. Is there a correct way to obtain the full content?

The reason I ask is that I want to automate downloads with different parameter values in that URL.
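The automation described above can be sketched as a small shell loop. Only the 2610008E11Rik URL comes from the question; the output file names are placeholders, and the wget call is echoed so the sketch runs without network access:

```shell
# Sketch of the automation described above. The query parameters are copied
# from the question; the loop echoes the wget command instead of running it
# (drop 'echo' to actually download).
base='http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi'

for gene in 2610008E11Rik; do
  url="${base}?db=mouse&c=gene&a=fiche&l=${gene}"
  # The quotes around $url keep the shell from interpreting the & signs:
  echo wget -O "${gene}.txt" "$url"
done
```

Add more gene identifiers to the `for` list to fetch several pages in one run.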

You need to put the link inside quotes:

 wget -O downdloadedtext.txt  'http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?db=mouse&c=gene&a=fiche&l=2610008E11Rik'

This is because & has a special meaning to the shell: it ends the command and runs it in the background, so everything after the first & is interpreted as separate commands rather than as part of the URL.
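An offline way to see what the quotes change, using a throwaway helper (`nargs` is illustrative, not part of wget; example.com stands in for the real URL):

```shell
# nargs just reports how many arguments it received.
nargs() { echo $#; }

# Quoted, the whole URL reaches the command as a single argument:
quoted=$(nargs 'http://example.com/p?a=1&b=2')
echo "$quoted"   # prints 1

# Backslash-escaping each & works as well:
escaped=$(nargs http://example.com/p?a=1\&b=2)
echo "$escaped"  # prints 1
```

Without the quotes, the shell would instead launch `nargs http://example.com/p?a=1` in the background and then try to run `b=2` as a separate command.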

The & character has special meaning in shells. Quote the URI so you actually request the URI you want to request.

You can use the -p ( --page-requisites ) flag to tell wget to retrieve linked resources. From man wget :

This option causes Wget to download all the files that are necessary to properly display a given HTML page. This includes such things as inlined images, sounds, and referenced stylesheets.
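Applied to the URL from the question (still quoted), a minimal invocation might look like this. The command is built as a string and echoed so the sketch runs without network access; run it with `eval "$cmd"`:

```shell
# -p / --page-requisites pulls in the images, stylesheets, etc. needed to
# render the page locally. Echoed rather than executed here.
cmd="wget -p 'http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?db=mouse&c=gene&a=fiche&l=2610008E11Rik'"
echo "$cmd"
```

Note that -p saves the page and its requisites under a directory tree, so it is not combined with `-O` here.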

You might also look at the --follow-tags option, which lets you limit that process:

Wget has an internal table of HTML tag / attribute pairs that it considers when looking for linked documents during a recursive retrieval. If a user wants only a subset of those tags to be considered, however, he or she should specify such tags in a comma-separated list with this option.
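For example, to restrict a recursive retrieval to plain link tags only (the example.com URL is a placeholder; the command is echoed so the sketch is runnable offline):

```shell
# --follow-tags takes a comma-separated list of HTML tags wget may follow
# during recursive (-r) retrieval; here only <a> and <area> links count.
cmd="wget -r --follow-tags=a,area 'http://example.com/'"
echo "$cmd"
```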
