
wget grep sed to extract links and save them to a file

I need to download all of the page links from http://en.wikipedia.org/wiki/Meme and save them to a file, all with one command.

This is my first time using the command line, so I'm unsure of the exact commands, flags, etc. to use. I only have a general idea of what to do, and I had to search around to learn what href means. Here is my attempt:

wget http://en.wikipedia.org/wiki/Meme -O links.txt | grep 'href=".*"' | sed -e 's/^.*href=".*".*$/\1/'

The output of the links in the file does not need to be in any specific format.

Using GNU grep:

grep -Po '(?<=href=")[^"]*' links.txt

or with wget:

wget http://en.wikipedia.org/wiki/Meme -q -O - | grep -Po '(?<=href=")[^"]*'
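
Since the question asks to save the links to a file, you can redirect that output; a minimal sketch, reusing the links.txt filename from the question:

wget http://en.wikipedia.org/wiki/Meme -q -O - | grep -Po '(?<=href=")[^"]*' > links.txt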

You could use wget's spider mode. See this SO answer for an example.

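As a rough sketch of what that approach looks like (the --spider, --force-html, -r, and -l flags are documented wget options; the grep/awk step assumes wget's usual log format, where request lines start with -- and the URL is the third field):

wget --spider --force-html -r -l1 http://en.wikipedia.org/wiki/Meme 2>&1 | grep '^--' | awk '{print $3}'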

wget http://en.wikipedia.org/wiki/Meme -q -O - | sed -n 's/.*href="\([^"]*\)".*/\1/p'

but this only takes one href per line; if a line contains more than one, the others are lost (the same problem as in your original line). Note that your -O links.txt writes the page to the file, so nothing reaches the pipe; -q -O - streams it to stdout instead. You also forgot to define a group ( \( ... \) ) in your original sed pattern, so \1 refers to nothing.
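
One way around the one-match-per-line limitation (a sketch, not part of the original answer; it relies on GNU sed interpreting \n in the replacement) is to split the input so each href starts its own line before extracting:

wget http://en.wikipedia.org/wiki/Meme -q -O - | sed 's/href="/\n&/g' | sed -n 's/^href="\([^"]*\)".*/\1/p'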
