I tried lots of suggestions but I can't find a solution (I don't know if it's even possible). I'm using the terminal on Ubuntu 15.04.
I need to save to a text file all the internal and external links under mywebsite.com/links_ (every link I want starts with links_), for example http://www.mywebsite.com/links_sony.aspx. I don't need any other links, e.g. mywebsite.com/index.aspx, conditions.asp, etc. This is the command I use:
wget --spider --recursive --no-verbose --output-file="links.csv" http://www.mywebsite.com
Can you help me, please? Thanks in advance.
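If you want to keep the wget --spider approach from the question, here is a minimal sketch that pulls the links_ URLs out of the log wget writes. It assumes each URL appears in the log as a space-delimited http(s):// token (the sample lines below are made up in that assumed --no-verbose layout), and extract_links is a hypothetical name. Note that external links will usually be missing from that log, because a recursive wget stays on the starting host unless told otherwise (-H).

```shell
#!/bin/bash
# Hypothetical helper: read a wget log on stdin and print each distinct URL
# whose path contains "/links_". Assumes URLs appear as space-delimited tokens.
extract_links() {
  grep -oE 'https?://[^ ]*/links_[^ ]*' | sort -u
}

# Made-up log lines in the assumed --no-verbose --spider layout.
sample_log='2015-06-01 10:00:00 URL: http://www.mywebsite.com/index.aspx 200 OK
2015-06-01 10:00:01 URL: http://www.mywebsite.com/links_sony.aspx 200 OK
2015-06-01 10:00:02 URL: http://www.mywebsite.com/conditions.asp 200 OK
2015-06-01 10:00:03 URL: http://www.mywebsite.com/links_sony.aspx 200 OK'

# Prints the single distinct links_ URL from the sample.
printf '%s\n' "$sample_log" | extract_links
```

On the real log you would run something like extract_links < links.csv > links.txt; sort -u drops the duplicates that a recursive crawl tends to produce.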
If you don't mind using a couple of other tools to coax wget, then you can try this bash script that employs awk, grep, wget and lynx:
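#!/bin/bash
# Usage: ./getlinks URL PATTERN  (lists the links on URL whose address contains PATTERN)
lynx -dump -listonly "$1" | awk '/http/{print $2}' | grep -F -- "$2" > /tmp/urls.txt
cat /tmp/urls.txt   # print the matching links (stdout can be redirected to a file)
while read -r i; do wget "$i"; done < /tmp/urls.txt   # optionally fetch each link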
Save the above script as getlinks, make it executable with chmod +x getlinks, and then run it as:
./getlinks 'http://www.mywebsite.com' 'links_' > mycollection.txt
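To see what the lynx | awk | grep stage keeps, here is a sketch that runs the same awk/grep filter on a hand-written sample of lynx's link list instead of a live page. The sample text and the function name filter_links are hypothetical; the layout assumed is lynx's numbered reference list (e.g. "   1. http://..."), where awk's $2 is the URL.

```shell
#!/bin/bash
# Hypothetical helper: the same awk/grep stage as in getlinks, reading a lynx dump on stdin.
# $1 is the fixed string the URL must contain (e.g. links_).
filter_links() {
  awk '/http/{print $2}' | grep -F -- "$1"
}

# Hand-written sample in the layout lynx -dump uses for its link list (assumed).
sample_dump='References

   1. http://www.mywebsite.com/index.aspx
   2. http://www.mywebsite.com/links_sony.aspx
   3. http://www.mywebsite.com/conditions.asp
   4. http://www.mywebsite.com/links_philips.aspx'

# Keeps only the two links_ URLs, in their original order.
printf '%s\n' "$sample_dump" | filter_links 'links_'
```

The "References" heading and blank line contain no "http", so awk skips them; grep -F treats the pattern as a plain string, so characters like "." or "?" in it are not interpreted as regex syntax.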
This approach doesn't require many extra tools; instead, it reuses commonly available ones (on Ubuntu, lynx may need to be installed first with sudo apt-get install lynx).
You may have to adjust the quoting depending on which shell you are using. The above works in standard bash and does not depend on specific versions of these tools.
You could customize the part
do wget "$i"
with the appropriate switches for your needs, such as --recursive, --spider, or a verbosity option like --no-verbose. Insert those switches between wget and the $i argument.
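One way to keep those switches in a single place is a bash array, which also keeps each switch a separate, correctly quoted word. This is a sketch only: wget_opts and fetch_all are hypothetical names, the chosen switches are just examples, and the function prints the wget commands (a dry run) instead of running them so you can check them first.

```shell
#!/bin/bash
# Extra wget switches collected in one array; they always land between wget and "$i".
wget_opts=(--spider --no-verbose)

# Hypothetical helper: read URLs from stdin and print the wget command for each.
# Remove "echo" to actually run the commands.
fetch_all() {
  local i
  while read -r i; do
    echo wget "${wget_opts[@]}" "$i"
  done
}

# Prints: wget --spider --no-verbose http://www.mywebsite.com/links_sony.aspx
printf '%s\n' 'http://www.mywebsite.com/links_sony.aspx' | fetch_all
```

Inside getlinks this would be called as fetch_all < /tmp/urls.txt.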