
How to ignore specific file types when downloading with wget?

How do I make wget skip .jpg and .png files? I want to download only .html files.

I am trying:

wget  -R index.html,*tiff,*pdf,*jpg -m http://example.com/

but it's not working.

Use the

 --reject jpg,png  --accept html

options to exclude or include files with certain extensions; see http://www.gnu.org/software/wget/manual/wget.html#Recursive-Accept_002fReject-Options .
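Applied to the case in the question, the options can be combined with -m as in the sketch below. The function name fetch_html_only and the WGET variable are hypothetical helpers added for illustration; setting WGET=echo prints the arguments instead of starting a download.

```shell
# Hypothetical wrapper: mirror a site, keeping .html files and
# rejecting .jpg and .png files.
# WGET is an illustrative override so the call can be dry-run.
fetch_html_only() {
    "${WGET:-wget}" -m --accept html --reject jpg,png "$1"
}

# Dry run, prints the arguments wget would receive:
#   WGET=echo fetch_html_only http://example.com/
# Real download:
#   fetch_html_only http://example.com/
```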

Put patterns with wildcard characters in quotes; otherwise your shell will expand them before wget sees them. See http://www.gnu.org/software/wget/manual/wget.html#Types-of-Files
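To see why the quotes matter, here is a small sketch that uses echo in place of wget. The scratch directory and the file names a.jpg and b.jpg are made up for the demonstration.

```shell
# Work in a scratch directory that contains files matching the pattern.
tmpdir=$(mktemp -d)
touch "$tmpdir/a.jpg" "$tmpdir/b.jpg"

# Unquoted: the shell replaces *.jpg with the matching file names
# before the command ever sees the pattern.
unquoted=$(cd "$tmpdir" && echo *.jpg)

# Quoted: the pattern is passed through literally, which is what wget needs.
quoted=$(cd "$tmpdir" && echo "*.jpg")

echo "unquoted: $unquoted"   # unquoted: a.jpg b.jpg
echo "quoted:   $quoted"     # quoted:   *.jpg

rm -r "$tmpdir"
```

If the current directory has no matching files, most shells leave the unquoted pattern unchanged, so an unquoted command may appear to work in one directory and fail in another.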

# -r  : retrieve recursively
# -nH : do not create host-prefixed directories
# -nd : save all files to the current directory, without a directory hierarchy
# -np : never ascend to the parent directory when retrieving recursively
# -R  : reject (do not download) files matching these patterns
# -A  : accept (download only) files matching these patterns, here *.html

For instance:

wget -r -nH -nd -np -A "*.html" -R "*.gz, *.tar"  http://www1.ncdc.noaa.gov/pub/data/noaa/1990/

A worked example that downloads everything except archives, PDFs and TIFF images:

wget -r -k -l 7 -E -nc \
 -R "*.gz, *.tar, *.tgz, *.zip, *.pdf, *.tif, *.bz, *.bz2, *.rar, *.7z" \
 -erobots=off \
 --user-agent="Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36" \
 http://misis.ru/


 