How do I tell wget to download only the text files (in this case) that contain a specific string in the middle?

I am taking a software development class and am trying to practice "DRY" principles in all things software dev. For practice, I want wget to download all the files in this directory (http://fusionplant.com/archive/textfiles/) that contain the word "offensive".

Here's an example of one of them: http://fusionplant.com/archive/textfiles/gnu_fortune/gnu_fortune_offensive_astrology

Are there any methods to accomplish this? I imagine they would use regular expressions, but I can't find any sufficiently comparable examples online to get it done.

Here's a command I tried. It's wrong, not even close, but here it is:

    wget -A '*offensive*.txt' http://fusionplant.com/archive/textfiles/gnu_fortune

It didn't return an error message; it just downloaded the index file:

    wget -A '*offensive*.txt' http://fusionplant.com/archive/textfiles/gnu_fortune
    --2012-06-15 11:15:07--  http://fusionplant.com/archive/textfiles/gnu_fortune
    Resolving fusionplant.com... 216.254.119.231
    Connecting to fusionplant.com|216.254.119.231|:80... connected.
    HTTP request sent, awaiting response... 301 Moved Permanently
    Location: http://fusionplant.com/archive/textfiles/gnu_fortune/ [following]
    --2012-06-15 11:15:07--  http://fusionplant.com/archive/textfiles/gnu_fortune/
    Reusing existing connection to fusionplant.com:80.
    HTTP request sent, awaiting response... 200 OK
    Length: unspecified [text/html]
    Saving to: “gnu_fortune”

    [  <=>                                  ] 14,576      50.4K/s   in 0.3s

    2012-06-15 11:15:08 (50.4 KB/s) - “gnu_fortune” saved [14576]

You can't do it like this. wget cannot ask the server to filter files by their contents for you. You have to download the files first and then check locally whether each one contains the string.
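The download-then-check approach could be sketched roughly as below. This is only an illustration under assumptions: the function names `fetch_all` and `keep_matching` are made up here, the listing page is assumed to link directly to the files one level down, and the wget options used (`-r`, `-l 1`, `-np`, `-nd`, `-R`, `-P`) are its standard recursive-retrieval flags. If only the file *name* matters rather than the contents, adding `-A '*offensive*'` (without the `.txt`, since the example file has no extension) to the recursive call may be enough on its own.

```shell
#!/bin/sh
# Hypothetical helper: fetch every file linked from a directory listing
# into a local directory. Nothing is filtered by content at this stage.
#   $1 = local target directory, $2 = URL of the listing (trailing slash)
fetch_all() {
    # -r -l 1 : follow links one level deep
    # -np     : never ascend to the parent directory
    # -nd     : save files flat, without recreating the remote tree
    # -R      : discard the listing page itself after its links are read
    wget -r -l 1 -np -nd -R 'index.html*' -P "$1" "$2"
}

# Hypothetical helper: delete every regular file in a directory whose
# contents do not include the given fixed string.
#   $1 = directory, $2 = string to look for
keep_matching() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        # -q: quiet, -F: treat the string literally, not as a regex
        grep -qF -- "$2" "$f" || rm -- "$f"
    done
}

# Usage (needs network access), assuming the directory from the question:
#   fetch_all mirror http://fusionplant.com/archive/textfiles/gnu_fortune/ \
#       && keep_matching mirror offensive
```

Keeping the network step and the filtering step separate means the filter can be re-run with a different string without downloading anything again.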
