Using a regex to download an entire directory with wget

I want to download multiple PDFs from URLs such as this one: https://dummy.site.com/aabbcc/xyz/2017/09/15/2194812/O7ca217a71ac444eda516d8f78c29091a.pdf

If I run wget on the complete URL, it downloads the file:

wget https://dummy.site.com/aabbcc/xyz/2017/09/15/2194812/O7ca217a71ac444eda516d8f78c29091a.pdf

But if I try to download the entire folder recursively, it returns 403 (Forbidden):

wget -r https://dummy.site.com/aabbcc/xyz/

I have tried setting the user agent, ignoring robots.txt, and a bunch of other solutions from the internet, but I keep coming back to the same point.

So I want to build a list of all possible URLs, treating the given URL as a common pattern, but I have no idea how to do that.

I only know that I can pass such a file to wget as input, and it will download every URL listed in it. So I am looking for help forming the URL list with a regex. Thank you!
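For what it's worth, a regex can only check or take apart URLs you already have; it cannot discover ones you have never seen. Here is a minimal Python sketch of the "URL list file" idea, assuming the date, numeric ID and file name of each PDF are already known from somewhere else. The pattern is inferred from the single example URL above, and `build_url`, `write_url_list` and the `known` list are hypothetical names used only for illustration.

```python
import re
from datetime import date

# Pattern inferred from the one example URL in the question (an assumption):
# /YYYY/MM/DD/<numeric id>/O<32 hex characters>.pdf
URL_PATTERN = re.compile(
    r"^https://dummy\.site\.com/aabbcc/xyz/"
    r"(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/"
    r"(?P<doc_id>\d+)/(?P<name>O[0-9a-f]{32})\.pdf$"
)

BASE = "https://dummy.site.com/aabbcc/xyz"


def build_url(day: date, doc_id: int, name: str) -> str:
    """Assemble one URL from components that are already known."""
    return f"{BASE}/{day:%Y/%m/%d}/{doc_id}/{name}.pdf"


def write_url_list(path: str, urls: list[str]) -> None:
    """Write one URL per line, the format wget expects for an input file."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.writelines(url + "\n" for url in urls)


# Hypothetical input: every (date, id, name) triple has to come from some
# real source; nothing here guesses names that are not already known.
known = [
    (date(2017, 9, 15), 2194812, "O7ca217a71ac444eda516d8f78c29091a"),
]

urls = [build_url(*entry) for entry in known]

# The regex is useful for validating the result, not for enumerating it.
assert all(URL_PATTERN.match(url) for url in urls)
```

After calling `write_url_list("urls.txt", urls)`, the file can be handed to wget with `wget -i urls.txt`, which fetches each listed URL in turn.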

You can't use a wildcard to download files you can't see. If the host does not support directory listing, you have no way of knowing what the filenames and paths are. And since you do not know the algorithm used to generate the filenames, you can't generate them yourself and fetch them.
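To put a rough number on why guessing is hopeless, here is a back-of-the-envelope sketch in Python. It assumes every file name looks like the one in the question ("O" followed by 32 lowercase hex characters) and that the names are effectively unpredictable; the request rate is a made-up figure for illustration.

```python
# Assumption: a file name is "O" followed by 32 lowercase hex characters,
# as in the single example URL from the question.
HEX_DIGITS = 16
NAME_LENGTH = 32

# Number of distinct names that fit the pattern (the same value as 2**128).
candidates = HEX_DIGITS ** NAME_LENGTH

# Hypothetical, very generous request rate, for illustration only.
REQUESTS_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Whole years needed to try every candidate name in a single directory.
years = candidates // (REQUESTS_PER_SECOND * SECONDS_PER_YEAR)

print(f"{candidates:.3e} candidate names per directory")
print(f"roughly {years:.3e} years at {REQUESTS_PER_SECOND:,} requests per second")
```

And that is for one date/ID directory only; the numeric folder in the path would have to be guessed as well.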
