
How to retrieve a directory of files from a remote server?

If I have a directory on a remote web server that allows directory browsing, how would I go about fetching all the files listed there from my other web server? I know I can use urllib2.urlopen to fetch individual files, but how would I get a list of all the files in that remote directory?

If the web server has directory browsing enabled, it will return an HTML document with links to all the files. You can parse that document and extract the links, which gives you the list of files.

You can use the HTMLParser class to extract the elements you're interested in. Something like this will work:

from HTMLParser import HTMLParser
import urllib

class AnchorParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, e.g. [('href', 'foo.txt')]
        if tag == 'a':
            for key, value in attrs:
                if key == 'href':
                    print value

parser = AnchorParser()
data = urllib.urlopen('http://somewhere').read()
parser.feed(data)
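
If you want to actually retrieve the files rather than just print the link targets, the same parser can collect the hrefs into a list so you can download each one afterwards. This is a minimal sketch, still in Python 2 to match the code above; the listing URL is a placeholder, and the use of urlparse.urljoin to resolve relative links plus the check that skips links leading outside the listing (such as the parent-directory link) are my own assumptions, not part of the original answer:

from HTMLParser import HTMLParser
import urllib
import urlparse
import os

class LinkCollector(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []          # collected href values

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for key, value in attrs:
                if key == 'href':
                    self.links.append(value)

base_url = 'http://somewhere/dir/'   # placeholder listing URL
collector = LinkCollector()
collector.feed(urllib.urlopen(base_url).read())

for link in collector.links:
    # Directory listings usually contain relative hrefs, so resolve them
    # against the listing URL and skip anything that points outside the
    # directory (e.g. the "Parent Directory" link or sort-order links).
    url = urlparse.urljoin(base_url, link)
    if not url.startswith(base_url):
        continue
    filename = os.path.basename(urlparse.urlparse(url).path)
    if filename:
        urllib.urlretrieve(url, filename)   # save into the current directory

urllib.urlretrieve saves each file under its basename in the current working directory; swap it for urllib2.urlopen plus a file write if you need custom headers or authentication.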

Why don't you use curl or wget to recursively download the given page and limit it to one level? That saves you the trouble of writing a script.

e.g. something like:

wget -H -r --level=1 -k -p www.yourpage/dir
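Here -r turns on recursion, --level=1 keeps it to links one step away from the starting page, -k rewrites the links in the saved copy for local viewing, -p pulls in page requisites such as images and CSS, and -H lets wget follow links onto other hosts. For a plain directory listing you probably don't need -H or -p, and --no-parent keeps wget from climbing above the directory, e.g. something like

wget -r --level=1 --no-parent --no-directories www.yourpage/dir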
