简体   繁体   中英

fast download xml file from server python

Im downloading big database of phishing/virus sites from clean-mx

phishing database = http://support.clean-mx.de/clean-mx/xmlphishing.php

virus database = http://support.clean-mx.de/clean-mx/xmlviruses.php

Now the problem is that those xml files size is about +30Mb for each, and It takes about 1 minute to download them and I need to download them faster... I download them by using urllib.urlretrieve .

I need those files to build xml database that contains the urls inside those databases, I've tried to read them, hoping that it should be faster than donwload them, with urllib.urlopen but it even slower then download them.

Do you have an Idea to use those files (download or read) to build my database with faster performance?

Note: Just need to use those files, I already write code that build my database fast

I tried downloading the virus XML via Firefox in OS X and Linux (running in a VM), and using the excellent requests module (which I much prefer over urllib ), and all methods took a very long time to download the 47M file - in fact, some processes froze or crashed. I have a 60 Mbit/s internet connection, and downloading a similar-sized file from an unthrottled server would only usually take 10-15s. So, I suspect that your results won't improve much, as it seems to be a server issue. I'd recommend contacting the owners of the website and seeing if they'd be willing to work with you to diagnose the connectivity issues.

EDIT

OK, this is weird. I restarted my Linux VM and ran the following in Terminal:

import requests
url = "http://support.clean-mx.de/clean-mx/xmlviruses.php?"
r = requests.get(url).content
print(r)

The download finished in less than 15s. So, I'm not sure at all what's going on...

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM