I'm downloading big databases of phishing/virus sites from clean-mx:
phishing database = http://support.clean-mx.de/clean-mx/xmlphishing.php
virus database = http://support.clean-mx.de/clean-mx/xmlviruses.php
Now the problem is that each of those XML files is about 30 MB, and it takes about a minute to download each one using urllib.urlretrieve. I need to download them faster.
I need those files to build an XML database of the URLs they contain. I tried reading them with urllib.urlopen instead, hoping that would be faster than downloading them,
but it was even slower than downloading.
Do you have any idea how to use those files (download or read) to build my database with better performance?
Note: I just need to fetch those files; I have already written code that builds my database quickly.
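For reference, the download step is essentially the following (a minimal sketch in Python 3 spelling, i.e. urllib.request.urlretrieve; the local filenames are my own choice, and the database-building code is separate):

```python
import urllib.request

# The two clean-mx XML dumps, each around 30 MB.
SOURCES = {
    "phishing.xml": "http://support.clean-mx.de/clean-mx/xmlphishing.php",
    "viruses.xml": "http://support.clean-mx.de/clean-mx/xmlviruses.php",
}

def fetch_all(sources):
    """Download each URL to the given local filename."""
    for filename, url in sources.items():
        urllib.request.urlretrieve(url, filename)

if __name__ == "__main__":
    fetch_all(SOURCES)
```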
I tried downloading the virus XML via Firefox on OS X and on Linux (running in a VM), and with the excellent requests
module (which I much prefer over urllib
), and every method took a very long time to download the 47 MB file; in fact, some processes froze or crashed. I have a 60 Mbit/s internet connection, and downloading a similar-sized file from an unthrottled server usually takes only 10-15 seconds. So I suspect that your results won't improve much, as it seems to be a server-side issue. I'd recommend contacting the owners of the website to see if they'd be willing to work with you to diagnose the connectivity problems.
EDIT
OK, this is weird. I restarted my Linux VM and ran the following in a Python session:
import requests
url = "http://support.clean-mx.de/clean-mx/xmlviruses.php?"
r = requests.get(url).content  # fetch the whole response body as bytes
print(r)
The download finished in less than 15s. So, I'm not sure at all what's going on...
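If the server does cooperate, one tweak worth making for a ~47 MB file is to stream the response to disk rather than holding it all in memory, as the snippet above does. A sketch, assuming the requests module is available (the function name and chunk size are my own):

```python
import requests

def stream_download(url, dest_path, chunk_size=64 * 1024):
    """Stream a large HTTP response straight to a file in fixed-size chunks."""
    with requests.get(url, stream=True, timeout=60) as r:
        r.raise_for_status()  # fail loudly on HTTP errors
        with open(dest_path, "wb") as out:
            for chunk in r.iter_content(chunk_size=chunk_size):
                out.write(chunk)

if __name__ == "__main__":
    stream_download("http://support.clean-mx.de/clean-mx/xmlviruses.php?",
                    "xmlviruses.xml")
```

This keeps memory usage flat regardless of file size, which should also avoid the freezes I saw with the in-memory approach.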