I would like to read the latest file from an HTTP folder (a directory listing).
The 'releases' folder contains files like 0001.tgz, 0002.tgz, 0003.tgz. How can I make 0003.tgz be the one that gets selected?
import urllib2
url = "http://example.com/releases"
html = urllib2.urlopen(url).read()
...
Thanks. Could you give me an example?
You can use BeautifulSoup or lxml to parse the directory index and find the latest file, which is presumably the last one in the index, given your naming convention.
Something like this:
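from bs4 import BeautifulSoup
import urllib2
import urlparse

url = "http://example.com/releases/"
response = urllib2.urlopen(url)
# Use the final URL (after any redirect) as the base for relative links
base = response.geturl()
soup = BeautifulSoup(response.read(), "html.parser")
# Assumes the newest file is the last link in the index
last_link = soup.find_all('a', href=True)[-1]
# Directory indexes usually use relative hrefs such as "0003.tgz"
latest_url = urlparse.urljoin(base, last_link['href'])
latest_content = urllib2.urlopen(latest_url).read()
# do stuff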
If that doesn't work, grab all of the links using find_all and do some more careful parsing based on the filenames.
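A minimal sketch of that more careful parsing, written for Python 3. It swaps in the standard library's html.parser.HTMLParser instead of BeautifulSoup so it has no third-party dependency, and it assumes release files are named with exactly four digits followed by .tgz, as in the question. The names LinkCollector and latest_release are hypothetical. Links that don't match the pattern (sort links, "Parent Directory", other files) are ignored, and the highest number wins regardless of where it appears in the page.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed naming convention from the question: four digits + ".tgz"
RELEASE_RE = re.compile(r"^(\d{4})\.tgz$")


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def latest_release(index_html, base_url):
    """Return the absolute URL of the highest-numbered release, or None."""
    parser = LinkCollector()
    parser.feed(index_html)
    releases = []
    for href in parser.hrefs:
        # Compare only the last path segment, e.g. "/releases/0003.tgz" -> "0003.tgz"
        name = href.rsplit("/", 1)[-1]
        match = RELEASE_RE.match(name)
        if match:
            releases.append((int(match.group(1)), href))
    if not releases:
        return None
    _, href = max(releases)  # highest release number
    # Resolve relative hrefs against the directory URL
    return urljoin(base_url, href)


if __name__ == "__main__":
    sample = """
    <a href="?C=N;O=D">Name</a>
    <a href="/">Parent Directory</a>
    <a href="0001.tgz">0001.tgz</a>
    <a href="0003.tgz">0003.tgz</a>
    <a href="0002.tgz">0002.tgz</a>
    <a href="README.txt">README.txt</a>
    """
    print(latest_release(sample, "http://example.com/releases/"))
    # prints http://example.com/releases/0003.tgz
```

Because the names are zero-padded, string order and numeric order agree here, but converting to int keeps the comparison correct even if the padding ever changes. Note that urljoin needs the base URL to end with a slash, otherwise "0003.tgz" would resolve against the parent directory.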
If the .tgz files are numbered sequentially, count down from the highest possible number (9999 for four digits) and stop the loop at the first file that exists; that is the newest one.
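import urllib2

latest = None
for counter in xrange(9999, 0, -1):
    fyle = str(counter).zfill(4)  # pad with zeros: 3 -> "0003"
    url = "http://example.com/releases/" + fyle + ".tgz"
    try:
        ret = urllib2.urlopen(url)
    except urllib2.HTTPError:
        continue  # urlopen raises on 404 Not Found: try the next lower number
    if ret.code == 200:
        print "Exists:", fyle
        latest = url
        break

if latest is not None:
    data = ret.read()  # contents of the newest .tgz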
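If the files really are numbered without gaps, counting down from 9999 can cost thousands of requests. One possible alternative, swapped in here and not part of the answer above, is a binary search over the numbers, which needs only about 15 requests for the range 1 to 9999. This is a Python 3 sketch (urllib.request is the Python 3 counterpart of urllib2). The helper names release_exists and find_latest are hypothetical; it sends HEAD requests so the archives are not downloaded while probing, assuming the server answers HEAD; and it assumes there are no gaps, because a missing number in the middle can make the search stop too early.

```python
import urllib.error
import urllib.request

BASE_URL = "http://example.com/releases/"  # directory from the question


def release_exists(number):
    """Return True if BASE_URL/NNNN.tgz answers a HEAD request with 200."""
    url = "%s%04d.tgz" % (BASE_URL, number)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False  # e.g. 404 Not Found; other network errors still propagate


def find_latest(exists, low=1, high=9999):
    """Binary search for the largest n in [low, high] with exists(n) True.

    Assumes files are contiguous: exists(n) is True for every n up to the
    newest release and False above it. Returns None if exists(low) is False.
    """
    if not exists(low):
        return None
    # Invariant: exists(low) is True; numbers above high are missing.
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always progresses
        if exists(mid):
            low = mid
        else:
            high = mid - 1
    return low
```

Usage: latest = find_latest(release_exists), then download "%04d.tgz" % latest if latest is not None. Passing the existence check in as a function keeps the search logic independent of HTTP, so it can be tried against a fake check before pointing it at a real server.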