I'm trying to download bounding box files (stored as gzipped tar archives) from image-net.org. When I `print(resp.read())`, rather than a stream of bytes representing the archive, I get the HTML `b'<meta http-equiv="refresh" content="0;url=/downloads/bbox/bbox/[wnid].tar.gz" />\n'`, where `[wnid]` refers to a particular WordNet identification string. This leads to the error `tarfile.ReadError: file could not be opened successfully`. Any thoughts on what exactly the issue is and/or how to fix it? Code is below (`images` is a pandas data frame).
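```python
import asyncio
import io
import tarfile
from concurrent.futures import ThreadPoolExecutor

import aiohttp
import pandas as pd


def get_boxes(images, nthreads=1000):
    def parse_xml(xml):
        return 0  # placeholder: XML parsing omitted from the question

    def read_tar(data, wnid):
        # Raises tarfile.ReadError when `data` is the HTML meta-refresh page
        # instead of the .tar.gz archive.
        buf = io.BytesIO(data)
        tar = tarfile.open(fileobj=buf)
        return 0  # placeholder: archive processing omitted from the question

    async def fetch_boxes(wnid, client):
        url = ('http://www.image-net.org/api/download/imagenet.bbox.'
               'synset?wnid={}').format(wnid)
        async with client.get(url) as resp:
            # Parse the archive in a worker thread so the event loop is not blocked.
            res = await loop.run_in_executor(executor, read_tar,
                                             await resp.read(), wnid)
        return res

    async def main():
        # ClientSession picks up the running loop; its `loop=` argument is deprecated.
        async with aiohttp.ClientSession() as client:
            tasks = [asyncio.ensure_future(fetch_boxes(wnid, client))
                     for wnid in images['wnid'].unique()]
            return await asyncio.gather(*tasks)

    loop = asyncio.get_event_loop()
    executor = ThreadPoolExecutor(nthreads)
    shapes, boxes = zip(*loop.run_until_complete(main()))
    return pd.concat(shapes, axis=0), pd.concat(boxes, axis=0)
```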
EDIT: I understand now that this is a meta refresh used as a redirect. Would this be considered a "bug" in `aiohttp`?
This is expected behavior, not a bug.

Some services redirect from a user-friendly web page to a zip file. The redirect is implemented either with an HTTP status code (301 or 302, see the example below) or with a page containing a meta tag that performs the redirect, as in your example.
```
HTTP/1.1 302 Found
Location: http://www.iana.org/domains/example/
```
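`aiohttp` handles the first case automatically (`allow_redirects=True` by default). In the second case, however, the library simply retrieves the HTML page and can't follow the redirect automatically: a meta refresh lives in the HTML body, and acting on it is browser behavior rather than part of HTTP.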
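A minimal sketch of handling the second case by hand, assuming the page uses the exact `<meta http-equiv="refresh" content="...;url=...">` form shown in the question. `meta_refresh_target` and `read_following_meta_refresh` are hypothetical helpers (not part of `aiohttp`), the regex is a simplification rather than a real HTML parser, and `client` is assumed to be an `aiohttp.ClientSession`, which still follows ordinary 301/302 redirects on its own.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Simplified pattern for e.g.
# <meta http-equiv="refresh" content="0;url=/downloads/bbox/bbox/n01729322.tar.gz" />
_META_REFRESH = re.compile(
    rb"""<meta\s+http-equiv=["']?refresh["']?\s+content=["']?\d*\s*;\s*url=([^"'>\s]+)""",
    re.IGNORECASE,
)


def meta_refresh_target(body: bytes, base_url: str) -> Optional[str]:
    """Return the absolute URL named by a meta-refresh tag in body, or None."""
    match = _META_REFRESH.search(body)
    if match is None:
        return None
    # The target may be relative ("/downloads/..."), so resolve it against base_url.
    return urljoin(base_url, match.group(1).decode("ascii"))


async def read_following_meta_refresh(client, url, max_hops=5):
    """GET url and return the body, following up to max_hops meta refreshes.

    `client` is assumed to be an aiohttp.ClientSession (anything whose
    get() is an async context manager yielding .read() and .url works).
    """
    for _ in range(max_hops + 1):
        async with client.get(url) as resp:
            body = await resp.read()
            final_url = str(resp.url)  # URL after any HTTP-level redirects
        target = meta_refresh_target(body, final_url)
        if target is None:
            return body  # no meta refresh: this is the real payload
        url = target
    raise RuntimeError("gave up after {} meta-refresh redirects".format(max_hops))
```

Inside the question's `fetch_boxes`, `data = await read_following_meta_refresh(client, url)` could replace the `async with client.get(url)` block, with `data` then handed to `read_tar` through `run_in_executor` as before.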
I ran into the same problem when I tried to download with wget from the same URL you used, http://www.image-net.org/api/download/imagenet.bbox.synset?wnid=n01729322, but it works if you use the direct URL www.image-net.org/downloads/bbox/bbox/n01729322.tar.gz (here n01729322 is the wnid).
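Since the direct URL works, a simpler route is to skip the redirect page entirely. The sketch below assumes the `.../downloads/bbox/bbox/<wnid>.tar.gz` layout observed above stays stable (it is read off the meta-refresh target, not a documented API). `bbox_url` and `read_bbox_archive` are hypothetical names, and the gzip magic-byte check just turns the opaque `tarfile.ReadError` into a message that shows what was actually downloaded.

```python
import io
import tarfile

# Assumed direct-download layout, taken from the meta-refresh target above;
# this is an observed URL pattern, not a documented image-net.org API.
BBOX_URL = "http://www.image-net.org/downloads/bbox/bbox/{}.tar.gz"

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def bbox_url(wnid: str) -> str:
    """Build the direct archive URL for a wordnet id such as 'n01729322'."""
    return BBOX_URL.format(wnid)


def read_bbox_archive(data: bytes) -> dict:
    """Return {member name: file bytes} for every regular file in a .tar.gz blob.

    Raises ValueError showing the start of the payload when it is not gzip
    (e.g. an HTML redirect page) instead of tarfile's generic ReadError.
    """
    if not data.startswith(GZIP_MAGIC):
        raise ValueError("expected a gzip archive, got: {!r}".format(data[:60]))
    files = {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                files[member.name] = tar.extractfile(member).read()
    return files
```

In `fetch_boxes`, `url = bbox_url(wnid)` would replace the `api/download` URL, and `read_bbox_archive` could be passed to `run_in_executor` in place of `read_tar` (dropping the unused `wnid` argument).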