ContentTypeError: 0, message='Attempt to decode JSON with unexpected mimetype: application/octet-stream', url=URL('https://api-reports-prod-usamazon.s3.amazonaws.com/atvpdr-a2vzay-report-data-7aaf8bfa-5cb5-4c76-b430-01d68cd7024b.json.gz?X
When I try to download a gzip file stored in an S3 bucket using Python's asyncio (aiohttp), I get the error above.
Synchronous Code [Working]
import gzip
import json
import requests

report = requests.get(location, headers=headers)
data = json.loads(gzip.decompress(report.content))
Asynchronous Code [Not Working]
async def get_data(session, url):
    async with session.get(url,headers=headers) as resp:
        data = await resp.json()
        return data

async def main(req_url):
    async with aiohttp.ClientSession() as session:
        tasks = []
        url = req_url
        tasks.append(asyncio.ensure_future(get_data(session, url)))
        data = await asyncio.gather(*tasks)

start_time1 = time.time()
nest_asyncio.apply()
keyword_list = asyncio.run(main(location))
print("--- %s seconds ---" % (time.time() - start_time1))
Thanks in advance.
I also tried:
async def get_data(session, url):
    async with session.get(url,headers=headers) as resp:
        data = json.loads(gzip.decompress(resp.content))
        return data
This throws the following error:
Traceback (most recent call last):
  File "<ipython-input-397-2f3527a7a82e>", line 20, in <module>
    keyword_list = asyncio.run(main(location))
  File "C:\Users\anaconda3\lib\site-packages\nest_asyncio.py", line 32, in run
    return loop.run_until_complete(future)
  File "C:\Users\anaconda3\lib\site-packages\nest_asyncio.py", line 70, in run_until_complete
    return f.result()
  File "C:\Users\anaconda3\lib\asyncio\futures.py", line 178, in result
    raise self._exception
  File "C:\Users\anaconda3\lib\asyncio\tasks.py", line 280, in __step
    result = coro.send(None)
  File "<ipython-input-397-2f3527a7a82e>", line 15, in main
    data = await asyncio.gather(*tasks)
  File "C:\Users\anaconda3\lib\asyncio\tasks.py", line 349, in __wakeup
    future.result()
  File "C:\Users\anaconda3\lib\asyncio\tasks.py", line 280, in __step
    result = coro.send(None)
  File "<ipython-input-397-2f3527a7a82e>", line 3, in get_data
    data = json.loads(gzip.decompress(resp.content))
  File "C:\Users\anaconda3\lib\gzip.py", line 547, in decompress
    with GzipFile(fileobj=io.BytesIO(data)) as f:
TypeError: a bytes-like object is required, not 'StreamReader'
In the synchronous code you pass the raw response bytes to gzip.decompress():
report = requests.get(location, headers=headers)
data = json.loads(gzip.decompress(report.content))
so you should do the same in the asynchronous code.
In aiohttp, however, resp.content is not bytes but a StreamReader (hence the TypeError), so you need await resp.read(), which reads the whole body and returns it as bytes:
async def get_data(session, url):
    async with session.get(url,headers=headers) as resp:
        data = json.loads(gzip.decompress(await resp.read()))
        return data
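To see the difference without network access, here is a minimal sketch that uses asyncio.StreamReader as a stand-in for aiohttp's resp.content. This is an assumption for the demo only: the two are different classes, but both expose an awaitable read() that returns bytes. Passing the stream object itself to gzip.decompress() raises a TypeError, while decompressing the bytes returned by read() works. The payload is made-up sample data, not a real report.

```python
import asyncio
import gzip
import json


async def demo():
    # Made-up response body: gzip-compressed JSON bytes, like the .json.gz report
    payload = gzip.compress(json.dumps({"reportId": "example"}).encode("utf-8"))

    # Stand-in for aiohttp's resp.content. asyncio.StreamReader is normally
    # created by asyncio.open_connection(); it is built directly here only for the demo.
    reader = asyncio.StreamReader()
    reader.feed_data(payload)
    reader.feed_eof()

    try:
        gzip.decompress(reader)  # the stream object is not bytes-like
    except TypeError as exc:
        print("TypeError:", exc)

    raw = await reader.read()  # read until EOF and get the body as bytes
    return json.loads(gzip.decompress(raw))


result = asyncio.run(demo())
print(result)  # {'reportId': 'example'}
```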
You also forgot return data in main(), so asyncio.run(main(location)) returns None.
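A stdlib-only sketch of why the missing return matters: a coroutine without a return statement evaluates to None, so asyncio.run() hands back None even though gather() collected the results. Here asyncio.sleep(0, result="payload") is an assumed stand-in for the real download, and the two function names are hypothetical.

```python
import asyncio


async def main_without_return():
    data = await asyncio.gather(asyncio.sleep(0, result="payload"))
    # No return statement: the coroutine's result is None


async def main_with_return():
    data = await asyncio.gather(asyncio.sleep(0, result="payload"))
    return data  # gather() returns a list with one result per awaitable


print(asyncio.run(main_without_return()))  # None
print(asyncio.run(main_with_return()))     # ['payload']
```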
I don't have access to a gzip file with JSON data, so I tested it with the JSON from https://httpbin.org/get instead:
import asyncio
import gzip  # needed only for the commented-out line for the real .json.gz report
import json  # same
import time

import aiohttp

# --- functions ---

async def get_data(session, url):
    async with session.get(url, headers=headers) as resp:
        # return await resp.json()
        # return json.loads(gzip.decompress(await resp.read()))
        return await resp.read()  # raw bytes of the body

async def main(url):
    async with aiohttp.ClientSession() as session:
        tasks = asyncio.ensure_future(get_data(session, url))
        data = await asyncio.gather(tasks)
        return data  # without this, asyncio.run() returns None

# --- main ---

headers = {}
location = 'https://httpbin.org/get'

start_time = time.time()
keyword_list = asyncio.run(main(location))
print(keyword_list)
end_time = time.time()
diff_time = end_time - start_time
print("---", diff_time, "seconds ---")
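When you switch from the httpbin test back to the real report, the body is gzip-compressed and S3 labels it application/octet-stream, so resp.json() cannot parse it. One option is a hypothetical helper, decode_report_body(), that handles either a plain or a gzipped JSON body by checking the gzip magic number (every gzip stream starts with the bytes 1f 8b, per RFC 1952) instead of trusting the Content-Type header. It is a sketch under that assumption, not part of aiohttp; in get_data you would return decode_report_body(await resp.read()).

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def decode_report_body(raw: bytes):
    """Hypothetical helper: parse a JSON body that may or may not be gzipped."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)  # .json.gz report: decompress first
    return json.loads(raw)  # json.loads accepts UTF-8 encoded bytes


print(decode_report_body(gzip.compress(b'[{"keyword": "example"}]')))  # [{'keyword': 'example'}]
print(decode_report_body(b'{"url": "https://httpbin.org/get"}'))  # {'url': 'https://httpbin.org/get'}
```

Checking the content itself keeps the same get_data usable for both the httpbin test and the S3 report.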