Python asyncio ContentTypeError: 0, message='Attempt to decode JSON with unexpected mimetype'

ContentTypeError: 0, message='Attempt to decode JSON with unexpected mimetype: application/octet-stream', url=URL('https://api-reports-prod-usamazon.s3.amazonaws.com/atvpdr-a2vzay-report-data-7aaf8bfa-5cb5-4c76-b430-01d68cd7024b.json.gz?X

When I try to fetch a gzip file stored in an S3 bucket using Python's asyncio (with aiohttp), I get the error above.

Synchronous Code [Working]

report = requests.get(location, headers=headers)
data = json.loads(gzip.decompress(report.content))

Asynchronous Code [Not Working]

async def get_data(session, url):
    async with session.get(url,headers=headers) as resp:
        data = await resp.json()
        return data 
  
async def main(req_url):
    async with aiohttp.ClientSession() as session:
        tasks = []
        url = req_url
        tasks.append(asyncio.ensure_future(get_data(session, url)))
        data = await asyncio.gather(*tasks)

start_time1 = time.time()
nest_asyncio.apply()
keyword_list = asyncio.run(main(location))
print("--- %s seconds ---" % (time.time() - start_time1))

Thanks in Advance.

I also tried decompressing resp.content directly:

async def get_data(session, url):
    async with session.get(url,headers=headers) as resp:
        data = json.loads(gzip.decompress(resp.content))
        return data 

which throws this error:

Traceback (most recent call last):

  File "<ipython-input-397-2f3527a7a82e>", line 20, in <module>
    keyword_list = asyncio.run(main(location))

  File "C:\Users\anaconda3\lib\site-packages\nest_asyncio.py", line 32, in run
    return loop.run_until_complete(future)

  File "C:\Users\anaconda3\lib\site-packages\nest_asyncio.py", line 70, in run_until_complete
    return f.result()

  File "C:\Users\anaconda3\lib\asyncio\futures.py", line 178, in result
    raise self._exception

  File "C:\Users\anaconda3\lib\asyncio\tasks.py", line 280, in __step
    result = coro.send(None)

  File "<ipython-input-397-2f3527a7a82e>", line 15, in main
    data = await asyncio.gather(*tasks)

  File "C:\Users\anaconda3\lib\asyncio\tasks.py", line 349, in __wakeup
    future.result()

  File "C:\Users\anaconda3\lib\asyncio\tasks.py", line 280, in __step
    result = coro.send(None)

  File "<ipython-input-397-2f3527a7a82e>", line 3, in get_data
    data = json.loads(gzip.decompress(resp.content))

  File "C:\Users\anaconda3\lib\gzip.py", line 547, in decompress
    with GzipFile(fileobj=io.BytesIO(data)) as f:

TypeError: a bytes-like object is required, not 'StreamReader'

If your synchronous code is

report = requests.get(location, headers=headers)
data = json.loads(gzip.decompress(report.content))

then you should do the same in the asynchronous code: read the raw bytes, decompress them, and only then parse the JSON.

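After some digging I found that it needs await resp.read() instead of resp.content. In aiohttp, resp.content is a StreamReader (which is why gzip.decompress raises the TypeError), while await resp.read() returns the whole body as bytes.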

async with session.get(url,headers=headers) as resp:
    data = json.loads(gzip.decompress(await resp.read()))
    return data 

You also forgot to return data in main(), so asyncio.run(main(location)) returns None.
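The decode step can be tried without any network access. The following is a minimal sketch, not part of the original post: parse_gzip_json is a hypothetical helper name, and the sample payload is built locally as a stand-in for the bytes that await resp.read() would return.

```python
import gzip
import json


def parse_gzip_json(raw: bytes):
    """Decompress gzip-compressed bytes and parse the result as JSON."""
    return json.loads(gzip.decompress(raw))


# Stand-in for the bytes that `await resp.read()` would return
# (hypothetical sample data, compressed locally).
sample = gzip.compress(json.dumps({"rows": [1, 2, 3]}).encode("utf-8"))
print(parse_gzip_json(sample))
```

Inside get_data() the same helper would be called as parse_gzip_json(await resp.read()).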


I don't have access to the gzip file with the JSON data, so I tested it on the JSON from https://httpbin.org/get:

import asyncio
import gzip  # gzip and json are needed only for the commented-out gzip variant
import json
import aiohttp
import time

# --- functions ---

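async def get_data(session, url):
    async with session.get(url, headers=headers) as resp:
        # resp.json() checks the Content-Type and fails on application/octet-stream
        #return await resp.json()
        # for the gzipped report: read the bytes, decompress, then parse
        #return json.loads(gzip.decompress(await resp.read()))
        # httpbin returns plain JSON, so here I just return the raw bytes
        return await resp.read()

async def main(url):
    async with aiohttp.ClientSession() as session:
        task = asyncio.ensure_future(get_data(session, url))
        data = await asyncio.gather(task)  # list with one result per task
    return data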
    
# --- main ---

headers = {}

location = 'https://httpbin.org/get'

start_time = time.time()

keyword_list = asyncio.run(main(location))
print(keyword_list)

end_time = time.time()

diff_time = end_time - start_time

print("---", diff_time, "seconds ---")
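To see what main() hands back when it gathers several requests, here is a small sketch that runs offline. fake_fetch is a hypothetical stand-in for get_data() that sleeps instead of doing I/O, and the names and delays are made up for illustration.

```python
import asyncio


async def fake_fetch(name: str, delay: float) -> str:
    # Hypothetical stand-in for get_data(): sleeps instead of making a request.
    await asyncio.sleep(delay)
    return name


async def main(names):
    # Earlier names get longer delays, so they finish last.
    tasks = [
        fake_fetch(name, 0.01 * (len(names) - i))
        for i, name in enumerate(names)
    ]
    # gather() returns results in the order the awaitables were passed in,
    # regardless of which one finishes first.
    results = await asyncio.gather(*tasks)
    return results  # without this return, asyncio.run(main(...)) gives None


if __name__ == "__main__":
    print(asyncio.run(main(["a", "b", "c"])))
```

The result is always a list, even for a single URL, so with one request the payload is the first element of the returned list.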
