
Retrieve data from a URL without creating a file

I've got an issue with a small project. I'm trying to download large amounts of data from a website so I can store it and work on it later, but I need to make small changes to the files before they are usable.

I'm currently using urllib.request.urlretrieve(url, folder) to download the data, and then I open it, make the necessary changes and save it again.

However, the extra write and read feel unnecessary: I'm saving the data to disk just to open it again, and I end up downloading a lot of data.

I tried using the requests module, which I don't know very well, but I ran into trouble because the data is initially compressed as a gzip file.

download = requests.get("https://tickdata.fxcorporate.com/EURUSD/2015/1.csv.gz", stream=True) 
decoded_content = download.content.decode('gzip')

This doesn't work, as it recognizes neither gz nor gzip as a valid encoding. I think the data inside the gzip file is UTF-8, but passing utf-8 as the encoding parameter doesn't work either.

Would anyone have an idea on how to make it read the file?

PS: I'm not sure if it's useful for this issue, but this is what I do to the files once I've downloaded them:

import gzip

pair = 'EUR_USD'

for year in range(2015, 2016):
    for week in range(1, 53):

        # Backslashes are doubled so none is read as an escape sequence
        folder = 'E:\\Finance_Data\\' + pair + '\\Tick\\' + str(year) + '\\'
        ref = folder + str(week) + '.csv.gz'
        dest = folder + str(week) + '_clean.csv'

        with gzip.open(ref, 'rb') as f:
            data = f.read()

        # Strip the NUL characters; note that gzip.open writes dest compressed
        with gzip.open(dest, 'wb') as f:
            f.write(data.decode('utf-8').replace('\x00', '').encode('utf-8'))

Use io.BytesIO to wrap the downloaded bytes in an in-memory file object, then hand that object to gzip.open.

For example:

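import requests
from io import BytesIO
import gzip

response = requests.get('https://tickdata.fxcorporate.com/EURUSD/2015/1.csv.gz', stream=True)

# BytesIO makes the downloaded bytes look like a file, so gzip.open
# can decompress them without anything being written to disk.
with gzip.open(BytesIO(response.content), mode="rt") as f:
    print(f.read())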
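To tie this back to the cleaning step in the question, here is a minimal sketch that decompresses the download in memory, strips the NUL characters, and writes only the cleaned file. It is an illustration, not a tested solution for this data: the helper names clean_gzip_bytes and download_clean are made up for the example, the UTF-8 encoding is the asker's own guess, and urllib.request is used instead of requests only so that the sketch depends on the standard library alone.

```python
import gzip
import urllib.request


def clean_gzip_bytes(raw: bytes) -> str:
    """Decompress gzip bytes in memory and strip NUL characters.

    Assumes the decompressed payload is UTF-8, as guessed in the question.
    """
    text = gzip.decompress(raw).decode('utf-8')
    return text.replace('\x00', '')


def download_clean(url: str, dest: str) -> None:
    """Fetch a .csv.gz file and save the cleaned text, with no intermediate file."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # the whole compressed file is held in memory
    # newline='' keeps the line endings exactly as they are in the data
    with open(dest, 'w', encoding='utf-8', newline='') as f:
        f.write(clean_gzip_bytes(raw))
```

Calling download_clean(url, dest) inside the year/week loop would then replace the download, reopen and rewrite steps. Note that this holds each file fully in memory, which is fine for moderately sized files but may not suit very large ones.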
