I have an issue with a small project. I'm trying to download large amounts of data from a website so I can store it and work on it later, but I need to make small changes to the files before they are usable.
I'm currently using urllib.request.urlretrieve(url, folder) to download the data, and then I open each file, make the necessary changes and save it again.
However, the extra write and read feels unnecessary: I'm saving the data to disk just to open it again, and that adds up because I download a lot of data.
I tried using the requests module, which I don't know very well, but I ran into trouble because the data arrives compressed as a gzip file.
download = requests.get("https://tickdata.fxcorporate.com/EURUSD/2015/1.csv.gz", stream=True)
decoded_content = download.content.decode('gzip')
This doesn't work: it recognizes neither gz nor gzip as a valid encoding. I think the data inside the gzip file is UTF-8, but passing utf-8 as the encoding doesn't work either.
Would anyone have an idea on how to make it read the file?
P.S. I'm not sure whether it's useful for this issue, but this is what I do to the files once I've downloaded them:
import gzip

pair = 'EUR_USD'
for year in range(2015, 2016):
    for week in range(1, 53):
        # Backslashes are escaped explicitly so the Windows path is unambiguous.
        base = 'E:\\Finance_Data\\' + pair + '\\Tick\\' + str(year) + '\\' + str(week)
        ref = base + '.csv.gz'
        dest = base + '_clean.csv'
        with gzip.open(ref, 'rb') as f:
            data = f.read()
        # Note: gzip.open writes dest compressed, even though its name ends in .csv.
        with gzip.open(dest, 'wb') as f:
            # Strip NUL characters from the decoded text before saving.
            f.write(data.decode('utf-8').replace('\x00', '').encode('utf-8'))
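Wrap the downloaded bytes in io.BytesIO so that gzip can read them from memory as if they were a file. gzip is a compression format, not a text encoding, which is why .decode('gzip') fails.
For example: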
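import gzip
from io import BytesIO

import requests

url = 'https://tickdata.fxcorporate.com/EURUSD/2015/1.csv.gz'
response = requests.get(url)
response.raise_for_status()  # fail early on HTTP errors

# BytesIO gives gzip a file-like object backed by the downloaded bytes.
with gzip.open(BytesIO(response.content), mode='rt') as f:
    print(f.read())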
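To tie this back to the cleanup step in the question, here is a minimal sketch that downloads, decompresses, strips the NUL characters and writes the result once, with no intermediate file. The function names clean_gzip_bytes and download_and_clean are hypothetical, the sketch uses the standard library's urllib.request rather than requests, and it assumes (as the question does) that the decompressed data is UTF-8 and small enough to hold in memory.

```python
import gzip
import urllib.request


def clean_gzip_bytes(raw: bytes) -> str:
    """Decompress gzip data in memory and strip NUL characters.

    Assumes the decompressed payload is UTF-8, as in the question.
    """
    text = gzip.decompress(raw).decode("utf-8")
    return text.replace("\x00", "")


def download_and_clean(url: str, dest: str) -> None:
    """Fetch a .csv.gz file and write the cleaned text to dest."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # whole file in memory; nothing written yet
    # newline="" avoids translating the line endings already in the data.
    with open(dest, "w", encoding="utf-8", newline="") as f:
        f.write(clean_gzip_bytes(raw))
```

It could then be called inside the existing year/week loop, for example download_and_clean(url, dest) with url and dest built the same way as in the question. Unlike the question's code, this writes dest as plain text rather than gzip-compressed.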