
Retrieve data from a URL without creating a file

I have an issue with a small project. I'm trying to download large amounts of data from a website so I can store it and work on it later, but I need to make small changes to the files to make them usable.

I'm currently using urllib.request.urlretrieve(url, folder) to download the data, then I open each file, make the necessary changes and save it again.

However, the extra write and read feels unnecessary: I'm saving the data to disk just to open it again, and I end up downloading a lot of data.

I tried using the requests module, which I don't know very well, but I ran into trouble because the data is initially compressed as a gzip file.

download = requests.get("https://tickdata.fxcorporate.com/EURUSD/2015/1.csv.gz", stream=True) 
decoded_content = download.content.decode('gzip')

This doesn't work, because Python recognizes neither gz nor gzip as a valid encoding. I think the data inside the gzip file is UTF-8, but if I pass utf-8 as the encoding parameter, it doesn't work either.

Would anyone have an idea of how to read the file?

PS: I'm not sure whether it's useful for this issue, but this is the operation I perform on the files once I've downloaded them:

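import gzip

pair = 'EUR_USD'

for year in range(2015, 2016):
    for week in range(1, 53):

        # Every backslash is escaped so the Windows paths contain no invalid escape sequences
        ref = 'E:\\Finance_Data\\' + pair + '\\Tick\\' + str(year) + '\\' + str(week) + '.csv.gz'
        dest = 'E:\\Finance_Data\\' + pair + '\\Tick\\' + str(year) + '\\' + str(week) + '_clean.csv'

        # Read the decompressed bytes of the downloaded archive
        with gzip.open(ref, 'rb') as f:
            data = f.read()

        # Strip NUL characters and write the result back through gzip
        with gzip.open(dest, 'wb') as f:
            f.write(data.decode('utf-8').replace('\x00', '').encode('utf-8'))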

Use io.BytesIO to wrap the downloaded bytes in an in-memory file object and pass that to gzip.open.

For example:

import requests
from io import BytesIO
import gzip

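# Download the gzip archive into memory; nothing is written to disk
a = requests.get('https://tickdata.fxcorporate.com/EURUSD/2015/1.csv.gz', stream=True)
# Wrap the bytes in a file-like object and decompress them in text mode
with gzip.open(BytesIO(a.content), mode="rt") as f:
    print(f.read())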
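
To connect this with the cleanup step from the question, here is a minimal sketch that downloads, decompresses and cleans the data in memory and writes each file only once. The function names clean_gzip_bytes and fetch_clean_csv are made up for illustration, the utf-8 default is the asker's guess rather than a confirmed fact about the data, and writing the result as plain (uncompressed) text is my assumption about what the _clean.csv file should contain. It uses urllib.request from the standard library, but requests.get(url).content would supply the same bytes.

```python
import gzip
import urllib.request


def clean_gzip_bytes(raw: bytes, encoding: str = "utf-8") -> str:
    """Decompress gzip bytes in memory and strip NUL characters.

    The encoding default is an assumption taken from the question.
    """
    text = gzip.decompress(raw).decode(encoding)
    return text.replace("\x00", "")


def fetch_clean_csv(url: str, dest: str, encoding: str = "utf-8") -> None:
    """Download a .csv.gz, clean it in memory, and write it once to dest."""
    # Read the whole compressed payload into memory (no temporary file)
    with urllib.request.urlopen(url) as response:
        raw = response.read()

    cleaned = clean_gzip_bytes(raw, encoding)

    # newline="" keeps the line endings exactly as they were in the source data
    with open(dest, "w", encoding="utf-8", newline="") as f:
        f.write(cleaned)
```

Calling fetch_clean_csv('https://tickdata.fxcorporate.com/EURUSD/2015/1.csv.gz', '1_clean.csv') would then replace the urlretrieve call plus the reopen-and-rewrite loop. If the cleaned file should stay compressed, gzip.open(dest, "wt", encoding="utf-8", newline="") can be used in place of open.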

