
Read a zipped Stata file from URL into pandas

Is it possible to read, from a URL, a .zip file that contains only a .dta file?

For example, https://www.federalreserve.gov/econres/files/scfp2016s.zip contains one file, rscfp2016.dta, but pandas.read_stata doesn't work on it:

import pandas as pd
pd.read_stata('https://www.federalreserve.gov/econres/files/scfp2016s.zip')

ValueError: Version of given Stata file is not 104, 105, 108, 111 (Stata 7SE), 113 (Stata 8/9), 114 (Stata 10/11), 115 (Stata 12), 117 (Stata 13), or 118 (Stata 14)

read_csv can read a zipped file when the archive contains only the CSV, via its compression argument, which by default infers the compression. read_stata lacks this option.

I could download and unzip the file and then read it, but this is messy:
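For context, the statement above describes pandas as it was when the question was asked. Newer pandas releases appear to give read_stata a compression argument as well, so a zip holding a single .dta file may be readable directly. The following is a minimal sketch under that assumption; the demo data and file names are made up so that it runs offline, and it is not the SCF file from the question.

```python
import tempfile
import zipfile
from io import BytesIO
from pathlib import Path

import pandas as pd

# Build a tiny Stata file in memory (hypothetical demo data).
demo = pd.DataFrame({"income": [10.5, 20.0, 30.25], "networth": [1.0, 2.0, 3.0]})
dta_buffer = BytesIO()
demo.to_stata(dta_buffer, write_index=False)

# Wrap it in a zip archive with exactly one member, like scfp2016s.zip.
zip_path = Path(tempfile.mkdtemp()) / "demo.zip"
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("demo.dta", dta_buffer.getvalue())

# Assumption: the installed pandas accepts `compression` in read_stata.
# On versions without it, this call raises a TypeError.
df = pd.read_stata(zip_path, compression="zip")
print(df)
```

If the installed version supports this, passing the URL itself to read_stata may work too, but that is not verified here; the approaches below do not depend on it.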

!wget https://www.federalreserve.gov/econres/files/scfp2016s.zip
!unzip scfp2016s.zip
df = pd.read_stata('rscfp2016.dta')

Is there a better way?

read_stata accepts file-like objects, so you can do this:

import pandas as pd
from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen

url = 'https://www.federalreserve.gov/econres/files/scfp2016s.zip'
with urlopen(url) as request:
    data = BytesIO(request.read())

with ZipFile(data) as archive:
    with archive.open(archive.namelist()[0]) as stata:
        df = pd.read_stata(stata)

You can also do it with requests:

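import io
import zipfile

import pandas as pd
import requests

url = 'https://www.federalreserve.gov/econres/files/scfp2016s.zip'
response = requests.get(url)
response.raise_for_status()  # fail early on an HTTP error
archive = zipfile.ZipFile(io.BytesIO(response.content))
dta_bytes = archive.read(archive.namelist()[0])  # the archive's only member
df = pd.read_stata(io.BytesIO(dta_bytes))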
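Both answers take archive.namelist()[0], which relies on the archive holding nothing but the .dta file. A slightly more defensive variant could select the member by extension instead. This is only a sketch: read_stata_from_zip_bytes is a hypothetical helper name, not a pandas function, and it takes the already-downloaded bytes so that it works with either download method.

```python
import zipfile
from io import BytesIO

import pandas as pd


def read_stata_from_zip_bytes(raw: bytes) -> pd.DataFrame:
    """Read the single .dta member of a zip archive held in memory.

    Hypothetical helper for illustration; not part of pandas.
    """
    with zipfile.ZipFile(BytesIO(raw)) as archive:
        # Keep only members that look like Stata files.
        dta_names = [n for n in archive.namelist() if n.lower().endswith(".dta")]
        if len(dta_names) != 1:
            raise ValueError(f"expected exactly one .dta member, found {dta_names}")
        # Copy the member into a seekable buffer before handing it to pandas.
        return pd.read_stata(BytesIO(archive.read(dta_names[0])))
```

With the requests answer this would be called as read_stata_from_zip_bytes(response.content), and with the urlopen answer as read_stata_from_zip_bytes(request.read()).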
