
How to create a pandas DataFrame from a CSV that is compressed in tar.gz?

How can I create a pandas DataFrame from a CSV file that is compressed in a tar.gz archive? I found the following code, which does this for a zip file. What should I change to make it work with tar.gz, without first saving the tar.gz or the CSV file to disk?

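import io, zipfile  # io.BytesIO replaces Python 2's StringIO for binary data
import pandas, requests
r = requests.get('http://data.octo.dc.gov/feeds/crime_incidents/archive/crime_incidents_2013_CSV.zip')
z = zipfile.ZipFile(io.BytesIO(r.content))  # open the downloaded zip in memory
df = pandas.read_csv(z.open('sample_CSV.csv'))  # name of the CSV inside the zip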

My file is https://ghtstorage.blob.core.windows.net/downloads/mysql-2016-06-16.tar.gz
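A minimal sketch of the tar.gz counterpart to the zip snippet in the question, assuming the whole archive fits in memory: `tarfile.open(fileobj=..., mode="r:gz")` takes the place of `zipfile.ZipFile`, and `TarFile.extractfile()` takes the place of `z.open()`. The helper `read_csv_from_targz` and the member name in the usage comment are hypothetical; `tar.getnames()` lists the real member names.

```python
import io
import tarfile

import pandas as pd


def read_csv_from_targz(content: bytes, member_name: str, **read_csv_kwargs) -> pd.DataFrame:
    """Parse one CSV member of an in-memory .tar.gz archive into a DataFrame.

    `member_name` must match a name reported by TarFile.getnames(),
    including any leading directory (e.g. "subdir/table.csv").
    """
    with tarfile.open(fileobj=io.BytesIO(content), mode="r:gz") as tar:
        # extractfile() returns a binary file-like object, or None for
        # non-regular members such as directories; unknown names raise KeyError.
        member = tar.extractfile(member_name)
        if member is None:
            raise ValueError(f"{member_name!r} is not a regular file in the archive")
        return pd.read_csv(member, **read_csv_kwargs)


# Usage sketch (not run here; "some_table.csv" is a hypothetical member name):
# import requests
# r = requests.get("https://ghtstorage.blob.core.windows.net/downloads/mysql-2016-06-16.tar.gz")
# df = read_csv_from_targz(r.content, "some_table.csv")
```

Because a tar.gz has no central directory, the first lookup by name makes tarfile decompress and scan the whole archive to build its member list.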

Try extracting the tar.gz archive with the tarfile module:

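import tarfile
fname = "mysql-2016-06-16.tar.gz"  # path to the local archive
with tarfile.open(fname, "r:gz") as tar:  # closes the archive automatically
    tar.extractall()  # writes every member into the current directory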
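`extractall()` writes every member to disk, which the question wanted to avoid. A sketch of an alternative that never touches disk and does not need the whole archive in memory: open the archive in tarfile's streaming mode (`"r|gz"`) on a file-like object and parse each `.csv` member as it passes by. The helper `iter_csv_frames` is hypothetical, and passing `requests`' `r.raw` assumes the server applies no extra HTTP `Content-Encoding`, because `r.raw` returns the bytes undecoded.

```python
import io
import tarfile
from typing import BinaryIO, Iterator, Tuple

import pandas as pd


def iter_csv_frames(fileobj: BinaryIO, **read_csv_kwargs) -> Iterator[Tuple[str, pd.DataFrame]]:
    """Yield (member_name, DataFrame) for each .csv member of a gzipped tar stream.

    Mode "r|gz" reads the stream strictly front to back, so each member is
    consumed before moving on to the next one; nothing is written to disk.
    """
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
        for member in tar:
            if member.isfile() and member.name.lower().endswith(".csv"):
                # Read this member fully before the stream advances; only one
                # member's bytes are held in memory at a time.
                data = tar.extractfile(member).read()
                yield member.name, pd.read_csv(io.BytesIO(data), **read_csv_kwargs)


# Usage sketch (not run here); stream=True stops requests from buffering the body:
# import requests
# url = "https://ghtstorage.blob.core.windows.net/downloads/mysql-2016-06-16.tar.gz"
# with requests.get(url, stream=True) as r:
#     r.raise_for_status()
#     for name, df in iter_csv_frames(r.raw):
#         print(name, df.shape)
```

Extra keyword arguments, such as `header=None` for CSV files without a header row, are forwarded to `read_csv`.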

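Try simply passing your .tar.gz file name to read_csv. With the default compression="infer", pandas picks the decompression method from the file extension, so a plain .gz file is decompressed automatically. A .tar.gz archive is recognized only in pandas 1.5 and later, and only if it contains exactly one file.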
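A minimal sketch of the read_csv route, assuming pandas 1.5 or later (older versions map `.gz` to plain gzip, so the parser would see the tar archive's bytes instead of clean CSV). It builds a one-member archive in a temporary directory so it runs offline; `roundtrip_single_csv_targz` and the file names are made up for illustration.

```python
import io
import tarfile
import tempfile
from pathlib import Path

import pandas as pd


def roundtrip_single_csv_targz(csv_text: str) -> pd.DataFrame:
    """Write csv_text as the only member of a .tar.gz, then read it back with read_csv."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "sample.tar.gz"  # lowercase extension for inference
        data = csv_text.encode("utf-8")
        with tarfile.open(archive, mode="w:gz") as tar:
            info = tarfile.TarInfo("sample.csv")  # exactly one member, no directory entries
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
        # compression="infer" (the default) maps ".tar.gz" to tar in pandas >= 1.5
        return pd.read_csv(archive)
```

The pandas documentation requires a tar archive read this way to contain only one data file; for multi-file archives, open the archive with tarfile instead.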

Make sure the extension is in lower case.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM