1) How should I read the data from all the CSV files inside the tar.gz archive on the website and write them out to CSVs in a folder in the most memory- and space-efficient way? 2) How can I loop over all the CSVs inside the tar.gz archive? 3) Since the CSV files are huge, how can I read and write them in chunks of, say, 1 million rows at a time?
I have only gotten this far using code from other Stack Overflow answers:
import pandas as pd
import urllib2  # Python 2; on Python 3 use urllib.request instead
import tarfile

url = 'https://ghtstorage.blob.core.windows.net/downloads/mysql-2016-08-01.tar.gz'
r = urllib2.Request(url)
o = urllib2.urlopen(r)
# The response object must be passed as fileobj=, and the streaming
# mode 'r|gz' is required because an HTTP response is not seekable.
thetarfile = tarfile.open(fileobj=o, mode='r|gz')
thetarfile.close()