Save large pandas dataframe to Excel

I'm generating a large dataframe (1.5 GB when saved in CSV format) and need to store it in a worksheet of an Excel file, along with a second (much smaller) dataframe that is saved in a separate worksheet.

import pandas as pd

# Var, StatFile, DataFile and Path are assumed to be defined earlier in the script.
print('Reading temporary files for variable {}:'.format(Var))
print(' Reading stations')
s = pd.read_csv(StatFile, sep=':', dtype={'ID': 'str'}, encoding='utf-8')
print(' Reading data')
d = pd.read_csv(DataFile, sep=':', dtype='str', encoding='utf-8').transpose()
# Use the first row as the column header, then convert the rest to floats.
d.columns = d.iloc[0]
d = d[1:].astype('float')
# reindex returns a new frame, so the sorted result has to be assigned back.
d = d.reindex(columns=sorted(d.columns))
print('Writing out Excel file for variable {}'.format(Var))
writer = pd.ExcelWriter(Path + Var + '.xlsx', engine='xlsxwriter')
d.to_excel(writer, sheet_name='Data')
OutStatCol = ['ID', 'Name', 'Longitude', 'Latitude', 'GRS', 'OriginalVariable', 'VariableUnits', 'URL', 'JsonNode']
s.to_excel(writer, columns=OutStatCol, index=False, sheet_name='Stations')
writer.save()

My code works fine for smaller dataframes, but with the large ones I get the following error:

Traceback (most recent call last):
  File "./Test2.py", line 29, in <module>
    writer.save()
  File "/home/user/miniconda2/lib/python2.7/site-packages/pandas/io/excel.py", line 1413, in save
    return self.book.close()
  File "/home/user/miniconda2/lib/python2.7/site-packages/xlsxwriter/workbook.py", line 297, in close
    self._store_workbook()
  File "/home/user/miniconda2/lib/python2.7/site-packages/xlsxwriter/workbook.py", line 624, in _store_workbook
    xlsx_file.write(os_filename, xml_filename)
  File "/home/user/miniconda2/lib/python2.7/zipfile.py", line 1148, in write
    self._writecheck(zinfo)
  File "/home/user/miniconda2/lib/python2.7/zipfile.py", line 1114, in _writecheck
    " would require ZIP64 extensions")
zipfile.LargeZipFile: Filesize would require ZIP64 extensions

Is there any way I can specify something like allowZip64=True in the ExcelWriter declaration or in the to_excel() method?

Thanks!

This took some source code digging, but...

import pandas as pd

# Var, StatFile, DataFile and Path are assumed to be defined earlier in the script.
print('Reading temporary files for variable {}:'.format(Var))
print(' Reading stations')
s = pd.read_csv(StatFile, sep=':', dtype={'ID': 'str'}, encoding='utf-8')
print(' Reading data')
d = pd.read_csv(DataFile, sep=':', dtype='str', encoding='utf-8').transpose()
d.columns = d.iloc[0]
d = d[1:].astype('float')
# reindex returns a new frame, so the sorted result has to be assigned back.
d = d.reindex(columns=sorted(d.columns))
print('Writing out Excel file for variable {}'.format(Var))
writer = pd.ExcelWriter(Path + Var + '.xlsx', engine='xlsxwriter')

# THIS: enable ZIP64 extensions on the underlying xlsxwriter workbook
writer.book.use_zip64()

d.to_excel(writer, sheet_name='Data')
OutStatCol = ['ID', 'Name', 'Longitude', 'Latitude', 'GRS', 'OriginalVariable', 'VariableUnits', 'URL', 'JsonNode']
s.to_excel(writer, columns=OutStatCol, index=False, sheet_name='Stations')
# On pandas 2.0 and later, save() no longer exists; call writer.close() instead.
writer.save()

should work.

Figuring out that the writer doesn't inherit from Workbook took me longer than it should have. writer.book is directly a Workbook instance... d'oh.

I just added engine='xlsxwriter' to the .to_excel() call and that solved the problem.
