
Export a PySpark DataFrame to an Excel file when the 'openpyxl' module is not installed

I am trying to write my Spark DataFrames to an Excel file to generate reports. I convert them to pandas DataFrames and then use:

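import pandas as pd

panda_df = df.toPandas()  # collect the Spark DataFrame onto the driver as a pandas DataFrame
writer = pd.ExcelWriter(filename)
panda_df.to_excel(writer, sheet_name='Sheet1', startcol=0, startrow=0)
writer.close()  # write the workbook to disk; very old pandas releases use writer.save() instead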

This fails with the following error:

File "/usr/lib64/python2.6/site-packages/pandas/io/excel.py", line 350, in __init__
from openpyxl.workbook import Workbook
ImportError: No module named openpyxl.workbook
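Before calling to_excel, it can help to confirm which optional modules the interpreter can actually import. The following is a minimal sketch using only the standard library; it assumes Python 3.4 or later (importlib.util.find_spec does not exist on the Python 2.6 shown in the traceback), and find_missing is a hypothetical helper name, not part of pandas.

```python
import importlib.util


def find_missing(modules):
    """Return the names in `modules` that cannot be imported here.

    importlib.util.find_spec locates a module without executing it.
    Pass top-level names only: for a dotted name such as
    "openpyxl.workbook", find_spec imports the parent package first
    and raises ModuleNotFoundError if that package is missing.
    """
    return [name for name in modules if importlib.util.find_spec(name) is None]


# Prints an empty list if openpyxl is importable, otherwise ['openpyxl'].
print(find_missing(["openpyxl"]))
```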

I am running this on a remote server where I do not have admin rights, so I cannot install packages with sudo apt-get (it reports "Sudo: apt-get: command not found"). I also tried pip, without success, because pip is not installed either. Is there any other way to write my DataFrames to Excel?

You can proceed as follows.

You can clone the library from its source repository here:

git clone https://bitbucket.org/openpyxl/openpyxl

Change into the openpyxl directory, then run the following to install it for your user only, without admin permissions:

python setup.py install --user
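"python setup.py install --user" places the package in the per-user site-packages directory, which the interpreter normally adds to sys.path by itself, unless user site-packages are disabled (for example with python -s, the PYTHONNOUSERSITE environment variable, or inside a virtual environment). The sketch below assumes Python 3 and uses only the standard site module to report where that directory is and whether it is active; user_site_status is a hypothetical helper name.

```python
import site
import sys


def user_site_status():
    """Report the per-user site-packages directory and whether it is usable."""
    user_dir = site.getusersitepackages()
    return {
        "user_site_dir": user_dir,
        # ENABLE_USER_SITE may be True, False or None (disabled for security).
        "enabled": bool(site.ENABLE_USER_SITE),
        "on_sys_path": user_dir in sys.path,
    }


print(user_site_status())
```

If "on_sys_path" is already True, the installed openpyxl should import without any further changes to sys.path.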

Then, add the path to the openpyxl folder to sys.path in your code:

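import sys
sys.path.append('/path/to/openpyxl/folder')  # folder that contains the openpyxl package

import pandas as pd

panda_df = df.toPandas()
writer = pd.ExcelWriter(filename)
panda_df.to_excel(writer, sheet_name='Sheet1', startcol=0, startrow=0)
writer.close()  # write the workbook to disk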
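The '/path/to/openpyxl/folder' placeholder has to be the directory that contains the openpyxl package directory (for a source checkout, presumably the top level of the clone, next to setup.py), not the package directory itself. A minimal sketch of a hypothetical helper, add_package_root, that checks this before touching sys.path; the ~/openpyxl location in the usage line is also an assumption.

```python
import os
import sys


def add_package_root(root, package="openpyxl"):
    """Append `root` to sys.path if it contains the given package.

    `root` must be the directory that holds `<package>/__init__.py`.
    Returns True when the package directory was found (and `root` is
    now on sys.path), False otherwise.
    """
    root = os.path.abspath(root)
    if not os.path.isfile(os.path.join(root, package, "__init__.py")):
        return False
    if root not in sys.path:
        sys.path.append(root)
    return True


# Hypothetical location of the clone; adjust to your own checkout.
if not add_package_root(os.path.expanduser("~/openpyxl")):
    print("openpyxl package not found under that folder")
```

Call it before creating the pd.ExcelWriter; in the pandas version from the traceback, openpyxl is only imported inside ExcelWriter.__init__.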

Alternatively, you can use the Spark 2 datasource of the HadoopOffice library (it also supports Python). It can read and write Excel files that are encrypted, linked to other workbooks, carry metadata, etc. It also has a low-footprint mode that lets you write large Excel files quickly without requiring large amounts of memory or CPU: https://github.com/ZuInnoTe/spark-hadoopoffice-ds

The datasource is based on the HadoopOffice library, which lets virtually any Hadoop application read and write Excel files because it provides corresponding Hadoop FileInputFormats and FileOutputFormats: https://github.com/ZuInnoTe/hadoopoffice
