
How to convert .xls(x) files into PDF in Python or NodeJS (without Windows)?

I'm building a project on AWS using Lambda functions, and I need to convert some .xls(x) files to PDF. Every solution I found for this depends on the Microsoft Office libraries, so converting an Office file seems to require Windows. Is there a way to implement this (in Python 3.x or NodeJS) without paying a third-party vendor?

So far, I have tried reading the data with Pandas and xlrd, with the idea of building the PDF file myself. I also tried some other Node and Python libraries, but they all depend on Windows. I looked at the pricing of some paid services too.

Any suggestions?

I'm still looking for help, but I found an approach that solves the problem only in part (it's not a full solution for me, but it could help somebody).

I'm using the libraries xhtml2pdf and Pandas. I read the xls(x) content with Pandas, export it to HTML, and finally create a PDF from that HTML.

The main problem is the structure: I lose the layout, colors, fonts, and the rest of the formatting, although the cell values are kept.

from xhtml2pdf import pisa
import pandas as pd

xl = pd.ExcelFile("myExcelFile.xlsx")
df = xl.parse("sheet_name")

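# Some cleaning (each call returns a new DataFrame, so the result must be reassigned)
df = df.dropna(how="all")  # Drop rows that are completely NaN
df = df.dropna(how="all", axis="columns")  # Drop columns that are completely NaN
df = df.fillna("")  # Replace the remaining NaN values with empty strings (cosmetic only)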

df.to_html('htmlFile.html', border=0)

with open("htmlFile.html", "r") as htmlFile:
    with open("pdfFile.pdf", "w+b") as resultFile:
        pisaStatus = pisa.CreatePDF(htmlFile, dest=resultFile)
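
A possible variation on the same approach, as a rough sketch rather than a tested solution: since the target is a Lambda function, the intermediate HTML file can be skipped by keeping everything in memory, and a small inline stylesheet can bring back at least borders and a header background. The helper names `build_html_document` and `excel_to_pdf_bytes` and the `DEFAULT_CSS` stylesheet are made up for this illustration. xhtml2pdf only understands a subset of CSS, so the original Excel formatting is still not reproduced.

```python
import io

# Hypothetical stylesheet for illustration; xhtml2pdf supports only a subset of CSS.
DEFAULT_CSS = """
table { border-collapse: collapse; font-family: Helvetica; font-size: 9pt; }
th { background-color: #dddddd; font-weight: bold; }
th, td { border: 1px solid #999999; padding: 3px; }
"""


def build_html_document(table_html: str, css: str = DEFAULT_CSS) -> str:
    """Wrap an HTML table fragment in a full document with an inline stylesheet."""
    return (
        '<html><head><meta charset="utf-8">'
        f"<style>{css}</style></head>"
        f"<body>{table_html}</body></html>"
    )


def excel_to_pdf_bytes(xlsx_path: str, sheet_name=0) -> bytes:
    """Convert one sheet of an Excel file to PDF bytes without temporary files."""
    # Imported lazily so the HTML helper above works without these packages.
    import pandas as pd
    from xhtml2pdf import pisa

    df = pd.read_excel(xlsx_path, sheet_name=sheet_name)
    # Same cleaning as before: drop empty rows and columns, blank out NaN.
    df = df.dropna(how="all").dropna(how="all", axis="columns").fillna("")

    # With no target given, to_html() returns the table as a string.
    html = build_html_document(df.to_html(index=False, border=0))

    buffer = io.BytesIO()
    status = pisa.CreatePDF(html, dest=buffer)
    if status.err:
        raise RuntimeError("xhtml2pdf reported errors while rendering the PDF")
    return buffer.getvalue()
```

In a Lambda handler, the returned bytes could then be uploaded to storage or returned directly, instead of being written to local disk.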
