I'm building a project on AWS using Lambda functions, and I need to convert some .xls(x) files to PDF. Everything I found for this kind of conversion depends on Microsoft Office libraries, so converting an Office file seems to require Windows. Is there a way to implement this (Python 3.x or Node.js) without paying a third-party vendor?
So far I have tried reading the data with pandas and xlrd, intending to build the PDF myself, and I also tried some Node and Python libraries (but they all depend on Windows). I looked at the prices of some paid services too.
Any suggestions?
I'm still looking for help, but I found an approach that helps only in part (it's not a full solution for me, but it could help somebody).
I'm using the libraries xhtml2pdf and pandas. I read the xls(x) content with pandas, export it to HTML, and finally create a PDF from that HTML.
The main problem is the structure: I lose the layout, colors, fonts, and all the formatting, although the cell values are kept.
from xhtml2pdf import pisa
import pandas as pd

# Read one sheet of the workbook into a DataFrame
xl = pd.ExcelFile("myExcelFile.xlsx")
df = xl.parse("sheet_name")

# Some cleaning (dropna/fillna return new DataFrames, so reassign the result)
df = df.dropna(how="all")  # Drop rows that are completely NaN
df = df.dropna(how="all", axis="columns")  # Drop columns that are completely NaN
df = df.fillna("")  # Replace the remaining NaN values with empty strings (cosmetic only)

# Export the table to HTML, then render that HTML to PDF
df.to_html("htmlFile.html", border=0)
with open("htmlFile.html", "r") as htmlFile:
    with open("pdfFile.pdf", "w+b") as resultFile:
        pisaStatus = pisa.CreatePDF(htmlFile, dest=resultFile)
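A different route that might keep the layout is to let LibreOffice do the conversion in headless mode, since it runs on Linux and needs no Microsoft Office. The sketch below is only an illustration under assumptions: it assumes a LibreOffice binary (`soffice`) is available in the runtime, for example packaged into the Lambda deployment, which is something I have not set up here. The helper names `build_convert_command`, `expected_pdf_path` and `convert_to_pdf` are made up for this example.

```python
import os
import shutil
import subprocess


def build_convert_command(source_path, out_dir, soffice="soffice"):
    """Build the LibreOffice command line that converts one file to PDF."""
    return [
        soffice,
        "--headless",            # run without a GUI
        "--convert-to", "pdf",   # target format
        "--outdir", out_dir,     # directory where the PDF is written
        source_path,
    ]


def expected_pdf_path(source_path, out_dir):
    """LibreOffice keeps the base name and swaps the extension for .pdf."""
    base = os.path.splitext(os.path.basename(source_path))[0]
    return os.path.join(out_dir, base + ".pdf")


def convert_to_pdf(source_path, out_dir, soffice="soffice", timeout=120):
    """Convert source_path to PDF and return the path of the result.

    Assumes `soffice` is installed and reachable; raises FileNotFoundError
    otherwise, and CalledProcessError if the conversion itself fails.
    """
    if shutil.which(soffice) is None:
        raise FileNotFoundError("LibreOffice binary not found: " + soffice)
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(
        build_convert_command(source_path, out_dir, soffice),
        check=True,
        timeout=timeout,
    )
    return expected_pdf_path(source_path, out_dir)
```

Usage would look like `convert_to_pdf("/tmp/myExcelFile.xlsx", "/tmp/out")`. On Lambda only `/tmp` is writable, so both the input copy and the output directory would have to live there.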