Is it possible to generate a PDF from a dataset and save it to Foundry?

Question

FPDF is a library that can be used to turn a pandas DataFrame into a nicely formatted PDF report. Is there a feature in Foundry Code Repositories or Code Workbook for writing PDF files into Foundry from a Spark or pandas DataFrame?

I have a requirement to create a nicely formatted PDF report from a Foundry dataset filtered down to a few rows.

Answer 1

While I'm not familiar with the FPDF library specifically, Foundry supports generating files from datasets in transforms or Code Workbooks.

To create a single pandas-based PDF from your dataset, convert the dataset to pandas and get an output file handle from Foundry. In Code Workbooks, for example:

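def pdf_dataset(input_df):
    output = Transforms.get_output()
    pandas_df = input_df.toPandas()  # collects every row to the driver, so filter first
    output_fs = output.filesystem()
    output_file_path = "report.pdf"  # hypothetical file name; choose your own
    with output_fs.open(output_file_path, "wb") as f:
        # build the PDF from pandas_df with FPDF and write its bytes to f
        pass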
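
As a rough illustration of what happens inside the `with` block above: the report has to end up as bytes written to the binary handle `f`. The sketch below is not FPDF and not a Foundry API. It hand-builds a minimal one-page PDF using only the Python standard library so that it runs anywhere, and it uses `io.BytesIO` as a stand-in for the Foundry handle. The names `format_rows` and `build_pdf` and the sample rows are made up for this example; with FPDF you would replace `build_pdf` with that library's own calls.

```python
import io


def format_rows(rows, columns, width=14):
    """Lay rows out as fixed-width text lines: header, rule, one line per row."""
    lines = ["".join(str(c).ljust(width) for c in columns)]
    lines.append("-" * (width * len(columns)))
    for row in rows:
        lines.append("".join(str(row[c]).ljust(width) for c in columns))
    return lines


def _escape(text):
    # Backslashes and parentheses are special inside PDF string literals.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_pdf(lines):
    """Return the bytes of a one-page A4 PDF showing `lines` in Courier."""
    ops = ["BT", "/F1 11 Tf", "14 TL", "72 770 Td"]
    ops += ["(%s) Tj T*" % _escape(line) for line in lines]
    ops.append("ET")
    # Characters outside Latin-1 are replaced with "?" in this sketch.
    stream = "\n".join(ops).encode("latin-1", "replace")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Courier >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, needed by xref
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


# Made-up sample data standing in for the filtered dataset.
rows = [{"name": "alpha", "total": 10}, {"name": "beta (x)", "total": 25}]
buffer = io.BytesIO()  # stand-in for the handle from output_fs.open(..., "wb")
buffer.write(build_pdf(format_rows(rows, ["name", "total"])))
pdf_bytes = buffer.getvalue()
```

In a transform or workbook, the same `write` call would target the handle returned by `output_fs.open(...)` instead of the in-memory buffer.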

Alternatively, you can create one PDF per row in parallel via Spark. This is easiest if you structure your data so that the parameters needed to generate each PDF are colocated in a single row; you can then run a Python function on each row to generate the PDF and write it out of Python memory to the destination dataset.

In a Code Workbook this would resemble:

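def pdf_dataset(input_df):
    output = Transforms.get_output()

    def generate_pdf(row):
        output_fs = output.filesystem()
        # "id" is a hypothetical unique column; each row needs its own file path
        output_file_path = "report_{}.pdf".format(row["id"])
        with output_fs.open(output_file_path, "wb") as f:
            # build the PDF for this row with FPDF and write its bytes to f
            pass

    input_df.rdd.foreach(generate_pdf)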
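
To make the per-row pattern concrete, here is a small local sketch, assuming nothing about Foundry beyond what is shown above. `LocalOutputFS` is a hypothetical stand-in for `output.filesystem()` that writes to a temporary directory, `placeholder_render` stands in for the FPDF code and does not produce a real PDF, and the `id` column is an assumed unique key. The part worth copying is that each row derives its own file path, so parallel tasks never write to the same file.

```python
import os
import tempfile


class LocalOutputFS:
    """Hypothetical local stand-in for the object from output.filesystem()."""

    def __init__(self, root):
        self.root = root

    def open(self, path, mode="wb"):
        full_path = os.path.join(self.root, path)
        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        return open(full_path, mode)


def placeholder_render(row):
    """Placeholder: a real job would build the PDF with FPDF and return bytes."""
    return ("report for %s: total=%s" % (row["id"], row["total"])).encode("utf-8")


def make_writer(output_fs, render, key="id"):
    """Return a function that renders one row and writes it to its own file."""

    def write_one(row):
        path = "reports/%s.pdf" % row[key]  # unique per row, so no overwrites
        with output_fs.open(path, "wb") as f:
            f.write(render(row))
        return path

    return write_one


# Made-up rows; each one carries everything needed for its own report.
rows = [{"id": "a1", "total": 10}, {"id": "b2", "total": 25}]
root = tempfile.mkdtemp()
write_one = make_writer(LocalOutputFS(root), placeholder_render)
written = [write_one(row) for row in rows]
```

With Spark, `write_one` would be handed to `input_df.rdd.foreach` in place of the list comprehension, as in the snippet above.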

Answer 1
0 2022-01-12 19:10:51
