
Python: How to replace text in a PDF

I have a PDF file, and I want to replace some text in it and generate a new PDF. How can I do that in Python? I have tried reportlab, but it does not have any function to search for text and replace it. What other module can I use?

You can try Aspose.PDF Cloud SDK for Python. Aspose.PDF Cloud is a REST API for PDF processing. It is a paid API, and its free plan provides 50 credits per month.

I'm a developer evangelist at Aspose.

import asposepdfcloud
from asposepdfcloud.apis.pdf_api import PdfApi

# Get App key and App SID from https://cloud.aspose.com
pdf_api_client = asposepdfcloud.api_client.ApiClient(
    app_key='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
    app_sid='xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxxx')

pdf_api = PdfApi(pdf_api_client)
filename = '02_pages.pdf'
remote_name = '02_pages.pdf'
copied_file = '02_pages_new.pdf'

# Upload the PDF file to storage
pdf_api.upload_file(remote_name, filename)

# Copy the uploaded file so the original stays untouched
pdf_api.copy_file(remote_name, copied_file)

# Replace text in the copy
text_replace = asposepdfcloud.models.TextReplace(old_value='origami', new_value='polygami', regex='true')
text_replace_list = asposepdfcloud.models.TextReplaceListRequest(text_replaces=[text_replace])

response = pdf_api.post_document_text_replace(copied_file, text_replace_list)
print(response)

Have a look at this thread for one of the many ways to read text from a PDF. After that you'll need to create a new PDF, because, as far as I know, those methods do not retrieve any formatting for you.

The CAM::PDF Perl library can output text that's not too hard to parse (although it seems to split lines of text fairly randomly). I couldn't be bothered to learn too much Perl, so I wrote two really basic Perl command-line scripts. The first reads a single-page PDF into a text file:

perl read.pl pdfIn.pdf textOut.txt

The second writes the text (which you can modify in the meantime) back to a PDF:

perl write.pl pdfIn.pdf textIn.txt pdfOut.pdf

#!/usr/bin/perl
use Module::Load;
load "CAM::PDF";

$pdfIn = $ARGV[0];
$textOut = $ARGV[1];

$pdf = CAM::PDF->new($pdfIn);
$page = $pdf->getPageContent(1);

open(my $fh, '>', $textOut) or die "cannot open file $textOut: $!";
print $fh $page;
close $fh;

exit;

and

#!/usr/bin/perl
use Module::Load;
load "CAM::PDF";

$pdfIn = $ARGV[0];
$textIn = $ARGV[1];
$pdfOut = $ARGV[2];

$pdf = CAM::PDF->new($pdfIn);

my $page;
open(my $fh, '<', $textIn) or die "cannot open file $textIn: $!";
{
    # Slurp the whole file at once
    local $/;
    $page = <$fh>;
}
close($fh);

$pdf->setPageContent(1, $page);

$pdf->cleanoutput($pdfOut);

exit;

You can call these from Python, before and after doing some regex (or similar) processing on the output text file.
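As a rough sketch of that round trip, something like the following could work. It assumes read.pl and write.pl are in the current directory and perl is on the PATH, and that the dumped file is the page's raw content stream with the target text stored contiguously (for example as (origami) Tj). Text that is split across several operators will not be matched. The function names and the page.txt work file are made up for illustration.

```python
import re
import subprocess
from pathlib import Path


def replace_in_content(content: bytes, old: str, new: str) -> bytes:
    """Replace every literal occurrence of `old` with `new` in dumped page content."""
    # Work on bytes so any non-text data in the dump round-trips unchanged.
    pattern = re.escape(old.encode("latin-1"))
    # A function replacement keeps backslashes in `new` literal.
    return re.sub(pattern, lambda _match: new.encode("latin-1"), content)


def replace_text_in_pdf(pdf_in, pdf_out, old, new, work_file="page.txt"):
    """Dump page 1 with read.pl, edit the dump, and write it back with write.pl."""
    # Assumption: read.pl and write.pl are the two Perl scripts shown above.
    subprocess.run(["perl", "read.pl", str(pdf_in), work_file], check=True)
    path = Path(work_file)
    path.write_bytes(replace_in_content(path.read_bytes(), old, new))
    subprocess.run(
        ["perl", "write.pl", str(pdf_in), work_file, str(pdf_out)], check=True
    )
```

Calling replace_text_in_pdf("pdfIn.pdf", "pdfOut.pdf", "origami", "polygami") would then run both scripts with the replacement in between. check=True makes a failing Perl script raise an exception instead of silently producing a broken PDF.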

If you're completely new to Perl (like I was), you need to make sure that Perl and CPAN are installed, then run sudo cpan, and at the prompt enter install "CAM::PDF"; which will install the required modules.

Also, I realise that I should probably be using stdout etc., but I was in a hurry :-)

Also, any idea what format CAM::PDF outputs? Is there any documentation for it?


 