How can I extract text from textboxes within a PDF in Python?
I'm not having any luck with PyPDF2 or PDFMiner. The tools always return _______________ for the textboxes, even if they are filled in. Does anyone have any idea how to extract the text within the textbox fields?
You need to extract the form fields, not the page text. So you need something like this:
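```python
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdftypes import resolve1

# Open the PDF in binary mode (the path is only an example).
with open("c:\\tmp\\test.pdf", "rb") as fp:
    parser = PDFParser(fp)
    doc = PDFDocument(parser)
    # The AcroForm dictionary in the document catalog lists the form fields.
    fields = resolve1(doc.catalog["AcroForm"])["Fields"]
    for i in fields:
        field = resolve1(i)
        # "T" is the field name, "V" is the value typed into the field.
        name, value = field.get("T"), resolve1(field.get("V"))
        print("{0}:{1}".format(name, value))
```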
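The loop above only visits top-level fields, and pdfminer.six typically hands back names and values as raw PDF objects (byte strings such as b'name', or name objects for checkbox states). Fields can also be nested under a Kids array. The following is a minimal sketch, not part of the original answer, that walks the field tree and decodes the strings. `decode_pdf_string` and `iter_fields` are hypothetical helper names, and Latin-1 is assumed as an approximation of PDFDocEncoding. It runs on plain dictionaries so it can be tried without a PDF.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple


def decode_pdf_string(raw: Any) -> Any:
    """Turn a raw PDF string into str (hypothetical helper).

    PDF text strings are UTF-16BE with a BOM, or PDFDocEncoding otherwise;
    Latin-1 is used here only as an approximation of PDFDocEncoding.
    """
    if isinstance(raw, bytes):
        if raw.startswith(b"\xfe\xff"):
            return raw[2:].decode("utf-16-be")
        return raw.decode("latin-1")
    # Name objects (e.g. a checkbox state /Yes) are assumed to expose .name.
    name = getattr(raw, "name", None)
    return name if name is not None else raw


def iter_fields(fields: Iterable[Any],
                resolve: Callable[[Any], Any] = lambda obj: obj,
                parent: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (fully qualified name, value) for every terminal form field."""
    for ref in fields:
        field: Dict[str, Any] = resolve(ref)
        partial = decode_pdf_string(field.get("T"))
        # Fully qualified names join partial names with dots, e.g. "address.city".
        if partial is None:
            name = parent
        else:
            name = f"{parent}.{partial}" if parent else partial
        kids = [resolve(k) for k in (resolve(field.get("Kids")) or [])]
        # Kids without a "T" entry are widget annotations, not sub-fields,
        # so the value still lives on the current field.
        subfields = [k for k in kids if "T" in k]
        if subfields:
            yield from iter_fields(subfields, resolve, name)
        else:
            yield name, decode_pdf_string(resolve(field.get("V")))


if __name__ == "__main__":
    # Plain dictionaries standing in for resolved PDF field dictionaries.
    sample = [
        {"T": b"name", "V": b"Alice"},
        {"T": b"address",
         "Kids": [{"T": b"city", "V": b"\xfe\xff\x00O\x00s\x00l\x00o"}]},
        {"T": b"agree", "V": None, "Kids": [{"Rect": [0, 0, 10, 10]}]},
    ]
    for field_name, field_value in iter_fields(sample):
        print(f"{field_name}: {field_value}")
```

With pdfminer.six, the same function should work on a real document by passing `resolve1` as the `resolve` argument and the Fields array from the snippet above as `fields`, since `resolve1` leaves already-resolved objects unchanged.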