How do I extract text from text boxes in a PDF using Python?

Question

I'm not having any luck with PyPDF2 or PDFMiner. The tools always return _______________ for the text boxes, even when they are filled in. Does anyone have any idea how to extract the text within the text box fields?

Answer 1

You need to extract the form's text fields, not the page text. So you need something like this:

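from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdftypes import resolve1

# Open the PDF in binary mode; the context manager closes the file afterwards.
with open("c:\\tmp\\test.pdf", "rb") as fp:
    parser = PDFParser(fp)
    doc = PDFDocument(parser)
    # The AcroForm dictionary in the document catalog lists the form fields.
    # "Fields" may itself be an indirect reference, so resolve it as well.
    fields = resolve1(resolve1(doc.catalog["AcroForm"])["Fields"])
    for i in fields:
        field = resolve1(i)
        # "T" is the field name and "V" is its value; either may be missing.
        name, value = field.get("T"), resolve1(field.get("V"))
        print("{0}:{1}".format(name, value))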
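With pdfminer, the name and value of a field may come back as byte strings, so the printed output can look like b'Name':b'Alice'. The sketch below shows one way to turn such values into ordinary strings. The helper name decode_pdf_text is hypothetical (it is not part of pdfminer), and treating non-UTF-16 bytes as Latin-1 is an assumption that only approximates the PDF's own encoding.

```python
import codecs


def decode_pdf_text(value):
    """Return a field name or value as str (hypothetical helper, not a pdfminer API)."""
    if value is None:
        # The field has no "T" or "V" entry.
        return ""
    if isinstance(value, bytes):
        if value.startswith(codecs.BOM_UTF16_BE):
            # Text strings that begin with the UTF-16BE byte order mark.
            return value[2:].decode("utf-16-be", errors="replace")
        # Assumption: Latin-1 is close enough to PDFDocEncoding for common characters.
        return value.decode("latin-1")
    # Anything else (numbers, name objects) falls back to its string form.
    return str(value)


print(decode_pdf_text(b"Name"))
print(decode_pdf_text(codecs.BOM_UTF16_BE + "Zo".encode("utf-16-be")))
```

In the loop above, the print call could then be written as print("{0}:{1}".format(decode_pdf_text(name), decode_pdf_text(value))).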

Answer 1 posted 2018-05-25 08:10:09
