Using pdf2image in a Code Repository on Palantir Foundry
I am trying to use the pdf2image library in a Code Repository on Palantir Foundry. When I call the function convert_from_bytes, I get the following error:

pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
Does anyone know how to reference the poppler path and get rid of this error?
Thanks!
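Here is the code:

```python
import pytesseract
from pdf2image import convert_from_bytes


def extract_pdf_text(input_bytes, language='eng', dpi=200):
    # Render each PDF page to a PIL image (pdf2image calls poppler's binaries for this)
    pages = convert_from_bytes(input_bytes, dpi=dpi)
    # OCR each page image with Tesseract and concatenate the text of all pages
    return ''.join(pytesseract.image_to_string(page, lang=language) for page in pages)
```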
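pdf2image does not bundle poppler: it shells out to poppler's `pdfinfo` to get the page count (which is where this error is raised) and to `pdftoppm` to render pages. `convert_from_bytes` accepts a `poppler_path` argument naming the directory that holds those binaries; the default `None` means "search `PATH`". A minimal sketch of locating that directory, assuming poppler was installed into the same conda environment as Python so that its binaries sit under `sys.prefix`/bin on Linux. `find_poppler_path` is a hypothetical helper, not part of pdf2image, and it cannot help if poppler is not installed at all.

```python
import os
import shutil
import sys
from typing import Iterable, Optional


def find_poppler_path(extra_dirs: Iterable[str] = ()) -> Optional[str]:
    """Return a directory containing poppler's `pdfinfo`, or None if none is found.

    Hypothetical helper (not part of pdf2image). Search order:
    1. the current PATH,
    2. the active environment's bin directory (assumed location of a
       conda-installed poppler on Linux),
    3. any extra candidate directories supplied by the caller.
    """
    on_path = shutil.which("pdfinfo")
    if on_path:
        return os.path.dirname(on_path)
    for directory in [os.path.join(sys.prefix, "bin"), *extra_dirs]:
        # shutil.which with an explicit path only searches that directory
        found = shutil.which("pdfinfo", path=directory)
        if found:
            return os.path.dirname(found)
    # Nothing found: poppler is probably missing from this environment entirely.
    return None
```

The result can be passed straight through, e.g. `convert_from_bytes(input_bytes, dpi=dpi, poppler_path=find_poppler_path())`. When nothing is found the helper returns `None`, so pdf2image falls back to its normal `PATH` lookup and behaves exactly as before.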
And here is the meta.yaml for reference:

```yaml
# If you need to modify the runtime requirements for your package,
# update the 'requirements.run' section in this file
package:
  name: "{{ PACKAGE_NAME }}"
  version: "{{ PACKAGE_VERSION }}"
source:
  path: ../src
requirements:
  # Tools required to build the package. These packages are run on the build system and include
  # things such as revision control systems (Git, SVN) make tools (GNU make, Autotool, CMake) and
  # compilers (real cross, pseudo-cross, or native when not cross-compiling), and any source pre-processors.
  # https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#build
  build:
    - python 3.8.*
    - setuptools
  # Packages required to run the package. These are the dependencies that are installed automatically
  # whenever the package is installed.
  # https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#run
  run:
    - python 3.8.*
    - transforms {{ PYTHON_TRANSFORMS_VERSION }}
    - transforms-expectations
    - transforms-verbs
    - pytesseract
    - pdfplumber
    - googletrans
    - regex
    - pdf2image
    - langdetect
    - pandas
    - numpy
    - selenium
    - requests
    - pypdf2
    - poppler
build:
  script: python setup.py install --single-version-externally-managed --record=record.txt
```
I found the problem when inspecting the CI checks: they had been failing before poppler was pulled into the environment. After I cleaned up meta.yaml and the checks succeeded, everything seems to work fine.
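Because the root cause here was an environment that never received poppler (the CI checks failed before it was pulled), a fail-fast check at the top of the transform can turn the opaque `PDFInfoNotInstalledError` into a message that points at meta.yaml. A minimal sketch using only the standard library; `missing_poppler_tools` and `assert_poppler_available` are hypothetical helpers, and the tool list assumes pdf2image's defaults (`pdfinfo` for the page count, `pdftoppm` for rendering, with `pdftocairo` used instead only when `use_pdftocairo=True` is passed).

```python
import shutil
from typing import Iterable, List, Optional

# Executables pdf2image shells out to with its default settings (assumption:
# add "pdftocairo" here if you call convert_from_bytes(..., use_pdftocairo=True)).
REQUIRED_POPPLER_TOOLS = ("pdfinfo", "pdftoppm")


def missing_poppler_tools(path: Optional[str] = None,
                          tools: Iterable[str] = REQUIRED_POPPLER_TOOLS) -> List[str]:
    """Return the poppler executables that cannot be found.

    `path` is handed to shutil.which unchanged; None means "search PATH".
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


def assert_poppler_available(path: Optional[str] = None) -> None:
    """Raise a descriptive RuntimeError if any required poppler tool is missing."""
    missing = missing_poppler_tools(path)
    if missing:
        raise RuntimeError(
            "poppler executables not found: " + ", ".join(missing)
            + ". Check that 'poppler' is listed under requirements.run in meta.yaml"
            " and that the CI checks passed so the environment was rebuilt."
        )
```

Calling `assert_poppler_available()` just before `convert_from_bytes` costs only a couple of `PATH` lookups and names exactly which executables are absent.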