
How to install textract on Anaconda using Python 3 (OS X)?

I am trying to convert a PDF file into text that Python 3 can read, so that I can find the most common words in the file for a word cloud.

I have already tried using pip install textract and received the same error message shown below. I am now trying conda install and am still receiving the same error.

! pip install PyPDF2 # convert text-based PDF file to text readable by python
! conda config --add channels conda-forge
! conda install textract # convert non-trivial, scanned PDF file into text readable by python
! pip install nltk # clean and convert phrases into keywords
! pip install regex # find keywords

import PyPDF2
import textract
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

print("Libraries have been imported.")
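Once the PDF text has been extracted (with PyPDF2, or with textract, whose process() returns bytes that must be decoded first), the word-frequency step for the word cloud could be sketched roughly as follows. This is an illustration under assumptions: it takes the extracted text as a Python string, swaps nltk's word_tokenize for a simple regular expression, and uses a small hypothetical STOPWORDS set instead of nltk's stopword list.

```python
import re
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical, deliberately tiny stopword set for illustration only;
# nltk's stopwords.words("english") (after nltk.download("stopwords"))
# is a much fuller list.
STOPWORDS: Set[str] = {"a", "an", "and", "for", "in", "is", "it", "of", "on", "the", "to"}


def most_common_words(text: str, n: int = 10, stopwords: Set[str] = STOPWORDS) -> List[Tuple[str, int]]:
    """Return the n most frequent non-stopword words in text with their counts."""
    # Lowercase, then treat runs of letters (allowing one inner apostrophe,
    # as in "don't") as words; this stands in for nltk's word_tokenize.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(word for word in words if word not in stopwords)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "The cat and the hat. The cat sat."
    print(most_common_words(sample, 2))
```

The (word, count) pairs can be turned into a dict with dict(...), which is the shape most word-cloud tools expect for frequency input.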

The error I am receiving is: "ModuleNotFoundError: No module named 'textract'".
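A frequent cause of this error in Jupyter is that !pip install or !conda install runs against a different Python environment than the one the notebook kernel uses. The sketch below (an illustration, not part of the original post; module_available is a hypothetical helper name) prints the interpreter the code is actually running under and checks whether textract is importable from it. If it is missing, running import sys and then !{sys.executable} -m pip install textract in a notebook cell installs into that exact interpreter.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if `name` can be imported by the running interpreter."""
    # find_spec searches this interpreter's sys.path for the module
    # without actually importing it.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # pip/conda must install into the environment that owns this executable.
    print("Interpreter:", sys.executable)
    print("textract importable:", module_available("textract"))
```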

This might work as a workaround:

1. Uninstall Anaconda and reinstall it.

2. Do not create any Python 2.7 environment in Anaconda. Instead, reinstall textract, along with all of its other dependencies, using pip from the base Anaconda command prompt.

3. Try importing textract.

Or:

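1. Open a terminal and run the following. Note that these commands use apt, so they apply to Debian/Ubuntu Linux; apt is not available on OS X.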

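python3 -m venv env        # create a Python 3 virtual environment in ./env
source ./env/bin/activate  # activate it so pip installs into ./env
sudo apt update
sudo apt install python3-pip && pip install --upgrade pip
# system libraries and command-line tools that textract relies on for various file formats
sudo apt install python3-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils pstotext tesseract-ocr flac ffmpeg lame libmad0 libsox-fmt-mp3 sox libjpeg-dev swig
pip install textract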
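After activating the environment, one way to confirm that python now refers to the venv (so that pip install textract lands there) is to compare sys.prefix with sys.base_prefix. This is a minimal sketch with a hypothetical in_virtualenv helper; it applies to environments created with python -m venv. A conda environment is a separate installation rather than a venv, so this check reports False inside one.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside an environment created by `python -m venv`."""
    # venv points sys.prefix at the environment directory, while
    # sys.base_prefix keeps pointing at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtualenv())
    print("Environment prefix:", sys.prefix)
```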

If you still run into errors:

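# install SpeechRecognition 3.6.3 (a textract dependency) directly from its wheel file
pip install https://pypi.python.org/packages/ce/c7/ab6cd0d00ddf8dc3b537cfb922f3f049f8018f38c88d71fd164f3acb8416/SpeechRecognition-3.6.3-py2.py3-none-any.whl
# PulseAudio development headers
sudo apt install libpulse-dev
pip install textract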

Then try importing textract again.
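Before retrying the import, it can help to check what pip actually installed into the current environment. A minimal sketch reading package metadata with importlib.metadata (Python 3.8 or later); installed_version is a hypothetical helper name, not part of textract.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        # Looks up package metadata in the running interpreter's environment.
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    for dist in ("SpeechRecognition", "textract"):
        print(dist, "->", installed_version(dist) or "not installed")
```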

See here.

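Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.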

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM