I have been trying topic modelling using gensim in Python. I have the following dataset:
Docs
"Sugar is bad to consume. My sister likes to have sugar, but not my father."
"My father spends a lot of time driving my sister around to dance practice."
"Doctors suggest that driving may cause increased stress and blood pressure."
"Sometimes I feel pressure to perform well at school, but my father never seems to drive my sister to do better."
"Health experts say that Sugar is not good for your lifestyle."
I tried to lemmatise it as follows:
import gensim

texts = list(map(gensim.utils.lemmatize, Docs))  # list() so texts can be iterated more than once
and run LDA:
dictionary = gensim.corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]
Lda = gensim.models.ldamodel.LdaModel
ldamodel = Lda(corpus, num_topics=3, id2word=dictionary, passes=50)
ldamodel.print_topics()
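For context on what this pipeline builds: gensim's Dictionary assigns an integer id to each distinct token, and doc2bow turns a tokenised document into a sparse list of (token_id, count) pairs, which is the corpus format LdaModel consumes. The pure-Python sketch below only mimics that idea for illustration. The helpers build_vocab and to_bow are hypothetical, the tokens are hand-picked from the first two sentences, and gensim's actual id ordering and pruning behaviour differ.

```python
from collections import Counter


def build_vocab(texts):
    """Map each distinct token to an integer id, in first-seen order."""
    token2id = {}
    for tokens in texts:
        for token in tokens:
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id


def to_bow(tokens, token2id):
    """Return sorted (token_id, count) pairs, skipping unknown tokens."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


# Hand-tokenised sample (hypothetical preprocessing of the first two docs).
texts = [
    ["sugar", "bad", "consume", "sister", "sugar", "father"],
    ["father", "driving", "sister", "dance", "practice"],
]
token2id = build_vocab(texts)
corpus = [to_bow(doc, token2id) for doc in texts]
print(corpus[0])  # [(0, 2), (1, 1), (2, 1), (3, 1), (4, 1)]
```

"sugar" appears twice in the first document, so it gets a count of 2; tokens not in the vocabulary are simply dropped, which is also how doc2bow treats unknown words when it is not allowed to update the dictionary.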
However, I am getting an error. Do you know how to fix it?
Thanks
Error:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-15-b36df3b5374b> in <module>
----> 1 import pattern
2
3 dictionary = gensim.corpora.Dictionary(Docs)
4 corpus = [dictionary.doc2bow(doc) for doc in Docs]
5 Lda = gensim.models.ldamodel.LdaModel
ModuleNotFoundError: No module named 'pattern'
The whole error message:
---> 3 dictionary = gensim.corpora.Dictionary(Docs)
4 corpus = [dictionary.doc2bow(doc) for doc in Docs]
5 Lda = gensim.models.ldamodel.LdaModel
/anaconda3/lib/python3.7/site-packages/gensim/corpora/dictionary.py in __init__(self, documents, prune_at)
82
83 if documents is not None:
---> 84 self.add_documents(documents, prune_at=prune_at)
85
86 def __getitem__(self, tokenid):
/anaconda3/lib/python3.7/site-packages/gensim/corpora/dictionary.py in add_documents(self, documents, prune_at)
195
196 """
--> 197 for docno, document in enumerate(documents):
198 # log progress & run a regular check for pruning, once every 10k docs
199 if docno % 10000 == 0:
/anaconda3/lib/python3.7/site-packages/gensim/utils.py in lemmatize(content, allowed_tags, light, stopwords, min_length, max_length)
1676 if not has_pattern():
1677 raise ImportError(
-> 1678 "Pattern library is not installed. Pattern library is needed in order to use lemmatize function"
1679 )
1680 from pattern.en import parse
ImportError: Pattern library is not installed. Pattern library is needed in order to use lemmatize function
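The traceback shows lemmatize being called from inside Dictionary.add_documents rather than on the map(...) line. That is because map() in Python 3 is lazy: the mapped function only runs when something iterates over the result, and the result can only be iterated once. If texts is a bare map object, Dictionary(texts) consumes it and the later doc2bow list comprehension sees no documents. The sketch below demonstrates both effects with plain Python; fake_lemmatize and failing_lemmatize are hypothetical stand-ins, not gensim functions.

```python
def fake_lemmatize(doc):
    """Hypothetical stand-in for gensim.utils.lemmatize: lowercase and split."""
    return doc.lower().split()


Docs = [
    "Sugar is bad to consume.",
    "My father spends a lot of time driving my sister around to dance practice.",
]

# map() is lazy: fake_lemmatize has not been called for any document yet.
texts = map(fake_lemmatize, Docs)
first_pass = [doc for doc in texts]   # like Dictionary(texts) iterating the docs
second_pass = [doc for doc in texts]  # like the later doc2bow list comprehension
print(len(first_pass), len(second_pass))  # 2 0


# Errors inside the mapped function also only appear on iteration,
# which is why the ImportError surfaced inside Dictionary.add_documents.
def failing_lemmatize(doc):
    raise ImportError("Pattern library is not installed.")


lazy = map(failing_lemmatize, Docs)  # no exception raised here
raised = False
try:
    next(lazy)
except ImportError:
    raised = True
print(raised)  # True

# Materialising the results once lets you reuse them for every pass.
texts = list(map(fake_lemmatize, Docs))
print(len(texts), len([doc for doc in texts]))  # 2 2
```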
Try installing the pattern package; gensim.utils.lemmatize needs it to be importable:
pip install pattern
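Since the error comes from an IPython/Jupyter session (note the ipython-input frame), it is also worth making sure pip installs into the same interpreter the kernel is running; otherwise the install can succeed and the import still fail. A minimal sketch, assuming you want to drive pip from Python itself: the helper pip_install_cmd is hypothetical, and it only builds the command, with the actual install line left commented out.

```python
import subprocess
import sys


def pip_install_cmd(package):
    """Build a pip command bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]


cmd = pip_install_cmd("pattern")
print(" ".join(cmd))  # shows which python the package would be installed for

# Uncomment to actually install into this interpreter's environment:
# subprocess.check_call(cmd)
```

Recent IPython versions also provide a %pip magic that targets the running kernel's environment for the same reason.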
Gensim's utils.py checks for it with this function:
def has_pattern():
    """Check whether the `pattern <https://github.com/clips/pattern>`_ package is installed.

    Returns
    -------
    bool
        Is `pattern` installed?

    """
    try:
        from pattern.en import parse  # noqa:F401
        return True
    except ImportError:
        return False
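If you would rather not depend on pattern at all, the same try/except idea can guard the call and fall back to simpler preprocessing. This sketch assumes that lowercase tokenisation with a tiny hand-written stopword set is an acceptable substitute, which in general it is not (no lemmas, no POS tags). The names has_module, simple_tokens, preprocess and STOPWORDS are hypothetical. Note also that gensim.utils.lemmatize was removed in gensim 4.0, so a fallback like this matters after upgrading too.

```python
import importlib
import re

# Deliberately tiny, hypothetical stopword set for illustration only.
STOPWORDS = {"a", "and", "but", "is", "my", "not", "of", "to", "the"}


def has_module(name):
    """Return True if `name` imports cleanly, mirroring gensim's has_pattern()."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


def simple_tokens(doc):
    """Fallback preprocessing: lowercase alphabetic tokens minus stopwords."""
    return [t for t in re.findall(r"[a-z]+", doc.lower()) if t not in STOPWORDS]


def preprocess(doc):
    """Use gensim's lemmatize when pattern is available, else the fallback.

    The two branches do not produce the same token format: lemmatize
    emits POS-tagged tokens (word/TAG style), the fallback plain words.
    """
    if has_module("pattern.en"):
        from gensim.utils import lemmatize  # gensim 3.x only
        return lemmatize(doc)
    return simple_tokens(doc)


print(simple_tokens("Sugar is bad to consume. My sister likes to have sugar, but not my father."))
# ['sugar', 'bad', 'consume', 'sister', 'likes', 'have', 'sugar', 'father']
```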
I also noticed that pip install gensim neither installs nor checks for pattern (it is not a required dependency), which is easy to miss. The install log makes no mention of it:
Collecting gensim
Using cached https://files.pythonhosted.org/packages/70/cf/87b25b265d23498b2b70ce873495cf7ef91394c4baff240210e26f3bc18a/gensim-3.8.3-cp37-cp37m-macosx_10_9_x86_64.whl
Requirement already satisfied: numpy>=1.11.3 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from gensim) (1.17.2)
Requirement already satisfied: scipy>=0.18.1 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from gensim) (1.3.1)
Requirement already satisfied: six>=1.5.0 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from gensim) (1.12.0)
Collecting smart-open>=1.8.1 (from gensim)
Collecting boto3 (from smart-open>=1.8.1->gensim)
Using cached https://files.pythonhosted.org/packages/c4/24/b9facc760789cf844880c178b64d26d9f4a0ef06af3e99473f38fba94657/boto3-1.14.56-py2.py3-none-any.whl
Requirement already satisfied: requests in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from smart-open>=1.8.1->gensim) (2.22.0)
Requirement already satisfied: boto in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from smart-open>=1.8.1->gensim) (2.49.0)
Collecting jmespath<1.0.0,>=0.7.1 (from boto3->smart-open>=1.8.1->gensim)
Using cached https://files.pythonhosted.org/packages/07/cb/5f001272b6faeb23c1c9e0acc04d48eaaf5c862c17709d20e3469c6e0139/jmespath-0.10.0-py2.py3-none-any.whl
Collecting s3transfer<0.4.0,>=0.3.0 (from boto3->smart-open>=1.8.1->gensim)
Using cached https://files.pythonhosted.org/packages/69/79/e6afb3d8b0b4e96cefbdc690f741d7dd24547ff1f94240c997a26fa908d3/s3transfer-0.3.3-py2.py3-none-any.whl
Collecting botocore<1.18.0,>=1.17.56 (from boto3->smart-open>=1.8.1->gensim)
Using cached https://files.pythonhosted.org/packages/b1/82/499909b818bddde1a4fc1228389d9d29cc2ede766a2a7370aed033dd07f9/botocore-1.17.56-py2.py3-none-any.whl
Requirement already satisfied: certifi>=2017.4.17 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from requests->smart-open>=1.8.1->gensim) (2019.9.11)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from requests->smart-open>=1.8.1->gensim) (1.24.2)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from requests->smart-open>=1.8.1->gensim) (3.0.4)
Requirement already satisfied: idna<2.9,>=2.5 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from requests->smart-open>=1.8.1->gensim) (2.8)
Requirement already satisfied: docutils<0.16,>=0.10 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from botocore<1.18.0,>=1.17.56->boto3->smart-open>=1.8.1->gensim) (0.15.2)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from botocore<1.18.0,>=1.17.56->boto3->smart-open>=1.8.1->gensim) (2.8.0)
Installing collected packages: jmespath, botocore, s3transfer, boto3, smart-open, gensim
Successfully installed boto3-1.14.56 botocore-1.17.56 gensim-3.8.3 jmespath-0.10.0 s3transfer-0.3.3 smart-open-2.1.1