[英]Topic modelling with gensim
我一直在嘗試使用 Python 中的 gensim 進行主題建模。 我有以下數據集:
文檔
"Sugar is bad to consume. My sister likes to have sugar, but not my father."
"My father spends a lot of time driving my sister around to dance practice."
"Doctors suggest that driving may cause increased stress and blood pressure."
"Sometimes I feel pressure to perform well at school, but my father never seems to drive my sister to do better."
"Health experts say that Sugar is not good for your lifestyle."
我嘗試將其詞形還原如下:
texts = map(gensim.utils.lemmatize,Docs)
並運行 LDA:
dictionary = gensim.corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]
Lda = gensim.models.ldamodel.LdaModel
ldamodel = Lda(corpus, num_topics=3, id2word = dictionary, passes=50)
ldamodel.print_topics()
但是我收到一個錯誤。 你知道如何解決嗎?
謝謝
錯誤:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-15-b36df3b5374b> in <module>
----> 1 import pattern
2
3 dictionary = gensim.corpora.Dictionary(Docs)
4 corpus = [dictionary.doc2bow(doc) for doc in Docs]
5 Lda = gensim.models.ldamodel.LdaModel
ModuleNotFoundError: No module named 'pattern'
整個錯誤信息:
---> 3 dictionary = gensim.corpora.Dictionary(Docs)
4 corpus = [dictionary.doc2bow(doc) for doc in Docs]
5 Lda = gensim.models.ldamodel.LdaModel
/anaconda3/lib/python3.7/site-packages/gensim/corpora/dictionary.py in __init__(self, documents, prune_at)
82
83 if documents is not None:
---> 84 self.add_documents(documents, prune_at=prune_at)
85
86 def __getitem__(self, tokenid):
/anaconda3/lib/python3.7/site-packages/gensim/corpora/dictionary.py in add_documents(self, documents, prune_at)
195
196 """
--> 197 for docno, document in enumerate(documents):
198 # log progress & run a regular check for pruning, once every 10k docs
199 if docno % 10000 == 0:
/anaconda3/lib/python3.7/site-packages/gensim/utils.py in lemmatize(content, allowed_tags, light, stopwords, min_length, max_length)
1676 if not has_pattern():
1677 raise ImportError(
-> 1678 "Pattern library is not installed. Pattern library is needed in order to use lemmatize function"
1679 )
1680 from pattern.en import parse
ImportError: Pattern library is not installed. Pattern library is needed in order to use lemmatize function
嘗試安裝模式包。 這需要存在。
pip install pattern
Gensim utils.py 使用這個驗證函數:
def has_pattern():
"""Check whether the `pattern <https://github.com/clips/pattern>`_ package is installed.
Returns
-------
bool
Is `pattern` installed?
"""
try:
from pattern.en import parse # noqa:F401
return True
except ImportError:
return False
我確實注意到在pip install gensim
期間沒有驗證這個包,這不清楚。
Collecting gensim
Using cached https://files.pythonhosted.org/packages/70/cf/87b25b265d23498b2b70ce873495cf7ef91394c4baff240210e26f3bc18a/gensim-3.8.3-cp37-cp37m-macosx_10_9_x86_64.whl
Requirement already satisfied: numpy>=1.11.3 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from gensim) (1.17.2)
Requirement already satisfied: scipy>=0.18.1 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from gensim) (1.3.1)
Requirement already satisfied: six>=1.5.0 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from gensim) (1.12.0)
Collecting smart-open>=1.8.1 (from gensim)
Collecting boto3 (from smart-open>=1.8.1->gensim)
Using cached https://files.pythonhosted.org/packages/c4/24/b9facc760789cf844880c178b64d26d9f4a0ef06af3e99473f38fba94657/boto3-1.14.56-py2.py3-none-any.whl
Requirement already satisfied: requests in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from smart-open>=1.8.1->gensim) (2.22.0)
Requirement already satisfied: boto in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from smart-open>=1.8.1->gensim) (2.49.0)
Collecting jmespath<1.0.0,>=0.7.1 (from boto3->smart-open>=1.8.1->gensim)
Using cached https://files.pythonhosted.org/packages/07/cb/5f001272b6faeb23c1c9e0acc04d48eaaf5c862c17709d20e3469c6e0139/jmespath-0.10.0-py2.py3-none-any.whl
Collecting s3transfer<0.4.0,>=0.3.0 (from boto3->smart-open>=1.8.1->gensim)
Using cached https://files.pythonhosted.org/packages/69/79/e6afb3d8b0b4e96cefbdc690f741d7dd24547ff1f94240c997a26fa908d3/s3transfer-0.3.3-py2.py3-none-any.whl
Collecting botocore<1.18.0,>=1.17.56 (from boto3->smart-open>=1.8.1->gensim)
Using cached https://files.pythonhosted.org/packages/b1/82/499909b818bddde1a4fc1228389d9d29cc2ede766a2a7370aed033dd07f9/botocore-1.17.56-py2.py3-none-any.whl
Requirement already satisfied: certifi>=2017.4.17 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from requests->smart-open>=1.8.1->gensim) (2019.9.11)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from requests->smart-open>=1.8.1->gensim) (1.24.2)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from requests->smart-open>=1.8.1->gensim) (3.0.4)
Requirement already satisfied: idna<2.9,>=2.5 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from requests->smart-open>=1.8.1->gensim) (2.8)
Requirement already satisfied: docutils<0.16,>=0.10 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from botocore<1.18.0,>=1.17.56->boto3->smart-open>=1.8.1->gensim) (0.15.2)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from botocore<1.18.0,>=1.17.56->boto3->smart-open>=1.8.1->gensim) (2.8.0)
Installing collected packages: jmespath, botocore, s3transfer, boto3, smart-open, gensim
Successfully installed boto3-1.14.56 botocore-1.17.56 gensim-3.8.3 jmespath-0.10.0 s3transfer-0.3.3 smart-open-2.1.1
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.