简体   繁体   中英

Topic modelling with gensim

I have been trying topic modelling using gensim in Python. I have the following dataset:

Docs

"Sugar is bad to consume. My sister likes to have sugar, but not my father."
"My father spends a lot of time driving my sister around to dance practice."
"Doctors suggest that driving may cause increased stress and blood pressure."
"Sometimes I feel pressure to perform well at school, but my father never seems to drive my sister to do better."
"Health experts say that Sugar is not good for your lifestyle."

I tried to lemmatise it as follows:

texts = map(gensim.utils.lemmatize,Docs)

and run LDA:

dictionary = gensim.corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]
Lda = gensim.models.ldamodel.LdaModel
ldamodel = Lda(corpus, num_topics=3, id2word = dictionary, passes=50)
ldamodel.print_topics()

However I am getting an error. Do you know how to fix it?

thanks

Error:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-15-b36df3b5374b> in <module>
----> 1 import pattern
      2 
      3 dictionary = gensim.corpora.Dictionary(Docs)
      4 corpus = [dictionary.doc2bow(doc) for doc in Docs]
      5 Lda = gensim.models.ldamodel.LdaModel

ModuleNotFoundError: No module named 'pattern'

The whole error message:

---> 3 dictionary = gensim.corpora.Dictionary(Docs)
      4 corpus = [dictionary.doc2bow(doc) for doc in Docs]
      5 Lda = gensim.models.ldamodel.LdaModel

/anaconda3/lib/python3.7/site-packages/gensim/corpora/dictionary.py in __init__(self, documents, prune_at)
     82 
     83         if documents is not None:
---> 84             self.add_documents(documents, prune_at=prune_at)
     85 
     86     def __getitem__(self, tokenid):

/anaconda3/lib/python3.7/site-packages/gensim/corpora/dictionary.py in add_documents(self, documents, prune_at)
    195 
    196         """
--> 197         for docno, document in enumerate(documents):
    198             # log progress & run a regular check for pruning, once every 10k docs
    199             if docno % 10000 == 0:

/anaconda3/lib/python3.7/site-packages/gensim/utils.py in lemmatize(content, allowed_tags, light, stopwords, min_length, max_length)
   1676     if not has_pattern():
   1677         raise ImportError(
-> 1678             "Pattern library is not installed. Pattern library is needed in order to use lemmatize function"
   1679         )
   1680     from pattern.en import parse

ImportError: Pattern library is not installed. Pattern library is needed in order to use lemmatize function

Try installing pattern package. This needs to be present.

pip install pattern

Gensim utils.py uses this validation function:

def has_pattern():
    """Check whether the `pattern <https://github.com/clips/pattern>`_ package is installed.
    Returns
    -------
    bool
        Is `pattern` installed?
    """
    try:
        from pattern.en import parse  # noqa:F401
        return True
    except ImportError:
        return False

I did notice this package isn't validated during pip install gensim which is not clear.

Collecting gensim
  Using cached https://files.pythonhosted.org/packages/70/cf/87b25b265d23498b2b70ce873495cf7ef91394c4baff240210e26f3bc18a/gensim-3.8.3-cp37-cp37m-macosx_10_9_x86_64.whl
Requirement already satisfied: numpy>=1.11.3 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from gensim) (1.17.2)
Requirement already satisfied: scipy>=0.18.1 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from gensim) (1.3.1)
Requirement already satisfied: six>=1.5.0 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from gensim) (1.12.0)
Collecting smart-open>=1.8.1 (from gensim)
Collecting boto3 (from smart-open>=1.8.1->gensim)
  Using cached https://files.pythonhosted.org/packages/c4/24/b9facc760789cf844880c178b64d26d9f4a0ef06af3e99473f38fba94657/boto3-1.14.56-py2.py3-none-any.whl
Requirement already satisfied: requests in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from smart-open>=1.8.1->gensim) (2.22.0)
Requirement already satisfied: boto in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from smart-open>=1.8.1->gensim) (2.49.0)
Collecting jmespath<1.0.0,>=0.7.1 (from boto3->smart-open>=1.8.1->gensim)
  Using cached https://files.pythonhosted.org/packages/07/cb/5f001272b6faeb23c1c9e0acc04d48eaaf5c862c17709d20e3469c6e0139/jmespath-0.10.0-py2.py3-none-any.whl
Collecting s3transfer<0.4.0,>=0.3.0 (from boto3->smart-open>=1.8.1->gensim)
  Using cached https://files.pythonhosted.org/packages/69/79/e6afb3d8b0b4e96cefbdc690f741d7dd24547ff1f94240c997a26fa908d3/s3transfer-0.3.3-py2.py3-none-any.whl
Collecting botocore<1.18.0,>=1.17.56 (from boto3->smart-open>=1.8.1->gensim)
  Using cached https://files.pythonhosted.org/packages/b1/82/499909b818bddde1a4fc1228389d9d29cc2ede766a2a7370aed033dd07f9/botocore-1.17.56-py2.py3-none-any.whl
Requirement already satisfied: certifi>=2017.4.17 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from requests->smart-open>=1.8.1->gensim) (2019.9.11)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from requests->smart-open>=1.8.1->gensim) (1.24.2)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from requests->smart-open>=1.8.1->gensim) (3.0.4)
Requirement already satisfied: idna<2.9,>=2.5 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from requests->smart-open>=1.8.1->gensim) (2.8)
Requirement already satisfied: docutils<0.16,>=0.10 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from botocore<1.18.0,>=1.17.56->boto3->smart-open>=1.8.1->gensim) (0.15.2)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /Users/username/opt/anaconda3/lib/python3.7/site-packages (from botocore<1.18.0,>=1.17.56->boto3->smart-open>=1.8.1->gensim) (2.8.0)
Installing collected packages: jmespath, botocore, s3transfer, boto3, smart-open, gensim
Successfully installed boto3-1.14.56 botocore-1.17.56 gensim-3.8.3 jmespath-0.10.0 s3transfer-0.3.3 smart-open-2.1.1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM