Python LDA gensim "DeprecationWarning: invalid escape sequence"

I am new to Stack Overflow and Python, so please bear with me. I am trying to run a Latent Dirichlet Allocation (LDA) analysis on a text corpus with the gensim package in Python, using the PyCharm editor. I prepared the corpus in R and exported it to a CSV file using this R command:

write.csv(testdf, "C://...//test.csv", fileEncoding = "utf-8") 

This creates the following CSV structure (though with much longer, already preprocessed texts):

,"datetimestamp","id","origin","text"
1,"1960-01-01","id_1","Newspaper1","Test text one"
2,"1960-01-02","id_2","Newspaper1","Another text"
3,"1960-01-03","id_3","Newspaper1","Yet another text"
4,"1960-01-04","id_4","Newspaper2","Four Five Six"
5,"1960-01-05","id_5","Newspaper2","Alpha Bravo Charly"
6,"1960-01-06","id_6","Newspaper2","Singing Dancing Laughing"

I then run the following essential Python code (based on the gensim tutorials) to perform a simple LDA analysis:

import gensim
from gensim import corpora, models, similarities, parsing
import pandas as pd
from six import iteritems
import os
import pyLDAvis.gensim

class MyCorpus(object):
    def __iter__(self):
        for row in pd.read_csv('//mpifg.local/dfs/home/lu/Meine Daten/Imagined Futures and Greek State Bonds/Topic Modelling/Python/test.csv', index_col=False, header=0, encoding='utf-8')['text']:
            # assume there's one document per line, tokens separated by whitespace
            yield dictionary.doc2bow(row.split())

if __name__ == '__main__':
    dictionary = corpora.Dictionary(row.split() for row in pd.read_csv(
        '//.../test.csv', index_col=False, encoding='utf-8')['text'])
    print(dictionary)
    dictionary.save(
        '//.../greekdict.dict')  # store the dictionary, for future reference

    ## create an mmCorpus
    corpora.MmCorpus.serialize('//.../greekcorpus.mm', MyCorpus())
    corpus = corpora.MmCorpus('//.../greekcorpus.mm')

    dictionary = corpora.Dictionary.load('//.../greekdict.dict')
    corpus = corpora.MmCorpus('//.../greekcorpus.mm')

    # train model
    lda = gensim.models.ldamodel.LdaModel(corpus=corpus, id2word=dictionary, num_topics=50, iterations=1000)

I get the following messages and the program exits:

...\Python\venv\lib\site-packages\setuptools-28.8.0-py3.6.egg\pkg_resources\_vendor\pyparsing.py:832: DeprecationWarning: invalid escape sequence \d

\...\Python\venv\lib\site-packages\setuptools-28.8.0-py3.6.egg\pkg_resources\_vendor\pyparsing.py:2736: DeprecationWarning: invalid escape sequence \d

\...\Python\venv\lib\site-packages\setuptools-28.8.0-py3.6.egg\pkg_resources\_vendor\pyparsing.py:2914: DeprecationWarning: invalid escape sequence \g

\...\Python\venv\lib\site-packages\pyLDAvis\_prepare.py:387: DeprecationWarning: .ix is deprecated. Please use .loc for label based indexing or .iloc for positional indexing

I cannot find any solution and, to be honest, I have no clue where exactly the problem comes from. I spent hours making sure that the CSV is encoded as UTF-8 and is exported (from R) and imported (in Python) correctly.

What am I doing wrong, or where else could I look? Cheers!

A DeprecationWarning is exactly that: a warning that a feature is deprecated, meant to prompt the user to switch to other functionality so that the code stays compatible in the future. It is not an error. So in your case I would just watch for updates of the libraries that you use.
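To see that a DeprecationWarning on its own does not stop a program, here is a minimal sketch. The function `old_api` is made up for illustration and is not part of gensim, pandas, or pyLDAvis.

```python
import warnings


def old_api():
    """Hypothetical deprecated function, used only for illustration."""
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)
    return 42


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning instead of printing it
    result = old_api()               # emits the warning, then returns normally

print(result)                                   # the call still produced its value
print([w.category.__name__ for w in caught])    # the recorded warning categories
```

The call returns its value even though a warning was issued, so if a script stops early the cause is probably something other than these messages.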

Starting with the last warning: it looks like it originates from pandas, triggered by pyLDAvis's use of .ix, and it has already been reported as an issue against pyLDAvis.
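The .ix call sits inside pyLDAvis, so it is not something you can change from your own script. For your own pandas code, though, the replacement the warning suggests looks roughly like this. The small frame below is toy data modelled on the CSV in the question, not the real corpus.

```python
import pandas as pd

# Toy frame; the string index makes labels and positions visibly different.
df = pd.DataFrame(
    {"text": ["Test text one", "Another text", "Yet another text"]},
    index=["id_1", "id_2", "id_3"],
)

by_label = df.loc["id_2", "text"]  # label-based: the row labelled "id_2"
by_position = df.iloc[1, 0]        # position-based: second row, first column

print(by_label, "|", by_position)
```

Both lookups return the same cell. `.loc` addresses it by label and `.iloc` by integer position, which is the distinction `.ix` used to blur.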

The remaining ones come from the pyparsing module, which you do not seem to import explicitly. The file path shows it is the copy bundled with setuptools (pkg_resources), so one of your dependencies pulls it in, and that bundled version uses some relatively old, deprecated functionality. As a first step toward getting rid of the warnings, I would check whether upgrading those libraries helps. Good luck!
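For context, "invalid escape sequence \d" means that a normal (non-raw) string literal contains a backslash followed by a character Python does not recognise as an escape. The sketch below reproduces this by compiling two one-line snippets from strings. The helper `compile_warnings` is a name invented for this example. On Python 3.6 to 3.11 the category is DeprecationWarning, and on 3.12 and later it is SyntaxWarning.

```python
import warnings

# Source text of two tiny modules. The first uses a normal literal '\d+',
# the second the raw literal r'\d+'. They are compiled from strings so that
# this example file itself contains no invalid escape.
bad_source = "pattern = '\\d+'"
good_source = "pattern = r'\\d+'"


def compile_warnings(source):
    """Compile source and return the names of the warning categories raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [w.category.__name__ for w in caught]


print(compile_warnings(bad_source))   # warns about the escape sequence
print(compile_warnings(good_source))  # raw string: nothing to warn about
```

The fix therefore belongs in the library that contains the literal (here the bundled pyparsing), which is why upgrading is the usual remedy.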

import warnings   
warnings.filterwarnings("ignore")  
pyLDAvis.enable_notebook()

Try using this.
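Note that `warnings.filterwarnings("ignore")` silences every warning, including ones you may want to read. A narrower variant, sketched here with made-up warning messages, hides only DeprecationWarning and lets other categories through:

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")                                 # start by recording everything
    warnings.filterwarnings("ignore", category=DeprecationWarning)  # then hide deprecations only
    warnings.warn("old feature", DeprecationWarning)                # suppressed by the filter
    warnings.warn("something worth reading", UserWarning)           # still recorded

remaining = [w.category.__name__ for w in caught]
print(remaining)
```

In a real script only the `filterwarnings` line is needed, placed before the imports that trigger the messages. The `catch_warnings` block is there just to make the effect observable.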
