Create empty Corpus in textacy
I want to create an empty corpus in textacy and fill it with data later on via
corpus.add(doc)
But every time I create an empty corpus, I am not able to save it and get this error instead:
IndexError: list index out of range
I tried both passing no data when creating the corpus and passing None as data:
corpus = textacy.Corpus(lang=locale)
corpus = textacy.Corpus(lang=locale, data=None)
corpus.save(path) # this line results in the index error
It would be nice if anybody could help me :)
I have just tried this out myself. What is locale exactly? I did the following:
import spacy
import textacy
nlp = spacy.load("de_core_news_lg")
corpus = textacy.Corpus(nlp)
After that I was able to iterate over my documents and add them one by one.
However, I would not recommend doing this. I ran two scenarios to process 15k short comments:
First, I preprocessed my documents as a list and passed it directly into textacy.Corpus(nlp, data=preprocessed_list). That took around 22 s.
Second, I ran the same logic, but created an empty corpus and added each item to it individually. That took 1 min 26 s.
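The two scenarios can be sketched roughly as below. This is only an illustration, not the code used for the timings above: the helper names (time_call, build_in_bulk, build_incrementally, compare_with_textacy) are made up for this example, and it assumes a textacy version in which Corpus accepts data= and offers add(), as used in this thread. Actual timings will depend on the machine, the model and the texts.

```python
import time
from functools import partial


def time_call(fn):
    """Run fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def build_in_bulk(make_corpus, texts):
    """Scenario 1: pass the whole preprocessed list when creating the corpus."""
    return make_corpus(data=texts)


def build_incrementally(make_corpus, texts):
    """Scenario 2: create an empty corpus, then add the items one by one."""
    corpus = make_corpus()
    for text in texts:
        corpus.add(text)
    return corpus


def compare_with_textacy(texts, model_name="de_core_news_lg"):
    """Hypothetical driver; needs spacy, textacy and the model installed."""
    # Imported lazily so the helpers above work without these packages.
    import spacy
    import textacy

    nlp = spacy.load(model_name)
    # Assumption: textacy.Corpus(nlp, data=...) as shown in the answer above.
    make_corpus = partial(textacy.Corpus, nlp)
    _, bulk_seconds = time_call(lambda: build_in_bulk(make_corpus, texts))
    _, loop_seconds = time_call(lambda: build_incrementally(make_corpus, texts))
    return bulk_seconds, loop_seconds
```

Calling compare_with_textacy(preprocessed_list) would return both durations in seconds, so the difference between the bulk and the item-by-item approach can be checked on your own data.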