Creating an empty corpus in textacy

Question

I want to create an empty corpus in textacy and later fill it with data via

corpus.add(doc)

But every time I try to create an empty corpus, I am not able to save it and instead get this error:

IndexError: list index out of range

I tried both passing no data when creating the corpus and passing None as data:

corpus = textacy.Corpus(lang=locale)
corpus = textacy.Corpus(lang=locale, data=None)
corpus.save(path) # this line results in the index error

It would be nice if anybody could help me :)

Answer 1

I have just tried this out myself. What is locale exactly? I did the following:

I created a spaCy language object for German with

nlp = spacy.load("de_core_news_lg")

and then passed it to

corpus = textacy.Corpus(nlp)

After that I was able to iterate through my documents and add them to the corpus one by one.

However, I would not recommend doing this. I ran two scenarios to process 15k short comments:

I first preprocessed my documents into a list and passed it directly to textacy.Corpus(nlp, data=preprocessed_list). That took around 22 s.
Running the same logic, but creating an empty corpus and adding the items to it one at a time, took 1 min 26 s.
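
For anyone who wants to repeat the comparison, here is a rough sketch of the two scenarios. The helper names (build_at_once, build_incrementally, timed, main) and the placeholder texts are my own and not part of textacy. The corpus class is passed in as a parameter only so the helpers can be tried without textacy installed. The model name is the one used above.

```python
import time


def build_at_once(corpus_cls, nlp, texts):
    """Hand the whole list to the constructor (the faster scenario above)."""
    return corpus_cls(nlp, data=texts)


def build_incrementally(corpus_cls, nlp, texts):
    """Start from an empty corpus and add one text at a time (the slower scenario)."""
    corpus = corpus_cls(nlp)
    for text in texts:
        corpus.add(text)
    return corpus


def timed(build, corpus_cls, nlp, texts):
    """Return the built corpus and the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    corpus = build(corpus_cls, nlp, texts)
    return corpus, time.perf_counter() - start


def main():
    # Imported here so the helpers above stay usable without spaCy/textacy.
    import spacy
    import textacy

    nlp = spacy.load("de_core_news_lg")  # model name taken from the answer
    texts = ["Das ist ein kurzer Kommentar."] * 1000  # placeholder data, not the original comments

    for build in (build_at_once, build_incrementally):
        corpus, seconds = timed(build, textacy.Corpus, nlp, texts)
        print(f"{build.__name__}: {len(corpus)} docs in {seconds:.1f} s")
```

Calling main() runs both variants against textacy.Corpus. The timings will differ from mine depending on hardware and data. My guess is that the gap comes from spaCy being able to process the texts in batches when it receives the whole list, but I have not verified that.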
