R: How to count the total number of tokens in a corpus?

Question

I have created a quanteda corpus called readtext_corpus containing 190 texts. I would like to count the total number of tokens (words) in the corpus. I tried the function ntoken, which gives the number of words per text, not the total number of words across all 190 texts.

Answer 1

You can just use the sum() function, which is really simple. Here is an example:

test <- c("testing string number 1","testing string number 2")

sum(quanteda::ntoken(test))

Result:

> quanteda::ntoken(test)
text1 text2 
    4     4 
> sum(quanteda::ntoken(test))
[1] 8
>

If you are using pipes, which is pretty common with quanteda:

> quanteda::ntoken(test) %>% sum()
[1] 8
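Since the question is about a corpus object rather than a character vector, the same idea can be sketched with an explicit tokenization step. This is a minimal sketch, not the asker's actual setup: `example_corpus` is a hypothetical stand-in for readtext_corpus, and `remove_punct = TRUE` is an assumed choice. Whether punctuation counts as tokens is up to you, and it changes the total.

```r
library(quanteda)

# Hypothetical stand-in for readtext_corpus: a tiny corpus of two texts
example_texts <- c(
  doc1 = "testing string number 1",
  doc2 = "testing string number 2, with punctuation!"
)
example_corpus <- corpus(example_texts)

# Tokenize explicitly so the counting rules are visible
# (assumption: punctuation marks should not be counted as tokens)
toks <- tokens(example_corpus, remove_punct = TRUE)

# ntoken() returns one count per document; sum() collapses them to a total
tokens_per_text <- ntoken(toks)
total_tokens <- sum(tokens_per_text)

print(tokens_per_text)
print(total_tokens)
```

Tokenizing first also makes it easy to see how options such as `remove_punct` or `remove_numbers` affect the total before summing.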

Solution 1
2 Accepted 2022-02-01 00:26:47
