Counting Specific Word Frequency In R
I have a data set where I have split the text from journal abstracts so that each row contains one word. This has led to over 5 million rows, but I just want the counts of certain words. Below is an example of the data:
So in that example, let's say I want just the rna counts; I would get 3 and that's it. I have done a word count on the whole set, but it is not as useful to me:
wordCount <- m3 %>% count(word, sort = TRUE)
since many of the words aren't helpful for what I am trying to get at.
Any help would be welcome.
You can group_by the word, count the occurrences of each unique word, and then subset the ones you want.
library(tidyverse)
data <- data.frame(word = c("rna", "synthesis", "resembles", "copy", "choice",
                            "rna", "recombination", "process", "nascent", "rna"))
counts <- data %>%
  group_by(word) %>%
  count()
counts[which(counts$word == "rna"),]
# A tibble: 1 x 2
# Groups: word [1]
word n
<fct> <int>
1 rna 3
or using dplyr subsetting:
counts %>% filter(word == "rna")
# A tibble: 1 x 2
# Groups: word [1]
word n
<fct> <int>
1 rna 3
Piping it all through at once:
data %>%
  group_by(word) %>%
  count() %>%
  filter(word == "rna")
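Since the question asks for counts of several specific words, not just one, the same pipeline extends naturally by filtering with `%in%` against a vector of target words. A minimal sketch, using the example data from above (the target word list here is illustrative):

```r
library(dplyr)

# Same example data as above
data <- data.frame(word = c("rna", "synthesis", "resembles", "copy", "choice",
                            "rna", "recombination", "process", "nascent", "rna"))

# Filter against a vector of target words instead of a single word
targets <- c("rna", "synthesis")
data %>%
  group_by(word) %>%
  count() %>%
  filter(word %in% targets)
```

On 5 million rows this stays fast, because the grouping and counting happen once and only the final result is subset.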
A one-liner with a data.table solution:
library(data.table)
setDT(data)
data[word == "rna", .N, by = word]
word N
1: rna 3
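The data.table one-liner extends the same way: using `%in%` in the `i` argument subsets to the words of interest before counting. A sketch with the same example data (target words illustrative):

```r
library(data.table)

# Same example data, as a data.table
dt <- data.table(word = c("rna", "synthesis", "resembles", "copy", "choice",
                          "rna", "recombination", "process", "nascent", "rna"))

# .N counts the rows per group after subsetting to the target words
dt[word %in% c("rna", "synthesis"), .N, by = word]
```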