How do I extract word frequencies for a subset of words in R?

Question

I have a data frame with about 10,000 words in one column and their corresponding frequencies in another. I also have a vector of about 600 words, each of which appears in the data frame. How do I look up the frequencies of the 600 words in the vector from the 10,000-word data frame?

Answer 1

One of many solutions, with df$words being the column of your data.frame that holds the words and wordsvector being the vector:

library(plyr)
# Count how many rows each word has in the data.frame
freqwords <- ddply(df, .(words), summarize, n = length(words))
# Keep only the words that appear in your vector
freqwords[freqwords$words %in% wordsvector, ]

Next time, it would help if you provided some dummy data so we can help you better.
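A minimal sketch with made-up data, assuming the data frame already holds one row per word together with its frequency, as the question describes. The column names `word` and `freq` and all values are hypothetical. In that case no counting is needed and base R can do the lookup: `%in%` keeps the matching rows in data-frame order, while `match()` returns the frequencies in the order of the vector.

```r
# Hypothetical dummy data: one row per word, with a precomputed frequency
df <- data.frame(
  word = c("apple", "banana", "cherry", "date", "elder"),
  freq = c(50, 20, 7, 3, 1),
  stringsAsFactors = FALSE
)
wordsvector <- c("date", "apple", "cherry")

# Keep only the rows whose word is in the vector (data-frame order is kept)
subset_df <- df[df$word %in% wordsvector, ]

# Look up the frequencies in the order of the vector and label them by word
freqs <- setNames(df$freq[match(wordsvector, df$word)], wordsvector)
```

`match()` returns `NA` for a word that is missing from the data frame, so this form also makes gaps visible if the vector ever contains words outside the data frame.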

Answer 2

Use dplyr's join functions.

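library(dplyr)

# Turn the 600-word vector into a data frame with a column named "word"
words600_df <- data.frame(word = words600_vec)

# Left join the two data frames on the shared "word" column
df <- left_join(x = words600_df, y = words10000_df, by = "word")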

where "word" is the column name shared by the two data frames.
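To illustrate the join on small made-up data (the names `lookup_df`, `freq_df`, `word`, and `freq` are assumptions, not taken from the question): `left_join()` keeps every row of the left data frame in its original order, and a word that is absent from the right data frame would get `NA` as its frequency.

```r
library(dplyr)

# Hypothetical frequency table, standing in for the 10,000-word data frame
freq_df <- data.frame(
  word = c("apple", "banana", "cherry"),
  freq = c(50, 20, 7),
  stringsAsFactors = FALSE
)

# Hypothetical lookup words, standing in for the 600-word vector;
# "zebra" is deliberately missing from freq_df
lookup_df <- data.frame(
  word = c("cherry", "apple", "zebra"),
  stringsAsFactors = FALSE
)

# Every row of lookup_df is kept; freq is filled in where a match exists
result <- left_join(lookup_df, freq_df, by = "word")
```

In the question every lookup word is present in the larger data frame, so no `NA` values should appear there; the unmatched word is included here only to show the behaviour.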


2 answers

Answer 1
0 2017-08-10 19:33:32

Answer 2
0 Accepted 2017-08-11 01:18:41
