
How to extract word frequency for a subset of words in R?

I have a data frame with about 10,000 words in one column and their corresponding frequencies in another. I also have a vector of about 600 words, each of which appears in the data frame. How do I look up the frequencies for the 600 words in the vector from the 10,000-word data frame?

One of the many solutions, with df$words being the column of your data.frame that holds the words and wordsvector being the vector:

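library(plyr)
# Count how often each word occurs in the data.frame (only needed if df has one row
# per occurrence; if df already has a frequency column, subset df directly instead)
freqwords <- ddply(df, .(words), summarize, n = length(words))
# Keep only the words that appear in your vector
freqwords[freqwords$words %in% wordsvector, ]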

Next time, it would be helpful if you provided some dummy data so we can help you better.
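As a small made-up illustration of the lookup, here is a sketch in base R. The data, the column name freq, and the object names idx and result are assumptions for the example, not taken from the question. match() returns, for each word in the vector, the row of the data frame that holds it, so the frequencies come back in the same order as the vector:

```r
# Dummy data: hypothetical words and frequencies standing in for the 10,000-row data frame
df <- data.frame(
  words = c("apple", "banana", "cherry", "date", "elder"),
  freq  = c(10, 3, 7, 1, 5),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the 600-word vector
wordsvector <- c("date", "apple", "cherry")

# For each element of wordsvector, find its row position in df$words
idx <- match(wordsvector, df$words)

# Pull the frequencies in the order of wordsvector
result <- data.frame(words = wordsvector, freq = df$freq[idx])
print(result)
```

Unlike subsetting with %in%, which keeps the row order of the data frame, this keeps the order of the vector. A word missing from the data frame would get NA as its frequency.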

Use dplyr's join functions.

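library(dplyr)

# make the 600-word vector into a data frame with a column named "word"
# (R object names cannot start with a digit, hence vec_600 rather than 600_vec)
df_600 <- data.frame(word = vec_600, stringsAsFactors = FALSE)

# left join the two data frames
df <- left_join(x = df_600, y = df_10000, by = "word")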

where "word" is the name of the column that the two data frames have in common.
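A minimal sketch of the join on made-up data, assuming dplyr is installed. The names big_df, small_df, joined, and the freq column are hypothetical and chosen only for this example:

```r
library(dplyr)

# Hypothetical stand-in for the large data frame of words and frequencies
big_df <- data.frame(
  word = c("apple", "banana", "cherry", "date"),
  freq = c(10, 3, 7, 1),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the word vector, already converted to a data frame
small_df <- data.frame(word = c("date", "apple"), stringsAsFactors = FALSE)

# Keep every row of small_df and attach the matching freq from big_df
joined <- left_join(small_df, big_df, by = "word")
print(joined)
```

The left join keeps one row per word in the smaller data frame, in its original order, and would fill freq with NA for any word that has no match.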
