I have many tweets stored as text.
I would like to know how often each word follows a specific word. For instance, given these tweets, I want the frequency of the words that come right after "love":
My love is...
My love is...
the love was...
the love were...
to get this result:
word    next word    frequency
love    is           2
love    was          1
love    were         1
or, for all words:
word    next word    frequency
my      love         2
the     love         2
love    is           2
love    was          1
love    were         1
The following procedure might help.
Step 1 (optional): Create some example data
example <- c("my love is","my love is","banana","apple","the love was","the love were")
This vector looks like
"my love is" "my love is" "banana" "apple" "the love was" "the love were"
Step 2: Take all entries of the vector that include the word "love"
ex2 <- example[grep("love",example)]
which gives you
"my love is" "my love is" "the love was" "the love were"
Step 3: Construct a table of the words that come after "love"
# trimws() removes the leading space that gsub() leaves behind
ex3 <- table(trimws(gsub(".*love", "", ex2)))
which gives you
  is  was were
   2    1    1
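Two caveats with the steps above: grep("love", ...) also matches longer words such as "lovely" or "gloves", and the gsub() call keeps everything after "love", not just the next word. A minimal sketch of a stricter variant in base R follows. The helper name next_word_freq is made up for illustration, and the cleaning step assumes plain ASCII text and only looks at the first occurrence of the target word in each tweet.

```r
# Count the single word that directly follows `target` in each text.
# `next_word_freq` is a hypothetical helper name, not an existing function.
next_word_freq <- function(texts, target) {
  # Lowercase, then replace everything except letters, digits,
  # apostrophes and spaces with a space (assumes ASCII text)
  clean <- gsub("[^a-z0-9' ]", " ", tolower(texts))
  # \\b forces a whole-word match; the parentheses capture the next word
  pattern <- paste0("\\b", target, "\\s+([a-z0-9']+)")
  m <- regmatches(clean, regexec(pattern, clean, perl = TRUE))
  # Each element is c(full match, captured word), or empty if no match
  nxt <- as.character(unlist(lapply(m, function(x) if (length(x) == 2) x[2])))
  as.data.frame(table(next_word = nxt), stringsAsFactors = FALSE)
}

tweets <- c("My love is...", "My love is...", "the love was...", "the love were...")
next_word_freq(tweets, "love")
```

The result is a data frame with one row per following word and its count in the Freq column.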
As you are dealing with several word combinations (first × second), I don't see any way to avoid using a loop. The code below should do what you want:
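phrase <- c("My love is... ","My love is...","A love was...","the dogs were...")
# Split each phrase into words; this assumes every phrase has the same number of words
SPLIT <- matrix(unlist(strsplit(phrase, " ")), nrow = length(phrase), byrow = TRUE)
# All unique (first word, second word) combinations, with a placeholder frequency column
vect <- cbind(unique(expand.grid(SPLIT[, 1], SPLIT[, 2], stringsAsFactors = FALSE)), freq = NA)
to.find <- paste(vect[, 1], vect[, 2], sep = " ")
# Count the phrases containing each combination (fixed = TRUE matches literally, not as a regex)
for (i in seq_along(to.find)) {
  vect[i, 3] <- length(grep(to.find[i], phrase, fixed = TRUE))
}
# Keep only the combinations that actually occur
vect <- subset(vect, freq > 0)
vect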
   Var1 Var2 freq
1    My love    2
3     A love    1
16  the dogs    1
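For comparison, here is a minimal sketch that counts adjacent word pairs directly instead of testing every first-by-second combination, which also covers the "for all words" table from the question and works when tweets have different lengths. It is a different technique from the code above, not a fix of it. The helper name bigram_freq is made up for illustration, and the cleaning step assumes plain ASCII text.

```r
# Count every adjacent word pair (bigram) across all texts.
# `bigram_freq` is a hypothetical helper name, not an existing function.
bigram_freq <- function(texts) {
  # Lowercase, replace punctuation with spaces, trim, split on whitespace
  clean <- trimws(gsub("[^a-z0-9' ]", " ", tolower(texts)))
  words <- strsplit(clean, "\\s+")
  # For each text, pair word i with word i + 1
  pairs <- lapply(words, function(w) {
    if (length(w) < 2) return(NULL)  # nothing to pair in very short texts
    data.frame(word = head(w, -1), next_word = tail(w, -1),
               stringsAsFactors = FALSE)
  })
  pairs <- do.call(rbind, pairs)
  # Cross-tabulate the pairs and drop combinations that never occur
  counts <- as.data.frame(table(pairs), stringsAsFactors = FALSE)
  counts <- counts[counts$Freq > 0, ]
  counts[order(-counts$Freq), ]
}

tweets <- c("My love is...", "My love is...", "the love was...", "the love were...")
bigram_freq(tweets)
```

To get only the words after "love", subset the result on the word column, for example res[res$word == "love", ] where res holds the returned data frame.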