I have a dataset of tweets from 2013 to 2017. I have coded each tweet for certain message features (0 for absence, 1 for presence), and I am trying to figure out whether there is a trend in my dataset, i.e., whether the occurrence of a message feature goes up or down gradually over time. How should I do this in R?
You could try a linear model, as in one of the answers given here.
#for reproducing
set.seed(200)
library(ggplot2)
#simple example. Assume your data is a simple binary variable that equals 1 with probability 0.7
data <- data.frame(time = 1:200, val = sample(c(0, 1), size = 200, replace = TRUE, prob = c(0.3, 0.7)))
#plot using ggplot and add linear regression and confidence interval
ggplot(data, aes(x = time, y=val)) + geom_smooth(method=lm) +geom_point()
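#Now we can fit the linear regression, with val as the response and time as the predictor
fitData <- lm(val ~ time, data = data)
#The slope on time and its p-value indicate whether there is a linear trend
summary(fitData)
#Predict the next 24 time points with a confidence interval
predict(fitData, newdata = data.frame(time = 201:224), interval = "confidence")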
#You can also take advantage of geom_smooth, which picks a default smoother if you don't specify a method (loess for fewer than 1,000 observations):
ggplot(data, aes(x = time, y=val)) + geom_smooth() +geom_point()
#Here, it seems that loess would be better
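Since val is coded 0/1, a logistic regression is another option worth trying alongside the linear model, because its predictions stay between 0 and 1. Below is a minimal sketch that rebuilds the same simulated data as above; the names fit_logit, trend, future and p_hat are just illustrative, and your real data will need its own time variable.

```r
# Same simulated 0/1 data as in the example above (an assumption, not real tweets)
set.seed(200)
data <- data.frame(time = 1:200,
                   val = sample(c(0, 1), size = 200, replace = TRUE, prob = c(0.3, 0.7)))

# Logistic regression: models the log-odds of val == 1 as a linear function of time
fit_logit <- glm(val ~ time, family = binomial, data = data)

# The "time" row holds the estimated slope, its standard error, z value and p-value
trend <- summary(fit_logit)$coefficients["time", ]
print(trend)

# Predicted probabilities for the next 24 time points
future <- data.frame(time = 201:224)
p_hat <- predict(fit_logit, newdata = future, type = "response")
print(p_hat)
```

A positive and significant slope on time would suggest the feature is becoming more common, a negative one that it is becoming rarer. With the simulated data here there is no built-in trend, so the slope should be close to zero.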
Some code for doing a loess regression in R is here.
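In case it helps, here is a rough sketch of a loess fit done directly with base R's loess(), again on the simulated data from above. The span of 0.75 is simply the function's default written out, and fit_loess and smoothed are illustrative names.

```r
# Same simulated 0/1 data as in the example above (an assumption, not real tweets)
set.seed(200)
data <- data.frame(time = 1:200,
                   val = sample(c(0, 1), size = 200, replace = TRUE, prob = c(0.3, 0.7)))

# Local regression of val on time; a smaller span gives a wigglier curve
fit_loess <- loess(val ~ time, data = data, span = 0.75)

# Fitted values and standard errors at the observed time points
smoothed <- predict(fit_loess, newdata = data, se = TRUE)

# Plot the raw 0/1 points and overlay the smoothed proportion
plot(data$time, data$val, xlab = "time", ylab = "val")
lines(data$time, smoothed$fit, col = "blue", lwd = 2)
```

The smoothed curve can be read as a local estimate of the proportion of 1s over time. Keep in mind that with the default settings loess does not extrapolate, so it is suited to describing the trend within the observed period rather than forecasting beyond it.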
Good luck!