
Time series multivariate regression: irregular time entries, multiple variables

I've read a tonne of help pages and online books, including http://a-little-book-of-r-for-time-series.readthedocs.io/en/latest/src/timeseries.html, but can't find an example similar to what I need. The time entries in my dataset are irregular: I am tracking tweets. Here are my sample data frames. Tweets data frame (tweetcount is always 1; it is a dummy):

datetime            tweetcount retweets  hashtags_used atmention likes
02-01-2016 02:34      1          3          1              2       1
04-01-2016 13:45      1          1          1              1       0
04-01-2016 17:55      1          5          2              4       2

Follow_dat (a separate data frame; followcount is always 1, also a dummy):

datetime            followcount 
02-01-2016 02:34      1         
04-01-2016 13:45      1         
04-01-2016 17:55      1         

I've tried several things. For instance, I used the cut function to bin the data into hours, but this is not accurate because a follow can still come before a tweet within the same hour, if that makes sense.
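One way to avoid binning altogether (a swapped-in technique, not something proposed in this thread) is to match each follow to the most recent tweet strictly before it with base R's findInterval(). The sketch reuses the sample rows above. Reading the timestamps as day-month-year in UTC is an assumption, since 02-01-2016 could also mean 1 February; the names idx and matched are illustrative only.

```r
# Sample data from the question; "%d-%m-%Y %H:%M" (day-month-year) is an ASSUMED format.
times <- c("02-01-2016 02:34", "04-01-2016 13:45", "04-01-2016 17:55")
tweets <- data.frame(
  datetime      = as.POSIXct(times, format = "%d-%m-%Y %H:%M", tz = "UTC"),
  retweets      = c(3, 1, 5),
  hashtags_used = c(1, 1, 2),
  atmention     = c(2, 1, 4),
  likes         = c(1, 0, 2)
)
follows <- data.frame(
  datetime = as.POSIXct(times, format = "%d-%m-%Y %H:%M", tz = "UTC")
)

# findInterval() requires the tweet times in increasing order.
tweets <- tweets[order(tweets$datetime), ]

# left.open = TRUE uses intervals (t_i, t_{i+1}], so each follow is matched to the
# last tweet strictly BEFORE it; a follow in the same minute as a tweet is not
# credited to that tweet. A result of 0 means there is no earlier tweet.
idx <- findInterval(as.numeric(follows$datetime), as.numeric(tweets$datetime),
                    left.open = TRUE)
idx[idx == 0] <- NA

matched <- data.frame(
  follow_time   = follows$datetime,
  tweet_time    = tweets$datetime[idx],
  retweets      = tweets$retweets[idx],
  hashtags_used = tweets$hashtags_used[idx],
  atmention     = tweets$atmention[idx],
  likes         = tweets$likes[idx]
)
# Delay between the preceding tweet and the follow, in hours.
matched$hours_since_tweet <- as.numeric(difftime(matched$follow_time,
                                                 matched$tweet_time, units = "hours"))
print(matched)
```

Because the match is by exact timestamp rather than by bin, a follow in the next hour or week is still linked to the last tweet before it. The trade-off is that each follow is credited to a single tweet only, which may be too simple if several recent tweets plausibly contributed.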

What I am trying to do is find out which tweet variables/factors the followers are related to over time. Binning by hour to build an aggregated table loses accuracy, but I can't find another way to run a regression, fit a model to this, or find out which factors are important.

df$week <- as.Date(cut(df$datetime, breaks = "week", start.on.monday = FALSE)) 
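The cut() call above only bins by "week" when datetime is a date-time object (POSIXct) rather than a character string. A minimal sketch of parsing the sample timestamps first, again assuming day-month-year order and UTC:

```r
# ASSUMPTION: timestamps are day-month-year; tz = "UTC" avoids daylight-saving gaps.
times <- c("02-01-2016 02:34", "04-01-2016 13:45", "04-01-2016 17:55")
df <- data.frame(datetime = as.POSIXct(times, format = "%d-%m-%Y %H:%M", tz = "UTC"))

# Same call as in the question: label each row with the Sunday that starts its week.
df$week <- as.Date(cut(df$datetime, breaks = "week", start.on.monday = FALSE))
print(df)
```

With that reading, 2 January 2016 (a Saturday) falls in the week starting 2015-12-27, while both 4 January rows fall in the week starting 2016-01-03.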

That is the code I used to cut the data. I then aggregated the result into another table, and from there I can run ARIMA, but:
a) a follow and a tweet can be assigned to the same week even though the follow happened before the tweet; I need to ensure the follow comes after.
b) if a follow happens in the following week, it is not associated with the tweet at all.
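For problems a) and b), one common workaround (a suggestion, not something from this thread) is to keep the bins but pair each bin's follow count with the tweet activity of the previous bin, so follows are only ever explained by tweets that came earlier. Irregular data also means some bins are empty, and aggregate() silently drops those, so this sketch builds a complete sequence of bins first. It uses day bins only because the three sample rows span three days; "hour" or "week" bins work the same way. The date format, the column subset, and the helper names (all_bins, daily, lag1) are assumptions for illustration.

```r
# Sample data; "%d-%m-%Y %H:%M" is an ASSUMED (day-month-year) format.
times <- c("02-01-2016 02:34", "04-01-2016 13:45", "04-01-2016 17:55")
tweets <- data.frame(
  datetime   = as.POSIXct(times, format = "%d-%m-%Y %H:%M", tz = "UTC"),
  tweetcount = 1,
  retweets   = c(3, 1, 5),
  likes      = c(1, 0, 2)
)
follows <- data.frame(
  datetime    = as.POSIXct(times, format = "%d-%m-%Y %H:%M", tz = "UTC"),
  followcount = 1
)

# Assign every event to a day bin (breaks = "hour" or "week" work the same way).
tweets$bin  <- as.Date(cut(tweets$datetime,  breaks = "day"))
follows$bin <- as.Date(cut(follows$datetime, breaks = "day"))

# Complete sequence of bins, so days with no activity become explicit zeros.
all_bins <- data.frame(bin = seq(min(tweets$bin, follows$bin),
                                 max(tweets$bin, follows$bin), by = "day"))

tw <- aggregate(cbind(tweetcount, retweets, likes) ~ bin, data = tweets, FUN = sum)
fl <- aggregate(followcount ~ bin, data = follows, FUN = sum)

daily <- merge(all_bins, tw, by = "bin", all.x = TRUE)
daily <- merge(daily, fl, by = "bin", all.x = TRUE)
daily[is.na(daily)] <- 0

# Shift tweet activity down one bin: row t then holds bin t-1's tweet totals.
lag1 <- function(x) c(NA, x[-length(x)])
daily$tweetcount_lag1 <- lag1(daily$tweetcount)
daily$retweets_lag1   <- lag1(daily$retweets)
daily$likes_lag1      <- lag1(daily$likes)
print(daily)
```

With a realistic number of bins, a model such as lm(followcount ~ retweets_lag1 + likes_lag1, data = daily), or glm(..., family = poisson) since follows are counts, could then estimate which lagged tweet variables move with follows; three sample rows are too few to fit it meaningfully.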

It's a reasonable approach to recode the datetime into several factors such as year, month, weekday, minute or second, and to aggregate appropriately, especially if you are trying to determine seasonality or trend.
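A minimal sketch of that recoding in base R, taking the components of as.POSIXlt() on the sample tweets (day-month-year parsing and UTC are assumptions) and then aggregating retweets by weekday:

```r
# ASSUMPTION: timestamps are day-month-year, interpreted in UTC.
times <- c("02-01-2016 02:34", "04-01-2016 13:45", "04-01-2016 17:55")
tweets <- data.frame(
  datetime = as.POSIXct(times, format = "%d-%m-%Y %H:%M", tz = "UTC"),
  retweets = c(3, 1, 5)
)

lt <- as.POSIXlt(tweets$datetime)
tweets$year    <- factor(lt$year + 1900)             # POSIXlt years count from 1900
tweets$month   <- factor(lt$mon + 1, levels = 1:12)  # POSIXlt months are 0-11
tweets$weekday <- factor(lt$wday, levels = 0:6,      # POSIXlt weekday: 0 = Sunday
                         labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
tweets$hour    <- factor(lt$hour, levels = 0:23)
tweets$minute  <- lt$min

# Aggregate at the chosen granularity, e.g. total retweets per weekday.
by_weekday <- aggregate(retweets ~ weekday, data = tweets, FUN = sum)
print(by_weekday)
```

Using the numeric POSIXlt fields rather than weekdays() keeps the labels independent of the system locale.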

Can you explain in a bit more detail what you are trying to forecast/accomplish?

The technical post webpages of this site follow the CC BY-SA 4.0 protocol.

 
© 2020-2024 STACKOOM.COM