I am storing analytics data in a MySQL database as a table with a timestamp column and some data, and I want to downsample it (i.e. group it within a time range, counting the number of entries in each range) for display on an admin console. Would it be more efficient to select the raw data and downsample it with an R script, or to use
GROUP BY UNIX_TIMESTAMP(timestamp) DIV <some time>
and do it on the database layer. Any other tips would also be appreciated.
If you can use dplyr, you could do it with something like the following:
library(dplyr)

yay <-
  # Specify username and password in my.cnf;
  # "some_db" is a placeholder for your database name
  src_mysql("some_db", host = "blah.com") %>%
  tbl("some_table") %>%
  # Compute a grouping variable: floor-divide the Unix timestamp by the
  # bucket width in seconds (here 3600, i.e. hourly buckets);
  # unix_timestamp() is passed through to MySQL untranslated
  mutate(group = floor(unix_timestamp(timestamp) / 3600)) %>%
  group_by(group) %>%
  # Count the number of rows in each group
  summarise(n = n()) %>%
  # Execute the query and return the result as a data.frame
  collect()
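For comparison, the database-layer approach from the question could look something like the sketch below (assuming hourly buckets and the same hypothetical `some_table`; adjust the divisor to your desired bucket width in seconds). This is essentially the query dplyr generates for you:

```sql
-- Bucket rows into 3600-second (hourly) intervals and count entries per bucket
SELECT
  UNIX_TIMESTAMP(timestamp) DIV 3600 AS bucket,
  COUNT(*) AS n
FROM some_table
GROUP BY bucket
ORDER BY bucket;
```

Either way the aggregation happens inside MySQL, which is usually preferable to pulling every row into R: only one row per bucket crosses the wire, and an index on `timestamp` can help the scan.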