I am storing analytics data in a MySQL database as a table with a timestamp column and some data, and I want to downsample it (i.e. group it within a time range, counting the number of entries in each range) for display on an admin console. Would it be more efficient to select the raw data and downsample it with an R script, or to use
GROUP BY UNIX_TIMESTAMP(timestamp) DIV <some time>
and do it on the database layer. Any other tips would also be appreciated.
If you can use dplyr, you could do it with something like the following:
library(dplyr)

yay <-
  # Specify username and password in my.cnf;
  # "some_db" is a placeholder for your database name
  src_mysql("some_db", host = "blah.com") %>%
  tbl("some_table") %>%
  # Compute a grouping variable: floor-divide the Unix timestamp by the
  # bucket width in seconds (here 3600, i.e. hourly buckets);
  # unix_timestamp() is passed through to MySQL untranslated
  mutate(group = floor(unix_timestamp(timestamp) / 3600)) %>%
  group_by(group) %>%
  # Count the number of rows in each group
  summarise(n = n()) %>%
  # Execute the query and return the result as a data.frame
  collect()
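For comparison, the database-layer approach from the question could look something like the sketch below (assuming hourly buckets and the same hypothetical `some_table`; adjust the divisor to your desired bucket width in seconds). This is essentially the query dplyr generates for you:

```sql
-- Bucket rows into 3600-second (hourly) intervals and count entries per bucket
SELECT
  UNIX_TIMESTAMP(timestamp) DIV 3600 AS bucket,
  COUNT(*) AS n
FROM some_table
GROUP BY bucket
ORDER BY bucket;
```

Either way the aggregation happens inside MySQL, which is usually preferable to pulling every row into R: only one row per bucket crosses the wire, and an index on `timestamp` can help the scan.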