How to efficiently group a time-based dataset to calculate group averages and medians

Question

I have a table with 100,000+ records, as shown below. The system may add transactions every day, so the table keeps growing.

+---------------------+-----------+
|        Date         | Value     |
+---------------------+-----------+
| 2018-12-21 11:17:00 | 85.8      |
| 2018-12-28 15:07:00 | 16.2      |
| 2019-01-28 08:05:00 | 24.8      |
| 2019-02-28 12:07:00 | 13.9      |
| 2019-05-28 10:48:00 | 8         |
| 2019-05-28 09:17:00 | 40.6      |
| 2019-08-28 10:06:00 | 71.9      |
| 2019-08-16 17:28:00 | 36        |
| 2019-08-28 10:07:00 | 1922      |
| …                   | …         |
+---------------------+-----------+

I want to group the data by quarter and compute the quarterly average and median, to plot them in graphs like these:

Average: example graph (X = quarter, Y = average value)

Median: example graph (X = quarter, Y = median value)

I am using Laravel 5.8 (PHP) with a MySQL database.

In my approach, I first created an array of year-quarter keys covering the range from the start date to the end date, e.g. ["2018 3", "2018 4", "2019 1", "2019 2", "2019 3"].

Then I used a foreach loop to read through all 100,000+ records and put each value into the sub-array under its matching key. For a single user, this drives CPU usage to 100% on a 2-core Apache server with 4 GB of RAM, and I observed that the foreach loop consumes most of that CPU.

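use Carbon\Carbon;

// Prepare one empty bucket per "year quarter" key (e.g. "2018 4") between the start and end dates.
// Starting from the first day of the start date's quarter ensures the last quarter is not skipped.
$chunkedData = [];
$cursor = Carbon::parse($startDate)->startOfQuarter();
$end = Carbon::parse($endDate);
while ($cursor <= $end) {
    $chunkedData[$cursor->year . ' ' . $cursor->quarter] = [];
    $cursor->addMonths(3);
}

// Read all the records and append each value to its quarter's bucket.
// Each date is parsed only once per record.
foreach ($arrScanData as $scanData) {
    $date = Carbon::parse($scanData->qr_generated_date);
    $chunkedData[$date->year . ' ' . $date->quarter][] = (float) $scanData->value;
}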
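Once the values are bucketed, each bucket's average and median still have to be computed. The following is a minimal plain-PHP sketch, assuming `$chunkedData` has the shape built above (quarter key => list of floats). The function name `quarterlyStats` is hypothetical, and returning `null` for empty quarters is an assumption so that a graph can show a gap instead of a fake zero.

```php
<?php
// Hypothetical helper: compute the average and median of each quarter's bucket.
// Input:  ['2018 4' => [85.8, 16.2], ...]
// Output: ['2018 4' => ['average' => 51.0, 'median' => 51.0], ...]
function quarterlyStats(array $chunkedData): array
{
    $stats = [];
    foreach ($chunkedData as $quarter => $values) {
        $count = count($values);
        if ($count === 0) {
            // No records in this quarter: leave a gap rather than a zero.
            $stats[$quarter] = ['average' => null, 'median' => null];
            continue;
        }
        sort($values, SORT_NUMERIC); // the median needs the values in ascending order
        $middle = intdiv($count, 2);
        $median = ($count % 2 === 1)
            ? $values[$middle]                               // odd count: the middle element
            : ($values[$middle - 1] + $values[$middle]) / 2; // even count: mean of the two middle elements
        $stats[$quarter] = [
            'average' => array_sum($values) / $count,
            'median'  => $median,
        ];
    }
    return $stats;
}
```

Sorting each bucket costs O(n log n) overall, which is small next to parsing 100,000+ dates; the aggregation answers below avoid even that by pushing the work into the database.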

I would appreciate a few suggestions, at the logic or architecture level, for solving this problem.

Answer 1

I would let MySQL do the bulk of the processing.

The following query returns each value together with its quarter:

SELECT CONCAT(YEAR(date), ' ', QUARTER(date)) AS quarter, value FROM yourtable

The averages can be calculated directly in SQL:

SELECT CONCAT(YEAR(date), ' ', QUARTER(date)) AS quarter, AVG(value) AS average FROM yourtable GROUP BY quarter ORDER BY quarter

Calculating the median would require some extra processing of the first query's results; I have no experience with that. MariaDB, however, does have a MEDIAN() window function.
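One possible way to do that extra processing, sketched under assumptions: if the first query is extended with `ORDER BY quarter, value`, the rows for each quarter arrive already sorted, so PHP only has to pick the middle element(s) of each group and never sorts anything itself. The function name `mediansFromSortedRows` is hypothetical, the rows are assumed to be objects with `quarter` and `value` properties (as a query-builder result typically provides), and `value` is assumed to be a numeric column so that `ORDER BY value` sorts numerically rather than as text.

```php
<?php
// Hypothetical helper: per-quarter medians from rows the database has already sorted, e.g. by
//   SELECT CONCAT(YEAR(date), ' ', QUARTER(date)) AS quarter, value
//   FROM yourtable ORDER BY quarter, value
// Assumes `value` is a numeric column, so ORDER BY sorts numerically.
function mediansFromSortedRows(iterable $rows): array
{
    // Group the values by quarter; appending preserves the database's sort order.
    $groups = [];
    foreach ($rows as $row) {
        $groups[$row->quarter][] = (float) $row->value;
    }

    $medians = [];
    foreach ($groups as $quarter => $values) {
        $count  = count($values); // always >= 1, since a group exists only once a row was appended
        $middle = intdiv($count, 2);
        $medians[$quarter] = ($count % 2 === 1)
            ? $values[$middle]                               // odd count: the middle element
            : ($values[$middle - 1] + $values[$middle]) / 2; // even count: mean of the two middle elements
    }
    return $medians;
}
```

The design choice here is to let MySQL do the sorting, where it may be able to use an index, and keep the PHP side to a single linear pass.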

Answer 2

I suggest changing your approach as follows.

Do not read all 100,000+ records at once. Instead, read just the start and end dates from the table and use them to build your quarter array. Then loop over the $chunkedData array and read the data for each quarter by passing the quarter's start and end dates (or just the quarter itself) to the query.
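A sketch of the per-quarter date ranges this approach needs, using only PHP's built-in `DateTimeImmutable` (Carbon would work just as well). The function name `quarterRanges` is hypothetical, and `$minDate`/`$maxDate` stand for the values read with something like `SELECT MIN(date), MAX(date) FROM yourtable`. Each range is half-open (start inclusive, end exclusive), so a condition such as `date >= start AND date < end` neither misses nor double-counts timestamps that fall exactly on a quarter boundary.

```php
<?php
// Hypothetical helper: list every quarter between two dates as
//   'YYYY Q' => [inclusive start, exclusive end]
// suitable for a condition like: WHERE date >= :start AND date < :end
function quarterRanges(string $minDate, string $maxDate): array
{
    $min = new DateTimeImmutable($minDate);
    $max = new DateTimeImmutable($maxDate);

    // Snap the cursor to midnight on the first day of the quarter containing $minDate.
    $firstMonth = 3 * intdiv((int) $min->format('n') - 1, 3) + 1;
    $cursor = $min->setDate((int) $min->format('Y'), $firstMonth, 1)->setTime(0, 0);

    $ranges = [];
    while ($cursor <= $max) {
        $next    = $cursor->modify('+3 months'); // no month overflow: the cursor is always on day 1
        $quarter = intdiv((int) $cursor->format('n') - 1, 3) + 1;
        $ranges[$cursor->format('Y') . ' ' . $quarter] = [
            $cursor->format('Y-m-d H:i:s'),
            $next->format('Y-m-d H:i:s'),
        ];
        $cursor = $next;
    }
    return $ranges;
}
```

In Laravel, each range could then feed a small aggregate query, for example `DB::table('yourtable')->where('date', '>=', $start)->where('date', '<', $end)->avg('value')`, so that only one number per quarter is transferred to PHP.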

Answer 1: score 0, 2019-08-28 11:54:19

Answer 2: score 0, 2019-08-28 12:24:58
