
How to calculate median of an inner join field when using group by?

I have the following query, which retrieves the number of sales and the average price of those sales for each day, for a particular item.

SELECT COUNT(1) AS num_sales, DATE_FORMAT(sales.created_at, '%Y-%m-%d') AS date, AVG(prices.price) AS avg_price
FROM sales INNER JOIN prices ON prices.id = sales.price_id
WHERE prices.item_id = 7503 AND (`prices`.`source` = 0 or (`prices`.`price` >= 400 and `prices`.`source` > 0))
GROUP BY date
ORDER BY date ASC

I also have a for-loop that runs a separate query for each day to get the median price (let's assume the number of results is even):

SELECT prices.price FROM sales INNER JOIN prices ON prices.id = sales.price_id
WHERE prices.item_id = 7503 
AND (`prices`.`source` = 0 or (`prices`.`price` >= 400 and `prices`.`source` > 0))
AND DATE(sales.created_at) = "<THE DATE OF THE CURRENT FOR-LOOP OBJECT>"
ORDER BY prices.price ASC
LIMIT 1 OFFSET <NUMBER OF THE MIDDLE ROW>

As you can imagine, this is very slow, because in some cases hundreds of queries must be run against a large table (the sales table has a few hundred million rows).

How do you rewrite the first SQL query so that it also calculates the median of prices.price, similar to AVG(prices.price)? I've looked at answers such as this one but can't wrap my head around how to adapt it to my specific scenario.

I've spent hours trying to accomplish this, but my SQL knowledge simply isn't good enough. Any help would be greatly appreciated!

root@ns525077:~# mysql -V
mysql  Ver 14.14 Distrib 5.7.13, for Linux (x86_64) using  EditLine wrapper

Table schemas:

CREATE TABLE `prices` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `item_id` int(11) unsigned NOT NULL,
 `price` decimal(8,2) NOT NULL,
 `net_price` decimal(8,2) NOT NULL,
 `source` tinyint(4) NOT NULL,
 `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 PRIMARY KEY (`id`),
 UNIQUE KEY `id` (`id`),
 KEY `prices_ibfk_1` (`item_id`),
 CONSTRAINT `prices_ibfk_1` FOREIGN KEY (`item_id`) REFERENCES `items` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=4861375 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

CREATE TABLE `sales` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `price_id` int(11) unsigned DEFAULT NULL,
 `item_key` varchar(40) COLLATE utf8_unicode_ci NOT NULL,
 `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 PRIMARY KEY (`id`),
 UNIQUE KEY `id` (`id`),
 UNIQUE KEY `item_key` (`item_key`),
 KEY `price_id` (`price_id`),
 KEY `created_at` (`created_at`),
 KEY `price_id__created_at__IX` (`price_id`,`created_at`),
 CONSTRAINT `sales_ibfk_1` FOREIGN KEY (`price_id`) REFERENCES `prices` (`id`) ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=386156944 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Example of output from my first query: (screenshot not included)

I found the answer to my question here, after extensive searching. Perhaps I didn't word my question well initially.

I have adapted the solution to my own case, and here's the working query:

SELECT COUNT(1) AS num_sales,
       DATE_FORMAT(sales.created_at, '%Y-%m-%d') AS date,
       AVG(prices.price) AS avg_price,
       CASE(COUNT(1) % 2)
       WHEN 1 THEN SUBSTRING_INDEX(
           SUBSTRING_INDEX(
               group_concat(prices.price
                            ORDER BY prices.price SEPARATOR ',')
               , ',', (count(*) + 1) / 2)
           , ',', -1)
       ELSE (SUBSTRING_INDEX(
                 SUBSTRING_INDEX(
                     group_concat(prices.price
                                  ORDER BY prices.price SEPARATOR ',')
                     , ',', count(*) / 2)
                 , ',', -1)
             + SUBSTRING_INDEX(
                 SUBSTRING_INDEX(
                     group_concat(prices.price
                                  ORDER BY prices.price SEPARATOR ',')
                     , ',', (count(*) + 1) / 2)
                 , ',', -1)) / 2
       END median_price
FROM sales
  INNER JOIN prices ON prices.id = sales.price_id
WHERE prices.item_id = 7381
      AND (`prices`.`source` = 0
           OR (`prices`.`price` >= 400
               AND `prices`.`source` > 0))
GROUP BY date
ORDER BY date ASC;
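
To see why the nested SUBSTRING_INDEX calls pick out the middle value, here is a minimal sketch on a hard-coded list (the five prices are made up for illustration). The inner call keeps everything up to the N-th comma-separated element, and the outer call keeps only the last element of that result. One caveat worth checking: GROUP_CONCAT output is cut off at group_concat_max_len, which defaults to 1024 bytes, so on days with more than roughly a hundred sales the list may be truncated and the computed median would then be wrong. Raising the limit for the session before running the query should avoid that. The value 1000000 below is an arbitrary assumption rather than a recommendation, and the effective limit is also bounded by max_allowed_packet.

```sql
-- Inner call keeps the first 3 of the 5 elements: '100.00,250.00,400.00'
-- Outer call keeps the last element of that result: '400.00'
SELECT SUBSTRING_INDEX(
           SUBSTRING_INDEX('100.00,250.00,400.00,550.00,900.00', ',', 3),
       ',', -1) AS median_price;

-- Raise the GROUP_CONCAT length limit for the current session only
-- (1000000 is an illustrative value, not a tuned one).
SET SESSION group_concat_max_len = 1000000;

-- Confirm the setting before running the median query.
SELECT @@SESSION.group_concat_max_len;
```

If the limit is too low, MySQL reports a warning that a row was cut by GROUP_CONCAT(), so running SHOW WARNINGS after the median query is a quick way to check whether any day was affected.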
