Count columns of joined table

I am writing a query to summarize the data in a Postgres database:

SELECT products.id, 
   products.NAME, 
   product_types.type_name AS product_type, 
   delivery_types.delivery, 
   products.required_selections, 
   Count(s.id)                AS selections_count, 
   Sum(CASE 
         WHEN ss.status = 'WARNING' THEN 1 
         ELSE 0 
       END)                AS warning_count 
FROM   products 
   JOIN product_types 
     ON product_types.id = products.product_type_id 
   JOIN delivery_types 
     ON delivery_types.id = products.delivery_type_id 
   LEFT JOIN selections_products sp 
          ON products.id = sp.product_id 
   LEFT JOIN selections s 
          ON s.id = sp.selection_id 
   LEFT JOIN selection_statuses ss 
          ON ss.id = s.selection_status_id 
   LEFT JOIN listings l 
          ON ( s.listing_id = l.id 
               AND l.local_date_time BETWEEN 
                   To_timestamp('2014/12/01', 'YYYY/mm/DD' 
                   ) AND 
                   To_timestamp('2014/12/30', 'YYYY/mm/DD') ) 
GROUP  BY products.id, 
      product_types.type_name, 
      delivery_types.delivery 

Basically, we have a product with selections; these selections have listings, and the listings have a local_date. I need a list of all products and how many listings they have between the two dates. No matter what I do, I get a count of all selections (a total). I feel like I'm overlooking something. The same concept goes for warning_count. Also, I don't really understand why Postgres requires me to add a GROUP BY here.

The schema looks like this (the parts you would care about anyway):

products
  name:string
, product_type:fk
, required_selections:integer
, deliver_type:fk

selections_products
  product_id:fk
, selection_id:fk

selections
  selection_status_id:fk
, listing_id:fk

selection_status
  status:string

listing
 local_date:datetime

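The way you have it, you LEFT JOIN to all selections regardless of listings.local_date_time. The date condition sits in the ON clause of the last LEFT JOIN, so it only decides whether a listing row gets attached; it never removes a selection from the result.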

There is room for interpretation; we would need to see actual table definitions with all constraints and data types to be sure. Going out on a limb, my educated guess is that you can fix your query by using parentheses in the FROM clause to prioritize joins:

SELECT p.id
     , p.name
     , pt.type_name AS product_type
     , dt.delivery
     , p.required_selections
     , count(s.id) AS selections_count
     , sum(CASE WHEN ss.status = 'WARNING' THEN 1 ELSE 0 END) AS warning_count
FROM   products       p
JOIN   product_types  pt ON pt.id = p.product_type_id
JOIN   delivery_types dt ON dt.id = p.delivery_type_id
LEFT   JOIN (  -- LEFT JOIN!
          selections_products sp
   JOIN   selections s  ON s.id  = sp.selection_id  -- INNER JOIN!
   JOIN   listings   l  ON l.id  = s.listing_id     -- INNER JOIN!
                       AND l.local_date_time >= '2014-12-01'
                       AND l.local_date_time <  '2014-12-31'
   LEFT   JOIN selection_statuses ss ON ss.id = s.selection_status_id
   ) ON sp.product_id = p.id
GROUP  BY p.id, pt.type_name, dt.delivery;

This way, you first eliminate all selections outside the given time frame with [INNER] JOIN before you LEFT JOIN to products, thus keeping all products in the result, including those that aren't in any applicable selection.
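
To see the difference between the two join orders in isolation, here is a minimal sketch on made-up data. All rows in the CTEs are hypothetical: product 1 has two selections, only one of which has a listing inside the date range, and product 2 has no selections at all. Status and lookup tables are left out to keep it short.

```sql
-- Hypothetical miniature data, not the asker's real tables.
WITH products(id) AS (VALUES (1), (2))
   , selections_products(product_id, selection_id) AS (VALUES (1, 10), (1, 11))
   , selections(id, listing_id) AS (VALUES (10, 100), (11, 101))
   , listings(id, local_date_time) AS (
        VALUES (100, timestamp '2014-12-15 12:00')   -- inside the range
             , (101, timestamp '2015-02-01 12:00'))  -- outside the range
-- Variant 1: chained LEFT JOINs, date filter only on the last join.
SELECT 'chained LEFT JOINs' AS variant, p.id, count(s.id) AS selections_count
FROM   products p
LEFT   JOIN selections_products sp ON sp.product_id = p.id
LEFT   JOIN selections s ON s.id = sp.selection_id
LEFT   JOIN listings   l ON l.id = s.listing_id
                        AND l.local_date_time >= '2014-12-01'
                        AND l.local_date_time <  '2014-12-31'
GROUP  BY p.id
UNION  ALL
-- Variant 2: INNER JOINs inside parentheses, then one LEFT JOIN to products.
SELECT 'nested INNER JOINs', p.id, count(s.id)
FROM   products p
LEFT   JOIN (
          selections_products sp
   JOIN   selections s ON s.id = sp.selection_id
   JOIN   listings   l ON l.id = s.listing_id
                      AND l.local_date_time >= '2014-12-01'
                      AND l.local_date_time <  '2014-12-31'
   ) ON sp.product_id = p.id
GROUP  BY p.id
ORDER  BY 1, 2;
```

With this data, the chained variant should count both selections for product 1, while the nested variant counts only the one with a listing in range. Product 2 shows up with a count of 0 in both.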

While selecting all or most products, this can be rewritten to be faster:

SELECT p.id
     , p.name
     , pt.type_name AS product_type
     , dt.delivery
     , p.required_selections
     , COALESCE(s.selections_count, 0) AS selections_count
     , COALESCE(s.warning_count, 0)    AS warning_count
FROM   products       p
JOIN   product_types  pt ON pt.id = p.product_type_id
JOIN   delivery_types dt ON dt.id = p.delivery_type_id
LEFT   JOIN (
   SELECT sp.product_id
        , count(*) AS selections_count
        , count(*) FILTER (WHERE ss.status = 'WARNING') AS warning_count
   FROM   selections_products sp
   JOIN   selections          s  ON s.id  = sp.selection_id
   JOIN   listings            l  ON l.id  = s.listing_id
   LEFT   JOIN selection_statuses ss ON ss.id = s.selection_status_id
   WHERE  l.local_date_time >= '2014-12-01'
   AND    l.local_date_time <  '2014-12-31'
   GROUP  BY 1
   ) s ON s.product_id = p.id;

It's cheaper to aggregate and count selections and warnings per product_id first, and then join to products. (Unless you only retrieve a small selection of products; then it's cheaper to reduce related rows first.)
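
The rewritten query also swaps sum(CASE ...) for the aggregate FILTER clause, which needs Postgres 9.4 or later. A small sketch on a made-up list of statuses (the values and the alias t are illustrative only) shows that both forms count the same rows:

```sql
-- Hypothetical status values; NULL stands for a selection without a status row.
SELECT sum(CASE WHEN status = 'WARNING' THEN 1 ELSE 0 END) AS via_case
     , count(*) FILTER (WHERE status = 'WARNING')          AS via_filter
FROM  (VALUES ('WARNING'), ('OK'), (NULL), ('WARNING')) t(status);
```

Both columns should come out as 2 here. One difference to keep in mind: over zero input rows, sum() returns NULL while count() returns 0.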

Also, I don't really understand why Postgres requires me to add a group by here.

Since Postgres 9.1, the PK column in GROUP BY covers all columns of the same table. That does not cover columns of other tables, even if they are functionally dependent. You need to list those explicitly in GROUP BY if you don't want to aggregate them.
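
A minimal sketch of that rule, using two throwaway temp tables (demo_types and demo_products are hypothetical names, reduced to the columns needed for the point):

```sql
CREATE TEMP TABLE demo_types (
   id        int PRIMARY KEY
 , type_name text
);
CREATE TEMP TABLE demo_products (
   id              int PRIMARY KEY
 , name            text
 , product_type_id int REFERENCES demo_types
);

-- Works: p.id is the PK of demo_products, so p.name is covered.
-- t.type_name belongs to another table and has to be listed.
SELECT p.id, p.name, t.type_name, count(*) AS ct
FROM   demo_products p
JOIN   demo_types    t ON t.id = p.product_type_id
GROUP  BY p.id, t.type_name;

-- Raises an error: t.type_name is neither grouped nor aggregated.
-- SELECT p.id, p.name, t.type_name, count(*) AS ct
-- FROM   demo_products p
-- JOIN   demo_types    t ON t.id = p.product_type_id
-- GROUP  BY p.id;
```

Grouping by the other table's primary key (GROUP BY p.id, t.id) should work as well, since t.id then covers t.type_name in the same way.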

My second query avoids this problem at the outset by aggregating before the join.


Aside: chances are, this doesn't do what you want:

l.local_date_time BETWEEN To_timestamp('2014/12/01', 'YYYY/mm/DD')
                      AND To_timestamp('2014/12/30', 'YYYY/mm/DD')

Since local_date_time seems to be of type timestamp (not timestamptz!), you would include '2014-12-30 00:00', but exclude the rest of the day '2014-12-30'. And it's always better to use ISO 8601 format for dates and timestamps, which means the same with every locale and datestyle setting. Hence:

WHERE  l.local_date_time >= '2014-12-01'
AND    l.local_date_time <  '2014-12-31'

This includes all of '2014-12-30', and nothing after it. No idea why you chose to exclude '2014-12-31'. Maybe you really want to include all of Dec. 2014?

WHERE  l.local_date_time >= '2014-12-01'
AND    l.local_date_time <  '2015-01-01'
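
To check how the original BETWEEN bounds and the half-open range treat timestamps near the upper bound, a quick sketch with three made-up sample values (the alias t and the column name ts are illustrative only):

```sql
SELECT ts
     , ts BETWEEN to_timestamp('2014/12/01', 'YYYY/mm/DD')
              AND to_timestamp('2014/12/30', 'YYYY/mm/DD') AS in_between
     , ts >= '2014-12-01' AND ts < '2014-12-31'            AS in_half_open
FROM  (VALUES (timestamp '2014-12-30 00:00')   -- first instant of Dec 30
            , (timestamp '2014-12-30 15:30')   -- later on Dec 30
            , (timestamp '2014-12-31 00:00')   -- first instant of Dec 31
      ) t(ts);
```

BETWEEN should accept only the first row, because its upper bound is midnight at the start of Dec 30. The half-open range accepts the first two rows and rejects the third.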
