
Counting occurrences of an id in a database table

I'm having some trouble with an Oracle database query and its related subqueries. At its core, the problem is to count the number of times an ID from one table occurs in another table.

The Problem: I have two tables. The orders table stores information on items ordered through a web service. Data from that table is run through a process (which I have no control over), and the result is placed into a fulfilled table.

Order numbers are not unique to one item. Each order can have a large number of items, and each item is stored on a line. Items, however, can actually be a combo/package, and that is what the process handles. An item, GAME_PACK for example, can come into the orders table and come out the other end as GAME1, GAME2, and GAME3, which are associated by the order number.

[Figure: simple diagram representing the ordering process]

The problem is that occasionally these items don't come out of the process correctly, and then a line_item may not be associated with a fulfilled item. The only way I can determine whether there is an issue, with the resources available, is by getting the maximum line_number and comparing it to the number of fulfilled_item groups.
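The check described above can be sketched on toy data. This is only an illustration under stated assumptions: SQLite (through Python's `sqlite3` module) stands in for Oracle, the table and column names are taken from the query in this question, and the sample rows are invented.

```python
import sqlite3

# Toy data, invented for illustration; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orderinfo (order_number INT, line_number INT, ordered_item TEXT);
    CREATE TABLE fulfilled_items (order_number INT, item_fulfill_number INT, item_number TEXT);

    -- Order 100: two lines, both fulfilled (GAME_PACK splits into three games).
    INSERT INTO orderinfo VALUES (100, 1, 'GAME_PACK'), (100, 2, 'GAME9');
    INSERT INTO fulfilled_items VALUES
        (100, 1, 'GAME1'), (100, 1, 'GAME2'), (100, 1, 'GAME3'), (100, 2, 'GAME9');

    -- Order 200: two lines, but only the first produced a fulfilled group.
    INSERT INTO orderinfo VALUES (200, 1, 'GAME_PACK'), (200, 2, 'GAME7');
    INSERT INTO fulfilled_items VALUES (200, 1, 'GAME1'), (200, 1, 'GAME2');
""")

# Maximum line number per order.
max_line = dict(conn.execute(
    "SELECT order_number, MAX(line_number) FROM orderinfo GROUP BY order_number"))

# Number of distinct fulfilled groups per order.
groups = dict(conn.execute(
    "SELECT order_number, COUNT(DISTINCT item_fulfill_number) "
    "FROM fulfilled_items GROUP BY order_number"))

# An order is suspect when the two numbers differ; an order with no
# fulfilled rows at all is treated as having 0 groups.
suspect = sorted(o for o, n in max_line.items() if groups.get(o, 0) != n)
print(suspect)
```

Here only order 200 is reported, because its maximum line number is 2 while it has a single fulfilled group.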

What I've tried: At first I thought it would be fairly simple, just using a ROW_NUMBER() or DENSE_RANK() analytic function over a partition by the order number, but it has become much more confusing than that. This is the query I am currently working with:

select *
from (
    select max(item_index) over (partition by tbl.item_number) item_count,
           tbl.*
    from (
        select i.item_fulfill_number,
               i.order_number,
               row_number() over (partition by i.item_number, i.order_number
                                  order by i.order_number) item_index
        from fulfilled_items i
    ) tbl
) results
inner join (
    select *
    from (
        select orderinfo.order_number as order_order_number,
               orderinfo.line_number,
               orderinfo.ordered_item,
               row_number() over (partition by orderinfo.order_number
                                  order by orderinfo.line_number desc) order_row
        from orderinfo
    )
    where order_row <= 1
)
on results.order_number = order_order_number
where results.item_count = results.item_index
  and ordered_item like 'GAME%'

Note that right now I am pulling rows when the counts match; this logic will be reversed once I am certain the query works.

Constraints

  • I do not have access to the process that splits the items
  • The query should run quickly; we are working with upwards of 50,000 possible records
  • In testing, the query took anywhere from 22 seconds to over 2 minutes to execute
  • Pagination is going to be used. If you answer, don't worry about including it, but it is something to consider because it can greatly help or hurt the speed of the query
  • I cannot touch the table structure

Table Structure and Graphic Representation

[Figure: relationship between the orders table and the fulfilled items table after the process runs]

(Maximum line number represents the number of fulfilled_item groups)

Thank you for taking the time to read this.

EDIT: Results should look something like this:

[Figure: sample output of the query]

where item comes from the orders table, and result is OK or BAD based on whether or not the number of fulfilled groups matches the max line number.
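One way such an OK/BAD column could be produced is with a CASE expression over two per-order aggregates. This is a sketch, not a tested Oracle query: SQLite (via Python's `sqlite3`) stands in for Oracle, the sample rows are invented, and since the sample output image is not available here, the `(order_number, item, result)` shape is a guess at the intended columns.

```python
import sqlite3

# Toy data, invented for illustration; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orderinfo (order_number INT, line_number INT, ordered_item TEXT);
    CREATE TABLE fulfilled_items (order_number INT, item_fulfill_number INT);
    INSERT INTO orderinfo VALUES
        (100, 1, 'GAME_PACK'), (200, 1, 'GAME_PACK'), (200, 2, 'GAME7');
    INSERT INTO fulfilled_items VALUES (100, 1), (100, 1), (100, 1), (200, 1);
""")

rows = conn.execute("""
    SELECT o.order_number,
           o.ordered_item AS item,
           -- An order with no fulfilled rows counts as 0 groups.
           CASE WHEN COALESCE(f.fulfilled_groups, 0) = m.max_line
                THEN 'OK' ELSE 'BAD' END AS result
    FROM orderinfo o
    JOIN (SELECT order_number, MAX(line_number) AS max_line
          FROM orderinfo GROUP BY order_number) m
      ON m.order_number = o.order_number
    LEFT JOIN (SELECT order_number,
                      COUNT(DISTINCT item_fulfill_number) AS fulfilled_groups
               FROM fulfilled_items GROUP BY order_number) f
      ON f.order_number = o.order_number
    ORDER BY o.order_number, o.line_number
""").fetchall()

for row in rows:
    print(row)
```

With this data, the single line of order 100 is labeled OK, and both lines of order 200 are labeled BAD because the order as a whole has fewer fulfilled groups than lines.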

If I'm understanding correctly, each order should have the same number of fulfillment groups as there are line numbers in the order. Each fulfillment group will be of unknown size and be represented by a unique fulfillment number. Based on that, I think the query should be as simple as this:

SELECT 
  main.*, 
  'BAD' AS result
FROM (
    SELECT DISTINCT
      o.order_number,
      COUNT(o.line_number) OVER (PARTITION BY o.order_number) AS order_lines,
      (SELECT COUNT(DISTINCT item_fulfill_number) FROM fulfilled_items f WHERE f.order_number = o.order_number) AS fulfilled_groups
    FROM orders o
) main
WHERE order_lines != fulfilled_groups

The subquery counts the number of lines (just in case a line number gets skipped; you could change this back to a max on the line number if you really want to) and the number of distinct fulfillment groups. The overall query returns those orders where the two counts are not equal.
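To see what the query returns, it can be run against a few invented rows. The sketch below assumes SQLite 3.25 or later (for window function support) through Python's `sqlite3` module as a stand-in for Oracle, and uses the table name `orders` as written in the query above. The sample data is hypothetical.

```python
import sqlite3

# Toy data, invented for illustration; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_number INT, line_number INT, ordered_item TEXT);
    CREATE TABLE fulfilled_items (order_number INT, item_fulfill_number INT);
    -- Order 1: one line, one group. Order 2: two lines, one group.
    -- Order 3: one line, nothing fulfilled.
    INSERT INTO orders VALUES
        (1, 1, 'GAME_PACK'), (2, 1, 'GAME_PACK'), (2, 2, 'GAME7'), (3, 1, 'GAME5');
    INSERT INTO fulfilled_items VALUES (1, 1), (1, 1), (2, 1), (2, 1), (2, 1);
""")

bad = conn.execute("""
    SELECT main.*, 'BAD' AS result
    FROM (
        SELECT DISTINCT
          o.order_number,
          COUNT(o.line_number) OVER (PARTITION BY o.order_number) AS order_lines,
          (SELECT COUNT(DISTINCT item_fulfill_number)
           FROM fulfilled_items f
           WHERE f.order_number = o.order_number) AS fulfilled_groups
        FROM orders o
    ) main
    WHERE order_lines != fulfilled_groups
    ORDER BY order_number  -- added only to make the output order deterministic
""").fetchall()

for row in bad:
    print(row)
```

Orders 2 and 3 are returned. Order 3 shows that the correlated subquery yields 0 for an order with no fulfilled rows, so such orders are flagged as well.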
