简体   繁体   English

SQL 查找表中第一个出现的数据集

[英]SQL to find the first occurrence of sets of data in a table

Say if I have a table:假设我有一张桌子:

CREATE TABLE T
(
    TableDTM  TIMESTAMP  NOT NULL,
    Code      INT        NOT NULL
);

And I insert some rows:我插入一些行:

INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:00:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:10:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:20:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:30:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:40:00', 0);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:50:00', 1);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:00:00', 1);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:10:00', 1);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:20:00', 0);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:30:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:40:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:50:00', 3);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 12:00:00', 3);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 12:10:00', 3);

So I end up with a table similar to:所以我最终得到了一个类似于以下的表:

2011-01-13 10:00:00, 5
2011-01-13 10:10:00, 5
2011-01-13 10:20:00, 5
2011-01-13 10:30:00, 5
2011-01-13 10:40:00, 0
2011-01-13 10:50:00, 1
2011-01-13 11:00:00, 1
2011-01-13 11:10:00, 1
2011-01-13 11:20:00, 0
2011-01-13 11:30:00, 5
2011-01-13 11:40:00, 5
2011-01-13 11:50:00, 3
2011-01-13 12:00:00, 3
2011-01-13 12:10:00, 3

How can I select the first date of each set of identical numbers, so I end up with this:我怎样才能 select 每组相同数字的第一个日期,所以我最终得到这个:

2011-01-13 10:00:00, 5
2011-01-13 10:40:00, 0
2011-01-13 10:50:00, 1
2011-01-13 11:20:00, 0
2011-01-13 11:30:00, 5
2011-01-13 11:50:00, 3

I've been messing about with sub queries and the like for most of the day and for some reason I can't seem to crack it.在一天的大部分时间里,我一直在处理子查询之类的问题,但出于某种原因,我似乎无法破解它。 I'm sure there's a simple way somewhere!我敢肯定在某个地方有一个简单的方法!

I would probably want to exclude the 0's from the results, but that's not important for now..我可能想从结果中排除 0,但现在这并不重要。

Revised 15 Jan 11 1月15日修订

I'm sure there's a simple way somewhere! 我确定在某个地方有一个简单的方法!

Yes, there is. 就在这里。 But first, two Issues. 但首先是两个问题。

  1. The table is not a Relational Database table. 该表不是关系数据库表。 It does not have an unique key, which is demanded by the RM and Normalisation (specifically that each row must have an unique identifier; not necessarily a PK). 它没有唯一的密钥,这是RM和规范化所要求的(具体地说,每一行必须具有唯一的标识符;不一定是PK)。 Therefore SQL, a standard language, for operating on Relational Database tables, cannot perform basic operations on it. 因此,SQL(一种用于在关系数据库表上操作的标准语言)无法对其执行基本操作。

    • it is a Heap (data structure, inserted and deleted in chronological order), with records not rows. 它是一个堆(数据结构,按时间顺序插入和删除),记录不是行。
    • any and all operations using SQL will be horribly slow, and will not be correct 使用SQL的任何和所有操作都会非常慢,并且不正确
    • SET ROWCOUNT to 1, perform row processing, and SQL will work on the Heap just fine 将ROWCOUNT设置为1,执行行处理,SQL就可以在Heap上运行了
    • your best bet is use any unix utiliy to operate on it (awk, cut, chop). 你最好的选择是使用任何unix utiliy来操作它(awk,cut,chop)。 They are blindingly fast. 他们的速度非常快。 The awk script required to answer your requirement would take 3 mins to write and it will run in seconds for millions of records (I wrote a few last week). 回答你的要求所需的awk脚本需要3分钟才能写入,并且它会在几秒钟内运行数百万条记录(我上周写了一些)。
      .

    So the question really is SQL to find the first occurrence of sets of data in a non-relational Heap . 所以问题实际上是SQL在非关系堆中找到第一组数据集

    Now if your question was SQL to find the first occurrence of sets of data in a Relational table , implying of course some unique row identifier, that would be (a) easy in SQL, and (b) fast in any flavour of SQL ... 现在,如果你的问题是SQL来查找Relational表中第一次出现的数据集,当然暗示一些唯一的行标识符,这将是(a)在SQL中很容易,以及(b)快速的任何SQL的风格。 。

    • except Oracle, which is known to handle subqueries badly (specifically Tony Andrews' comments, he is a well-known authority on Oracle). 除了Oracle之外,已知它会严重处理子查询 (特别是Tony Andrews的评论,他是Oracle的知名权威)。 In which case, use Materialised Views. 在这种情况下,请使用物化视图。
      .
  2. The question is very generic (no complaint). 这个问题非常通用(没有投诉)。 But many of these specific needs are usually applied within a larger context, and the context has requirements which are absent from the specification here. 但是,这些特定需求中的许多通常在更大的上下文中应用,并且上下文具有本说明书中不存在的要求。 Generally the need is for a simple Subquery (but in Oracle use a Materialised View to avoid the subquery). 通常需要一个简单的子查询(但在Oracle中使用物化视图来避免子查询)。 And the subquery, too, depends on the outer context, the outer query. 子查询也取决于外部上下文,外部查询。 Therefore the answer to the small generic question will not contain the answer to the actual specific need. 因此,小通用问题的答案将不包含实际特定需求的答案。


Anyway, I do not wish to avoid the question. 无论如何,我不想回避这个问题。 Why don't we use a real world example, rather than a simple generic one; 为什么我们不使用现实世界的例子,而不是简单的通用例子; and find the first or last occurrence, or minimum or maximum value, of a set of data, within another set of data, in a Relational table ? 在Relational表中查找另一组数据中的一组数据的第一个或最后一个或最小值或最大值

Main Query 主要查询

Let's use the ▶Data Model◀ from your previous question. 让我们使用上一个问题中的▶数据模型◀

Report all Alerts since a certain date, with the peak Value for the duration, that are not Acknowledged 报告自特定日期以来的所有Alerts ,其持续时间的峰值为未Acknowledged

Since you will be using exactly the same technique (with different table and column names) for all your temporal and History requirements, you need to fully understand the basic construct of a Subquery, and its different applications. 由于您将使用完全相同的技术(具有不同的表和列名称)来满足您的所有时间和历史要求,因此您需要完全理解子查询的基本构造及其不同的应用程序。

Introduction 介绍

Note that you have, not only a pure 5NF Database, with Relational Identifiers (composite keys), you have full Temporal capability throughout, and the temporal requirement is rendered without breaking 5NF (No Update Anomalies), which means the ValidToDateTime for periods and durations is derived, and not duplicated in data. 请注意,您不仅拥有纯5NF数据库和关系标识符(复合键),而且您具有完整的Temporal功能,并且在不破坏5NF(无更新异常)的情况下呈现时间要求,这意味着ValidToDateTime和持续时间的ValidToDateTime是派生的,而不是在数据中重复。 Point is, that complicates things, hence this is not the best example for a tutorial on Subqueries . 点是,这使事情变得复杂,因此这 不是子查询教程的最佳示例

  • Remember the SQL engine is a set-processor, so we approach the problem with a set-oriented mindset 请记住,SQL引擎是一个集处理器,因此我们采用面向集合的思维方式解决问题
    • do not dumb the engine down to row-processing; 不要把发动机笨到行处理; that is very slow 很慢
    • and more important, unnecessary 更重要的是, 不必要的
  • Subqueries are normal SQL. 子查询是普通的SQL。 The syntax I am using is straight ISO/IEC/ANSI SQL. 我使用的语法是直接的ISO / IEC / ANSI SQL。
    • if you cannot code subqueries in SQL, you will be very limited; 如果你不能在SQL中编写子查询,那么你将非常有限; and then need to introduce data duplication or use large result sets as Materialised Views or temporary tables or all manner of additional data and additional processing, which will be slow to very slow , not to mention completely unnecessary 然后需要引入数据复制或使用较大的结果集物化视图或临时表或其他数据和附加处理的所有方式,这将是缓慢速度很慢 ,更不用说完全不必要
    • if there is anything you cannot do in a truly Relational Database (and my Data Models always are) without switching to row-processing or inline views or temp tables, ask for help, which is what you have done here. 如果在真正的关系数据库(以及我的数据模型总是如此)中无法做任何事情而没有切换到行处理或内联视图或临时表,请求帮助,这就是你在这里所做的。
  • You need to fully understand the first Subquery (simpler) before attempting to understand the second; 在尝试理解第二个子查询之前,您需要完全理解第一个子查询(更简单); etc. 等等

Method 方法

First build the Outer query using minimum joins, etc, based on the structure of the result set that you need, and nothing more. 首先根据您需要的结果集的结构 ,使用最小连接等构建外部查询,仅此而已。 It is very important that the structure of the outer query is resolved first; 首先解析外部查询的结构是非常重要的; otherwise you will go back and forth trying to make the subquery fit the outer query, and vice versa. 否则你会来回试图使子查询适合外部查询,反之亦然。

  • That happens to require a Subquery as well. 这恰好需要子查询。 So leave that part out for now, and pick that up later. 所以现在就把这个部分留下来,然后再把它拿出来。 For now, the Outer query gets all (not un-acknowledged) Alerts after a certain date 现在,外部查询在特定日期之后获得所有(不是未确认的) Alerts

The ▶SQL code◀ required is on page 1 (sorry, the SO edit features are horrible, it destroys the formatting, and the code is already formatted). 需要▶SQL代码◀是第1页(抱歉,SO编辑功能很糟糕,它会破坏格式化,代码已经格式化)。

Then build the Subquery to fill each cell. 然后构建子查询以填充每个单元格。

Subquery (1) Derive Alert.Value 子查询(1)导出 Alert.Value

That is a simple derived datum, select the Value from the Reading that generated the Alert . 这是一个简单的派生数据,从生成AlertReading中选择Value The tables are related, the cardinality is 1::1, so it is a straight join on the PK. 这些表是相关的,基数是1 :: 1,所以它是PK上的直接连接。

  • The type of Subquery required here is a Correlated Subquery , we need to correlate a table in the Outer query to a table in the (inner) Subquery. 这里需要的子查询类型是相关子查询 ,我们需要将外部查询中的表与(内部)子查询中的表相关联。
    • in order to do that, we need an Alias for the table in the Outer query, to correlate it to a table in the Subquery. 为了做到这一点,我们需要外部查询中的表的别名,以将其与子查询中的表相关联。
    • to make the distinction, I have used aliases only for such required correlation, and fully qualified names for plain joins 为了区分,我只使用别名来进行这种必需的关联,并使用普通连接的完全限定名
  • Subqueries are very fast in any engine (except Oracle) 任何引擎中的子查询都非常快(Oracle除外)
  • SQL is a cumbersome language. SQL是一种繁琐的语言。 But that's all we have. 但这就是我们所拥有的一切。 So get used to it. 所以习惯它。

The ▶SQL code◀ required is on page 2. 需要▶SQL代码◀是第2页。

I have purposely given you a mix of joins in the Outer Query vs obtaining data via Subquery, so that you can learn (you could alternately obtain Alert.Value via a join, but that would be even more cumbersome ). 我有意在外部查询中添加了多个连接,并通过子查询获取数据,这样您就可以学习(您可以通过连接交替获取Alert.Value ,但这会更加麻烦 )。

The next Subquery we need derives Alert.PeakValue . 我们需要的下一个子查询派生Alert.PeakValue For that we need to determine the Temporal Duration of the Alert . 为此,我们需要确定Alert的时间持续时间。 We have the beginning of the Alert Duration; 我们有Alert持续时间的开始; we need to determine the end of the Duration, which is the next (temporally) Reading.Value that is within range . 我们需要确定持续时间的结束,这是范围内下一个 (暂时) Reading.Value That requires a Subquery as well, which we better handle first. 这也需要一个Subquery,我们最好先处理它。

  • Work the logic from the inside, outward. 从内到外工作逻辑。 Good old BODMAS. 好老BODMAS。

Subquery (2) Derive Alert.EndDtm 子查询(2)导出 Alert.EndDtm

A slightly more complex Suquery to select the first Reading.ReadingDtm , that is greater than or equal to the Alert.ReadingDtm , that has a Reading.Value which is less than or equal to its Sensor.UpperLimit . 稍微复杂Suquery以选择第一Reading.ReadingDtm ,即大于或等于Alert.ReadingDtm ,具有Reading.Value小于或等于其Sensor.UpperLimit

Handling 5NF Temporal Data 处理5NF时态数据

For handling temporal requirements in a 5NF Database (in which EndDateTime is not stored, as is duplicate data), we work on a StartDateTime only, and the EndDateTime is derived : it is the next StartDateTime . 为了处理5NF数据库中的时间要求(其中存储EndDateTime ,如同重复数据),我们仅处理StartDateTime ,并导出 EndDateTime :它是下一个 StartDateTime This is the Temporal notion of Duration . 这是持续时间的时间概念。

  • Technically, it is one millisec (whatever the resolution for the Datatype used) less. 从技术上讲,它是一毫秒(无论所使用的数据类型的分辨率是多少)。
  • However, in order to be reasonable, we can speak of, and report, EndDateTime as simply the Next.StartDateTime , and ignore the one millisecond issue. 但是,为了合理,我们可以说EndDateTime并将其报告为Next.StartDateTime ,并忽略一毫秒的问题。
  • The code should always use >= This.StartDateTime and < Next.StartDateTime . 代码应始终使用> = This.StartDateTime< Next.StartDateTime
    • That eliminates a slew of avoidable bugs 这消除了大量可避免的错误
    • Note that these comparison operators, which bracket the Temporal Duration, and should be used in a conventional manner throughout as per above, are quite independent of similar comparison operators related to business logic, eg. 注意,这些比较运算符包括时间持续时间,并且应该按照上面的传统方式使用,它们完全独立于与业务逻辑相关的类似比较运算符,例如。 Sensor.UpperLimit (ie. watch for it, because both are often located in one WHERE clause, and it is easy to mix them up or get confused). Sensor.UpperLimit (即监视它,因为它们通常都位于一个WHERE子句中,很容易将它们混淆或混淆)。

The ▶SQL code◀ required, along with test data used, is on page 3. 所需的▶SQL代码◀以及使用的测试数据在第3页。

Subquery (3) Derive Alert.PeakValue 子查询(3)导出 Alert.PeakValue

Now it is easy. 现在很容易。 Select the MAX(Value) from Readings between Alert.ReadingDtm and Alert.EndDtm , the duration of the Alert . Alert.ReadingDtmAlert.EndDtm之间的Readings选择MAX(Value) ,即Alert的持续时间。

The ▶SQL code◀ required is on page 4. 需要▶SQL代码◀是第4页。

Scalar Subquery 标量子查询

In addition to being Correlated Subqueries, the above are all Scalar Subqueries , as they return a single value; 除了是相关子查询之外,以上都是标量子查询 ,因为它们返回单个值; each cell in the grid can be filled with only one value. 网格中的每个单元格只能填充一个值。 (Non-Scalar Subqueries, that return multiple values, are quite legal, but not for the above.) (返回多个值的非标量子查询非常合法,但不适用于上述情况。)

Subquery (4) Acknowledged Alerts 子查询(4)已确认的警报

Ok, now that you have a handle on the above Correlated Scalar Subqueries, those that fill cells in a set, a set that is defined by the Outer query, let's look at a Subquery that can be used to constrain the Outer query. 好了,现在你已经掌握了上面的相关标量子查询,那些填充集合中单元格的子集,一个由外部查询定义的集合,让我们看一下可以用来约束外部查询的子查询。 We do not really want all Alerts (above), we want Un-Acknowledged Alerts : the Identifiers that exist in Alert , that do not exist in Acknowledgement . 我们并不真正想要所有 Alerts (上图),我们需要Un-Acknowledged AlertsAlert中存在的标识符,在Acknowledgement不存在。 That is not filling cells, that is changing the content of the Outer set. 那不是填充单元格,即改变外部集合的内容 Of course, that means changing the WHERE clause. 当然,这意味着更改WHERE子句。

  • We are not changing the structure of the Outer set, so there is no change to the FROM and existing WHERE clauses. 我们不会更改外部集的结构 ,因此FROM现有 WHERE子句没有变化。

Simply add a WHERE condition to exclude the set of Acknowledged Alerts . 只需添加WHERE条件即可排除已Acknowledged Alerts 1::1 cardinality, straight Correlated join. 1 :: 1基数,直相关联接。

The ▶SQL code◀ required is on page 5. 需要▶SQL代码◀是第5页。

The difference is, this is a non-Scalar Subquery , producing a set of rows (one column). 不同的是,这是一个非标量子查询 ,产生一组行(一列)。 We have an entire set of Alerts (the Outer set) matched against an entire set of Acknowledgements . 我们有一整套Alerts (外部集)与一整套Acknowledgements相匹配。

  • The matching is processed because we have told the engine that the Subquery is Correlated , by using an alias (no need for cumbersome joins to be identified) 处理匹配是因为我们通过使用别名告知引擎子查询是相关的(不需要识别繁琐的连接)
  • Use 1 , because we are performing an existence check. 使用1 ,因为我们正在执行存在检查。 Visualise it as a column added onto the Alert set defined by the Outer query. 将其可视化为添加到外部查询定义的Alert集上的列。
  • Never use * because we do not need the entire set of columns, and that will be slower 永远不要使用*,因为我们不需要整个列集,而且速度会慢一些
  • Likewise, failing to use a correlation, means a WHERE NOT IN () is required, but again, that constructs the defined column set, then compares the two sets. 同样,未使用关联,意味着需要WHERE NOT IN () ,但同样,构造定义的列集,然后比较两个集。 Much slower. 慢得多。

Subquery (5) Actioned Alerts 子查询(5) Actioned Alerts

As an alternative constraint on the Outer query, for un-actioned Alerts , instead of (4), exclude the set of Actioned Alerts . 作为外部查询的替代约束,对于未执行的Alerts ,而不是(4),排除一组Actioned Alerts Straight Correlated join. 直接相关联接。

The ▶SQL code◀ required is on page 5. 需要▶SQL代码◀是第5页。

This code has been tested on Sybase ASE 15.0.3 using 1000 Alerts and 200 Acknowledgements , of different combinations; 此代码已在Sybase ASE 15.0.3上使用1000个Alerts和200个已Acknowledgements的不同组合进行了测试; and the Readings and Alerts identified in the document. 以及文件中确定的ReadingsAlerts Zero milliseconds execution time (0.003 second resolution) for all executions. 所有执行的零毫秒执行时间(0.003秒分辨率)。

If you need it, here is the ▶SQL Code in Text Format◀ . 如果需要,可以使用文本格式▶SQL代码◀

Response to Comments 对评论的回应

(6) ▶Register Alert from Reading◀ (6) ▶从阅读◀注册提醒
This code executes in a loop (provided), selecting new Readings which are out-of-range, and creating Alerts , except where applicable Alerts already exist. 此代码在循环(提供)中执行,选择超出范围的新Readings ,并创建Alerts ,除非适用的Alerts已存在。

(7) ▶Load Alert From Reading◀ (7) ▶从阅读◀加载警报
Given that you have a full set of test data for Reading , this code uses a modified form of (6) to load the applicable Alerts . 鉴于您有一整套用于Reading的测试数据,此代码使用修改后的(6)形式加载适用的Alerts

Common Problem 常见问题

It is "simple" when you know how. 当你知道如何时,它是“简单的”。 I repeat, writing SQL without the ability to write Subqueries is very limiting; 我再说一遍,编写没有编写子查询能力的SQL是非常有限的; it is essential for handling Relational Databases, which is what SQL was designed for. 它对于处理关系数据库至关重要,这是SQL的设计目标。

  • Half the reason developers implement unnormalised data heaps (massive data duplication) is because they cannot write the subqueries required for Normalised structures 开发人员实现非标准化数据堆(海量数据重复)的一半原因是因为它们无法编写规范化结构所需的子查询
    • it is not that they have "denormalised for performance"; 并不是说他们“为表现而非正常化”; it is that they cannot code for Normalised. 它们无法编码为Normalized。 I have seen it a hundred times. 我已经看过它一百次了。
    • Case in point here: you have a fully Normalised Relational Database, and the difficulty is coding for it, and you were contemplating duplicating tables for processing purposes. 举个例子:你有一个完全标准化的关系数据库,困难就是为它编码,你正在考虑复制表以便进行处理。
  • And that is not counting the added complexity of a temporal database; 这并不包括时间数据库增加的复杂性; or a 5NF temporal database. 或5NF时态数据库。
  • Normalisation means Never Duplicate Anything , more recently known as Don't Repeat Yourself 规范化意味着永远不要复制任何东西 ,最近被称为“ 不要重复自己”
  • Master Suqueries and you will be in the 98th percentile: Normalised, true Relational Databases; Master Suqueries,你将处于第98个百分点:规范化,真实的关系数据库; zero data duplication; 零数据重复; very high performance. 非常高的性能。

I think you can figure out the remaining queries you have. 我想你可以找出你剩下的查询。

Relational Identifier 关系标识符

Note, this example also happens to demonstrate the power of using Relational Identifiers , in that several tables in-between the ones we want do not have to be joined (yes! the truth is Relational Identifiers means less, not more, joins, than Id keys). 注意,这个例子也恰好证明了使用关系标识符的能力 ,因为我们想要的几个表之间不必连接(是的!事实是关系标识符意味着更少,而不是更多,连接,而不是Id键)。 Simply follow the solid lines. 只需按照实线。

  • Your temporal requirement demands keys containing DateTime . 您的时间要求需要包含DateTime键。 Imagine trying to code the above with Id PKs, there would be two levels of processing: one for the joins (and there would be far more of them), and another for the data processing. 想象一下尝试用Id PKs编写上面的代码,会有两个级别的处理:一个用于连接(并且会有更多的连接),另一个用于数据处理。

Label 标签

I try to stay away from colloquial labels ("nested", "inner", etc) because they are not specific, and stick to specific technical terms. 我试图远离口语标签(“嵌套”,“内部”等)因为它们不具体,并坚持特定的技术术语。 For completeness and understanding: 为了完整和理解:

  • a Subquery after the FROM clause, is a Materialised View , a result set derived in one query and then fed into the FROM clause of another query, as a "table". FROM子句之后的子查询,是一个物化视图 ,一个查询中派生的结果集,然后作为“表”输入另一个查询的FROM子句。
    • The Oracle types call this Inline View. Oracle类型调用此内联视图。
    • In most cases, you can write Correlated Subqueries as Materialised Views, but that is massively more I/O and processing (since Oracles handling of subqueries is abyssmal, for Oracle only, Materialised Views are "faster"). 在大多数情况下,您可以将相关子查询编写为物化视图,但这大大增加了I / O和处理(因为Oracles处理子查询非常简单,仅对Oracle而言,物化视图“更快”)。
      .
  • A Subquery in the WHERE clause is a Predicate Subquery , because it changes the content of the result set (that which it is predicated upon). WHERE子句中的子查询谓词子查询 ,因为它更改了结果集的内容(它所基于的内容)。 It can return either a Scalar (one value) or non-Scalar (many values). 它可以返回标量(一个值)或非标量(多个值)。

    • for Scalars, use WHERE column = , or any scalar operator 对于Scalars,请使用WHERE column =或任何标量运算符

    • for non-Scalars, use WHERE [NOT] EXISTS , or WHERE column [NOT] IN 对于非Scalars,使用WHERE [NOT] EXISTSWHERE column [NOT] IN

  • A Suquery in the WHERE clause does not need to be Correlated; WHERE子句中的Suquery 不需要相关; the following works just fine. 以下工作正常。 Identify all superfluous appendages: 识别所有多余的附属物:

     SELECT [Never] = FirstName, [Acted] = LastName FROM User WHERE UserId NOT IN ( SELECT DISTINCT UserId FROM Action ) 

PostgreSQL supports window functions, have a look at this PostgreSQL支持窗口函数,看看这个

[EDIT] Try the following: [编辑]尝试以下方法:

SELECT TableDTM, Code FROM
(
    SELECT TableDTM,
           Code,
           LAG(Code, 1, NULL) OVER (ORDER BY TableDTM) AS PrevCode
    FROM   T
)
WHERE PrevCode<>Code OR PrevCode IS NULL;

Try this: 试试这个:

SELECT MIN(TableDTM) TableDTM, Code
FROM
(
    SELECT T1.TableDTM, T1.Code, MIN(T2.TableDTM) XTableDTM
    FROM T T1
    LEFT JOIN T T2
    ON T1.TableDTM <= T2.TableDTM
    AND T1.Code <> T2.Code
    GROUP BY T1.TableDTM, T1.Code
) X
GROUP BY XTableDTM, Code
ORDER BY 1;

could you try something like 你可以尝试一下吗?

"SELECT DISTINCT Code, (SELECT MIN(TableDTM) FROM T AS Q WHERE Q.Code = T.Code) As TableDTM FROM T;"

and if you need to exclude the 0, change it in: 如果您需要排除0,请将其更改为:

 SELECT DISTINCT Code, (SELECT MIN(TableDTM) FROM T AS Q WHERE Q.Code = T.Code) As TableDTM FROM T WHERE Code <> 0;

Maybe I don't understand the question.也许我不明白这个问题。 But I don't see any mention of Common Table Expression or Analytic Functions.但我没有看到任何关于公用表表达式或分析函数的提及。 These are my weapons of choice for most problems, and when they can't handle it I start resorting to temporary tables.这些是我解决大多数问题的首选武器,当它们无法处理时,我开始求助于临时表。

I think, I recently solve a similar problem where I want to get the data of the first occurrence of an error when processing a daily interface file.我想,我最近解决了一个类似的问题,在处理一个日常的接口文件时,想获取第一次出错的数据。 Records on the interface that cause a problem are removed to a set of holding table so the rest of the records can be processed.接口上出现问题的记录被移除到一组保持表中,以便处理rest条记录。

-- EE with errors removed from most recent batch
with current_batch as (
      select employee_number, PVL.ADDITIONAL_INFORMATION
      from PERSONNEL_VALIDATION_LOG_X PVL
      where PVL.PERSONNEL_BATCH_ID = EMPSRV.CURRENTPERSONNELBATCH(6,900)
)
, hist as (
  select 
    row_number() over (
      partition by X.EMPLOYEE_NUMBER, X.ADDITIONAL_INFORMATION
      order by B.BATCH_STATUS_DATE
    ) as RN,
    B.PERSONNEL_BATCH_ID BatchId,
    B.SUBMITTAL_DATE,
    X.EMPLOYEE_NUMBER EMPNUM,
    MX.LAST_NAME,
    MX.FIRST_NAME,
    X.ADDITIONAL_INFORMATION
  from PERSONNEL_VALIDATION_LOG_X X
  join current_batch C on
    X.Employee_number = C.EMPLOYEE_NUMBER
    and X.additional_information = C.ADDITIONAL_INFORMATION
  join empsrv.personnel_batch B 
    on B.PERSONNEL_BATCH_ID = X.PERSONNEL_BATCH_ID
  join EMPSRV.PERSONNEL_MEMBER_DATA_X MX
    on X.PERSONNEL_BATCH_ID = MX.PERSONNEL_BATCH_ID
      and X.EMPLOYEE_NUMBER = MX.EMPLOYEE_NUMBER
)
select 
  batchId, 
  to_char(submittal_date, 'mm/dd/yyyy') First_Reported,
  EmpNum, 
  Last_name, 
  first_name, 
  additional_information
from hist where rn = 1
order by submittal_date desc;

The first CTE just limits the population to current errors.第一个 CTE 只是将总体限制为当前错误。 The hist CTE goes through the logs and picks up the first occurrence of that error (ie. ame EE and messge) This isn't perfect because maybe the error went away and came back, I would get the oldest occurrence and not the start of the most recent sequence. hist CTE 遍历日志并找出该错误的第一次出现(即 ame EE 和 messge)这并不完美,因为错误可能消失并返回,我会得到最旧的出现而不是开始最近的序列。 But this is good enough and not likely due to the shape of the error message itself.但这已经足够好了,而且不太可能是由于错误消息本身的形状。 The finally query just picks off the top row of each group which will be the first occurrence. finally 查询只是选择每个组的第一行,这将是第一次出现。

The query takes a few seconds to run, but my logs are not especially large, so performance is almost never an issue for me ever.查询需要几秒钟才能运行,但我的日志不是特别大,所以性能对我来说几乎从来都不是问题。 I also don't pay much attent to the dates on the questions.我也不太注意问题的日期。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM