
MySQL, return all measurements and results within the last X hours

This question is closely related to my previous question, MySQL, return all results within X last hours, although with an additional significant constraint:

Now I have 2 tables, one for measurements and one for classified results for part of the measurements.

Measurements arrive constantly, and so do results, which are added after each new measurement is classified.

Results will not necessarily be stored in the same order in which the measurements arrive and are stored!

I am only interested in presenting the latest results. By latest I mean: take the max time (the time is part of the measurement structure) of the last available result, call it Y, and a range of X seconds, and present the measurements together with the available results in the range between Y-X and Y.

The following is the structure of the 2 tables:

event table:

CREATE TABLE `event_data` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `Feature` char(256) NOT NULL,
  `UnixTimeStamp` int(10) unsigned NOT NULL,
  `Value` double NOT NULL,

  KEY `ix_filter` (`Feature`),
  KEY `ix_time` (`UnixTimeStamp`),
  KEY `id_index` (`id`)
) ENGINE=MyISAM

classified results table:

CREATE TABLE `event_results` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `level` enum('NORMAL','SUSPICIOUS') DEFAULT NULL,
  `score` double DEFAULT NULL,
  `eventId` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `eventId_index` (`eventId`)
) ENGINE=MyISAM

I can't query for the last measurement's timestamp first, since I want to present measurements for which there are currently results, and since measurements arrive constantly, results may not be available yet.

Therefore I thought of joining the two tables using event_results.eventId=event_data.id and then selecting the max of event_data.UnixTimeStamp as maxTime. Once I have maxTime, I need to do the same operation again (joining the 2 tables), adding a condition in the WHERE clause:

WHERE event_data.UnixTimeStamp >= maxTime + INTERVAL -X SECOND

It does not seem efficient to execute 2 joins just to achieve what I am asking. Do you have more ef

From my understanding, you are using an aggregate function, MAX. This produces a record set of size one, which is the highest time, the one you will filter from. Therefore, it needs to be broken out into a subquery (as you say, a nested select). You HAVE to do 2 queries at some point. (The answer to your last question has 2 queries in it, by way of subqueries/nested selects.)

The main time subqueries cause problems is when you perform the subquery in the SELECT part of the query, as it performs the subquery once for every row, which makes the query slower and slower as the result set grows. Let's take the answer to your last question and write it in a horrible, inefficient way:

SELECT timeStart,
       (SELECT max(timeStart) FROM events) AS maxTime
FROM events
-- HAVING rather than WHERE: MySQL does not allow the maxTime alias in a WHERE clause
HAVING  timeStart > (maxTime + INTERVAL -1 SECOND)

This will run a select query for the max timeStart once for every row in events. It should produce the same result, but it is slow. This is where the fear of subqueries comes from.

It also performs the aggregate function MAX on each row, which returns the same answer each time. So, you should perform that subquery ONCE rather than on each row.

However, in the answer to your last question, the MAX subquery is run once and used to filter in the WHERE clause of the outer select, which is also run once. So, in total, 2 queries are run.

2 super fast queries run one after the other are faster than 1 super slow query.
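As a minimal sketch of the fast form (assuming the same hypothetical events table with a DATETIME column timeStart used in the slow example), the MAX subquery sits in the WHERE clause and is not correlated with the outer row, so it only needs to be evaluated once:

```sql
-- Assumption: `events` and `timeStart` are the illustrative names from the
-- example above, with timeStart stored as DATETIME or TIMESTAMP.
SELECT e.timeStart
  FROM events e
 WHERE e.timeStart > ( SELECT MAX(timeStart) + INTERVAL -1 SECOND
                         FROM events
                     );
```

The inner select yields one value (the start of the window), and the outer select filters on it.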

I'm not entirely sure what resultset you want returned, so I am going to make some assumptions. Please feel free to correct any assumptions I've made.

It sounds (to me) like you want ALL rows from event_data that are within an hour (or however many seconds) of the absolute "latest" timestamp, and along with those rows, you also want to return any related rows from event_results, if any matching rows are available.

If that's the case, then using an inline view to retrieve the maximum value of timestamp is the way to go. (That operation will be very efficient, since the query will be returning a single row, and it can be efficiently retrieved from an existing index.)

Since you want all rows from a specified period of time (from the "latest time" back to "latest time minus X seconds"), we can go ahead and calculate the starting timestamp of the period in that same query. Here we assume you want to "go back" one hour (=60*60 seconds):

SELECT MAX(UnixTimeStamp) - 3600 FROM event_data

NOTE: the expression in the SELECT list above is based on the UnixTimeStamp column being defined as an integer type, rather than as a DATETIME or TIMESTAMP datatype. If the column were defined as a DATETIME or TIMESTAMP datatype, we would likely express that with something like this:

SELECT MAX(mydatetime) + INTERVAL -3600 SECOND

(We could specify the interval units in minutes, hours, etc.)
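For illustration only, here is the same one-hour step expressed in three different units. NOW() is used as a stand-in for a DATETIME value, and the column aliases are made up for this sketch:

```sql
-- Three equivalent ways to step back one hour from a DATETIME value.
-- MySQL interval unit keywords are singular: SECOND, MINUTE, HOUR.
SELECT NOW() + INTERVAL -3600 SECOND AS by_seconds
     , NOW() + INTERVAL -60 MINUTE   AS by_minutes
     , NOW() - INTERVAL 1 HOUR       AS by_hours;
```

All three columns should hold the same value, since NOW() is constant within a single statement.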

We can use the result from that query in another query. To do that in the same query text, we simply wrap that query in parentheses, and reference it as a rowsource, as if that query were an actual table. This allows us to get all the rows from event_data that are within the specified time period, like this:

SELECT d.id
     , d.Feature
     , d.UnixTimeStamp
     , d.Value
  FROM ( SELECT MAX(l.UnixTimeStamp) - 3600 AS from_unixtimestamp
           FROM event_data l
       ) m
  JOIN event_data d
    ON d.UnixTimeStamp >= m.from_unixtimestamp

In this particular case, there's no need for an upper bound predicate on the UnixTimeStamp column in the outer query. This is because we already know there are no values of UnixTimeStamp that are greater than MAX(UnixTimeStamp), which is the upper bound of the period we are interested in.

(We could add an expression to the SELECT list of the inline view, to return MAX(l.UnixTimeStamp) AS to_unixtimestamp, and then include a predicate like AND d.UnixTimeStamp <= m.to_unixtimestamp in the outer query, but that would be redundant.)
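For completeness, this is roughly what that redundant variant would look like. It is only a sketch to make the parenthetical concrete; with the window anchored at the overall maximum, the extra predicate filters nothing out:

```sql
SELECT d.id
     , d.Feature
     , d.UnixTimeStamp
     , d.Value
  FROM ( SELECT MAX(l.UnixTimeStamp) - 3600 AS from_unixtimestamp
              , MAX(l.UnixTimeStamp)        AS to_unixtimestamp
           FROM event_data l
       ) m
  JOIN event_data d
    ON d.UnixTimeStamp >= m.from_unixtimestamp
   AND d.UnixTimeStamp <= m.to_unixtimestamp;  -- always true here, hence redundant
```

An explicit upper bound only becomes useful when the window ends somewhere other than the newest row in event_data.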

You also specified a requirement to return information from the event_results table.

I believe you said that you wanted any related rows that are "available". This suggests (to me) that if no matching row is "available" from event_results, you still want to return the row from the event_data table.

We can use a LEFT JOIN operation to get that to happen:

SELECT d.id
     , d.Feature
     , d.UnixTimeStamp
     , d.Value
     , r.id
     , r.level
     , r.score
     , r.eventId
  FROM ( SELECT MAX(l.UnixTimeStamp) - 3600 AS from_unixtimestamp
           FROM event_data l
       ) m
  JOIN event_data d
    ON d.UnixTimeStamp >= m.from_unixtimestamp
  LEFT
  JOIN event_results r
    ON r.eventId = d.id
  LEFT
  JOIN event_results r
    ON r.eventId = d.id

Since there is no unique constraint on the eventId column in the event_results table, there is a possibility that more than one "matching" row from event_results will be found. Whenever that happens, the row from the event_data table will be repeated, once for each matching row from event_results.

If there is no matching row from event_results, then the row from event_data will still be returned, but with the columns from the event_results table set to NULL.
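The statements above anchor the window at the newest measurement overall. If my assumption is wrong and, as the question describes it, the window should end at the newest measurement that already has a result (the Y in the question), one possible variation is to compute the maximum over the joined tables in the inline view. This is a sketch under that reading of the question, not a tested solution; here the upper bound is needed, because newer measurements without results may exist:

```sql
SELECT d.id
     , d.Feature
     , d.UnixTimeStamp
     , d.Value
     , r.level
     , r.score
  FROM ( -- Y: timestamp of the newest measurement that has at least one result
         SELECT MAX(e.UnixTimeStamp) AS to_unixtimestamp
           FROM event_results c
           JOIN event_data e
             ON e.id = c.eventId
       ) m
  JOIN event_data d
    ON d.UnixTimeStamp >= m.to_unixtimestamp - 3600   -- Y - X, with X = 3600 seconds
   AND d.UnixTimeStamp <= m.to_unixtimestamp          -- Y
  LEFT
  JOIN event_results r
    ON r.eventId = d.id;
```

The join between the two tables still appears twice in the statement text, but the first occurrence only produces a single row.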

For performance, remove any columns from the SELECT list that you don't need returned, and be judicious in your choice of expressions in an ORDER BY clause. (The addition of a covering index may improve performance.)
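As a hedged example of what a covering index might look like for the event_results side of the join (the index name is made up, and whether it helps depends on your data and on which columns you actually select):

```sql
-- Assumption: the query reads only eventId, level, score and id from
-- event_results, so all of those columns can be served from the index alone.
ALTER TABLE event_results
  ADD INDEX ix_eventId_cover (eventId, level, score, id);
```

Every additional index adds some cost to inserts, which matters for a table that is written to constantly, so it is worth measuring before keeping it.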

For the statement as written above, MySQL is likely to use the ix_time index on the event_data table, and the eventId_index index on the event_results table.
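Rather than taking my word for it, you can check which indexes are chosen by prefixing the statement with EXPLAIN. A shortened version of the statement is used here to keep the sketch small:

```sql
EXPLAIN
SELECT d.id
     , d.UnixTimeStamp
     , r.level
     , r.score
  FROM ( SELECT MAX(l.UnixTimeStamp) - 3600 AS from_unixtimestamp
           FROM event_data l
       ) m
  JOIN event_data d
    ON d.UnixTimeStamp >= m.from_unixtimestamp
  LEFT
  JOIN event_results r
    ON r.eventId = d.id;
```

The key column of the output shows the index picked for each table. For the inline view, I would expect the Extra column to report that the MAX was resolved directly from the index, but the exact output depends on the MySQL version and the data.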


 