简体   繁体   中英

Azure Stream Analytics query with tumbling window using BY TIMESTAMP works fine on test but is off in running job results

I'm working on a IoT project. I've a Raspberry pi which sends smart meter data to an IoT eventhub on Azure. I read the data using an Azure Stream Analytics Job. Meter readings are sent every 10 seconds.

One of the queries I created calculates the electricity consumption per hour using a tumbling window. It calculates the max meterreading from this window minus the max meterreading of the previous window using lag. I am using the timestamp from the IOT device instead of the default arrival time.

WITH onehourwindow AS
(
SELECT
    max(total_low) * 1000 as max_low,
    max(total_high) * 1000 as max_high,
    max(gas) * 1000 as max_gas,
    round(avg(current_consumption), 1) as avg_consumption,
    max(timestamp) as max_timestamp
FROM
    eventhuninputsmartmeter TIMESTAMP BY timestamp
GROUP BY TUMBLINGWINDOW(hour, 1)
)

SELECT 
    (max_low - LAG(max_low) OVER (LIMIT DURATION(hour, 1))) / 1000 as total_consumption_low,
    (max_high - LAG(max_high) OVER (LIMIT DURATION(hour, 1))) / 1000 as total_consumption_high,    
    (max_gas - LAG(max_gas) OVER (LIMIT DURATION(hour, 1))) / 1000 as total_consumption_gas,
    avg_consumption,
    max_timestamp
INTO 
    MeterReadingSQLDB
FROM
    onehourwindow

The query returns the expected results in test. Here is an example of the timestamp and consumption in the test results. As expected the latest timestamp (max) is the last meter reading of the hour at 59 minutes and 50 or so seconds.

|----------------------------|---------------------------|
|      max_timestamp         |    total_consumption_high |
|----------------------------|---------------------------|
|2020-02-28T22:59:52.1794730 |         1.171             |
|2020-02-28T21:59:51.6680430 |         0.500             |
|----------------------------|---------------------------|

When I run the query job I get the following results written to my SQL DB. Now the latest timestamp (max) is not the last meter reading of the (clock) hour, but instead it is at 54 minutes. If I do the consumption calculation by hand I can see that the window used is one hour, it just doesn't start at 00 but at 55 minutes every hour.

|----------------------------|---------------------------|
|      max_timestamp         |    total_consumption_high |
|----------------------------|---------------------------|
|2020-02-28T22:54:52.1300000 |         1.353             |
|2020-02-28T21:54:51.6830000 |         0.510             |
|----------------------------|---------------------------|

How to solve this? I have tried a lot of things but can't seem to fix it. The answer to the post below looked promising, but my events don't come in late, definately not 6 minutes late. Thus still using the default event ordering policies.

Azure Stream Analytics 'TimeStamp By' in query doesn't works on job but works fine on test

Any thoughts on fixing this such that I get the max timestamp of the window at 59 minutes and 50 seconds or so?

Thanks!

Thomas

According to your detailed description:your test job is good and output results as you want,i think the sql works as expected probably.You may consider the causes of stream data source side and the settings of ASA job.

I am using the timestamp from the IOT device instead of the default arrival time.

Based on the statements from ASA TIMESTAMP BY document,custom timestamp may cause delay result from network or other factors.

在此处输入图片说明

I would suggest you setting offsize parameter in Windows Function to balance this gap.

Because you are processing based on the timestamp in your event payload, it is likely that the event order policies you have for you job is filtering out some inputs which might be causing this problem. You can look at 2 metrics to understand if this is the case:

  1. Late Input Events
  2. Out-of-Order Events

If there are in fact input events that are late or out-of-order, ensure that your event ordering policies are configured appropriately to either include or drop such events. You should also try to see samples of the input data to see the timestamps are as expected.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM