
Flink SQL CURRENT_TIMESTAMP always returns the same value

I am using the Flink SQL API in Flink 1.8. I have two stream tables, Table1 and Table2.

If we define receivedTime as the time at which the data was received in a table, I want to join Table1 and Table2 (on some id) and keep only the rows where Table1.receivedTime > Table2.receivedTime.

First, I tried to do this using Flink SQL CURRENT_TIMESTAMP:

NEW_TABLE1 : SELECT *, CURRENT_TIMESTAMP as receivedTime FROM TABLE1
NEW_TABLE2 : SELECT *, CURRENT_TIMESTAMP as receivedTime FROM TABLE2
RESULT     : SELECT * FROM NEW_TABLE1 JOIN NEW_TABLE2 
                    WHERE NEW_TABLE1.id = NEW_TABLE2.id 
                    AND NEW_TABLE1.receivedTime > NEW_TABLE2.receivedTime

But it looks like CURRENT_TIMESTAMP always returns the timestamp of when the query was evaluated. (It looks like CURRENT_TIMESTAMP is replaced with the current date at that time and is not a dynamic value.) I find this behavior weird. Is it normal?

The second solution I tried is to use Flink's processing time:

NEW_TABLE1 : SELECT *, proctime as receivedTime FROM TABLE1
NEW_TABLE2 : SELECT *, proctime as receivedTime FROM TABLE2
RESULT     : SELECT * FROM NEW_TABLE1 JOIN NEW_TABLE2 
                    WHERE NEW_TABLE1.id = NEW_TABLE2.id 
                    AND NEW_TABLE1.receivedTime > NEW_TABLE2.receivedTime

But in this case, it looks like the processing time is evaluated at the time the query is executed. As a result, in my JOIN query, the two processing times are always equal.

What is the correct way to do what I want?

Flink and Flink SQL support two different notions of time: processing time is the time when an event is being processed (or in other words, the time when your query is being executed), while event time is based on timestamps recorded in the events. How this distinction is reflected in the Table and SQL APIs is described in the documentation.

To get what you want, you'll first need to arrange for whatever process is creating the data in the two tables to include an event time timestamp in each record. Then you'll need to configure your tables so that Flink SQL is aware of which field in each table is to be used as the rowtime attribute, and you'll also need to specify how watermarking is to be done.

For example, if you are using the SQL client, your schema might look something like this. It indicates that the rideTime field should be used for event time timestamps, together with a periodic bounded-out-of-orderness watermarking strategy using a delay of 60 seconds:

schema:
  - name: rowTime
    type: TIMESTAMP
    rowtime:
      timestamps:
        type: "from-field"
        from: "rideTime"
      watermarks:
        type: "periodic-bounded"
        delay: "60000"
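
To make the "periodic-bounded" strategy less abstract, here is a minimal plain-Java sketch of the idea behind it: the watermark trails the largest timestamp seen so far by the configured delay. This is an illustration only, not Flink's implementation; the class name BoundedWatermarkSketch and its methods are hypothetical.

```java
/** Plain-Java illustration (not Flink code) of a bounded-out-of-orderness watermark. */
final class BoundedWatermarkSketch {
    private final long maxDelayMillis;
    private long maxTimestampSeen;

    BoundedWatermarkSketch(long maxDelayMillis) {
        this.maxDelayMillis = maxDelayMillis;
        // Offset the start value so the first subtraction cannot underflow.
        this.maxTimestampSeen = Long.MIN_VALUE + maxDelayMillis;
    }

    /** Record the event-time timestamp of an arriving row. */
    void observe(long eventTimestampMillis) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestampMillis);
    }

    /** The watermark trails the largest timestamp seen by the configured delay. */
    long currentWatermark() {
        return maxTimestampSeen - maxDelayMillis;
    }

    /** A row counts as late when its timestamp is at or below the watermark. */
    boolean isLate(long eventTimestampMillis) {
        return eventTimestampMillis <= currentWatermark();
    }
}
```

With a delay of 60000 ms, a row stamped 100000 puts the watermark at 40000, so a row stamped 90000 that arrives afterwards is still accepted, while one stamped 40000 or earlier would be considered late.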

If you're not using the SQL client, see the documentation for examples, whether using DataStream to Table conversion or TableSources.

Update:

What you'd really prefer, I gather, is to work with ingestion time, but Flink SQL doesn't support ingestion time. You'll have to configure the job to use TimeCharacteristic.EventTime, implement a timestamp extractor and watermark generator, and call assignTimestampsAndWatermarks.

If you don't want to bother with having a timestamp field in each event, your timestamp extractor can look like this:

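import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;

AssignerWithPeriodicWatermarks<Event> assigner = new AscendingTimestampExtractor<Event>() {
  @Override
  public long extractAscendingTimestamp(Event element) {
    // Use the wall-clock time at which the record is seen as its timestamp.
    return System.currentTimeMillis();
  }
};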
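
To make the intended result concrete, here is a small plain-Java sketch (no Flink dependency) of the semantics the question asks for: each row is stamped once, when it is received, and joined pairs are kept only when the Table1 row arrived strictly later. The names ReceivedTimeJoinSketch, Stamped, receive and join are hypothetical, and the nested loop only stands in for what the streaming join would compute.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Plain-Java illustration of joining on id and filtering by arrival time. */
final class ReceivedTimeJoinSketch {
    /** A row id plus the time at which the row was received (hypothetical type). */
    static final class Stamped {
        final String id;
        final long receivedTime;

        Stamped(String id, long receivedTime) {
            this.id = id;
            this.receivedTime = receivedTime;
        }
    }

    /** Stamp a row once, on arrival; the clock is injectable so runs are repeatable. */
    static Stamped receive(String id, LongSupplier clock) {
        return new Stamped(id, clock.getAsLong());
    }

    /** Keep (t1, t2) pairs with equal id where the Table1 row arrived strictly later. */
    static List<Stamped[]> join(List<Stamped> table1, List<Stamped> table2) {
        List<Stamped[]> result = new ArrayList<>();
        for (Stamped t1 : table1) {
            for (Stamped t2 : table2) {
                if (t1.id.equals(t2.id) && t1.receivedTime > t2.receivedTime) {
                    result.add(new Stamped[] {t1, t2});
                }
            }
        }
        return result;
    }
}
```

The key point is that the stamp is taken per row at arrival, in contrast to a value that is fixed once when the query is evaluated.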
