
Apache Flink: how to interpret DataStream.print output?

I am a newbie to Flink, trying to understand how to use it most effectively.

I am experimenting with the Window APIs, reading from a CSV file. Each row read is transformed into a case class:

case class IncomingDataUnit(
    sensorUUID: String,
    radiationLevel: Int,
    photoSensor: Float,
    humidity: Float,
    timeStamp: Long,
    ambientTemperature: Float)
  extends Serializable

and this is how I read the rows:

env.readTextFile(inputPath).map(datum => {
      val fields = datum.split(",")
      IncomingDataUnit(
        fields(0),              // sensorUUID
        fields(1).toInt,        // radiationLevel
        fields(2).toFloat,      // photoSensor
        fields(3).toFloat,      // humidity
        fields(4).toLong,       // timeStamp
        fields(5).toFloat       // ambientTemperature
      )
    })
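The parsing step can be exercised on its own, outside Flink. Here is a minimal standalone sketch of the same map logic; `parseLine` is a hypothetical helper introduced for illustration, not part of the question's code:

```scala
// Case class from the question.
case class IncomingDataUnit(
    sensorUUID: String,
    radiationLevel: Int,
    photoSensor: Float,
    humidity: Float,
    timeStamp: Long,
    ambientTemperature: Float)

// Standalone version of the map function applied to each CSV line.
def parseLine(datum: String): IncomingDataUnit = {
  val fields = datum.split(",")
  IncomingDataUnit(
    fields(0),              // sensorUUID
    fields(1).toInt,        // radiationLevel
    fields(2).toFloat,      // photoSensor
    fields(3).toFloat,      // humidity
    fields(4).toLong,       // timeStamp
    fields(5).toFloat)      // ambientTemperature
}

// One line taken from the dataset below.
val unit = parseLine("probe-987f2cb6,200,750.4,76.0533,1448028161,14.72")
```

Note that this naive `split(",")` has no error handling; a malformed row would throw a `NumberFormatException` inside the Flink map operator.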

Later, using a simple window, I am trying to print the maximum ambient temperature, thus:

env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime)

val readings =
      readIncomingReadings(env,"./sampleIOTTiny.csv")
      .map(e => (e.sensorUUID,e.ambientTemperature))
      .timeWindowAll(Time.of(5,TimeUnit.MILLISECONDS))
      .trigger(CountTrigger.of(5))
      .evictor(CountEvictor.of(4))
      .max(1)

readings.print

Among a bunch of DEBUG log statements, the output contains these lines:

1> (probe-987f2cb6,29.43)
1> (probe-987f2cb6,29.43)
3> (probe-dccefede,30.02)
3> (probe-42a9ddca,22.07)
2> (probe-df2d4cad,22.87)
2> (probe-20c609fb,27.62)
4> (probe-dccefede,30.02)

What I am trying to understand is how one should interpret this output. What do the repeated 1>s represent?

What also baffles me is that probe-987f2cb6 doesn't correspond to ambient temperature 29.43 in my dataset. It corresponds to a different value (14.72, to be exact).

Just for reference, here is the dataset:

probe-f076c2b0,201,842.53,75.5372,1448028160,29.37
probe-dccefede,199,749.25,78.6057,1448028160,27.46
probe-f29f9662,199,821.81,81.7831,1448028160,22.35
probe-5dac1d9f,195,870.71,83.1028,1448028160,15.98
probe-6c75cfbe,198,830.06,82.5607,1448028160,30.02
probe-4d78b545,204,778.42,78.412,1448028160,25.92
probe-400c5cdf,204,711.65,73.585,1448028160,22.18
probe-df2d4cad,199,820.8,72.936,1448028161,16.18
probe-f4ef109e,199,785.68,77.5647,1448028161,16.36
probe-3fac3350,200,720.12,78.2073,1448028161,19.19
probe-42a9ddca,193,819.12,74.3712,1448028161,22.07
probe-252a5bbd,197,710.32,80.6072,1448028161,14.64
probe-987f2cb6,200,750.4,76.0533,1448028161,14.72
probe-24444323,197,816.06,84.0816,1448028161,4.405
probe-6dd6fdc4,201,717.64,78.4031,1448028161,29.43
probe-20c609fb,204,804.37,84.5243,1448028161,22.87
probe-c027fdc9,195,858.61,81.7682,1448028161,24.47
probe-2c6cd3de,198,826.96,85.26,1448028162,18.99
probe-960906ca,197,797.63,77.4359,1448028162,27.62

There's probably a lot that I am missing. Please prod me.

Forgot to mention: I am using Flink 0.10.0.

The "X>" prefix represents the ID of the parallel task that prints the result tuple. I am just wondering why the output shows values 1 to 4: since you are using a non-parallel window (the data stream is not partitioned via .keyBy()), I would expect the print to be chained and therefore non-parallel as well. But maybe it is not, and 4 parallel print tasks are running.

About your result: when the window fires, the max over field 1 is computed over all tuples in the window, while field 0 is taken from the window's head tuple. That is why you see probe-987f2cb6 paired with 29.43 even though those values come from different records. If you want the complete tuple that contains the max value, use maxBy() instead of max().
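The difference can be illustrated outside Flink with plain Scala collections. This is a sketch of the aggregation semantics described above, not Flink's actual implementation; the window contents are made up from the question's dataset:

```scala
// Tuples as they might appear inside one window.
val window = Seq(
  ("probe-987f2cb6", 14.72f),  // head tuple of the window
  ("probe-6dd6fdc4", 29.43f),
  ("probe-42a9ddca", 22.07f))

// max(1) semantics: the maximum of field 1, but field 0 comes from the
// head tuple, so the key and the value may belong to different records.
val maxResult = (window.head._1, window.map(_._2).max)

// maxBy(1) semantics: the complete tuple holding the maximum field-1 value.
val maxByResult = window.maxBy(_._2)
```

Here `maxResult` is `("probe-987f2cb6", 29.43)`, matching the puzzling output in the question, while `maxByResult` is `("probe-6dd6fdc4", 29.43)`, the record that actually contains the maximum temperature.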
