Performance for selecting multiple out-params from deterministic SUDF

Question

I am about to test the DETERMINISTIC flag for SUDFs that return multiple values (a follow-up to an earlier question). The DETERMINISTIC flag should cache the results for identical inputs to improve performance. However, I can't figure out how to make this work for multiple return values. My SUDF looks as follows:

CREATE FUNCTION DET_TEST(IN col BIGINT)
RETURNS a int, b int, c int, d int DETERMINISTIC
AS BEGIN
  a = 1;
  b = 2;
  c = 3;
  d = 4;
END;

Now when I execute the following select statements:

1) select DET_TEST(XL_ID).a from XL;
2) select DET_TEST(XL_ID).a, DET_TEST(XL_ID).b from XL;
3) select DET_TEST(XL_ID).a, DET_TEST(XL_ID).b,
          DET_TEST(XL_ID).c, DET_TEST(XL_ID).d from XL;

I get the corresponding server processing times:

1) Statement 'select DET_TEST(XL_ID).a from XL'
   successfully executed in 1.791 seconds  (server processing time: 1.671 seconds)
2) Statement 'select DET_TEST(XL_ID).a, DET_TEST(XL_ID).b from XL' 
   successfully executed in 2.415 seconds  (server processing time: 2.298 seconds)
3) Statement 'select DET_TEST(XL_ID).a, DET_TEST(XL_ID).b, DET_TEST(XL_ID).c, ...' 
   successfully executed in 4.884 seconds  (server processing time: 4.674 seconds)

As you can see, the processing time increases with each additional output parameter, even though I call the function with the same input. Is this a bug, or is it possible that only a single value is stored in the cache rather than the whole list of return parameters?

I will try out MAP_MERGE next.

Answer 1

I ran some tests with your scenario and can confirm that the response time goes up considerably with every additional result parameter retrieved from the function. The DETERMINISTIC flag helps here, but not as much as one would hope, since only the result values for distinct input parameters are saved. If the function has already been executed with the same input value(s), the result is taken from a cache. This cache, however, is only valid for the duration of a statement. That means: for repeated function evaluations with the same value within a single statement, a DETERMINISTIC function can skip the evaluation and reuse the result.
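One way to gauge how much this statement-level cache can help at all is to check how often input values repeat. A minimal sketch, assuming the XL table and XL_ID column from the question:

```sql
-- Compare the total number of rows with the number of distinct inputs.
-- If both counts are equal, every call sees a new input value and the
-- statement-level cache has nothing to reuse.
SELECT COUNT(*)              AS total_rows,
       COUNT(DISTINCT XL_ID) AS distinct_inputs
FROM XL;
```

If XL_ID is a unique key (an assumption; the question does not say so), the two counts match and caching per input value cannot save any evaluations within a single statement.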

Caching results per input value does not mean that all output parameters are evaluated once and are then available for reuse. With different output parameters, HANA practically has to execute different evaluation graphs. In that sense, asking for different parameters is closer to executing different functions than to, say, calling a matrix operation.

So, sorry about raising the hope for a massive improvement with DETERMINISTIC functions in the other thread. At least for your use case, that doesn't really help a lot.

Concerning the MAP_MERGE function, it's important to see that it really helps with horizontal partitioning of data, as in, e.g., classic map-reduce situations. The use case you presented does not do that; instead, it tries to create multiple results for a single input.
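For reference, the general shape of a MAP_MERGE call looks roughly like the sketch below. The mapper name DET_TEST_MAPPER and its column list are hypothetical, XL_ID is assumed to be a BIGINT, and the exact syntax should be checked against the SQLScript reference for your HANA revision.

```sql
-- Hypothetical mapper: a table function that is applied to each input row.
CREATE FUNCTION DET_TEST_MAPPER(IN col BIGINT)
RETURNS TABLE (id BIGINT, a INT, b INT, c INT, d INT)
AS BEGIN
  RETURN SELECT :col AS id, 1 AS a, 2 AS b, 3 AS c, 4 AS d FROM DUMMY;
END;

DO BEGIN
  -- Table variable holding the rows to be mapped.
  input_rows = SELECT XL_ID FROM XL;
  -- MAP_MERGE applies the mapper to every row of the input table
  -- and unions the returned tables into one result.
  result = MAP_MERGE(:input_rows, DET_TEST_MAPPER(:input_rows.XL_ID));
  SELECT * FROM :result;
END;
```

This only illustrates the row-wise map-and-union pattern; it makes no claim about performance for the scenario in the question.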

During my tests, I actually found it quicker to just define four independent functions and call those in my SELECT statement against my source table.
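A sketch of that variant, reusing the constants from the question. The function names DET_TEST_A to DET_TEST_D are made up for illustration:

```sql
-- One scalar function per output value; each returns a single result,
-- so it can be called without the .a/.b/.c/.d suffix.
CREATE FUNCTION DET_TEST_A(IN col BIGINT)
RETURNS a INT DETERMINISTIC
AS BEGIN
  a = 1;
END;

CREATE FUNCTION DET_TEST_B(IN col BIGINT)
RETURNS b INT DETERMINISTIC
AS BEGIN
  b = 2;
END;

CREATE FUNCTION DET_TEST_C(IN col BIGINT)
RETURNS c INT DETERMINISTIC
AS BEGIN
  c = 3;
END;

CREATE FUNCTION DET_TEST_D(IN col BIGINT)
RETURNS d INT DETERMINISTIC
AS BEGIN
  d = 4;
END;

-- Equivalent of statement 3) from the question.
SELECT DET_TEST_A(XL_ID) AS a, DET_TEST_B(XL_ID) AS b,
       DET_TEST_C(XL_ID) AS c, DET_TEST_D(XL_ID) AS d
FROM XL;
```

Whether this is faster on a given system depends on the data and the HANA revision, so it is worth timing it against the multi-output version on the same table.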

Depending on the complexity of the calculations you would like to do and the amount of data, I would probably look into the Application Function Library (AFL) SDK for SAP HANA. For details on this, check the relevant SAP notes.

1 answer

solution1
0 ACCEPTED 2017-06-13 04:39:32
