Assume I have the following piece of code in the SELECT clause of a query being executed on Spark:
...
MEAN(CASE
WHEN (col1 = 'A'
AND (col3 = 'A' OR col4 = 'B')) THEN col2
END) AS testing,
...
What would be the output of this query when col2 is NULL? Will the rows containing col2 = NULL be ignored by the MEAN function?
NULL values are ignored by the aggregate; if all the values are NULL, the result will be NULL. It will have the type of col2 -- this might matter in some databases (or if you are saving the result to a table).
What is the MEAN() function? To calculate the average, use AVG(). That is the standard SQL function for computing averages.
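Applied to the expression from the question, that suggestion would read as follows (some_table is a hypothetical table name):
SELECT AVG(CASE
           WHEN (col1 = 'A'
           AND (col3 = 'A' OR col4 = 'B')) THEN col2
           END) AS testing
FROM some_table;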
Disclaimer: I don't know Apache Spark!
I've created a SQL Fiddle - http://sqlfiddle.com/#!9/6f7d5e/3 .
If col2 is null, that row is not included in the average; if all the matching records are null, the result itself is null.
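For illustration, here is a minimal sketch of the behavior the Fiddle demonstrates (t is a hypothetical table; the comments show the expected results):
CREATE TABLE t (col2 INT);
INSERT INTO t VALUES (10), (20), (NULL);

SELECT AVG(col2) FROM t;
-- 15: the NULL row is skipped, so the divisor is 2, not 3

DELETE FROM t WHERE col2 IS NOT NULL;

SELECT AVG(col2) FROM t;
-- NULL: every remaining value is NULL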
I believe you meant AVG(). It will ignore NULL values. So if the result of the CASE expression is:
100
200
300
NULL
then the result would be (100 + 200 + 300) / 3 = 200 rather than (100 + 200 + 300) / 4 = 150. And if the result of the CASE expression is:
NULL
NULL
NULL
NULL
then the result will be NULL instead of 0.
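A quick way to verify this in Spark SQL is an inline VALUES table (t and x are just placeholder names):
SELECT AVG(x) FROM VALUES (100), (200), (300), (NULL) AS t(x);
-- 200.0

SELECT AVG(x) FROM VALUES (CAST(NULL AS INT)), (CAST(NULL AS INT)) AS t(x);
-- NULL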