
Spark DataFrame - Get the average of two column combinations

How do I get the average of Price for each combination of two columns?

My DataFrame:

relevantTable = df.select(df['Price'], df['B'], df['A'])

looks like:

+-------+------------+------------------+
|  Price|     B      |          A       |
+-------+------------+------------------+
| 0.2947|   i3.xlarge|                 x|
|  0.105|    c4.large|                 x|
| 0.2179|   m4.xlarge|                 x|
| 2.2534| m4.10xlarge|                 x|
| 2.1801| m4.10xlarge|                 x|
|  0.108|    r4.large|                 x|
|  0.108|    r4.large|                 x|
| 0.0213|    i3.large|                 y|
| 0.5572|  i2.4xlarge|                 y|
| 0.1542|  c4.4xlarge|                 y|
| 0.3624| m4.10xlarge|                 y|
| 0.3596| m4.10xlarge|                 y|
|   0.11|    m4.large|                 x|
| 0.4436|  m4.2xlarge|                 x|
| 0.1458|  m4.2xlarge|                 y|
+-------+------------+------------------+

... and so on; the real data set is huge.

What would be a simple and scalable way to get the average Price for all combinations of A and B?

How about:

df.groupBy("A", "B").avg("Price")
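
For context, here is a minimal runnable sketch, assuming PySpark is available; the SparkSession app name and the sample rows (a subset of the table above) are illustrative:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("avg-by-combination").getOrCreate()

# Small sample mirroring the table in the question.
df = spark.createDataFrame(
    [(0.2947, "i3.xlarge", "x"),
     (2.2534, "m4.10xlarge", "x"),
     (2.1801, "m4.10xlarge", "x"),
     (0.3624, "m4.10xlarge", "y"),
     (0.3596, "m4.10xlarge", "y")],
    ["Price", "B", "A"],
)

# One output row per distinct (A, B) pair; the result column is named avg(Price).
df.groupBy("A", "B").avg("Price").show()

# Equivalent, but with a friendlier column name:
df.groupBy("A", "B").agg(F.avg("Price").alias("avg_price")).show()

Since groupBy runs as a distributed shuffle-and-aggregate, this scales to large data sets without any per-group looping on the driver.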

or, if you also want aggregates by each single column (and the grand total):

df.cube("A", "B").avg("Price")
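
For completeness, a sketch of what cube adds over groupBy, using the same illustrative df as above: besides one row per (A, B) pair, it emits subtotal rows in which A and/or B is null:

from pyspark.sql import functions as F

# cube("A", "B") aggregates over every grouping combination:
#   (A, B)       - the same groups as groupBy("A", "B")
#   (A, null)    - average per A across all values of B
#   (null, B)    - average per B across all values of A
#   (null, null) - the grand-total average
df.cube("A", "B").agg(F.avg("Price").alias("avg_price")).orderBy("A", "B").show()

If you only need the per-pair averages, stick with groupBy; cube computes the extra subtotal rows as well.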
