
PySpark Pandas: Groupby Identifying Column and Sum Two Different Columns to Create New 2x2 Table

I have the following sample dataset:

groupby previous    current
A       1           1
A       0           1
A       0           0
A       1           0
A       1           1
A       0           1

I want to create the following table by summing the "previous" and "current" columns:

previous_total   current_total
3                4

I have tried various combinations of groupBy with .agg to produce the table above, but wasn't able to get anything to run successfully.

I also know how to do this in Python pandas, but not in PySpark.
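For comparison, the pandas version the question alludes to might look like the sketch below (variable names are assumptions; the data matches the sample above):

```python
import pandas as pd

# reproduce the sample dataset from the question
df = pd.DataFrame({
    "groupby":  ["A", "A", "A", "A", "A", "A"],
    "previous": [1, 0, 0, 1, 1, 0],
    "current":  [1, 1, 0, 0, 1, 1],
})

# sum both columns over all rows, then reshape into a one-row table
totals = df[["previous", "current"]].sum().to_frame().T
totals.columns = ["previous_total", "current_total"]
print(totals)
#    previous_total  current_total
# 0               3              4
```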

Use the sum and groupBy methods:

>>> from pyspark.sql.functions import col
>>> df.groupBy().sum().select(col("sum(previous)").alias("previous_total"), col("sum(current)").alias("current_total")).show()
+--------------+-------------+
|previous_total|current_total|
+--------------+-------------+
|             3|            4|
+--------------+-------------+

Additionally, you could register your DataFrame as a temporary view and use Spark SQL to query it, which gives identical results (in Spark 2.0+ use createOrReplaceTempView; registerTempTable is deprecated):

>>> df.createOrReplaceTempView("df")
>>> spark.sql("select sum(previous) as previous_total, sum(current) as current_total from df").show()

You can use select and sum:

# note: this import shadows Python's built-in sum in the current scope
from pyspark.sql.functions import sum

# select with only aggregate expressions aggregates over the whole DataFrame
df_result = df.select(sum("previous").alias("previous_total"),
                      sum("current").alias("current_total"))

df_result.show()

+--------------+-------------+
|previous_total|current_total|
+--------------+-------------+
|             3|            4|
+--------------+-------------+
