How to output the count of all pairwise combinations of two binary columns from a Spark dataframe, even when the count is zero?
How can I output the count of every pairwise combination of two binary (0/1) columns from a Spark dataframe, including combinations whose count is zero?
final_sdf.groupBy('actual', 'prediction').count().show()
The current output is:
But my desired output includes the zero-count groups, as below.
The idea is to first generate the candidate binary rows, keep only the ones missing from the dataset, set their count to 0, and then append them to the dataset.
Let's assume our main dataset is called df1 and looks as below:
+------+----------+-----+
|actual|prediction|count|
+------+----------+-----+
|1 |1.0 |944 |
|0 |1.0 |208 |
+------+----------+-----+
First, let's create a column called array that holds both actual and abs(actual - 1). This way, every row carries both binary values, including the missing one. Then we explode that array back into prediction and drop the array column.
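import org.apache.spark.sql.functions.{abs, array, col, explode}
// Expand each row into two rows, one per possible prediction value (0 and 1)
val df2 = df1
  .withColumn("array", array(col("actual"), abs(col("actual") - 1)))
  .withColumn("prediction", explode(col("array")))
  .drop("array")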
+------+----------+-----+
|actual|prediction|count|
+------+----------+-----+
|1 |1 |944 |
|1 |0 |944 |
|0 |0 |208 |
|0 |1 |208 |
+------+----------+-----+
Then we do an anti join of df2 against df1, which keeps only the rows of df2 that have no match in df1, and overwrite the count value with 0.
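import org.apache.spark.sql.functions.lit
// Match on the key columns only; including "count" would keep pairs that already exist with a different count
val df3 = df2.join(df1, Seq("actual", "prediction"), "left_anti")
  .withColumn("count", lit(0))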
+------+----------+-----+
|actual|prediction|count|
+------+----------+-----+
|1 |0 |0 |
|0 |0 |0 |
+------+----------+-----+
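To make the anti-join step easier to follow, here is a minimal plain-Python sketch of the same filtering on the example rows. It is not Spark code: the tuples stand in for dataframe rows, the helper name anti_join_zero is hypothetical, and matching on the (actual, prediction) pair alone is an assumption of this sketch.

```python
def anti_join_zero(expanded, existing):
    """Keep rows of `expanded` whose (actual, prediction) key is absent from
    `existing`, and set their count to 0."""
    # Keys already present in the original data; prediction is normalized to int
    existing_keys = {(a, int(p)) for a, p, _ in existing}
    return [
        (a, int(p), 0)
        for a, p, _ in expanded
        if (a, int(p)) not in existing_keys
    ]


# Rows copied from the df1 and df2 tables above
df1_rows = [(1, 1.0, 944), (0, 1.0, 208)]
df2_rows = [(1, 1, 944), (1, 0, 944), (0, 0, 208), (0, 1, 208)]

print(anti_join_zero(df2_rows, df1_rows))  # [(1, 0, 0), (0, 0, 0)]
```

The two surviving rows are exactly the combinations that never occurred in the data.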
Finally, we union these two dataframes:
df1.union(df3).show(10)
+------+----------+-----+
|actual|prediction|count|
+------+----------+-----+
| 1| 1.0| 944|
| 0| 1.0| 208|
| 1| 0.0| 0|
| 0| 0.0| 0|
+------+----------+-----+
I hope this is what you need!
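As a quick sanity check of the expected result outside Spark, the whole idea can be sketched in plain Python by enumerating every (actual, prediction) combination and filling in zero where nothing was observed. This is only an illustration under assumptions: the helper name complete_counts and the labels parameter are hypothetical, and the observed list is rebuilt from the counts in the example table.

```python
from collections import Counter
from itertools import product


def complete_counts(pairs, labels=(0, 1)):
    """Count (actual, prediction) pairs, reporting 0 for unseen combinations."""
    counts = Counter(pairs)
    # Walk the full grid of label combinations, not only the observed ones
    return {combo: counts.get(combo, 0) for combo in product(labels, repeat=2)}


# Rebuild the example data: 944 rows of (1, 1) and 208 rows of (0, 1)
observed = [(1, 1)] * 944 + [(0, 1)] * 208

result = complete_counts(observed)
for (actual, prediction), count in result.items():
    print(actual, prediction, count)
```

Unlike flipping the actual value, enumerating the full grid also covers the case where one value of actual never appears at all. In Spark, a comparable approach would be to build the four combinations as a small dataframe, left join the grouped counts onto it, and fill the resulting nulls with 0.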