
Count zero occurrences in PySpark Dataframe

How can I count the occurrences of 0s in each row of a PySpark DataFrame?

I want the result below. Note that the n0 column holds the count for each row:

+--------+-----+-----+----+-----+---+
|center  |var1 |var2 |var3|var4 |n0 |
+--------+-----+-----+----+-----+---+
|center_a|0    |1    |0   |0    |3  |
|center_b|1    |1    |2   |4    |0  |
|center_c|1    |0    |1   |0    |2  |
+--------+-----+-----+----+-----+---+ 

I tried this code, but it did not work.

x['n0'] = (x == 0).sum(axis=1)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-63-8a95da0a1861> in <module>()
----> 1 (x == 0).sum(axis=1)

AttributeError: 'bool' object has no attribute 'sum'
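
The traceback suggests that `x` is a Spark DataFrame rather than a pandas one: `(x == 0).sum(axis=1)` is a pandas idiom, and the error shows that `x == 0` evaluated to a plain Python `bool`. That is what Python's default equality returns for an object whose class does not define element-wise `==`. A minimal sketch that reproduces the same message without Spark, using a hypothetical stand-in class `PlainFrame` (not part of PySpark):

```python
class PlainFrame:
    """Hypothetical stand-in for an object that does not define element-wise ==."""


x = PlainFrame()

# Without a custom __eq__, Python falls back to identity comparison,
# so the result is a single bool rather than a frame of booleans.
comparison = (x == 0)

try:
    comparison.sum(axis=1)
    message = None
except AttributeError as exc:
    message = str(exc)

print(type(comparison).__name__)
print(message)
```

So the fix is not to patch the pandas expression but to build the count from Spark column expressions.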

Check each column for 0 row by row and sum the results:

from pyspark.sql import functions as F

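# Python's built-in sum() adds the per-column 0/1 indicator Columns into a single Column expression.
df.withColumn("n0", sum(F.when(df[col] == 0, 1).otherwise(0) for col in df.columns)).show()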
+--------+----+----+----+----+---+
|  center|var1|var2|var3|var4| n0|
+--------+----+----+----+----+---+
|center_a|   0|   1|   0|   0|  3|
|center_b|   1|   1|   2|   4|  0|
|center_c|   1|   0|   1|   0|  2|
+--------+----+----+----+----+---+
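
The one-liner above also compares the string column `center` with 0. One possible variant restricts the count to numeric columns. This is a sketch rather than the answerer's code: the names `NUMERIC_TYPES`, `numeric_columns`, and `with_zero_count` are made up for illustration, and it assumes `df.dtypes` yields `(name, type)` pairs with type strings such as `int`, `bigint`, or `decimal(10,2)`.

```python
from functools import reduce
from operator import add

# Spark SQL type names treated as numeric here (an assumption of this sketch).
NUMERIC_TYPES = {"tinyint", "smallint", "int", "bigint", "float", "double", "decimal"}


def numeric_columns(dtypes):
    """Return names of numeric columns from (name, type) pairs, as given by df.dtypes."""
    # "decimal(10,2)" is reduced to "decimal" before the lookup.
    return [name for name, dtype in dtypes if dtype.split("(")[0] in NUMERIC_TYPES]


def with_zero_count(df, out_col="n0"):
    """Append a column counting the zeros in each row's numeric columns."""
    # Imported here so numeric_columns() can be used without Spark installed.
    from pyspark.sql import functions as F

    indicators = [
        F.when(df[name] == 0, 1).otherwise(0) for name in numeric_columns(df.dtypes)
    ]
    # reduce(add, ...) chains Column + Column; F.lit(0) is the starting value,
    # so a frame with no numeric columns still gets a count of 0.
    return df.withColumn(out_col, reduce(add, indicators, F.lit(0)))
```

With the frame from the question this would be called as `with_zero_count(df).show()`. Using `reduce` instead of `sum` is a design choice: the original one-liner relies on Python's built-in `sum`, which is shadowed if the script also runs `from pyspark.sql.functions import *`.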

