How do I count the frequency of each categorical value in a column of a pyspark dataframe?
Say I have a pyspark dataframe:
df.show()
+-----+---+
| x | y|
+-----+---+
|alpha| 1|
|beta | 2|
|gamma| 1|
|alpha| 2|
+-----+---+
I want to count how many occurrences of alpha, beta and gamma there are in column x. How do I do this in pyspark?
Use pyspark.sql.DataFrame.cube():
df.cube("x").count().show()