
How to count the frequency of each categorical value in a column of a pyspark dataframe?

Say I have a pyspark dataframe:

df.show()
+-----+---+
|  x  |  y|
+-----+---+
|alpha|  1|
|beta |  2|
|gamma|  1|
|alpha|  2|
+-----+---+

I want to count how many occurrences of alpha, beta and gamma there are in column x. How do I do this in pyspark?

Use pyspark.sql.DataFrame.cube():

df.cube("x").count().show()

