How to obtain row percentages of a crosstab from a Spark DataFrame using Python?
I used this Python code:
df.stat.crosstab("age", "y").orderBy("age_y").show()
to create a crosstab from a Spark DataFrame as follows:
However, I cannot find code to obtain the row percentages. For example, the row percentages for age 18 should be 5/12 = 41.7% for 'no' and 7/12 = 58.3% for 'yes'. The two percentages sum to 100%.
Could someone advise me on this? Many thanks in advance.
Simply add 2 columns using withColumn and your formula to calculate the percentages:
from pyspark.sql import functions as F
df1 = df.stat.crosstab("age", "y").orderBy("age_y")
result = df1.withColumn(
    "no_rp",
    F.round(F.col("no") / (F.col("no") + F.col("yes")) * 100, 2)
).withColumn(
    "yes_rp",
    F.round(F.col("yes") / (F.col("no") + F.col("yes")) * 100, 2)
)
result.show()
#+-----+---+---+-----+------+
#|age_y| no|yes|no_rp|yes_rp|
#+-----+---+---+-----+------+
#|   18|  5|  7|41.67| 58.33|
#|   19| 24| 11|68.57| 31.43|
#|   20| 35| 15| 70.0|  30.0|
#+-----+---+---+-----+------+
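To sanity-check the numbers without a Spark session, the same row-percentage arithmetic can be sketched in plain Python. This is only an illustration: the `counts` dictionary is copied by hand from the output above, and `row_percentages` is a hypothetical helper, not part of PySpark.

```python
# Counts copied from the crosstab output above: {age: (no, yes)}.
counts = {18: (5, 7), 19: (24, 11), 20: (35, 15)}


def row_percentages(row, ndigits=2):
    """Return each cell as a percentage of its row total, rounded.

    Hypothetical helper for illustration; assumes the row total is non-zero.
    """
    total = sum(row)
    return tuple(round(cell / total * 100, ndigits) for cell in row)


# Apply the same formula as the withColumn expressions, row by row.
percentages = {age: row_percentages(row) for age, row in counts.items()}

for age, (no_rp, yes_rp) in percentages.items():
    print(age, no_rp, yes_rp)
```

Each cell is divided by the sum of its own row, so the percentages in a row add up to 100 (up to rounding). A row whose total is zero would raise a ZeroDivisionError in this sketch and would need separate handling.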