
Count of distinct columns using group by and calculating percentage

I am trying to write a SQL query:

select indicator, count(distinct tid) as tidcount
from coa
group by indicator

This gives the following output, which is correct:

indicator tidcount
M             6219
Z             411424
S             1
I             1

For the tidcount values, I need each row's percentage of the total.

The query I am trying is below:

spark.sql(""" select indicator ,count(tid) as tidcount , round(round(count(indicator)/sum(count(indicator)) over (), 4)* 100, 4) as PERCENTAGE_TOTALS from coa group by indicator """)
indicator tidcount    Percentage_total
M             6219        0.72
Z             411424      98.78
S             1           .49
I             1           .02

The expected output is:

indicator tidcount    Percentage_total
M             6219        1.4
Z             411424      98.5
S             1           .0002
I             1           .0002

Please let me know if I am missing anything. The solution should be in spark-sql or pyspark.
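The expected percentages follow from dividing each distinct count by the sum of all distinct counts. A minimal plain-Python sketch of that arithmetic, using the tidcount values from the output above (the variable names are illustrative and not part of the original question):

```python
# Distinct tid counts per indicator, copied from the query output above.
tidcount = {"M": 6219, "Z": 411424, "S": 1, "I": 1}

# Grand total of the distinct counts (the denominator for every row).
total = sum(tidcount.values())

# Each row's share of the total, expressed as a percentage.
pct = {indicator: count / total * 100 for indicator, count in tidcount.items()}

for indicator, value in pct.items():
    print(indicator, tidcount[indicator], round(value, 4))
```

The key point is that the numerator and the denominator are both built from the distinct counts, not from the raw row counts.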

Solution using spark.sql

spark.sql(
    """select 
           indicator,
           COUNT(DISTINCT tid) AS tidcount,
           COUNT(DISTINCT tid) / sum(COUNT(DISTINCT tid)) over () * 100 AS PCT 
       from coa 
       group by indicator"""
)

pyspark solution

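from pyspark.sql import Window
from pyspark.sql import functions as F

# Empty partitionBy(): a single window spanning every row of the aggregated result
w = Window.partitionBy()

(
    df
    .groupby('indicator')
    # F.count_distinct requires Spark 3.2+; older versions use F.countDistinct
    .agg(F.count_distinct('tid').alias('tidcount'))
    .withColumn('PCT', F.col('tidcount') / F.sum('tidcount').over(w) * 100)
)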

Example

df.show()

+---------+---+
|indicator|tid|
+---------+---+
|        a| 10|
|        a| 25|
|        a|  7|
|        b| 10|
|        b| 10|
|        c| 25|
|        c|  7|
|        d|  1|
|        a|  2|
|        a|  3|
+---------+---+

Result

+---------+--------+-----------------+
|indicator|tidcount|              PCT|
+---------+--------+-----------------+
|        d|       1|11.11111111111111|
|        c|       2|22.22222222222222|
|        b|       1|11.11111111111111|
|        a|       5|55.55555555555556|
+---------+--------+-----------------+
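As a sanity check that does not need a Spark session, the same computation can be sketched in plain Python on the example rows above. This is only an illustration of the logic (distinct count per group, divided by the sum of those counts); the helper name `distinct_count_pct` is hypothetical and not part of the original answer.

```python
from collections import defaultdict

# The (indicator, tid) rows from the example DataFrame above.
rows = [
    ("a", 10), ("a", 25), ("a", 7), ("b", 10), ("b", 10),
    ("c", 25), ("c", 7), ("d", 1), ("a", 2), ("a", 3),
]


def distinct_count_pct(rows):
    """Return {indicator: (distinct tid count, percentage of total)}."""
    # Collect the set of distinct tids seen for each indicator.
    tids = defaultdict(set)
    for indicator, tid in rows:
        tids[indicator].add(tid)

    # Distinct count per indicator, then the grand total of those counts.
    counts = {indicator: len(values) for indicator, values in tids.items()}
    total = sum(counts.values())

    return {k: (c, c / total * 100) for k, c in counts.items()}


result = distinct_count_pct(rows)
for indicator, (tidcount, pct) in sorted(result.items()):
    print(indicator, tidcount, pct)
```

Indicator `b` has two rows but only one distinct `tid`, so it contributes 1 to the total. A plain `count(tid)` would count it as 2 and give different percentages.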

