
SQL to pandas: DENSE_RANK() OVER (PARTITION BY)

I am trying to convert the following SQL code into its pandas equivalent:

SELECT
    t.company,
    t.topic,
    t.statement
FROM
    (
        SELECT
            e.company,
            e.topic,
            e.probability,
            e.distance,
            LOWER(e.statement) AS statement,
            dense_rank() OVER (PARTITION BY e.company,e.topic ORDER BY e.distance DESC) as rank
        FROM
            esg.group_dist e
    ) t
WHERE
    t.rank = 1
    AND t.topic IN ('green energy')
ORDER BY
    company,
    topic,
    rank

This is as far as I have got:

esg_group_dist['rank'] = esg_group_dist[['company', 'topic', 'probability', 'distance', 'sentence']] \
    .sort_values(by=['distance']) \
    .groupby(['company', 'topic']) \
    

I found the following SO thread, which should contain a solution, but I could not get it to work for my use case:

Pandas DENSE RANK

Thanks!

You can use groupby.rank:

esg_group_dist['rank'] = (esg_group_dist.groupby(['company', 'topic'])
                             ['distance'].rank(method='dense', ascending=False)
                         )
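For reference, here is a minimal sketch of how the whole query might translate once the rank column exists. The sample rows are made up for illustration; only the column names come from the question, and I am assuming the `sentence` column plays the role of `statement` in the SQL.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
esg_group_dist = pd.DataFrame({
    "company": ["A", "A", "A", "B", "B"],
    "topic": ["green energy", "green energy", "green energy", "green energy", "waste"],
    "probability": [0.9, 0.8, 0.7, 0.6, 0.5],
    "distance": [0.2, 0.5, 0.5, 0.3, 0.4],
    "sentence": ["Solar First", "Wind Power", "Hydro Plan", "Green Grid", "Less Waste"],
})

# DENSE_RANK() OVER (PARTITION BY company, topic ORDER BY distance DESC)
esg_group_dist["rank"] = (
    esg_group_dist.groupby(["company", "topic"])["distance"]
    .rank(method="dense", ascending=False)
)

# WHERE rank = 1 AND topic IN ('green energy')
mask = (esg_group_dist["rank"] == 1) & esg_group_dist["topic"].isin(["green energy"])

result = (
    esg_group_dist[mask]
    .assign(statement=lambda d: d["sentence"].str.lower())  # LOWER(e.statement)
    .sort_values(["company", "topic"])                      # ORDER BY company, topic
    [["company", "topic", "statement"]]
    .reset_index(drop=True)
)
print(result)
```

Because the rank is dense, both rows of company A that tie at distance 0.5 are kept, just as they would be in the SQL result.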

However, looking at your whole query, it seems you are trying to extract the row with the largest distance within each group. You can do that faster with:

(esg_group_dist[['company', 'topic', 'probability', 'distance', 'sentence']]
     .sort_values('distance')                            # sort ascending by distance
     .drop_duplicates(['company','topic'], keep='last')  # keep the last row per group (largest distance)
     .query('topic=="green energy"')                     # filter topic
)
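One caveat worth checking on your data: the sort-and-deduplicate shortcut returns exactly one row per group, whereas DENSE_RANK() ... = 1 returns every row tied for the top distance. The sketch below uses made-up rows to show the difference; the `transform("max")` comparison is one possible way to keep ties without computing a rank, not the only one.

```python
import pandas as pd

# Hypothetical rows: two sentences tie for the largest distance.
df = pd.DataFrame({
    "company": ["A", "A", "A"],
    "topic": ["green energy"] * 3,
    "distance": [0.2, 0.5, 0.5],
    "sentence": ["s1", "s2", "s3"],
})
keys = ["company", "topic"]

# One row per group: the last row after an ascending sort.
one_per_group = df.sort_values("distance").drop_duplicates(keys, keep="last")

# Every row tied for the group maximum, matching the dense-rank filter.
group_max = df.groupby(keys)["distance"].transform("max")
all_ties = df[df["distance"] == group_max]

print(len(one_per_group), len(all_ties))
```

If ties cannot occur, or any single top row is acceptable, the two approaches give the same answer.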

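Note: to find the row with the smallest distance instead, remove ascending=False from the rank call, or use keep='first' in place of keep='last' in drop_duplicates. There is also the groupby().idxmin()/idxmax() option.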
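A rough sketch of the idxmax variant, again on made-up rows. It assumes the DataFrame index is unique, since `idxmax` returns index labels that are then passed to `.loc`; like the deduplication approach, it yields one row per group.

```python
import pandas as pd

# Hypothetical sample data with a default (unique) integer index.
df = pd.DataFrame({
    "company": ["A", "A", "B", "B"],
    "topic": ["green energy", "green energy", "green energy", "waste"],
    "distance": [0.2, 0.5, 0.3, 0.4],
    "sentence": ["s1", "s2", "s3", "s4"],
})

# Index label of the largest distance in each (company, topic) group.
idx = df.groupby(["company", "topic"])["distance"].idxmax()

# Pull those rows, then filter on topic as in the SQL WHERE clause.
top = df.loc[idx].query('topic == "green energy"')
print(top)
```

Swapping `idxmax` for `idxmin` selects the smallest distance per group instead.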

