
SQL to pandas: DENSE_RANK() OVER (PARTITION BY)

I am trying to translate the following piece of SQL code to a pandas equivalent:

SELECT
    t.company,
    t.topic,
    t.statement
FROM
    (
        SELECT
            e.company,
            e.topic,
            e.probability,
            e.distance,
            LOWER(e.statement) AS statement,
            dense_rank() OVER (PARTITION BY e.company,e.topic ORDER BY e.distance DESC) as rank
        FROM
            esg.group_dist e
    ) t
WHERE
    t.rank = 1
    AND t.topic IN ('green energy')
ORDER BY
    company,
    topic,
    rank

I got as far as:

esg_group_dist['rank'] = esg_group_dist[['company', 'topic', 'probability', 'distance', 'sentence']] \
    .sort_values(by=['distance']) \
    .groupby(['company', 'topic']) \
    

I found the following SO thread that should contain a solution, but I can't manage to implement it successfully for my use case:

Pandas DENSE RANK

Thanks!

There is groupby.rank:

esg_group_dist['rank'] = (esg_group_dist.groupby(['company', 'topic'])
                             ['distance'].rank(method='dense', ascending=False)
                         )
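To show how the rank column fits into the rest of the query, here is a minimal sketch of the whole SELECT translated step by step. The sample rows are made up for illustration, and the text column is assumed to be called statement as in the SQL (the pandas attempt in the question calls it sentence).

```python
import pandas as pd

# Hypothetical sample data standing in for esg.group_dist
esg_group_dist = pd.DataFrame({
    "company": ["A", "A", "A", "B", "B", "B"],
    "topic": ["green energy"] * 5 + ["waste"],
    "probability": [0.9, 0.8, 0.7, 0.6, 0.5, 0.4],
    "distance": [0.3, 0.7, 0.7, 0.2, 0.1, 0.9],
    "statement": ["Solar Farm", "Wind Park", "Hydro Plant",
                  "Biogas", "Coal Exit", "Recycling"],
})

# Inner query: LOWER(statement) and
# dense_rank() OVER (PARTITION BY company, topic ORDER BY distance DESC)
ranked = esg_group_dist.assign(
    statement=esg_group_dist["statement"].str.lower(),
    rank=esg_group_dist.groupby(["company", "topic"])["distance"]
        .rank(method="dense", ascending=False)
        .astype(int),
)

# Outer query: WHERE rank = 1 AND topic IN ('green energy')
# ORDER BY company, topic, rank, then SELECT company, topic, statement
result = (
    ranked[(ranked["rank"] == 1) & ranked["topic"].isin(["green energy"])]
    .sort_values(["company", "topic", "rank"])
    [["company", "topic", "statement"]]
    .reset_index(drop=True)
)
print(result)
```

Because the rank is dense, rows tied for the largest distance in a group all get rank 1 and all survive the filter, which is what the SQL does as well.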

However, looking at your entire query, it looks like you're trying to extract the rows where distance is maximum within each group. You can do so faster with:

(esg_group_dist[['company', 'topic', 'probability', 'distance', 'sentence']]
     .sort_values('distance')                            # sort by distance, ascending
     .drop_duplicates(['company','topic'], keep='last')  # keep the last (max-distance) row per group
     .query('topic=="green energy"')                     # filter topic
)

Note: to find the minimum rows instead, remove ascending=False (in the rank call) and keep='last' (in drop_duplicates). There is also the groupby().idxmin()/idxmax() option.
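The idxmax route could look roughly like the sketch below, again on made-up data. One thing to be aware of: idxmax (like drop_duplicates) returns a single row per group, whereas DENSE_RANK() = 1 keeps every row tied for the top value. If ties matter, comparing against the per-group maximum is one way to keep them; this is offered as an illustration rather than the only approach.

```python
import pandas as pd

# Hypothetical sample data; company A has two rows tied at the max distance
df = pd.DataFrame({
    "company": ["A", "A", "A", "B"],
    "topic": ["green energy"] * 4,
    "distance": [0.3, 0.7, 0.7, 0.2],
})

# idxmax gives the index label of the first max-distance row in each group
idx = df.groupby(["company", "topic"])["distance"].idxmax()
top_one = df.loc[idx]

# Keep all rows tied for the max distance, matching DENSE_RANK() = 1
group_max = df.groupby(["company", "topic"])["distance"].transform("max")
top_with_ties = df[df["distance"] == group_max]

print(top_one)
print(top_with_ties)
```

Swapping idxmax for idxmin and "max" for "min" gives the minimum-distance rows instead.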


 