
pandas groupby and rank within groups, starting from 1 in each group

I have a dataframe:

import pandas as pd

df = pd.DataFrame([[1, 'a'],
                    [1, 'a'],
                    [1, 'b'],
                    [1, 'a'],
                    [2, 'a'],
                    [2, 'b'],
                    [2, 'a'],
                    [2, 'b'],
                    [3, 'b'],
                    [3, 'a'],
                    [3, 'b'],
                   ], columns=['session', 'issue'])
df


I would like to rank issues within sessions by how often each issue occurs. I tried:

df.groupby(['session', 'issue']).size().rank(ascending=False, method='dense')

session  issue
1        a        1.0
         b        3.0
2        a        2.0
         b        2.0
3        a        3.0
         b        2.0
dtype: float64

What I need is a result like this:

  1. For session=1, there are three a issues and one b issue, so the ranks should be a = 1 and b = 2.
  2. For session=2, both issues occur twice, so they should share the same rank, 1.
  3. For session=3, there are two b issues and one a issue, so the ranks should be b = 1 and a = 2.

Also, why don't the ranks start again from 1 in each group?

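The first attempt calls rank on the whole Series of counts, so the ranks are computed across all sessions instead of restarting within each one. Group the counts by the first level of the MultiIndex (session) and then use the groupby rank (SeriesGroupBy.rank here, since size() returns a Series):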

s = (df.groupby(['session', 'issue'])
        .size()
        .groupby(level=0)
        .rank(ascending=False, method='dense'))
print(s)
session  issue
1        a        1.0
         b        2.0
2        a        1.0
         b        1.0
3        a        2.0
         b        1.0
dtype: float64
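
If a flat table with integer ranks is easier to work with, the same idea can be sketched as below. This is only one possible variant, not part of the answer above: the column names count and rank are illustrative choices, and the cast to int assumes there are no missing counts.

```python
import pandas as pd

# Same data as in the question, written column-wise.
df = pd.DataFrame({
    'session': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    'issue': list('aabaababbab'),
})

# Count occurrences of each issue per session as a flat DataFrame.
# 'count' is an illustrative column name.
counts = df.groupby(['session', 'issue']).size().reset_index(name='count')

# Rank the counts within each session; dense ranking gives ties the same rank.
# The cast to int is safe here because no count is missing.
counts['rank'] = (counts.groupby('session')['count']
                        .rank(ascending=False, method='dense')
                        .astype(int))
print(counts)
```

Grouping by the session column here plays the same role as groupby(level=0) on the MultiIndex in the answer.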

