How to generate a column from the unique groupby combinations in a cumulative way?

My data looks like this:

import pandas as pd

df_dict = {
    'Year': [2021, 2021, 2021, 2021, 2022, 2022, 2022, 2022],
    'Week of Year': [1, 1, 2, 2, 10, 10, 11, 11]
}

df = pd.DataFrame(df_dict)

How can I generate a new column, say Week Order, that numbers the unique Year, Week of Year combinations in a cumulative way? The resulting data set should look like this:

    Year    Week of Year    Week Order
0   2021    1               1
1   2021    1               1
2   2021    2               2
3   2021    2               2
4   2022    10              3
5   2022    10              3
6   2022    11              4
7   2022    11              4

You can use pandas.factorize:

df['Week Order'] = df.agg(tuple, axis=1).factorize()[0] + 1

# Output:

print(df)

   Year  Week of Year  Week Order
0  2021             1          1
1  2021             1          1
2  2021             2          2
3  2021             2          2
4  2022            10          3
5  2022            10          3
6  2022            11          4
7  2022            11          4
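
To see what the one-liner does, it can help to split it into steps. The following is only a sketch: it rebuilds the sample frame from the question, and the names row_keys, codes and uniques are illustrative, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2021, 2021, 2021, 2021, 2022, 2022, 2022, 2022],
    'Week of Year': [1, 1, 2, 2, 10, 10, 11, 11],
})

# Each row becomes one hashable key, e.g. (2021, 1).
row_keys = df[['Year', 'Week of Year']].agg(tuple, axis=1)

# factorize returns (codes, uniques); codes start at 0 and follow
# the order in which each key first appears.
codes, uniques = row_keys.factorize()

# Shift to 1-based numbering.
df['Week Order'] = codes + 1
print(df)
```

Because the codes follow first appearance, the numbering is chronological only if the rows are already in chronological order, as they are in the sample data.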

Here is one way to do it:

# assumes rows that share a Year / Week of Year combination are adjacent
df['week order'] = 1
df['week order'] = df['week order'].mask(df.duplicated()).cumsum().ffill().astype(int)
df

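Or:

# note: the shift(-1) step yields the expected numbering here because every
# combination appears exactly twice; it does not generalize to other group sizes
df['week order'] = df.duplicated().cumsum().shift(-1).ffill().astype(int)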
    Year    Week of Year    week order
0   2021               1        1
1   2021               1        1
2   2021               2        2
3   2021               2        2
4   2022              10        3
5   2022              10        3
6   2022              11        4
7   2022              11        4
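
The sample data has exactly two rows per combination, so it is worth checking how the two variants behave otherwise. The sketch below assumes hypothetical data in which one combination appears three times, and passes the key columns to duplicated explicitly; the names dup, first and second are illustrative.

```python
import pandas as pd

# Hypothetical data: the first combination appears three times.
df = pd.DataFrame({
    'Year': [2021, 2021, 2021, 2022, 2022],
    'Week of Year': [1, 1, 1, 10, 10],
})
cols = ['Year', 'Week of Year']
dup = df.duplicated(cols)

# Variant 1: put 1 on first occurrences, blank out repeats,
# then take the running sum and forward-fill the gaps.
first = pd.Series(1, index=df.index).mask(dup).cumsum().ffill().astype(int)

# Variant 2: running count of repeats, shifted up by one row.
second = dup.cumsum().shift(-1).ffill().astype(int)

print(first.tolist())   # [1, 1, 1, 2, 2]
print(second.tolist())  # [1, 2, 2, 3, 3]
```

On this input the mask-based variant still numbers the combinations as intended, while the shift-based variant does not, so the latter seems best reserved for data that matches the two-rows-per-combination shape of the question.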

Another option is sort_values + duplicated + cumsum, i.e. every non-duplicated Year + Week of Year combination increases the order by one:

cols = ['Year', 'Week of Year']
df['Week Order'] = (~df.sort_values(cols).duplicated(cols)).cumsum()

df
   Year  Week of Year  Week Order
0  2021             1           1
1  2021             1           1
2  2021             2           2
3  2021             2           2
4  2022            10           3
5  2022            10           3
6  2022            11           4
7  2022            11           4
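
One property of this approach is that the result is assigned back by index, so the input rows do not need to be sorted. A small sketch, assuming a hypothetical shuffled version of the data (the intermediate names ordered and is_first are illustrative):

```python
import pandas as pd

# Hypothetical input: combinations from the question, rows out of order.
df = pd.DataFrame({
    'Year': [2022, 2021, 2022, 2021, 2021, 2022],
    'Week of Year': [11, 2, 10, 1, 2, 10],
})
cols = ['Year', 'Week of Year']

# Sort chronologically; the original index labels travel with the rows.
ordered = df.sort_values(cols)

# True on the first row of each combination in sorted order.
is_first = ~ordered.duplicated(cols)

# The running count is aligned back to df by index on assignment.
df['Week Order'] = is_first.cumsum()
print(df)
```

Here (2021, 1) receives 1 and (2022, 11) receives 4 regardless of where those rows sit in the frame.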
