Creating a ranking in Python from a given value by id
I have this dataset:
import pandas as pd

dic = {'id':[1,1,1,1,1,2,2,2,2], 'sales': [100.00, 200.00, 300.00, 400.00, 500.00, 100.00, 200.00, 300.00, 400.00], 'year_month': [202201, 202202, 0, 202204, 202205, 202201, 202202, 202203, 0]}
df = pd.DataFrame(dic)
Output:
id sales year_month
0 1 100.0 202201
1 1 200.0 202202
2 1 300.0 0
3 1 400.0 202204
4 1 500.0 202205
5 2 100.0 202201
6 2 200.0 202202
7 2 300.0 202203
8 2 400.0 0
I want a rank column that, per id, is 0 on the row where year_month is zero, increases by 1 for each row after it, and decreases by 1 for each row before it, like this:
id sales year_month rank
0 1 100.0 202201 -2
1 1 200.0 202202 -1
2 1 300.0 0 0
3 1 400.0 202204 1
4 1 500.0 202205 2
5 2 100.0 202201 -3
6 2 200.0 202202 -2
7 2 300.0 202203 -1
8 2 400.0 0 0
How do I create the rank column?
I came up with this. It seems more complicated than it needs to be, but it works:
difference = [(df[(df.id == id) & (df.year_month == year)].index - df[(df.id == id) & (df.year_month == 0)].index)[0] for id in df.id.unique() for year in df[df.id == id].year_month]
df['new'] = difference
which indeed gives:
0 1 100.0 202201 -2
1 1 200.0 202202 -1
2 1 300.0 0 0
3 1 400.0 202204 1
4 1 500.0 202205 2
5 2 100.0 202201 -3
6 2 200.0 202202 -2
7 2 300.0 202203 -1
8 2 400.0 0 0
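The same idea (row label minus the label of the zero row within each id) may be easier to follow as an explicit loop over groups. This is only a sketch, under the assumptions that the index is the default consecutive integer index and that every id has exactly one row with year_month equal to 0; the names parts and zero_label are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 2, 2, 2, 2],
    'sales': [100.0, 200.0, 300.0, 400.0, 500.0, 100.0, 200.0, 300.0, 400.0],
    'year_month': [202201, 202202, 0, 202204, 202205, 202201, 202202, 202203, 0],
})

parts = []
for _, group in df.groupby('id', sort=False):
    # Index label of the row where year_month is 0 (assumed to exist once per id).
    zero_label = group.index[group['year_month'] == 0][0]
    # Offset of every row label in this group from the zero row's label.
    parts.append(pd.Series(group.index - zero_label, index=group.index))

# Assignment aligns on the index, so each offset lands on its own row.
df['new'] = pd.concat(parts)
print(df)
```

With the sample data this should reproduce the values shown above.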
Given a default index, sorted values in id, and sorted values in year_month (with 0 replacing one of the sorted values, so that it is always the minimum of its group), you can simply do:
df['rank'] = df.index - df.groupby('id')['year_month'].transform('idxmin')
print(df)
id sales year_month rank
0 1 100.0 202201 -2
1 1 200.0 202202 -1
2 1 300.0 0 0
3 1 400.0 202204 1
4 1 500.0 202205 2
5 2 100.0 202201 -3
6 2 200.0 202202 -2
7 2 300.0 202203 -1
8 2 400.0 0 0
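If the index is not the default one, subtracting index labels no longer gives row offsets. A possible variant, sketched here under the assumption that rows are already ordered within each id and that every id contains a row with year_month equal to 0, works with positions inside each group instead; pos and zero_pos are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 2, 2, 2, 2],
    'sales': [100.0, 200.0, 300.0, 400.0, 500.0, 100.0, 200.0, 300.0, 400.0],
    'year_month': [202201, 202202, 0, 202204, 202205, 202201, 202202, 202203, 0],
})

# Position of each row within its id group (0, 1, 2, ...), independent of index labels.
pos = df.groupby('id').cumcount()

# Keep the position only on the year_month == 0 row, then broadcast that
# value to every row of the same id ('first' skips the NaN entries).
zero_pos = pos.where(df['year_month'].eq(0)).groupby(df['id']).transform('first')

# An id without a zero row would leave NaN here and make astype(int) fail.
df['rank'] = (pos - zero_pos).astype(int)
print(df)
```

This relies on the zero row being found by value rather than by being the group minimum, so it should also tolerate year_month values that are not sorted.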