How to groupby 4 columns and rank based on another column?
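I have a Pandas DataFrame df
that contains source coordinates, destination coordinates, and the cost of going from source to destination.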
SRCLAT SRCLONG DESTLAT DESTLONG PRICE
43.5 47.5 103.5 104 50
43.5 47.5 103.5 104 100
43.5 47.5 103.5 104 100
43.5 30 90 80 300
43.5 30 90 80 400
90 80
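I am trying to compute a percentile rank of the price among rows that share the same source-to-destination coordinates, where the highest percentile is the lowest price, ignoring NaNs.
My desired output: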
SRCLAT SRCLONG DESTLAT DESTLONG PRICE PERCENTILE
43.5 47.5 103.5 104 50 100% (best price out of 3)
43.5 47.5 103.5 104 100 67% (tied for 2nd out of 3)
43.5 47.5 103.5 104 100 67% (tied for 2nd out of 3)
43.5 30 90 80 300 100% (best out of 2)
43.5 30 90 80 400 50% (worst out of 2)
90 80
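How can I do this?
I tried grouping by the 4 columns with
df.groupby(['SRCLAT', 'SRCLONG', 'DESTLAT', 'DESTLONG']).size()
to get the size of each unique group, but I am confused about where to go from here.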
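For reference, the grouping attempt above can be run as the sketch below. The sample frame is rebuilt from the table in the question. The incomplete last row is assumed to have NaN in SRCLAT, SRCLONG and PRICE, since the question does not show which cells are missing. The names keys and sizes are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; the NaN layout of the last row is an assumption.
df = pd.DataFrame({
    "SRCLAT":   [43.5, 43.5, 43.5, 43.5, 43.5, np.nan],
    "SRCLONG":  [47.5, 47.5, 47.5, 30.0, 30.0, np.nan],
    "DESTLAT":  [103.5, 103.5, 103.5, 90.0, 90.0, 90.0],
    "DESTLONG": [104, 104, 104, 80, 80, 80],
    "PRICE":    [50, 100, 100, 300, 400, np.nan],
})

keys = ["SRCLAT", "SRCLONG", "DESTLAT", "DESTLONG"]

# Number of rows per unique (source, destination) pair.
# By default groupby leaves out rows whose key columns contain NaN.
sizes = df.groupby(keys).size()
print(sizes)
```

The group sizes (3 and 2 here) are the denominators of the "out of 3" and "out of 2" percentiles in the desired output.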
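Use rank
with method='max', together with ascending=False (so the lowest price gets the highest rank) and pct=True (so the rank is returned as a fraction rather than a position):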
c = ['SRCLAT', 'SRCLONG', 'DESTLAT', 'DESTLONG']
d = {'pct': True, 'ascending': False, 'method': 'max'}
df.assign(PERCENTILE=df.groupby(c)['PRICE'].rank(**d))
SRCLAT SRCLONG DESTLAT DESTLONG PRICE PERCENTILE
0 43.5 47.5 103.5 104 50 1.000000
1 43.5 47.5 103.5 104 100 0.666667
2 43.5 47.5 103.5 104 100 0.666667
3 43.5 30.0 90.0 80 300 1.000000
4 43.5 30.0 90.0 80 400 0.500000
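A self-contained version of the above, as a minimal sketch. The sample frame is rebuilt from the question, and the incomplete last row is assumed to have NaN in SRCLAT, SRCLONG and PRICE. The PERCENTILE_LABEL column and the names keys and pct are illustrative additions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; the NaN layout of the last row is an assumption.
df = pd.DataFrame({
    "SRCLAT":   [43.5, 43.5, 43.5, 43.5, 43.5, np.nan],
    "SRCLONG":  [47.5, 47.5, 47.5, 30.0, 30.0, np.nan],
    "DESTLAT":  [103.5, 103.5, 103.5, 90.0, 90.0, 90.0],
    "DESTLONG": [104, 104, 104, 80, 80, 80],
    "PRICE":    [50, 100, 100, 300, 400, np.nan],
})

keys = ["SRCLAT", "SRCLONG", "DESTLAT", "DESTLONG"]

# Rank prices within each route:
#   ascending=False -> the highest price gets rank 1, the lowest price the top rank
#   method="max"    -> tied prices all receive the highest rank of their tie
#   pct=True        -> ranks are returned as fractions
# Hand check for the first route (50, 100, 100): the two 100s tie for
# ranks 1 and 2 and both get 2, so 2/3; the 50 gets rank 3, so 3/3.
pct = df.groupby(keys)["PRICE"].rank(pct=True, ascending=False, method="max")

result = df.assign(PERCENTILE=pct)

# Optional: format as a percentage string, leaving missing prices blank.
result["PERCENTILE_LABEL"] = result["PERCENTILE"].map(
    lambda p: "" if pd.isna(p) else f"{p:.0%}"
)
print(result)
```

The row with the missing price keeps NaN in PERCENTILE, which matches the "ignoring NaNs" requirement in the question.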