How to groupby 4 columns and rank based on another column?
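I have a Pandas DataFrame df
that contains source coordinates, destination coordinates, and the price of going from source to destination.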
SRCLAT SRCLONG DESTLAT DESTLONG PRICE
43.5 47.5 103.5 104 50
43.5 47.5 103.5 104 100
43.5 47.5 103.5 104 100
43.5 30 90 80 300
43.5 30 90 80 400
90 80
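I am trying to compute a percentile rank of the price among rows that share the same source-to-destination coordinates, where the highest percentile is the lowest price, ignoring NaNs.
My desired output: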
SRCLAT SRCLONG DESTLAT DESTLONG PRICE PERCENTILE
43.5 47.5 103.5 104 50 100% (best price out of 3)
43.5 47.5 103.5 104 100 67% (tied for 2nd out of 3)
43.5 47.5 103.5 104 100 67% (tied for 2nd out of 3)
43.5 30 90 80 300 100% (best out of 2)
43.5 30 90 80 400 50% (worst out of 2)
90 80
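How can I do this?
I tried grouping by the 4 columns with
df.groupby(['SRCLAT', 'SRCLONG', 'DESTLAT', 'DESTLONG']).size()
to get the size of each unique group, but I'm confused about where to go from here.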
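Use rank with method='max'. Passing ascending=False gives the lowest price in each group the largest rank, and pct=True divides each rank by the number of non-NaN prices in the group.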
c = ['SRCLAT', 'SRCLONG', 'DESTLAT', 'DESTLONG']
d = {'pct': True, 'ascending': False, 'method': 'max'}
df.assign(PERCENTILE=df.groupby(c)['PRICE'].rank(**d))
SRCLAT SRCLONG DESTLAT DESTLONG PRICE PERCENTILE
0 43.5 47.5 103.5 104 50 1.000000
1 43.5 47.5 103.5 104 100 0.666667
2 43.5 47.5 103.5 104 100 0.666667
3 43.5 30.0 90.0 80 300 1.000000
4 43.5 30.0 90.0 80 400 0.500000
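For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The frame is rebuilt from the sample data in the question. The last row is an assumption: the question shows it only as "90 80", so here it is taken to have missing source coordinates and a missing price. The PERCENT_LABEL column and the names keys, pct and result are hypothetical additions for illustration only.

```python
import numpy as np
import pandas as pd

# Sample data from the question. The last row is an assumed reading of the
# incomplete "90 80" row: missing source coordinates and missing price.
df = pd.DataFrame({
    "SRCLAT":   [43.5, 43.5, 43.5, 43.5, 43.5, np.nan],
    "SRCLONG":  [47.5, 47.5, 47.5, 30.0, 30.0, np.nan],
    "DESTLAT":  [103.5, 103.5, 103.5, 90.0, 90.0, 90.0],
    "DESTLONG": [104, 104, 104, 80, 80, 80],
    "PRICE":    [50, 100, 100, 300, 400, np.nan],
})

keys = ["SRCLAT", "SRCLONG", "DESTLAT", "DESTLONG"]

# Rank prices within each route. ascending=False makes the cheapest price the
# largest rank, method="max" gives tied prices the largest rank in the tie,
# and pct=True divides by the count of non-NaN prices in the group.
pct = df.groupby(keys)["PRICE"].rank(pct=True, ascending=False, method="max")

# assign aligns on the index, so rows without a rank simply stay NaN.
result = df.assign(PERCENTILE=pct)

# Optional display column (hypothetical): a percent string, blank where NaN.
result["PERCENT_LABEL"] = result["PERCENTILE"].map(
    lambda v: "" if pd.isna(v) else f"{v:.0%}"
)

print(result)
```

The NaN price is left unranked because rank defaults to na_option='keep', so it does not count toward the group size used for the percentile.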