Pandas: get a different number of top records in each group

Question

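In this case, I want to retrieve the highest values for each id, but a different number of them per id. That is, I am looking for the 5 highest values for 'id'=1, the 3 highest values for 'id'=2, and so on. I have this code, which only returns a fixed number of values per group.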

import numpy as np
import pandas as pd

df = pd.DataFrame({'id':[1,1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4]})
df['value'] = np.random.randint(0, 99, df.shape[0])
df.groupby(['id']).apply(lambda x: x.nlargest(2,['value'])).reset_index(drop=True)

id = 1 --> 5
id = 2 --> 3
id = 3 --> 2
id = 4 --> 2

Answer 1

IIUC:

def my_largest(d):
    # define a dictionary with the specific
    # number of largest rows to grab for
    # each `'id'`
    nlim = {1: 5, 2: 3, 3: 2, 4: 2}

    # When passing a dataframe from a
    # `groupby` to the callable used in
    # the `apply`, Pandas will attach an
    # attribute `name` to that dataframe
    # whose value is the distinct group
    # the dataframe represents.  In this
    # case, that will be the `'id'` because
    # we grouped by `'id'`
    k = nlim[d.name]
    return d.nlargest(k, ['value'])

df.groupby('id').apply(my_largest).reset_index(drop=True)

    id  value
0    1     96
1    1     83
2    1     58
3    1     49
4    1     43
5    2     66
6    2     40
7    2     33
8    3     90
9    3     54
10   4     83
11   4     23
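
For comparison, here is a minimal sketch of the same selection without `apply`. This is a different technique from the one above: it sorts once, ranks rows inside each group with `cumcount`, and keeps rows whose rank is below the limit for their `'id'`. The fixed seed and the names `ordered`, `rank` and `result` are assumptions added for illustration only.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question; the seed is an assumption
# added here only so that repeated runs give the same frame.
rng = np.random.default_rng(0)
df = pd.DataFrame({'id': [1,1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4]})
df['value'] = rng.integers(0, 99, len(df))

nlim = {1: 5, 2: 3, 3: 2, 4: 2}

# Sort so the largest values come first within each id.
ordered = df.sort_values(['id', 'value'], ascending=[True, False])

# Position of each row inside its group: 0 for the largest, 1 for the next, ...
rank = ordered.groupby('id').cumcount()

# Keep a row only if its position is below the limit for its id.
result = ordered[rank < ordered['id'].map(nlim)].reset_index(drop=True)
print(result)
```

Because it never calls a Python function per group, this version does not depend on the `name` attribute, and it should return the same values per group as the `nlargest` approach.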

Same thing, but with a more general function

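Now the function accepts any specification dictionary. I have also included a parameter that supplies a default value when an 'id' does not appear in the specification dictionary.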

def my_largest(d, nlrg_dict, nlrg_dflt=5, **kw):
    k = nlrg_dict.get(d.name, nlrg_dflt)
    return d.nlargest(k, **kw)

Now you can see that we define the dictionary outside the function...

nlim = {1: 5, 2: 3, 3: 2, 4: 2}

...and pass it to the function via apply

df.groupby('id').apply(
    my_largest, nlrg_dict=nlim, columns=['value']
).reset_index(drop=True)
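
To see the default fallback in action, here is a small self-contained sketch on hand-made data. It is not the `apply` version above: it loops over the groups and concatenates the pieces, which gives the same per-group `nlargest` logic without relying on `d.name`. The helper name `top_per_group`, its parameters and the sample values are all hypothetical.

```python
import pandas as pd

def top_per_group(df, key, column, limits, default=5):
    """Return the largest `column` rows per `key` group.

    `limits` maps a group key to the number of rows to keep;
    groups missing from `limits` fall back to `default`.
    """
    parts = [
        group.nlargest(limits.get(name, default), column)
        for name, group in df.groupby(key)
    ]
    return pd.concat(parts, ignore_index=True)

# Illustrative data: id 9 is deliberately absent from the limits dict.
df = pd.DataFrame({
    'id':    [1, 1, 1, 2, 2, 2, 9, 9, 9],
    'value': [10, 30, 20, 5, 7, 6, 1, 3, 2],
})

result = top_per_group(df, 'id', 'value', limits={1: 2, 2: 1}, default=2)
print(result)
```

Here id 1 keeps two rows and id 2 keeps one, as specified, while id 9 is not in the dictionary and so keeps `default=2` rows.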

Solution 1
2 2020-03-27 16:10:07