Pandas Dataframe: get the minimum difference between two columns

Question

Dataframe

import pandas as pd

d = {'Resource': ['A','A','A','B','B','B'], 'User': ['1','2','3','4','5','6'], 'earliestSlot': [1,2,3,5,4,6], 'latestSlot': [1.2,2.5,3.9,6,5,6.1]}
df = pd.DataFrame(data=d)

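I want to aggregate the data so that, for each resource, it computes the minimum difference between the latestSlot of one user and the earliestSlot of a different user of the same resource. Basically, I want the minimum idle time of each resource before the next user accesses it.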

Desired output

Resource	MinHeadway
A	0.5
B	0

I have the following code, but I'm sure there is a faster way.

```

def get_min_headway(resource_id):
    # All rows that belong to this resource
    rows = df[df['Resource'] == resource_id]
    latestSlots = rows.latestSlot
    earliestSlots = rows.earliestSlot
    min_headway = float('inf')
    # Compare every user's end time with every user's start time
    for time in latestSlots:
        headways = earliestSlots - time
        for headway in headways:
            # Only non-negative gaps count as idle time
            if headway >= 0 and headway < min_headway:
                min_headway = headway
    return min_headway

df['min_headway'] = df['Resource'].apply(get_min_headway)

```
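The loop above compares every user's latestSlot with every user's earliestSlot on the same resource. That pairwise comparison can be vectorized with a self-merge on Resource. A sketch under two assumptions: the DataFrame is rebuilt from the question's dict, and pairs of a user with itself are dropped (the question asks about different users, although the loop itself does not exclude them):

```python
import pandas as pd

d = {'Resource': ['A','A','A','B','B','B'], 'User': ['1','2','3','4','5','6'],
     'earliestSlot': [1,2,3,5,4,6], 'latestSlot': [1.2,2.5,3.9,6,5,6.1]}
df = pd.DataFrame(data=d)

# Pair every user with every user of the same resource (self-merge on Resource).
pairs = df.merge(df, on='Resource', suffixes=('_prev', '_next'))
# Assumption: a user paired with itself is not a headway, so drop those pairs.
pairs = pairs[pairs['User_prev'] != pairs['User_next']]

# Gap between the end of one booking and the start of another.
pairs['headway'] = pairs['earliestSlot_next'] - pairs['latestSlot_prev']

# Keep only non-negative gaps, as the loop does, then take the minimum per resource.
min_headway = (pairs[pairs['headway'] >= 0]
               .groupby('Resource')['headway']
               .min()
               .rename('MinHeadway')
               .reset_index())
print(min_headway)
```

This keeps the loop's semantics (any pair with a non-negative gap, not only consecutive users), but the merge materializes every pair within a resource, so memory grows quadratically with the number of users per resource. A resource with no non-negative gap is absent from the result instead of getting inf.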

Answer 1

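It's not entirely clear what result you want, but based on "minimum idle time", I assume you want the minimum difference between the previous user's "latestSlot" and the next user's "earliestSlot" (since the unused time in between is idle time). In that case, you can use the following.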

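We sort by "earliestSlot", then group by "Resource" and, within each group, subtract the previous row's "latestSlot" (obtained with shift()) from each "earliestSlot" and take the minimum.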

out = (df.sort_values(by='earliestSlot')
       .groupby('Resource')
       .apply(lambda x: (x['earliestSlot']-x['latestSlot'].shift()).min())
       .reset_index()
       .rename(columns={0:'MinHeadway'}))

Output:

  Resource  MinHeadway
0        A         0.5
1        B         0.0
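To see what the lambda computes, the intermediate values can be kept as columns. A minimal sketch that rebuilds the sample data from the question; the column names prevLatest and gap are hypothetical, introduced only for illustration:

```python
import pandas as pd

d = {'Resource': ['A','A','A','B','B','B'], 'User': ['1','2','3','4','5','6'],
     'earliestSlot': [1,2,3,5,4,6], 'latestSlot': [1.2,2.5,3.9,6,5,6.1]}
df = pd.DataFrame(data=d)

# Sort by start time so that, within each resource, shift() yields the previous user.
tmp = df.sort_values(by='earliestSlot')
# latestSlot of the preceding user on the same resource (NaN for the first user).
tmp = tmp.assign(prevLatest=tmp.groupby('Resource')['latestSlot'].shift())
# Idle gap between the previous user's end and this user's start.
tmp = tmp.assign(gap=tmp['earliestSlot'] - tmp['prevLatest'])
print(tmp)
```

Rows 0 and 4 (the earliest user of A and of B) have no predecessor, so their gap is NaN; the minimum of the remaining gaps is 0.5 for A and 0.0 for B, matching the output above.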

The same result can be obtained without apply and a lambda:

tmp = df.sort_values(by='earliestSlot')
out = (tmp.groupby('Resource')['latestSlot'].shift()
       .rsub(tmp['earliestSlot'])
       .groupby(tmp['Resource']).min()
       .reset_index()
       .rename(columns={0:'MinHeadway'}))
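One detail of both versions: the first user of each resource has no predecessor, so shift() yields NaN there and groupby().min() skips it. A resource with a single user therefore ends up with NaN. A sketch showing this, wrapping the lambda-free version in a hypothetical helper min_headway and adding a hypothetical one-user resource C to the question's data; naming the Series with rename also avoids renaming column 0 afterwards:

```python
import pandas as pd

def min_headway(df: pd.DataFrame) -> pd.DataFrame:
    """Minimum gap between consecutive users of each resource (hypothetical helper)."""
    tmp = df.sort_values(by='earliestSlot')
    # Previous user's latestSlot within the same resource; NaN for the first user.
    prev_latest = tmp.groupby('Resource')['latestSlot'].shift()
    gaps = tmp['earliestSlot'] - prev_latest
    # groupby().min() skips NaN; a resource whose gaps are all NaN stays NaN.
    return (gaps.groupby(tmp['Resource']).min()
                .rename('MinHeadway')
                .reset_index())

# Question's data plus a hypothetical resource 'C' with only one user.
d = {'Resource': ['A','A','A','B','B','B','C'], 'User': ['1','2','3','4','5','6','7'],
     'earliestSlot': [1,2,3,5,4,6,7], 'latestSlot': [1.2,2.5,3.9,6,5,6.1,8]}
out = min_headway(pd.DataFrame(data=d))
print(out)
```

If a different value is preferred for single-user resources (for example inf, as in the question's loop), it can be filled in afterwards with fillna.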
