
Pandas Dataframe get minimum difference between two columns

Dataframe

import pandas as pd

d = {'Resource': ['A','A','A','B','B','B'], 'User': ['1','2','3','4','5','6'], 'earliestSlot': [1,2,3,5,4,6], 'latestSlot': [1.2,2.5,3.9,6,5,6.1]}
df = pd.DataFrame(data=d)

I want to aggregate the data so that, for each resource, I get the minimum difference between the latestSlot of one user and the earliestSlot of a different user of the same resource. Basically, I want to calculate the minimum idle time of each resource before the next user accesses it.

Target Data

Resource  MinHeadway
A         0.5
B         0

I have the following code, but I am sure there is a faster method.

```
def get_min_headway(resource_id):
    # Slots of all users of this resource
    latestSlots = df[df['Resource'] == resource_id].latestSlot
    earliestSlots = df[df['Resource'] == resource_id].earliestSlot
    min_headway = float('inf')
    for time in latestSlots:
        # Gap between every user's earliestSlot and this latestSlot
        headways = earliestSlots - time
        for headway in headways:
            # Keep only non-negative gaps and track the smallest one
            if headway >= 0:
                if headway < min_headway:
                    min_headway = headway
    return min_headway

df['min_headway'] = df['Resource'].apply(get_min_headway)
```

It's not exactly clear what your desired outcome is, but by minimum idle time, I assumed you wanted the minimum difference between the "latestSlot" of the previous user and the "earliestSlot" of the next user (since unused time is idle time). In that case, you can use the following.

We sort by "earliestSlot", then group by "Resource" and do exactly what's explained above.

out = (df.sort_values(by='earliestSlot')
       .groupby('Resource')
       # next user's earliestSlot minus previous user's latestSlot, then the smallest gap
       .apply(lambda x: (x['earliestSlot']-x['latestSlot'].shift()).min())
       .reset_index()
       .rename(columns={0:'MinHeadway'}))

Output:

  Resource  MinHeadway
0        A         0.5
1        B         0.0
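
To see where these numbers come from, the per-row gaps can be inspected before they are aggregated. This is a minimal sketch, assuming the sample data from the question; the column name `gap` is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Resource': ['A', 'A', 'A', 'B', 'B', 'B'],
    'User': ['1', '2', '3', '4', '5', '6'],
    'earliestSlot': [1, 2, 3, 5, 4, 6],
    'latestSlot': [1.2, 2.5, 3.9, 6, 5, 6.1],
})

tmp = df.sort_values(by='earliestSlot')

# latestSlot of the previous user of the same resource
# (NaN for the first user of each resource)
prev_latest = tmp.groupby('Resource')['latestSlot'].shift()

# 'gap' is an illustrative column name, not part of the original data
gaps = tmp.assign(gap=tmp['earliestSlot'] - prev_latest)
print(gaps[['Resource', 'User', 'gap']])
```

The first user of each resource has no predecessor, so its gap is NaN, and `min()` skips NaN values by default.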

The same result could also be obtained without applying a lambda:

tmp = df.sort_values(by='earliestSlot')
out = (tmp.groupby('Resource')['latestSlot'].shift()
       .rsub(tmp['earliestSlot'])
       .groupby(tmp['Resource']).min()
       .reset_index()
       .rename(columns={0:'MinHeadway'}))
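
As a quick sanity check, the two variants can be compared on the sample data. This is only a sketch, assuming the sample frame from the question; the names `via_apply` and `via_shift` are illustrative. Both variants compare only consecutive users (ordered by earliestSlot), unlike the loop in the question, which looks at every pair of users.

```python
import pandas as pd

df = pd.DataFrame({
    'Resource': ['A', 'A', 'A', 'B', 'B', 'B'],
    'User': ['1', '2', '3', '4', '5', '6'],
    'earliestSlot': [1, 2, 3, 5, 4, 6],
    'latestSlot': [1.2, 2.5, 3.9, 6, 5, 6.1],
})

tmp = df.sort_values(by='earliestSlot')

# Variant 1: lambda applied per group (only the two slot columns are passed in)
via_apply = (tmp.groupby('Resource')[['earliestSlot', 'latestSlot']]
             .apply(lambda x: (x['earliestSlot'] - x['latestSlot'].shift()).min()))

# Variant 2: group-wise shift, then a plain vectorized subtraction
prev_latest = tmp.groupby('Resource')['latestSlot'].shift()
via_shift = (tmp['earliestSlot'] - prev_latest).groupby(tmp['Resource']).min()

# Raises an AssertionError if the two results differ
pd.testing.assert_series_equal(via_apply, via_shift, check_names=False)
print(via_shift)
```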
