Speeding up the search for the nearest upper and lower values in a large pandas dataframe

Question

My dataframe looks similar to the example below (just with way more entries). I want to obtain the nearest upper and lower number for a given value, for each group.

For example, for a value of 13, I would like to obtain a new dataframe similar to:

I already tried the solution from Ivo Merchiers in "How do I find the closest values in a Pandas series to an input number?", using groupby and apply to run it for the different groups.

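def find_neighbours(group, value):
    # groupby().apply() passes each group's sub-frame as the first argument
    exactmatch = group[group.num == value]
    if not exactmatch.empty:
        return exactmatch.index
    else:
        # closest value below and closest value above the target
        lowerneighbour_ind = group[group.num < value].num.idxmax()
        upperneighbour_ind = group[group.num > value].num.idxmin()
        return [lowerneighbour_ind, upperneighbour_ind]

df = df.groupby('a').apply(find_neighbours, 13)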

But since my dataset has around 16 million lines, this procedure takes extremely long. Is there a faster way to obtain a solution?

Edit: Thanks for your answers. I forgot to add some info. If a nearest number appears multiple times, I would like all of those lines transferred to the new dataframe. And when a group has only an upper (or only a lower) neighbour, its lines should be ignored.

For 13, this leads to:

Thanks for your help!

Answer 1

Yes, we can speed it up:

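import numpy as np

v=13

# signed distance of every row from the target value
s=(df.b-v)
# smallest absolute distance per group and per side (sign -1 = below, +1 = above)
t=s.abs().groupby([df.a,np.sign(s)]).transform('min')
# keep every row at that smallest distance, so ties are preserved
df1=df.loc[s.abs()==t]
# drop groups in which only one side has a neighbour
df1=df1[df1.b.sub(v).groupby(df.a).transform('nunique')>1]
df1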
Out[102]: 
      a   b
1   600  12
2   600  15
5   700  11
6   700  19
9   900  12
10  900  14
11  900  14
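
Since the example frame is not shown in the question, the following sketch runs the same steps on a made-up frame. The data, the helper name `nearest_by_side`, and its parameters are assumptions for illustration only; the values were picked so that the result has the same index as the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the asker's frame).
df = pd.DataFrame({
    "a": [600, 600, 600, 600, 700, 700, 700, 800, 800, 900, 900, 900, 900],
    "b": [10, 12, 15, 17, 8, 11, 19, 2, 12, 12, 14, 14, 18],
})

def nearest_by_side(df, v, group_col="a", value_col="b"):
    """Nearest rows below and above v per group, using the sign-bucket idea."""
    s = df[value_col] - v        # signed distance to the target
    dist = s.abs()
    side = np.sign(s)            # -1 below, +1 above, 0 exact match
    # smallest distance within each (group, side) bucket, broadcast to rows
    nearest = dist.groupby([df[group_col], side]).transform("min")
    hits = df.loc[dist == nearest]   # ties are all kept
    # a group needs two distinct signed distances to have both sides
    n_sides = (hits[value_col] - v).groupby(hits[group_col]).transform("nunique")
    return hits[n_sides > 1]

result = nearest_by_side(df, 13)
print(result)
```

In this sample, group 800 has only a lower neighbour (12), so it is dropped, while both rows with 14 in group 900 are kept. Note that an exact match (distance 0) lands in its own sign bucket; the sketch does not treat that case specially.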

Answer 2

Try this:

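def neighbours(x):
    # signed distance of every row from x (searches the whole frame, not per group)
    d = (df.b-x)
    # first row at the smallest positive distance, then first row at the largest negative distance
    return df.loc[[d[d==d[d>0].min()].index[0], d[d==d[d<0].max()].index[0]]]
neighbours(13)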
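
The function above looks at the whole frame and returns only the first row on each side. A per-group variant that keeps ties and skips one-sided groups might look like the sketch below. The sample data, the function name `neighbours_per_group`, and its parameters are hypothetical, and exact matches are excluded by the strict comparisons, which is an assumption rather than something the question specifies.

```python
import pandas as pd

# Hypothetical sample data (not the asker's frame).
df = pd.DataFrame({
    "a": [600, 600, 600, 600, 700, 700, 700, 800, 800, 900, 900, 900, 900],
    "b": [10, 12, 15, 17, 8, 11, 19, 2, 12, 12, 14, 14, 18],
})

def neighbours_per_group(df, x, group_col="a", value_col="b"):
    """Rows holding the closest value below and above x within each group."""
    below = df[df[value_col] < x]
    above = df[df[value_col] > x]
    # rows equal to their group's largest value below x (ties kept)
    lower = below[below[value_col] == below.groupby(group_col)[value_col].transform("max")]
    # rows equal to their group's smallest value above x (ties kept)
    upper = above[above[value_col] == above.groupby(group_col)[value_col].transform("min")]
    out = pd.concat([lower, upper]).sort_index()
    # keep only groups that have a neighbour on both sides
    has_both = out[group_col].isin(lower[group_col]) & out[group_col].isin(upper[group_col])
    return out[has_both]

per_group = neighbours_per_group(df, 13)
print(per_group)
```

Everything here is a boolean mask or a `groupby(...).transform(...)`, so no Python-level function is called once per group, which is usually what makes `groupby().apply()` slow on large frames.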


Answer 1: score 3, accepted, 2020-08-14 18:27:44

Answer 2: score 1, 2020-08-14 18:36:30
