Efficient way to find the max displacement between a repeating integer in a pandas DataFrame
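I want to efficiently find the maximum difference between two consecutive occurrences of the same integer. I could write a loop, but my dataset has more than 100,000 rows, which makes that very cumbersome. Does anyone have a suggestion?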
import numpy as np
import pandas as pd
data = np.random.randint(5,30,size=100000)
df = pd.DataFrame(data, columns=['random_numbers'])
Example: in my sample below, the maximum difference between two consecutive occurrences of 5 is 29 - 5 = 24.
df.loc[79:93].values
array([[ 5],
[17],
[ 7],
[15],
[25],
[23],
[24],
[22],
[21],
[29],
[25],
[28],
[13],
[19],
[ 5]])
You can try this:
g = df['random_numbers'].eq(5).cumsum()
df.groupby(g).max() - 5
Output with a smaller dataset:
data = np.random.randint(5,30,size=30)
# array([28, 19, 29, 22, 10, 18, 13, 14, 25, 24, 21, 24, 10, 20, 20, 5, 23,
# 8, 29, 22, 24, 24, 24, 19, 12, 5, 6, 14, 5, 15])
df = pd.DataFrame(data, columns=['rand_nums'])
g = df['rand_nums'].eq(5).cumsum()
# Look at both df and g
# print(pd.concat([df, g], axis=1))  # just for explanation.
rand_nums rand_nums
0 28 0 ⟶ group 1 starts here
1 19 0
2 29 0
3 22 0
4 10 0
5 18 0
6 13 0
7 14 0 # we take max from here i.e. 29.
8 25 0
9 24 0
10 21 0
11 24 0
12 10 0
13 20 0
14 20 0 ⟶ group1 ends here
15 5 1 ⟶ group2 starts here
16 23 1
17 8 1
18 29 1
19 22 1
20 24 1 # take max from here i.e 29
21 24 1
22 24 1
23 19 1
24 12 1 ⟶ group2 ends here.
25 5 2 ⟶ grp 3 starts here.
26 6 2 # take max from here i.e. 14
27 14 2 ⟶ grp 3 ends here.
28 5 3 ⟶ grp4 starts here. # take max from here i.e. 15
29 15 3 ⟶ grp4 ends here.
This gives us:
df.groupby(g).max() - 5
rand_nums
rand_nums
0 24
1 24
2 9
3 10
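Note that in the output above, group 0 lies before the first 5 and group 3 lies after the last 5, so neither is actually bounded by two occurrences. A minimal sketch of the same cumsum/groupby idea that keeps only the bounded groups follows. The helper name `max_between_occurrences` is hypothetical, and it assumes, as in the answer, that displacement means "segment maximum minus the target value" (which matches max minus min only when the target is the smallest value in the data).

```python
import pandas as pd

def max_between_occurrences(values, target):
    """Per-gap maximum displacement from `target` (hypothetical helper)."""
    s = pd.Series(values)
    is_target = s.eq(target)
    group_id = is_target.cumsum()  # increases by 1 at every occurrence of target
    n_hits = int(is_target.sum())
    # Group 0 precedes the first occurrence and group n_hits follows the last,
    # so only groups 1 .. n_hits - 1 are bounded by two occurrences.
    closed = (group_id >= 1) & (group_id < n_hits)
    return s[closed].groupby(group_id[closed]).max() - target

# The 30-value sample from the answer above.
data = [28, 19, 29, 22, 10, 18, 13, 14, 25, 24, 21, 24, 10, 20, 20, 5, 23,
        8, 29, 22, 24, 24, 24, 19, 12, 5, 6, 14, 5, 15]
gaps = max_between_occurrences(data, 5)
print(gaps.tolist())  # [24, 9]
print(gaps.max())     # 24
```

Taking `gaps.max()` then gives the single largest displacement over all bounded gaps.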
df.loc[79:93].max() - df.loc[79:93].min()
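Edit:
index_integer = df.index[df['random_numbers'] == 5]  # change 5 to your number
max_displ = []
for i in range(len(index_integer) - 1):
    # rows from one occurrence up to and including the next one
    window = df.loc[index_integer[i]:index_integer[i+1], 'random_numbers']
    max_displ.append(window.max() - window.min())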
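Using a list comprehension:
index_integer = df.index[df['random_numbers'] == 5]  # change 5 to your number
max_displ = [df.loc[index_integer[i]:index_integer[i+1], 'random_numbers'].max() - df.loc[index_integer[i]:index_integer[i+1], 'random_numbers'].min() for i in range(len(index_integer) - 1)]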
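For 100,000 rows, slicing the DataFrame once per occurrence can still be slow. One possible vectorized variant of the same max-minus-min idea is sketched below with NumPy's `ufunc.reduceat`. The function name `range_between_occurrences` is made up for illustration, and this is a swapped-in technique rather than anything from the answers above. Each segment runs from one occurrence up to (not including) the next; leaving out the closing occurrence does not change the result, because it equals the opening value.

```python
import numpy as np

def range_between_occurrences(values, target):
    """Max minus min for each stretch between consecutive occurrences of target."""
    arr = np.asarray(values)
    pos = np.flatnonzero(arr == target)  # positions where target occurs
    if pos.size < 2:
        # fewer than two occurrences: there is no bounded stretch
        return np.array([], dtype=arr.dtype)
    # reduceat reduces arr[pos[i]:pos[i+1]] for each i; the final entry covers
    # the open-ended tail after the last occurrence, so it is dropped.
    seg_max = np.maximum.reduceat(arr, pos)[:-1]
    seg_min = np.minimum.reduceat(arr, pos)[:-1]
    return seg_max - seg_min

sample = [28, 19, 29, 22, 10, 18, 13, 14, 25, 24, 21, 24, 10, 20, 20, 5, 23,
          8, 29, 22, 24, 24, 24, 19, 12, 5, 6, 14, 5, 15]
print(range_between_occurrences(sample, 5).tolist())  # [24, 9]
```

With the DataFrame from the question, this could be called as `range_between_occurrences(df['random_numbers'].to_numpy(), 5)`.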