Pandas: Split a data frame based on the slope of the data

Question

I have this data frame:

x = pd.DataFrame({'entity':[5,7,5,5,5,6,3,2,0,5]})

Update: I want a function that checks each group. If the slope is negative and the length of the group is more than 2, it should return True together with the start and end index of the group. For this case it should return: result = True, index = 5, index = 8

1- I want to split the data frame based on the slope. This example should have 6 groups.

2- How can I check the length of the groups?

I tried to get the groups with the code below, but I don't know how to split the data frame or how to check the length of each part.

New update: Thanks to Matt W. for his code. I finally found the solution.

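import numpy as np
import pandas as pd

def get_slope(df):
    # Least-squares slope of the first column ('entity') against the row index.
    x = np.array(df.iloc[:, 0].index)
    y = np.array(df.iloc[:, 0])
    X = x - x.mean()
    Y = y - y.mean()
    # A single-row group gives 0/0 here, which evaluates to NaN.
    slope = (X.dot(Y)) / (X.dot(X))
    return slope

df = pd.DataFrame({'entity':[5,7,5,5,5,6,3,2,0,5]})
df['diff'] = df.entity.diff().fillna(0)
# Treat every negative difference as the same "slope".
df.loc[df['diff'] < 0, 'diff'] = -1

# Start a new group label whenever 'diff' differs from the previous row.
init = [0]
for x in df['diff'] == df['diff'].shift(1):
    if x:
        init.append(init[-1])
    else:
        init.append(init[-1]+1)
df['g'] = init[1:]

df.groupby('g').apply(get_slope)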

Result

0    NaN
1    NaN
2    NaN
3    0.0
4    NaN
5   -1.5
6    NaN
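
The NaN entries most likely come from groups that contain a single row: with one point, both the numerator and the denominator of the slope formula are zero. The sketch below repeats the same closed-form formula on plain arrays, guards the single-point case, and compares the result with np.polyfit as a sanity check. The function name slope and the sample arrays are illustrative assumptions, not part of the original code.

```python
import numpy as np

def slope(x, y):
    # Closed-form least-squares slope, same formula as get_slope above.
    X = x - x.mean()
    Y = y - y.mean()
    denom = X.dot(X)
    # A single point has no spread in x, so the slope is undefined.
    return X.dot(Y) / denom if denom != 0 else np.nan

# Rows 6..8 of the example: entity values 3, 2, 0.
x = np.array([6, 7, 8])
y = np.array([3, 2, 0])

print(slope(x, y))              # expected to be close to -1.5
print(np.polyfit(x, y, 1)[0])   # degree-1 fit, first coefficient is the slope
print(slope(np.array([9]), np.array([5])))  # single point: nan
```

Guarding the zero denominator also avoids the runtime warning NumPy emits for 0/0.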

Answer 1

Take the difference and bfill() the start so that the 0th element has the same value as the next one. Then set all negative differences to the same value so they count as the same "slope". Then I shift the column to check whether each value equals the previous one, and iterate through the result to build a list of labels that increases whenever the value changes, assigning that to g.

df = pd.DataFrame({'entity':[5,7,5,5,5,6,3,2,0,5]})
df['diff'] = df.entity.diff().bfill()
df.loc[df['diff'] < 0, 'diff'] = -1

init = [0]
for x in df['diff'] == df['diff'].shift(1):
    if x:
        init.append(init[-1])
    else:
        init.append(init[-1]+1)
df['g'] = init[1:]
df
   entity  diff  g
0       5   2.0  1
1       7   2.0  1
2       5  -1.0  2
3       5   0.0  3
4       5   0.0  3
5       6   1.0  4
6       3  -1.0  5
7       2  -1.0  5
8       0  -1.0  5
9       5   5.0  6
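
For the two numbered points in the question (splitting the frame and checking group lengths), one option is to group on g. This is a minimal sketch that rebuilds the same labels as above; the names labels, sizes and parts are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'entity': [5, 7, 5, 5, 5, 6, 3, 2, 0, 5]})
df['diff'] = df['entity'].diff().bfill()
df.loc[df['diff'] < 0, 'diff'] = -1

# Same labelling loop as above: bump the label whenever 'diff' changes.
labels = [0]
for same in df['diff'] == df['diff'].shift(1):
    labels.append(labels[-1] if same else labels[-1] + 1)
df['g'] = labels[1:]

# Number of rows in each group.
sizes = df.groupby('g').size()

# One sub-frame per group, in label order.
parts = [part for _, part in df.groupby('g')]

print(sizes.to_dict())
print(len(parts))
```

For this example the frame splits into six parts, and the falling run (label 5) has three rows, matching the table above.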

Answer 2

Just wanted to present another solution that doesn't require a for-loop:

df = pd.DataFrame({'entity':[5,7,5,5,5,6,3,2,0,5]})
df['diff'] = df.entity.diff().bfill()
df.loc[df['diff'] < 0, 'diff'] = -1
df['g'] = (~(df['diff'] == df['diff'].shift(1))).cumsum()
df
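
The check asked for in the question's update (negative slope, more than 2 points, start and end index) could be built on the same cumsum idea. This is only a sketch under stated assumptions: the function name find_negative_runs and its min_len parameter are hypothetical, the frame is assumed to have the default 0..n-1 integer index, and the length is counted in points including the one where the drop begins (which is what makes the expected start index 5 rather than 6).

```python
import pandas as pd

def find_negative_runs(entity, min_len=2):
    """Return (start, end) index pairs of strictly decreasing runs
    that contain more than `min_len` points."""
    falling = entity.diff() < 0
    # Label consecutive stretches where 'falling' keeps the same value.
    run_id = falling.astype(int).diff().ne(0).cumsum()
    runs = []
    for _, block in entity[falling].groupby(run_id[falling]):
        # Assumes a default integer index: the drop begins one row earlier.
        start = int(block.index[0]) - 1
        end = int(block.index[-1])
        if end - start + 1 > min_len:
            runs.append((start, end))
    return runs

df = pd.DataFrame({'entity': [5, 7, 5, 5, 5, 6, 3, 2, 0, 5]})
runs = find_negative_runs(df['entity'])
result = bool(runs)
print(result, runs)
```

For the example data this reports a single run from index 5 to index 8; the short drop from 7 to 5 is skipped because it spans only two points.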


Answer 1
1 · Accepted · 2019-03-13 04:21:46

Answer 2
1 · 2020-12-07 04:04:02
