Pandas: splitting data frame based on the slope of data
I have this data frame:
x = pd.DataFrame({'entity':[5,7,5,5,5,6,3,2,0,5]})
Update: I want a function that, if the slope is negative and the length of the group is more than 2, returns True together with the start and end index of the group. For this case it should return: result = True, index = 5, index = 8
1- I want to split the data frame based on the slope. This example should have 6 groups.
2- How can I check the length of the groups?
I tried to get the groups with the code below, but I don't know how to split the data frame or how to check the length of each part.
New update: Thanks to Matt W. for his code. I finally found the solution:
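import numpy as np
import pandas as pd

df = pd.DataFrame({'entity':[5,7,5,5,5,6,3,2,0,5]})
df['diff'] = df.entity.diff().fillna(0)
# Collapse every negative step to -1 so all falling steps compare equal
df.loc[df['diff'] < 0, 'diff'] = -1

# Start a new group label whenever 'diff' differs from the previous row
init = [0]
for x in df['diff'] == df['diff'].shift(1):
    if x:
        init.append(init[-1])
    else:
        init.append(init[-1]+1)

def get_slope(df):
    # Least-squares slope of the first column ('entity') against the row index
    x=np.array(df.iloc[:,0].index)
    y=np.array(df.iloc[:,0])
    X = x - x.mean()
    Y = y - y.mean()
    # Single-row groups give 0/0 here, which shows up as NaN
    slope = (X.dot(Y)) / (X.dot(X))
    return slope

df['g'] = init[1:]
df.groupby('g').apply(get_slope)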
Result:
0 NaN
1 NaN
2 NaN
3 0.0
4 NaN
5 -1.5
6 NaN
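The question asks for a function that reports True plus the start and end index of each falling group with more than 2 rows. The snippet above stops at the per-group slope, so here is a minimal sketch of how that last step might look. The function name `negative_runs` and its parameters `col` and `min_len` are made up for illustration, and it checks the sign of the step rather than the fitted slope, which is an assumption about what "negative slope" should mean here.

```python
import pandas as pd

def negative_runs(df, col="entity", min_len=3):
    """Return (True, start, end) for each falling group with at least min_len rows.

    Hypothetical helper: the name and parameters are not from the original post.
    """
    step = df[col].diff().fillna(0)
    # Collapse every negative step to -1, as in the solution above
    step = step.where(step >= 0, -1)
    # A new group starts whenever the step value differs from the previous row
    g = (step != step.shift()).cumsum()
    runs = []
    for _, part in df.groupby(g):
        first, last = part.index[0], part.index[-1]
        if len(part) >= min_len and step.loc[first] < 0:
            runs.append((True, int(first), int(last)))
    return runs

df = pd.DataFrame({'entity': [5, 7, 5, 5, 5, 6, 3, 2, 0, 5]})
print(negative_runs(df))  # [(True, 6, 8)]
```

Note that this reports the group as rows 6 to 8, because the row holding the peak value 6 (index 5) belongs to the preceding rising step. The question expects a start index of 5, so if the peak row should be counted as part of the fall, subtracting one from the start would give that convention.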
Take the difference and bfill() the start so that the 0th element has the same number as the one after it. Then set all negative differences to the same value so that they count as the same "slope". Then shift the column to check whether each value is the same as the previous one, and iterate through the result to build a list of labels that increases whenever the value changes, assigning that list to g.
df = pd.DataFrame({'entity':[5,7,5,5,5,6,3,2,0,5]})
df['diff'] = df.entity.diff().bfill()
df.loc[df['diff'] < 0, 'diff'] = -1
init = [0]
for x in df['diff'] == df['diff'].shift(1):
    if x:
        init.append(init[-1])
    else:
        init.append(init[-1]+1)
df['g'] = init[1:]
df
   entity  diff  g
0       5   2.0  1
1       7   2.0  1
2       5  -1.0  2
3       5   0.0  3
4       5   0.0  3
5       6   1.0  4
6       3  -1.0  5
7       2  -1.0  5
8       0  -1.0  5
9       5   5.0  6
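For the second part of the question (checking the length of each group and actually splitting the frame), one possible approach is `groupby('g').size()` plus iterating over the groupby object. This is only a sketch: the `g` column is rebuilt here with a cumulative sum to keep the snippet short and self-contained, and the names `sizes` and `parts` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'entity': [5, 7, 5, 5, 5, 6, 3, 2, 0, 5]})
df['diff'] = df['entity'].diff().bfill()
df.loc[df['diff'] < 0, 'diff'] = -1
# Same labels as the loop above: increment whenever 'diff' changes
df['g'] = (df['diff'] != df['diff'].shift()).cumsum()

# Number of rows in each group, indexed by the group label
sizes = df.groupby('g').size()
print(sizes.tolist())  # [2, 1, 2, 1, 3, 1]

# Split the frame into one sub-frame per group
parts = [part for _, part in df.groupby('g')]
print(len(parts))  # 6
```

With the data from the question this gives 6 groups, and only group 5 (rows 6 to 8) has more than 2 rows.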
Just wanted to present another solution that doesn't require a for-loop:
df = pd.DataFrame({'entity':[5,7,5,5,5,6,3,2,0,5]})
df['diff'] = df.entity.diff().bfill()
df.loc[df['diff'] < 0, 'diff'] = -1
df['g'] = (~(df['diff'] == df['diff'].shift(1))).cumsum()
df
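As a quick sanity check, the loop-based labels and the cumulative-sum labels can be compared directly. This is a small sketch on the question's data only, with illustrative variable names (`loop_g`, `cumsum_g`), not a general proof that the two always agree.

```python
import pandas as pd

df = pd.DataFrame({'entity': [5, 7, 5, 5, 5, 6, 3, 2, 0, 5]})
diff = df['entity'].diff().bfill()
# Collapse every negative step to -1
diff = diff.where(diff >= 0, -1)

# Loop version: keep the label if the step equals the previous one, else bump it
labels = [0]
for same in diff == diff.shift(1):
    labels.append(labels[-1] if same else labels[-1] + 1)
loop_g = labels[1:]

# Loop-free version: cumulative sum over the "changed" flags
cumsum_g = (~(diff == diff.shift(1))).cumsum().tolist()

print(loop_g == cumsum_g)  # True
```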