Flag rows that meet a condition within n days

Question

In a previous question, @KartikeySingh got me very close. But I need to refine it further by flagging a positive inflow only when at least 90% of it flows out (negative flow) within a 5-day period. So in the example below, index 4 and 5 should not get flagged, but items 7, 10, 17 and 19 should (counting rows from 1, these are the values 100, -95, 93 and -89), because the inflows and outflows meet those parameters. So how would I flag only the inflows and outflows whose sizes are within 90% of each other (with opposite sign), where the outflow occurs within 5 days of the inflow?

import pandas as pd

stream = [2, 0, 1, 0, 3, 2, 100, 0, 0, -95, 3, 0, 2, -1, 0, 2, 93, -2, -89]
date = [
'2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04', '2019-01-05',
'2019-01-06', '2019-01-07', '2019-01-08', '2019-01-09', '2019-01-10',
'2019-01-11', '2019-01-13', '2019-01-14', '2019-01-15', '2019-01-16',
'2019-01-17', '2019-01-18', '2019-01-19', '2019-01-20'
]

df = pd.DataFrame({'date': date, 'stream': stream})

def process(row):
    # flag a row whose value exceeds 20x the mean of the previous 5 rows
    if row['stream'] > 20*row['stream_mean']:
        return 1
    else:
        return 0

df['stream_mean'] = df['stream'].rolling(5).mean()
df['stream_mean'] = df['stream_mean'].shift(periods=1)
df['flag'] = df.apply(process,axis=1)
df

The code above flags all incoming flows regardless of the outflow criteria.

Answer 1

This problem is easier to solve with boolean masks (the same kind of condition you would pass to .loc). You can use the following code; the comments in the code explain the logic:

This part is just a copy of your code from the question:

import pandas as pd
import numpy as np
stream = [2, 0, 1, 0, 3, 2, 100, 0, 0, -95, 3, 0, 2, -1, 0, 2, 93, -2, -89]
date = [
'2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04', '2019-01-05',
'2019-01-06', '2019-01-07', '2019-01-08', '2019-01-09', '2019-01-10',
'2019-01-11', '2019-01-13', '2019-01-14', '2019-01-15', '2019-01-16',
'2019-01-17', '2019-01-18', '2019-01-19', '2019-01-20'
]
df = pd.DataFrame({'date': date, 'stream': stream})

This is the code that solves the problem:

p_list = [n for n in df.stream if n > 0]  # positive values from the stream column
p_mean = sum(p_list) / len(p_list)        # their mean, used as the "unusual value" threshold
n_list = [n for n in df.stream if n < 0]  # negative values, likewise
n_mean = sum(n_list) / len(n_list)        # the threshold on the negative side
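For reference, the same two thresholds can be computed with pandas' own filtering instead of list comprehensions. This is only an equivalent sketch on the sample data from the question, not a change to the approach:

```python
import pandas as pd

stream = [2, 0, 1, 0, 3, 2, 100, 0, 0, -95, 3, 0, 2, -1, 0, 2, 93, -2, -89]
df = pd.DataFrame({'stream': stream})

# mean of the positive values and mean of the negative values
p_mean = df.stream[df.stream > 0].mean()
n_mean = df.stream[df.stream < 0].mean()

print(round(p_mean, 2), n_mean)  # 23.11 -46.75
```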

With the threshold values in hand (you can also set them manually if you like; computing them is just an attempt at automating the entire process), find the positive flags:

p_flags = df.index[
    (df.stream > p_mean)
    & (
        (df.stream.shift(-1) <= -0.9 * df.stream)
        | (df.stream.shift(-2) <= -0.9 * df.stream)
        | (df.stream.shift(-3) <= -0.9 * df.stream)
        | (df.stream.shift(-4) <= -0.9 * df.stream)
        | (df.stream.shift(-5) <= -0.9 * df.stream)
    )
]

This returns the index of the rows that match the criteria. The logic is simple: we check whether a row is greater than the p_mean value, and if it is, we check whether any of the next five rows has a value less than or equal to -90% of that row's value (that is, an outflow of at least 90% of it). The | operator means "or", so a matching outflow in any of the next 5 rows makes that part true.
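The five shifted comparisons can also be built in a loop, which makes the window size a single variable. The following is a minimal sketch on the question's sample data; the names `window` and `has_outflow` are illustrative and not part of the original answer:

```python
import pandas as pd

stream = [2, 0, 1, 0, 3, 2, 100, 0, 0, -95, 3, 0, 2, -1, 0, 2, 93, -2, -89]
df = pd.DataFrame({'stream': stream})
p_mean = df.stream[df.stream > 0].mean()

window = 5  # number of following rows to inspect
has_outflow = pd.Series(False, index=df.index)
for k in range(1, window + 1):
    # shift(-k) aligns the value k rows later with the current row;
    # comparisons against the NaN padding at the end evaluate to False
    has_outflow |= df.stream.shift(-k) <= -0.9 * df.stream

p_flags = df.index[(df.stream > p_mean) & has_outflow]
print(list(p_flags))  # [6, 16]
```

Rows 6 and 16 are the inflows of 100 and 93, matched by the outflows of -95 (three rows later) and -89 (two rows later).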

To find the index of the negative flags we do a similar thing, only in reverse:

n_flags = df.index[
    (df.stream < n_mean)
    & (
        (df.stream.shift(1) >= 0.9 * df.stream)
        | (df.stream.shift(2) >= 0.9 * df.stream)
        | (df.stream.shift(3) >= 0.9 * df.stream)
        | (df.stream.shift(4) >= 0.9 * df.stream)
        | (df.stream.shift(5) >= 0.9 * df.stream)
    )
]
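One thing worth noting about this mask: on a negative row, 0.9 * df.stream is itself negative, so any earlier value at or above that number (including 0 or a small positive value) satisfies the >= test, and the selection is driven mostly by the n_mean threshold. If a stricter mirror of the positive rule is wanted, one possible sketch is to require that one of the previous five rows is a large inflow (above p_mean) which this outflow offsets by at least 90%. The names `window` and `offsets_inflow` are illustrative assumptions, not part of the original answer:

```python
import pandas as pd

stream = [2, 0, 1, 0, 3, 2, 100, 0, 0, -95, 3, 0, 2, -1, 0, 2, 93, -2, -89]
df = pd.DataFrame({'stream': stream})
p_mean = df.stream[df.stream > 0].mean()

window = 5  # number of preceding rows to inspect
offsets_inflow = pd.Series(False, index=df.index)
for k in range(1, window + 1):
    prev = df.stream.shift(k)  # value k rows earlier
    # the earlier row must be a large inflow, and the current row must
    # be an outflow of at least 90% of it
    offsets_inflow |= (prev > p_mean) & (df.stream <= -0.9 * prev)

n_flags = df.index[offsets_inflow]
print(list(n_flags))  # [9, 18]
```

On the sample data this selects the same two rows as the mask above.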

Now you have the indices of the rows that match the criteria. To add them as a column in the dataframe, simply do:

flags = np.zeros(len(df))
# mark the positive and negative rows independently,
# since the two index lists may differ in length
flags[p_flags] = 1
flags[n_flags] = 1

df["flags"]=flags  
print(df)

The output will be:

          date  stream  flags
0   2019-01-01       2    0.0
1   2019-01-02       0    0.0
2   2019-01-03       1    0.0
3   2019-01-04       0    0.0
4   2019-01-05       3    0.0
5   2019-01-06       2    0.0
6   2019-01-07     100    1.0
7   2019-01-08       0    0.0
8   2019-01-09       0    0.0
9   2019-01-10     -95    1.0
10  2019-01-11       3    0.0
11  2019-01-13       0    0.0
12  2019-01-14       2    0.0
13  2019-01-15      -1    0.0
14  2019-01-16       0    0.0
15  2019-01-17       2    0.0
16  2019-01-18      93    1.0
17  2019-01-19      -2    0.0
18  2019-01-20     -89    1.0

Answer 1: 1, accepted, 2019-03-02 22:17:01
