Flag if row meets criteria within n days

In a previous question, @KartikeySingh got me very close. But I need to refine it further by flagging a positive inflow only when at least 90% of it flows out (negative flow) within a 5-day period. So in the example below, index 4 and 5 should not get flagged, but items 7, 10, 17 and 19 (counting rows from 1) should get flagged because the inflows and outflows meet those parameters. How would I flag only the inflows and outflows whose magnitudes are within 90% of each other, where the outflow occurs within 5 days of the inflow?

import pandas as pd

stream = [2, 0, 1, 0, 3, 2, 100, 0, 0, -95, 3, 0, 2, -1, 0, 2, 93, -2, -89]
date = [
'2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04', '2019-01-05',
'2019-01-06', '2019-01-07', '2019-01-08', '2019-01-09', '2019-01-10',
'2019-01-11', '2019-01-13', '2019-01-14', '2019-01-15', '2019-01-16',
'2019-01-17', '2019-01-18', '2019-01-19', '2019-01-20'
]

df = pd.DataFrame({'date': date, 'stream': stream})

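def process(row):
    # Flag a row whose value exceeds 20 times the shifted rolling mean
    if row['stream'] > 20 * row['stream_mean']:
        return 1
    else:
        return 0

# Mean of the previous five rows: 5-row rolling mean, shifted down by one
df['stream_mean'] = df['stream'].rolling(5).mean()
df['stream_mean'] = df['stream_mean'].shift(periods=1)
df['flag'] = df.apply(process, axis=1)
df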

The code above flags all incoming flows regardless of the outflow criteria.

This problem is easier to solve with .loc-style boolean indexing. You can use the following code; the comments in it explain the logic:

This part is just a copy of your code from the question:

import pandas as pd
import numpy as np
stream = [2, 0, 1, 0, 3, 2, 100, 0, 0, -95, 3, 0, 2, -1, 0, 2, 93, -2, -89]
date = [
'2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04', '2019-01-05',
'2019-01-06', '2019-01-07', '2019-01-08', '2019-01-09', '2019-01-10',
'2019-01-11', '2019-01-13', '2019-01-14', '2019-01-15', '2019-01-16',
'2019-01-17', '2019-01-18', '2019-01-19', '2019-01-20'
]
df = pd.DataFrame({'date': date, 'stream': stream})

This is the code that solves the problem:

p_list = [n for n in df.stream if n > 0]  # positive values from the stream column
p_mean = sum(p_list) / len(p_list)  # their mean, used as the "unusual value" threshold
n_list = [n for n in df.stream if n < 0]  # likewise, the negative values
n_mean = sum(n_list) / len(n_list)  # threshold on the negative side

That gives the threshold values. Note that you can set them manually if you prefer; computing them from the data is just an attempt at automating the entire process.

p_flags = df.index[
    (df.stream > p_mean)
    & (
        (df.stream.shift(-1) <= -0.9 * df.stream)
        | (df.stream.shift(-2) <= -0.9 * df.stream)
        | (df.stream.shift(-3) <= -0.9 * df.stream)
        | (df.stream.shift(-4) <= -0.9 * df.stream)
        | (df.stream.shift(-5) <= -0.9 * df.stream)
    )
]

This returns the index of the rows that match the criteria. The logic is simple: we check whether a row is greater than p_mean, and if it is, we check whether any of the next five rows holds a value of at most -0.9 times that row's value (that is, an outflow of at least 90% of the inflow). The | operator means "or", so a qualifying outflow in any of the next five rows makes that part true.
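The five shifted comparisons can also be built in a loop, which turns the window size into a parameter. This is only a minimal sketch: the helper name inflow_mask and its parameters n_rows and ratio are made up for illustration and are not part of the original answer.

```python
import pandas as pd

stream = [2, 0, 1, 0, 3, 2, 100, 0, 0, -95, 3, 0, 2, -1, 0, 2, 93, -2, -89]
df = pd.DataFrame({'stream': stream})

def inflow_mask(s, threshold, n_rows=5, ratio=0.9):
    # Hypothetical helper: True where s exceeds threshold and one of the
    # next n_rows values is an outflow of at least ratio * s
    followed = pd.Series(False, index=s.index)
    for k in range(1, n_rows + 1):
        # shift(-k) aligns the value k rows later with the current row
        followed |= s.shift(-k) <= -ratio * s
    return (s > threshold) & followed

# Same threshold as p_mean: the mean of the positive values
positives = df.stream[df.stream > 0]
p_flags_loop = df.index[inflow_mask(df.stream, positives.mean())]
print(list(p_flags_loop))
```

On the sample data this selects rows 6 and 16, the same inflows as the hand-written chain of shifts.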

To find the index of the negative flags we do something similar, only in reverse, looking back at the previous five rows:

n_flags = df.index[
    (df.stream < n_mean)
    & (
        (df.stream.shift(1) >= 0.9 * df.stream)
        | (df.stream.shift(2) >= 0.9 * df.stream)
        | (df.stream.shift(3) >= 0.9 * df.stream)
        | (df.stream.shift(4) >= 0.9 * df.stream)
        | (df.stream.shift(5) >= 0.9 * df.stream)
    )
]
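One caveat about the look-back test: for a negative row, 0.9*df.stream is itself negative, so any non-negative earlier row satisfies the comparison and the n_mean threshold ends up doing the filtering. If an outflow should count only when it follows a large inflow, a stricter check could look like the sketch below. It assumes "large" means above p_mean; the names matched and n_flags_strict are illustrative, not from the original answer.

```python
import pandas as pd

stream = [2, 0, 1, 0, 3, 2, 100, 0, 0, -95, 3, 0, 2, -1, 0, 2, 93, -2, -89]
df = pd.DataFrame({'stream': stream})
s = df.stream
p_mean = s[s > 0].mean()

matched = pd.Series(False, index=s.index)
for k in range(1, 6):
    earlier = s.shift(k)  # value k rows before the current row
    # the earlier row must be a large inflow, and the current row must
    # return at least 90% of it as an outflow
    matched |= (earlier > p_mean) & (s <= -0.9 * earlier)

n_flags_strict = df.index[matched]
print(list(n_flags_strict))
```

On the sample data this still selects rows 9 and 18, but it no longer depends on n_mean to reject small outflows.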

Now you have the indices of the rows that match the criteria. To add them as a column in the DataFrame, simply do:

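flags = np.zeros(len(df))
# Walk both index lists in full; zipping them together would stop at the
# shorter one if their lengths differed
for i in list(p_flags) + list(n_flags):
    flags[i] = 1

df["flags"] = flags
print(df)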

The output will be:

          date  stream  flags
0   2019-01-01       2    0.0
1   2019-01-02       0    0.0
2   2019-01-03       1    0.0
3   2019-01-04       0    0.0
4   2019-01-05       3    0.0
5   2019-01-06       2    0.0
6   2019-01-07     100    1.0
7   2019-01-08       0    0.0
8   2019-01-09       0    0.0
9   2019-01-10     -95    1.0
10  2019-01-11       3    0.0
11  2019-01-13       0    0.0
12  2019-01-14       2    0.0
13  2019-01-15      -1    0.0
14  2019-01-16       0    0.0
15  2019-01-17       2    0.0
16  2019-01-18      93    1.0
17  2019-01-19      -2    0.0
18  2019-01-20     -89    1.0
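
Note that the shifts count rows, not calendar days, and the sample data skips 2019-01-12. If the five-day limit should follow the date column instead, one possible approach is a plain loop over the large inflows. This is a sketch only: it assumes the dates parse with pd.to_datetime, and the names window and flags_by_date are illustrative.

```python
import numpy as np
import pandas as pd

stream = [2, 0, 1, 0, 3, 2, 100, 0, 0, -95, 3, 0, 2, -1, 0, 2, 93, -2, -89]
date = [
    '2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04', '2019-01-05',
    '2019-01-06', '2019-01-07', '2019-01-08', '2019-01-09', '2019-01-10',
    '2019-01-11', '2019-01-13', '2019-01-14', '2019-01-15', '2019-01-16',
    '2019-01-17', '2019-01-18', '2019-01-19', '2019-01-20'
]
df = pd.DataFrame({'date': pd.to_datetime(date), 'stream': stream})

p_mean = df['stream'][df['stream'] > 0].mean()
window = pd.Timedelta(days=5)
flags_by_date = np.zeros(len(df), dtype=int)

for i in df.index[df['stream'] > p_mean]:
    inflow, start = df['stream'][i], df['date'][i]
    # rows dated after the inflow and no more than five days later
    later = df[(df['date'] > start) & (df['date'] <= start + window)]
    out = later.index[later['stream'] <= -0.9 * inflow]
    if len(out) > 0:
        flags_by_date[i] = 1
        flags_by_date[out.to_numpy()] = 1

df['flags_by_date'] = flags_by_date
print(df[df['flags_by_date'] == 1])
```

On the sample data this flags the same four rows (6, 9, 16 and 18), because each outflow falls within five calendar days of its inflow.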
