Check if values are between two values in pandas

I have two values that are found in a for loop like so:

for i in range(df_zones.shape[0]):

   filter_max = df_labels[df_labels['Labels'] == i].sort_values(by='level').iloc[-1]
   filter_min = df_labels[df_labels['Labels'] == i].sort_values(by='level').iloc[0]

I have another DataFrame with 4 columns of measurements and a time series index, like so:

DateTime    meas1  meas2  meas3  meas4
2022-1-1    1.1    1.2    1.3    1.1

There are thousands of rows of data.

What I am trying to do is add another column labeled 'isZone' that indicates whether any of the values in the row are between filter_min and filter_max.

DateTime    meas1  meas2  meas3  meas4  isZone
2022-1-1    1.1    1.5    1.5    1.7    0
2022-1-2    2.2    1.4    1.5    1.7    0
2022-1-3    3.1    1.2    1.3    1.1    1
2022-1-4    4.1    1.2    1.3    1.1    1
2022-1-5    5.1    1.2    1.3    1.1    1

I have read about the pandas between function, but I really can't figure out how to make this work. Is there a quicker way to do this in NumPy? Any guidance would be appreciated.

You can solve this with apply and pandas' between:

df_zones['Flag'] = df_zones.apply(lambda x: 1 if x.between(filter_min,filter_max).any() else 0,axis=1)
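
For reference, here is a minimal, self-contained sketch of the apply approach. It assumes filter_min and filter_max are plain scalars; the bounds 3.0 and 5.5 and the sample frame are made up for illustration, with the measurements copied from the expected output in the question.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame(
    {
        "meas1": [1.1, 2.2, 3.1, 4.1, 5.1],
        "meas2": [1.5, 1.4, 1.2, 1.2, 1.2],
        "meas3": [1.5, 1.5, 1.3, 1.3, 1.3],
        "meas4": [1.7, 1.7, 1.1, 1.1, 1.1],
    },
    index=pd.to_datetime(
        ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05"]
    ),
)

# Assumed scalar bounds, chosen only for this example.
filter_min, filter_max = 3.0, 5.5

# Restrict the check to the measurement columns so that a previously
# added flag column can never influence the result.
meas_cols = ["meas1", "meas2", "meas3", "meas4"]

# Row-wise check: 1 if any measurement in the row lies within the bounds.
df["isZone"] = df[meas_cols].apply(
    lambda row: int(row.between(filter_min, filter_max).any()), axis=1
)
```

Because apply with axis=1 calls the lambda once per row in Python, it may become slow on large frames.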

How about trying with .T and using a list comprehension this way?

df_zones['Flag'] = [1 if df_zones.T[x].between(min_,max_).any() else 0 for x in df_zones.T]

Or without the transposing:

df_zones['Flag'] = [1 if df_zones.loc[x,:].between(min_,max_).any() else 0 for x in df_zones.index]

The method above took 30 minutes to compute; the method below finishes in under 2 seconds.

In the end, the best method was to append all the results to their own lists and write a function that combines all the pandas between checks and checks row-wise whether any of them is 1.

    import numpy as np
    import pandas as pd

    def arrayBoolCheck(arrays):
        # Each row of `arrays` is one zone; transpose so each row is one timestamp.
        df = pd.DataFrame(arrays).T
        # 1 if the measurement fell inside at least one zone, else 0.
        return (df == 1).any(axis=1).astype(int)

    isZone1, isZone2, isZone3, isZone4 = [], [], [], []
    for i in range(df_zones.shape[0]):
        # Assumption: the bounds are the scalar 'level' values of the lowest and highest rows.
        levels = df_labels[df_labels['Labels'] == i].sort_values(by='level')['level']
        filter_max = levels.iloc[-1]
        filter_min = levels.iloc[0]

        isZone1.append(df_instrument["meas1"].between(filter_min, filter_max, inclusive='both').astype(int).values)
        isZone2.append(df_instrument["meas2"].between(filter_min, filter_max, inclusive='both').astype(int).values)
        isZone3.append(df_instrument["meas3"].between(filter_min, filter_max, inclusive='both').astype(int).values)
        isZone4.append(df_instrument["meas4"].between(filter_min, filter_max, inclusive='both').astype(int).values)

    # Zone labels DataFrame: one 0/1 column per measurement
    df = pd.DataFrame(data=[
            arrayBoolCheck(np.array(isZone1)),
            arrayBoolCheck(np.array(isZone2)),
            arrayBoolCheck(np.array(isZone3)),
            arrayBoolCheck(np.array(isZone4))],
        index=["isZone1", "isZone2", "isZone3", "isZone4"]).T
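
A compact, self-contained sketch of the same per-column idea follows. The sample frames, the zone bounds, and the use of groupby to get each zone's minimum and maximum 'level' are assumptions made for illustration, not the original data or code. Only two measurement columns are shown to keep it short.

```python
import numpy as np
import pandas as pd

# Hypothetical zone table: each label has several 'level' values.
df_labels = pd.DataFrame(
    {"Labels": [0, 0, 0, 1, 1], "level": [3.0, 3.2, 3.5, 5.0, 5.2]}
)

# Hypothetical measurements.
df_instrument = pd.DataFrame(
    {
        "meas1": [1.1, 2.2, 3.1, 4.1, 5.1],
        "meas2": [3.4, 1.4, 1.2, 5.0, 1.2],
    }
)

# Per-zone lower and upper bounds (min and max 'level' for each label).
bounds = df_labels.groupby("Labels")["level"].agg(["min", "max"])

flags = {}
for n, col in enumerate(["meas1", "meas2"], start=1):
    # One boolean array per zone; between() includes both endpoints by default.
    per_zone = np.vstack(
        [
            df_instrument[col].between(lo, hi).to_numpy()
            for lo, hi in zip(bounds["min"], bounds["max"])
        ]
    )
    # A measurement is flagged if it falls inside at least one zone.
    flags[f"isZone{n}"] = per_zone.any(axis=0).astype(int)

df_flags = pd.DataFrame(flags, index=df_instrument.index)
```

Here df_flags holds one 0/1 column per measurement column, aligned with the index of df_instrument.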

I found that keeping the measurements in their respective columns was better for the analysis, but the same function could be used to combine them all into one column if needed.
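
If a single combined column is wanted, one possible sketch uses NumPy broadcasting instead of a Python loop. The zone bounds in lows and highs and the sample measurements are hypothetical values chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements.
df = pd.DataFrame(
    {
        "meas1": [1.1, 2.2, 3.1, 4.1, 5.1],
        "meas2": [3.4, 1.4, 1.2, 5.0, 1.2],
    }
)

# Hypothetical zone bounds: zone k spans [lows[k], highs[k]].
lows = np.array([3.0, 5.0])
highs = np.array([3.5, 5.2])

# Shape (n_rows, n_cols, 1), so it broadcasts against the (n_zones,) bounds.
values = df[["meas1", "meas2"]].to_numpy()[:, :, None]

# in_zone[r, c, k] is True when the value in row r, column c lies in zone k.
in_zone = (values >= lows) & (values <= highs)

# Collapse the column and zone axes: 1 if any value in the row is in any zone.
df["isZone"] = in_zone.any(axis=(1, 2)).astype(int)
```

The intermediate boolean array has n_rows × n_cols × n_zones elements, so with a very large number of zones it may be worth processing the zones in chunks.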
