
Python/Pandas: Aggregate with a function of multiple variables (columns) as soon as an interval condition is met

I have a DataFrame with segments, timestamps, and several value columns:

Segment    Timestamp     Value1    Value2    Value2_mean 
0          2018-11...    180       156       135
0                        170       140       135
0                                            135
1
1
...

I want to group this DataFrame by 'Segment' and, for each segment, find the first Timestamp at which the interval condition below is met, then get the time interval in seconds for that segment. Because the function needs values from more than one column, I don't think aggregate works here.

value2_mean-std(value2) <= value1 <= value2_mean+std(value2)

It should look like this:

Segment    Interval[s]
0          10
1          19
2          6
3          ...

I tried something like this:

grouped = dataSeg.groupby(['Segment'])

def grouping(df):

    a = np.array(df['Value1'])
    b = np.array(df['Value2'])
    c = np.array(df['Value2_mean'])
    d = np.array(df['Timestamp'])

    for x in a:
        categories = np.logical_and(
            (c-np.std(b)<= x),
            (c+np.std(b)>= x))

        if np.any(categories):
            return d[categories]-d[0]

grouped.apply(grouping)

This does not work the way I want it to. Any suggestions would be appreciated!

Something like this? I didn't test it thoroughly.

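    import numpy as np

    def calc(grp):
        # True if any row has |Value1 - Value2_mean| < std(Value2) (strict <, pandas sample std)
        if grp.Value1.sub(grp.Value2_mean).abs().lt(grp.Value2.std()).any():
            # Duration of the whole segment: last timestamp minus first timestamp
            return grp["Timestamp"].iloc[-1] - grp["Timestamp"].iloc[0]
        return np.nan

    df.groupby("Segment").apply(calc)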
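
If what you want is the time from the start of each segment to the first row that satisfies the condition (rather than the full segment duration), a variant along these lines might be closer. This is only a sketch under a few assumptions: the function name `seconds_to_first_match` and the sample data are made up for illustration, `Timestamp` is assumed to already be a datetime column sorted within each segment, and `ddof=0` is used so the standard deviation matches `np.std` from the question (pandas defaults to `ddof=1`).

```python
import numpy as np
import pandas as pd


def seconds_to_first_match(grp: pd.DataFrame) -> float:
    """Seconds from the first row of the segment to the first row inside the band."""
    # Population std (ddof=0) to mirror np.std in the question; an assumption.
    std = grp["Value2"].std(ddof=0)
    # Same as: Value2_mean - std <= Value1 <= Value2_mean + std
    in_band = grp["Value1"].sub(grp["Value2_mean"]).abs().le(std)
    if not in_band.any():
        return np.nan  # the condition is never met in this segment
    first_hit = grp.loc[in_band, "Timestamp"].iloc[0]
    return (first_hit - grp["Timestamp"].iloc[0]).total_seconds()


# Made-up sample data, only to show the shape of the input.
start = pd.Timestamp("2020-01-01")
df = pd.DataFrame({
    "Segment": [0, 0, 0, 1, 1, 2, 2],
    "Timestamp": start + pd.to_timedelta([0, 5, 10, 0, 19, 0, 4], unit="s"),
    "Value1": [180, 170, 140, 300, 100, 500, 500],
    "Value2": [156, 140, 109, 90, 110, 90, 110],
    "Value2_mean": [135, 135, 135, 100, 100, 100, 100],
})

cols = ["Timestamp", "Value1", "Value2", "Value2_mean"]
result = df.groupby("Segment")[cols].apply(seconds_to_first_match)
print(result)
```

With this sample data, segments 0 and 1 first meet the condition after 10 and 19 seconds, and segment 2 never does, so it gets NaN. Selecting the columns before `apply` keeps the grouping column out of the frame passed to the function.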

