
Remove outliers in Pandas dataframe with groupby

I have a dataframe of Report Date, Time Interval and Total Volume for a full year. I would like to be able to remove outliers within each Time Interval.

This is as far as I've been able to get...


      Report Date  Time Interval  Total Volume
5784   2016-03-01             24         467.0
5785   2016-03-01             25         580.0
5786   2016-03-01             26         716.0
5787   2016-03-01             27         803.0
5788   2016-03-01             28         941.0

So I calculate the quantiles:

low = .05
high = .95
dfq = dft.groupby(['Time Interval']).quantile([low, high])

                    Total Volume
Time Interval                   
24            0.05        420.15
              0.95        517.00
25            0.05        521.90
              0.95        653.55
26            0.05        662.75

And then I'd like to be able to use them to remove outliers within each Time Interval using something like this...

dft = dft.apply(lambda x: x[(x>dfq.loc[low,x.name]) & (x < dfq.loc[high,x.name])], axis=0)

Any pointers/advice much appreciated.

One way is to filter out as follows:

In [11]: res = df.groupby("Date")["Interval"].quantile([0.05, 0.95]).unstack(level=1)

In [12]: res
             0.05   0.95
2016-03-01  489.6  913.4

Now we can look up these values for each row using loc, and filter:

In [13]: (res.loc[df.Date, 0.05] < df.Interval.values) & (df.Interval.values < res.loc[df.Date, 0.95])
2016-03-01    False
2016-03-01     True
2016-03-01     True
2016-03-01     True
2016-03-01    False
dtype: bool

In [14]: df.loc[((res.loc[df.Date, 0.05] < df.Interval.values) & (df.Interval.values < res.loc[df.Date, 0.95])).values]
   Report        Date  Time  Interval  Total Volume
1    5785  2016-03-01    25     580.0           NaN
2    5786  2016-03-01    26     716.0           NaN
3    5787  2016-03-01    27     803.0           NaN

Note: grouping by 'Time Interval' works the same way, but in your example it doesn't filter any rows!
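For reference, here is a minimal sketch of the same quantile-lookup idea written against the question's own column names ('Time Interval', 'Total Volume'). The sample rows, the `lo`/`hi` column names and the `bounds`, `mask` and `filtered` variables are made up for illustration. A join is used here instead of the `loc` lookup above.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
dft = pd.DataFrame({
    "Report Date": pd.to_datetime(
        ["2016-03-01", "2016-03-02", "2016-03-03", "2016-03-04", "2016-03-05"] * 2
    ),
    "Time Interval": [24] * 5 + [25] * 5,
    "Total Volume": [467.0, 470.0, 480.0, 475.0, 900.0,
                     580.0, 585.0, 100.0, 590.0, 595.0],
})

low, high = 0.05, 0.95

# One row per Time Interval, one column per quantile.
bounds = (
    dft.groupby("Time Interval")["Total Volume"]
       .quantile([low, high])
       .unstack(level=1)
)
bounds.columns = ["lo", "hi"]  # illustrative names for the two quantile columns

# Attach each row's bounds via its Time Interval, then keep rows strictly inside them.
with_bounds = dft.join(bounds, on="Time Interval")
mask = (with_bounds["Total Volume"] > with_bounds["lo"]) & (
    with_bounds["Total Volume"] < with_bounds["hi"]
)
filtered = dft[mask]
print(filtered)
```

Because the comparison is strict, the smallest and largest value in each group always fall outside the bounds. Use `>=` and `<=` if the boundary rows should be kept.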

      transform(lambda x : (x<x.quantile(0.95))&(x>(x.quantile(0.05)))).eq(1)]
      ReportDate  TimeInterval  TotalVolume
5785  2016-03-01            25        580.0
5786  2016-03-01            26        716.0
5787  2016-03-01            27        803.0
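
The transform expression above is cut off at the start. The following is a self-contained sketch of that approach, assuming (from the output shown) that it groups by ReportDate and tests TotalVolume. The helper name `within_quantiles` is illustrative, and the data is the five sample rows from the question.

```python
import pandas as pd

# The five sample rows from the question, with the column names used in the output above.
df = pd.DataFrame(
    {
        "ReportDate": ["2016-03-01"] * 5,
        "TimeInterval": [24, 25, 26, 27, 28],
        "TotalVolume": [467.0, 580.0, 716.0, 803.0, 941.0],
    },
    index=[5784, 5785, 5786, 5787, 5788],
)

def within_quantiles(s):
    """Return True for values strictly between the group's 5% and 95% quantiles."""
    return (s > s.quantile(0.05)) & (s < s.quantile(0.95))

# transform returns one value per original row, aligned to df's index.
# astype(bool) guards against versions that hand the mask back as 0/1 values.
keep = df.groupby("ReportDate")["TotalVolume"].transform(within_quantiles).astype(bool)
result = df[keep]
print(result)
```

To remove outliers within each time interval instead, as the question asks, group by "TimeInterval". That only makes sense when each interval has several rows, as in the full-year data.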

