Need to compare one Pandas (Python) dataframe with values from another dataframe

Question

So I've pulled data from an SQL server and loaded it into a dataframe. All the data is discrete and increases in steps of 0.1 (0.0, 0.1, 0.2 ... 9.8, 9.9, 10.0), with multiple power values for each step (e.g. 1000, 1412, 134.5, 657.1 at 0.1; 14.5, 948.1, 343.8 at 5.5). Hopefully you see what I'm trying to say.

I've managed to group the data into these individual steps using the following, and have then taken the mean and standard deviation for each group.

group = df.groupby('step').power.mean()
group2 = df.groupby('step').power.std().fillna(0)

This results in two Series (group and group2), indexed by step, which hold the mean and standard deviation for each of the 0.1 steps. It's then easy to create an upper and lower limit for each step using the following:

upperlimit = group + 3*group2
lowerlimit = group - 3*group2
lowerlimit[lowerlimit<0] = 0
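As a minimal sketch of the grouping step described here, the snippet below rebuilds the limits on a handful of rows taken from the sample data. It assumes lowercase column names `step` and `power`, as used in the code (the sample table shows them capitalised). Each result has one entry per distinct step value, not one per row of the original dataframe.

```python
import pandas as pd

# A few rows from the sample data; lowercase column names are an assumption.
df = pd.DataFrame({
    'power': [106.0, 135.6, 126.7, 134.3, 201.4, 226.9, 206.8, 211.1, 200.4],
    'step':  [5.0,   5.0,   5.0,   5.0,   5.6,   5.6,   5.6,   5.6,   5.5],
})

group = df.groupby('step').power.mean()
group2 = df.groupby('step').power.std().fillna(0)  # std is NaN for single-row steps

upperlimit = group + 3 * group2
lowerlimit = (group - 3 * group2).clip(lower=0)    # floor negative limits at 0

print(upperlimit)
print(lowerlimit)
```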

Now comes the bit I'm confused about! I need to go back into the original dataframe and remove rows/instances where the power value is outside these calculated limits (note there is a different upper and lower limit for each 0.1 step).

Here are 50 lines of sample data:

Index    Power              Step
0        106.0              5.0
1        200.4              5.5
2        201.4              5.6
3        226.9              5.6
4        206.8              5.6
5        177.5              5.3
6        124.0              4.9
7        121.0              4.8
8         93.9              4.7
9        135.6              5.0
10       211.1              5.6
11       265.2              6.0
12       281.4              6.2
13       417.9              6.9
14       546.0              7.4
15       619.9              7.9
16       404.4              7.1
17       241.4              5.8
18        44.3              3.9
19        72.1              4.6
20        21.1              3.3
21         6.3              2.3
22         0.0              0.8
23         0.0              0.9
24         0.0              3.2
25         0.0              4.6
26        33.3              4.2
27        97.7              4.7
28        91.0              4.7
29       105.6              4.8
30        97.4              4.6
31       126.7              5.0
32       134.3              5.0
33       133.4              5.1
34       301.8              6.3
35       298.5              6.3
36       312.1              6.5
37       505.3              7.5
38       491.8              7.3
39       404.6              6.8
40       324.3              6.6
41       347.2              6.7
42       365.3              6.8
43       279.7              6.3
44       351.4              6.8
45       350.1              6.7
46       573.5              7.9
47       490.1              7.5
48       520.4              7.6
49       548.2              7.9

Answer 1

To put your goal another way, you want to perform some manipulations on grouped data, and then project the results of those manipulations back onto the ungrouped rows so you can use them to filter those rows. One way to do this is with transform:

The transform method returns an object that is indexed the same (same size) as the one being grouped. Thus, the passed transform function should return a result that is the same size as the group chunk.
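To make the difference concrete, here is a small sketch contrasting an ordinary aggregation with transform. The toy data is hypothetical; the column names follow the question's code.

```python
import pandas as pd

# Hypothetical toy frame; column names follow the question's code.
df = pd.DataFrame({
    'step':  [0.1, 0.1, 0.1, 5.5, 5.5],
    'power': [10.0, 20.0, 30.0, 100.0, 300.0],
})

per_step = df.groupby('step').power.mean()            # one value per step
per_row = df.groupby('step').power.transform('mean')  # one value per original row

print(per_step)
print(per_row)
```

The aggregated result is indexed by step, while the transformed result shares the index of `df`, so it can be compared with `df.power` row by row.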

You can then create the new columns directly:

grouped = df.groupby('step').power
mean = grouped.transform('mean')
std = grouped.transform('std').fillna(0)  # std is NaN for single-row steps
df['upper'] = mean + 3*std
df['lower'] = (mean - 3*std).clip(lower=0)  # a negative lower limit becomes 0

And filter accordingly:

df = df[(df['power'] <= df['upper']) & (df['power'] >= df['lower'])]
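Here is a minimal end-to-end sketch of this approach on hypothetical data, with one obvious spike at step 0.1 and a single-row step at 5.5. It passes the built-in aggregations to transform by name rather than using a lambda. The last lines show an alternative that was not part of the answer above: mapping per-step limits, such as the `upperlimit` Series from the question, back onto the rows with `Series.map`.

```python
import pandas as pd

# Hypothetical data: eleven ordinary readings and one spike at step 0.1,
# plus a single reading at step 5.5.
df = pd.DataFrame({
    'step':  [0.1] * 12 + [5.5],
    'power': [100.0] * 11 + [1000.0] + [200.4],
})

grouped = df.groupby('step').power
mean = grouped.transform('mean')
std = grouped.transform('std').fillna(0)   # single-row steps have NaN std

upper = mean + 3 * std
lower = (mean - 3 * std).clip(lower=0)     # floor negative limits at 0

# Keep only rows whose power lies within the limits for their own step.
filtered = df[(df.power >= lower) & (df.power <= upper)]

# Alternative: map per-step limits (a Series indexed by step) onto the rows.
upperlimit = grouped.mean() + 3 * grouped.std().fillna(0)
upper_mapped = df['step'].map(upperlimit)

print(filtered)
```

With a sample standard deviation, one point among n can lie at most (n - 1)/sqrt(n) standard deviations from the group mean, so a 3-sigma rule cannot remove anything from a step with fewer than 11 readings.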

solution1
1 ACCEPTED 2016-12-09 16:48:25
